WO2025161598A1 - Network training method and communication apparatus - Google Patents
Network training method and communication apparatus
- Publication number
- WO2025161598A1 (application PCT/CN2024/131251)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bit width
- model
- parameter
- information
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. Transmission Power Control [TPC] or power classes
- H04W52/02—Power saving arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W52/00—Power management, e.g. Transmission Power Control [TPC] or power classes
- H04W52/02—Power saving arrangements
- H04W52/0209—Power saving arrangements in terminal devices
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/04—Wireless resource allocation
- H04W72/044—Wireless resource allocation based on the type of the allocated resource
- H04W72/0446—Resources in time domain, e.g. slots or frames
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/04—Wireless resource allocation
- H04W72/044—Wireless resource allocation based on the type of the allocated resource
- H04W72/0453—Resources in frequency domain, e.g. a carrier in FDMA
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/20—Control channels or signalling for resource management
- H04W72/23—Control channels or signalling for resource management in the downlink direction of a wireless link, i.e. towards a terminal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W92/00—Interfaces specially adapted for wireless communication networks
- H04W92/04—Interfaces between hierarchically different network devices
- H04W92/10—Interfaces between hierarchically different network devices between terminal device and access point, i.e. wireless air interface
Definitions
- the present application relates to the field of communication technology, and more particularly, to a method and a communication device for network training.
- AI model training involves network training, such as neural network training, federated learning, or knowledge distillation, in which model parameters, gradient parameters, or datasets are typically represented at a predetermined precision, such as the 32-bit floating-point type (float32); such network training may therefore involve a large number of floating-point operations.
- High-precision floating-point calculations are typically slow and power-hungry. This not only leads to greater energy consumption and transmission overhead, but may also prevent network training from running on devices with limited computing resources or tight power budgets, hindering the adoption of AI in the wireless field.
- the present application provides a method and a communication device for network training, in order to save resources while ensuring network training performance.
- a communication method is provided.
- the method can be performed by a first device (e.g., a first terminal), or by a chip or circuit of the first device, which is not limited in this application.
- the first device can be a terminal device, or a chip, chip system or circuit in the terminal device, or a functional module in the terminal device that can call and execute a program.
- the method includes: obtaining a first bit width of a first parameter, where the first bit width belongs to a first bit width set, the first parameter belongs to a first parameter set, multiple bit widths in the first bit width set have a corresponding relationship with multiple parameters in the first parameter set, and the multiple parameters are used to train a first model, each parameter including one or more of the following: a model parameter and/or a gradient parameter; and sending first information, where the first information is used to indicate the first bit width.
- the correspondence between the multiple bit widths in the first bit width set and the multiple parameters in the first parameter set can be understood as follows: the bit widths may correspond one-to-one to the parameters, or multiple parameters may correspond to one bit width; that is, the bit widths corresponding to the multiple parameters may all be different, or some parameters may share the same bit width.
- the first device obtains the bit width of the first parameter and can transmit it to the second device over the air interface, so that the first bit width of the first parameter is aligned between the first device and the second device. After subsequent local training of the first model, the first device can report the training result of the first parameter based on the first bit width, and the second device can send down the merged training result of the first parameter based on the first bit width. In other words, the first device can signal different first bit widths for different first parameters over the air interface to achieve network training and convergence.
- this implementation can flexibly indicate different bit widths (for example, float32) for different model parameters and is therefore widely applicable. It not only ensures network training performance but also saves resources, for example by reducing air-interface overhead, storage overhead, and computing overhead, and by saving terminal hardware resources; this helps the terminal train other, more complex models and supports the adoption of artificial intelligence in the wireless field.
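- As a minimal, hypothetical sketch (not taken from the patent), the following Python snippet illustrates a per-parameter bit-width map and the kind of indication the first device might send; all names and values are illustrative assumptions.

```python
import numpy as np

# Hypothetical per-parameter bit-width map; names and values are illustrative.
# Each entry pairs a parameter in the "first parameter set" with a bit width
# in the "first bit width set".
bit_width_map = {
    "layer1.weight": 16,  # a less sensitive parameter trained at float16
    "layer1.bias": 32,    # a sensitive parameter kept at float32
    "layer2.weight": 16,
}

DTYPES = {16: np.float16, 32: np.float32, 64: np.float64}

def quantize(param: np.ndarray, bit_width: int) -> np.ndarray:
    """Cast a parameter tensor to the floating-point dtype of the given bit width."""
    return param.astype(DTYPES[bit_width])

# "First information": the indication sent over the air interface so that
# both devices align on the bit width of each parameter.
first_information = [
    {"param_id": i, "bit_width": bw}
    for i, bw in enumerate(bit_width_map.values())
]
print(first_information)
```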
- the first parameter set includes N second parameters, the second bit width corresponding to at least one of the N second parameters is different from the first bit width, the second bit width belongs to the first bit width set, and N is an integer greater than or equal to 1.
- the present application does not limit whether the second bit widths corresponding to the N second parameters are the same, as long as at least two of the multiple bit widths corresponding to the multiple parameters in the first parameter set are different.
- the bit widths corresponding to different model parameters can be different, making the scheme widely applicable. It not only ensures network training performance but also reduces transmission and computing overhead, saves terminal hardware resources, and supports the adoption of artificial intelligence in the wireless field.
- the method further includes: obtaining a first training result, the first training result corresponding to the first parameter, the first training result being obtained based on training of the first model; and sending the first training result according to the first bit width.
- the first device locally trains the first model to obtain the training result of the first parameter, and can report it to the second device based on the first bit width. That is, the bit width of the first training result is the first bit width.
- the bit widths of the training results corresponding to different parameters can be different, making the scheme widely applicable. It not only ensures network training performance but also reduces air-interface transmission overhead and computing overhead, lowering terminal power consumption and helping the first device save resources, as in the sketch below.
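- A minimal sketch of reporting a training result at the agreed bit width (the dimensions, and the choice of a gradient as the result, are illustrative assumptions):

```python
import numpy as np

first_bit_width = 16  # agreed bit width for the first parameter (illustrative)

# Local training of the first model yields a float32 result for the first
# parameter (a gradient here); it is reported according to the first bit width.
gradient = np.random.randn(256, 128).astype(np.float32)
first_training_result = gradient.astype(np.float16)

# The uplink payload halves relative to float32 when the agreed bit width is 16.
assert first_training_result.nbytes == gradient.nbytes // 2
```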
- the method further includes: receiving a second training result, the bit width corresponding to the second training result being the first bit width, and the second training result being used by the first terminal to train the first model.
- the second training result is obtained by aggregating at least two first training results reported by at least two terminals, and the at least two terminals are used to jointly train the first model.
- the second device can merge the first training results for the first parameter from multiple terminals and send down the merged second training result, where the bit width of the second training result is the first bit width; that is, the bit width of the first training result of the first parameter reported by the terminal device is consistent with the bit width of the merged second training result of the first parameter sent down by the second device.
- the bit widths of the first training results corresponding to the same parameter reported by different terminals can be different, so the bit widths of the second training results corresponding to the merged same parameter sent down by the second device to different terminals can also be different.
- This scheme has strong applicability and can not only ensure network training performance, but also reduce air interface transmission overhead and computing overhead, saving terminal power consumption.
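- A minimal sketch of this aggregation behavior, assuming two terminals and simple averaging (the patent does not specify the aggregation rule; all names are illustrative):

```python
import numpy as np

# Hypothetical reports of the same first parameter from two terminals,
# each at its own agreed bit width.
reports = {
    "terminal_A": np.random.randn(64).astype(np.float16),  # bit width 16
    "terminal_B": np.random.randn(64).astype(np.float32),  # bit width 32
}

# The second device aggregates in high precision, then sends the merged
# second training result to each terminal at that terminal's bit width.
merged = np.mean([r.astype(np.float64) for r in reports.values()], axis=0)
second_training_results = {
    "terminal_A": merged.astype(np.float16),
    "terminal_B": merged.astype(np.float32),
}
```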
- the method further includes: sending second information, where the second information is used to indicate that the bit width corresponding to the first parameter is updated to a third bit width, and the third bit width is different from the first bit width.
- the first device can request the second device to update the first bit width of the first parameter. For example, during network training of the first model, if the battery capacity of the first device increases, or a higher accuracy of the first parameter is required for training the first model, the first device can instruct the second device to update the first bit width to a larger third bit width, for example from float16 to float32 (32-bit floating-point type), to improve network training performance.
- conversely, the first device can instruct the second device to update the first bit width to a smaller third bit width, for example from float64 (double-precision floating point) to float16. This reduces transmission and computing overhead while maintaining network training performance, saving terminal power.
- the second information includes a first time unit, and the second information is used to indicate that within the first time unit, the bit width corresponding to the first parameter is updated to a third bit width.
- this application does not limit the update time and number of updates of the first bit width of the first parameter, which may depend on factors such as the terminal's capability information, the terminal's hardware conditions, the terminal's hardware power, the terminal's battery capacity, and the current network training status.
- the first device can instruct the second device to update the first bit width of the first parameter to the third bit width at a certain moment, or the first device can also instruct the second device to update the first bit width of the first parameter to the third bit width within a certain time period. It has strong flexibility and applicability, which helps to improve network training performance and convergence performance.
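- A minimal sketch of what such an update indication might carry (the field names are illustrative assumptions, not the patent's message format):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BitWidthUpdate:
    """Sketch of the 'second information'; field names are illustrative,
    not taken from the patent."""
    param_id: int                          # identifies the first parameter
    third_bit_width: int                   # replaces the current first bit width
    first_time_unit: Optional[int] = None  # slot/frame in which the update applies

# Example: update to a 16-bit width starting at time unit 120, e.g., to save power.
update = BitWidthUpdate(param_id=0, third_bit_width=16, first_time_unit=120)
```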
- obtaining the first bit width of the first parameter includes: receiving third information, where the third information is used to indicate the first bit width.
- the first device may obtain the first bit width of the first parameter through signaling instructions.
- the first bit width of the first parameter may also be predefined or preconfigured. This application does not limit the implementation method of the first device obtaining the first bit width.
- before receiving the third information, the method further includes: sending a second bit width set, where the first terminal supports the bit widths in the second bit width set, and the first bit width belongs to the second bit width set.
- the first device can send a set of multiple bit widths supported by the first device, namely, a second bit width set, to the second device.
- the second device can select the first bit width from the second bit width set based on at least one of: time-frequency resources, different terminals' bit width requirements for the first parameter, different terminals' hardware conditions, hardware power, battery capacity, or the dataset quality of different terminals, and indicate the first bit width to the first device.
- the second device allocates the first bit width to the first device in a targeted manner, which can save resources as much as possible while ensuring network training performance, reduce terminal power consumption, and reduce signaling overhead and computing overhead.
- before obtaining the first bit width of the first parameter, the method further includes: receiving fourth information, where the fourth information is used to request training of the first model and includes one or more of the following: structural parameters of the first model, the parameter quantity of the first model, or a fourth bit width, where the fourth bit width is a bit width supported by the network side.
- the first bit width is less than or equal to the fourth bit width.
- the second device can send a reference bit width, namely the fourth bit width, to the first device. The first device can then determine the first bit width based on the fourth bit width, so that both the first device and the second device support the first bit width of the first parameter, which helps improve network training performance and convergence.
- because the first bit width is less than or equal to the fourth bit width, resources can be saved as much as possible, terminal power consumption can be reduced, and signaling and computing overhead can be lowered, as in the sketch below.
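- A minimal sketch of the capability exchange and selection described above, combining the second bit width set with the fourth-bit-width cap (the selection criterion shown is an illustrative assumption):

```python
# The terminal reports the bit widths it supports (the second bit width set);
# the network caps the choice at the fourth bit width; the second device then
# picks the first bit width from the admissible candidates.
second_bit_width_set = {16, 32, 64}  # supported by the first device
fourth_bit_width = 32                # supported by the network side

candidates = sorted(bw for bw in second_bit_width_set if bw <= fourth_bit_width)

resources_scarce = True  # stand-in for time-frequency load, battery, etc.
first_bit_width = candidates[0] if resources_scarce else candidates[-1]
assert first_bit_width <= fourth_bit_width
```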
- a communication method is provided.
- the method can be performed by a second device, or by a chip or circuit of the second device, which is not limited in this application.
- the following description uses the second device as an example.
- the second device can be a network device, or a chip, chip system, or circuit in the network device, or a functional module in the network device that can call and execute a program.
- the method includes: obtaining a first bit width of a first parameter, where the first bit width belongs to a first bit width set, the first parameter belongs to a first parameter set, multiple bit widths in the first bit width set have a corresponding relationship with multiple parameters in the first parameter set, and the multiple parameters are used to train a first model, each parameter including one or more of the following: a model parameter and/or a gradient parameter.
- the first parameter set further includes N second parameters, the second bit width corresponding to at least one of the N second parameters is different from the first bit width, the second bit width belongs to the first bit width set, and N is an integer greater than or equal to 1.
- the method further includes: receiving at least two first training results reported by at least two terminals, where the at least two terminals are used to jointly train the first model; performing aggregation processing based on the at least two first training results to obtain a second training result, where the second training result is used to train the first model; and sending the second training result according to the first bit width.
- the method further includes: receiving second information, where the second information is used to indicate that the bit width corresponding to the first parameter is updated to a third bit width, and the third bit width is different from the first bit width.
- the second information includes a first time unit, and the second information is used to indicate that within the first time unit, the bit width corresponding to the first parameter is updated to a third bit width.
- obtaining the first bit width of the first parameter includes: receiving first information, where the first information is used to indicate the first bit width.
- obtaining the first bit width of the first parameter includes: receiving a second bit width set; selecting the first bit width from the second bit width set; and sending third information, where the third information is used to indicate the first bit width.
- before obtaining the first bit width of the first parameter, the method also includes: sending fourth information, where the fourth information is used to request training of the first model and includes one or more of the following: structural parameters of the first model, the parameter quantity of the first model, or a fourth bit width, where the fourth bit width is a bit width supported by the network side.
- the first bit width is less than or equal to the fourth bit width.
- a communication method is provided.
- the method can be performed by a first device, or by a chip or circuit of the first device, which is not limited in this application.
- the first device can be a terminal device, or a chip, chip system, or circuit in the terminal device, or a functional module in the terminal device that can call and execute a program.
- the method includes: obtaining a first bit width of first data, where the first bit width belongs to a first bit width set, the first data belongs to a first data set, multiple bit widths in the first bit width set correspond to multiple data in the first data set, and the first data set is used to train a first model; and sending first information, where the first information is used to indicate the first bit width.
- the multiple bit widths in the first bit width set have a corresponding relationship with the multiple data in the first data set, which can be understood as: the multiple bit widths correspond one-to-one to the multiple data, or the multiple data correspond to one bit width, that is, the multiple bit widths corresponding to the multiple data may be completely different, or the bit widths corresponding to some data may be the same.
- the first device obtains the bit width of the first data and can transmit it to the second device over the air interface, so that the first device and the second device align on the first bit width of the first data. The second device can then send the first data set based on the first bit width, and the first device uses the first data set to train the first model to achieve network training and convergence.
- this implementation can flexibly indicate different bit widths (such as float32) for different data in the same data set and is therefore widely applicable. It not only ensures network training performance but also saves resources, for example by reducing air-interface overhead, storage overhead, and computing overhead, and by saving terminal hardware resources; this helps the terminal train other, more complex models and supports the adoption of artificial intelligence in the wireless field, as in the sketch below.
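- A minimal sketch of per-data bit widths within one training dataset (sample sizes and widths are illustrative assumptions):

```python
import numpy as np

DTYPES = {16: np.float16, 32: np.float32}
data_bit_widths = [32, 16, 16]  # one bit width per data item (illustrative)

# Different samples of the first data set are delivered at different widths.
raw_samples = [np.random.randn(128).astype(np.float32) for _ in range(3)]
first_data_set = [s.astype(DTYPES[bw]) for s, bw in zip(raw_samples, data_bit_widths)]
```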
- a communication method is provided.
- the method can be performed by a second device, or by a chip or circuit of the second device, which is not limited in this application.
- the second device can be a network device, or a chip, chip system, or circuit in a network device, or a functional module in a network device that can call and execute a program.
- the method includes: obtaining a first bit width of first data, the first bit width belongs to a first bit width set, the first data belongs to a first data set, multiple bit widths in the first bit width set correspond to multiple data in the first data set, and the first data set is used to train a first model.
- a communication device is provided, which includes: a processing unit configured to obtain a first bit width of a first parameter, where the first bit width belongs to a first bit width set, the first parameter belongs to a first parameter set, multiple bit widths in the first bit width set have a corresponding relationship with multiple parameters in the first parameter set, and the multiple parameters are used to train a first model, each parameter including one or more of the following: a model parameter and/or a gradient parameter; and a transceiver unit configured to send first information, where the first information is used to indicate the first bit width.
- the transceiver unit may perform the reception and transmission processing in the aforementioned first aspect, and the processing unit of the communication device may perform other processing except the reception and transmission in the aforementioned first aspect.
- a communication device comprising: a processing unit for obtaining a first bit width of a first parameter, the first bit width belonging to a first bit width set, the first parameter belonging to a first parameter set, multiple bit widths in the first bit width set corresponding to multiple parameters in the first parameter set, multiple parameters being used to train a first model, each parameter including one or more of the following: model parameters and/or gradient parameters.
- the transceiver unit may perform the reception and transmission processing in the aforementioned second aspect, and the processing unit of the communication device may perform other processing except reception and transmission in the aforementioned second aspect.
- a communication device which includes: a processing unit for obtaining a first bit width of first data, the first bit width belongs to a first bit width set, the first data belongs to a first data set, the multiple bit widths in the first bit width set have a corresponding relationship with the multiple data in the first data set, and the first data set is used to train a first model; a transceiver unit for sending first information, and the first information is used to indicate the first bit width.
- the transceiver unit can perform the reception and transmission processing in the aforementioned third aspect, and the processing unit of the communication device can perform other processing except reception and transmission in the aforementioned third aspect.
- a communication device comprising: a processing unit for obtaining a first bit width of first data, the first bit width belongs to a first bit width set, the first data belongs to a first data set, multiple bit widths in the first bit width set correspond to multiple data in the first data set, and the first data set is used to train a first model.
- the transceiver unit can perform the reception and transmission processing in the aforementioned fourth aspect, and the processing unit of the communication device can perform other processing except reception and transmission in the aforementioned fourth aspect.
- a communication device comprising a processing circuit for executing a computer program so that the device executes the method of the first to fourth aspects above and any possible implementation thereof.
- the processing circuit is one or more processors, or all or part of the circuits in one or more processors used for processing functions.
- the communication device further includes a memory, which is used to store the computer program; there may be one or more memories.
- the memory may be integrated with the processor, or the memory may be set separately from the processor, or the memory may be located within the processor.
- the communication device further includes a transceiver circuit, such as a transceiver or an input-output circuit.
- a communication system including: a terminal device and a network device, the terminal device is used to execute the method in the above-mentioned first aspect or third aspect and any possible implementation thereof, and the network device is used to execute the method in the above-mentioned second aspect or fourth aspect and any possible implementation thereof.
- a communication system comprising: a first device and a second device, the first device being used to execute the method in the above-mentioned first aspect or third aspect and any possible implementation thereof, and the second device being used to execute the method in the above-mentioned second aspect or fourth aspect and any possible implementation thereof.
- a computer-readable storage medium which stores a computer program or code.
- when the computer program or code is run on a computer, the computer is caused to execute the method in the above-mentioned first to fourth aspects and any possible implementation thereof.
- a chip or chip system comprising at least one processing circuit, which is used to run a computer program so that the chip executes the methods in the above-mentioned first to fourth aspects and any possible implementation thereof.
- the chip may include an output circuit or interface for sending information or data, and an input circuit or interface for receiving information or data.
- a computer program product comprising computer program code which, when run on a computer, causes the computer to execute the method in the first to fourth aspects and any possible implementation thereof.
- FIGS. 1 and 2 are schematic diagrams of communication systems applicable to an embodiment of the present application.
- FIG. 3 is a schematic diagram of the structure of a neuron.
- FIG. 4 is a schematic diagram of the structure of a neural network.
- FIG. 5 is a schematic block diagram of an autoencoder.
- FIG. 6 is a schematic diagram of an AI application framework.
- FIG. 7 is a schematic interaction flow chart of a communication method provided in an embodiment of the present application.
- FIG. 8 is a schematic interaction flow chart of another communication method provided in an embodiment of the present application.
- FIG. 9 is a schematic interaction flow chart of another communication method provided in an embodiment of the present application.
- FIG. 10 is a schematic interaction flow chart of another communication method provided in an embodiment of the present application.
- FIG. 11 is a schematic block diagram of a communication device provided in an embodiment of the present application.
- FIG. 12 is a schematic block diagram of another communication device provided in an embodiment of the present application.
- the technical solutions provided in this application can be applied to various communication systems, such as fifth-generation (5G) or new radio (NR) systems, long-term evolution (LTE) systems, LTE frequency division duplex (FDD) systems, LTE time division duplex (TDD) systems, wireless local area network (WLAN) systems, satellite communication systems, future communication systems such as sixth-generation (6G) mobile communication systems, or a convergence of multiple such systems.
- the technical solutions provided in this application can also be applied to device-to-device (D2D) communication, vehicle-to-everything (V2X) communication, machine-to-machine (M2M) communication, machine type communication (MTC), and Internet of Things (IoT) communication systems or other communication systems.
- a device in a communication system can send signals to or receive signals from another device.
- the signals may include information, signaling, or data.
- the term “device” may also be replaced by an entity, a network entity, a communication device, a communication module, a node, a communication node, and the like.
- This application uses devices as examples for description.
- a communication system may include at least one terminal device and at least one network device.
- a network device may send downlink signals to a terminal device, and/or a terminal device may send uplink signals to a network device.
- terminal device/network device in this application may be replaced by a terminal device that performs the corresponding communication method in this application with the network device.
- the terminal devices in the embodiments of the present application include various devices with wireless communication functions, which can be used to connect people, objects, machines, etc.
- the terminal devices can be widely used in various scenarios, such as: cellular communication, D2D, V2X, peer-to-peer (P2P), M2M, MTC, IoT, virtual reality (VR), augmented reality (AR), industrial control, autonomous driving, telemedicine, smart grid, smart furniture, smart office, smart wearables, smart transportation, smart city, drones, robots, remote sensing, passive sensing, positioning, navigation and tracking, autonomous delivery, etc.
- the terminal device can be a terminal in any of the above scenarios, such as an MTC terminal, IoT terminal, etc.
- the terminal device can be a user equipment (UE) of the 3rd Generation Partnership Project (3GPP) standard, a terminal, a fixed device, a mobile station device or a mobile device, a subscriber unit, a handheld device, a vehicle-mounted device, a wearable device, a cellular phone, a smart phone, a Session Initiation Protocol (SIP) phone, a wireless data card, a personal digital assistant (PDA), or a similar device.
- the terminal device can also be a personal digital assistant (PDA), a computer, a tablet computer, a notebook computer, a wireless modem, a handheld device (handset), a laptop computer, a computer with wireless transceiver capabilities, a smart book, a vehicle, a satellite, a global positioning system (GPS) device, a target tracking device, an aircraft (such as a drone, a helicopter, a multi-copter, a quadcopter, or an airplane), a ship, a remote control device, a smart home device, an industrial device, a device built into the above devices (such as a communication module, modem, or chip in the above devices), or another processing device connected to a wireless modem.
- a UE can also be used to act as a base station.
- a UE can act as a scheduling entity that provides sidelink signals between UEs in scenarios such as V2X, D2D, or P2P.
- the device for realizing the function of the terminal device can be the terminal device itself, or a device that can support the terminal device in realizing the function, such as a chip system or a chip, which can be installed in the terminal device. The chip system can be composed of chips, or can include a chip and other discrete devices.
- the network device in the embodiment of the present application may be a device for communicating with a terminal device, and the network device may include an access network device or a wireless access network device, such as a base station.
- the access network device in the embodiment of the present application may refer to a radio access network (RAN) node (or device) that connects the terminal device to a wireless network.
- the above-mentioned RAN may be a cellular system related to the 3rd Generation Partnership Project (3GPP), such as a 5G mobile communication system, or a future-oriented evolution system (such as a sixth generation (6G) mobile communication system).
- RAN may also be an open radio access network (open RAN, O-RAN or ORAN), a cloud radio access network (CRAN), or a wireless fidelity (WiFi) system.
- RAN may also be a communication system that is a fusion of two or more of the above systems.
- Network equipment can broadly cover, or be replaced with, various names as follows: NodeB, evolved NodeB (eNB), next generation NodeB (gNB), relay station, access point, transmitting and receiving point (TRP), transmitting point (TP), master station, auxiliary station, multi-standard radio (MSR) node, home base station, network controller, access node, wireless node, access point (AP), transmission node, transceiver node, building base band unit (BBU), remote radio unit (RRU), active antenna unit (AAU), remote radio head (RRH), central unit (CU), distributed unit (DU), positioning node, etc.
- a base station may be a macro base station, a micro base station, a relay node, a donor node, or the like, or a combination thereof.
- a base station may also refer to a communication module, a modem, or a chip used to be provided in the aforementioned device or apparatus.
- a base station may also be a mobile switching center and a device that performs base station functions in D2D, V2X, and M2M communications, a network-side device in a 6G network, or a device that performs base station functions in future communication systems.
- a base station may support networks with the same or different access technologies. The embodiments of this application do not limit the specific technology and specific device form adopted by the network equipment.
- Base stations can be fixed or mobile.
- a helicopter or drone can be configured to act as a mobile base station, and one or more cells can move based on the location of the mobile base station.
- a helicopter or drone can be configured to act as a device that communicates with another base station.
- a RAN node can be a CU, DU, central unit-control plane (CU-CP), central unit-user plane (CU-UP), or RU.
- the CU and DU can be separate or included in the same network element, such as the BBU.
- a radio unit (RU) can be included in a radio frequency device, such as an RRU, AAU, or RRH.
- a RAN node may support one or more types of fronthaul interfaces, with different fronthaul interfaces corresponding to DUs and RUs with different functions. If the fronthaul interface between the DU and RU is a common public radio interface (CPRI), the DU is configured to implement one or more baseband functions, and the RU is configured to implement one or more radio frequency functions.
- alternatively, the fronthaul interface between the DU and RU can be an enhanced common public radio interface (eCPRI).
- the division between the DU and RU is different, corresponding to different types (Categories) of eCPRI, such as eCPRI Category A, B, C, D, E, and F.
- for downlink transmission, the DU is configured to implement layer mapping and one or more functions preceding it (i.e., one or more of coding, rate matching, scrambling, modulation, or layer mapping), while the functions after layer mapping (e.g., one or more of resource element (RE) mapping, digital beamforming (BF), or inverse fast Fourier transform (IFFT)/cyclic prefix addition) are moved to the RU for implementation.
- for uplink transmission, the DU is configured to implement RE demapping and one or more of the associated functions (i.e., one or more of decoding, rate matching, descrambling, demodulation, inverse discrete Fourier transform (IDFT), channel equalization, or RE demapping), while the remaining functions (e.g., one or more of digital BF or FFT/cyclic prefix removal) are moved to the RU for implementation.
- the processing unit for implementing baseband functions in the BBU is called a baseband high layer (BBH) unit, and the processing unit for implementing baseband functions in the RRU/AAU/RRH is called a baseband low layer (BBL) unit.
- the CU (or CU-CP and CU-UP), DU, or RU may have different names in different systems, but those skilled in the art will understand their meanings.
- the radio access network may also be an open radio access network (O-RAN/ORAN) architecture.
- CU may also be called O-CU (open CU)
- DU may also be called O-DU
- CU-CP may also be called O-CU-CP
- CU-UP may also be called O-CU-UP
- RU may also be called O-RU.
- Any of the CU (or CU-CP, CU-UP), DU and RU in this application may be implemented by a software module, a hardware module, or a combination of a software module and a hardware module.
- the device for implementing the function of the network device can be the network device, or it can be a device that can support the network device to implement the function, such as a chip system or chip, which can be installed in the network device.
- the chip system can be composed of a chip, or it can include a chip and other discrete devices.
- Network devices and terminal devices can be deployed on land, including indoors or outdoors, handheld or vehicle-mounted; they can also be deployed on the water surface; they can also be deployed on aircraft, balloons and satellites in the air.
- the embodiments of this application do not limit the scenarios in which network devices and terminal devices are located.
- terminal devices and network devices can be hardware devices, or they can be software functions running on dedicated hardware, software functions running on general-purpose hardware, such as virtualization functions instantiated on a platform (e.g., a cloud platform), or entities including dedicated or general-purpose hardware devices and software functions. This application does not limit the specific forms of terminal devices and network devices.
- Network equipment may also include core network equipment, such as access and mobility management function (AMF), or operations, administration and maintenance equipment (OAM), or third-party equipment, such as over-the-top (OTT) equipment or cloud servers, or equipment equipped with AI modules, such as RAN intelligent controller (RIC).
- the terminal device may be a terminal device or a component of the terminal device (such as a chip or circuit).
- the network device may be a network device equipped with one or more AI modules.
- the network device may be a core network device, an access network node (RAN node), or one or more devices in the OAM.
- the AI module may be a RIC, such as a near-real-time RIC or a non-real-time RIC.
- the near-real-time RIC is provided in a RAN node (e.g., a CU or DU), and the non-real-time RIC is provided in the OAM, a cloud server, a core network device, or other network devices.
- FIG 1 is a schematic diagram of a wireless communication system 100 applicable to an embodiment of the present application.
- the wireless communication system includes a wireless access network 100.
- the wireless access network 100 can be a next-generation (e.g., 6G or higher) wireless access network, or a traditional (e.g., 5G, 4G, 3G, or 2G) wireless access network.
- the wireless access network 100 may serve one or more terminal devices 120a-120j, collectively referred to as 120.
- Figure 1 is only a schematic diagram, and the wireless communication system may also include other devices, such as core network devices, wireless relay devices, and/or wireless backhaul devices, which are not shown in Figure 1.
- the wireless communication system may include multiple network devices and multiple terminal devices simultaneously, without limitation.
- a network device may serve one or more terminal devices simultaneously.
- a terminal device may also access one or more network devices simultaneously.
- the embodiments of the present application do not limit the number of terminal devices and network devices included in the wireless communication system.
- FIG 2 is a schematic diagram of a wireless communication system 200 applicable to an embodiment of the present application.
- the wireless communication system 200 may include at least one network device, such as the network device 210 shown in Figure 2.
- the wireless communication system 200 may also include at least one terminal device, such as the terminal device 220 and the terminal device 230 shown in Figure 2.
- the wireless communication system 200 may also include an AI network element (also known as an AI entity), such as the AI network element 240 shown in Figure 2, for performing AI-related operations, such as constructing a training data set or training an AI model.
- the network device 210 may send data related to the training of the AI model to the AI network element 240, which constructs a training data set and trains the AI model.
- the data related to the training of the AI model may include data reported by the terminal device.
- the AI network element 240 may send the results of the operations related to the AI model to the network device 210, and forward them to the terminal device through the network device 210.
- the results of the operations related to the AI model may include at least one of the following: an AI model that has completed training, an evaluation result or a test result of the model, etc.
- a portion of the trained AI model may be deployed on the network device 210, and another portion may be deployed on the terminal device.
- the trained AI model may be deployed on the network device 210.
- the trained AI model may be deployed on the terminal device.
- Figure 2 illustrates only the example of a direct connection between AI network element 240 and network device 210.
- AI network element 240 may also be connected to a terminal device.
- AI network element 240 may be connected to both network device 210 and a terminal device simultaneously.
- AI network element 240 may be connected to network device 210 through a third-party network element. This embodiment of the present application does not limit the connection relationship between the AI network element and other network elements.
- the AI network element 240 can also be provided as a module in a network device and/or a terminal device, for example, in the network device 110b or the terminal device shown in FIG. 1.
- the AI network element 240 and the network device 210 may be different modules of the same device, or may be separate different devices.
- Figures 1 and 2 are simplified schematic diagrams for ease of understanding.
- the communication system may also include other devices, such as wireless relay devices and/or wireless backhaul devices, which are not shown in Figures 1 and 2.
- the communication system may include multiple network devices, multiple terminal devices, or AI nodes. The embodiment of the present application does not limit the number of network devices and terminal devices included in the communication system.
- An AI model is an algorithm or computer program that can implement AI functions.
- the AI model characterizes the mapping relationship between the input and output of the model, or in other words, the AI model is a function model that maps input of a certain dimension to output of a certain dimension, and the parameters of the function model can be obtained through machine learning training.
- for example, for a function model y = m·x + n, m and n are parameters of the AI model, and m and n can be obtained through machine learning training.
- the AI models mentioned in the embodiments below of this application may be, without limitation, neural networks, linear regression models, decision tree models, support vector machines (SVM), Bayesian networks, Q-learning models, or other machine learning (ML) models.
- AI model can be implemented as a hardware circuit, software, or a combination of software and hardware, without limitation.
- Non-limiting examples of software include: program code, program, subroutine, instruction, instruction set, code, code segment, software module, application, or software application.
- ML is an implementation of artificial intelligence.
- Machine learning is a method that empowers machines to learn, enabling them to perform functions that cannot be directly programmed. In practical terms, machine learning is a method that uses data to train models and then uses these models to make predictions.
- machine learning methods such as neural networks (NNs), decision trees, and support vector machines.
- Machine learning theory primarily involves the design and analysis of algorithms that enable computers to learn automatically. Machine learning algorithms automatically analyze data to identify patterns and use these patterns to make predictions about unknown data.
- Neural networks are a specific implementation of AI or machine learning. According to the universal approximation theorem, neural networks can theoretically approximate any continuous function, enabling them to learn arbitrary mappings. Neural networks are mathematical models that mimic the behavioral characteristics of animal neural networks for information processing. The concept of neural networks is derived from the neuronal structure of the brain. Each neuron performs a weighted sum operation on its input values and passes the result of this weighted sum operation through an activation function to produce an output.
- a neural network can be composed of neural units; a neural unit can be a computational unit that takes inputs x_s and an intercept term 1 as input.
- a neural network is formed by connecting many of these single neural units, meaning that the output of one neural unit can be the input of another.
- the input of each neural unit can be connected to the local receptive field of the previous layer to extract features from that local receptive field, which can be an area consisting of several neural units.
- the AI model of this application can be a deep neural network (DNN).
- DNN can include feedforward neural networks (FNN), convolutional neural networks (CNN), and recurrent neural networks (RNN).
- Figure 3 is a schematic diagram of a neuron structure.
- the bias of the weighted sum is b.
- b can be an integer, a decimal, or a complex number, etc.
- the form of the activation function can be diversified.
- the output of the neuron is h(x) = f(∑_s W_s·x_s + b), where f(·) is the activation function, as shown in Figure 3.
- the activation functions of different neurons in a neural network can be the same or different.
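- A minimal sketch of a single neuron as described above, using a sigmoid activation (one common choice; the activation function and all values are illustrative assumptions):

```python
import numpy as np

def neuron(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Single neuron: a weighted sum of the inputs plus the bias b,
    passed through an activation function (sigmoid here)."""
    z = float(np.dot(w, x) + b)       # weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation f(z)

out = neuron(x=np.array([0.5, -1.2, 3.0]), w=np.array([0.1, 0.4, -0.2]), b=0.05)
```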
- Neural networks generally have a multi-layer structure, with each layer containing one or more logic-based decision-making units, known as neurons. Increasing the depth and/or width of a neural network can improve its expressive power, providing more powerful information extraction and abstract modeling capabilities for complex systems.
- the depth of a neural network can be understood as the number of layers, while the number of neurons in each layer can be referred to as the width of that layer.
- FIG. 4 is a schematic diagram of the layer relationship of a neural network.
- a neural network includes an input layer and an output layer.
- the input layer processes the input through neurons and then passes the result to the output layer, which then generates the output of the neural network.
- a neural network consisting of an input layer, hidden layers, and an output layer, as shown in Figure 4.
- the input layer processes the input through neurons and then passes the result to the intermediate hidden layer.
- the hidden layer then passes the calculation result to the output layer or an adjacent hidden layer.
- the output layer generates the neural network's output.
- a neural network can consist of one or more sequentially connected hidden layers, without limitation.
- a loss function can be defined. This function measures the difference between the model's predicted value and the actual value. During neural network training, the loss function describes the gap or discrepancy between the neural network's output and the ideal target value.
- Neural network training involves adjusting neural network parameters to ensure that the loss function's value is below a threshold or meets the target requirement. Neural network parameters can include at least one of the following: the number of neural network layers, their width, neuron weights, and parameters in the neuron activation function.
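- A minimal sketch of this training loop on a toy linear model with a squared-error loss (all values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = x @ np.array([1.0, -2.0, 0.5])  # ideal target values

w = np.zeros(3)  # model parameters (weights) to be trained
lr = 0.1         # learning rate
for _ in range(500):
    pred = x @ w
    loss = np.mean((pred - y) ** 2)         # gap between output and target
    if loss < 1e-8:                         # loss below threshold: stop training
        break
    grad = 2.0 * x.T @ (pred - y) / len(x)  # gradient of the loss w.r.t. w
    w -= lr * grad                          # parameter update
```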
- Federated learning is a distributed machine learning method in which multiple participants exchange model parameters through a secure mechanism without sharing the original training data, thereby achieving the effect of collaborative AI training. Its original intention was to effectively help multiple institutions use data and conduct machine learning modeling while meeting the requirements of user privacy protection and data security.
- what is transmitted between nodes is not the data itself, but the intermediate results obtained during training, such as model parameters or gradients.
- Federated learning can be divided into three categories based on the distribution of data sources among the participating parties: horizontal federated learning, vertical federated learning, and federated transfer learning.
- Horizontal federated learning also known as feature-aligned federated learning, involves splitting the dataset horizontally (i.e., along the user dimension) when two datasets have significant overlap in user features but minimal overlap in users. Training is then performed on the data that contains the same user features but not identical users.
- Vertical federated learning also known as sample-aligned federated learning, involves splitting the dataset vertically (i.e., along the feature dimension) when two datasets have significant overlap in users but minimal overlap in user features. Training is then performed on the data that contains the same users but not identical user features.
- Federated transfer learning involves using transfer learning to overcome insufficient data or labels without data segmentation when both datasets have minimal overlap in users and user features.
- the beam management example below uses the following terms: codebook-based synchronization signal block (SSB), channel state information reference signal (CSI-RS), physical layer reference signal received power (L1-RSRP), and beam identity (ID).
- AI/ML can be used to train a model that uses the received SSB/CSI-RS signal, the received SSB/CSI-RS signal strength, or the estimated channel as input to infer the optimal beam ID and feed it back to the base station.
- Each user can collect their own receive beam/channel information and the corresponding optimal beam ID as samples (i.e., local samples) for training the aforementioned AI/ML models.
- the number of samples each user can collect is limited, and the performance of the model trained solely on local data will be limited.
- the optimal beam ID may only be a subset of the SSB/CSI-RS codebook. If users send their local data to a server, the server will aggregate the data from all users for model training. While this improves model performance, it also carries the risk of leaking user privacy information, such as inferring the user's current location through the channel. To address this issue, federated learning can be used.
- the central node distributes a global model to each participating user.
- Each user trains the global model using their local data to obtain a local model.
- the local model's parameter information such as gradients and weights, is then sent (encrypted) to the server.
- the server then performs model aggregation (MA) to update the global model, which is then sent to each user.
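- A minimal sketch of one such federated round, assuming simple parameter averaging (the aggregation rule, the stand-in local training, and all values are illustrative assumptions):

```python
import numpy as np

global_weights = np.zeros(4)  # global model distributed by the central node

def local_train(w: np.ndarray, seed: int) -> np.ndarray:
    """Stand-in for local training on a user's private samples."""
    rng = np.random.default_rng(seed)
    return w + rng.normal(scale=0.1, size=w.shape)  # local update of the weights

local_models = [local_train(global_weights, seed) for seed in range(5)]  # 5 users
global_weights = np.mean(local_models, axis=0)  # model aggregation (MA)
```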
- An autoencoder is a neural network for unsupervised learning. Its characteristic is that it uses input data as label data, so AE can also be understood as a neural network for self-supervised learning. AE can be used for data compression and recovery. For example, the encoder in AE can compress (encode) data A to obtain data B, and the decoder in AE can decompress (decode) data B to recover data A. Alternatively, it can be understood that the decoder is the inverse operation of the encoder.
- the AI model in the embodiment of the present application may include an encoder and a decoder.
- the encoder and the decoder are used in combination, and it can be understood that the encoder and the decoder are a matching AI model.
- the encoder and the decoder can be deployed in the terminal device and the network device respectively.
- alternatively, the AI model in the embodiment of the present application can be a single-end model, which can be deployed on a terminal device or a network device.
- the two-end model can also be called a bilateral model, a collaborative model, or a dual model.
- a two-end model is a model composed of multiple sub-models. The sub-models that make up the model must match each other. These sub-models can be deployed on different nodes.
- the embodiments of the present application relate to an encoder for compressing CSI and a decoder for recovering compressed CSI.
- the encoder and decoder are used in combination, and it can be understood that the encoder and decoder are matching AI models.
- An encoder may include one or more AI models, and the decoder matched with the encoder also includes one or more AI models.
- the number of AI models included in the matching encoder and decoder is the same and corresponds one to one.
- a set of matched encoders and decoders can be specifically two parts of the same AE.
- the encoder and decoder are deployed on different nodes respectively.
- the AE model is a typical bilateral model.
- the encoder and decoder of the AE model are usually trained together and can be used in combination with each other.
- the encoder can process the input V to obtain the processed result z, and the decoder can decode the encoder output z into the desired output V'.
- CSI feedback can be implemented based on the AI model of AE.
- the UE side compresses and quantizes the CSI through the encoder, and the base station recovers the CSI through the decoder.
- the input of the model is the CSI to be fed back by the UE, and the output is the recovered CSI.
- training the model requires the CSI fed back by the UE as the ground-truth label for the recovered CSI.
- ground truth usually refers to data that is believed to be accurate or real.
- a dataset refers to data used for model training, model validation, or model testing in machine learning. The quantity and quality of the data will affect the effectiveness of machine learning.
- a training dataset is used to train an AI model.
- the training dataset may include the input of the AI model, or the input and target output of the AI model.
- a training dataset includes one or more training data.
- the training data may include training samples input to the AI model, or the target output of the AI model.
- the target output is the target value of the output of the AI model, and may also be referred to as the output true value, comparison true value, label, sample label, or label sample.
- the label is the true value.
- training datasets can include simulated data collected through simulation platforms, experimental data collected in experimental scenarios, or measured data collected in actual communication networks. Because the geographical environments and channel conditions in which data are generated vary, such as indoor and outdoor locations, mobile speeds, frequency bands, or antenna configurations, the collected data can be categorized during acquisition. For example, data with the same channel propagation environment and antenna configuration can be grouped together.
- Model training essentially involves learning certain characteristics from training data.
- Taking an AI model such as a neural network as an example, the goal of training is to make the model's output as close as possible to the desired predicted value. This is done by comparing the network's predictions with the desired target values.
- the weight vectors of each layer of the AI model are then updated based on the difference between the two. (An initialization process typically precedes the first update, in which parameters are preconfigured for each layer of the AI model.) For example, if the network's prediction is too high, the weight vectors are adjusted so that it predicts a lower value. This adjustment is repeated until the AI model predicts the desired target value, or a value very close to it. Therefore, it is necessary to predefine how the difference between the predicted and target values is measured.
- the AI model is a neural network, and adjusting the model parameters of the neural network includes adjusting at least one of the following parameters: the number of layers, width, weights of neurons, or parameters in the activation function of neurons of the neural network.
- Inference data can be used as input to a trained AI model for inference.
- the inference data is input into the AI model, and the corresponding output is the inference result.
- the design of an AI model primarily involves data collection (e.g., collecting training data and/or inference data), model training, and model inference. Furthermore, it can also include the application of inference results.
- Figure 6 is an AI application framework.
- the data source is used to provide training data sets and inference data.
- the AI model is obtained by analyzing or training on the training data provided by the data source.
- the AI model represents the mapping relationship between the input and output of the model. Learning the AI model at the model training node is equivalent to learning this mapping using the training data.
- the AI model trained in the model training step is used to perform inference based on the inference data provided by the data source to obtain an inference result. This step can also be understood as: inputting the inference data into the AI model and obtaining the output of the AI model, where the output is the inference result.
- the inference result can indicate: the configuration parameters used (executed) by the execution object, and/or the operations performed by the execution object.
- the inference results are released to the inference result application step.
- the inference results can be centrally planned by the execution (actor) entity.
- the execution entity can send the inference results to one or more execution objects (for example, access network devices or terminal devices, etc.) for execution.
- the execution entity can also feed back the performance of the model to the data source to facilitate subsequent model update training.
- a communication system may include network elements with artificial intelligence capabilities.
- the aforementioned AI model design-related steps may be performed by one or more network elements with artificial intelligence capabilities.
- AI functions (such as AI modules or AI entities) may be configured within existing network elements in the communication system to implement AI-related operations, such as AI model training and/or inference.
- the existing network element may be an access network device or a terminal device.
- an independent network element may be introduced into the communication system to perform AI-related operations, such as training an AI model.
- the independent network element may be referred to as an AI network element (or AI node, AI entity), etc., and the embodiments of the present application do not limit this name.
- the AI network element may be directly connected to an access network device in the communication system, or indirectly connected to the access network device through a third-party network element.
- the third-party network element may be a network device such as an access and mobility management function (AMF) network element, a user plane function (UPF) network element, an OAM, a server (such as a cloud server), or another network element, without limitation.
- the independent AI network element can be deployed on one or more of the following: access network equipment, terminal equipment, or core network. Alternatively, it can be deployed on a server, such as a cloud server, or on an over-the-top (OTT) device.
- an AI network element 240 may be included in the communication system shown in Figure 2.
- the training process of different models can be deployed in different devices or nodes, or in the same device or node.
- the inference process of different models can be deployed in different devices or nodes, or in the same device or node.
- the terminal device can train the matching encoder and decoder, and then send the model parameters of the decoder to the network device.
- if the network device trains the matching encoder and decoder, it can indicate the model parameters of the encoder to the terminal device.
- the AI network element can train the matching encoder and decoder, and then send the model parameters of the encoder to the terminal device and the model parameters of the decoder to the network device. Then, the model inference phase corresponding to the encoder is performed in the terminal device, and the model inference phase corresponding to the decoder is performed in the network device.
- the model parameters may include one or more of the following: structural parameters of the model (such as the number of layers and/or weights of the model, etc.), input parameters of the model (such as the input dimension or the number of input ports), or output parameters of the model (such as the output dimension or the number of output ports).
- the input dimension may refer to the size of an input data.
- the input dimension corresponding to the sequence may indicate the length of the sequence.
- the number of input ports may refer to the number of input data.
- the output dimension may refer to the size of an output data.
- the output dimension corresponding to the sequence may indicate the length of the sequence.
- the number of output ports may refer to the number of output data.
- Knowledge distillation uses a teacher-student framework: a complex, large model serves as the teacher and assists in the training of the student model, which has a simpler structure.
- the teacher with its strong learning ability, can transfer its acquired knowledge to the less capable student model, thereby enhancing the student model's generalization capabilities.
- the complex, cumbersome, but effective teacher model is not deployed online and serves merely as a mentor.
- the flexible and lightweight student model is the one that is actually deployed online for prediction tasks.
- In model distillation, a common practice is to transfer the teacher model's inputs and outputs as a dataset to the student model, which then trains on that dataset. Similarly, during two-end model training, the dataset must be transferred to the peer end so that the peer end trains its network based on the received dataset.
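- A minimal sketch of that transfer-set practice, with a stand-in teacher function and a linear student fitted by least squares; all names and shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def teacher(x):
    """Stand-in for a large trained teacher model (any fixed mapping works)."""
    return np.tanh(x @ np.array([[1.5], [-0.7]]))

# Transfer set: the teacher's inputs and outputs become the student's dataset.
X = rng.normal(size=(500, 2))
y = teacher(X)

# Lightweight student: a linear model fitted to the teacher's outputs
# (least squares stands in for the student's training loop).
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("student weights:", w.ravel())
```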
- Bit width can be understood as the number of bits that data occupies in a computer system.
- for memory or video memory, bit width refers to the amount of data that can be transferred in a single access; simply put, it is the width of the data that can be transferred at once. The larger the bit width, the more data can be transferred at a time, which, for a graphics card for example, significantly improves performance.
- the training data used to train the AI model includes training samples and sample labels.
- the training sample is a channel determined by the terminal device.
- the sample labels are the real channel information, i.e., the true CSI.
- alternatively, the training data may include only training samples; in other words, the training samples themselves serve as the sample labels.
- true CSI can be understood as high-precision CSI.
- the specific training process includes: the model training node uses the encoder to process the channel information (i.e., the training sample) to obtain CSI feedback information, and uses the decoder to process the feedback information to obtain recovered channel information (i.e., the CSI recovery information). It then calculates the difference between the CSI recovery information and the corresponding sample label, i.e., the value of the loss function, and updates the parameters of the encoder and decoder according to that value, so that the difference between the recovered channel information and the corresponding sample label, i.e., the loss function, is minimized.
- the loss function can be the mean squared error (MSE) or cosine similarity. The above operations are repeated to obtain an encoder and decoder that meet the target requirements.
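- Both loss functions can be written in a few lines; the following numpy sketch uses illustrative complex-valued CSI vectors:

```python
import numpy as np

def mse_loss(recovered, label):
    """Mean squared error between recovered CSI and the sample label."""
    return np.mean(np.abs(recovered - label) ** 2)

def cosine_similarity(recovered, label):
    """Cosine similarity between two CSI vectors (1.0 = same direction)."""
    num = np.abs(np.vdot(label, recovered))
    return num / (np.linalg.norm(label) * np.linalg.norm(recovered))

h_true = np.array([1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j])   # "true" CSI
h_rec = h_true + 0.05 * (1 + 1j)                        # recovered CSI
print(mse_loss(h_rec, h_true), cosine_similarity(h_rec, h_true))
```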
- the above-mentioned model training node can be a terminal device, a network device, or other network elements with AI functions in the communication system.
- the above description takes the AI model for CSI compression as an example.
- the AI model can also be used in other scenarios within CSI feedback.
- the AI model can be used for CSI prediction, i.e., predicting channel information at one or more future moments based on channel information measured at one or more historical moments.
- the embodiments of this application do not limit the specific use of the AI model in CSI feedback scenarios.
- At present, a network training scheme with a predetermined precision is usually adopted; that is, network model parameters, network gradient parameters, or data sets of the same bit width (for example, 32-bit floating point or larger) are transmitted over the air interface to achieve network training and convergence.
- the above network training may involve a large number of floating-point value operations, and high-precision floating-point calculations are usually slow and power-consuming.
- floating-point data is usually 32 bits or larger, requiring a large storage space and time.
- high-bit-width floating-point values may inhibit the execution of neural networks on devices with less available computing resources or limited power.
- training neural networks with high-bit-width floating points will result in higher hardware consumption.
- transmitting network model parameters, network gradient parameters, or data sets over the air interface using the same bit width not only results in high energy resource consumption and air interface transmission overhead, but also limits the promotion of artificial intelligence in the wireless field.
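- To make the overhead concrete, the following sketch compares the payload of one model update at several bit widths; the parameter count of 100,000 is an assumed figure:

```python
import numpy as np

n_params = 100_000                      # assumed model size, for illustration only
for dtype in (np.float16, np.float32, np.float64, np.complex128):
    d = np.dtype(dtype)
    print(f"{d.name:>10}: {d.itemsize * 8:3d} bits/value, "
          f"{n_params * d.itemsize / 1e6:.1f} MB per model update")
```

- Halving the bit width halves the air interface payload for the same parameter count, which is the trade-off the embodiments below exploit.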
- the embodiments of the present application provide a communication method and a communication device that can effectively improve network training performance.
- the communication method provided by the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
- the embodiments provided in the present application can be applied to the communication system shown in Figure 1 or Figure 2 above, without limitation.
- the present application proposes the following methods shown in Figures 7 to 10. It should be understood that the method embodiments shown in Figures 7 to 10 can be combined with each other, and the steps in the method embodiments shown in Figures 7 to 10 can be referenced to each other.
- the method embodiments shown in Figures 8 to 10 can be regarded as possible implementation methods for realizing the functions of the method embodiment shown in Figure 7.
- Figure 7 is a flow chart of a communication method 700 provided in an embodiment of the present application. As shown in Figure 7, the method flow can be executed by a first device and a second device.
- the first device can be a terminal device (for example, a first terminal), or a chip or circuit in the terminal device, or a functional module in the terminal device that can call and execute a program.
- the second device can be a network device, or a chip or circuit in the network device, or a functional module in the network device that can call and execute a program.
- the first device or the second device can also be an AI entity (also called an AI network element), such as a model training network element, a model storage network element, or a model inference network element, such as an OAM, OTT, or a third-party device such as a cloud server.
- S710 The first device obtains a first bit width of a first parameter.
- the first bit width belongs to the first bit width set
- the first parameter belongs to the first parameter set
- the multiple bit widths in the first bit width set have a corresponding relationship with the multiple parameters in the first parameter set
- the multiple parameters are used to train the first model
- each parameter includes one or more of the following: model parameters and/or gradient parameters.
- the present application does not specifically limit the number of bit widths in the first bit width set and the number of parameters in the first parameter set.
- the first bit width may represent the number of bits occupied by the first parameter
- the multiple bit widths in the first bit width set represent the number of bits occupied by the multiple parameters in the first parameter set.
- the bit width may be float16, float32, double float64, or complex 128 (a 128-bit floating-point type), etc.
- the bit width may be indicated by bits, for example, bit "00" is used to represent float16, bit "01" is used to represent float32, bit "10" is used to represent double float64, and bit "11" is used to represent complex 128. This application does not specifically limit the value and representation of the bit width.
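- Following the illustrative two-bit mapping above, a sketch of encoding and decoding the bit width indication (the codebook comes from the example; the function names are hypothetical):

```python
import numpy as np

# Illustrative codebook taken from the example above.
CODE_TO_DTYPE = {"00": np.float16, "01": np.float32,
                 "10": np.float64, "11": np.complex128}
DTYPE_TO_CODE = {v: k for k, v in CODE_TO_DTYPE.items()}

def encode_bit_width(dtype):
    """First device side: indicate the first bit width with two bits."""
    return DTYPE_TO_CODE[dtype]

def decode_bit_width(code):
    """Second device side: recover the bit width from the two-bit field."""
    return CODE_TO_DTYPE[code]

assert decode_bit_width(encode_bit_width(np.float32)) is np.float32
```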
- the first parameter may include a first model parameter and/or a first gradient parameter, wherein the first model may be a neural network, a Gaussian mixture model (GMM), a variational autoencoder (VAE), or a generative adversarial network (GAN).
- the model parameters of the neural network may include one or more of the following: the number of layers of the neural network, the width of the neural network, the weights or biases of the neurons, the parameters in the activation function of the neurons, the input parameters of the model (such as the input dimension or the number of input ports), the output parameters of the model (such as the output dimension or the number of output ports), or identification information of the first parameter, etc.
- the gradient parameter of the neural network may include one or more of the following: the number of layers of the neural network, the width of the neural network, the gradient corresponding to the weight or bias of the neuron, or the identification information of the first parameter, etc.
- the first parameter may include one or more of the following: the number of layers of the neural network, the width of the neural network, the gradient corresponding to the weight or bias of the neuron, or the identification information of the first parameter, etc.
- the gradient is a vector consisting of the partial derivatives of an n-dimensional function f with respect to its n variables.
- the gradient indicates the direction along which the directional derivative of the function at a given point attains its maximum value.
- the function changes most rapidly along that direction (the direction of the gradient) at that point, with the highest rate of change (the modulus of the gradient).
- the gradient parameter is a parameter used to describe the rate of change of a function at a particular point.
- multiple bit widths in the first bit width set have a corresponding relationship with multiple parameters in the first parameter set, which can be understood as: multiple bit widths correspond one-to-one to multiple parameters, or multiple parameters correspond to one bit width.
- the first bit width set includes bit width #1 and bit width #2, and the multiple parameters in the first parameter set include parameter #1 and parameter #2.
- bit width #1 can correspond to parameter #1, indicating that the number of bits occupied by parameter #1 is bit width #1
- bit width #2 can correspond to parameter #2, indicating that the number of bits occupied by parameter #2 is bit width #2.
- As another example, the first bit width set includes bit width #1 and bit width #2, and the multiple parameters in the first parameter set include parameter #1, parameter #2, parameter #3, and parameter #4.
- bit width #1 can correspond to parameter #1 and parameter #2, indicating that the number of bits occupied by parameter #1 and parameter #2 is both bit width #1
- bit width #2 can correspond to parameter #3 and parameter #4, indicating that the number of bits occupied by parameter #3 and parameter #4 is both bit width #2.
- the first parameter set includes N second parameters, the second bit width corresponding to at least one of the N second parameters is different from the first bit width, the second bit width belongs to the first bit width set, and N is an integer greater than or equal to 1.
- the present application does not limit whether the second bit widths corresponding to the N second parameters are the same, as long as at least two of the multiple bit widths corresponding to the multiple parameters in the first parameter set are different.
- for example, when N is equal to 1, the first parameter set includes two parameters, namely the first parameter and the second parameter, where the first bit width corresponding to the first parameter is different from the second bit width corresponding to the second parameter;
- when N is greater than 1, the bit widths corresponding to the multiple second parameters can be the same or different. For example, if the second bit widths are bit width #1, bit width #2, and bit width #3, it should be noted that even when bit width #1, bit width #2, and bit width #3 are the same, the first bit width corresponding to the first parameter is different from bit width #1, bit width #2, and bit width #3.
- the correspondence between multiple bit widths in the first bit width set and multiple parameters in the first parameter set can be predefined, and the predefinition can include pre-definition, such as protocol definition; or, the correspondence can be configured or pre-configured through signaling, and the preconfiguration can be achieved by pre-saving corresponding codes, tables or other methods that can be used to indicate the correspondence in the first device and/or the second device.
- the correspondence may exist in the form of a table, function, text, or string, such as for storage or transmission.
- the following table illustrates the correspondence between the multiple bit widths in the first bit width set and the multiple parameters in the first parameter set.
- the correspondence between the multiple bit widths in the first bit width set and the multiple model parameters in the first parameter set can satisfy the following Tables 1 and 2
- the correspondence between the multiple bit widths in the first bit width set and the multiple gradient parameters in the first parameter set can satisfy the following Tables 3 and 4.
- for example, the first bit width set includes bit width #1, bit width #2, bit width #3, and bit width #4, and the first parameter set includes model parameter #1, model parameter #2, model parameter #3, and model parameter #4, where bit width #1 corresponds to model parameter #1, bit width #2 to model parameter #2, bit width #3 to model parameter #3, and bit width #4 to model parameter #4. That is, one bit width corresponds to one model parameter, or in other words, each model parameter has a different bit width. If the first parameter is model parameter #1, the first bit width is bit width #1. Table 1 illustrates this correspondence with example values:

Table 1

| Bit width | Example value | Model parameter | Example |
| --- | --- | --- | --- |
| bit width #1 | float16 | model parameter #1 | number of layers of the neural network |
| bit width #2 | float32 | model parameter #2 | width of the neural network |
| bit width #3 | double float64 | model parameter #3 | input dimension of the model |
| bit width #4 | complex 128 | model parameter #4 | output dimension of the model |

- with the example values above, the bit widths of the model parameters (the number of layers, width, input dimension, and output dimension of the neural network) obtained by network training and reported by the first device, and of the merged model parameters sent by the second device, are float16, float32, double float64, and complex 128, respectively.
- bit width #1, bit width #2, bit width #3, and bit width #4 in Table 1 above can be represented by bits, for example, using bits "00", "01", "10", or "11" to represent bit width #1, bit width #2, bit width #3, and bit width #4, respectively.
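- A sketch of applying a Table 1-style correspondence, casting each training result to the bit width agreed for that model parameter; parameter names and values are placeholders:

```python
import numpy as np

# Table 1-style correspondence: one bit width per model parameter.
bit_width_table = {
    "model_param_1": np.float16,
    "model_param_2": np.float32,
    "model_param_3": np.float64,
    "model_param_4": np.complex128,
}

def apply_bit_widths(params, table):
    """Cast each training result to the bit width agreed in the table."""
    return {name: np.asarray(value, dtype=table[name])
            for name, value in params.items()}

rng = np.random.default_rng(3)
raw = {name: rng.normal(size=4) for name in bit_width_table}
report = apply_bit_widths(raw, bit_width_table)
for name, arr in report.items():
    print(name, arr.dtype)                # each parameter at its own bit width
```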
- As another example, the first bit width set includes bit width #1 and bit width #2, and the first parameter set includes model parameter #1, model parameter #2, model parameter #3, and model parameter #4, where bit width #1 corresponds to model parameter #1, and bit width #2 corresponds to model parameter #2, model parameter #3, and model parameter #4. That is, one bit width can correspond to one model parameter or to multiple model parameters; in other words, different model parameters can have the same bit width or different bit widths. Table 2 illustrates this correspondence with example values:

Table 2

| Bit width | Example value | Model parameter(s) | Example |
| --- | --- | --- | --- |
| bit width #1 | float16 | model parameter #1 | number of layers of the neural network |
| bit width #2 | float32 | model parameters #2 to #4 | width, input dimension, and output dimension of the model |

- with the example values above, the bit width of the model parameter (the number of layers of the neural network) obtained by network training and reported by the first device, and of the corresponding merged model parameter sent by the second device, is float16, while the bit widths of the remaining model parameters (the width, input dimension, and output dimension) are all float32.
- bit width #1 and bit width #2 in Table 2 above can be represented by bits, for example, using bits "0" and "1" to represent bit width #1 and bit width #2, respectively.
- As another example, for gradient parameters, the first bit width set includes bit width #1, bit width #2, bit width #3, and bit width #4, and the first parameter set includes gradient parameter #1, gradient parameter #2, gradient parameter #3, and gradient parameter #4, where bit width #1 corresponds to gradient parameter #1, bit width #2 to gradient parameter #2, bit width #3 to gradient parameter #3, and bit width #4 to gradient parameter #4. That is, one bit width corresponds to one gradient parameter, or in other words, each gradient parameter has a different bit width. If the first parameter is gradient parameter #1, the first bit width is bit width #1; if the second parameters are gradient parameter #2, gradient parameter #3, and gradient parameter #4, the corresponding second bit widths are bit width #2, bit width #3, and bit width #4, respectively. Table 3 illustrates this correspondence with example values:

Table 3

| Bit width | Example value | Gradient parameter |
| --- | --- | --- |
| bit width #1 | float16 | gradient parameter #1 |
| bit width #2 | float32 | gradient parameter #2 |
| bit width #3 | double float64 | gradient parameter #3 |
| bit width #4 | complex 128 | gradient parameter #4 |

- with the example values above, the bit widths of the gradient parameters (gradient parameter #1 to gradient parameter #4) obtained by network training and reported by the first device, and of the merged gradient parameters sent by the second device, are float16, float32, double float64, and complex 128, respectively.
- bit width #1, bit width #2, bit width #3, and bit width #4 in Table 3 above can be represented by bits, for example, using bits "00", "01", "10", or "11" to represent bit width #1, bit width #2, bit width #3, and bit width #4, respectively.
- As another example, the first bit width set includes bit width #1, bit width #2, and bit width #3, and the first parameter set includes gradient parameter #1, gradient parameter #2, gradient parameter #3, and gradient parameter #4, where bit width #1 corresponds to gradient parameter #1 and gradient parameter #2, bit width #2 corresponds to gradient parameter #3, and bit width #3 corresponds to gradient parameter #4. That is, one gradient parameter can correspond to one bit width, or multiple gradient parameters can correspond to one bit width; in other words, different gradient parameters can have the same bit width or different bit widths. Table 4 illustrates this correspondence with example values:

Table 4

| Bit width | Example value | Gradient parameter(s) |
| --- | --- | --- |
| bit width #1 | float16 | gradient parameter #1 and gradient parameter #2 |
| bit width #2 | float32 | gradient parameter #3 |
| bit width #3 | double float64 | gradient parameter #4 |
- bit width #1, bit width #2, and bit width #3 in Table 4 above can be represented by bits, for example, using bits "0", "1", and "default" to represent bit width #1, bit width #2, and bit width #3, respectively.
- the present application does not specifically limit the number of parameters in the first parameter set involved in Tables 1 to 4, and the number of bit widths in the first bit width set, or in other words, the present application does not limit the number of corresponding relationships between multiple bit widths and multiple parameters involved in Tables 1 to 4 (for example, a row in a table).
- for example, the rows for bit width #1 to bit width #3 and the row for bit width #4 in Table 1 can each form a new table;
- similarly, the rows for bit width #1 and bit width #2 and the rows for bit width #3 and bit width #4 in Table 4 can each form a new table. That is, Tables 1 to 4 can be split into multiple other tables by way of example; the present application does not limit this, nor does it limit the splitting method.
- the present application does not limit the values of the bit width involved in Tables 1 to 4 above, and the model parameters and/or gradient parameters corresponding to the bit width, wherein the values of the bit width and the parameters corresponding to the bit width can be predefined or preconfigured, or can be configured by signaling.
- the above Tables 1 to 4 can be implemented independently or in combination.
- the above Tables 1 and 3 can be combined into one table, and this application does not limit this.
- one or more rows in Table 1 can be reflected in one table with one or more corresponding rows in Table 3.
- the corresponding relationship shown in the first two rows of Table 1 and the corresponding relationship shown in the first two rows of Table 3 can be combined into one table, and the corresponding relationship shown in the last two rows of Table 1 and the corresponding relationship shown in the last two rows of Table 3 can be combined into one table, and this application does not limit this.
- the following is an example of a specific implementation method in which the first device obtains the first bit width of the first parameter.
- the first bit width of the first parameter can be determined autonomously by the first device.
- the first device can determine the first bit width based on its own capability information, hardware conditions, hardware power, or battery capacity. For example, if the current memory of the first device cannot support network training with a large number of model parameters, or the current power of the first device cannot support high-precision bit width model training, the first device can selectively determine the first bit width.
- the first bit width can be a bit width with lower precision, such as float16.
- the first bit width can be predefined or preconfigured.
- the predefinition can be, for example, a protocol definition.
- the preconfiguration can be achieved by pre-saving the corresponding code, table or other methods that can be used to indicate the first bit width in the first device. This application does not limit its specific implementation method.
- the first bit width can be configured via signaling.
- Alternatively, the first bit width of the first parameter can be determined by a second device (e.g., a base station) and indicated to the first device.
- the second device sends third information to the first device, where the third information indicates the first bit width.
- the first device receives the third information from the second device and determines the first bit width of the first parameter based on the third information.
- the method further includes: the first device sending a second bit width set to the second device, wherein the second bit width set includes the first bit width.
- the first device supports the bit widths in the second bit width set. Accordingly, the second device receives the second bit width set from the first device and selects the first bit width from the second bit width set.
- the second bit width set may be a set including multiple bit widths supported by the first device.
- the second bit width set includes float16, float32, and double float64, indicating that the first bit width corresponding to the first parameter supported by the first device may be float16, float32, or double float64.
- the second bit width set may include the upper and lower limits of the bit widths supported by the first device.
- the second bit width set includes float16 and float32, indicating that the first bit width corresponding to the first parameter supported by the first device may be any bit width between float16 and float32. This application does not limit the representation of the second bit width set.
- the bit widths in the second bit width set are candidates for the first bit width. That is, the first bit width corresponding to the first parameter reported by the first device can be any bit width in the second bit width set.
- the second bit width set can include one or more of the following: float16, float32, double float64, or complex 128. That is, the first bit width corresponding to the first parameter can be any one of float16, float32, double float64, or complex 128.
- the multiple bit widths in the first bit width set correspond to the multiple model parameters and/or multiple gradient parameters of the first model.
- for example, float16, float32, and double float64 in the first bit width set correspond to model parameter #1, model parameter #2, and model parameter #3, respectively.
- the values of the multiple bit widths in the first bit width set and of the multiple bit widths in the second bit width set can be the same or different; this application does not limit this.
- the first bit width is selected from the second bit width set.
- the second device can select the first bit width from the second bit width set based on at least one of time-frequency resources, the bit width requirement range of different terminals for the first parameter, the hardware conditions of different terminals, hardware power, battery capacity, or data set quality of different terminals, and indicate the first bit width to the first device.
- the second device may comprehensively consider factors such as the size of currently available time-frequency resources, the relatively low battery capacity of terminal #1, the relatively good hardware conditions of terminal #2, and the relatively high precision requirement of terminal #2 for the first parameter, and ultimately determine to allocate float16 as the bit width of terminal #1 and double float64 as the bit width of terminal #2.
- the second device then indicates to terminal #1 that the first bit width corresponding to the first parameter is float16, and to terminal #2 that the first bit width corresponding to the first parameter is double float64. It should be understood that this application does not limit the specific implementation manner in which the second device selects the first bit width from the second bit width set.
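- A sketch of this selection step, under an invented heuristic (a battery threshold and a precision flag); the application does not mandate any particular selection rule:

```python
import numpy as np

def select_bit_width(second_bit_width_set, battery_level, precision_need):
    """Second device side: pick the first bit width from a terminal's
    second bit width set. The heuristic here is purely illustrative."""
    ordered = sorted(second_bit_width_set, key=lambda d: np.dtype(d).itemsize)
    if battery_level < 0.2:             # low battery: cheapest bit width
        return ordered[0]
    if precision_need == "high":        # good hardware, strict accuracy
        return ordered[-1]
    return ordered[len(ordered) // 2]

second_set = [np.float16, np.float32, np.float64]
print(select_bit_width(second_set, battery_level=0.1, precision_need="low"))
print(select_bit_width(second_set, battery_level=0.9, precision_need="high"))
```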
- the second device may send a reference bit width (e.g., an example of the fourth bit width) to the first device.
- the second device sends fourth information to the first device, the fourth information being used to request training of the first model, the fourth information including the fourth bit width.
- the fourth bit width is a bit width supported by the second device. Accordingly, the first device receives the fourth information from the second device and determines the first bit width.
- the fourth information may be carried in a broadcast message, for example, the second device may send it to multiple first devices; alternatively, the fourth information may be carried in higher-layer signaling, such as radio resource control (RRC) signaling.
- the first device determines the first bit width based on the fourth bit width, in which case the first bit width is typically less than or equal to the fourth bit width.
- for example, the fourth bit width may be float32, and the first bit width determined by the first device may then be float16 or float32.
- the fourth bit width serves as a reference bit width provided by the second device to the first device, and the first bit width determined by the first device may be less than or equal to the fourth bit width, or may be greater than the fourth bit width, and this application does not impose any limitations on this.
- the fourth information may further include structural parameters of the first model and/or parameter quantities of the first model.
- the structural parameters of the first model may include one or more of the following: fully connected, CNN, RNN, or Transformer (e.g., encoder-decoder), and the parameter quantities of the first model include floating-point operations and/or addition and multiplication operations.
- the fourth bit width can be a bit width supported by the second device, can be the maximum bit width, such as double float64, can be a bit width range supported by the second device, such as float16 to double float64, or can be a bit width set supported by the second device, such as float32, double float64, and complex 128.
- the fourth bit width can be a specific bit width value supported by the second device, can be replaced by a bit width set supported by the second device, or can be replaced by the upper and lower limits of the bit width supported by the second device (corresponding to the bit width range). It should be understood that this application does not limit the specific implementation of the fourth bit width.
- S720 The first device sends first information to the second device.
- the second device receives the first information from the first device.
- the first information is used to indicate the first bit width.
- the first device can send the bit widths corresponding to multiple parameters in the first parameter set to the second device, where the multiple parameters include the first parameter. In other words, the first device can report to the second device the multiple bit widths corresponding to all model parameters and/or gradient parameters used for training the first model, where the multiple bit widths include the first bit width; for example, the first device sends the above-mentioned Table 1 and/or Table 3 to the second device.
- the first device may send only the first bit width corresponding to the first parameter to the second device, such as the bit width #1 corresponding to the model parameter #1 in Table 1 above, or the bit width #2 corresponding to the gradient parameter #3 in Table 4 above.
- for example, suppose bit width #2, bit width #3, and bit width #4 in the first bit width set are predefined or preconfigured and can be regarded as a default value such as float32. Since bit width #1 corresponding to model parameter #1 adopts a non-default value, the first device can report to the second device only bit width #1 corresponding to model parameter #1, such as float16, and reporting Table 1 is not required in this case. Alternatively, the first device can report the complete Table 1 to the second device, including model parameter #1 to model parameter #4 and their corresponding bit width #1 to bit width #4; or the first device can report Table 1 and separately mark model parameter #1 and its bit width #1 in Table 1, etc.
- the first information may exist in the form of bits, for example, bit "00" represents float16, "01" represents float32, "10" represents double float64, and "11" represents complex 128, etc.
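- A sketch contrasting the full-table report with a compact report that carries only the non-default entries; the float32 default follows the example above, and the names are hypothetical:

```python
import numpy as np

DEFAULT = np.float32                        # assumed default bit width

full_table = {"model_param_1": np.float16,  # the only non-default entry
              "model_param_2": np.float32,
              "model_param_3": np.float32,
              "model_param_4": np.float32}

def compact_report(table, default=DEFAULT):
    """Report only entries whose bit width differs from the default."""
    return {name: bw for name, bw in table.items() if bw is not default}

print(compact_report(full_table))           # only model_param_1 is reported
```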
- In this way, the first device and the second device align the multiple model parameters and their corresponding bit widths for training the first model. Subsequently, the first device can train the first model and report the training results of the multiple model parameters to the second device. The second device can perform aggregation based on the training results reported by different first devices and send the aggregated training results of the multiple model parameters back to the different first devices, so that the multiple first devices can update their local models until the training of the first model is completed.
- In one implementation, at least two first devices each obtain a first training result, where the first training result corresponds to the first parameter and is obtained by training the first model; the at least two first devices then send their first training results to the second device according to the first bit width. That is, the bit width of the first training results reported by the at least two first devices is the first bit width.
- for example, the first parameter includes model parameter #1, model parameter #2, and model parameter #3, where the first bit widths of the first parameter obtained by terminal #1 are float16, float32, and double float64, respectively, and the first bit widths of the first parameter obtained by terminal #2 are float32, float32, and complex 128, respectively.
- the first training result sent by terminal #1 to the second device includes the training results of model parameter #1, model parameter #2, and model parameter #3, with corresponding bit widths of float16, float32, and double float64, respectively.
- that is, terminal #1 reports the training results of model parameter #1, model parameter #2, and model parameter #3 with bit widths of float16, float32, and double float64, respectively.
- the first training result sent by terminal #2 to the second device includes the training results of model parameter #1, model parameter #2, and model parameter #3, with corresponding bit widths of float32, float32, and complex 128, respectively.
- that is, terminal #2 reports the training results of model parameter #1, model parameter #2, and model parameter #3 with bit widths of float32, float32, and complex 128, respectively.
- a second device receives at least two first training results from at least two first devices, aggregates the at least two first training results to obtain a second training result, and then sends the second training result to the at least two first devices based on the first bit width. Accordingly, the at least two first devices each receive the second training results and train or update the first model based on the second training results.
- the bit width of the second training result sent by the second device is the first bit width.
- for example, the first training result sent by terminal #1 to the second device includes the training results of model parameter #1, model parameter #2, and model parameter #3, with corresponding bit widths of float16, float32, and float32, respectively.
- that is, terminal #1 reports the training results of model parameter #1, model parameter #2, and model parameter #3 with bit widths of float16, float32, and float32, respectively.
- the first training result sent by terminal #2 to the second device includes the training results of model parameter #1, model parameter #2, and model parameter #3, with corresponding bit widths of float16, float16, and float32, respectively.
- that is, terminal #2 reports the training results of model parameter #1, model parameter #2, and model parameter #3 with bit widths of float16, float16, and float32, respectively.
- the second device can merge the training results of model parameter #1, model parameter #2, and model parameter #3 reported by terminal #1 and terminal #2 according to rule #1 to obtain a second training result; that is, the second training result includes the merged training results of model parameter #1, model parameter #2, and model parameter #3.
- for example, rule #1 may be to merge according to the maximum supported bit width.
- for example, the second device may merge the model parameters corresponding to float16 or float32 into double float64 model parameters. The second device then converts the merged double float64 model parameter #1, model parameter #2, and model parameter #3 into float16, float32, and float32 model parameters, respectively, and sends them to terminal #1 for terminal #1 to update or train the first model.
- likewise, the second device converts the merged double float64 model parameter #1, model parameter #2, and model parameter #3 into float16, float16, and float32 model parameters, respectively, and sends them to terminal #2 for terminal #2 to update or train the first model.
- alternatively, rule #1 can be to merge according to the maximum bit width currently in use.
- for example, the second device can merge the model parameters corresponding to float16 or float32 into float32 model parameters.
- the second device then converts the merged float32 model parameter #1, model parameter #2, and model parameter #3 into float16, float32, and float32 model parameters, respectively, and sends them to terminal #1 for terminal #1 to update or train the first model.
- likewise, the second device converts the merged float32 model parameter #1, model parameter #2, and model parameter #3 into float16, float16, and float32 model parameters, respectively, and sends them to terminal #2 for terminal #2 to update or train the first model.
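- A sketch of rule #1: upcast all reports to the widest bit width in use, merge, then cast the merged result back to each terminal's own bit width. Simple averaging stands in for the merge operation, which this example does not take from the application:

```python
import numpy as np

def merge_and_redistribute(reports):
    """reports: {terminal: {param: array at that terminal's bit width}}.
    Merge at the maximum bit width in use (rule #1), then convert the
    merged result back to each terminal's reported bit width."""
    params = next(iter(reports.values())).keys()
    downlink = {t: {} for t in reports}
    for p in params:
        arrays = [r[p] for r in reports.values()]
        widest = max((a.dtype for a in arrays), key=lambda d: d.itemsize)
        merged = np.mean([a.astype(widest) for a in arrays], axis=0)
        for t, r in reports.items():
            downlink[t][p] = merged.astype(r[p].dtype)  # back to the terminal's width
    return downlink

reports = {
    "terminal_1": {"param_1": np.ones(3, np.float16),
                   "param_2": np.ones(3, np.float32)},
    "terminal_2": {"param_1": np.ones(3, np.float16),
                   "param_2": np.ones(3, np.float32)},
}
out = merge_and_redistribute(reports)
print({t: {p: a.dtype.name for p, a in d.items()} for t, d in out.items()})
```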
- In addition, one or more bit widths corresponding to the parameters in the first parameter set can be updated at any time.
- the first device can send second information to the second device, where the second information is used to indicate that the first bit width corresponding to the first parameter is updated to a third bit width, which is different from the first bit width.
- the third bit width is greater than the first bit width.
- in this case, the first device can instruct the second device to update the first bit width to the third bit width, for example, from a first bit width of float16 to a third bit width of float32, so as to improve network training performance.
- the third bit width is smaller than the first bit width.
- in this case, the first device can instruct the second device to update the first bit width to the third bit width, for example, from a first bit width of double float64 to a third bit width of float16, thereby reducing overhead while ensuring network training performance.
- the second information may include a first time unit, which is used to indicate that within the first time unit, the first bit width corresponding to the first parameter is updated to the third bit width.
- the first time unit may be a certain moment or a certain time period. For example, if the first time unit is 15:00, the second information indicates that the first bit width of the first parameter is updated to the third bit width at 15:00; that is, before 15:00 the bit width of the training results or merged results of the model parameters exchanged between the first device and the second device is the first bit width, and after 15:00 it is the third bit width.
- as another example, if the first time unit is 10:00-14:00, the second information indicates that the first bit width of the first parameter is updated to the third bit width during 10:00-14:00; that is, within 10:00-14:00 the bit width of the training results or merged results of the model parameters exchanged between the first device and the second device is the third bit width, and before 10:00 or after 14:00 it is the first bit width.
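- A sketch of applying a time-period first time unit, returning the bit width in effect at a given clock time; the 10:00-14:00 window and the two widths follow the example above:

```python
import numpy as np
from datetime import time

def bit_width_at(t, window=(time(10, 0), time(14, 0)),
                 first=np.float64, third=np.float16):
    """Inside the first time unit the third bit width applies;
    outside it, the first bit width applies."""
    start, end = window
    return third if start <= t <= end else first

print(bit_width_at(time(12, 0)).__name__)   # float16 (inside 10:00-14:00)
print(bit_width_at(time(15, 0)).__name__)   # float64 (outside the window)
```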
- In another implementation, a first device obtains a first bit width of first data and sends first information to a second device.
- the first information indicates the first bit width.
- the first bit width belongs to a first bit width set
- the first data belongs to a first data set
- multiple bit widths in the first bit width set correspond to multiple data (e.g., samples) in the first data set
- the first data set is used to train a first model.
- a dataset may also be referred to as a data group.
- the first data is data in the first dataset, and the first dataset corresponds to the first model.
- multiple data in the first dataset are used to train the first model.
- the first data may be part of the data in the first dataset. For example, if the first dataset includes four data, the first data may be one, two, or three of the data.
- the first data in this implementation corresponds to the above-mentioned first parameter,
- the first data set corresponds to the above-mentioned first parameter set,
- and the multiple data correspond to the multiple parameters. Therefore, for the implementation in which the first device obtains the first bit width of the first data, the implementation in which the first device sends the first information, the specific interpretation of the first bit width, the correspondence between the multiple bit widths in the first bit width set and the multiple data in the first data set, and other specific interpretations and expressions, reference can be made to the relevant descriptions above. For the sake of brevity, they are not repeated here.
- Based on the above solution, the bit widths corresponding to the multiple parameters for training the first model, or to the training results of the data set, transmitted by the first device over the air interface need not be a single fixed value and can vary, while still achieving network training and convergence.
- This implementation not only saves resources, such as reducing air interface overhead, storage overhead, and computing overhead, saving hardware resource consumption of the terminal, and helping the first device train more complex models, but also allows the bit widths corresponding to the model parameters, gradient parameters, and/or data sets of the first model to be updated flexibly and in real time. While improving network training performance, it facilitates the adoption of artificial intelligence in the wireless field.
- Figure 8 is a flow chart of a communication method 800 provided in an embodiment of the present application. As shown in Figure 8, the method is illustrated by taking the first device as a terminal device and the second device as a network device as the execution subjects, and taking the first parameter as a model parameter, and the terminal device and the network device transmitting model parameters of different bit widths over the air interface as an example. The method includes the following steps.
- S810 The network device sends fourth information to the terminal device.
- the terminal device receives the fourth information from the network device.
- the fourth information includes one or more of the following: model structure, model parameter quantity, or model reference bit width (i.e., fourth bit width).
- for example, the model structure is a CNN, the model parameter quantity is the number of floating-point operations, and the model reference bit width is double float64.
- the network device determines the model structure, model parameter quantity, or model reference bit width based on the training task of the first model, and sends fourth information to the terminal device that may potentially participate in the model training.
- the training task of the first model, such as a horizontal federated learning training task or a vertical federated learning task, can be initiated by the terminal device or by the network device; this is not limited in this application.
- S820 The terminal device sends a model bit width table to the network device.
- the network device receives the model bit width table from the terminal device.
- the model bit width table indicates the correspondence between the first parameter set and the first bit width set.
- the model bit width table can refer to Table 1 or Table 2 in the above method 700.
- the model bit width table includes model parameters #1 to model parameters #4 (for example, the first parameter set), and the corresponding bit widths include bit width #1 to bit width #4 (for example, the first bit width set).
- a terminal device can determine whether to participate in the training of a federated learning task based on its own hardware conditions, hardware power, battery capacity, and other factors. For example, if the current memory of the terminal device cannot support network training with a large number of model parameters, or the current battery of the terminal device cannot support high-precision-bit-width model training, the terminal device can selectively report a model bit width table; that is, at least two model parameters among the multiple model parameters included in the model bit width table have different bit widths (or precisions).
- the model bit width table reported by the terminal device can reflect only non-default values and their corresponding model parameters, to reduce air interface transmission overhead.
- each of the multiple bit widths (e.g., bit width #1 to bit width #4) included in the first bit width set can be a specific bit width value (such as float16), a bit width range (such as float16 to double float64), the upper limit (such as double float64) and lower limit (such as float16) of the bit widths supported by the terminal device, or a combination of a bit width value and a bit width range.
- This application does not limit its expression.
- model bit width tables reported by different terminal devices may be the same or different, that is, for the same model parameter, the bit widths supported by different terminal devices may be the same or different.
- bit widths of the model parameters of different layers under the first model may be the same or different.
- for example, in the model bit width table #1 reported by UE1, the first model includes model parameter #1 corresponding to bit width #1 and model parameter #2 corresponding to bit width #2; in the model bit width table #2 reported by UE2, the first model includes model parameter #1 corresponding to bit width #1 and model parameter #2 corresponding to bit width #3.
- as another example, the first model is a neural network with two layers: model parameter #1 of the first layer corresponds to bit width #1 and model parameter #2 of the first layer corresponds to bit width #2, while model parameter #1 of the second layer corresponds to bit width #2 and model parameter #2 of the second layer corresponds to bit width #3.
- if each bit width in the model bit width table reported by the terminal device is a specific bit width value, the following step S840 does not need to be executed.
- if the model bit width table reported by the terminal device includes a bit width range, the following step S840 needs to be executed; that is, the specific bit width is specified by the network device.
- S830 The network device confirms that the terminal device participates in federated learning.
- for example, the network device may send signaling to the terminal device to determine whether the terminal device supports participation in federated learning. Furthermore, the network device sends a detailed model structure to the terminal devices that confirm participation in federated learning training, which may include one or more of the number of model layers, the number of model features, the method of using the normalization layer, or the method of using the activation function. It should be understood that the detailed model structure can be any relevant information that can indicate the structure of the first model. It should be noted that if the model structure of the first model adopts a Transformer, the detailed model structure sent by the network device may also include information such as the number of heads and/or the position encoding. Optionally, the detailed model structure may also be sent to the terminal device in step S810; this application does not specifically limit this.
- S840 The network device allocates model bit width #1 to the terminal device, and accordingly, the terminal device receives model bit width #1 (e.g., the first bit width) from the network device.
- the network device determines a specific model bit width #1 based on the bit width range and notifies the terminal device. For example, the network device may determine model bit width #1 from the bit width range based on at least one of time-frequency resources, bit width requirements of different terminal devices, hardware conditions of different terminal devices, hardware power, or battery capacity, and indicate model bit width #1 to the terminal device. For specific implementation methods, reference may be made to the relevant description of method 700 above. Optionally, the network device may indicate model bit width #1 using bits.
- At this point, the terminal device and the network device are aligned on the first parameter set and the bit widths corresponding to the model parameters.
- the terminal device sends model parameters to the network device according to the model bit width table and/or model bit width #1.
- the network device receives the model parameters from the terminal device.
- for example, the terminal device locally trains the network according to the detailed model structure sent by the network device, obtains a training result (e.g., a first training result) based on its local data, and performs one or more forward and backward passes. The terminal device adjusts the bit width of each model parameter according to the reported model bit width table and uploads the training results of the model parameters with the corresponding bit widths. For example, the terminal device reports the training results of the locally trained model parameter #1 and model parameter #2 in float16 and float32, respectively.
- the network device sends the combined model parameters to the terminal device according to the model bit width table and/or model bit width #1.
- the terminal device receives the model parameters from the network device.
- each model parameter has its own corresponding bit width.
- the network device receives multiple model parameters (e.g., model parameter #1 and model parameter #2) reported by different terminal devices and merges them to obtain a merged training result (e.g., a second training result). It then sends the merged training results of the model parameters to the different terminal devices according to the model bit width tables reported by those terminal devices, or according to the model bit width #1 assigned by the network device.
- For example, the network device sends the merged training results of model parameter #1 and model parameter #2 to terminal device #1 in float16 and float32, respectively, and sends them to terminal device #2 in float32 and float64, respectively.
- the bit width corresponding to the merged training results of model parameter #1 and model parameter #2 sent by the network device is consistent with the bit width in the model bit width table, or with the model bit width #1 assigned by the network device. For example, if the training result of model parameter #1 reported by the terminal device corresponds to float16 while the merged training result of model parameter #1 corresponds to float32, the network device needs to convert the merged training result of model parameter #1 to float16 before sending it to the terminal device.
- the implementation method of the network device merging model parameters #1 or model parameters #2 reported by different terminal devices can refer to the relevant description of method 700 above.
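- A sketch of the server-side merge-and-convert step, assuming simple federated averaging as the aggregation rule (the method itself does not mandate a particular merge):
```python
import numpy as np

DTYPES = {"float16": np.float16, "float32": np.float32, "float64": np.float64}

def merge_and_convert(reports, tables):
    """Average each parameter across terminals in float64, then convert the
    merged result to the bit width each terminal reported or was assigned."""
    names = list(next(iter(reports.values())).keys())
    merged = {n: np.mean([np.asarray(r[n], np.float64) for r in reports.values()],
                         axis=0) for n in names}
    return {ue: {n: merged[n].astype(DTYPES[tbl[n]]) for n in names}
            for ue, tbl in tables.items()}

reports = {"ue1": {"param1": np.ones(2, np.float16)},
           "ue2": {"param1": np.full(2, 3.0, np.float32)}}
tables = {"ue1": {"param1": "float16"}, "ue2": {"param1": "float32"}}
out = merge_and_convert(reports, tables)
print(out["ue1"]["param1"].dtype, out["ue2"]["param1"].dtype)  # float16 float32
```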
- the terminal device updates or trains the model based on the training results of the merged model parameters.
- the terminal device may update the bit width corresponding to at least one model parameter in the above-mentioned model bit width table.
- This application does not limit the update time and update times of the bit width of at least one model parameter.
- the bit widths of the multiple model parameters used for model training and transmitted over the air interface by terminal devices can be variable rather than uniform, which ensures the generalization and convergence capabilities of the network.
- This implementation reduces the number of bits exchanged, lowering air interface transmission overhead, conserving resources, and saving hardware resources on terminal devices. It also allows for flexible updating of the bit widths of model parameters during federated learning, improving network training performance while ensuring the widespread adoption of AI in the wireless field.
- Figure 9 is a flow chart of a communication method 900 provided in an embodiment of the present application. As shown in Figure 9, the method is illustrated by taking the first device as a terminal device and the second device as a network device as the execution entities, taking the first parameter as a gradient parameter, and taking the terminal device and the network device transmitting gradient parameters of different bit widths over the air interface as an example. The method includes the following steps.
- S910: The network device sends fourth information to the terminal device.
- the terminal device receives the fourth information from the network device.
- the fourth information includes one or more of the following: the model structure, the model parameter quantity, or the gradient reference bit width (i.e., the fourth bit width). For example, the model structure is a CNN, the model parameter quantity is the floating-point operation count, and the gradient reference bit width is float32.
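- As a sketch, the fourth information could be carried as a simple record; the field names below are hypothetical and only mirror the items listed above:
```python
from dataclasses import dataclass

@dataclass
class FourthInformation:
    """Hypothetical container for the fourth information of step S910."""
    model_structure: str               # e.g. "CNN"
    parameter_quantity: float          # e.g. floating-point operation count
    gradient_reference_bit_width: str  # the fourth bit width, e.g. "float32"

msg = FourthInformation("CNN", 2.4e9, "float32")
print(msg)
```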
- the training task of the first model, such as a horizontal federated learning training task or a vertical federated learning task, can be initiated by a terminal device or by a network device; this application does not limit this.
- the terminal device sends the gradient bit width table to the network device.
- the network device receives the gradient bit width table from the terminal device.
- the gradient bit width table indicates the correspondence between the first parameter set and the first bit width set.
- the gradient bit width table may refer to Table 3 or Table 4 in the above method 700.
- the gradient bit width table includes gradient parameter #1 to gradient parameter #4 (for example, the first parameter set), and the corresponding bit widths include bit width #1 to bit width #4 (for example, the first bit width set).
- the gradient bit width tables reported by different terminal devices can be the same or different. That is, for the same model parameter, the bit widths supported by different terminal devices can be the same or different. In addition, the bit widths of the gradient parameters for different layers under the first model can be the same or different. For specific examples, please refer to the description of the model parameter bit width involved in step S820 of method 800 above.
- If each bit width in the gradient bit width table reported by the terminal device is a specific bit width value, the following step S940 does not need to be executed.
- If the gradient bit width table reported by the terminal device includes a bit width range, the following step S940 needs to be executed, that is, the specific bit width is specified by the network device.
- the network device confirms that the terminal device participates in federated learning.
- the network device allocates gradient bit width #1 to the terminal device, and accordingly, the terminal device receives gradient bit width #1 (e.g., the first bit width) from the network device.
- For the implementation of the above steps S930 and S940, reference may be made to the relevant descriptions in steps S830 and S840 of the above method 800.
- the terminal device and the network device are thus aligned on the first parameter set and the corresponding bit widths thereof.
- the terminal device sends gradient parameters to the network device according to the gradient bit width table and/or gradient bit width #1.
- the network device receives the gradient parameters from the terminal device.
- the terminal device locally trains the network according to the detailed model structure issued by the network device, obtains the Bottom network training result (e.g., the first training result) based on the local data, and performs one or more forward and backward passes. For example, the terminal device adjusts the bit width of each gradient parameter according to the reported gradient bit width table and uploads the training results of the gradient parameters at the corresponding bit widths. For example, the terminal device reports the Bottom training results of the locally trained gradient parameter #1 and gradient parameter #2 in float16 and float32, respectively.
- S960: The network device sends the combined gradient parameters to the terminal device according to the gradient bit width table and/or gradient bit width #1.
- the terminal device receives the gradient parameters from the network device.
- the network device receives multiple gradient parameters (such as gradient parameter #1 and gradient parameter #2) reported by different terminal devices and converts their bit widths to the bit width required by the Top network using the gradient bit width tables reported by the terminal devices. It then merges the multiple gradient parameters to obtain a merged training result (such as the second training result), calculates the reverse gradient based on the loss, and sends the merged gradient parameter training results to the different terminal devices according to the gradient bit width tables reported by those terminal devices, or according to the gradient bit width #1 allocated by the network device. For example, the network device sends the merged training results of gradient parameter #1 and gradient parameter #2 to terminal device #1 in float16 and float32, respectively.
- Similarly, the network device sends the merged training results of gradient parameter #1 and gradient parameter #2 to terminal device #2 in float32 and float64, respectively.
- For example, if the training result of gradient parameter #1 reported by the terminal device corresponds to float16 while the merged training result corresponds to float32, the network device needs to convert the merged training result of gradient parameter #1 to float16 before sending it to the terminal device in step S960.
- the implementation method of the network device merging gradient parameters #1 or gradient parameters #2 reported by different terminal devices can refer to the relevant description of method 700 above.
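- A sketch of this vertical-federated-learning flow of step S960, under assumed simplifications: Bottom outputs are fused by concatenation, the Top network's bit width is float32, and NumPy dtypes stand in for air-interface bit widths:
```python
import numpy as np

DTYPES = {"float16": np.float16, "float32": np.float32, "float64": np.float64}
TOP_DTYPE = np.float32  # assumed bit width required by the Top network

def fuse_for_top(uploads):
    """Convert each terminal's uploaded gradient parameters to the Top
    network's bit width and concatenate them (an assumed fusion rule)."""
    return np.concatenate([np.asarray(u, TOP_DTYPE) for u in uploads.values()])

def dispatch_backward(grad, slices, tables):
    """Split the Top network's backward gradient per terminal and convert
    each slice to that terminal's reported gradient bit width."""
    return {ue: grad[s].astype(DTYPES[tables[ue]]) for ue, s in slices.items()}

uploads = {"ue1": np.ones(2, np.float16), "ue2": np.ones(2, np.float32)}
grad = np.random.randn(4).astype(TOP_DTYPE)  # stands in for the loss gradient
back = dispatch_backward(grad, {"ue1": slice(0, 2), "ue2": slice(2, 4)},
                         {"ue1": "float16", "ue2": "float32"})
print(fuse_for_top(uploads).dtype, back["ue1"].dtype, back["ue2"].dtype)
```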
- the terminal device updates or trains the model based on the training results of the merged gradient parameters.
- the terminal device may update the bit width corresponding to at least one gradient parameter in the above-mentioned gradient bit width table.
- This application does not limit the update time and number of updates of the bit width of at least one gradient parameter.
- the bit widths of the multiple gradient parameters used for model training and transmitted over the air interface by terminal devices can be variable rather than uniform, which ensures the generalization and convergence capabilities of the network.
- This implementation reduces the number of bits exchanged, lowering air interface transmission overhead, conserving resources, and saving hardware resources on terminal devices. It also allows for flexible updating of the gradient parameter bit widths during federated learning, improving network training performance while ensuring the widespread adoption of AI in the wireless field.
- Figure 10 is a flow chart of a communication method 1000 provided in an embodiment of the present application. As shown in Figure 10, the method is executed by a first device, which is a terminal device, and a second device, which is a network device. The method is described using an example in which the terminal device and the network device transmit first data sets of different bit widths over the air interface. The method includes the following steps.
- the terminal device sends a data set bit width table to the network device.
- the network device receives the data set bit width table from the terminal device.
- the data set bit width table indicates the correspondence between the first data set and the first bit width set.
- the data set bit width table includes data #1 to data #4 (for example, the first data set), and the corresponding bit widths include bit width #1 to bit width #4 (for example, the first bit width set); that is, each data corresponds to a bit width, and at least two bit widths among bit width #1 to bit width #4 are different.
- the representation of the dataset bit width table can refer to the model bit width table involved in step S820 of the above method 800, or the relevant description of the gradient bit width table involved in step S920 of the above method 900.
- If each bit width in the data set bit width table reported by the terminal device is a specific bit width value, the following step S1020 does not need to be executed.
- If the data set bit width table reported by the terminal device includes a bit width range, for example, the first data set includes multiple data (for example, sample #1, sample #2 and sample #3) and the bit width range corresponding to sample #1 reported by the terminal device is float16 to float32, then the following step S1020 needs to be executed, that is, the specific bit width of sample #1 is specified by the network device.
- the network device allocates data bit width #1 to the terminal device, and accordingly, the terminal device receives data bit width #1 (e.g., the first bit width) from the network device.
- the network device may determine data bit width #1 from the data range based on at least one of the time-frequency resources and the bit width requirement range of each data contained in the first data set (or the importance of each data), and indicate data bit width #1 to the terminal device. For example, if the bit width range corresponding to sample #1 reported by the terminal device is float16 to float32, the bit width of sample #1 allocated by the network device may be float32.
- the network device may indicate data bit width #1 using bits. For specific implementation methods, please refer to the relevant description of the above method 700.
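- A sketch of step S1020 under the assumption that per-sample importance drives the choice within each reported range (the scoring rule is illustrative, not mandated by the method):
```python
# Hypothetical per-sample bit width assignment from reported ranges.
def assign_data_bit_widths(ranges, importance, threshold=0.5):
    """Give important samples the top of their reported range and the rest
    the bottom, trading precision against air-interface overhead."""
    return {s: (hi if importance.get(s, 0.0) > threshold else lo)
            for s, (lo, hi) in ranges.items()}

ranges = {"sample1": ("float16", "float32"),
          "sample2": ("float16", "float16")}
print(assign_data_bit_widths(ranges, {"sample1": 0.9}))
# -> {'sample1': 'float32', 'sample2': 'float16'}
```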
- the network device sends a first data set to the terminal device according to the data set bit width table and/or data bit width #1.
- the terminal device receives the first data set from the network device.
- a network device can send a data set to a terminal device based on a data set bit width table and/or data bit width #1.
- Each data in the data set has its own corresponding bit width, and at least two data have different bit widths.
- the terminal device implements model training based on the data set.
- the network device transmits the data set to the terminal device based on the data set bit width table and/or data bit width #1, so that the terminal device performs network training based on the received data set.
- the terminal device can update the bit width corresponding to at least one data in the above data set bit width table.
- the specific implementation method can refer to the relevant description of the above method 700. This application does not limit the update time and number of updates of the bit width of at least one data.
- In the above solution, the data set used for model training is transmitted over the air interface at the corresponding bit widths; that is, the network device sends each data in the data set according to its corresponding bit width.
- This implementation method can reduce the number of bits exchanged, reduce air interface transmission overhead, save resources, and reduce hardware resource consumption on terminal devices.
- the bit width of any data in the data set can be flexibly updated, which can improve the network training performance while ensuring the promotion of artificial intelligence in the wireless field.
- Figure 11 is a schematic diagram of a communication device 2000 provided in an embodiment of the present application.
- the communication device 2000 includes a processing module 2001 and a communication module 2002.
- the communication device 2000 can be a first device, including a terminal device, or a communication device applied to a terminal device, or used in conjunction with a terminal device and capable of implementing a method executed by the terminal device, such as a chip, a chip system or a circuit.
- the communication device 2000 can be a second device, including a network device, or a communication device applied to a network device, or used in conjunction with a network device and capable of implementing a method executed by the network device, such as a chip, a chip system or a circuit.
- the communication module may also be referred to as a transceiver module, a transceiver, or a transceiver device.
- the processing module may also be referred to as a processor, processing board, processing unit, or processing device.
- the communication module is used to perform the sending and receiving operations of the terminal device or network device in the above method.
- the device used to implement the receiving function in the communication module can be considered a receiving unit, and the device used to implement the sending function in the communication module can be considered a sending unit. That is, the communication module includes a receiving unit and a sending unit.
- the processing module 2001 can be used to implement the processing function of the first device in the above embodiments
- the communication module 2002 can be used to implement the transceiver function of the first device in the above embodiments.
- the processing module 2001 is used to obtain the first bit width of the first parameter, the first bit width belongs to the first bit width set, the first parameter belongs to the first parameter set, the multiple bit widths in the first bit width set have a corresponding relationship with the multiple parameters in the first parameter set, the multiple parameters are used to train the first model, and each parameter includes one or more of the following: model parameters and/or gradient parameters; the communication module 2002 is used to send the first information, and the first information is used to indicate the first bit width.
- the first parameter set includes N second parameters, the second bit width corresponding to at least one of the N second parameters is different from the first bit width, the second bit width belongs to the first bit width set, and N is an integer greater than or equal to 1.
- the processing module 2001 is further configured to obtain a first training result, the first training result corresponds to the first parameter, and the first training result is obtained based on training of the first model; the communication module 2002 is further configured to send the first training result according to the first bit width.
- the communication module 2002 is further configured to receive a second training result, the bit width corresponding to the second training result being the first bit width, and the second training result is used by the first terminal to train the first model.
- the second training result is obtained by aggregating at least two first training results reported by at least two terminals, and the at least two terminals are used to jointly train the first model.
- the communication module 2002 is further configured to send second information, where the second information is used to indicate that the bit width corresponding to the first parameter is updated to a third bit width, which is different from the first bit width.
- the second information includes a first time unit, and the second information is used to indicate that within the first time unit, the bit width corresponding to the first parameter is updated to a third bit width.
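- A sketch of how the second information might be structured; the field names and the slot-index interpretation of the first time unit are assumptions for illustration:
```python
from dataclasses import dataclass

@dataclass
class SecondInformation:
    """Hypothetical second information: within first_time_unit, the bit
    width of the identified parameter is updated to the third bit width."""
    parameter_id: int     # index of the first parameter in the table
    third_bit_width: str  # differs from the first bit width, e.g. "float32"
    first_time_unit: int  # e.g. a slot or frame index scoping the update

update = SecondInformation(parameter_id=1, third_bit_width="float32",
                           first_time_unit=20)
print(update)
```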
- the communication module 2002 is further configured to receive third information, where the third information is used to indicate the first bit width.
- the communication module 2002 is further configured to send a second bit width set, the first terminal supports bit widths in the second bit width set, and the first bit width belongs to the second bit width set.
- the communication module 2002 is also used to receive fourth information, which is used to request training of the first model, and the fourth information includes one or more of the following: structural parameters of the first model, parameter quantity of the first model, or a fourth bit width; wherein the fourth bit width is the bit width supported by the network side.
- the first bit width is less than or equal to the fourth bit width.
- the processing module 2001 can be used to implement the processing function of the first device in the above embodiments
- the communication module 2002 can be used to implement the transceiver function of the first device in the above embodiments.
- the processing module 2001 is used to obtain the first bit width of the first data, the first bit width belongs to a first bit width set, the first data belongs to a first data set, the multiple bit widths in the first bit width set correspond to the multiple data in the first data set, and the first data set is used to train the first model; the communication module 2002 is used to send the first information, and the first information is used to indicate the first bit width.
- the processing module 2001 can be used to implement the processing function of the second device in the above embodiments
- the communication module 2002 can be used to implement the transceiver function of the second device in the above embodiments.
- the processing module 2001 is used to obtain the first bit width of the first parameter, where the first bit width belongs to the first bit width set, the first parameter belongs to the first parameter set, multiple bit widths in the first bit width set have a corresponding relationship with multiple parameters in the first parameter set, and the multiple parameters are used to train the first model.
- each parameter includes one or more of the following: a model parameter and/or a gradient parameter.
- the first parameter set further includes N second parameters, at least one of the N second parameters corresponds to a second bit width different from the first bit width, the second bit width belongs to the first bit width set, and N is an integer greater than or equal to 1.
- the communication module 2002 is used to receive at least two first training results reported by at least two terminals, and the at least two terminals are used to jointly train the first model; the processing module 2001 is used to perform aggregation processing based on the at least two first training results to obtain a second training result, and the second training result is used to train the first model; the communication module 2002 is also used to send the second training result according to the first bit width.
- the communication module 2002 is further configured to receive second information, where the second information is used to indicate that the bit width corresponding to the first parameter is updated to a third bit width, and the third bit width is different from the first bit width.
- the second information includes a first time unit, and the second information is used to indicate that within the first time unit, the bit width corresponding to the first parameter is updated to a third bit width.
- the communication module 2002 is further configured to receive first information, where the first information is used to indicate a first bit width.
- the communication module 2002 is further configured to receive a second bit width set; the processing module 2001 is further configured to select a first bit width from the second bit width set; and the communication module 2002 is further configured to send third information, where the third information is configured to indicate the first bit width.
- the communication module 2002 is also used to send fourth information, which is used to request training of the first model.
- the fourth information includes one or more of the following: structural parameters of the first model, parameter quantities of the first model, or a fourth bit width; wherein the fourth bit width is the bit width supported by the network side.
- the first bit width is less than or equal to the fourth bit width.
- the processing module 2001 can be used to implement the processing function of the second device in the above embodiments
- the communication module 2002 can be used to implement the transceiver function of the second device in the above embodiments.
- the processing module 2001 is used to obtain the first bit width of the first data, the first bit width belongs to a first bit width set, the first data belongs to a first data set, multiple bit widths in the first bit width set correspond to multiple data in the first data set, and the first data set is used to train the first model.
- the aforementioned communication module and/or processing module can be implemented by a virtual module; for example, the processing module can be implemented by a software functional unit or a virtual device, and the communication module can likewise be implemented by a software functional unit or a virtual device.
- the processing module or the communication module can also be implemented by a physical device, for example, when the device is implemented using a chip/circuit (such as an integrated circuit or a logic circuit).
- the communication module can be an input and output circuit and/or a communication interface, performing input operations (corresponding to the aforementioned receiving operations) and output operations (corresponding to the aforementioned sending operations); the processing module is an integrated processor or microprocessor or circuit (such as an integrated circuit or a logic circuit, etc.).
- the division of modules in this application is illustrative and represents only a logical functional division. In actual implementation, other division methods may be used. Furthermore, the functional modules in the examples of this application may be integrated into a single processor, exist physically as separate modules, or two or more modules may be integrated into a single module. The aforementioned integrated modules may be implemented in either hardware or software functional modules.
- FIG12 is a schematic diagram of another communication device 3000 provided in an embodiment of the present application.
- the communication device 3000 may be the aforementioned first device or second device, or a chip or chip system for the aforementioned first device or second device.
- the communication device 3000 may be the aforementioned terminal device or network device, or a chip or chip system for the aforementioned terminal device or network device.
- the chip system may be composed of a chip, or may include a chip and other discrete devices.
- the communication device 3000 can be used to implement the functions of any device (e.g., the first device or the second device) in the communication system described in the above examples.
- the communication device 3000 may include at least one processing circuit 3010.
- the processing circuit 3010 is coupled to a memory, and the memory may be located within the device, or the memory may be integrated with the processor, or the memory may be located outside the device.
- the communication device 3000 may also include at least one memory 3020.
- the memory 3020 stores the computer programs, instructions and/or data necessary for implementing any of the above examples; the processing circuit 3010 may execute the computer program stored in the memory 3020 to complete the method in any of the above examples.
- the communication device 3000 may also include a transceiver circuit 3030, and the communication device 3000 can exchange information with other devices through the transceiver circuit 3030.
- the transceiver circuit 3030 can be a transceiver, a circuit, a bus, a module, a pin, or other types of transceiver circuits.
- the transceiver circuit 3030 in the device 3000 can also be an input-output circuit, or an interface circuit, which can input information (or receive information) and output information (or send information).
- the transceiver circuit 3030 can be a transmitter, a receiver, a transceiver, or a communication interface, which is not limited here.
- the processing circuit 3010 may be one or more processors, or all or part of the processing circuits in one or more processors.
- the processing circuit 3010 is an integrated processor, microprocessor, integrated circuit or logic circuit, etc.
- the processor can determine output information based on input information.
- Coupling in this application refers to an indirect coupling or communication connection between devices, units, or modules, which can be electrical, mechanical, or other forms, and is used for information exchange between devices, units, or modules.
- the processing circuit 3010 may operate in conjunction with the memory 3020 and the transceiver circuit 3030.
- the specific connection medium between the processing circuit 3010, memory 3020, and transceiver circuit 3030 is not limited in this application.
- the processing circuit 3010, the memory 3020, and the transceiver circuit 3030 are interconnected via a bus 3040.
- the bus may include an address bus, a data bus, a control bus, or other types of buses.
- FIG12 shows one bus 3040, but this does not mean that there is only one bus or only one type of bus.
- processors mentioned in the embodiments of the present application may be the following devices or the circuitry of the following devices used for processing functions: a central processing unit (CPU), other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
- the general-purpose processor may be a microprocessor or any conventional processor.
- the memory mentioned in the embodiments of the present application can be a volatile memory and/or a non-volatile memory.
- the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
- the volatile memory can be a random access memory (RAM).
- RAM can be used as an external cache.
- RAM includes the following forms: static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus RAM (DR RAM).
- If the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) can be integrated into the processor.
- The memory described herein is intended to include, but is not limited to, these and any other suitable types of memory.
- the method described in the above embodiment can be executed by the terminal device and the network device, or can be executed by the chip, chip system or circuit of the terminal device and the network device, and the chip, chip system or circuit can be installed in the terminal device and the network device.
- An embodiment of the present application provides a computer-readable storage medium on which computer instructions for implementing the methods executed by a device (such as a terminal device or a network device) in the above-mentioned method embodiments are stored.
- An embodiment of the present application provides a computer program which, when executed by a computer, enables the computer to implement the methods performed by a device (such as a terminal device or a network device) in each embodiment of the above method.
- An embodiment of the present application provides a computer program product comprising instructions, which, when executed by a computer, implement the methods performed by a device (such as a terminal device, or a network device (or a positioning device), etc.) in the above-mentioned method embodiments.
- An embodiment of the present application provides a communication system, which includes the terminal device and/or network device described in each of the above embodiments.
- "At least one" means one or more, and "more" means two or more.
- “And/or” describes the association relationship of associated objects, indicating that three relationships may exist.
- For example, A and/or B can mean: A exists alone, both A and B exist, or B exists alone, where A and B can be singular or plural.
- the character "/” generally indicates that the previous and next associated objects are in an “or” relationship.
- “At least one of the following items” or similar expressions refers to any combination of these items, including any combination of single items or plural items.
- At least one of a, b and c can mean: a, or b, or c, or a and b, or a and c, or b and c, or a, b and c.
- a, b and c can be single or multiple, respectively.
- "Used for indication" can include direct indication and indirect indication.
- direct indication of information A means including information A; implicit indication of information A means indicating information A through the correspondence between information A and information B and the direct indication of information B.
- the correspondence between information A and information B can be predefined, pre-stored, pre-burned, or pre-configured.
- information C is used to determine information D, which includes both information D being determined solely based on information C and information D being determined based on information C and other information. Furthermore, information C can also be used to determine information D indirectly, for example, where information D is determined based on information E, and information E is determined based on information C.
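- For illustration, a minimal sketch of indirect indication, assuming a preconfigured correspondence between a 2-bit field (information B) and a bit width (information A):
```python
# Hypothetical predefined correspondence: directly indicated information B
# (a 2-bit field) implicitly indicates information A (a bit width).
CORRESPONDENCE = {0b00: "float16", 0b01: "float32", 0b10: "float64"}

def recover_information_a(information_b: int) -> str:
    """Information A is not carried explicitly; it is recovered from the
    directly indicated information B via the predefined correspondence."""
    return CORRESPONDENCE[information_b]

print(recover_information_a(0b01))  # -> 'float32'
```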
- "Network element A sends information A to network element B" can be understood as the destination of information A being network element B or an intermediate network element in the transmission path between network element A and the destination, and may include network element A sending the information to network element B directly or indirectly.
- "Network element B receives information A from network element A" can be understood as the source of information A being network element A or an intermediate network element in the transmission path between network element B and the source, and may include network element B receiving the information from network element A directly or indirectly.
- The information may be processed as necessary between the source and the destination of the transmission, such as format changes, but the destination can still recover the valid information from the source. Similar expressions in this application can be understood similarly and will not be elaborated here.
- the AI model is mainly used as an example for illustrative description. It can be understood that the above AI model can also be used for other purposes.
- the methods and operations implemented by the terminal device or positioning device can also be implemented by components (such as chips or circuits) of the terminal device or positioning device, without limitation.
- the disclosed systems, devices and methods can be implemented in other ways.
- the device embodiments described above are merely schematic.
- the division of the units is merely a logical function division.
- In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
- the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network units. Some or all of these units may be selected to achieve the purpose of this embodiment according to actual needs.
- each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically separately, or two or more units can be integrated into one unit.
- If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
- the computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which can be a personal computer, server, or network device, etc.) to execute all or part of the steps of the method described in each embodiment of the present application.
- the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
Abstract
A network training method and a communication apparatus are provided, applied to a scenario in which AI is combined with a wireless network. The method includes acquiring a first bit width of a first parameter and sending first information, the first information being used to indicate the first bit width. The first bit width belongs to a first bit width set, and the first parameter belongs to a first parameter set; there are correspondences between multiple bit widths in the first bit width set and multiple parameters in the first parameter set; the multiple parameters are used to train a first model; and each parameter includes one or more of the following: a model parameter and/or a gradient parameter. The method can save resources while ensuring network training performance.