WO2021114968A1 - A scheduling method and device - Google Patents
A scheduling method and device
- Publication number
- WO2021114968A1 (PCT/CN2020/126756)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- scheduled
- state information
- information set
- neural network
- terminal device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/12—Wireless traffic scheduling
- H04W72/121—Wireless traffic scheduling for groups of terminals or users
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W72/00—Local resource management
- H04W72/50—Allocation or scheduling criteria for wireless resources
- H04W72/52—Allocation or scheduling criteria for wireless resources based on load
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/092—Reinforcement learning
Definitions
- This application relates to the field of communication technology, and in particular to a scheduling method and device.
- Wireless resource scheduling plays a vital role in cellular networks; its essence is to allocate resources such as the available wireless spectrum according to the current quality of service (QoS) requirements of each user equipment.
- Media access control (MAC) layer scheduling mainly solves the problems of time-frequency resource allocation, modulation and coding scheme (MCS) selection, user pairing, and precoding, and achieves a compromise between system throughput and fairness.
- A deep reinforcement learning (DRL) algorithm is usually combined with the scheduler to obtain a better scheduling strategy.
- A deep neural network is usually used.
- The input neuron scale and output neuron scale of the deep neural network are determined by the state and action spaces of the system, and the state and action spaces of the system are related to the number of user equipments to be scheduled; therefore, the deep neural network changes with the number of user equipments to be scheduled.
- This application provides a scheduling method and device for decoupling the deep neural network in the scheduling problem from the number of user equipments, so as to solve the difficulty a deep reinforcement learning scheduling algorithm has in adapting to the number of user equipments.
- this application provides a scheduling method, which can be applied to a network device, and can also be applied to a chip or a chip set in the network device.
- The method includes: processing the first state information sets of K terminal devices to be scheduled to obtain second state information sets of the K terminal devices to be scheduled; inputting the second state information set of each terminal device to be scheduled into a first neural network model to determine the scheduled weight of each terminal device to be scheduled, obtaining K scheduled weights; and determining a scheduling result according to the K scheduled weights, the scheduling result indicating the scheduled terminal device. Here K is an integer greater than or equal to 1; the second state information set of any terminal device to be scheduled includes the state information of that terminal device and the state correlation data between that terminal device and the other terminal devices to be scheduled; the dimension of the second state information set of any one of the K terminal devices to be scheduled is H, where H is an integer greater than or equal to 1; and the first neural network model is determined based on H.
- The state information of all terminal devices to be scheduled is processed, and the processed state information of each terminal device to be scheduled is then input into the same neural network model to obtain the result. That is, during scheduling the neural network model is shared by all terminal devices to be scheduled and applies to each of them, so the neural network model is decoupled from the number of terminal devices to be scheduled. The neural network model can therefore be applied to scenarios with different numbers of terminal devices to be scheduled, with good adaptability and scalability.
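- The shared-model design above can be sketched as follows. This is a minimal illustration, not the patent's actual model: a plain linear scorer stands in for the first neural network model, and all names (`shared_score`, `schedule`) are hypothetical. The point is that the model's parameters depend only on the per-device dimension H, never on the number K of terminal devices.

```python
def shared_score(state_vec, weights, bias):
    # Stand-in for the first neural network model: maps one device's
    # H-dimensional second state information set to a scheduled weight.
    return sum(w * s for w, s in zip(weights, state_vec)) + bias

def schedule(second_state_sets, weights, bias):
    # The SAME parameters score every device, so the model is decoupled
    # from the number K of terminal devices to be scheduled.
    scores = [shared_score(s, weights, bias) for s in second_state_sets]
    return scores.index(max(scores))  # index of the scheduled device

# Hypothetical parameters (H = 3) applied first to K = 3 devices...
weights, bias = [0.5, -0.2, 0.1], 0.0
picked = schedule([[1.0, 0.2, 0.3], [2.0, 0.1, 0.0], [0.5, 0.9, 0.4]],
                  weights, bias)
# ...and, unchanged, to K = 5 devices.
picked5 = schedule([[float(i), 0.0, 0.0] for i in range(5)], weights, bias)
```

The same `weights` serve both calls, which is the adaptability the paragraph above describes.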
- The first state information set of any terminal device to be scheduled includes at least one of the following state information: the instantaneous estimated throughput of the terminal device, the average throughput of the terminal device, the buffer size of the terminal device, and the packet waiting time of the terminal device.
- the scheduling result is determined according to the K scheduled weights
- The specific method may include: taking the identifier of the terminal device corresponding to the largest scheduled weight among the K scheduled weights as the scheduling result; or taking the identifier of the terminal device corresponding to the smallest scheduled weight among the K scheduled weights as the scheduling result; or processing one of the K scheduled weights into a first value and the remaining K-1 scheduled weights into second values, the sequence consisting of the processed first value and the K-1 second values being the scheduling result, where the terminal device corresponding to the first value is the scheduled terminal device.
- the scheduling result can be flexibly determined, so as to accurately indicate the scheduled terminal equipment.
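- Two of the forms of scheduling result described above can be sketched directly; the helper names are hypothetical, and the first/second values default to 1 and 0 by assumption:

```python
def result_as_id(scheduled_weights):
    # Form 1: the identifier (here, the index) of the terminal device
    # with the largest scheduled weight is the scheduling result.
    return max(range(len(scheduled_weights)), key=lambda i: scheduled_weights[i])

def result_as_one_hot(scheduled_weights, first=1, second=0):
    # Form 3: one scheduled weight is processed into a first value and the
    # remaining K-1 weights into second values; the sequence is the result.
    k = result_as_id(scheduled_weights)
    return [first if i == k else second for i in range(len(scheduled_weights))]
```

The smallest-weight form is symmetric (use `min` in place of `max`).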
- the third state information set is input into the second neural network model to obtain the value corresponding to the third state information set, and the value is used to update the first neural network model and the second neural network model.
- the model parameters of the first neural network model and the second neural network model can be continuously trained and updated, and the first neural network model and the second neural network model can be made more accurate, thereby making the scheduling more accurate.
- The update is not affected by the number of terminal devices to be scheduled, so the updated model parameters of the first neural network model and of the second neural network model are independent of the number of terminal devices to be scheduled, allowing the first neural network model and the second neural network model to be applied to a wider range of scenarios.
- the third state information set is obtained by processing the second state information sets of the K to-be-scheduled terminal devices.
- the second state information set of the K to-be-scheduled terminal devices is processed to obtain the third state information set
- The specific method may be: taking the average value of each item of state information across the second state information sets of the K terminal devices to be scheduled to obtain the third state information set; or selecting the maximum value of each item of state information across the second state information sets of the K terminal devices to be scheduled to form the third state information set; or selecting the minimum value of each item of state information across the second state information sets of the K terminal devices to be scheduled to form the third state information set.
- the third state information set can be flexibly obtained, so that the value of the updated model parameter can be accurately obtained through the third state information set.
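- The three pooling options above (mean, maximum, or minimum over each item of state information) can be sketched as one small function; `pool_states` is a hypothetical name:

```python
def pool_states(state_sets, mode="mean"):
    # Reduce K H-dimensional state sets to a single H-dimensional set,
    # item by item, so the result never depends on K.
    k = len(state_sets)
    columns = list(zip(*state_sets))  # H columns of K values each
    if mode == "mean":
        return [sum(col) / k for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    return [min(col) for col in columns]
```

Because every option is a per-item reduction over K, the output dimension stays H regardless of how many devices are scheduled.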
- That any one of the first neural network model and the second neural network model is determined based on H may mean that the dimension of the input of that neural network model is related to H.
- this application provides a scheduling method, which can be applied to a network device, and can also be applied to a chip or a chip set in the network device.
- The method includes: obtaining a fifth state information set based on the fourth state information sets of K terminal devices to be scheduled and the state information set of the system; inputting the fifth state information set into a third neural network model to determine a weight set; determining the scheduled weight of each of the K terminal devices to be scheduled based on the weight set to obtain K scheduled weights; and determining a scheduling result according to the K scheduled weights, the scheduling result indicating the scheduled terminal device. Here K is an integer greater than or equal to 1, and the dimension of the fourth state information set of any of the K terminal devices to be scheduled is L, where L is an integer greater than or equal to 1.
- The fourth state information set of any terminal device to be scheduled includes the state information of that terminal device and the state correlation data between that terminal device and the other terminal devices to be scheduled. The number of weights included in the weight set is the same as L, and the third neural network model is determined based on L.
- A state information set, obtained from the state information sets of all terminal devices to be scheduled and the state information set of the system, is input into a neural network model to obtain the result. That is, during scheduling the neural network model is shared by all terminal devices to be scheduled and applies to each of them, so the neural network model is decoupled from the number of terminal devices to be scheduled. The neural network model can therefore be applied to scenarios with different numbers of terminal devices to be scheduled, with good adaptability and scalability.
- the state information set of the system includes at least one of the following state information: average throughput of the system, system fairness, and system packet loss rate.
- a fifth state information set is obtained based on the fourth state information set of the K terminal devices to be scheduled and the state information set of the system.
- The specific method may be: processing the fourth state information sets of the K terminal devices to be scheduled to obtain a sixth state information set, and combining the sixth state information set and the state information set of the system into the fifth state information set.
- the fifth state information set can be accurately obtained by the above method, so that the scheduling process can be decoupled from the number of terminal devices to be scheduled, and then scheduling that is independent of the number of terminal devices to be scheduled can be realized.
- the fourth state information set of the K terminal devices to be scheduled is processed to obtain a sixth state information set.
- The specific method may be: taking the average value of each item of state information across the fourth state information sets of the K terminal devices to be scheduled to obtain the sixth state information set; or selecting the maximum value of each item of state information across the fourth state information sets of the K terminal devices to be scheduled to form the sixth state information set; or selecting the minimum value of each item of state information across the fourth state information sets of the K terminal devices to be scheduled to form the sixth state information set.
- the sixth state information set can be flexibly obtained, so that the fifth state information set can be accurately obtained subsequently.
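- A sketch of obtaining the fifth state information set as described above: pool the K fourth state information sets into the sixth set, then append the system's state information set. The function name and the default pooling choice are illustrative assumptions.

```python
def build_fifth_set(fourth_sets, system_state, mode="mean"):
    # Step 1: pool the K L-dimensional fourth sets into the sixth set.
    k = len(fourth_sets)
    columns = list(zip(*fourth_sets))
    if mode == "mean":
        sixth = [sum(col) / k for col in columns]
    elif mode == "max":
        sixth = [max(col) for col in columns]
    else:
        sixth = [min(col) for col in columns]
    # Step 2: combine the sixth set with the system state information set
    # (e.g. average throughput, fairness, packet loss rate).
    return sixth + list(system_state)
```

The resulting dimension is L plus the size of the system state set, independent of K.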
- the seventh state information set is input into the fourth neural network model to obtain the value corresponding to the seventh state information set, and the value is used to update the third neural network model and the fourth neural network model.
- the model parameters of the third neural network model and the fourth neural network model can be continuously trained and updated, and the third neural network model and the fourth neural network model can be made more accurate, thereby making the scheduling more accurate.
- The update is not affected by the number of terminal devices to be scheduled, so the updated model parameters of the third neural network model and of the fourth neural network model are independent of the number of terminal devices to be scheduled, allowing the third neural network model and the fourth neural network model to be applied to a wider range of scenarios.
- the seventh state information set is the same as the fifth state information set.
- That any one of the third neural network model and the fourth neural network model is determined based on L may mean that the dimension of the input of that neural network model is related to L.
- the scheduled weight of each terminal device to be scheduled in the K terminal devices to be scheduled is determined based on the weight set.
- For each of the K terminal devices to be scheduled, the value of each item of state information in its fourth state information set is weighted by the corresponding weight in the weight set and summed, obtaining the scheduled weight of that terminal device.
- the weight in the weight set can represent the weight of each state information, so that the scheduled weight of each terminal device to be scheduled can be accurately determined according to the weight of each state information, and then the scheduling result can be accurately determined.
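- The weighted summation above can be sketched directly; `compute_scheduled_weights` is a hypothetical helper, and in practice the weight set would come from the third neural network model:

```python
def compute_scheduled_weights(weight_set, fourth_sets):
    # One weight per item of state information (L weights in total);
    # each device's scheduled weight is the weighted sum of its L items.
    return [sum(w * s for w, s in zip(weight_set, states))
            for states in fourth_sets]
```

Note that the weight set has fixed size L while the output has size K, so the learned part of the computation is again independent of the number of devices.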
- the scheduling result is determined according to the K scheduled weights.
- The specific method may be: taking the identifier of the terminal device corresponding to the largest scheduled weight among the K scheduled weights as the scheduling result; or taking the identifier of the terminal device corresponding to the smallest scheduled weight among the K scheduled weights as the scheduling result; or processing one of the K scheduled weights into a first value and the remaining K-1 scheduled weights into second values, the sequence consisting of the processed first value and the K-1 second values being the scheduling result, where the terminal device corresponding to the first value is the scheduled terminal device.
- the scheduling result can be flexibly determined, so as to accurately indicate the scheduled terminal equipment.
- this application provides a scheduling device, which may be a network device, or a chip or chipset in the network device.
- the scheduling device may include a processing unit, and may also include a communication unit.
- the processing unit may be a processor, and the communication unit may be a transceiver;
- The scheduling device may also include a storage unit, which may be a memory. The storage unit is used to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the network device executes the corresponding function in the first aspect or the second aspect described above.
- Alternatively, the processing unit may be a processor, and the communication unit may be an input/output interface, a pin, a circuit, or the like; the processing unit executes instructions stored in the storage unit, so that the network device performs the corresponding function in the first aspect or the second aspect described above.
- The storage unit may be a storage module in the chip or chipset (for example, a register or a cache), or a storage unit located outside the chip or chipset in the network device (for example, read-only memory or random access memory).
- The processing unit may be further divided into a first processing unit and a second processing unit. Specifically, the first processing unit implements the state-processing functions in the first aspect or the second aspect, and the second processing unit implements the scheduling process in the first aspect or the second aspect.
- a scheduling device which includes a processor, a communication interface, and a memory.
- the communication interface is used to transmit information, and/or messages, and/or data between the scheduling device and other devices.
- the memory is used to store computer-executable instructions.
- The processor executes the computer-executable instructions stored in the memory, so that the scheduling device executes the scheduling method described in any design of the first aspect or the second aspect.
- an embodiment of the present application provides a scheduling device.
- the scheduling device includes a processor.
- When the processor executes a computer program or instructions in a memory, the method described in the first aspect or the second aspect is carried out.
- an embodiment of the present application provides a scheduling device.
- the scheduling device includes a processor and a memory.
- The memory is used to store a computer program or instructions; the processor is used to execute the computer program or instructions stored in the memory, so that the communication device executes the corresponding method shown in the first aspect or the second aspect.
- an embodiment of the present application provides a scheduling device.
- the scheduling device includes a processor, a memory, and a transceiver.
- The transceiver is used to receive or send signals; the memory is used to store program code or instructions; and the processor is used to call the program code or instructions from the memory to execute the method described in the first aspect or the second aspect above.
- An embodiment of the present application provides a scheduling device that includes a processor and an interface circuit. The interface circuit is configured to receive computer program code or instructions and transmit them to the processor, and the processor executes the computer program code or instructions to perform the corresponding method shown in the first aspect or the second aspect above.
- an embodiment of the present application provides a communication system, which may include the aforementioned terminal equipment to be scheduled and a scheduling device.
- An embodiment of the present application provides a computer-readable storage medium that stores program instructions. When the program instructions run on a network device, the network device executes the first aspect or any possible design of the first aspect, or the second aspect or any possible design of the second aspect, of the embodiments of the present application.
- the computer-readable storage medium may be any available medium that can be accessed by a computer.
- By way of example, computer-readable media may include non-transitory computer-readable media, random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM or other optical disk storage, magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
- An embodiment of the present application provides a computer program product including computer program code or instructions which, when executed, enable the method described in the first aspect or the second aspect to be implemented.
- the computer program product may be a computer program product including a non-transitory computer readable medium.
- FIG. 1 is a schematic diagram of a fully connected neural network provided by this application.
- FIG. 2 is an architecture diagram of a communication system provided by this application.
- FIG. 3 is a block diagram of a scheduling method provided by this application.
- FIG. 4 is a flowchart of a scheduling method provided by this application.
- FIG. 5 is a block diagram of another scheduling method provided by this application.
- FIG. 6 is a flowchart of another scheduling method provided by this application.
- FIG. 7 is a block diagram of another scheduling method provided by this application.
- FIG. 8 is a performance analysis diagram provided by this application.
- FIG. 9 is another performance analysis diagram provided by this application.
- FIG. 10 is another performance analysis diagram provided by this application.
- FIG. 11 is another performance analysis diagram provided by this application.
- FIG. 12 is a schematic structural diagram of a scheduling device provided by this application.
- FIG. 13 is a structural diagram of a scheduling device provided by this application.
- FIG. 14 is a schematic structural diagram of a network device provided by this application.
- the embodiments of the present application provide a scheduling method and device, which are used to decouple the scheduling problem of a deep neural network from the number of user equipments, so as to solve the problem of difficulty in adapting the number of user equipments of the deep reinforcement learning scheduling algorithm.
- The method and device described in this application are based on the same inventive concept; because the principles by which the method and the device solve the problem are similar, the implementations of the device and the method can refer to each other, and repeated descriptions are omitted.
- Reinforcement learning means that the agent (Agent) learns by interacting with the environment.
- The agent takes actions on the environment according to the state fed back by the environment, thereby obtaining a reward and the state at the next moment; the goal is for the agent to accumulate the greatest reward over a period of time.
- The reinforcement signal provided by the environment evaluates the quality of the generated action (usually as a scalar signal). In this way, the agent acquires knowledge in this action-evaluation environment and improves its action plan to adapt to the environment.
- Common reinforcement learning algorithms include Q-learning, policy gradient, actor-critic and so on.
- The commonly used reinforcement learning algorithm is usually deep reinforcement learning (DRL), which combines reinforcement learning with deep learning and uses neural networks to model the policy/value function so as to adapt to larger input/output dimensions.
- A fully connected neural network is also called a multi-layer perceptron (MLP).
- An MLP includes an input layer (left), an output layer (right), and multiple hidden layers (middle). Each layer contains several nodes, called neurons, and the neurons of two adjacent layers are connected in pairs.
- Figure 1 shows an exemplary fully connected neural network.
- Figure 1 shows an example of two hidden layers, where x is the input, y is the output, w is the weight matrix, and b is the bias vector.
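- The forward pass of such a network can be sketched as follows; the layer sizes and parameters are illustrative, each fully connected layer computes y = w·x + b, and the ReLU activation on hidden layers is an assumption (Figure 1 does not fix a particular activation):

```python
def dense(x, w, b):
    # One fully connected layer: every output neuron takes a weighted
    # sum of ALL inputs (pairwise connections) plus a bias.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, vi) for vi in v]

def mlp(x, layers):
    # layers: list of (weight_matrix, bias_vector) pairs; hidden layers
    # use ReLU, and the output layer is left linear.
    for i, (w, b) in enumerate(layers):
        x = dense(x, w, b)
        if i < len(layers) - 1:
            x = relu(x)
    return x
```

With two (w, b) pairs this corresponds to one hidden layer; Figure 1's two-hidden-layer example would use three pairs.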
- MAC layer scheduling mainly solves the problems of time-frequency resource allocation, MCS selection, user pairing, precoding, and so on, and achieves a compromise between system throughput and fairness through MAC scheduling.
- At least one item refers to one item or multiple items, and multiple items refers to two or more items.
- The scheduling method provided by the embodiments of this application can be applied to various communication systems, for example: satellite communication systems, the internet of things (IoT), narrowband internet of things (NB-IoT) systems, the global system for mobile communications (GSM), enhanced data rates for GSM evolution (EDGE), wideband code division multiple access (WCDMA), code division multiple access 2000 (CDMA2000), time division-synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), fifth generation (5G) communication systems such as 5G new radio (NR), and the three major application scenarios of the next-generation 5G mobile communication system: enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (URLLC), and massive machine type communications (mMTC).
- FIG. 2 shows the architecture of a possible communication system to which the scheduling method provided in an embodiment of the present application is applicable.
- The architecture of the communication system includes a network device and terminal devices (such as terminal device 1 and terminal device 2 shown in FIG. 2), where:
- the terminal device is also called a mobile station (mobile station, MS), user equipment (user equipment, UE), or terminal (terminal).
- the terminal device may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to a wireless modem.
- The terminal device may also be a subscriber unit, a cellular phone, a smart phone, a wireless data card, a personal digital assistant (PDA), a tablet computer, a wireless modem, a handheld device (handset), a laptop computer, a machine type communication (MTC) terminal, etc.
- the network device may be a device deployed in a wireless access network to provide a wireless communication function for terminal devices.
- the network device may be a base station (base station, BS).
- the base station may include various forms of macro base stations, micro base stations (also referred to as small stations), relay stations, and access points.
- In different systems, the names of devices with base station functions may differ. For example, in a 3G system the base station is called a NodeB; in an LTE system it is called an evolved NodeB (eNB or eNodeB); and in a 5G new radio (NR) system it is called a gNB.
- For ease of description, the above-mentioned devices that provide wireless communication functions for terminal devices are collectively referred to as network devices in this application.
- the network device may perform MAC layer resource scheduling.
- a communication system is usually composed of cells, and each cell contains a base station, and the base station provides communication services to at least one terminal device.
- the base station includes a baseband unit (BBU) and a remote radio unit (RRU).
- The BBU and the RRU can be placed in different places; for example, the RRU can be remote and placed in a high-traffic area while the BBU is placed in a central equipment room.
- the BBU and RRU can also be placed in the same computer room.
- the BBU and RRU may also be different components under one rack. This application does not limit this.
- The number of network devices and terminal devices in FIG. 2 does not constitute a limitation on the communication system.
- The communication system may include more network devices and fewer or more terminal devices. It should be understood that, in addition to the network device and the terminal devices, the communication system may also include other devices, which are not listed here in this application.
- a DRL algorithm can be used. This type of algorithm uses the interaction between the agent in the DRL and the wireless transmission environment to continuously update its own parameters to obtain better decision-making strategies.
- the agent first obtains the current state of the communication system, and makes a decision based on this state. After the decision is executed, the communication system enters the next state and feeds back rewards.
- the agent adjusts its own decision-making parameters according to the rewards.
- the agent interacts with the environment iteratively, continuously adjusts its parameters to obtain greater rewards, and finally converges to obtain a better scheduling strategy.
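- The interaction loop described above can be sketched generically; `env_step`, `policy`, and `update` are placeholders for the wireless transmission environment, the agent's decision function, and its parameter-update rule:

```python
def run_episode(env_step, init_state, policy, update, steps=100):
    # Observe state -> act -> receive next state and reward -> adjust
    # decision parameters; repeated so the agent accumulates reward.
    state, total_reward = init_state, 0.0
    for _ in range(steps):
        action = policy(state)
        state, reward = env_step(state, action)
        update(state, action, reward)
        total_reward += reward
    return total_reward
```

In the scheduling setting, the state would be the devices' state information sets, the action a scheduling decision, and the reward a throughput/fairness signal.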
- the strategy of the agent is usually implemented by a deep neural network.
- The input neuron scale and output neuron scale of the deep neural network are determined by the state and action spaces of the system, and the state and action spaces of the system are related to the number of terminal devices to be scheduled in the system; therefore, the deep neural network changes with the number of terminal devices to be scheduled.
- Method 1 causes a performance loss when the number of terminal devices does not match, and because the state and action space is very large, the corresponding neural network scale is also large; in this case the neural network is more difficult to train and its convergence is not guaranteed.
- Method 2, which traverses all terminal devices, has poor flexibility and brings greater storage complexity.
- this application proposes a new scheduling method and device that decouples the deep neural network from the number of terminal devices in the scheduling problem, so as to solve the problem of difficulty in adapting the number of terminal devices to the deep reinforcement learning scheduling algorithm.
- the scheduling method provided by this application can be implemented based on DRL.
- The states of the multiple terminal devices (UEs) to be scheduled can be processed first; the state of each individual terminal device then passes through the same neural network to obtain the outputs corresponding to the multiple terminal devices to be scheduled, and the obtained outputs are processed to obtain the scheduling result.
- the processing of the states of the plurality of terminal devices to be scheduled may be performed by a newly added state processing module (unit).
- The name "state processing module" is only an example; other names are possible, and this application does not limit the naming.
- the output corresponding to the multiple terminal devices to be scheduled may be the score of each terminal device to be scheduled, which is not limited in this application.
- FIG. 3 shows a block diagram of a scheduling method.
- the scheduling method provided in the embodiments of the present application can be applied to a network device, and can also be applied to a chip or chipset in a network device.
- the scheduling method provided in the present application will be described in detail below by taking the application to a network device as an example.
- the embodiment of the present application provides a scheduling method, which is suitable for the communication system as shown in FIG. 2.
- the scheduling method in this embodiment can be applied to the actor-critic algorithm (discrete action) in reinforcement learning.
- the specific process of the method may include:
- Step 401 The network device processes the first state information sets of K terminal devices to be scheduled to obtain the second state information sets of the K terminal devices to be scheduled, where K is an integer greater than or equal to 1. The second state information set of any terminal device to be scheduled includes the state information of that terminal device and the state association data between that terminal device and the other terminal devices to be scheduled. The dimension of the second state information set of any one of the K terminal devices to be scheduled is H, where H is an integer greater than or equal to 1.
- the network device may perform step 401 through a first processing module (which may be referred to as a state processing module).
- the state dimension of all terminal devices to be scheduled can be considered as K*F, where F is an integer greater than or equal to 1; after the network device processes the first state information sets of the K terminal devices to be scheduled through the first processing module, the state dimension of all terminal devices to be scheduled becomes K*H, which can also be expressed as K*1*H.
- the state dimension (that is, the dimension of the second state information set) of each terminal device to be scheduled is H, which can be represented by 1*H or 2*(H/2) and so on.
- the first state information set of each terminal device to be scheduled may be understood as a characteristic of each terminal device to be scheduled.
- the above process can refer to the block diagram of the scheduling method shown in FIG. 5, in which the K*F-dimensional UE states (that is, the states of the terminal devices to be scheduled) are passed through the state processing module (the first processing module) to obtain the K*1*H-dimensional states of the K UEs.
- the H is greater than or equal to the F, and the H is set based on the F.
- the H is also the number of state information included in each second state information set.
- the purpose of the above-mentioned first processing module is to explicitly extract the correlation between the characteristics (state information) of the terminal devices to be scheduled as a component of each terminal device's state information set, without losing the terminal device's own state information during processing. That is, the second state information set of a terminal device to be scheduled after processing can be considered to include the terminal device's own state information and the influence of the other terminal devices to be scheduled; in other words, it includes the state information of that terminal device and the state association data between that terminal device and the other terminal devices to be scheduled.
- the other terminal devices to be scheduled here may be some or all of the K terminal devices to be scheduled other than the terminal device in question, which is not limited in this application.
- the operations used in the first processing module may include embedding, inter-user normalization, attention, etc.
- embedding refers to mapping the input user characteristics (that is, the first state information sets of the K terminal devices to be scheduled) to another space through a shared neural network; the dimensionality can be transformed, such as mapping from an F-dimensional space to an F'-dimensional space.
- the F' is the H.
- the main function of the attention mechanism is to extract the correlation between user characteristics (that is, the status information of the terminal equipment to be dispatched).
- the principle is to transform all user characteristics into three matrices Q, M, and V through three shared neural networks, each matrix having dimension K*dm.
- the operation of attention can be expressed as Attention(Q, M, V) = softmax(QM^T/√dm)·V, wherein the dm is the H.
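The attention operation above can be sketched in numpy as follows. The function name, the toy dimensions (K=3 users, F=8 raw features, dm=4), and the random linear maps standing in for the three shared neural networks are illustrative assumptions, not part of the application:

```python
import numpy as np

def scaled_dot_product_attention(q, m, v):
    """Attention over K user feature vectors.

    q, m, v: arrays of shape (K, dm), produced by three shared
    linear maps of the per-user features."""
    dm = q.shape[-1]
    scores = q @ m.T / np.sqrt(dm)            # (K, K) pairwise correlations
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                        # (K, dm) correlated features

# toy example: K=3 users, F=8 raw features, dm=4 after the shared mappings
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))                   # K*F first state information sets
wq, wm, wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = scaled_dot_product_attention(x @ wq, x @ wm, x @ wv)
```

Each row of `out` is one UE's feature vector mixed with correlated information from the other UEs, matching the stated goal of extracting inter-user correlation without losing the per-user state.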
- the first state information set of any terminal device to be scheduled may include at least one of the following state information: the instantaneous estimated throughput of the terminal device, the average throughput of the terminal device, the cache size of the terminal device, the packet waiting time of the terminal device, etc.
- Step 402 The network device separately inputs the second state information set of each terminal device to be scheduled into the first neural network model, determines the scheduled weight of each terminal device to be scheduled, and obtains K scheduled weights;
- the first neural network model is determined based on the H.
- the process in step 402 can be understood as a reasoning process.
- the first neural network model can make a decision by focusing only on the second state information set of a single terminal device to be scheduled, so that the first neural network model is independent of the number K of terminal devices to be scheduled.
- as shown in FIG. 5, the second state information sets of the K terminal devices to be scheduled, that is, K 1*H second state information sets, are input into the first neural network model, and K*1 outputs, that is, K scheduled weights, are obtained.
- the scheduled weight of each terminal device to be scheduled may be the scheduled probability, score, etc. of each terminal device to be scheduled.
- that the first neural network model is determined based on the H can be understood as meaning that the dimension of the input of the first neural network model is related to the H, that is, the number of neurons in the input layer of the first neural network model is related to the H.
- the dimension of the input of the first neural network model is equal to the H, that is, the number of neurons in the input layer of the first neural network model is H.
- the first neural network model is independent of the K.
- the first neural network model may be referred to as a strategy neural network.
- the first neural network model may be a fully connected neural network, wherein the activation function of the hidden layer of the first neural network model may be ReLU, and the activation function of the output layer may be softmax.
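As a minimal sketch of the shared policy network described above: each UE's 1*H second state set passes through the same fully connected network (ReLU hidden layer), and softmax is applied across the K resulting logits to give the scheduled weights. The hidden size, the random weights, and applying softmax across the K outputs (rather than within one output) are assumptions for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def policy_scores(states, w1, b1, w2, b2):
    """Shared policy network: every UE's 1*H state goes through the SAME
    weights, producing one logit per UE; softmax over the K logits
    yields K scheduled weights. states: (K, H)."""
    logits = relu(states @ w1 + b1) @ w2 + b2    # (K, 1), shared weights
    logits = logits.ravel()
    e = np.exp(logits - logits.max())
    return e / e.sum()                            # K nonnegative weights, sum 1

rng = np.random.default_rng(1)
K, H, hidden = 5, 4, 16                           # illustrative dimensions
states = rng.normal(size=(K, H))                  # second state information sets
w1, b1 = rng.normal(size=(H, hidden)), np.zeros(hidden)
w2, b2 = rng.normal(size=(hidden, 1)), np.zeros(1)
weights = policy_scores(states, w1, b1, w2, b2)
```

Because the weights are shared and each UE is processed identically, the same network works unchanged for any K, which is the decoupling the application claims.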
- Step 403 The network device determines a scheduling result according to the K scheduled weights, and the scheduling result indicates the scheduled terminal device.
- the network device may determine the scheduling result according to the K scheduled weights by performing one of the following three operations (actions), which are exemplified as the operations in FIG. 5:
- Operation a1 The network device uses the identifier of the terminal device corresponding to the largest scheduled weight among the K scheduled weights as the scheduling result.
- the operation a1 may be implemented by the operation argmax, and the obtained identification of the terminal device may be the sequence number of the terminal device among the K to-be-scheduled terminal devices.
- Operation a2 The network device uses the identifier of the terminal device corresponding to the smallest scheduled weight among the K scheduled weights as the scheduling result.
- the operation a2 may be implemented by the operation argmin, and the same obtained identification of the terminal device may be the sequence number of the terminal device among the K to-be-scheduled terminal devices.
- Operation a3 The network device processes one of the K scheduled weights into a first value and the remaining K-1 scheduled weights into a second value; the sequence composed of the processed first value and the K-1 second values is the scheduling result, wherein the terminal device corresponding to the first value in the scheduling result is the scheduled terminal device.
- the scheduling result obtained by the operation a3 may be a one-hot code.
- for example, the first value is 0 and the second value is 1, giving a sequence such as 111011...1; or the first value is 1 and the second value is 0, giving a sequence such as 000100...0.
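Operations a1-a3 can be sketched directly; the function name and the `mode` argument are illustrative assumptions, but the argmax/argmin/one-hot behavior follows the description above:

```python
import numpy as np

def scheduling_result(weights, mode="argmax"):
    """Derive the scheduling result from K scheduled weights.

    'argmax'/'argmin' return the sequence number (identifier) of the
    selected UE among the K UEs (operations a1/a2); 'one_hot' returns
    the operation-a3 sequence with the first value (here 1) at the
    scheduled UE and the second value (here 0) elsewhere."""
    if mode == "argmax":
        return int(np.argmax(weights))
    if mode == "argmin":
        return int(np.argmin(weights))
    one_hot = np.zeros(len(weights), dtype=int)
    one_hot[int(np.argmax(weights))] = 1
    return one_hot

w = np.array([0.1, 0.5, 0.2, 0.2])   # example scheduled weights, K=4
```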
- the neural network model training is performed before or during the execution of the above steps.
- the third state information set is input into the second neural network model to obtain the value corresponding to the third state information set (see FIG. 5), and the value is used to update the model parameters of the first neural network model and the second neural network model.
- the second neural network model may be called a value neural network; the hidden layer of the second neural network model may be the same as the hidden layer of the first neural network model, and the activation function of the output layer of the second neural network model may be a linear activation function.
- that the second neural network model is determined based on the H can be understood as meaning that the dimension of the input of the second neural network model is related to the H, that is, the number of neurons in the input layer of the second neural network model is related to the H.
- the dimension of the input of the second neural network model is equal to the H, that is, the number of neurons in the input layer of the second neural network model is H.
- the value corresponding to the third state information set can be understood as a value estimate; when it is used to update the model parameters of the first neural network model and the second neural network model, specifically, the model parameters of the two models may be updated by combining the value with the reward obtained from the environment.
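The application does not specify how the value and the reward are combined. One common actor-critic choice is the temporal-difference advantage, sketched here with a hypothetical discount factor `gamma`; this is an assumption about a typical implementation, not the application's stated method:

```python
def td_advantage(reward, value, next_value, gamma=0.99):
    """TD advantage: how much better the observed transition was than
    the critic's estimate. Both the policy (first) and value (second)
    networks can be updated from this quantity.

    reward:     reward obtained from the environment
    value:      critic's value for the current (third) state set
    next_value: critic's value for the next state set
    gamma:      hypothetical discount factor"""
    td_target = reward + gamma * next_value
    return td_target - value

adv = td_advantage(reward=1.0, value=0.5, next_value=0.0)
```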
- in the above method, the process of inputting a single third state information set into the second neural network model to obtain a value has nothing to do with the number K of terminal devices to be scheduled, and the second neural network model is independent of the K, so that the training process of the neural network model is independent of the K; it is therefore not affected by the number of terminal devices to be scheduled, and the training complexity is reduced.
- the third state information set is obtained by processing the second state information sets of the K terminal devices to be scheduled, for example by the averaging operation on the leftmost side of the dashed box in FIG. 5.
- the third state information set may be obtained by processing part of the second state information set in the second state information sets of the K to-be-scheduled terminal devices.
- the third state information set may also be obtained by processing the second state information sets of the K terminal devices to be scheduled together with the state information sets of terminal devices in other training samples.
- the third state information set may be obtained by processing state information sets of terminal devices in other training samples.
- the data from which the above-mentioned third state information set is obtained can be regarded as one set of data. The above process can be performed multiple times, that is, multiple state information sets similar to the third state information set can be obtained, and the process of inputting a state information set into the second neural network model to obtain a value can be repeatedly executed.
- the number of terminal devices involved in each group of data can be arbitrary, that is, it is decoupled from the number of terminal devices (irrelevant).
- the network device processes the second state information sets of the K to-be-scheduled terminal devices to obtain the third state information set, which may include the following three methods:
- Method b1 The network device takes an average value for each item of state information in the second state information sets of the K to-be-scheduled terminal devices to obtain the third state information set.
- the method b1 can be understood as an average operation (average, avg.) (a kind of dimensionality reduction operation), which is only shown as an average operation in FIG. 5.
- Method b2 The network device selects the maximum value of each item of state information in the second state information sets of the K to-be-scheduled terminal devices to form the third state information set.
- the method b2 can be understood as a maximum dimensionality reduction operation.
- Method b3 The network device selects the minimum value of each item of state information in the second state information sets of the K to-be-scheduled terminal devices to form the third state information set.
- the method b3 can be understood as a minimum dimensionality reduction operation.
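Methods b1-b3 are each a dimensionality reduction from the K*H second state sets to a single 1*H third state set. A minimal numpy sketch (function name and toy values are illustrative):

```python
import numpy as np

def reduce_states(second_states, method="avg"):
    """Collapse K*H second state sets into one 1*H third state set.

    'avg' = method b1 (per-item average), 'max' = method b2,
    'min' = method b3. The result no longer depends on K, which is
    what makes the value network's input K-independent."""
    ops = {"avg": np.mean, "max": np.max, "min": np.min}
    return ops[method](second_states, axis=0)

s = np.array([[1.0, 4.0],
              [3.0, 2.0]])          # K=2 devices, H=2 state items
```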
- for example, the input of the system (that is, the input that needs to be processed by the first processing module) is a 5*4 matrix (hereinafter referred to as the first state matrix), and each row of the first state matrix is all the characteristics of one terminal device to be scheduled (that is, all the state information in the first state information set of that terminal device).
- after the first state matrix enters the first processing module, for example, the inter-user normalization operation is performed: each column of the first state matrix is normalized by the normalization method mentioned above, and the output is the 5*1*4 second state matrix. The embedding and attention mechanisms work in the same way, but the output dimensions become 5*1*F' and 5*1*dm respectively, where F' and dm are preset parameters.
- each terminal device to be scheduled thus has a 1*4 state vector, and the state vector of each terminal device to be scheduled is passed through the first neural network model, a fully connected neural network with 2 hidden layers of 128 neurons each; the activation function of the hidden layers is ReLU, the output activation function of the neural network is softmax, and the final output dimension is 5*1, representing the probability of scheduling each terminal device to be scheduled.
- the second state matrix is averaged along its first dimension, and the resulting 1*4 average state is then passed through the second neural network model to output a value; the hidden layer of the second neural network model is the same as that of the first neural network model, and the activation function of its final output layer is a linear activation function.
- the state information of all terminal devices to be scheduled is processed, and then the processed state information of each terminal device to be scheduled is input into the same neural network model to obtain the result. That is, in the scheduling process, the neural network model is shared by all terminal devices to be scheduled and can be applied to each of them, so that the neural network model is decoupled from the number of terminal devices to be scheduled during scheduling; the neural network model can be applied to scenarios with different numbers of terminal devices to be scheduled, with good adaptability and scalability.
- when the K is equal to 1, that is, when there is only one terminal device to be scheduled, the other terminal devices to be scheduled are the terminal device itself. In this case, the above method can still be used, except that the state information set obtained for the terminal device to be scheduled, that is, its second state information set, is the same as its first state information set. The second state information set of the terminal device to be scheduled is then directly input into the first neural network model, and finally one scheduled weight is obtained. That is to say, the above scheduling method is fully applicable when K is equal to 1. In this way, the foregoing scheduling method is further independent of the number of terminal devices to be scheduled, and the compatibility of scheduling is better.
- of course, when K is equal to 1, the process of processing the state information set of the terminal device to be scheduled can also be omitted, and when the output has only one scheduled weight, the process of further determining the scheduling result from the scheduled weight can be omitted; this application does not restrict this.
- the embodiment of the present application also provides another scheduling method, which is suitable for the communication system shown in FIG. 2.
- the scheduling method in this embodiment can be applied to the reinforcement learning actor-critic algorithm (continuous action).
- the specific process of the method may include:
- Step 601 The network device obtains a fifth state information set based on the fourth state information sets of K terminal devices to be scheduled and the state information set of the system, where K is an integer greater than or equal to 1. The dimension of the fourth state information set of any one of the K terminal devices to be scheduled is L, where L is an integer greater than or equal to 1; the fourth state information set of any terminal device to be scheduled includes the state information of that terminal device and the state association data between that terminal device and the other terminal devices to be scheduled.
- the state information set of the system may include at least one of the following state information: average throughput of the system, system fairness, and system packet loss rate.
- the network device may perform step 601 through a second processing module (which may be referred to as a state processing module).
- the specific method for the network device to obtain a fifth state information set based on the fourth state information sets of the K terminal devices to be scheduled and the state information set of the system may be as follows: the fourth state information sets of the K terminal devices to be scheduled are processed to obtain a sixth state information set, and the sixth state information set and the state information set of the system are combined into the fifth state information set.
- the state dimension of all to-be-scheduled terminal devices formed by the fourth state information sets of the K to-be-scheduled terminal devices is K*L, and then the dimension of the sixth state information set is 1*L;
- the state information set of the system has a dimension of J (which can be understood as 1*J here), where J is an integer greater than or equal to 1, and then the fifth state information set obtained by the final combination may have a dimension of 1*G, where G is an integer greater than or equal to 2.
- G can be equal to J+L, that is, the above set combination can be understood as the addition of two sets.
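The set combination with G = J + L is a simple concatenation; the numeric values below (L=3, J=2) are illustrative placeholders, not values from the application:

```python
import numpy as np

# Sketch of the combination in step 601: the 1*L sixth state set
# (reduced UE states) and the 1*J system state set are concatenated
# into the 1*G fifth state information set, with G = L + J.
sixth_state = np.array([0.2, 0.8, 0.5])        # 1*L reduced UE states, L=3
system_state = np.array([10.0, 0.9])           # 1*J system indicators, J=2
fifth_state = np.concatenate([sixth_state, system_state])   # 1*G, G=5
```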
- the foregoing process may be the flow of the second processing module in the block diagram of the scheduling method shown in FIG. 7, in which the fourth state information set of each terminal device to be scheduled is indicated by the UE state.
- the fourth state information sets of the K terminal devices to be scheduled may be the same as the second state information sets of the K terminal devices to be scheduled involved in the embodiment shown in FIG. 4; in that case the L is equal to the H.
- the fourth state information set of the K terminal devices to be scheduled may also be obtained by the network device processing the first state information set of the K terminal devices to be scheduled, and the specific processing method can be referred to The embedding, inter-user normalization, attention and other operations involved in the embodiment shown in FIG. 4 are not repeated here in this application.
- the network device processes the fourth state information set of the K to-be-scheduled terminal devices to obtain a sixth state information set.
- the specific method may include the following three methods:
- Method c1 The network device takes an average value for each item of state information in the fourth state information set of the K to-be-scheduled terminal devices to obtain the sixth state information set.
- Method c2 The network device selects the maximum value of each item of state information in the fourth state information set of the K to-be-scheduled terminal devices to form the sixth state information set.
- Method c3 The network device selects the minimum value of each item of state information in the fourth state information set of the K to-be-scheduled terminal devices to form the sixth state information set.
- in this case, the obtained sixth state information set is the same as the third state information set obtained by the network device processing the second state information sets of the K terminal devices to be scheduled as involved in the embodiment shown in FIG. 4.
- the foregoing methods c1-c3 are respectively similar to the methods b1-b3 involved in the embodiment shown in FIG. 4.
- Step 602 The network device inputs the fifth state information set into a third neural network model to determine a weight set; the third neural network model is determined based on the L, and the number of weights contained in the weight set is equal to the L.
- the process in step 602 can be understood as a reasoning process.
- the third neural network model can output the corresponding result only by paying attention to one state information set (here, the fifth state information set).
- the realization of the third neural network model has nothing to do with the number K of terminal devices to be dispatched. For example, as shown in FIG. 7, when the dimension of the fifth state information set is 1*G, after the fifth state information set is input into the third neural network model, the dimension of the weight set is obtained as 1*L, that is, the number of weights included in the weight set is L.
- the weight set may be regarded as containing the weight of each item of state information in the fourth state information set of each of the K terminal devices to be scheduled. It can also be understood that the L weights respectively represent the weights of the scores of the L items of state information of each terminal device to be scheduled. Optionally, the L weights may be continuous values.
- that the third neural network model is determined based on the L can be understood as meaning that the dimension of the input of the third neural network model is related to the L, that is, the number of neurons in the input layer of the third neural network model is related to the L.
- the dimension of the input of the third neural network model is equal to the G, that is, the number of neurons in the input layer of the third neural network model is G.
- the third neural network model is independent of the K.
- Step 603 The network device determines the scheduled weight of each terminal device to be scheduled in the K terminal devices to be scheduled based on the weight set, and obtains K scheduled weights.
- the specific method for the network device to determine the scheduled weight of each terminal device to be scheduled based on the weight set may be: the network device performs, based on the weight set, a weighted summation of the value of each item of state information in the fourth state information set of each of the K terminal devices to be scheduled, to obtain the scheduled weight of each terminal device to be scheduled. For example, the above process can be understood as multiplying a matrix of dimension K*L (the fourth state information sets of the K terminal devices to be scheduled) by a matrix of dimension L*1 (the weight set) to obtain a K*1 matrix, that is, the K scheduled weights.
- in FIG. 7, the K scheduled weights are shown as scores.
- the scheduled weight of each terminal device to be scheduled may also be the scheduled probability of each terminal device to be scheduled, and so on.
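The weighted summation of step 603 reduces to one matrix-vector product; the numeric values below (K=2, L=3) are illustrative placeholders:

```python
import numpy as np

# Sketch of step 603: the 1*L weight set output by the third neural
# network model is applied to each row of the K*L fourth state matrix
# by weighted summation, yielding K scheduled weights (scores).
weight_set = np.array([0.5, 0.3, 0.2])             # 1*L, L=3
fourth_states = np.array([[1.0, 0.0, 0.0],         # K=2 UEs, L=3 state items
                          [0.0, 1.0, 1.0]])
scores = fourth_states @ weight_set                # K*1 scheduled weights
```

Because the network only ever outputs the L-dimensional weight set, K enters only through this multiplication, so the model itself stays independent of the number of UEs.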
- the third neural network model may be referred to as a strategy neural network.
- the third neural network model may be a fully connected neural network, wherein the activation function of the hidden layer of the third neural network model may be ReLU and the output of the output layer may be a multi-dimensional Gaussian distribution. For example, the output layer may consist of two parts, corresponding to the output mean and variance, whose activation functions may be tanh and softplus respectively; the final output result of the third neural network model is obtained after sampling.
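The Gaussian output head described above can be sketched as follows; the hidden dimension, random weights, and use of numpy's `log1p(exp(x))` for softplus are illustrative assumptions:

```python
import numpy as np

def gaussian_policy_head(hidden, w_mu, w_sigma, rng):
    """Continuous-action head: two output layers give a per-dimension
    mean (tanh, bounded in (-1, 1)) and standard deviation
    (softplus, strictly positive); the 1*L weight set is then
    sampled from the resulting Gaussian. Weight shapes are
    illustrative assumptions."""
    mu = np.tanh(hidden @ w_mu)                    # mean head
    sigma = np.log1p(np.exp(hidden @ w_sigma))     # softplus head, > 0
    return rng.normal(mu, sigma), mu, sigma        # sampled weight set

rng = np.random.default_rng(2)
hidden_out = rng.normal(size=8)                    # last hidden activation
w_mu, w_sigma = rng.normal(size=(8, 3)), rng.normal(size=(8, 3))
sample, mu, sigma = gaussian_policy_head(hidden_out, w_mu, w_sigma, rng)
```

Sampling rather than taking the mean is what makes the action continuous and stochastic, which the actor-critic (continuous action) algorithm requires for exploration.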
- Step 604 The network device determines a scheduling result according to the K scheduled weights, and the scheduling result indicates the scheduled terminal device.
- the network device may determine the scheduling result according to the K scheduled weights by executing the following three actions:
- Operation d1 The network device uses the identifier of the terminal device corresponding to the largest scheduled weight among the K scheduled weights as the scheduling result.
- Operation d2 The network device uses the identifier of the terminal device corresponding to the smallest scheduled weight among the K scheduled weights as the scheduling result.
- Operation d3 The network device processes one of the K scheduled weights into a first value and the remaining K-1 scheduled weights into a second value; the sequence composed of the processed first value and the K-1 second values is the scheduling result, wherein the terminal device corresponding to the first value in the scheduling result is the scheduled terminal device.
- the neural network model training is performed before or during the execution of the above steps.
- the seventh state information set is input into the fourth neural network model to obtain the value corresponding to the seventh state information set (see FIG. 7), and the value is used to update the model parameters of the third neural network model and the fourth neural network model.
- the fourth neural network model may be called a value neural network; the hidden layer of the fourth neural network model may be the same as the hidden layer of the third neural network model, and the activation function of the output layer of the fourth neural network model may be a linear activation function.
- that the fourth neural network model is determined based on the L can be understood as meaning that the dimension of the input of the fourth neural network model is related to the L, that is, the number of neurons in the input layer of the fourth neural network model is related to the L.
- the dimension of the input of the fourth neural network model is equal to the G, that is, the number of neurons in the input layer of the fourth neural network model is G.
- the value corresponding to the seventh state information set can be understood as a value estimate; when it is used to update the model parameters of the third neural network model and the fourth neural network model, specifically, the model parameters of the two models may be updated by combining the value with the reward obtained from the environment.
- in the above method, the process of inputting a single seventh state information set into the fourth neural network model to obtain a value has nothing to do with the number K of terminal devices to be scheduled, and the fourth neural network model is independent of the K, so that the training process of the neural network model is independent of the K; it is therefore not affected by the number of terminal devices to be scheduled, and the training complexity is reduced.
- the seventh state information set may be the same as the fifth state information set, that is, the obtained fifth information state set may be directly used for training.
- the seventh state information set may be obtained based on processing the state information set of the terminal device and the state information set of the system in the fifth state information set and other training samples.
- the seventh state information set may also be obtained by processing based on the state information sets of the terminal devices in other training samples and the state information set of the system.
- the data from which the above-mentioned seventh state information set is obtained can be regarded as one set of data. The above process can be performed multiple times, that is, multiple state information sets similar to the seventh state information set can be obtained, and the process of inputting a state information set into the fourth neural network model to obtain a value can be repeatedly executed.
- the number of terminal devices involved in each group of data can be arbitrary, that is, it is decoupled from the number of terminal devices (irrelevant).
- for example, the state information set of the system can include the current performance indicators of the system {average throughput, fairness, packet loss rate}, with dimension 1*3. Then the dimension of the first state matrix composed of the fourth state information sets of 5 terminal devices to be scheduled is 5*32, the dimension of the second state matrix obtained by averaging the first state matrix is 1*32, and the system global state (that is, the fifth state information set mentioned above) has a dimension of 1*35.
- the third neural network model has two output layers, corresponding to the output mean and variance; the output layer activation functions are tanh and softplus, and the final output dimension is 2*32, representing the mean and variance of the weights of the state information of the terminal devices to be scheduled. After sampling, a 1*32 weight set is obtained. The obtained weight set is then multiplied by the state sub-vector in the fourth state information set of each terminal device to be scheduled to obtain the score of each terminal device to be scheduled, and the terminal devices are scheduled based on the scores.
- the hidden layer of the fourth neural network model can be 2 layers, the number of hidden neurons can be 512, and the activation function of the final output layer can be a linear activation function.
- the global state of the system with a dimension of 1*35 can be input to the fourth neural network model to obtain a value.
- a state information set obtained based on the state information sets of all terminal devices to be scheduled and the state information set of the system is input into a neural network model to obtain a result. That is, in the scheduling process, the neural network model is shared by all terminal devices to be scheduled and can be applied to each of them, so that the neural network model is decoupled from the number of terminal devices to be scheduled during scheduling; the neural network model can be applied to scenarios with different numbers of terminal devices to be scheduled, with good adaptability and scalability.
- when K is equal to 1, the other terminal devices to be scheduled are the terminal device to be scheduled itself. In this case, the above-mentioned method can still be executed, except that the fourth state information set includes only the state information of that terminal device to be scheduled, and the obtained fifth state information set is related only to the state information of that terminal device to be scheduled and the state information set of the system. After the weight set is finally obtained, only the one scheduled weight of that terminal device to be scheduled is obtained. That is to say, the above scheduling method is fully applicable when K is equal to 1. In this way, the foregoing scheduling method is more independent of the number of terminal devices to be scheduled, and the compatibility of scheduling is better.
- the process of further determining the scheduling result from the scheduled weight can be omitted, which is not limited in this application.
- in the scheduling method shown in FIG. 6, in step 601, the network device may obtain an eighth state information set based on the first state information sets of the K terminal devices to be scheduled (that is, the first state information set involved in the embodiment shown in FIG. 4) and the state information set of the system.
- the method for obtaining the eighth state information set has the same principle as the method for obtaining the fifth state information set, and can refer to each other.
- the dimension of the obtained eighth state information set may be F+J.
- the number of weights in the weight set obtained by inputting the eighth state information set into the third neural network model is the same as F.
- in step 603, the network device performs, based on the weight set, a weighted summation of the value of each item of state information in the state information set of each of the K terminal devices to be scheduled, to obtain the scheduled weight of each terminal device to be scheduled. Finally, step 604 is executed. It should be understood that the principles in the foregoing process are the same as those of the scheduling method shown in FIG. 6.
- the training processes involved in the embodiments shown in FIG. 4 and FIG. 6 are independent of the number of terminal devices to be scheduled, which increases the flexibility of training.
- the above-mentioned training process may be an online training process or an offline training process.
- Exemplarily, the above two training processes can be summarized as the following online (offline) training process:
- Step 1: Initialize the strategy neural network π_θ and the value neural network V_φ, where θ is the coefficient to be trained of the strategy neural network and φ is the coefficient to be trained of the value neural network.
- Step 2: At time t, acquire the state information set s_t of all terminal devices to be scheduled, obtain the action a_t according to the strategy neural network π_θ, and perform scheduling.
- Step 3: Obtain the state information set s_{t+1} of all the terminal devices to be scheduled at the next time (time t+1), and obtain the reward r_t.
- Step 4: Save {s_t, a_t, r_t, s_{t+1}} as a training sample.
- Step 5: Repeat steps 2-4, and update the neural networks after accumulating a batch (batch size) of training samples.
- the update steps are as follows:
- i is the serial number of the training sample in this batch of data
- γ is the discount factor
- V() represents the output of the value neural network, that is, the value.
- the advantage corresponding to the i-th training sample can be expressed as A_i = r_i + γV(s_{i+1}) − V(s_i), and the update gradient g of the strategy neural network parameter θ and the value neural network parameter φ can be expressed as the combination of the strategy term ∇_θ log π_θ(a_i|s_i)·A_i and the value term λ_v·∇_φ(A_i)², where ∇_θ and ∇_φ respectively represent the partial derivatives with respect to θ and φ, and λ_v is a coefficient weighing the proportion of strategy and value in the gradient update.
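The five training steps and the gradient update above can be sketched as the following actor-critic loop. This is a toy sketch under stated assumptions: the environment, the linear-softmax strategy, the linear value function, the batch size, and the learning rate are all placeholders chosen for brevity, not the patent's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, H = 4, 6            # terminals to be scheduled, per-terminal state dimension (toy sizes)
gamma, lam_v = 0.9, 0.5  # discount factor and strategy/value weighing coefficient
lr = 0.05

theta = np.zeros(H)      # linear "strategy" parameters: score_k = s_k . theta
phi = np.zeros(K * H)    # linear value parameters over the flattened state

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step_env(a, s):
    """Toy environment: reward is the chosen terminal's first state feature."""
    return s[a, 0], rng.standard_normal((K, H))

s = rng.standard_normal((K, H))
batch = []
for t in range(200):
    probs = softmax(s @ theta)        # strategy pi_theta over the K terminals
    a = rng.choice(K, p=probs)        # Step 2: act according to pi_theta
    r, s_next = step_env(a, s)        # Step 3: observe reward and next state
    batch.append((s, a, r, s_next))   # Step 4: save {s_t, a_t, r_t, s_{t+1}}
    s = s_next
    if len(batch) == 20:              # Step 5: update after one batch
        g_theta = np.zeros_like(theta)
        g_phi = np.zeros_like(phi)
        for si, ai, ri, sni in batch:
            # Advantage A_i = r_i + gamma * V(s_{i+1}) - V(s_i)
            adv = ri + gamma * (sni.ravel() @ phi) - si.ravel() @ phi
            p = softmax(si @ theta)
            # Gradient of log pi(a|s) for a linear-softmax strategy
            glogp = si[ai] - (p[:, None] * si).sum(axis=0)
            g_theta += glogp * adv
            # Semi-gradient TD term for the value network, weighed by lam_v
            g_phi += 2 * lam_v * adv * si.ravel()
        theta += lr * g_theta / len(batch)
        phi += lr * g_phi / len(batch)
        batch.clear()
print(theta.shape, phi.shape)
```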
- in some scenarios, a network device cannot obtain training samples, or obtains only very few training samples.
- Traditional reinforcement learning scheduling cannot be trained in this scenario.
- network devices in other cells can share their training samples, so that the network device can complete the training.
- the proportional fair (PF) algorithm is used as the baseline, and the neural network is trained using the 5UE configuration.
- the solution provided in this application is used to verify the throughput, fairness, and packet loss rate performance in the 5, 10, 20, and 50 UE cases; the results are shown in Figure 8, Figure 9, Figure 10, and Figure 11 respectively. It can be seen from Figures 8-11 that the strategy trained in the 5UE case can also be applied to 10UE, 20UE, and 50UE, and maintains a stable performance gain. Therefore, the scheduling method provided in the present application can decouple the neural network from the number of user equipments in the scheduling problem, thereby solving the difficulty that deep reinforcement learning scheduling algorithms have in adapting to the number of user equipments.
- the PF algorithm mentioned above can achieve a good compromise between throughput and fairness, so it is widely used.
- the following takes the PF algorithm as an example to introduce the scheduling algorithm based on the deterministic model and formula.
- the PF algorithm can select the scheduled user according to the following formula: i* = argmax_i R_i(t) / T_i(t), where R_i(t) is the estimated throughput of user i at time t, which is determined by factors such as the channel conditions and the user buffer, and T_i(t) is the historical cumulative throughput of user i up to time t.
- R_i(t) / T_i(t) is a metric that balances throughput and fairness: a larger current estimated throughput R_i(t) indicates that the user has better channel conditions and sufficient data in the buffer to be sent, so the metric value is larger; at the same time, a larger cumulative throughput T_i(t) indicates that the user has already sent more data, so its sending opportunities should be reduced and the metric value is smaller.
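The PF rule above can be sketched in a few lines. The exponentially weighted update of the average throughput T_i(t) and its smoothing factor `alpha` are assumptions for illustration; the selection step itself is the argmax of R_i(t)/T_i(t) from the formula.

```python
import numpy as np

def pf_schedule(R, T, eps=1e-9):
    """Select the user maximizing the PF metric R_i(t) / T_i(t)."""
    return int(np.argmax(R / (T + eps)))

def update_avg(T, R, scheduled, alpha=0.01):
    """EWMA update of the average throughput (alpha is an assumed constant)."""
    served = np.zeros_like(R)
    served[scheduled] = R[scheduled]   # only the scheduled user transmits
    return (1 - alpha) * T + alpha * served

rng = np.random.default_rng(0)
R = rng.uniform(1.0, 10.0, size=5)  # instantaneous estimated throughputs R_i(t)
T = np.full(5, 1.0)                 # historical average throughputs T_i(t)
for _ in range(100):
    i = pf_schedule(R, T)
    T = update_avg(T, R, i)
    R = rng.uniform(1.0, 10.0, size=5)  # new channel realization each slot
print(T.shape)
```

Because T_i shrinks for users who are rarely served, their metric grows over time, which is exactly the throughput/fairness compromise the text describes.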
- the embodiments of the present application also provide a scheduling device.
- the scheduling device may be a network device, or a device in a network device (for example, a chip, a chip system, a chipset, or a part of a chip used to execute related method functions), or a device that can be used in combination with a network device.
- the scheduling device can be applied to the communication system shown in FIG. 2 to implement the scheduling method shown in FIG. 4 or FIG. 6.
- the scheduling device may include a module corresponding to the method/operation/step/action performed by the network device in the above method embodiment.
- the module may be a hardware circuit, software, or a hardware circuit combined with software. For example, refer to FIG. 12.
- the scheduling apparatus 1200 may include: a first processing unit 1201, a second processing unit 1202, and a communication unit 1203.
- the first processing unit 1201 and the second processing unit 1202 can be combined into one processing unit, and the combined processing unit can have the functions of the first processing unit 1201 and the second processing unit 1202 at the same time.
- a certain unit may also be referred to as a certain module or others, and the naming is not limited in this application.
- the scheduling apparatus 1200 when used to perform the operation of the network device in the scheduling method described in FIG. 4:
- the first processing unit 1201 is configured to process the first state information sets of the K terminal devices to be scheduled to obtain the second state information sets of the K terminal devices to be scheduled, where K is an integer greater than or equal to 1; the second state information set of any terminal device to be scheduled includes the state information of that terminal device to be scheduled and the state association data between that terminal device to be scheduled and other terminal devices to be scheduled; the dimension of the second state information set of any one of the K terminal devices to be scheduled is H, and H is an integer greater than or equal to 1.
- the second processing unit 1202 is configured to input the second state information set of each terminal device to be scheduled into the first neural network model and determine the scheduled weight of each terminal device to be scheduled to obtain K scheduled weights, where the first neural network model is determined based on the H; and to determine the scheduling result according to the K scheduled weights, the scheduling result indicating the scheduled terminal device. The communication unit 1203 is configured to output the scheduling result.
- the first state information set of any terminal device to be scheduled includes at least one of the following state information: the instantaneous estimated throughput of the terminal device, the average throughput of the terminal device, the buffer size of the terminal device, and the packet waiting time of the terminal device.
- when determining the scheduling result according to the K scheduled weights, the second processing unit 1202 is specifically configured to: use the identifier of the terminal device corresponding to the largest scheduled weight among the K scheduled weights as the scheduling result; or use the identifier of the terminal device corresponding to the smallest scheduled weight among the K scheduled weights as the scheduling result; or process one of the K scheduled weights into a first value and the remaining K-1 scheduled weights into a second value, where the sequence consisting of the processed first value and the K-1 second values is the scheduling result, and the terminal device corresponding to the first value in the scheduling result is the scheduled terminal device.
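The three ways of forming the scheduling result from the K scheduled weights can be illustrated as follows; the weight values are arbitrary example numbers.

```python
import numpy as np

weights = np.array([0.2, 0.7, 0.1])  # K = 3 scheduled weights (example values)

# Option 1: identifier of the terminal with the largest scheduled weight.
result_max = int(np.argmax(weights))

# Option 2: identifier of the terminal with the smallest scheduled weight.
result_min = int(np.argmin(weights))

# Option 3: a sequence in which the first value (1) marks the scheduled
# terminal and the remaining K-1 weights become the second value (0).
one_hot = np.where(np.arange(weights.size) == result_max, 1, 0)

print(result_max, result_min, one_hot.tolist())  # → 1 2 [0, 1, 0]
```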
- the second processing unit 1202 is further configured to input a third state information set into a second neural network model to obtain a value corresponding to the third state information set, where the value is used to update the model parameters of the first neural network model and the second neural network model; the dimension of the third state information set is the H, and the second neural network model is determined based on the H.
- the third state information set is obtained by the second processing unit 1202 processing the second state information sets of the K to-be-scheduled terminal devices.
- when the second processing unit 1202 processes the second state information sets of the K terminal devices to be scheduled to obtain the third state information set, it is specifically configured to: take the average of each item of state information in the second state information sets of the K terminal devices to be scheduled to obtain the third state information set; or select the maximum value of each item of state information in the second state information sets of the K terminal devices to be scheduled to form the third state information set; or select the minimum value of each item of state information in the second state information sets of the K terminal devices to be scheduled to form the third state information set.
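The three pooling options (average, maximum, or minimum over the K second state information sets) amount to column-wise reductions of the K*H state matrix, for example:

```python
import numpy as np

# Second state information sets of K = 4 terminals, H = 3 items each.
S = np.array([[1.0, 5.0, 2.0],
              [3.0, 1.0, 4.0],
              [2.0, 2.0, 6.0],
              [0.0, 4.0, 0.0]])

third_avg = S.mean(axis=0)  # per-item average  -> [1.5, 3.0, 3.0]
third_max = S.max(axis=0)   # per-item maximum  -> [3.0, 5.0, 6.0]
third_min = S.min(axis=0)   # per-item minimum  -> [0.0, 1.0, 0.0]

print(third_avg.tolist(), third_max.tolist(), third_min.tolist())
```

Whichever reduction is used, the result keeps dimension H, which is why the second neural network model's input size can stay fixed regardless of K.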
- that any one of the first neural network model and the second neural network model is determined based on the H includes: the dimension of the input of any one of the first neural network model and the second neural network model is related to the H.
- the scheduling apparatus 1200 when used to perform the operation of the network device in the scheduling transmission method described in FIG. 6:
- the first processing unit 1201 is configured to obtain a fifth state information set based on the fourth state information sets of the K terminal devices to be scheduled and the state information set of the system, where K is an integer greater than or equal to 1; the dimension of the first state information set of any terminal device to be scheduled in the K terminal devices to be scheduled is L, and L is an integer greater than or equal to 1; the fourth state information set of any terminal device to be scheduled includes the state information of that terminal device to be scheduled and the state association data between that terminal device to be scheduled and other terminal devices to be scheduled. The second processing unit 1202 is configured to input the fifth state information set into the third neural network model to determine a weight set, where the third neural network model is determined based on the L and the number of weights contained in the weight set is the same as the L; to determine, based on the weight set, the scheduled weight of each terminal device to be scheduled in the K terminal devices to be scheduled to obtain K scheduled weights; and to determine the scheduling result according to the K scheduled weights.
- the state information set of the system includes at least one of the following state information: average throughput of the system, system fairness, and system packet loss rate.
- when the first processing unit 1201 obtains a fifth state information set based on the fourth state information sets of the K terminal devices to be scheduled and the state information set of the system, it is specifically configured to: process the fourth state information sets of the K terminal devices to be scheduled to obtain a sixth state information set; and combine the sixth state information set and the state information set of the system into the fifth state information set.
- when the first processing unit 1201 processes the fourth state information sets of the K terminal devices to be scheduled to obtain a sixth state information set, it is specifically configured to: take the average of each item of state information in the fourth state information sets of the K terminal devices to be scheduled to obtain the sixth state information set; or select the maximum value of each item of state information in the fourth state information sets of the K terminal devices to be scheduled to form the sixth state information set; or select the minimum value of each item of state information in the fourth state information sets of the K terminal devices to be scheduled to form the sixth state information set.
- the second processing unit 1202 is further configured to input a seventh state information set into a fourth neural network model to obtain a value corresponding to the seventh state information set, where the value is used to update the model parameters of the third neural network model and the fourth neural network model.
- the seventh state information set is the same as the fifth state information set.
- that any one of the third neural network model and the fourth neural network model is determined based on the L may include: the dimension of the input of any one of the third neural network model and the fourth neural network model is related to the L.
- when the second processing unit 1202 determines the scheduled weight of each terminal device to be scheduled in the K terminal devices to be scheduled based on the weight set, it is specifically configured to: perform, based on the weight set, a weighted summation of the value of each item of state information in the fourth state information set of each terminal device to be scheduled in the K terminal devices to be scheduled, to obtain the scheduled weight of each terminal device to be scheduled.
- when determining the scheduling result according to the K scheduled weights, the second processing unit 1202 is specifically configured to: use the identifier of the terminal device corresponding to the largest scheduled weight among the K scheduled weights as the scheduling result; or use the identifier of the terminal device corresponding to the smallest scheduled weight among the K scheduled weights as the scheduling result; or process one of the K scheduled weights into a first value and the remaining K-1 scheduled weights into a second value, where the sequence consisting of the processed first value and the K-1 second values is the scheduling result, and the terminal device corresponding to the first value in the scheduling result is the scheduled terminal device.
- the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
- the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor execute all or part of the steps of the methods described in the various embodiments of the present application.
- the aforementioned storage media include: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.
- the embodiments of the present application also provide a scheduling device.
- the scheduling device may be a network device, or a device in a network device (for example, a chip, a chip system, a chipset, or a part of a chip used to execute related method functions), or a device that can be used in combination with a network device.
- the scheduling device may be a chip system.
- the chip system may be composed of chips, or may include chips and other discrete devices.
- the scheduling device can be applied to the communication system shown in FIG. 2 to implement the scheduling method shown in FIG. 4 or FIG. 6.
- the scheduling apparatus 1300 may include: at least one processor 1302, and optionally, may also include a communication interface 1301 and/or a memory 1303.
- the processor 1302 may be a central processing unit (CPU), a network processor (NP), or a combination of CPU and NP, or the like.
- the processor 1302 may further include a hardware chip.
- the above-mentioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD) or a combination thereof.
- the above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) or any combination thereof.
- the communication interface 1301 may be a transceiver, a circuit, a bus, a module, or other types of communication interfaces for communicating with other devices through a transmission medium.
- the memory 1303 is coupled with the processor 1302, and is used to store necessary programs of the scheduling device 1300 and the like.
- the program may include program code, and the program code includes computer operation instructions.
- the memory 1303 may include RAM, or may also include non-volatile memory, such as at least one disk memory.
- the processor 1302 executes the application program stored in the memory 1303 to implement the function of the scheduling device 1300.
- the coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units or modules, and may be in electrical, mechanical or other forms, and is used for information exchange between devices, units or modules.
- the embodiment of the present application does not limit the specific connection medium between the aforementioned communication interface 1301, the processor 1302, and the memory 1303.
- the communication interface 1301, the processor 1302, and the memory 1303 are connected by a bus 1304.
- the bus 1304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
- the bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used in FIG. 13, but this does not mean that there is only one bus or only one type of bus.
- the scheduling apparatus 1300 when used to perform the operation of the network device in the scheduling method described in FIG. 4:
- the processor 1302 is configured to: process the first state information sets of the K terminal devices to be scheduled to obtain the second state information sets of the K terminal devices to be scheduled, where K is an integer greater than or equal to 1, the second state information set of any terminal device to be scheduled includes the state information of that terminal device to be scheduled and the state association data between that terminal device to be scheduled and other terminal devices to be scheduled, the dimension of the second state information set of any one of the K terminal devices to be scheduled is H, and H is an integer greater than or equal to 1; input the second state information set of each terminal device to be scheduled into the first neural network model and determine the scheduled weight of each terminal device to be scheduled to obtain K scheduled weights, where the first neural network model is determined based on the H; and determine the scheduling result according to the K scheduled weights, the scheduling result indicating the scheduled terminal device.
- the first state information set of any terminal device to be scheduled includes at least one of the following state information: the instantaneous estimated throughput of the terminal device, the average throughput of the terminal device, the buffer size of the terminal device, and the packet waiting time of the terminal device.
- when determining the scheduling result according to the K scheduled weights, the processor 1302 is specifically configured to: use the identifier of the terminal device corresponding to the largest scheduled weight among the K scheduled weights as the scheduling result; or use the identifier of the terminal device corresponding to the smallest scheduled weight among the K scheduled weights as the scheduling result; or process one of the K scheduled weights into a first value and the remaining K-1 scheduled weights into a second value, where the sequence consisting of the processed first value and the K-1 second values is the scheduling result, and the terminal device corresponding to the first value in the scheduling result is the scheduled terminal device.
- the processor 1302 is further configured to input a third state information set into the second neural network model to obtain a value corresponding to the third state information set, where the value is used to update the model parameters of the first neural network model and the second neural network model.
- the third state information set is obtained by the processor 1302 by processing the second state information sets of the K to-be-scheduled terminal devices.
- when the processor 1302 processes the second state information sets of the K terminal devices to be scheduled to obtain the third state information set, it is specifically configured to: take the average of each item of state information in the second state information sets of the K terminal devices to be scheduled to obtain the third state information set; or select the maximum value of each item of state information in the second state information sets of the K terminal devices to be scheduled to form the third state information set; or select the minimum value of each item of state information in the second state information sets of the K terminal devices to be scheduled to form the third state information set.
- that any one of the first neural network model and the second neural network model is determined based on the H includes: the dimension of the input of any one of the first neural network model and the second neural network model is related to the H.
- the scheduling apparatus 1300 when used to perform the operation of the network device in the scheduling transmission method described in FIG. 6:
- the processor 1302 is configured to: obtain a fifth state information set based on the fourth state information sets of the K terminal devices to be scheduled and the state information set of the system, where K is an integer greater than or equal to 1, the dimension of the first state information set of any terminal device to be scheduled in the K terminal devices to be scheduled is L, and L is an integer greater than or equal to 1, and the fourth state information set of any terminal device to be scheduled includes the state information of that terminal device to be scheduled and the state association data between that terminal device to be scheduled and other terminal devices to be scheduled; input the fifth state information set into the third neural network model to determine a weight set, where the third neural network model is determined based on the L and the number of weights contained in the weight set is the same as the L; determine, based on the weight set, the scheduled weight of each terminal device to be scheduled in the K terminal devices to be scheduled to obtain K scheduled weights; and determine a scheduling result according to the K scheduled weights, the scheduling result indicating the scheduled terminal device.
- the state information set of the system includes at least one of the following state information: average throughput of the system, system fairness, and system packet loss rate.
- when the processor 1302 obtains a fifth state information set based on the fourth state information sets of the K terminal devices to be scheduled and the state information set of the system, it is specifically configured to: process the fourth state information sets of the K terminal devices to be scheduled to obtain a sixth state information set; and combine the sixth state information set and the state information set of the system into the fifth state information set.
- when the processor 1302 processes the fourth state information sets of the K terminal devices to be scheduled to obtain a sixth state information set, it is specifically configured to: take the average of each item of state information in the fourth state information sets of the K terminal devices to be scheduled to obtain the sixth state information set; or select the maximum value of each item of state information in the fourth state information sets of the K terminal devices to be scheduled to form the sixth state information set; or select the minimum value of each item of state information in the fourth state information sets of the K terminal devices to be scheduled to form the sixth state information set.
- the processor 1302 is further configured to input a seventh state information set into a fourth neural network model to obtain a value corresponding to the seventh state information set, where the value is used to update the model parameters of the third neural network model and the fourth neural network model; the dimension of the seventh state information set is the same as the dimension of the fifth state information set, and the fourth neural network model is determined based on the L.
- the seventh state information set is the same as the fifth state information set.
- that any one of the third neural network model and the fourth neural network model is determined based on the L may include: the dimension of the input of any one of the third neural network model and the fourth neural network model is related to the L.
- when the processor 1302 determines the scheduled weight of each terminal device to be scheduled in the K terminal devices to be scheduled based on the weight set, it is specifically configured to: perform, based on the weight set, a weighted summation of the value of each item of state information in the fourth state information set of each terminal device to be scheduled in the K terminal devices to be scheduled, to obtain the scheduled weight of each terminal device to be scheduled.
- when the processor 1302 determines the scheduling result according to the K scheduled weights, it is specifically configured to: use the identifier of the terminal device corresponding to the largest scheduled weight among the K scheduled weights as the scheduling result; or use the identifier of the terminal device corresponding to the smallest scheduled weight among the K scheduled weights as the scheduling result; or process one of the K scheduled weights into a first value and the remaining K-1 scheduled weights into a second value, where the sequence consisting of the processed first value and the K-1 second values is the scheduling result, and the terminal device corresponding to the first value in the scheduling result is the scheduled terminal device.
- the information output or received by the communication unit 1203 and the communication interface 1301 may be in the form of baseband signals.
- the communication interface 1301 receives the baseband signal carrying information.
- the output or reception of the communication interface 1301 may be a radio frequency signal.
- the communication interface 1301 receives radio frequency signals that carry information.
- FIG. 14 is a schematic structural diagram of a network device provided by an embodiment of the present application, for example, it may be a schematic structural diagram of a base station.
- the network device can be applied to the communication system shown in FIG. 2 to perform the functions of the network device in the method embodiment described in FIG. 4 or FIG. 6.
- the base station 1400 may include one or more distributed units (DU) 1401 and one or more centralized units (CU) 1402.
- the DU 1401 may include at least one antenna 14011, at least one radio frequency unit 14012, at least one processor 14017, and at least one memory 14014.
- the DU 1401 part is mainly used for the transmission and reception of radio frequency signals, the conversion of radio frequency signals and baseband signals, and part of baseband processing.
- the CU 1402 may include at least one processor 14022 and at least one memory 14021.
- the CU 1402 and the DU 1401 can communicate through interfaces, where the control plane interface can be Fs-C, such as F1-C, and the user plane interface can be Fs-U, such as F1-U.
- the CU 1402 part is mainly used for baseband processing, control of base stations, and so on.
- the DU 1401 and the CU 1402 may be physically set together, or may be physically separated, that is, a distributed base station.
- the CU 1402 is the control center of the base station, which may also be referred to as a processing unit, and is mainly used to complete baseband processing functions.
- the CU 1402 may be used to control the base station to execute the operation process of the network device in the method embodiment described in FIG. 4 or FIG. 6.
- the baseband processing on the CU and the DU can be divided according to the protocol layer of the wireless network.
- the functions of the PDCP layer and the protocol layers above it are set in the CU, and the protocol layers below PDCP, such as the RLC layer and the MAC layer, are set in the DU.
- the CU implements the functions of the RRC and PDCP layers
- the DU implements the functions of the RLC, MAC, and physical (physical, PHY) layers.
- the base station 1400 may include one or more radio frequency units (RU), one or more DUs, and one or more CUs.
- the DU may include at least one processor 14017 and at least one memory 14014
- the RU may include at least one antenna 14011 and at least one radio frequency unit 14012
- the CU may include at least one processor 14022 and at least one memory 14021.
- the CU 1402 can be composed of one or more boards, and multiple boards can jointly support a radio access network of a single access standard (such as a 5G network), or can respectively support radio access networks of different access standards (such as an LTE network, a 5G network, or other networks).
- the memory 14021 and the processor 14022 may serve one or more boards; that is, a memory and a processor may be provided separately on each board, or multiple boards may share the same memory and processor. In addition, necessary circuits may be provided on each board.
- the DU 1401 can be composed of one or more boards, and multiple boards can jointly support a radio access network of a single access standard (such as a 5G network), or can respectively support radio access networks of different access standards (such as an LTE network, a 5G network, or other networks).
- the memory 14014 and the processor 14017 may serve one or more boards; that is, a memory and a processor may be provided separately on each board, or multiple boards may share the same memory and processor. In addition, necessary circuits may be provided on each board.
- the embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored.
- when the computer program is executed by a scheduling device, the scheduling device implements the foregoing scheduling method.
- the computer-readable storage medium may be any available medium that can be accessed by a computer, for example, but not limited to: a non-transitory computer-readable medium, a RAM, a ROM, an EEPROM, a CD-ROM or other optical disk storage, a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer.
- the embodiments of the present application also provide a computer program product, which when executed by a communication device, enables the communication device to implement the foregoing scheduling method.
- the computer program product may be a computer program product including a non-transitory computer readable medium.
- this application provides a scheduling method and device.
- the neural network model can be shared and used by all terminal devices to be scheduled, and can be applied to all terminal devices to be scheduled.
- as a result, the neural network model is decoupled from the number of terminal devices to be scheduled and can be applied to scenarios with different numbers of terminal devices to be scheduled, with good adaptability and scalability.
- this application can be provided as methods, systems, or computer program products. Therefore, this application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
- these computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device.
- the instruction device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
- these computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing.
- the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
Abstract
A scheduling method and apparatus, for solving the problem in the prior art that scheduling algorithms adapt poorly to varying numbers of user equipments. The method comprises: processing first state information sets of K terminal devices to be scheduled to obtain second state information sets of the K terminal devices to be scheduled, wherein the second state information set of any terminal device to be scheduled comprises the state information of that terminal device and state correlation data between that terminal device and the other terminal devices to be scheduled, and the dimension of the second state information set of any terminal device to be scheduled is H; inputting the second state information set of each terminal device to be scheduled into a first neural network model to determine the scheduling weight of each terminal device to be scheduled, and then determining a scheduling result; the first neural network model is determined based on H. During scheduling, the neural network model is decoupled from the number of terminal devices to be scheduled, and can therefore be applied to scenarios with different numbers of terminal devices to be scheduled.
Description
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201911285268.3, filed with the China National Intellectual Property Administration on December 13, 2019 and entitled "Scheduling Method and Apparatus", which is incorporated herein by reference in its entirety.
This application relates to the field of communication technologies, and in particular, to a scheduling method and apparatus.
Radio resource scheduling plays a vital role in cellular networks; in essence, it allocates available resources such as radio spectrum according to the current channel quality and quality of service (QoS) requirements of each user equipment. In cellular networks, media access control (MAC) layer scheduling mainly addresses time-frequency resource allocation, modulation and coding scheme (MCS) selection, user pairing, precoding, and similar problems, and uses scheduling to trade off system throughput against fairness.
At present, to perform scheduling in a dynamically changing radio transmission environment, deep reinforcement learning (DRL) algorithms are usually combined to obtain a better scheduling policy. Such scheduling algorithms are usually implemented with a deep neural network whose input and output neuron scales are determined by the state and action spaces of the system; because those spaces depend on the number of user equipments to be scheduled, the deep neural network changes with that number.
Because the number of user equipments to be scheduled is uncontrollable, the size of a deep neural network matched to that number cannot be guaranteed, which may cause performance loss and poor scheduling flexibility.
Summary
This application provides a scheduling method and apparatus, to decouple the deep neural network used for scheduling from the number of user equipments, thereby solving the difficulty of adapting deep-reinforcement-learning scheduling algorithms to varying numbers of user equipments.
In a first aspect, this application provides a scheduling method, which may be applied to a network device, or to a chip or chipset in a network device. The method includes: processing first state information sets of K terminal devices to be scheduled to obtain second state information sets of the K terminal devices to be scheduled; inputting the second state information set of each terminal device to be scheduled into a first neural network model to determine the scheduling weight of each terminal device, obtaining K scheduling weights; and determining a scheduling result according to the K scheduling weights, the scheduling result indicating the scheduled terminal device; where K is an integer greater than or equal to 1; the second state information set of any terminal device to be scheduled includes the state information of that terminal device and state correlation data between that terminal device and the other terminal devices to be scheduled; the dimension of the second state information set of any of the K terminal devices to be scheduled is H, H being an integer greater than or equal to 1; and the first neural network model is determined based on H.
With the above method, the state information of all terminal devices to be scheduled is processed, and the processed state information of each terminal device is then input into the same neural network model to obtain the result. That is, during scheduling the neural network model is shared by all terminal devices to be scheduled and is applicable to all of them, so that the neural network model is decoupled from the number of terminal devices to be scheduled and can be applied to scenarios with different numbers of terminal devices to be scheduled, with good adaptability and scalability.
In a possible design, the first state information set of any terminal device to be scheduled includes at least one of the following items of state information: the instantaneous estimated throughput of the terminal device, the average throughput of the terminal device, the buffer size of the terminal device, and the packet waiting time of the terminal device.
In a possible design, determining the scheduling result according to the K scheduling weights may specifically include: using the identifier of the terminal device corresponding to the largest of the K scheduling weights as the scheduling result; or using the identifier of the terminal device corresponding to the smallest of the K scheduling weights as the scheduling result; or processing one of the K scheduling weights into a first value and the remaining K-1 scheduling weights into a second value, and using the sequence composed of the first value and the K-1 second values as the scheduling result, where the terminal device corresponding to the first value in the scheduling result is the scheduled terminal device.
With the above three methods, the scheduling result can be determined flexibly, so as to accurately indicate the scheduled terminal device.
In a possible design, a third state information set is input into a second neural network model to obtain a value corresponding to the third state information set, the value being used to update the model parameters of the first neural network model and the second neural network model; where the dimension of the third state information set is H, and the second neural network model is determined based on H.
With the above method, the model parameters of the first and second neural network models can be continuously trained and updated, making the two models, and hence the scheduling, more accurate. Moreover, the training process is not affected by the number of terminal devices to be scheduled, so the updated model parameters of the first and second neural network models are independent of that number, allowing the two models to be applied to a wider range of scenarios.
In a possible design, the third state information set is obtained by processing the second state information sets of the K terminal devices to be scheduled.
In a possible design, processing the second state information sets of the K terminal devices to be scheduled to obtain the third state information set may specifically be: averaging each item of state information across the second state information sets of the K terminal devices to obtain the third state information set; or selecting the maximum of each item of state information across the second state information sets of the K terminal devices to form the third state information set; or selecting the minimum of each item of state information across the second state information sets of the K terminal devices to form the third state information set.
With the above methods, the third state information set can be obtained flexibly, so that the value used to update the model parameters can be obtained accurately from it.
In a possible design, that either of the first and second neural network models is determined based on H may include: the input dimension of either of the first and second neural network models is related to H. In this way either neural network model is applicable to all terminal devices to be scheduled, so that during scheduling the neural network model is decoupled from the number of terminal devices to be scheduled and can be applied to scenarios with different numbers of such devices, with good adaptability and scalability.
In a second aspect, this application provides a scheduling method, which may be applied to a network device, or to a chip or chipset in a network device. The method includes: obtaining a fifth state information set based on the fourth state information sets of K terminal devices to be scheduled and the state information set of the system, and inputting the fifth state information set into a third neural network model to determine a weight set; determining the scheduling weight of each of the K terminal devices to be scheduled based on the weight set, obtaining K scheduling weights; and determining a scheduling result according to the K scheduling weights, the scheduling result indicating the scheduled terminal device; where K is an integer greater than or equal to 1; the dimension of the fourth state information set of any of the K terminal devices to be scheduled is L, L being an integer greater than or equal to 1; the fourth state information set of any terminal device to be scheduled includes the state information of that terminal device and state correlation data between that terminal device and the other terminal devices to be scheduled; the number of weights contained in the weight set equals L; and the third neural network model is determined based on L.
With the above method, a state information set obtained from the state information sets of all terminal devices to be scheduled and the state information set of the system is input into one neural network model to obtain the result. That is, during scheduling the neural network model is shared by all terminal devices to be scheduled and is applicable to all of them, so that the neural network model is decoupled from the number of terminal devices to be scheduled and can be applied to scenarios with different numbers of such devices, with good adaptability and scalability.
In a possible design, the state information set of the system includes at least one of the following items of state information: the average throughput of the system, system fairness, and the system packet loss rate.
In a possible design, obtaining a fifth state information set based on the fourth state information sets of the K terminal devices and the state information set of the system may specifically be: processing the fourth state information sets of the K terminal devices to obtain a sixth state information set, and combining the sixth state information set with the state information set of the system into the fifth state information set.
With the above method, the fifth state information set can be obtained accurately, so that the scheduling process is decoupled from the number of terminal devices to be scheduled, achieving scheduling independent of that number.
In a possible design, processing the fourth state information sets of the K terminal devices to obtain a sixth state information set may specifically be: averaging each item of state information across the fourth state information sets of the K terminal devices to obtain the sixth state information set; or selecting the maximum of each item of state information across the fourth state information sets of the K terminal devices to form the sixth state information set; or selecting the minimum of each item of state information across the fourth state information sets of the K terminal devices to form the sixth state information set.
With the above methods, the sixth state information set can be obtained flexibly, so that the fifth state information set can subsequently be obtained accurately.
In a possible design, a seventh state information set is input into a fourth neural network model to obtain a value corresponding to the seventh state information set, the value being used to update the model parameters of the third neural network model and the fourth neural network model; where the dimension of the seventh state information set is the same as that of the fifth state information set, and the fourth neural network model is determined based on L.
With the above method, the model parameters of the third and fourth neural network models can be continuously trained and updated, making the two models, and hence the scheduling, more accurate. Moreover, the training process is not affected by the number of terminal devices to be scheduled, so the updated model parameters of the third and fourth neural network models are independent of that number, allowing the models to be applied to a wider range of scenarios.
In a possible design, the seventh state information set is the same as the fifth state information set.
In a possible design, that either of the third and fourth neural network models is determined based on L may include: the input dimension of either of the third and fourth neural network models is related to L. In this way either neural network model is applicable to all terminal devices to be scheduled, so that during scheduling the neural network model is decoupled from the number of terminal devices to be scheduled and can be applied to scenarios with different numbers of such devices, with good adaptability and scalability.
In a possible design, determining the scheduling weight of each of the K terminal devices based on the weight set may specifically be: performing a weighted sum of the values of the items of state information in the fourth state information set of each of the K terminal devices using the weight set, to obtain the scheduling weight of each terminal device.
With the above method, the weights in the weight set can represent the weight of each item of state information, so that the scheduling weight of each terminal device can be determined accurately from them, and the scheduling result determined accurately in turn.
In a possible design, determining the scheduling result according to the K scheduling weights may specifically be: using the identifier of the terminal device corresponding to the largest of the K scheduling weights as the scheduling result; or using the identifier of the terminal device corresponding to the smallest of the K scheduling weights as the scheduling result; or processing one of the K scheduling weights into a first value and the remaining K-1 scheduling weights into a second value, and using the sequence composed of the first value and the K-1 second values as the scheduling result, where the terminal device corresponding to the first value is the scheduled terminal device.
With the above three methods, the scheduling result can be determined flexibly, so as to accurately indicate the scheduled terminal device.
In a third aspect, this application provides a scheduling apparatus, which may be a network device or a chip or chipset in a network device. The scheduling apparatus may include a processing unit and may further include a communication unit. When the scheduling apparatus is a network device, the processing unit may be a processor and the communication unit may be a transceiver; the scheduling apparatus may further include a storage unit, which may be a memory; the storage unit stores instructions, and the processing unit executes the instructions stored in the storage unit so that the network device performs the corresponding functions in the first or second aspect. When the scheduling apparatus is a chip or chipset in a network device, the processing unit may be a processor and the communication unit may be an input/output interface, pins, circuits, or the like; the processing unit executes instructions stored in a storage unit so that the network device performs the corresponding functions in the first or second aspect. The storage unit may be a storage module in the chip or chipset (for example, a register or a cache), or a storage unit in the network device located outside the chip or chipset (for example, a read-only memory or a random access memory). The processing unit may further be divided into a first processing unit and a second processing unit; specifically, the first processing unit implements the state-processing functions in the first and second aspects, and the second processing unit implements the scheduling process in the first or second aspect.
In a fourth aspect, a scheduling apparatus is provided, including a processor, a communication interface, and a memory. The communication interface is used to transmit information, and/or messages, and/or data between the scheduling apparatus and other apparatuses. The memory is used to store computer-executable instructions; when the apparatus runs, the processor executes the computer-executable instructions stored in the memory, so that the scheduling apparatus performs the scheduling method described in any design of the first or second aspect.
In a fifth aspect, an embodiment of this application provides a scheduling apparatus including a processor; when the processor executes a computer program or instructions in a memory, the method described in the first or second aspect is performed.
In a sixth aspect, an embodiment of this application provides a scheduling apparatus including a processor and a memory; the memory is used to store a computer program or instructions, and the processor is used to execute the computer program or instructions stored in the memory, so that the communication apparatus performs the corresponding method shown in the first or second aspect.
In a seventh aspect, an embodiment of this application provides a scheduling apparatus including a processor, a memory, and a transceiver; the transceiver is used to receive or send signals; the memory is used to store program code or instructions; and the processor is used to call the program code or instructions from the memory to perform the method described in the first or second aspect.
In an eighth aspect, an embodiment of this application provides a scheduling apparatus including a processor and an interface circuit; the interface circuit is used to receive computer program code or instructions and transmit them to the processor; the processor runs the computer program code or instructions to perform the corresponding method shown in the first or second aspect.
In a ninth aspect, an embodiment of this application provides a communication system, which may include the above-mentioned terminal devices to be scheduled and the scheduling apparatus.
In a tenth aspect, an embodiment of this application provides a computer-readable storage medium storing program instructions; when the program instructions run on a network device, the network device performs the first aspect or any possible design thereof, or the second aspect or any possible design thereof. By way of example, the computer-readable storage medium may be any available medium accessible by a computer, for example, but not limited to: a non-transitory computer-readable medium, a random-access memory (RAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a CD-ROM or other optical disk storage, a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer.
In an eleventh aspect, an embodiment of this application provides a computer program product including computer program code or instructions; when the computer program code or instructions are executed, the method described in the first or second aspect is implemented. By way of example, the computer program product may be a computer program product including a non-transitory computer-readable medium.
In addition, for the technical effects brought by the third to eleventh aspects, refer to the descriptions of the first and second aspects above, which are not repeated here.
FIG. 1 is a schematic diagram of a fully connected neural network provided by this application;
FIG. 2 is an architecture diagram of a communication system provided by this application;
FIG. 3 is a block diagram of a scheduling method provided by this application;
FIG. 4 is a flowchart of a scheduling method provided by this application;
FIG. 5 is a block diagram of another scheduling method provided by this application;
FIG. 6 is a flowchart of another scheduling method provided by this application;
FIG. 7 is a block diagram of another scheduling method provided by this application;
FIG. 8 is a performance analysis diagram provided by this application;
FIG. 9 is another performance analysis diagram provided by this application;
FIG. 10 is another performance analysis diagram provided by this application;
FIG. 11 is another performance analysis diagram provided by this application;
FIG. 12 is a schematic structural diagram of a scheduling apparatus provided by this application;
FIG. 13 is a structural diagram of a scheduling apparatus provided by this application;
FIG. 14 is a schematic structural diagram of a network device provided by this application.
The present application is described in further detail below with reference to the accompanying drawings.
The embodiments of this application provide a scheduling method and apparatus, to decouple the deep neural network used for scheduling from the number of user equipments, thereby solving the difficulty of adapting deep-reinforcement-learning scheduling algorithms to varying numbers of user equipments. The method and the apparatus of this application are based on the same inventive concept; since the principles by which the method and the apparatus solve the problem are similar, the implementations of the apparatus and the method may refer to each other, and repeated parts are not described again.
Some terms used in this application are explained below to facilitate understanding by those skilled in the art.
1) Reinforcement learning is learning in which an agent interacts with an environment: based on the state fed back by the environment, the agent takes an action on the environment, thereby obtaining a reward and the state at the next moment; the goal is for the agent to accumulate the maximum reward over a period of time. In reinforcement learning, a reinforcement signal provided by the environment evaluates how good the produced action is (usually a scalar signal); in this way, the agent gains knowledge in an action-evaluation environment and improves its action policy to adapt to the environment. Common reinforcement learning algorithms include Q-learning, policy gradient, and actor-critic. The commonly used algorithms today combine reinforcement learning with deep learning, using a neural network to model the policy/value function so as to handle larger input/output dimensions; this is called deep reinforcement learning (DRL).
2) A fully connected neural network is also called a multi-layer perceptron (MLP). An MLP contains an input layer (left side), an output layer (right side), and multiple hidden layers (middle); each layer contains several nodes called neurons, and the neurons of two adjacent layers are pairwise connected. For example, FIG. 1 shows an exemplary fully connected neural network with 2 hidden layers, where x is the input, y is the output, w is the weight matrix, and b is the bias vector.
3) MAC layer scheduling: MAC layer scheduling mainly addresses time-frequency resource allocation, MCS selection, user pairing, precoding, and similar problems, and uses MAC scheduling to trade off system throughput against fairness.
4) In the descriptions of this application, terms such as "first" and "second" are used only for the purpose of distinguishing the described objects, and should not be understood as indicating or implying relative importance or order.
5) "At least one item (piece)" means one or more items (pieces), and "multiple items (pieces)" means two or more items (pieces).
6) "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
It should be noted that, as technology continues to develop, the terms in the embodiments of this application may change, but all remain within the protection scope of this application.
To describe the technical solutions of the embodiments of this application more clearly, the scheduling method and apparatus provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
The scheduling method provided by the embodiments of this application can be applied to various communication systems, for example, a satellite communication system, the internet of things (IoT), a narrowband internet of things (NB-IoT) system, the global system for mobile communications (GSM), the enhanced data rate for GSM evolution (EDGE) system, the wideband code division multiple access (WCDMA) system, the code division multiple access 2000 (CDMA2000) system, the time division-synchronization code division multiple access (TD-SCDMA) system, the long term evolution (LTE) system, fifth generation (5G) communication systems such as 5G new radio (NR), the three major application scenarios of the next-generation 5G mobile communication system, namely enhanced mobile broadband (eMBB), ultra-reliable low-latency communications (uRLLC), and massive machine type communications (mMTC), or other communication systems. As long as a communication device in the communication system performs MAC layer scheduling, the scheduling method provided by the embodiments of this application can be used.
By way of example, FIG. 2 shows the architecture of a possible communication system to which the scheduling method provided by the embodiments of this application is applicable. The architecture includes a network device and terminal devices (for example, terminal device 1 and terminal device 2 shown in FIG. 2), where:
The terminal device is also called a mobile station (MS), user equipment (UE), or terminal. For example, the terminal device may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to a wireless modem. The terminal device may also be a subscriber unit, a cellular phone, a smart phone, a wireless data card, a personal digital assistant (PDA) computer, a tablet computer, a wireless modem, a handset, a laptop computer, a machine type communication (MTC) terminal, and so on.
The network device may be a device deployed in a radio access network to provide wireless communication functions for terminal devices. Specifically, the network device may be a base station (BS). For example, the base station may include various forms of macro base stations, micro base stations (also called small cells), relay stations, access points, and so on. In systems using different radio access technologies, the name of a device with base station functions may differ: in an LTE system it is called an evolved NodeB (eNB or eNodeB); in a third generation (3G) system it is called a NodeB; in a 5G system it may be a gNodeB (gNB), a new radio controller (NR controller), and so on. For convenience of description, in all embodiments of this application, the above devices or apparatuses that provide wireless communication functions for terminal devices are collectively referred to as network devices.
In the communication system, the network device (for example, a base station) may perform MAC layer resource scheduling. A communication system is usually composed of cells; each cell contains a base station, and the base station provides communication services to at least one terminal device. The base station contains a baseband unit (BBU) and a remote radio unit (RRU). The BBU and the RRU may be placed in different locations, for example, the RRU may be placed remotely in a high-traffic area while the BBU is placed in a central equipment room; they may also be placed in the same equipment room, or be different components in one rack. This application does not limit this.
It should be noted that the numbers of network devices and terminal devices in FIG. 2 do not limit the communication system; the communication system may contain more network devices, and fewer or more terminal devices. It should be understood that the communication system may also contain other devices besides the network device and the terminal devices, which are not listed here one by one.
It should be understood that the architecture and service scenarios of the communication system described in the embodiments of this application are intended to explain the technical solutions of the embodiments more clearly and do not limit them. Those of ordinary skill in the art will appreciate that, as communication systems evolve and new service scenarios emerge, the technical solutions provided by the embodiments of this application are equally applicable to similar technical problems.
At present, scheduling in a dynamically changing radio transmission environment can be implemented with DRL algorithms. Such algorithms use the interaction between the DRL agent and the radio transmission environment to continuously update the agent's parameters and obtain a better decision policy. The agent first obtains the current state of the communication system and makes a decision based on this state; after the decision is executed, the communication system enters the next state and feeds back a reward, based on which the agent adjusts its decision parameters. By iteratively interacting with the environment, the agent continuously adjusts its parameters to obtain greater rewards, and after convergence a better scheduling policy is obtained.
The agent's policy is usually implemented with a deep neural network whose input and output neuron scales are determined by the state and action spaces of the system; because those spaces depend on the number of terminal devices to be scheduled, the deep neural network changes with that number. To use this deep neural network in a practical system, there are usually two approaches: 1. train one network for the maximum number of terminal devices; 2. train multiple networks for different numbers of users and switch among them.
However, approach 1 causes performance loss when the number of terminal devices does not match; moreover, because the state and action spaces are very large, the corresponding neural network is also very large, making training difficult and convergence not guaranteed. Approach 2, which traverses all numbers of terminal devices, is less flexible and brings large storage complexity.
Based on the above problems, this application proposes a new scheduling method and apparatus that decouple the deep neural network used for scheduling from the number of terminal devices, thereby solving the difficulty of adapting deep-reinforcement-learning scheduling algorithms to varying numbers of terminal devices. Specifically, the scheduling method provided by this application can be implemented based on DRL: the states of multiple terminal devices (UEs) to be scheduled are first processed, then the state of each individual terminal device is passed through the same neural network model to obtain an output for each terminal device to be scheduled, and the outputs are processed to obtain the scheduling result. The processing of the states of the multiple terminal devices to be scheduled may be performed by a newly added state processing module (unit); of course, the name "state processing module" is merely an example, and many other names are possible, which this application does not limit. By way of example, the outputs obtained for the terminal devices to be scheduled may be scores of the respective terminal devices, which this application also does not limit. For example, taking the terminal device being a UE as an example, FIG. 3 shows a block diagram of a scheduling method.
It should be noted that the scheduling method provided by the embodiments of this application can be applied to a network device, or to a chip or chipset in a network device. The scheduling method provided by this application is described in detail below taking application to a network device as an example.
An embodiment of this application provides a scheduling method applicable to the communication system shown in FIG. 2. The scheduling method in this embodiment can be applied to the actor-critic algorithm (discrete actions) in reinforcement learning. Referring to FIG. 4, the specific flow of the method may include:
Step 401: The network device processes the first state information sets of K terminal devices to be scheduled to obtain the second state information sets of the K terminal devices to be scheduled, K being an integer greater than or equal to 1; where the second state information set of any terminal device to be scheduled includes the state information of that terminal device and state correlation data between that terminal device and the other terminal devices to be scheduled; the dimension of the second state information set of any of the K terminal devices to be scheduled is H, H being an integer greater than or equal to 1.
Specifically, the network device may perform step 401 through a first processing module (which may be called a state processing module). For example, when the dimension of the first state information set of each of the K terminal devices to be scheduled is F, the state dimension of all terminal devices to be scheduled can be considered K*F, F being an integer greater than or equal to 1. After the network device processes the first state information sets of the K terminal devices through the first processing module, the state dimension of all terminal devices becomes K*H, which can be expressed here as K*1*H; that is, the state dimension of each terminal device (i.e., the dimension of its second state information set) is H, which can be expressed as 1*H, 2*(H/2), and so on. Optionally, the first state information set of each terminal device to be scheduled can be understood as the features of that terminal device. For example, the above process may refer to the block diagram of the scheduling method shown in FIG. 5, in which UE (i.e., terminal device to be scheduled) features of dimension K*F pass through the state processing module (first processing module) to obtain the states of the K UEs with dimension K*1*H.
In an optional implementation, H is greater than or equal to F, and H is set based on F. H is the number of items of state information contained in each second state information set.
In an exemplary implementation, the first processing module mentioned above explicitly extracts the correlation between the features (state information) of the terminal devices to be scheduled as a component of each device's state information set, without losing the device's own state information during processing; that is, the processed second state information set of a terminal device can be considered to contain the device's own state information and the influence of the other terminal devices on it. Through the above processing, the second state information set of any terminal device to be scheduled contains that device's state information and the state correlation data between it and the other terminal devices to be scheduled. It should be noted that the other terminal devices mentioned here may be some or all of the K terminal devices to be scheduled other than the device in question, which this application does not limit.
By way of example, the operations used in the first processing module may include embedding, inter-user normalization, attention, and so on.
Embedding maps the input user features (i.e., the first state information sets of the K terminal devices to be scheduled) to another space through a shared neural network, and can transform the dimension, for example from an F-dimensional space to an F'-dimensional space, where F' is H.
Inter-user normalization normalizes each dimension of the user features according to the state information of all users (i.e., the K terminal devices to be scheduled) in that dimension, for example x' = (x - x_min)/(x_max - x_min), x' = (x - μ)/σ, x' = x - μ, or x' = x/x_max, where x is the value of a feature (an item of state information), x' is the normalized feature value, x_max is the maximum of that feature over all users (i.e., the maximum of that item of state information over the K terminal devices to be scheduled), x_min is its minimum, μ is its mean, and σ is its standard deviation.
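By way of illustration only (this sketch is not part of the original application), the min-max variant x' = (x - x_min)/(x_max - x_min) can be applied column by column to a K*F feature matrix, so that each item of state information is normalized across all K devices to be scheduled:

```python
# Min-max inter-user normalization of a K x F feature matrix: each column
# (one item of state information) is scaled into [0, 1] across all K devices.
def inter_user_normalize(features):
    """features: K rows, each a list of F feature values."""
    K, F = len(features), len(features[0])
    normalized = [[0.0] * F for _ in range(K)]
    for j in range(F):
        col = [features[i][j] for i in range(K)]
        lo, hi = min(col), max(col)
        span = hi - lo if hi > lo else 1.0  # avoid division by zero
        for i in range(K):
            normalized[i][j] = (features[i][j] - lo) / span
    return normalized

states = inter_user_normalize([[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]])
# each column is normalized independently of the other column
```

Note that the loop body never depends on K except through the column it reads, which is why the operation works unchanged for any number of devices.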
The main function of the attention mechanism is to extract the correlation between user features (i.e., the state information of the terminal devices to be scheduled). Its principle is to transform all user features into three matrices Q, M, and V through three shared neural networks, each of dimension M*d_m; the attention operation can then be expressed as attention(Q, M, V) = softmax(QMᵀ/√d_m)V, where d_m is H.
Based on the above descriptions of these processing operations, it can be seen that they all essentially operate on the state information (features) of each terminal device to be scheduled independently, which is independent of the number K of terminal devices to be scheduled.
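As an illustration (an assumption for this sketch, not code from the application: the standard scaled dot-product form is assumed, with the matrix M playing the role usually taken by the keys), the attention operation mixes every device's features into every other device's output, which is how the cross-device correlation data enters the second state information sets:

```python
# Scaled dot-product attention: attention(Q, M, V) = softmax(Q M^T / sqrt(d_m)) V.
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def attention(Q, M, V):
    d_m = len(Q[0])
    scores = matmul(Q, [list(c) for c in zip(*M)])  # Q M^T, shape K x K
    weights = [softmax([s / math.sqrt(d_m) for s in row]) for row in scores]
    return matmul(weights, V)  # K x d_m: each row mixes all devices' features

out = attention([[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[2.0, 0.0], [0.0, 2.0]])
```

Because the three projections are shared across devices, adding or removing devices only changes the number of rows, not any trained parameter.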
In a specific implementation, the first state information set of any terminal device to be scheduled may include at least one of the following items of state information: the instantaneous estimated throughput of the terminal device, the average throughput of the terminal device, the buffer size of the terminal device, the packet waiting time of the terminal device, and so on.
Step 402: The network device inputs the second state information set of each terminal device to be scheduled into a first neural network model to determine the scheduling weight of each terminal device, obtaining K scheduling weights; the first neural network model is determined based on H.
The process in step 402 can be understood as an inference process. With the above method, the first neural network model only needs to attend to a single terminal device's second state information set to make a decision, so the first neural network model is independent of the number K of terminal devices to be scheduled. For example, in FIG. 5, the second state information sets of the K terminal devices pass through the first neural network model separately; that is, K second state information sets of dimension 1*H are input into the first neural network model to obtain K*1 outputs, i.e., K scheduling weights.
The scheduling weight of each terminal device to be scheduled may be the probability of that device being scheduled, its score, and so on.
In an optional implementation, that the first neural network model is determined based on H can be understood as meaning that the input dimension of the first neural network model is related to H, i.e., the number of neurons in its input layer is related to H. In one implementation of this application, the input dimension of the first neural network model equals H, i.e., its input layer has H neurons.
Specifically, the first neural network model is independent of K.
By way of example, the first neural network model may be called a policy neural network. It may be a fully connected neural network, in which the activation function of the hidden layers may be ReLU and the activation function of the output layer may be softmax.
Step 403: The network device determines a scheduling result according to the K scheduling weights, the scheduling result indicating the scheduled terminal device.
For example, the network device may determine the scheduling result according to the K scheduling weights by performing one of the following three operations (actions), which are the operations illustrated in FIG. 5:
Operation a1: The network device uses the identifier of the terminal device corresponding to the largest of the K scheduling weights as the scheduling result.
Operation a1 may be implemented through the argmax operation, and the obtained identifier of the terminal device may be the device's index among the K terminal devices to be scheduled.
Operation a2: The network device uses the identifier of the terminal device corresponding to the smallest of the K scheduling weights as the scheduling result.
Operation a2 may be implemented through the argmin operation; likewise, the obtained identifier may be the device's index among the K terminal devices to be scheduled.
Operation a3: The network device processes one of the K scheduling weights into a first value and the remaining K-1 scheduling weights into a second value, and uses the sequence composed of the first value and the K-1 second values as the scheduling result, where the terminal device corresponding to the first value is the scheduled terminal device.
For example, the scheduling result obtained by operation a3 may be a one-hot code; for example, the first value may be 0 and the second value 1, giving 111011……1, or the first value may be 1 and the second value 0, giving 000100……0. Of course, many other representations are possible, which are not listed here one by one.
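The three operations above can be sketched as follows (an illustration only; the one-hot variant shown uses first value 1 and second value 0):

```python
# Three ways of turning K scheduling weights into a scheduling result.
def argmax_result(weights):
    return max(range(len(weights)), key=lambda i: weights[i])  # operation a1

def argmin_result(weights):
    return min(range(len(weights)), key=lambda i: weights[i])  # operation a2

def one_hot_result(weights):
    best = argmax_result(weights)  # operation a3: first value 1, others 0
    return [1 if i == best else 0 for i in range(len(weights))]

w = [0.1, 0.6, 0.3]
print(argmax_result(w), argmin_result(w), one_hot_result(w))
# → 1 0 [0, 1, 0]
```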
In an optional implementation, neural network model training is performed before or during the execution of the above steps.
In a specific implementation, a third state information set is input into a second neural network model to obtain a value corresponding to the third state information set (see FIG. 5), the value being used to update the model parameters of the first and second neural network models; where the dimension of the third state information set is H, and the second neural network model is determined based on H.
The second neural network model may be called a value neural network; its hidden layers may be the same as those of the first neural network model, and the activation function of its output layer may be a linear activation function.
By way of example, that the second neural network model is determined based on H can be understood as meaning that its input dimension is related to H, i.e., the number of neurons in its input layer is related to H. In one implementation of this application, the input dimension of the second neural network model equals H, i.e., its input layer has H neurons.
In a specific implementation, the value corresponding to the third state information set can be understood as a value estimate; when the value is used to update the model parameters of the first and second neural network models, specifically the value may be combined with the reward obtained from the environment to update those model parameters.
In the above method, the process of inputting a single third state information set into the second neural network model to obtain a value is independent of the number K of terminal devices to be scheduled, and the second neural network model is likewise independent of K; the training process is therefore also independent of K and unaffected by the number of terminal devices to be scheduled, which reduces training complexity.
In an optional implementation, the third state information set may be obtained in multiple ways. For example, it may be obtained by processing the second state information sets of the K terminal devices to be scheduled, as in the leftmost averaging operation in the dashed box in FIG. 5. As another example, it may be obtained by processing some of the second state information sets of the K terminal devices. As another example, it may be obtained by processing the second state information sets of the K terminal devices together with the state information sets of terminal devices in other training samples. As another example, it may be obtained by processing the state information sets of terminal devices in other training samples.
It should be noted that the data from which the third state information set is obtained can be regarded as one group of data; during training, the above process may be executed multiple times, i.e., multiple state information sets similar to the third state information set are obtained from multiple groups of data, and the process of inputting them into the second neural network model to obtain a value is repeated. The number of terminal devices involved in each group of data can be arbitrary, i.e., decoupled from (independent of) the number of terminal devices.
In a specific implementation, the network device processing the second state information sets of the K terminal devices to obtain the third state information set may include the following three methods:
Method b1: the network device averages each item of state information across the second state information sets of the K terminal devices to obtain the third state information set. Method b1 can be understood as an averaging operation (average, avg.), a dimension-reduction operation; only the averaging operation is shown in FIG. 5.
Method b2: the network device selects the maximum of each item of state information across the second state information sets of the K terminal devices to form the third state information set. Method b2 can be understood as a max dimension-reduction operation.
Method b3: the network device selects the minimum of each item of state information across the second state information sets of the K terminal devices to form the third state information set. Method b3 can be understood as a min dimension-reduction operation.
It should be noted that, besides the above three methods, other dimension-reduction methods can also be used by the network device to process the second state information sets of the K terminal devices to obtain the third state information set, which are not listed here one by one.
The above scheduling method is illustrated below with an example:
Assume the number of terminal devices to be scheduled is K=5 and the dimension of each device's first state information set is F=4, where each first state information set contains the state information {instantaneous estimated throughput, average throughput, buffer size, packet waiting time}. The system input (i.e., the input to be processed by the first processing module) is then a 5*4 matrix (called the first state matrix below), each row of which is all the features of one terminal device to be scheduled (i.e., all the state information in that device's first state information set). In the first processing module, for example with the inter-user normalization operation, each column of the first state matrix is normalized using the normalization methods mentioned above, and the output is a 5*1*4 second state matrix; embedding and attention work similarly, except that the output dimensions become 5*1*F' and 5*1*d_m, with F' and d_m being preset parameters. In the second state matrix, each terminal device has a 1*4 state vector, which is passed through the first neural network model, for example a fully connected network with 2 hidden layers of 128 neurons each, ReLU hidden activation, and softmax output activation; the final output dimension is 5*1, representing the probabilities of scheduling each terminal device. For example, in the training phase, the second state matrix is averaged along the first dimension, and the resulting 1*4 average state is passed through the second neural network model to output a value; the second neural network model has the same hidden layers as the first, and the activation function of its final output layer is linear.
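The key point of the example above, sharing one set of network parameters across every device's 1*H state vector, can be sketched as follows (an illustration only: a toy single-hidden-layer network with hypothetical weight values stands in for the 2-layer, 128-neuron network of the example):

```python
# The SAME parameters score every device's 1 x H second state set, so the
# model size does not depend on K; a softmax over the K scalar scores then
# gives per-device scheduling probabilities.
import math

H = 4
W1 = [[0.1 * (i + j) for j in range(H)] for i in range(8)]  # hidden layer (ReLU)
w2 = [0.05 * i for i in range(8)]                           # output neuron

def score(state):  # one device's 1 x H state vector -> scalar score
    hidden = [max(0.0, sum(w * s for w, s in zip(row, state))) for row in W1]
    return sum(w * h for w, h in zip(w2, hidden))

def schedule_probs(states):  # K x H -> K probabilities (softmax over devices)
    raw = [score(s) for s in states]
    m = max(raw)
    e = [math.exp(r - m) for r in raw]
    return [v / sum(e) for v in e]

probs = schedule_probs([[0.2, 0.1, 0.9, 0.4], [0.5, 0.5, 0.5, 0.5],
                        [0.0, 0.3, 0.1, 0.8], [1.0, 0.0, 0.2, 0.3],
                        [0.4, 0.4, 0.4, 0.1]])  # works unchanged for any K
```

Calling `schedule_probs` with 3 rows instead of 5 uses exactly the same `W1` and `w2`, which is the decoupling from K that the embodiment describes.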
With the scheduling method provided by the embodiments of this application, the state information of all terminal devices to be scheduled is processed, and the processed state information of each terminal device is then input into the same neural network model to obtain the result. That is, during scheduling the neural network model is shared by all terminal devices to be scheduled and is applicable to all of them, so that the neural network model is decoupled from the number of terminal devices to be scheduled and can be applied to scenarios with different numbers of terminal devices to be scheduled, with good adaptability and scalability.
It should be noted that, in the embodiment shown in FIG. 4, when K equals 1, i.e., there is only one terminal device to be scheduled, the other terminal devices to be scheduled are the device itself. In this case the above method can still be used, except that the obtained second state information set of the device is the same as its first state information set; the second state information set is then directly input into the first neural network model, and finally one scheduling weight is obtained. In other words, the above scheduling method is also fully applicable when K equals 1, which makes the method even more independent of the number of terminal devices to be scheduled and the scheduling more compatible.
Of course, in a specific implementation, the processing of the state information set of the terminal device to be scheduled may also be omitted, and when the output is only one scheduling weight, the process of further determining the scheduling result from the scheduling weight may be omitted, which this application does not limit.
Based on the above embodiments, an embodiment of this application further provides another scheduling method applicable to the communication system shown in FIG. 2. The scheduling method in this embodiment can be applied to the reinforcement-learning actor-critic algorithm (continuous actions). Referring to FIG. 6, the specific flow of the method may include:
Step 601: The network device obtains a fifth state information set based on the fourth state information sets of K terminal devices to be scheduled and the state information set of the system, K being an integer greater than or equal to 1; the dimension of the fourth state information set of any of the K terminal devices to be scheduled is L, L being an integer greater than or equal to 1; the fourth state information set of any terminal device to be scheduled includes the state information of that terminal device and state correlation data between that terminal device and the other terminal devices to be scheduled.
The state information set of the system may include at least one of the following items of state information: the average throughput of the system, system fairness, and the system packet loss rate.
Specifically, the network device may perform step 601 through a second processing module (which may be called a state processing module). Specifically, the network device obtaining a fifth state information set based on the fourth state information sets of the K terminal devices and the state information set of the system may be: the network device processes the fourth state information sets of the K terminal devices to obtain a sixth state information set, and combines the sixth state information set with the state information set of the system into the fifth state information set. For example, the state dimension of all terminal devices formed by the K fourth state information sets is K*L, and the obtained sixth state information set has dimension 1*L; assuming the dimension of the system state information set is J (which can be understood as 1*J), J being an integer greater than or equal to 1, the dimension of the finally combined fifth state information set may be 1*G, G being an integer greater than or equal to 2, where G may equal J+L, i.e., the above combination can be understood as the concatenation of the two sets. For example, the above process may be the flow of the second processing module in the block diagram of the scheduling method shown in FIG. 7, in which the fourth state information set of each terminal device to be scheduled is shown as a UE state.
In an optional implementation, the fourth state information sets of the K terminal devices to be scheduled may be the same as the second state information sets of the K terminal devices involved in the embodiment shown in FIG. 4, in which case L equals H. Correspondingly, the fourth state information sets may likewise be obtained by the network device processing the first state information sets of the K terminal devices; for the specific processing methods, refer to the embedding, inter-user normalization, attention, and other operations involved in the embodiment shown in FIG. 4, which are not repeated here.
In a specific implementation, the network device processing the fourth state information sets of the K terminal devices to obtain a sixth state information set may include the following three methods:
Method c1: the network device averages each item of state information across the fourth state information sets of the K terminal devices to obtain the sixth state information set.
Method c2: the network device selects the maximum of each item of state information across the fourth state information sets of the K terminal devices to form the sixth state information set.
Method c3: the network device selects the minimum of each item of state information across the fourth state information sets of the K terminal devices to form the sixth state information set.
Optionally, when the fourth state information sets are the same as the second state information sets in the embodiment shown in FIG. 4, the obtained sixth state information set is the same as the third state information set obtained by the network device processing the second state information sets of the K terminal devices in that embodiment. Correspondingly, methods c1-c3 are respectively similar to methods b1-b3 of FIG. 4 and may refer to each other; they are not described in detail here.
Step 602: The network device inputs the fifth state information set into a third neural network model to determine a weight set; the third neural network model is determined based on L, and the number of weights contained in the weight set is L.
The process in step 602 can be understood as an inference process. With the above method, the third neural network model only needs to attend to one state information set (here the fifth state information set) to output the corresponding result, so the third neural network model is independent of the number K of terminal devices to be scheduled. For example, as shown in FIG. 7, when the dimension of the fifth state information set is 1*G, inputting it into the third neural network model yields a weight set of dimension 1*L, i.e., the weight set contains L weights.
Specifically, the weight set can be regarded as containing a weight for each item of state information in the fourth state information set of each of the K terminal devices to be scheduled. It can also be understood as the L weights respectively representing the weights of the L items of state information of each terminal device in its score; optionally, the L weights may be continuous values.
In an exemplary implementation, that the third neural network model is determined based on L can be understood as meaning that its input dimension is related to L, i.e., the number of neurons in its input layer is related to L.
In one implementation of this application, the input dimension of the third neural network model equals G, i.e., its input layer has G neurons.
Specifically, the third neural network model is independent of K.
Step 603: The network device determines the scheduling weight of each of the K terminal devices to be scheduled based on the weight set, obtaining K scheduling weights.
In one implementation, the network device determining the scheduling weight of each of the K terminal devices based on the weight set may specifically be: the network device performs a weighted sum of the values of the items of state information in the fourth state information set of each of the K terminal devices using the weight set, to obtain the scheduling weight of each terminal device. For example, the above process can be understood as the dot product of a 1*L matrix (the weight set) with a K*L matrix (the fourth state information sets of the K terminal devices) to obtain K*1 scheduling weights, i.e., K scheduling weights. In the related process shown in FIG. 7, the K scheduling weights are shown as scores.
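The weighted sum above can be sketched as follows (an illustration only, with made-up numbers): the 1*L weight set output by the third neural network model is dot-multiplied with each device's 1*L fourth state information set, yielding one scheduling weight (score) per device, for any K.

```python
# Dot product of the 1 x L weight set with each device's 1 x L state set.
def scheduling_weights(weight_set, device_states):
    """weight_set: list of L weights; device_states: K lists of L values."""
    return [sum(w * s for w, s in zip(weight_set, state))
            for state in device_states]

scores = scheduling_weights([0.5, -0.2],
                            [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]])
# scores[1] is the largest, so device 1 would be scheduled under operation d1
```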
Of course, the scheduling weight of each terminal device to be scheduled may also be the probability of that device being scheduled, and so on.
By way of example, the third neural network model may be called a policy neural network. It may be a fully connected neural network, in which the activation function of the hidden layers may be ReLU and the output may be a multi-dimensional Gaussian distribution; for example, there may be two output layers corresponding to the mean and the variance, with activation functions tanh and softplus respectively, and the final output of the third neural network model is obtained after sampling.
Step 604: The network device determines a scheduling result according to the K scheduling weights, the scheduling result indicating the scheduled terminal device.
For example, the network device may determine the scheduling result according to the K scheduling weights by performing one of the following three operations (actions):
Operation d1: the network device uses the identifier of the terminal device corresponding to the largest of the K scheduling weights as the scheduling result.
Operation d2: the network device uses the identifier of the terminal device corresponding to the smallest of the K scheduling weights as the scheduling result.
Operation d3: the network device processes one of the K scheduling weights into a first value and the remaining K-1 scheduling weights into a second value, and uses the sequence composed of the first value and the K-1 second values as the scheduling result, where the terminal device corresponding to the first value is the scheduled terminal device.
It should be noted that operations d1-d3 are the same as operations a1-a3 involved in the embodiment shown in FIG. 4; for specific examples, refer to the related descriptions above, which are not repeated here.
In an optional implementation, neural network model training is performed before or during the execution of the above steps.
In a specific implementation, a seventh state information set is input into a fourth neural network model to obtain a value corresponding to the seventh state information set (see FIG. 7), the value being used to update the model parameters of the third and fourth neural network models; where the dimension of the seventh state information set is the same as that of the fifth state information set, and the fourth neural network model is determined based on L.
The fourth neural network model may be called a value neural network; its hidden layers may be the same as those of the third neural network model, and the activation function of its output layer may be a linear activation function.
By way of example, that the fourth neural network model is determined based on L can be understood as meaning that its input dimension is related to L, i.e., the number of neurons in its input layer is related to L.
In one implementation of this application, the input dimension of the fourth neural network model equals G, i.e., its input layer has G neurons.
In a specific implementation, the value corresponding to the seventh state information set can be understood as a value estimate; when the value is used to update the model parameters of the third and fourth neural network models, specifically the value may be combined with the reward obtained from the environment to update those model parameters.
In the above method, the process of inputting a single seventh state information set into the fourth neural network model to obtain a value is independent of the number K of terminal devices to be scheduled, and the fourth neural network model is likewise independent of K; the training process is therefore also independent of K and unaffected by the number of terminal devices to be scheduled, which reduces training complexity.
In an optional implementation, the seventh state information set may be obtained in multiple ways. For example, it may be the same as the fifth state information set, i.e., the obtained fifth state information set may be used directly for training. As another example, it may be obtained by processing the fifth state information set together with the state information sets of terminal devices and of the system in other training samples. As another example, it may be obtained by processing the state information sets of terminal devices and of the system in other training samples.
It should be noted that the data from which the seventh state information set is obtained can be regarded as one group of data; during training, the above process may be executed multiple times, i.e., multiple state information sets similar to the seventh state information set are obtained from multiple groups of data, and the process of inputting them into the fourth neural network model to obtain a value is repeated. The number of terminal devices involved in each group of data can be arbitrary, i.e., decoupled from (independent of) the number of terminal devices.
The above scheduling method is illustrated below with an example:
Assume the number of terminal devices to be scheduled is K=5, the dimension of each device's fourth state information set is L=32, and the state information set of the system contains the current system performance indicators {average throughput, fairness, packet loss rate}, of dimension 1*3. The first state matrix composed of the 5 fourth state information sets then has dimension 5*32, the second state matrix obtained by averaging the first has dimension 1*32, and the system global state (i.e., the fifth state information set above) has dimension 1*35. The system global state is passed through the third neural network model, for example a fully connected network with 2 hidden layers of 512 neurons each and ReLU hidden activation; the third neural network model has two output layers corresponding to the mean and the variance, with activation functions tanh and softplus respectively. The final output dimension is 2*32, representing the mean and variance of the weights of the state information of the terminal devices to be scheduled, and a 1*32 weight set is obtained after sampling. The obtained weight set is then dot-multiplied with the state sub-vector in each terminal device's fourth state information set to obtain each device's score, and the terminal devices are scheduled according to the scores. In the training phase, the fourth neural network model may have 2 hidden layers of 512 neurons each, and the activation function of its final output layer may be linear. In this example, the system global state of dimension 1*35 can be input into the fourth neural network model to obtain a value.
With the scheduling method provided by the embodiments of this application, a state information set obtained from the state information sets of all terminal devices to be scheduled and the state information set of the system is input into one neural network model to obtain the result. That is, during scheduling the neural network model is shared by all terminal devices to be scheduled and is applicable to all of them, so that the neural network model is decoupled from the number of terminal devices to be scheduled and can be applied to scenarios with different numbers of terminal devices to be scheduled, with good adaptability and scalability.
It should be noted that, in the embodiment shown in FIG. 6, when K equals 1, i.e., there is only one terminal device to be scheduled, the other terminal devices to be scheduled are the device itself. In this case the above method can still be used, except that the fourth state information set contains only the state information of that device, and the obtained fifth state information set is related only to that device's state information and the system's state information set; after the weight set is finally obtained, only one scheduling weight of the device is derived. In other words, the above scheduling method is also fully applicable when K equals 1, which makes the method even more independent of the number of terminal devices to be scheduled and the scheduling more compatible.
Of course, in a specific implementation, when the output is only one scheduling weight, the process of further determining the scheduling result from the scheduling weight may be omitted, which this application does not limit.
In a possible embodiment, in the scheduling method shown in FIG. 6, in step 601 the network device may obtain an eighth state information set based on the first state information sets of the K terminal devices to be scheduled (i.e., the first state information sets involved in the embodiment shown in FIG. 4) and the state information set of the system. The method of obtaining the eighth state information set has the same principle as that of obtaining the fifth state information set, and the two may refer to each other; the dimension of the obtained eighth state information set may be F+J. Further, in step 602, the number of weights in the weight set obtained by inputting the eighth state information set into the third neural network model is then F. Then, in step 603, the network device performs a weighted sum of the values of the items of state information in the first state information set of each of the K terminal devices using the weight set, to obtain the scheduling weight of each terminal device; finally, step 604 is performed. It should be understood that the principles of the above process are the same as those of the scheduling method shown in FIG. 6 and may refer to each other; they are not described in detail here.
It should be noted that the training process involved in the embodiment shown in FIG. 4 and the training process in the embodiment shown in FIG. 6 are both independent of the number of terminal devices to be scheduled, which increases training flexibility. Each of the above training processes may be an online training process or an offline training process. By way of example, the two training processes can be summarized as the following offline (online) training process:
Step 1: Initialize the policy neural network π_θ and the value neural network V_φ, where θ denotes the trainable coefficients of the policy neural network and φ those of the value neural network.
Step 2: Obtain the state information set s_t of all terminal devices to be scheduled at time t, obtain the action a_t according to the policy neural network π_θ, and carry out the scheduling.
Step 3: Obtain the state information set s_{t+1} of all terminal devices to be scheduled at the next moment (time t+1), and obtain the reward r_t.
Step 4: Save {s_t, a_t, r_t, s_{t+1}} as a training sample.
Step 5: Repeat steps 2-4; after a batch of training samples (batch size samples) has been accumulated, update the neural networks as follows: the objective function of the policy neural network is J_θ = Σ_i (R_i - V_φ(s_i)) log π_θ(a_i|s_i), and the loss function of the value neural network is L_φ = Σ_i (R_i - V_φ(s_i))², where the return R = r_t + γV(s_{t+1}), i is the index of the training sample in the batch, γ is the discount factor, and V(·) denotes the output of the value neural network, i.e., the value.
In online training (learning), a change in the number of terminal devices in a cell is a common problem. The neural network of traditional reinforcement-learning scheduling is tied to the dimensions of the parameters of the training samples, so it cannot learn online efficiently; that is, the multiple training samples in step 5 must all have the same number of terminal devices. In the technical solution of this application, because training samples are further decomposed into states independent of the number of terminal devices before being input into the neural network, training samples with various numbers of terminal devices can be used effectively; that is, the multiple training samples {s_t, a_t, r_t, s_{t+1}} saved in step 5 may be samples generated by the network device under different numbers of terminal devices.
In addition, in some extreme scenarios, the network device cannot obtain training samples, or can obtain only very few; traditional reinforcement-learning scheduling cannot be trained in such scenarios. In this case, thanks to the user-independence of the training process in the technical solution of this application, the network devices of other cells can share their training samples so that this network device can complete training.
Based on the above embodiments, with the proportional fair (PF) algorithm as the baseline and the neural network trained with a 5-UE configuration, the solution provided by this application was verified for throughput, fairness, and packet loss rate performance with 5, 10, 20, and 50 UEs, as shown in FIG. 8, FIG. 9, FIG. 10, and FIG. 11, respectively. It can be seen from FIGS. 8-11 that the policy trained with 5 UEs is equally applicable to 10, 20, and 50 UEs and maintains a stable performance gain. This shows that the scheduling method provided by this application can decouple the neural network used for scheduling from the number of user equipments, thereby solving the difficulty of adapting deep-reinforcement-learning scheduling algorithms to varying numbers of user equipments.
The PF algorithm mentioned above achieves a good trade-off between throughput and fairness and is therefore widely used. Taking the PF algorithm as an example, a scheduling algorithm based on a deterministic model and formula is introduced below. The PF algorithm selects the scheduled user according to i* = argmax_i R_i(t)/T_i(t), where R_i(t) is the estimated throughput of user i at time t, determined by factors such as the channel conditions and the user's buffer status, and T_i(t) is the historical cumulative throughput of user i at time t. It can be seen that R_i(t)/T_i(t) is a metric that balances throughput and fairness: the larger the current estimated throughput R_i(t), the better the user's channel conditions and the more data waiting in its buffer to be sent, so the larger the metric; meanwhile, the larger the cumulative throughput T_i(t), the more data the user has already sent, so for fairness its transmission opportunities should be reduced, and the smaller the metric. Scheduling the user with the largest metric thus achieves the trade-off between throughput and fairness.
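The PF rule can be sketched as follows (an illustration with made-up throughput values):

```python
# Proportional fair rule: schedule the user with the largest ratio of
# estimated instantaneous throughput R_i(t) to cumulative throughput T_i(t).
def pf_schedule(estimated, historical):
    metrics = [r / t for r, t in zip(estimated, historical)]
    return max(range(len(metrics)), key=lambda i: metrics[i])

# user 1 has a modest instantaneous rate but little past service, so it wins
print(pf_schedule([10.0, 6.0, 8.0], [5.0, 1.0, 4.0]))  # → 1
```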
However, because of the complexity of communication systems, it is impossible to model them precisely with closed-form models and formulas, so formula-based scheduling algorithms such as the PF algorithm cannot guarantee optimality. The scheduling algorithm provided by this application is therefore widely applicable and highly scalable.
Based on the above embodiments, an embodiment of this application further provides a scheduling apparatus. The scheduling apparatus may be a network device, an apparatus in a network device (for example, a chip, a chip system, a chipset, or the part of a chip used to execute the related method functions), or an apparatus that can be used in combination with a network device. The scheduling apparatus can be applied to the communication system shown in FIG. 2 to implement the scheduling method shown in FIG. 4 or FIG. 6. In one design, the scheduling apparatus may include modules corresponding one-to-one to the methods/operations/steps/actions executed by the network device in the above method embodiments; a module may be a hardware circuit, software, or a hardware circuit combined with software. For example, referring to FIG. 12, the scheduling apparatus 1200 may include a first processing unit 1201, a second processing unit 1202, and a communication unit 1203. The first processing unit 1201 and the second processing unit 1202 may be combined into one processing unit that has the functions of both. It should be noted that in this application a unit may also be called a module or otherwise; this application does not limit the naming.
In one embodiment, when the scheduling apparatus 1200 is used to perform the operations of the network device in the scheduling method of FIG. 4:
The first processing unit 1201 is configured to process the first state information sets of K terminal devices to be scheduled to obtain the second state information sets of the K terminal devices to be scheduled, K being an integer greater than or equal to 1; where the second state information set of any terminal device to be scheduled includes the state information of that terminal device and state correlation data between that terminal device and the other terminal devices to be scheduled, and the dimension of the second state information set of any of the K terminal devices is H, H being an integer greater than or equal to 1. The second processing unit 1202 is configured to input the second state information set of each terminal device into a first neural network model to determine the scheduling weight of each terminal device, obtaining K scheduling weights, the first neural network model being determined based on H, and to determine a scheduling result according to the K scheduling weights, the scheduling result indicating the scheduled terminal device. The communication unit 1203 is configured to output the scheduling result.
By way of example, the first state information set of any terminal device to be scheduled includes at least one of the following items of state information: the instantaneous estimated throughput of the terminal device, the average throughput of the terminal device, the buffer size of the terminal device, and the packet waiting time of the terminal device.
In a specific implementation, when determining the scheduling result according to the K scheduling weights, the second processing unit 1202 is specifically configured to: use the identifier of the terminal device corresponding to the largest of the K scheduling weights as the scheduling result; or use the identifier of the terminal device corresponding to the smallest of the K scheduling weights as the scheduling result; or process one of the K scheduling weights into a first value and the remaining K-1 scheduling weights into a second value, and use the sequence composed of the first value and the K-1 second values as the scheduling result, where the terminal device corresponding to the first value is the scheduled terminal device.
In an optional implementation, the second processing unit 1202 is further configured to input a third state information set into a second neural network model to obtain a value corresponding to the third state information set, the value being used to update the model parameters of the first and second neural network models; where the dimension of the third state information set is H, and the second neural network model is determined based on H.
By way of example, the third state information set is obtained by the second processing unit 1202 processing the second state information sets of the K terminal devices to be scheduled.
Specifically, when processing the second state information sets of the K terminal devices to obtain the third state information set, the second processing unit 1202 is specifically configured to: average each item of state information across the second state information sets of the K terminal devices to obtain the third state information set; or select the maximum of each item of state information across the second state information sets of the K terminal devices to form the third state information set; or select the minimum of each item of state information across the second state information sets of the K terminal devices to form the third state information set.
In a possible implementation, that either of the first and second neural network models is determined based on H includes: the input dimension of either of the first and second neural network models is related to H.
在另一个实施例中,当所述调度装置1200用于执行上述图6所述的调度传输方法中网络设备的操作时:
所述第一处理单元1201,用于基于K个待调度终端设备的第四状态信息集合和系统的状态信息集合得到一个第五状态信息集合,K为大于或者等于1的整数;所述K个待调度终端设备中任一个待调度终端设备的第一状态信息集合的维度为L,L为大于或者等于1的整数;任一个待调度终端设备的第四状态信息集合包含所述任一个待调度终端设备的状态信息以及所述任一个待调度终端设备与其他待调度终端设备之间的状态关联数据;所述第二处理单元1202,用于将所述第五状态信息集合输入第三神经网络模型,确定一个权重集合;所述第三神经网络模型基于所述H确定;所述权重集合包含的权重个数与所述H相同;基于所述权重集合确定所述K个待调度终端设备中每个待调度终端设备的被调度权重,得到K个被调度权重;以及根据所述K个被调度权重确定调度结果,所述调度结果指示被调度的终端设备;所述通信单元1203,用于输出所述调度结果。
示例性的,所述系统的状态信息集合包括以下至少一项状态信息:系统的平均吞吐量、系统公平性、系统丢包率。
在一种具体的实施方式中,所述第一处理单元1201在基于K个待调度终端设备的第四状态信息集合和系统的状态信息集合得到一个第五状态信息集合时,具体用于:对所述K个待调度终端设备的第四状态信息集合进行处理,得到一个第六状态信息集合;将所述第六状态信息集合与所述系统的状态信息集合组合成所述第五状态信息集合。
在一种可选的实施方式中,所述第一处理单元1201在对所述K个待调度终端设备的第四状态信息集合进行处理,得到一个第六状态信息集合时,具体用于:针对所述K个待调度终端设备的第四状态信息集合中的每一项状态信息取平均值,得到所述第六状态信息集合;或者,选取所述K个待调度终端设备的第四状态信息集合中的每一项状态信息的最大值,组成所述第六状态信息集合;或者,选取所述K个待调度终端设备的第四状态信息集合中的每一项状态信息的最小值,组成所述第六状态信息集合。
示例性的,所述第二处理单元1202还用于:将第七状态信息集合输入第四神经网络模型,得到所述第七状态信息集合对应的值,所述值用于更新所述第三神经网络模型和所述第四神经网络模型的模型参数;其中,所述第七状态信息集合的维度与所述第五状态信息集合的维度相同,所述第四神经网络模型基于所述L确定。
可选的,所述第七状态信息集合与所述第五状态信息集合相同。
在一种可能的实现方式中,所述第三神经网络模型和所述第四神经网络模型中的任一个神经网络模型基于所述L确定,可以包括:所述第三神经网络模型和所述第四神经网络模型中的任一个神经网络模型的输入的维度与所述L相关。
在一种可选的实施方式中,所述第二处理单元1202在基于所述权重集合确定所述K个待调度终端设备中每个待调度终端设备的被调度权重时,具体用于:基于所述权重集合分别对所述K个待调度终端设备中每个待调度终端设备的第四状态信息集合中的每个状态信息的值进行加权求和,得到所述每个待调度终端设备的被调度权重。
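上述"用权重集合对每个终端设备的第四状态信息集合加权求和"的操作等价于一次矩阵-向量乘法,可以用如下示意代码表达(函数名`scheduled_weights`为假设):

```python
import numpy as np

def scheduled_weights(fourth_sets, weight_set):
    """用权重集合对每个终端设备L维第四状态信息集合加权求和,得到K个被调度权重"""
    s = np.asarray(fourth_sets, dtype=float)   # 形状 (K, L)
    w = np.asarray(weight_set, dtype=float)    # 长度L,与第四状态信息集合的维度相同
    return (s @ w).tolist()                    # 每个设备: sum_i w[i] * state[i]

fourth = [[1.0, 2.0], [3.0, 4.0]]   # K=2, L=2
w = [0.5, 0.5]                      # 第三神经网络模型输出的权重集合(假设值)
assert scheduled_weights(fourth, w) == [1.5, 3.5]
```

由于权重集合的长度只与L相关,该加权求和对任意K都适用,之后即可按前述方式由K个被调度权重确定调度结果。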
示例性的,所述第二处理单元1202在根据所述K个被调度权重确定调度结果时,具体用于:将所述K个被调度权重中最大的被调度权重对应的终端设备的标识作为所述调度结果;或者,将所述K个被调度权重中最小的被调度权重对应的终端设备的标识作为所述调度结果;或者,将所述K个被调度权重中的一个被调度权重处理成第一值,将剩余K-1个被调度权重处理成第二值,将处理后的第一值和K-1个第二值组成的序列为所述调度结果,其中,所述调度结果中第一值对应的终端设备为被调度的终端设备。
需要说明的是,本申请实施例中对单元的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。在本申请的实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(read-only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
基于以上实施例,本申请实施例还提供了一种调度装置,该调度装置可以是网络设备,也可以是网络设备中的装置(例如,芯片或者芯片系统或芯片组或芯片中用于执行相关方法功能的一部分),或者是能够和网络设备匹配使用的装置。其中,该调度装置可以为芯片系统。本申请实施例中,芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。该调度装置可以应用于如图2所示的通信系统,用于实现如图4或图6所示的调度方法。例如,参阅图13所示,所述调度装置1300可以包括:至少一个处理器1302,可选的,还可以包括通信接口1301和/或存储器1303。
其中,所述处理器1302可以是中央处理器(central processing unit,CPU),网络处理器(network processor,NP)或者CPU和NP的组合等等。所述处理器1302还可以进一步包括硬件芯片。上述硬件芯片可以是专用集成电路(application-specific integrated circuit,ASIC),可编程逻辑器件(programmable logic device,PLD)或其组合。上述PLD可以是复杂可编程逻辑器件(complex programmable logic device,CPLD),现场可编程逻辑门阵列(field-programmable gate array,FPGA),通用阵列逻辑(generic array logic,GAL)或其任意组合。所述处理器1302在实现上述功能时,可以通过硬件实现,当然也可以通过硬件执行相应的软件实现。
在本申请实施例中,通信接口1301可以是收发器、电路、总线、模块或其它类型的通信接口,用于通过传输介质和其它设备进行通信。所述存储器1303,与所述处理器1302耦合,用于存放所述调度装置1300必要的程序等。例如,程序可以包括程序代码,该程序代码包括计算机操作指令。
所述存储器1303可能包括RAM,也可能还包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。所述处理器1302执行所述存储器1303所存放的应用程序,实现所述调度装置1300的功能。本申请实施例中的耦合是装置、单元或模块之间的间接耦合或通信连接,可以是电性,机械或其它的形式,用于装置、单元或模块之间的信息交互。
本申请实施例中不限定上述通信接口1301、处理器1302以及存储器1303之间的具体连接介质。本申请实施例在图13中以通信接口1301、处理器1302以及存储器1303之间通过总线1304连接为例,所述总线1304可以是外设部件互连标准(Peripheral Component Interconnect,PCI)总线或扩展工业标准结构(Extended Industry Standard Architecture,EISA)总线等。总线可以分为地址总线、数据总线、控制总线等。为便于表示,图13中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。
在一个实施例中,当所述调度装置1300用于执行上述图4所述的调度方法中网络设备的操作时:
所述处理器1302,用于对K个待调度终端设备的第一状态信息集合进行处理,得到K个待调度终端设备的第二状态信息集合,K为大于或者等于1的整数;其中,任一个待调度终端设备的第二状态信息集合包含所述任一个待调度终端设备的状态信息以及所述任一个待调度终端设备与其他待调度终端设备之间的状态关联数据;所述K个待调度终端设备中任一个待调度终端设备的第二状态信息集合的维度为H,H为大于或者等于1的整数;分别将每个待调度终端设备的第二状态信息集合输入第一神经网络模型,确定所述每个待调度终端设备的被调度权重,得到K个被调度权重;所述第一神经网络模型基于所述H确定;以及根据所述K个被调度权重确定调度结果,所述调度结果指示被调度的终端设备。
示例性的,任一个待调度终端设备的第一状态信息集合包括以下至少一项状态信息:终端设备的瞬时估计吞吐量,终端设备的平均吞吐量,终端设备的缓存大小,终端设备的包等待时间。
在一种具体的实施方式中,所述处理器1302,在根据所述K个被调度权重确定调度结果时,具体用于:将所述K个被调度权重中最大的被调度权重对应的终端设备的标识作为所述调度结果;或者,将所述K个被调度权重中最小的被调度权重对应的终端设备的标识作为所述调度结果;或者,将所述K个被调度权重中的一个被调度权重处理成第一值,将剩余K-1个被调度权重处理成第二值,将处理后的第一值和K-1个第二值组成的序列为所述调度结果,其中,所述调度结果中第一值对应的终端设备为被调度的终端设备。
在一种可选的实施方式中,所述处理器1302还用于将第三状态信息集合输入第二神经网络模型,得到所述第三状态信息集合对应的值,所述值用于更新所述第一神经网络模型和所述第二神经网络模型的模型参数;其中,所述第三状态信息集合的维度为所述H,所述第二神经网络模型基于所述H确定。
示例性的,所述第三状态信息集合是所述处理器1302对所述K个待调度终端设备的第二状态信息集合进行处理得到的。
具体的,所述处理器1302在对所述K个待调度终端设备的第二状态信息集合进行处理得到所述第三状态信息集合时,具体用于:针对所述K个待调度终端设备的第二状态信息集合中的每一项状态信息取平均值,得到所述第三状态信息集合;或者,选取所述K个待调度终端设备的第二状态信息集合中的每一项状态信息的最大值,组成所述第三状态信息集合;或者,选取所述K个待调度终端设备的第二状态信息集合中的每一项状态信息的最小值,组成所述第三状态信息集合。
一种可能的实现方式中,所述第一神经网络模型和所述第二神经网络模型中的任一个神经网络模型基于所述H确定,包括:所述第一神经网络模型和所述第二神经网络模型中的任一个神经网络模型的输入的维度与所述H相关。
在另一个实施例中,当所述调度装置1300用于执行上述图6所述的调度传输方法中网络设备的操作时:
所述处理器1302,用于基于K个待调度终端设备的第四状态信息集合和系统的状态信息集合得到一个第五状态信息集合,K为大于或者等于1的整数;所述K个待调度终端设备中任一个待调度终端设备的第四状态信息集合的维度为L,L为大于或者等于1的整数;任一个待调度终端设备的第四状态信息集合包含所述任一个待调度终端设备的状态信息以及所述任一个待调度终端设备与其他待调度终端设备之间的状态关联数据;将所述第五状态信息集合输入第三神经网络模型,确定一个权重集合;所述第三神经网络模型基于所述L确定;所述权重集合包含的权重个数与所述L相同;基于所述权重集合确定所述K个待调度终端设备中每个待调度终端设备的被调度权重,得到K个被调度权重;以及根据所述K个被调度权重确定调度结果,所述调度结果指示被调度的终端设备。
示例性的,所述系统的状态信息集合包括以下至少一项状态信息:系统的平均吞吐量、系统公平性、系统丢包率。
在一种具体的实施方式中,所述处理器1302在基于K个待调度终端设备的第四状态信息集合和系统的状态信息集合得到一个第五状态信息集合时,具体用于:对所述K个待调度终端设备的第四状态信息集合进行处理,得到一个第六状态信息集合;将所述第六状态信息集合与所述系统的状态信息集合组合成所述第五状态信息集合。
在一种可选的实施方式中,所述处理器1302在对所述K个待调度终端设备的第四状态信息集合进行处理,得到一个第六状态信息集合时,具体用于:针对所述K个待调度终端设备的第四状态信息集合中的每一项状态信息取平均值,得到所述第六状态信息集合;或者,选取所述K个待调度终端设备的第四状态信息集合中的每一项状态信息的最大值,组成所述第六状态信息集合;或者,选取所述K个待调度终端设备的第四状态信息集合中的每一项状态信息的最小值,组成所述第六状态信息集合。
示例性的,所述处理器1302还用于:将第七状态信息集合输入第四神经网络模型,得到所述第七状态信息集合对应的值,所述值用于更新所述第三神经网络模型和所述第四神经网络模型的模型参数;其中,所述第七状态信息集合的维度与所述第五状态信息集合的维度相同,所述第四神经网络模型基于所述L确定。
可选的,所述第七状态信息集合与所述第五状态信息集合相同。
在一种可能的实现方式中,所述第三神经网络模型和所述第四神经网络模型中的任一个神经网络模型基于所述L确定,可以包括:所述第三神经网络模型和所述第四神经网络模型中的任一个神经网络模型的输入的维度与所述L相关。
在一种可选的实施方式中,所述处理器1302在基于所述权重集合确定所述K个待调度终端设备中每个待调度终端设备的被调度权重时,具体用于:基于所述权重集合分别对所述K个待调度终端设备中每个待调度终端设备的第四状态信息集合中的每个状态信息的值进行加权求和,得到所述每个待调度终端设备的被调度权重。
示例性的,所述处理器1302在根据所述K个被调度权重确定调度结果时,具体用于:将所述K个被调度权重中最大的被调度权重对应的终端设备的标识作为所述调度结果;或者,将所述K个被调度权重中最小的被调度权重对应的终端设备的标识作为所述调度结果;或者,将所述K个被调度权重中的一个被调度权重处理成第一值,将剩余K-1个被调度权重处理成第二值,将处理后的第一值和K-1个第二值组成的序列为所述调度结果,其中,所述调度结果中第一值对应的终端设备为被调度的终端设备。
在一种实施例中,调度装置1200和调度装置1300具体是芯片或者芯片系统时,通信单元1203和通信接口1301所输出或接收的可以是基带信号形式的信息。例如,调度装置1200和调度装置1300在实现网络设备的功能时,通信接口1301接收到的是承载信息的基带信号。
在一种实施例中,调度装置1200和调度装置1300具体是设备时,通信接口1301所输出或接收的可以是射频信号。例如,调度装置1200和调度装置1300在实现网络设备的功能时,通信接口1301接收到的是承载信息的射频信号。
图14是本申请实施例提供的一种网络设备的结构示意图,如可以为基站的结构示意图。如图14所示,该网络设备可应用于如图2所示的通信系统中,执行上述图4或图6所述方法实施例中网络设备的功能。基站1400可包括一个或多个分布单元(distributed unit,DU)1401和一个或多个集中单元(centralized unit,CU)1402。所述DU 1401可以包括至少一个天线14011,至少一个射频单元14012,至少一个处理器14017和至少一个存储器14014。所述DU 1401部分主要用于射频信号的收发以及射频信号与基带信号的转换,以及部分基带处理。CU 1402可以包括至少一个处理器14022和至少一个存储器14021。CU 1402和DU 1401之间可以通过接口进行通信,其中,控制面(control plane)接口可以为Fs-C,比如F1-C,用户面(user plane)接口可以为Fs-U,比如F1-U。
所述CU 1402部分主要用于进行基带处理,对基站进行控制等。所述DU 1401与CU 1402可以是物理上设置在一起,也可以物理上分离设置的,即分布式基站。所述CU 1402为基站的控制中心,也可以称为处理单元,主要用于完成基带处理功能。例如所述CU 1402可以用于控制基站执行上述图4或图6所述方法实施例中关于网络设备的操作流程。
具体的,CU和DU上的基带处理可以根据无线网络的协议层划分,例如PDCP层及以上协议层的功能设置在CU,PDCP以下的协议层,例如RLC层和MAC层等的功能设置在DU。又例如,CU实现RRC,PDCP层的功能,DU实现RLC、MAC和物理(physical,PHY)层的功能。
此外,可选的,基站1400可以包括一个或多个射频单元(RU),一个或多个DU和一个或多个CU。其中,DU可以包括至少一个处理器14017和至少一个存储器14014,RU可以包括至少一个天线14011和至少一个射频单元14012,CU可以包括至少一个处理器14022和至少一个存储器14021。
在一个实例中,所述CU1402可以由一个或多个单板构成,多个单板可以共同支持单一接入制式的无线接入网(如5G网),也可以分别支持不同接入制式的无线接入网(如LTE网,5G网或其他网)。所述存储器14021和处理器14022可以服务于一个或多个单板。也就是说,可以每个单板上单独设置存储器和处理器,也可以是多个单板共用相同的存储器和处理器。此外每个单板上还可以设置有必要的电路。所述DU1401可以由一个或多个单板构成,多个单板可以共同支持单一接入制式的无线接入网(如5G网),也可以分别支持不同接入制式的无线接入网(如LTE网,5G网或其他网)。所述存储器14014和处理器14017可以服务于一个或多个单板。也就是说,可以每个单板上单独设置存储器和处理器,也可以是多个单板共用相同的存储器和处理器。此外每个单板上还可以设置有必要的电路。
本申请实施例还提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被调度装置执行时,使得该调度装置实现上述调度方法。示例性的,计算机可读存储介质可以是计算机能够存取的任何可用介质。以此为例但不限于:计算机可读介质可以包括非瞬态计算机可读介质、RAM、ROM、EEPROM、CD-ROM或其他光盘存储、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质。
本申请实施例还提供了一种计算机程序产品,该计算机程序产品被通信装置执行时,使得该通信装置实现上述调度方法。示例性的,计算机程序产品可以是包括非暂时性计算机可读介质的计算机程序产品。
综上所述,本申请提供了一种调度方法及装置,在调度过程中可以使神经网络模型被所有待调度终端设备共享使用,该神经网络模型可以适用所有的待调度终端设备,从而可以达到调度时神经网络模型与待调度终端设备的数量解耦,可以将该神经网络模型应用到待调度终端设备数量不同的场景中,具有较好的自适应性和可扩展性。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请实施例进行各种改动和变型而不脱离本申请实施例的范围。这样,倘若本申请实施例的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。
Claims (34)
- 一种调度方法,其特征在于,包括:对K个待调度终端设备的第一状态信息集合进行处理,得到K个待调度终端设备的第二状态信息集合,K为大于或者等于1的整数;其中,任一个待调度终端设备的第二状态信息集合包含所述任一个待调度终端设备的状态信息以及所述任一个待调度终端设备与其他待调度终端设备之间的状态关联数据;所述K个待调度终端设备中任一个待调度终端设备的第二状态信息集合的维度为H,H为大于或者等于1的整数;分别将每个待调度终端设备的第二状态信息集合输入第一神经网络模型,确定所述每个待调度终端设备的被调度权重,得到K个被调度权重;所述第一神经网络模型基于所述H确定;根据所述K个被调度权重确定调度结果,所述调度结果指示被调度的终端设备。
- 如权利要求1所述的方法,其特征在于,任一个待调度终端设备的第一状态信息集合包括以下至少一项状态信息:终端设备的瞬时估计吞吐量,终端设备的平均吞吐量,终端设备的缓存大小,终端设备的包等待时间。
- 如权利要求1或2所述的方法,其特征在于,根据所述K个被调度权重确定调度结果,包括:将所述K个被调度权重中最大的被调度权重对应的终端设备的标识作为所述调度结果;或者将所述K个被调度权重中最小的被调度权重对应的终端设备的标识作为所述调度结果;或者将所述K个被调度权重中的一个被调度权重处理成第一值,将剩余K-1个被调度权重处理成第二值,将处理后的第一值和K-1个第二值组成的序列为所述调度结果,其中,所述调度结果中第一值对应的终端设备为被调度的终端设备。
- 如权利要求1-3任一项所述的方法,其特征在于,所述方法还包括:将第三状态信息集合输入第二神经网络模型,得到所述第三状态信息集合对应的值,所述值用于更新所述第一神经网络模型和所述第二神经网络模型的模型参数;其中,所述第三状态信息集合的维度为所述H,所述第二神经网络模型基于所述H确定。
- 如权利要求4所述的方法,其特征在于,所述第三状态信息集合是对所述K个待调度终端设备的第二状态信息集合进行处理得到的。
- 如权利要求5所述的方法,其特征在于,对所述K个待调度终端设备的第二状态信息集合进行处理得到所述第三状态信息集合,包括:针对所述K个待调度终端设备的第二状态信息集合中的每一项状态信息取平均值,得到所述第三状态信息集合;或者选取所述K个待调度终端设备的第二状态信息集合中的每一项状态信息的最大值,组成所述第三状态信息集合;或者选取所述K个待调度终端设备的第二状态信息集合中的每一项状态信息的最小值,组成所述第三状态信息集合。
- 如权利要求1、4-6任一项所述的方法,其特征在于,所述第一神经网络模型和所述第二神经网络模型中的任一个神经网络模型基于所述H确定,包括:所述第一神经网络模型和所述第二神经网络模型中的任一个神经网络模型的输入的维度与所述H相关。
- 一种调度方法,其特征在于,包括:基于K个待调度终端设备的第四状态信息集合和系统的状态信息集合得到一个第五状态信息集合,K为大于或者等于1的整数;所述K个待调度终端设备中任一个待调度终端设备的第四状态信息集合的维度为L,L为大于或者等于1的整数;任一个待调度终端设备的第四状态信息集合包含所述任一个待调度终端设备的状态信息以及所述任一个待调度终端设备与其他待调度终端设备之间的状态关联数据;将所述第五状态信息集合输入第三神经网络模型,确定一个权重集合;所述第三神经网络模型基于所述L确定;所述权重集合包含的权重个数与所述L相同;基于所述权重集合确定所述K个待调度终端设备中每个待调度终端设备的被调度权重,得到K个被调度权重;根据所述K个被调度权重确定调度结果,所述调度结果指示被调度的终端设备。
- 如权利要求8所述的方法,其特征在于,所述系统的状态信息集合包括以下至少一项状态信息:系统的平均吞吐量、系统公平性、系统丢包率。
- 如权利要求8或9所述的方法,其特征在于,基于K个待调度终端设备的第四状态信息集合和系统的状态信息集合得到一个第五状态信息集合,包括:对所述K个待调度终端设备的第四状态信息集合进行处理,得到一个第六状态信息集合;将所述第六状态信息集合与所述系统的状态信息集合组合成所述第五状态信息集合。
- 如权利要求10所述的方法,其特征在于,对所述K个待调度终端设备的第四状态信息集合进行处理,得到一个第六状态信息集合,包括:针对所述K个待调度终端设备的第四状态信息集合中的每一项状态信息取平均值,得到所述第六状态信息集合;或者选取所述K个待调度终端设备的第四状态信息集合中的每一项状态信息的最大值,组成所述第六状态信息集合;或者选取所述K个待调度终端设备的第四状态信息集合中的每一项状态信息的最小值,组成所述第六状态信息集合。
- 如权利要求8-11任一项所述的方法,其特征在于,所述方法还包括:将第七状态信息集合输入第四神经网络模型,得到所述第七状态信息集合对应的值,所述值用于更新所述第三神经网络模型和所述第四神经网络模型的模型参数;其中,所述第七状态信息集合的维度与所述第五状态信息集合的维度相同,所述第四神经网络模型基于所述L确定。
- 如权利要求12所述的方法,其特征在于,所述第七状态信息集合与所述第五状态信息集合相同。
- 如权利要求8、12或13所述的方法,其特征在于,所述第三神经网络模型和所述第四神经网络模型中的任一个神经网络模型基于所述L确定,包括:所述第三神经网络模型和所述第四神经网络模型中的任一个神经网络模型的输入的维度与所述L相关。
- 如权利要求8-14任一项所述的方法,其特征在于,基于所述权重集合确定所述K个待调度终端设备中每个待调度终端设备的被调度权重,包括:基于所述权重集合分别对所述K个待调度终端设备中每个待调度终端设备的第四状态信息集合中的每个状态信息的值进行加权求和,得到所述每个待调度终端设备的被调度权重。
- 如权利要求8-15任一项所述的方法,其特征在于,根据所述K个被调度权重确定调度结果,包括:将所述K个被调度权重中最大的被调度权重对应的终端设备的标识作为所述调度结果;或者将所述K个被调度权重中最小的被调度权重对应的终端设备的标识作为所述调度结果;或者将所述K个被调度权重中的一个被调度权重处理成第一值,将剩余K-1个被调度权重处理成第二值,将处理后的第一值和K-1个第二值组成的序列为所述调度结果,其中,所述调度结果中第一值对应的终端设备为被调度的终端设备。
- 一种调度装置,其特征在于,包括:第一处理单元,用于对K个待调度终端设备的第一状态信息集合进行处理,得到K个待调度终端设备的第二状态信息集合,K为大于或者等于1的整数;其中,任一个待调度终端设备的第二状态信息集合包含所述任一个待调度终端设备的状态信息以及所述任一个待调度终端设备与其他待调度终端设备之间的状态关联数据;所述K个待调度终端设备中任一个待调度终端设备的第二状态信息集合的维度为H,H为大于或者等于1的整数;第二处理单元,用于分别将每个待调度终端设备的第二状态信息集合输入第一神经网络模型,确定所述每个待调度终端设备的被调度权重,得到K个被调度权重;所述第一神经网络模型基于所述H确定;以及根据所述K个被调度权重确定调度结果,所述调度结果指示被调度的终端设备;通信单元,用于输出所述调度结果。
- 如权利要求17所述的装置,其特征在于,任一个待调度终端设备的第一状态信息集合包括以下至少一项状态信息:终端设备的瞬时估计吞吐量,终端设备的平均吞吐量,终端设备的缓存大小,终端设备的包等待时间。
- 如权利要求17或18所述的装置,其特征在于,所述第二处理单元,在根据所述K个被调度权重确定调度结果时,具体用于:将所述K个被调度权重中最大的被调度权重对应的终端设备的标识作为所述调度结果;或者将所述K个被调度权重中最小的被调度权重对应的终端设备的标识作为所述调度结果;或者将所述K个被调度权重中的一个被调度权重处理成第一值,将剩余K-1个被调度权重处理成第二值,将处理后的第一值和K-1个第二值组成的序列为所述调度结果,其中,所述调度结果中第一值对应的终端设备为被调度的终端设备。
- 如权利要求17-19任一项所述的装置,其特征在于,所述第二处理单元,还用于:将第三状态信息集合输入第二神经网络模型,得到所述第三状态信息集合对应的值,所述值用于更新所述第一神经网络模型和所述第二神经网络模型的模型参数;其中,所述第三状态信息集合的维度为所述H,所述第二神经网络模型基于所述H确定。
- 如权利要求20所述的装置,其特征在于,所述第三状态信息集合是所述第二处理单元对所述K个待调度终端设备的第二状态信息集合进行处理得到的。
- 如权利要求21所述的装置,其特征在于,所述第二处理单元,在对所述K个待调度终端设备的第二状态信息集合进行处理得到所述第三状态信息集合时,具体用于:针对所述K个待调度终端设备的第二状态信息集合中的每一项状态信息取平均值,得到所述第三状态信息集合;或者选取所述K个待调度终端设备的第二状态信息集合中的每一项状态信息的最大值,组成所述第三状态信息集合;或者选取所述K个待调度终端设备的第二状态信息集合中的每一项状态信息的最小值,组成所述第三状态信息集合。
- 如权利要求17、20-22任一项所述的装置,其特征在于,所述第一神经网络模型和所述第二神经网络模型中的任一个神经网络模型基于所述H确定,包括:所述第一神经网络模型和所述第二神经网络模型中的任一个神经网络模型的输入的维度与所述H相关。
- 一种调度装置,其特征在于,包括:第一处理单元,用于基于K个待调度终端设备的第四状态信息集合和系统的状态信息集合得到一个第五状态信息集合,K为大于或者等于1的整数;所述K个待调度终端设备中任一个待调度终端设备的第四状态信息集合的维度为L,L为大于或者等于1的整数;任一个待调度终端设备的第四状态信息集合包含所述任一个待调度终端设备的状态信息以及所述任一个待调度终端设备与其他待调度终端设备之间的状态关联数据;第二处理单元,用于将所述第五状态信息集合输入第三神经网络模型,确定一个权重集合;所述第三神经网络模型基于所述L确定;所述权重集合包含的权重个数与所述L相同;基于所述权重集合确定所述K个待调度终端设备中每个待调度终端设备的被调度权重,得到K个被调度权重;以及根据所述K个被调度权重确定调度结果,所述调度结果指示被调度的终端设备;通信单元,用于输出所述调度结果。
- 如权利要求24所述的装置,其特征在于,所述系统的状态信息集合包括以下至少一项状态信息:系统的平均吞吐量、系统公平性、系统丢包率。
- 如权利要求24或25所述的装置,其特征在于,所述第一处理单元,在基于K个待调度终端设备的第四状态信息集合和系统的状态信息集合得到一个第五状态信息集合时,具体用于:对所述K个待调度终端设备的第四状态信息集合进行处理,得到一个第六状态信息集合;将所述第六状态信息集合与所述系统的状态信息集合组合成所述第五状态信息集合。
- 如权利要求26所述的装置,其特征在于,所述第一处理单元,在对所述K个待调度终端设备的第四状态信息集合进行处理,得到一个第六状态信息集合时,具体用于:针对所述K个待调度终端设备的第四状态信息集合中的每一项状态信息取平均值,得到所述第六状态信息集合;或者选取所述K个待调度终端设备的第四状态信息集合中的每一项状态信息的最大值,组成所述第六状态信息集合;或者选取所述K个待调度终端设备的第四状态信息集合中的每一项状态信息的最小值,组成所述第六状态信息集合。
- 如权利要求24-27任一项所述的装置,其特征在于,所述第二处理单元,还用于:将第七状态信息集合输入第四神经网络模型,得到所述第七状态信息集合对应的值,所述值用于更新所述第三神经网络模型和所述第四神经网络模型的模型参数;其中,所述第七状态信息集合的维度与所述第五状态信息集合的维度相同,所述第四神经网络模型基于所述L确定。
- 如权利要求28所述的装置,其特征在于,所述第七状态信息集合与所述第五状态信息集合相同。
- 如权利要求24、28或29所述的装置,其特征在于,所述第三神经网络模型和所述第四神经网络模型中的任一个神经网络模型基于所述L确定,包括:所述第三神经网络模型和所述第四神经网络模型中的任一个神经网络模型的输入的维度与所述L相关。
- 如权利要求24-30任一项所述的装置,其特征在于,所述第二处理单元,在基于所述权重集合确定所述K个待调度终端设备中每个待调度终端设备的被调度权重时,具体用于:基于所述权重集合分别对所述K个待调度终端设备中每个待调度终端设备的第四状态信息集合中的每个状态信息的值进行加权求和,得到所述每个待调度终端设备的被调度权重。
- 如权利要求24-31任一项所述的装置,其特征在于,所述第二处理单元,在根据所述K个被调度权重确定调度结果时,具体用于:将所述K个被调度权重中最大的被调度权重对应的终端设备的标识作为所述调度结果;或者将所述K个被调度权重中最小的被调度权重对应的终端设备的标识作为所述调度结果;或者将所述K个被调度权重中的一个被调度权重处理成第一值,将剩余K-1个被调度权重处理成第二值,将处理后的第一值和K-1个第二值组成的序列为所述调度结果,其中,所述调度结果中第一值对应的终端设备为被调度的终端设备。
- 一种调度装置,其特征在于,包括处理器,当所述处理器执行存储器中的计算机程序或指令时,执行如权利要求1-7任一项所述的方法,或者执行如权利要求8-16任一项所述的方法,其中,所述调度装置与所述存储器相连,或者,所述调度装置包括所述存储器。
- 一种计算机可读存储介质,其特征在于,包括程序或指令,当其在计算机上运行时,使得计算机执行如权利要求1-7任一项所述的方法,或者执行如权利要求8-16任一项所述的方法。
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP20899755.1A EP4057751B1 (en) | 2019-12-13 | 2020-11-05 | Scheduling method and apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911285268.3A CN112996125B (zh) | 2019-12-13 | 2019-12-13 | 一种调度方法及装置 |
| CN201911285268.3 | 2019-12-13 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021114968A1 true WO2021114968A1 (zh) | 2021-06-17 |
Family
ID=76329563
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2020/126756 Ceased WO2021114968A1 (zh) | 2019-12-13 | 2020-11-05 | 一种调度方法及装置 |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP4057751B1 (zh) |
| CN (1) | CN112996125B (zh) |
| WO (1) | WO2021114968A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119031040A (zh) * | 2024-08-29 | 2024-11-26 | 中国移动通信集团北京有限公司 | 分布式业务流量调度方法、装置、设备、介质及程序产品 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113766538A (zh) * | 2021-08-19 | 2021-12-07 | 深圳技术大学 | NB-IoT无线资源分配方法、装置及可读介质 |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160321537A1 (en) * | 2014-03-28 | 2016-11-03 | International Business Machines Corporation | Consolidating multiple neurosynaptic core circuits into one reconfigurable memory block |
| CN109005060A (zh) * | 2018-08-02 | 2018-12-14 | 上海交通大学 | 一种基于层级化高度异构分布式系统的深度学习应用优化框架 |
| CN109982434A (zh) * | 2019-03-08 | 2019-07-05 | 西安电子科技大学 | 无线资源调度一体智能化控制系统及方法、无线通信系统 |
| CN109996247A (zh) * | 2019-03-27 | 2019-07-09 | 中国电子科技集团公司信息科学研究院 | 网络化资源调配方法、装置、设备及存储介质 |
| CN110096526A (zh) * | 2019-04-30 | 2019-08-06 | 秒针信息技术有限公司 | 一种用户属性标签的预测方法及预测装置 |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8675511B2 (en) * | 2008-12-10 | 2014-03-18 | Qualcomm Incorporated | List elimination for distributed downlink coordinated multi-point (CoMP) framework |
| US8588801B2 (en) * | 2009-08-21 | 2013-11-19 | Qualcomm Incorporated | Multi-point equalization framework for coordinated multi-point transmission |
| CN106973413B (zh) * | 2017-03-28 | 2020-04-28 | 重庆理工大学 | 面向无线传感器网络的自适应QoS控制方法 |
| CN110264272A (zh) * | 2019-06-21 | 2019-09-20 | 山东师范大学 | 一种移动互联网劳务众包平台任务最优定价预测方法、装置及系统 |
| CN112888076B (zh) * | 2019-11-29 | 2023-10-24 | 华为技术有限公司 | 一种调度方法及装置 |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4057751A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4057751B1 (en) | 2025-05-21 |
| EP4057751A1 (en) | 2022-09-14 |
| CN112996125A (zh) | 2021-06-18 |
| EP4057751A4 (en) | 2023-01-18 |
| CN112996125B (zh) | 2023-04-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111698789B (zh) | 通信系统中的调度方法、装置及存储介质 | |
| CN112888076B (zh) | 一种调度方法及装置 | |
| WO2023093725A1 (zh) | 一种校准方法及装置 | |
| WO2021114968A1 (zh) | 一种调度方法及装置 | |
| US20240097993A1 (en) | Machine Learning Model Distribution | |
| US20250008350A1 (en) | Communication method and communication apparatus | |
| WO2022095805A1 (zh) | 一种调度终端的方法及装置 | |
| CN120266445A (zh) | 通信方法、装置及存储介质 | |
| WO2023185890A1 (zh) | 一种数据处理方法及相关装置 | |
| CN120835269A (zh) | 一种通信方法及相关装置 | |
| CN118804304A (zh) | 一种资源分配方法、装置、通信设备及可读存储介质 | |
| CN120238399A (zh) | Ai单元的量化对齐方法、装置及相关设备 | |
| CN120934716A (zh) | 一种通信方法及相关装置 | |
| CN120856577A (zh) | 一种通信方法及相关装置 | |
| CN121240068A (zh) | 能力上报方法、能力确定方法及装置 | |
| CN121077596A (zh) | 通信的方法和通信装置 | |
| CN120711413A (zh) | 一种通信方法及相关装置 | |
| WO2025227698A1 (zh) | 一种通信方法及相关装置 | |
| EP4562552A1 (en) | Decentralized learning based on activation function | |
| CN120659081A (zh) | 一种通信方法及相关装置 | |
| CN120881513A (zh) | 无线电地图获取方法和装置 | |
| CN120659066A (zh) | 一种通信方法及相关装置 | |
| CN121052291A (zh) | 一种通信方法及装置 | |
| CN120547585A (zh) | 一种通信方法及相关设备 | |
| CN121014225A (zh) | 一种通信方法及相关设备 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20899755; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2020899755; Country of ref document: EP; Effective date: 20220610 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWG | Wipo information: grant in national office | Ref document number: 2020899755; Country of ref document: EP |