CN116170052A - Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method - Google Patents
Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method
- Publication number
- CN116170052A (application CN202211568877.1A)
- Authority
- CN
- China
- Prior art keywords
- network
- satellite
- value
- orthogonal
- action
- Prior art date
- 2022-12-08
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/14—Relay systems
- H04B7/15—Active relay systems
- H04B7/185—Space-based or airborne stations; Stations for satellite systems
- H04B7/1851—Systems using a satellite or space-based relay
- H04B7/18513—Transmission in a satellite or space-based system
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/14—Relay systems
- H04B7/15—Active relay systems
- H04B7/185—Space-based or airborne stations; Stations for satellite systems
- H04B7/1851—Systems using a satellite or space-based relay
- H04B7/18519—Operations control, administration or maintenance
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/02—Resource partitioning among network components, e.g. reuse partitioning
- H04W16/10—Dynamic resource partitioning
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/16—Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
- H04W28/18—Negotiating wireless communication parameters
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/16—Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
- H04W28/18—Negotiating wireless communication parameters
- H04W28/22—Negotiating communication rate
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- Astronomy & Astrophysics (AREA)
- Aviation & Aerospace Engineering (AREA)
- General Physics & Mathematics (AREA)
- Radio Relay Systems (AREA)
Abstract
The invention discloses a hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method in the field of satellite communication. The method considers communication between multiple satellites and multiple users under time-delay effects, and improves the total communication sum rate during satellite access by using a DDQN deep reinforcement learning algorithm to jointly optimize power allocation, NOMA/OMA mode switching, subcarrier selection, and user pairing. Under different virtualized-network information-acquisition conditions (centralized, hybrid centralized/distributed, and distributed), the method adaptively adjusts the optimization scheme to improve the communication rate.
Description
Technical Field
The invention relates to the technical field of satellite communication, and in particular to a hybrid non-orthogonal multiple access (NOMA) / orthogonal multiple access (OMA) satellite virtualization intelligent scheduling method for satellite communication.
Background
Because mobile satellite networks have scarce network resources, strong dynamics, and large link delays, imperfect scheduling can cause task failures in satellite communication. Virtualized scheduling of mobile-satellite communication resources means dynamically adjusting the allocated network resources in the mobile satellite network environment so that idle resources are fully utilized, alleviating the task-scheduling failures caused by the shortage of satellite communication resources. However, virtualized scheduling easily creates conflicts among the resources mapped to different tasks. A mobile satellite communication virtual-resource scheduling algorithm is therefore proposed to address this problem.
In addition, with the rapid development of the satellite Internet of Things, a more effective multiple-access mechanism is needed to satisfy the massive-connectivity and high-spectral-efficiency access requirements of the satellite communication network. In a conventional satellite communication network, the available resources are divided into orthogonal resource blocks, and each accessing satellite user occupies one resource block for data transmission, which avoids inter-user interference and guarantees communication quality. Under orthogonal multiple access (OMA), however, the maximum number of users the satellite network can accommodate equals the number of orthogonal resources, which is limiting in massive-access scenarios constrained by satellite payload resources. A new multiple-access mechanism is therefore required to improve resource utilization.
Non-orthogonal multiple access (NOMA) transmits multiple signals simultaneously on the same frequency through power-domain multiplexing and successive interference cancellation (SIC); it offers higher resource utilization and user fairness than OMA and has attracted great attention from academia and industry. Under the same network resources, NOMA can greatly increase the capacity of a wireless communication system and improve network-resource utilization, and it has been widely studied in 5G networks. Moreover, hybrid NOMA/OMA transmission can adapt to changing network conditions, further improving the reliability of satellite communication. However, applying the hybrid NOMA/OMA transmission mode to satellite-communication network virtualization scheduling still lacks systematic analysis and verification, as well as integration with existing techniques such as power allocation.
Disclosure of Invention
Aiming at the virtualized-scheduling problem, the invention provides a hybrid NOMA/OMA virtualization intelligent scheduling method for satellite communication that uses artificial-intelligence techniques to analyze environment parameters based on outdated channel information and to optimize the scheduling scheme.
The technical solution adopted by the invention is as follows:
a hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method comprises the following steps:
(1) Define the satellite communication state at time t as s(t), where s(t) comprises the channel information of the satellite-user transmission links in the communication environment, the satellite positions, the ground-user positions, and the satellite transmit power;
(2) The artificial-intelligence Agent initially selects an action a(t) by random exploration, where an action is the joint scheduling of satellite transmit-power allocation, user handover, hybrid non-orthogonal/orthogonal multiple-access switching, and subcarrier allocation required by the satellite virtualization scheduling mechanism;
(3) The Agent interacts with the environment using the action of step (2) and obtains the preset reward value r(t); the reward value is determined by the users' actual communication rate;
(4) Repeat steps (1)-(3) within the time T and collect the corresponding training data {s(t), a(t), r(t), s(t+1)}; then update the Q values with a double deep Q-network, where a Q value is the evaluation value of an action for the state at each moment;
(5) Repeat steps (1)-(4) until convergence to obtain the trained Agent;
(6) Under the three conditions of centralized, distributed, and hybrid centralized/distributed information acquisition, iteratively train in the manner of steps (1)-(5) to obtain the corresponding Agents, which execute the satellite virtualization intelligent scheduling.
Further, in step (1), the channel information in the state s(t) is outdated channel information that accounts for time delay, modeled as:

ĥ_AB = ρ_AB · h_AB + √(1 − ρ_AB²) · ω_AB

where ĥ_AB denotes the outdated channel information, ρ_AB is the Jakes autocorrelation model parameter, ω_AB is a circularly symmetric complex Gaussian random variable, and h_AB is the actual channel parameter. The virtualized intelligent scheduling method obtains the outdated channel information in step (1) and establishes the relation between the outdated channel information and the actual communication rate during the training of step (4).
Further, in step (4), the double deep Q-network has two neural networks Q_A and Q_B. The Q value in network Q_A is updated by:

Q_A(s(t), a(t)) = Q_A(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )

The Q value in network Q_B is updated by:

Q_B(s(t), a(t)) = Q_B(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_A(s(t+1), argmax_a Q_B(s(t+1), a)) − Q_B(s(t), a(t)) )

where Q_A(s(t), a(t)) and Q_B(s(t), a(t)) denote the Q values of networks Q_A and Q_B for the state-action pair (s(t), a(t)); δ is the learning rate of the double deep Q-network, which affects the learning speed of the network; r(s(t), a(t)) is the reward value for the state-action pair (s(t), a(t)); τ is the discount factor, which weights the newly obtained reward against the Q value of the original policy in the new policy; and argmax_a{·} denotes selecting the action a that maximizes the Q value in {·};
the double deep Q-network first finds, in the current network Q_A, the action corresponding to the maximum Q value; it then uses that action to obtain the corresponding Q value of network Q_B at the next state, and finally updates the Q value of network Q_A through the update formula; Q_B stays fixed while Q_A is trained and is periodically overwritten as a whole by Q_A;
the training loss function of each neural network in the double deep Q-network is the mean squared temporal-difference error over a training batch; for Q_A:

L_A = (1/T) Σ_{t=1}^{T} ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )²

where T is the number of training data used in one training pass of the neural network, and the loss of Q_B is obtained by exchanging the roles of Q_A and Q_B;
after convergence, for any satellite communication state, the double deep Q-network directly takes the action corresponding to the maximum Q value as the scheduling decision.
Further, in step (6), centralized means that multiple satellites are scheduled by a unified central control node; distributed means that each satellite is controlled individually and no information is shared between satellites; hybrid centralized/distributed means that some satellites are uniformly scheduled by a central control node while the others are controlled individually.
The invention has the beneficial effects that:
1. The invention adopts a system fusing hybrid NOMA/OMA with power-allocation techniques in satellite communication, and uses artificial-intelligence-based deep reinforcement learning to solve the resulting high-complexity optimization problem.
2. The invention considers different information-acquisition conditions (centralized, hybrid centralized/distributed, distributed) and introduces the influence of delayed channel information; through analysis by the artificial-intelligence technique, it is hardly affected by the missing information.
Drawings
Fig. 1 is a system model diagram of satellite communication in the present invention.
FIG. 2 is an interaction diagram of an Agent and satellite communication environment in the present invention.
Fig. 3 is a performance comparison of the centralized DDQN algorithm against the no-power-allocation and no-subcarrier-allocation baselines at different satellite altitudes.
Fig. 4 is a performance comparison of the DDQN algorithm under different information-acquisition conditions (centralized, hybrid centralized/distributed, distributed) at different satellite altitudes.
Detailed Description
To make the principle, technical flow, and optimization effect of the invention clearer, the invention is described in further detail below with reference to the drawings and an embodiment.
A hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method comprises the following steps:
(1) Define the satellite communication state at time t as s(t), where s(t) comprises the channel information of the satellite-user transmission links in the communication environment, the satellite positions, the ground-user positions, and the satellite transmit power;
(2) The artificial-intelligence Agent initially selects an action a(t) by random exploration, where an action is the joint scheduling of satellite transmit-power allocation, user handover, hybrid non-orthogonal/orthogonal multiple-access switching, and subcarrier allocation required by the satellite virtualization scheduling mechanism;
(3) The Agent interacts with the environment using the action of step (2) and obtains the preset reward value r(t); the reward value is determined by the users' actual communication rate;
(4) Repeat steps (1)-(3) within the time T and collect the corresponding training data {s(t), a(t), r(t), s(t+1)}; then update the Q values with a double deep Q-network, where a Q value is the evaluation value of an action for the state at each moment;
(5) Repeat steps (1)-(4) until convergence to obtain the trained Agent;
(6) Considering inter-satellite cooperation and communication overhead in actual communication, three cases are distinguished: centralized (all satellites are managed by a unified central control node, such as a ground control station, which can collect the information of every satellite), distributed (satellites have no direct or indirect communication with one another and exchange no information), and hybrid centralized/distributed (some satellites are managed by a central control node, while the remaining satellites operate in the distributed mode and exchange no information). In each case, the corresponding Agents are obtained by iterative training through steps (1)-(5) and then execute the satellite virtualization intelligent scheduling.
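As a toy illustration of these three groupings (a sketch with invented satellite names and an invented helper function, not anything specified in the patent):

```python
# Illustrative grouping of satellites into Agents for the three
# information-acquisition modes above; names and structure are assumptions
# made only for this sketch.
from typing import Dict, List, Sequence

def assign_agents(satellites: Sequence[str], mode: str,
                  centrally_controlled: Sequence[str] = ()) -> Dict[str, List[str]]:
    """Map each Agent id to the satellites whose information it may use."""
    if mode == "centralized":        # one Agent observes and schedules everything
        return {"central": list(satellites)}
    if mode == "distributed":        # one Agent per satellite, no information sharing
        return {f"agent_{s}": [s] for s in satellites}
    if mode == "hybrid":             # a central node plus independent remainders
        groups = {"central": list(centrally_controlled)}
        groups.update({f"agent_{s}": [s]
                       for s in satellites if s not in centrally_controlled})
        return groups
    raise ValueError(f"unknown mode: {mode}")

print(assign_agents(["LEO1", "LEO2", "LEO3"], "hybrid",
                    centrally_controlled=["LEO1", "LEO2"]))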
In step (1), the channel information in the state s(t) is outdated channel information that accounts for time delay, modeled as:

ĥ_AB = ρ_AB · h_AB + √(1 − ρ_AB²) · ω_AB

where ĥ_AB denotes the outdated channel information, ρ_AB is the Jakes autocorrelation model parameter, ω_AB is a circularly symmetric complex Gaussian random variable, and h_AB is the actual channel parameter. The virtualized intelligent scheduling method obtains the outdated channel information in step (1) and establishes the relation between the outdated channel information and the actual communication rate during the training of step (4).
In step (4), the double deep Q-network has two neural networks Q_A and Q_B. The Q value in network Q_A is updated by:

Q_A(s(t), a(t)) = Q_A(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )

The Q value in network Q_B is updated by:

Q_B(s(t), a(t)) = Q_B(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_A(s(t+1), argmax_a Q_B(s(t+1), a)) − Q_B(s(t), a(t)) )

where Q_A(s(t), a(t)) and Q_B(s(t), a(t)) denote the Q values of networks Q_A and Q_B for the state-action pair (s(t), a(t)); δ is the learning rate of the double deep Q-network, which affects the learning speed of the network; r(s(t), a(t)) is the reward value for the state-action pair (s(t), a(t)); τ is the discount factor, which weights the newly obtained reward against the Q value of the original policy in the new policy; and argmax_a{·} denotes selecting the action a that maximizes the Q value in {·}.
The double deep Q-network first finds, in the current network Q_A, the action corresponding to the maximum Q value; it then uses that action to obtain the corresponding Q value of network Q_B at the next state, and finally updates the Q value of network Q_A through the update formula. Q_B stays fixed while Q_A is trained and is periodically overwritten as a whole by Q_A.
The training loss function of each neural network in the double deep Q-network is the mean squared temporal-difference error over a training batch; for Q_A:

L_A = (1/T) Σ_{t=1}^{T} ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )²

where T is the number of training data used in one training pass of the neural network, and the loss of Q_B is obtained by exchanging the roles of Q_A and Q_B.
After convergence, for any satellite communication state, the double deep Q-network directly takes the action corresponding to the maximum Q value as the scheduling decision.
In satellite virtualization, the method solves the joint intelligent scheduling problem covering satellite transmit-power allocation, user handover, hybrid non-orthogonal/orthogonal multiple-access switching, and subcarrier allocation by training multiple Agents in the centralized, distributed, and hybrid centralized/distributed modes, while accounting for the outdated channel information caused by time delay.
The following is a specific example:
As shown in fig. 1, multiple low-Earth-orbit (LEO) satellites simultaneously serve multiple users in one ground area. Each LEO satellite supports hybrid NOMA/OMA adaptive transmission-mode selection, subcarrier allocation, and power allocation, and at each transmission moment a ground user switches access to the appropriate satellite. Because satellite energy is limited, an upper energy-consumption threshold for serving an area is imposed on the power allocation. The objective is a virtualized scheduling scheme that maximizes the sum of the communication capacities of the satellites serving all users in the area by jointly optimizing the hybrid NOMA/OMA adaptive transmission mode, subcarrier allocation, power allocation, and user-satellite access handover. The method also considers inter-satellite cooperation in actual communication and optimizes under the centralized, distributed, and hybrid centralized/distributed communication environments. In addition, to capture the time-delay problem in actual communication, outdated channel information is introduced.
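The patent text does not enumerate concrete scenario parameters here, so the container below merely illustrates how such a multi-LEO test configuration might be held together; every name and numeric value is a placeholder assumption:

```python
# Illustrative scenario container for fig. 1; all values are placeholders,
# not parameters taken from the patent.
from dataclasses import dataclass

@dataclass
class LeoScenario:
    n_satellites: int = 3        # LEO satellites co-serving one ground area
    n_users: int = 6             # ground users in the area
    n_subcarriers: int = 4       # orthogonal resource blocks per satellite
    altitude_km: float = 600.0   # satellite altitude (the quantity varied in figs. 3-4)
    p_max_w: float = 10.0        # per-satellite transmit-power budget
    e_max_j: float = 500.0       # energy cap while the satellite serves the area

scenario = LeoScenario()         # defaults stand in for one test configuration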
Because this scheduling problem is a highly complex non-convex mixed-integer program, the method uses deep reinforcement learning to maximize the sum of the users' communication capacities in the service area and to learn the influences and connections of changes in the communication environment, so that the optimization scheme adapts to the three cases of centralized, hybrid centralized/distributed, and distributed operation, yielding a multi-LEO cooperative hybrid NOMA/OMA communication virtualization scheduling technique.
To realize the method, a training procedure based on artificial-intelligence techniques is adopted, with the following specific steps:
(1) Define the satellite communication state at a certain time t as s(t); the state comprises the environment parameters required for satellite communication, such as the channel information of the satellite-user transmission links, the satellite positions, the ground-user positions, and the satellite transmit power;
(2) The artificial-intelligence Agent initially selects an action a(t) by random exploration, where an action is the joint scheduling of satellite transmit-power allocation, user handover, hybrid NOMA/OMA switching, and subcarrier allocation required by the satellite virtualization scheduling mechanism;
(3) The Agent interacts with the environment using the action of step (2) and obtains the preset reward value r(t), determined by the users' actual communication rate; the interaction between the Agent and the communication environment is shown in fig. 2;
(4) Repeat steps (1)-(3) within the time T and collect the corresponding training data {s(t), a(t), r(t), s(t+1)}; then update the Q values with the DDQN (Deep Reinforcement Learning with Double Q-Learning) algorithm, where a Q value is the evaluation value of an action for the state at each moment;
(5) Repeat steps (1)-(4) until the algorithm converges.
In step (1), the channel information in the state is outdated channel information that accounts for time delay, modeled as:

ĥ_AB = ρ_AB · h_AB + √(1 − ρ_AB²) · ω_AB

where ĥ_AB denotes the outdated channel information, ρ_AB is the Jakes autocorrelation model parameter, ω_AB is a circularly symmetric complex Gaussian random variable, and h_AB is the actual channel parameter. The scheduling method obtains the outdated channel information in step (1) and establishes the relation between the outdated channel information and the actual communication rate during the training of step (4).
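A minimal NumPy sketch of this outdated-CSI model is given below; the unit-variance Rayleigh channel and the fixed value ρ_AB = 0.9 are assumptions made only for the example:

```python
# Illustrative sketch of the outdated-CSI model above (not the patent's code).
import numpy as np

def outdated_channel(h_true: np.ndarray, rho: float, rng: np.random.Generator) -> np.ndarray:
    """Outdated CSI: h_hat = rho*h + sqrt(1 - rho^2)*omega, with omega ~ CN(0, 1)."""
    omega = (rng.standard_normal(h_true.shape)
             + 1j * rng.standard_normal(h_true.shape)) / np.sqrt(2.0)
    return rho * h_true + np.sqrt(1.0 - rho ** 2) * omega

rng = np.random.default_rng(0)
# True unit-variance Rayleigh channels for 4 satellite-user links (assumption).
h = (rng.standard_normal(4) + 1j * rng.standard_normal(4)) / np.sqrt(2.0)
h_hat = outdated_channel(h, rho=0.9, rng=rng)   # what the scheduler observes in s(t)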
In addition, in step (4), the DDQN algorithm has two neural networks Q_A and Q_B. The Q value in network Q_A is updated by:

Q_A(s(t), a(t)) = Q_A(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )

where δ is the learning rate of the DDQN, which affects the learning speed of the network, and τ is the discount factor, which weights the newly obtained reward against the Q value of the original policy in the new policy. The DDQN first finds, in the current network Q_A, the action corresponding to the maximum Q value, then uses that action to obtain the corresponding Q value of network Q_B at the next state, and then updates the Q value in network Q_A through the update formula.

Network Q_B is updated analogously:

Q_B(s(t), a(t)) = Q_B(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_A(s(t+1), argmax_a Q_B(s(t+1), a)) − Q_B(s(t), a(t)) )

To keep the network from exploring the policy too aggressively, Q_B stays fixed while Q_A is trained and is periodically overwritten as a whole by Q_A. The training loss function of each neural network in the DDQN is the mean squared temporal-difference error over a training batch; for Q_A:

L_A = (1/T) Σ_{t=1}^{T} ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )²

where T is the number of training data used in one training pass.
the DDQN repeats steps (1) - (4) until the algorithm converges. After convergence, the DDQN algorithm directly adopts actions corresponding to the maximum Q value as scheduling decisions for any satellite communication state.
For the satellite communication network in this method, the environment is the entire satellite-communication-network environment, and the state is the data necessary for communication obtained from the environment, including the channel information of the satellite-user transmission links, the satellite positions, the ground-user positions, and the satellite transmit power. The actions are the joint scheduling of satellite transmit-power allocation, user handover, hybrid NOMA/OMA switching, and subcarrier allocation required by the satellite virtualized scheduling mechanism. The reward depends on the actual communication rate obtained after the users access the satellite communication network: the greater the users' total rate, the greater the reward.
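One plausible instantiation of such a rate-based reward (an assumption for illustration; the patent does not spell out the exact expression) computes the per-subcarrier sum rate from the standard two-user downlink NOMA formulas with SIC, or the single-user OMA rate:

```python
# Illustrative rate-based reward; g_* are channel gains, p_* transmit powers,
# n0 the noise power. All numeric values below are arbitrary examples.
import numpy as np

def noma_pair_rate(g_strong, g_weak, p_strong, p_weak, n0):
    """Weak user decodes its signal treating the strong user's as interference;
    the strong user cancels the weak user's signal via SIC before decoding."""
    r_weak = np.log2(1.0 + p_weak * g_weak / (p_strong * g_weak + n0))
    r_strong = np.log2(1.0 + p_strong * g_strong / n0)
    return r_weak + r_strong

def oma_rate(g, p, n0):
    """Single user occupying the orthogonal resource block alone."""
    return np.log2(1.0 + p * g / n0)

reward = noma_pair_rate(g_strong=2.0, g_weak=0.5, p_strong=0.3, p_weak=0.7, n0=0.1)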
In this method, the DDQN algorithm concretely realizes the satellite virtualization network scheduling technique. For the satellite communication environment, a state s(t) at time t is defined; the Agent selects an action a(t) by random exploration; after the environment executes a(t), the state changes from s(t) to the next-moment state s(t+1), and the corresponding reward r(t) is obtained from the users' rate feedback. Two networks Q_A and Q_B are designed inside the agent; the reward r(t) is used to update the evaluation value (Q value) of the state-action pair (s(t), a(t)), and the quality of that pair is judged accordingly.
Thus, in the DDQN algorithm, a large data set {s(t), a(t), r(t), s(t+1)} can be generated by random exploration, and the two networks Q_A and Q_B are then updated iteratively against each other through the Q-value updates. Because the number of state-action combinations in a satellite communication network is enormous, building a table to record the Q values is practically impossible; a deep neural network is therefore employed instead of a table. The network takes the state as input and outputs the actions' values, and the state-to-action mapping inside the neural network carries the Q values.
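As a sketch of this table-to-network replacement, the small PyTorch multilayer perceptron below maps a state vector to one Q value per discrete joint action; the layer sizes are assumptions, not values from the patent.

```python
# Illustrative Q network: state vector in, one Q value per joint action out.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),   # Q(s, a) for every discrete joint action a
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```

The specific algorithm flow is as follows: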
1. Initialize the deep neural networks Q_A and Q_B (i.e., initialize the Q values).
2. Start the loop:
2.1 For the state s(t) at time t, obtain the action a(t) by Q_A-network prediction or random exploration.
2.2 Execute action a(t) to obtain the environment's feedback reward r(t) and the next-moment state s(t+1).
2.3 Generate the training data {s(t), a(t), r(t), s(t+1)}.
2.4 After every T training data are generated, update Q_A with the existing data set through the update formula of Q_A.
2.5 After every N training updates of Q_A, copy the Q values of Q_A to Q_B.
3. Terminate the loop after reaching the preset number of iterations or the convergence target (a condensed sketch of this loop is given below).
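The condensed sketch below strings these numbered steps together, reusing the QNetwork sketched above. `SatelliteEnv` is hypothetical: any object whose reset() returns a NumPy state vector and whose step(a) returns (next_state, reward, done) can stand in for the satellite communication environment; all hyperparameter defaults are illustrative assumptions.

```python
# Illustrative DDQN training loop for steps 1-3 above (not the patent's code).
import random
import numpy as np
import torch
import torch.nn.functional as F

def train(env, state_dim, n_actions, iters=500, T=64, N=10,
          delta=1e-3, tau=0.9, eps=0.1):
    q_a = QNetwork(state_dim, n_actions)             # step 1: initialize Q_A, Q_B
    q_b = QNetwork(state_dim, n_actions)
    q_b.load_state_dict(q_a.state_dict())
    opt = torch.optim.Adam(q_a.parameters(), lr=delta)
    buffer, s = [], env.reset()
    for it in range(1, iters + 1):                   # step 2: the loop
        for _ in range(T):
            if random.random() < eps:                # 2.1 random exploration
                a = random.randrange(n_actions)
            else:                                    # 2.1 Q_A-network prediction
                with torch.no_grad():
                    a = int(q_a(torch.as_tensor(s, dtype=torch.float32)).argmax())
            s_next, r, done = env.step(a)            # 2.2 reward and next state
            buffer.append((s, a, r, s_next))         # 2.3 training data
            s = env.reset() if done else s_next
        batch = random.sample(buffer, min(T, len(buffer)))   # 2.4 update Q_A
        ss = torch.as_tensor(np.stack([b[0] for b in batch]), dtype=torch.float32)
        aa = torch.as_tensor([b[1] for b in batch], dtype=torch.int64)
        rr = torch.as_tensor([b[2] for b in batch], dtype=torch.float32)
        sn = torch.as_tensor(np.stack([b[3] for b in batch]), dtype=torch.float32)
        with torch.no_grad():
            a_star = q_a(sn).argmax(dim=1, keepdim=True)     # action chosen by Q_A
            target = rr + tau * q_b(sn).gather(1, a_star).squeeze(1)  # valued by Q_B
        q_sa = q_a(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
        loss = F.mse_loss(q_sa, target)              # mean squared TD error over T samples
        opt.zero_grad(); loss.backward(); opt.step()
        if it % N == 0:                              # 2.5 periodically copy Q_A -> Q_B
            q_b.load_state_dict(q_a.state_dict())
    return q_a                                       # step 3: trained Agent network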
The tests of the method are shown in figs. 3-4. The results show the performance of the proposed algorithm against the no-power-allocation, no-subcarrier-allocation, and no-power-no-subcarrier-allocation baselines at different satellite altitudes, for convenient comparison. In addition, the method offers a free choice among centralized, hybrid centralized/distributed, and distributed operation and gives the corresponding rate-performance comparison.
From the results for the centralized case in fig. 3, the proposed DDQN-based algorithm performs best, greatly increasing the total communication rate compared with no power allocation (equally divided power). The effect of subcarrier selection is insignificant because, in long-range satellite communication, small-scale fading is much weaker than large-scale fading, so the gain of subcarrier selection is small compared with that of power allocation. From the results in fig. 4, the centralized mode performs best, the hybrid centralized/distributed mode is second, and the distributed mode is lowest. This matches the logic of the scenario design: global optimization is at least as strong as the combination of local optimizations.
Therefore, the invention can guarantee the communication rate of a multi-LEO multi-user communication system while accounting for delayed channel information. Meanwhile, the low-complexity neural network saves computing resources during real-time computation.
In summary, the invention considers communication between multiple satellites and multiple users under time-delay effects, and improves the total communication sum rate during satellite access by using the DDQN deep reinforcement learning algorithm to jointly optimize power allocation, NOMA/OMA mode switching, subcarrier selection, and user pairing. Under different virtualized-network information-acquisition conditions (centralized, hybrid centralized/distributed, and distributed), the invention adaptively adjusts the optimization scheme to improve the communication rate.
The embodiment described above is only one specific embodiment of the invention, not all embodiments. Any other embodiment obtained by a person skilled in the art from this embodiment without inventive effort falls within the protection scope of the invention.
Claims (4)
1. A hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method is characterized by comprising the following steps:
(1) Defining the satellite communication state at time t as s(t), wherein s(t) comprises the channel information of the satellite-user transmission links in the communication environment, the satellite positions, the ground-user positions, and the satellite transmit power;
(2) An artificial-intelligence Agent initially selecting an action a(t) by random exploration, wherein an action is the joint scheduling of satellite transmit-power allocation, user handover, hybrid non-orthogonal/orthogonal multiple-access switching, and subcarrier allocation required by the satellite virtualization scheduling mechanism;
(3) The Agent interacting with the environment using the action of step (2) and obtaining a preset reward value r(t), wherein the reward value is determined by the users' actual communication rate;
(4) Repeating steps (1)-(3) within the time T and collecting the corresponding training data {s(t), a(t), r(t), s(t+1)}; then updating the Q values with a double deep Q-network, wherein a Q value is the evaluation value of an action for the state at each moment;
(5) Repeating steps (1)-(4) until convergence to obtain the trained Agent;
(6) Under the three conditions of centralized, distributed, and hybrid centralized/distributed information acquisition, performing iterative training in the manner of steps (1)-(5) to obtain the corresponding Agents, which execute the satellite virtualization intelligent scheduling.
2. The hybrid non-orthogonal/orthogonal multiple access virtualized intelligent scheduling method of claim 1, wherein in step (1) the channel information in the state s(t) is outdated channel information that accounts for time delay, modeled as:

ĥ_AB = ρ_AB · h_AB + √(1 − ρ_AB²) · ω_AB

wherein ĥ_AB denotes the outdated channel information, ρ_AB is the Jakes autocorrelation model parameter, ω_AB is a circularly symmetric complex Gaussian random variable, and h_AB is the actual channel parameter; the virtualized intelligent scheduling method obtains the outdated channel information in step (1) and establishes the relation between the outdated channel information and the actual communication rate during the training of step (4).
3. The hybrid non-orthogonal/orthogonal multiple access virtualized intelligent scheduling method of claim 1, wherein in step (4) the double deep Q-network has two neural networks Q_A and Q_B; the Q value in network Q_A is updated by:

Q_A(s(t), a(t)) = Q_A(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )

the Q value in network Q_B is updated by:

Q_B(s(t), a(t)) = Q_B(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_A(s(t+1), argmax_a Q_B(s(t+1), a)) − Q_B(s(t), a(t)) )

wherein Q_A(s(t), a(t)) and Q_B(s(t), a(t)) denote the Q values of networks Q_A and Q_B for the state-action pair (s(t), a(t)); δ is the learning rate of the double deep Q-network, which affects the learning speed of the network; r(s(t), a(t)) is the reward value for the state-action pair (s(t), a(t)); τ is the discount factor, which weights the newly obtained reward against the Q value of the original policy in the new policy; and argmax_a{·} denotes selecting the action a that maximizes the Q value in {·};

the double deep Q-network first finds, in the current network Q_A, the action corresponding to the maximum Q value, then uses that action to obtain the corresponding Q value of network Q_B at the next state, and then updates the Q value of network Q_A through the update formula; Q_B stays fixed while Q_A is trained and is periodically overwritten as a whole by Q_A;

the training loss function of each neural network in the double deep Q-network is the mean squared temporal-difference error over a training batch; for Q_A:

L_A = (1/T) Σ_{t=1}^{T} ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )²

wherein T is the number of training data used in one training pass of the neural network;

after convergence, for any satellite communication state, the double deep Q-network directly takes the action corresponding to the maximum Q value as the scheduling decision.
4. The hybrid non-orthogonal/orthogonal multiple access virtualized intelligent scheduling method of claim 1, wherein in step (6) centralized means that multiple satellites are scheduled by a unified central control node; distributed means that each satellite is controlled individually and no information is shared between satellites; and hybrid centralized/distributed means that some satellites are uniformly scheduled by a central control node while the other satellites are controlled individually.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211568877.1A CN116170052B (en) | 2022-12-08 | 2022-12-08 | Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211568877.1A CN116170052B (en) | 2022-12-08 | 2022-12-08 | Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116170052A (publication) | 2023-05-26 |
| CN116170052B (grant) | 2024-11-26 |
Family
ID=86410153
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211568877.1A Active CN116170052B (en) | 2022-12-08 | 2022-12-08 | Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116170052B (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210078735A1 (en) * | 2019-09-18 | 2021-03-18 | Bae Systems Information And Electronic Systems Integration Inc. | Satellite threat mitigation by application of reinforcement machine learning in physics based space simulation |
| CN111867104A (en) * | 2020-07-15 | 2020-10-30 | 中国科学院上海微系统与信息技术研究所 | A power distribution method and power distribution device for low-orbit satellite downlink |
| CN114362810A (en) * | 2022-01-11 | 2022-04-15 | 重庆邮电大学 | Low-orbit satellite beam hopping optimization method based on migration depth reinforcement learning |
| CN114698045A (en) * | 2022-03-30 | 2022-07-01 | 西安交通大学 | Serial Q learning distributed switching method and system under large-scale LEO satellite network |
Non-Patent Citations (1)
| Title |
|---|
| ZHOU Biying et al.: "Reinforcement-learning-based satellite network resource scheduling mechanism" (基于强化学习的卫星网络资源调度机制), Computer Engineering & Science (计算机工程与科学), no. 12, 15 December 2019 (2019-12-15) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116600387A (en) * | 2023-06-02 | 2023-08-15 | 中国人民解放军军事科学院系统工程研究院 | Multidimensional resource allocation method |
| CN116600387B (en) * | 2023-06-02 | 2024-02-06 | 中国人民解放军军事科学院系统工程研究院 | Multidimensional resource allocation method |
| CN119783554A (en) * | 2025-03-10 | 2025-04-08 | 北京邮电大学 | Training method of satellite routing prediction model and low-orbit satellite routing method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116170052B (en) | 2024-11-26 |
Similar Documents
| Publication | Title |
|---|---|
| Rahman et al. | Deep reinforcement learning based computation offloading and resource allocation for low-latency fog radio access networks | |
| Wang et al. | Intelligent cognitive radio in 5G: AI-based hierarchical cognitive cellular networks | |
| Abouaomar et al. | Federated deep reinforcement learning for open RAN slicing in 6G networks | |
| CN113163451A (en) | D2D communication network slice distribution method based on deep reinforcement learning | |
| CN109474980A (en) | A wireless network resource allocation method based on deep reinforcement learning | |
| CN114665952A (en) | A beam-hopping optimization method for low-orbit satellite networks based on satellite-ground fusion architecture | |
| CN112383922A (en) | Deep reinforcement learning frequency spectrum sharing method based on prior experience replay | |
| CN107919986A (en) | VM migrates optimization method between MEC nodes in super-intensive network | |
| CN118042528B (en) | Self-adaptive load balancing ground user access method for unmanned aerial vehicle auxiliary network | |
| CN116170052B (en) | Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method | |
| CN114173421B (en) | LoRa logical channel and power allocation method based on deep reinforcement learning | |
| CN110519849B (en) | Communication and computing resource joint allocation method for mobile edge computing | |
| CN101820671A (en) | Particle swarm algorithm-based distributed power distributing method used for OFDMA system | |
| CN115633033B (en) | Collaborative energy-saving computing migration method integrating RF energy harvesting | |
| CN118828603B (en) | A method for user association and resource allocation in cellular networks based on deep reinforcement learning | |
| CN114217638A (en) | Dynamic deployment method for information acquisition of mobile station under task driving | |
| Su et al. | A power allocation scheme based on deep reinforcement learning in HetNets | |
| CN115633402A (en) | Resource scheduling method for mixed service throughput optimization | |
| Liang et al. | Decentralized bit, subcarrier and power allocation with interference avoidance in multicell OFDMA systems using game theoretic approach | |
| Geng et al. | The study on anti-jamming power control strategy based on Q-learning | |
| Wang et al. | Distributed Q-learning for interference mitigation in self-organised femtocell networks: Synchronous or asynchronous? | |
| CN116133142B (en) | Intelligent framework-based high-energy-efficiency uplink resource allocation method under QoS constraint | |
| Luong et al. | Resource allocation in UAV-Assisted wireless networks using reinforcement learning | |
| CN115052356A (en) | Resource allocation method and system based on imitation learning | |
| Anzaldo et al. | Training Effect on AI-based Resource Allocation in small-cell networks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |