CN116170052A - Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method - Google Patents
Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method
- Publication number
- CN116170052A (application CN202211568877.1A)
- Authority
- CN
- China
- Prior art keywords
- network
- satellite
- value
- orthogonal
- action
- Prior art date
- 2022-12-08
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/14—Relay systems
- H04B7/15—Active relay systems
- H04B7/185—Space-based or airborne stations; Stations for satellite systems
- H04B7/1851—Systems using a satellite or space-based relay
- H04B7/18513—Transmission in a satellite or space-based system
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B7/00—Radio transmission systems, i.e. using radiation field
- H04B7/14—Relay systems
- H04B7/15—Active relay systems
- H04B7/185—Space-based or airborne stations; Stations for satellite systems
- H04B7/1851—Systems using a satellite or space-based relay
- H04B7/18519—Operations control, administration or maintenance
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W16/00—Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
- H04W16/02—Resource partitioning among network components, e.g. reuse partitioning
- H04W16/10—Dynamic resource partitioning
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/16—Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
- H04W28/18—Negotiating wireless communication parameters
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W28/00—Network traffic management; Network resource management
- H04W28/16—Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
- H04W28/18—Negotiating wireless communication parameters
- H04W28/22—Negotiating communication rate
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- Astronomy & Astrophysics (AREA)
- Aviation & Aerospace Engineering (AREA)
- General Physics & Mathematics (AREA)
- Radio Relay Systems (AREA)
Abstract
The invention discloses a hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method in the field of satellite communication. The method considers communication between multiple satellites and multiple users under time-delay effects, and improves the total communication sum rate during satellite access by using a DDQN deep reinforcement learning algorithm to jointly optimize power allocation, NOMA/OMA mode switching, subcarrier selection, and user pairing. Under different virtualized-network information-acquisition conditions (centralized, hybrid centralized/distributed, and distributed), the method adaptively adjusts the optimization scheme to improve the communication rate.
Description
Technical Field
The invention relates to the technical field of satellite communication, and in particular to a hybrid non-orthogonal multiple access (NOMA) / orthogonal multiple access (OMA) satellite virtualization intelligent scheduling method for satellite communication.
Background
Because mobile satellite networks have scarce network resources, strong dynamics, and large link delays, imperfect scheduling can cause task failures in satellite communication. Virtualized scheduling of mobile-satellite communication resources means dynamically adjusting the allocated network resources in the mobile satellite network environment so that idle resources are fully utilized, alleviating the task-scheduling failures caused by the shortage of satellite communication resources. However, virtualized scheduling easily creates conflicts among the resources mapped to different tasks. A mobile satellite communication virtual-resource scheduling algorithm is therefore proposed to address this problem.
In addition, with the rapid development of the satellite Internet of Things, a more effective multiple-access mechanism is needed to satisfy the massive-connectivity and high-spectral-efficiency access requirements of the satellite communication network. In a conventional satellite communication network, the available resources are divided into orthogonal resource blocks, and each accessing satellite user occupies one resource block for data transmission, which avoids inter-user interference and guarantees communication quality. Under orthogonal multiple access (OMA), however, the maximum number of users the satellite network can accommodate equals the number of orthogonal resources, which is limiting in massive-access scenarios constrained by satellite payload resources. A new multiple-access mechanism is therefore required to improve resource utilization.
Non-orthogonal multiple access (NOMA) transmits multiple signals simultaneously on the same frequency through power-domain multiplexing and successive interference cancellation (SIC); it offers higher resource utilization and user fairness than OMA and has attracted great attention from academia and industry. Under the same network resources, NOMA can greatly increase the capacity of a wireless communication system and improve network-resource utilization, and it has been widely studied in 5G networks. Moreover, hybrid NOMA/OMA transmission can adapt to changing network conditions, further improving the reliability of satellite communication. However, applying the hybrid NOMA/OMA transmission mode to satellite-communication network virtualization scheduling still lacks systematic analysis and verification, as well as integration with existing techniques such as power allocation.
Disclosure of Invention
Aiming at the virtualized-scheduling problem, the invention provides a hybrid NOMA/OMA virtualization intelligent scheduling method for satellite communication that uses artificial-intelligence techniques to analyze environment parameters based on outdated channel information and to optimize the scheduling scheme.
The technical solution adopted by the invention is as follows:
a hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method comprises the following steps:
(1) Define the satellite communication state at time t as s(t), where s(t) comprises the channel information of the satellite-user transmission links in the communication environment, the satellite positions, the ground-user positions, and the satellite transmit power;
(2) The artificial-intelligence Agent initially selects an action a(t) by random exploration, where an action is the joint scheduling of satellite transmit-power allocation, user handover, hybrid non-orthogonal/orthogonal multiple-access switching, and subcarrier allocation required by the satellite virtualization scheduling mechanism;
(3) The Agent interacts with the environment using the action of step (2) and obtains the preset reward value r(t); the reward value is determined by the users' actual communication rate;
(4) Repeat steps (1)-(3) within the time T and collect the corresponding training data {s(t), a(t), r(t), s(t+1)}; then update the Q values with a double deep Q-network, where a Q value is the evaluation value of an action for the state at each moment;
(5) Repeat steps (1)-(4) until convergence to obtain the trained Agent;
(6) Under the three conditions of centralized, distributed, and hybrid centralized/distributed information acquisition, iteratively train in the manner of steps (1)-(5) to obtain the corresponding Agents, which execute the satellite virtualization intelligent scheduling.
Further, in step (1), the channel information in the state s(t) is outdated channel information that accounts for time delay, modeled as:

ĥ_AB = ρ_AB · h_AB + √(1 − ρ_AB²) · ω_AB

where ĥ_AB denotes the outdated channel information, ρ_AB is the Jakes autocorrelation model parameter, ω_AB is a circularly symmetric complex Gaussian random variable, and h_AB is the actual channel parameter. The virtualized intelligent scheduling method obtains the outdated channel information in step (1) and establishes the relation between the outdated channel information and the actual communication rate during the training of step (4).
Further, in step (4), the double deep Q-network has two neural networks Q_A and Q_B. The Q value in network Q_A is updated by:

Q_A(s(t), a(t)) = Q_A(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )

The Q value in network Q_B is updated by:

Q_B(s(t), a(t)) = Q_B(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_A(s(t+1), argmax_a Q_B(s(t+1), a)) − Q_B(s(t), a(t)) )

where Q_A(s(t), a(t)) and Q_B(s(t), a(t)) denote the Q values of networks Q_A and Q_B for the state-action pair (s(t), a(t)); δ is the learning rate of the double deep Q-network, which affects the learning speed of the network; r(s(t), a(t)) is the reward value for the state-action pair (s(t), a(t)); τ is the discount factor, which weights the newly obtained reward against the Q value of the original policy in the new policy; and argmax_a{·} denotes selecting the action a that maximizes the Q value in {·};
the double deep Q-network first finds, in the current network Q_A, the action corresponding to the maximum Q value; it then uses that action to obtain the corresponding Q value of network Q_B at the next state, and finally updates the Q value of network Q_A through the update formula; Q_B stays fixed while Q_A is trained and is periodically overwritten as a whole by Q_A;
the training loss function of each neural network in the double deep Q-network is the mean squared temporal-difference error over a training batch; for Q_A:

L_A = (1/T) Σ_{t=1}^{T} ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )²

where T is the number of training data used in one training pass of the neural network, and the loss of Q_B is obtained by exchanging the roles of Q_A and Q_B;
after convergence, for any satellite communication state, the double deep Q-network directly takes the action corresponding to the maximum Q value as the scheduling decision.
Further, in step (6), centralized means that multiple satellites are scheduled by a unified central control node; distributed means that each satellite is controlled individually and no information is shared between satellites; hybrid centralized/distributed means that some satellites are uniformly scheduled by a central control node while the others are controlled individually.
The invention has the beneficial effects that:
1. The invention adopts a system fusing hybrid NOMA/OMA with power-allocation techniques in satellite communication, and uses artificial-intelligence-based deep reinforcement learning to solve the resulting high-complexity optimization problem.
2. The invention considers different information-acquisition conditions (centralized, hybrid centralized/distributed, distributed) and introduces the influence of delayed channel information; through analysis by the artificial-intelligence technique, it is hardly affected by the missing information.
Drawings
Fig. 1 is a system model diagram of satellite communication in the present invention.
FIG. 2 is an interaction diagram of an Agent and satellite communication environment in the present invention.
Fig. 3 is a performance comparison of the centralized DDQN algorithm against the no-power-allocation and no-subcarrier-allocation baselines at different satellite altitudes.
Fig. 4 is a performance comparison of the DDQN algorithm under different information-acquisition conditions (centralized, hybrid centralized/distributed, distributed) at different satellite altitudes.
Detailed Description
To make the principle, technical flow, and optimization effect of the invention clearer, the invention is described in further detail below with reference to the drawings and an embodiment.
A hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method comprises the following steps:
(1) Define the satellite communication state at time t as s(t), where s(t) comprises the channel information of the satellite-user transmission links in the communication environment, the satellite positions, the ground-user positions, and the satellite transmit power;
(2) The artificial-intelligence Agent initially selects an action a(t) by random exploration, where an action is the joint scheduling of satellite transmit-power allocation, user handover, hybrid non-orthogonal/orthogonal multiple-access switching, and subcarrier allocation required by the satellite virtualization scheduling mechanism;
(3) The Agent interacts with the environment using the action of step (2) and obtains the preset reward value r(t); the reward value is determined by the users' actual communication rate;
(4) Repeat steps (1)-(3) within the time T and collect the corresponding training data {s(t), a(t), r(t), s(t+1)}; then update the Q values with a double deep Q-network, where a Q value is the evaluation value of an action for the state at each moment;
(5) Repeat steps (1)-(4) until convergence to obtain the trained Agent;
(6) Considering inter-satellite cooperation and communication overhead in actual communication, three cases are distinguished: centralized (all satellites are managed by a unified central control node, such as a ground control station, which can collect the information of every satellite), distributed (satellites have no direct or indirect communication with one another and exchange no information), and hybrid centralized/distributed (some satellites are managed by a central control node, while the remaining satellites operate in the distributed mode and exchange no information). In each case, the corresponding Agents are obtained by iterative training through steps (1)-(5) and then execute the satellite virtualization intelligent scheduling.
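As a toy illustration of these three groupings (a sketch with invented satellite names and an invented helper function, not anything specified in the patent):

```python
# Illustrative grouping of satellites into Agents for the three
# information-acquisition modes above; names and structure are assumptions
# made only for this sketch.
from typing import Dict, List, Sequence

def assign_agents(satellites: Sequence[str], mode: str,
                  centrally_controlled: Sequence[str] = ()) -> Dict[str, List[str]]:
    """Map each Agent id to the satellites whose information it may use."""
    if mode == "centralized":        # one Agent observes and schedules everything
        return {"central": list(satellites)}
    if mode == "distributed":        # one Agent per satellite, no information sharing
        return {f"agent_{s}": [s] for s in satellites}
    if mode == "hybrid":             # a central node plus independent remainders
        groups = {"central": list(centrally_controlled)}
        groups.update({f"agent_{s}": [s]
                       for s in satellites if s not in centrally_controlled})
        return groups
    raise ValueError(f"unknown mode: {mode}")

print(assign_agents(["LEO1", "LEO2", "LEO3"], "hybrid",
                    centrally_controlled=["LEO1", "LEO2"]))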
In step (1), the channel information in the state s(t) is outdated channel information that accounts for time delay, modeled as:

ĥ_AB = ρ_AB · h_AB + √(1 − ρ_AB²) · ω_AB

where ĥ_AB denotes the outdated channel information, ρ_AB is the Jakes autocorrelation model parameter, ω_AB is a circularly symmetric complex Gaussian random variable, and h_AB is the actual channel parameter. The virtualized intelligent scheduling method obtains the outdated channel information in step (1) and establishes the relation between the outdated channel information and the actual communication rate during the training of step (4).
In step (4), the double deep Q-network has two neural networks Q_A and Q_B. The Q value in network Q_A is updated by:

Q_A(s(t), a(t)) = Q_A(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )

The Q value in network Q_B is updated by:

Q_B(s(t), a(t)) = Q_B(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_A(s(t+1), argmax_a Q_B(s(t+1), a)) − Q_B(s(t), a(t)) )

where Q_A(s(t), a(t)) and Q_B(s(t), a(t)) denote the Q values of networks Q_A and Q_B for the state-action pair (s(t), a(t)); δ is the learning rate of the double deep Q-network, which affects the learning speed of the network; r(s(t), a(t)) is the reward value for the state-action pair (s(t), a(t)); τ is the discount factor, which weights the newly obtained reward against the Q value of the original policy in the new policy; and argmax_a{·} denotes selecting the action a that maximizes the Q value in {·}.
The double deep Q-network first finds, in the current network Q_A, the action corresponding to the maximum Q value; it then uses that action to obtain the corresponding Q value of network Q_B at the next state, and finally updates the Q value of network Q_A through the update formula. Q_B stays fixed while Q_A is trained and is periodically overwritten as a whole by Q_A.
The training loss function of each neural network in the double deep Q-network is the mean squared temporal-difference error over a training batch; for Q_A:

L_A = (1/T) Σ_{t=1}^{T} ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )²

where T is the number of training data used in one training pass of the neural network, and the loss of Q_B is obtained by exchanging the roles of Q_A and Q_B.
After convergence, for any satellite communication state, the double deep Q-network directly takes the action corresponding to the maximum Q value as the scheduling decision.
In satellite virtualization, the method solves the joint intelligent scheduling problem covering satellite transmit-power allocation, user handover, hybrid non-orthogonal/orthogonal multiple-access switching, and subcarrier allocation by training multiple Agents in the centralized, distributed, and hybrid centralized/distributed modes, while accounting for the outdated channel information caused by time delay.
The following is a specific example:
As shown in fig. 1, multiple low-Earth-orbit (LEO) satellites simultaneously serve multiple users in one ground area. Each LEO satellite supports hybrid NOMA/OMA adaptive transmission-mode selection, subcarrier allocation, and power allocation, and at each transmission moment a ground user switches access to the appropriate satellite. Because satellite energy is limited, an upper energy-consumption threshold for serving an area is imposed on the power allocation. The objective is a virtualized scheduling scheme that maximizes the sum of the communication capacities of the satellites serving all users in the area by jointly optimizing the hybrid NOMA/OMA adaptive transmission mode, subcarrier allocation, power allocation, and user-satellite access handover. The method also considers inter-satellite cooperation in actual communication and optimizes under the centralized, distributed, and hybrid centralized/distributed communication environments. In addition, to capture the time-delay problem in actual communication, outdated channel information is introduced.
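The patent text does not enumerate concrete scenario parameters here, so the container below merely illustrates how such a multi-LEO test configuration might be held together; every name and numeric value is a placeholder assumption:

```python
# Illustrative scenario container for fig. 1; all values are placeholders,
# not parameters taken from the patent.
from dataclasses import dataclass

@dataclass
class LeoScenario:
    n_satellites: int = 3        # LEO satellites co-serving one ground area
    n_users: int = 6             # ground users in the area
    n_subcarriers: int = 4       # orthogonal resource blocks per satellite
    altitude_km: float = 600.0   # satellite altitude (the quantity varied in figs. 3-4)
    p_max_w: float = 10.0        # per-satellite transmit-power budget
    e_max_j: float = 500.0       # energy cap while the satellite serves the area

scenario = LeoScenario()         # defaults stand in for one test configuration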
Because this scheduling problem is a highly complex non-convex mixed-integer program, the method uses deep reinforcement learning to maximize the sum of the users' communication capacities in the service area and to learn the influences and connections of changes in the communication environment, so that the optimization scheme adapts to the three cases of centralized, hybrid centralized/distributed, and distributed operation, yielding a multi-LEO cooperative hybrid NOMA/OMA communication virtualization scheduling technique.
To realize the method, a training procedure based on artificial-intelligence techniques is adopted, with the following specific steps:
(1) Define the satellite communication state at a certain time t as s(t); the state comprises the environment parameters required for satellite communication, such as the channel information of the satellite-user transmission links, the satellite positions, the ground-user positions, and the satellite transmit power;
(2) The artificial-intelligence Agent initially selects an action a(t) by random exploration, where an action is the joint scheduling of satellite transmit-power allocation, user handover, hybrid NOMA/OMA switching, and subcarrier allocation required by the satellite virtualization scheduling mechanism;
(3) The Agent interacts with the environment using the action of step (2) and obtains the preset reward value r(t), determined by the users' actual communication rate; the interaction between the Agent and the communication environment is shown in fig. 2;
(4) Repeat steps (1)-(3) within the time T and collect the corresponding training data {s(t), a(t), r(t), s(t+1)}; then update the Q values with the DDQN (Deep Reinforcement Learning with Double Q-Learning) algorithm, where a Q value is the evaluation value of an action for the state at each moment;
(5) Repeat steps (1)-(4) until the algorithm converges.
In step (1), the channel information in the state is outdated channel information that accounts for time delay, modeled as:

ĥ_AB = ρ_AB · h_AB + √(1 − ρ_AB²) · ω_AB

where ĥ_AB denotes the outdated channel information, ρ_AB is the Jakes autocorrelation model parameter, ω_AB is a circularly symmetric complex Gaussian random variable, and h_AB is the actual channel parameter. The scheduling method obtains the outdated channel information in step (1) and establishes the relation between the outdated channel information and the actual communication rate during the training of step (4).
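A minimal NumPy sketch of this outdated-CSI model is given below; the unit-variance Rayleigh channel and the fixed value ρ_AB = 0.9 are assumptions made only for the example:

```python
# Illustrative sketch of the outdated-CSI model above (not the patent's code).
import numpy as np

def outdated_channel(h_true: np.ndarray, rho: float, rng: np.random.Generator) -> np.ndarray:
    """Outdated CSI: h_hat = rho*h + sqrt(1 - rho^2)*omega, with omega ~ CN(0, 1)."""
    omega = (rng.standard_normal(h_true.shape)
             + 1j * rng.standard_normal(h_true.shape)) / np.sqrt(2.0)
    return rho * h_true + np.sqrt(1.0 - rho ** 2) * omega

rng = np.random.default_rng(0)
# True unit-variance Rayleigh channels for 4 satellite-user links (assumption).
h = (rng.standard_normal(4) + 1j * rng.standard_normal(4)) / np.sqrt(2.0)
h_hat = outdated_channel(h, rho=0.9, rng=rng)   # what the scheduler observes in s(t)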
In addition, in step (4), the DDQN algorithm has two neural networks Q_A and Q_B. The Q value in network Q_A is updated by:

Q_A(s(t), a(t)) = Q_A(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )

where δ is the learning rate of the DDQN, which affects the learning speed of the network, and τ is the discount factor, which weights the newly obtained reward against the Q value of the original policy in the new policy. The DDQN first finds, in the current network Q_A, the action corresponding to the maximum Q value, then uses that action to obtain the corresponding Q value of network Q_B at the next state, and then updates the Q value in network Q_A through the update formula.

Network Q_B is updated analogously:

Q_B(s(t), a(t)) = Q_B(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_A(s(t+1), argmax_a Q_B(s(t+1), a)) − Q_B(s(t), a(t)) )

To keep the network from exploring the policy too aggressively, Q_B stays fixed while Q_A is trained and is periodically overwritten as a whole by Q_A. The training loss function of each neural network in the DDQN is the mean squared temporal-difference error over a training batch; for Q_A:

L_A = (1/T) Σ_{t=1}^{T} ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )²

where T is the number of training data used in one training pass.
the DDQN repeats steps (1) - (4) until the algorithm converges. After convergence, the DDQN algorithm directly adopts actions corresponding to the maximum Q value as scheduling decisions for any satellite communication state.
For the satellite communication network in this method, the environment is the entire satellite-communication-network environment, and the state is the data necessary for communication obtained from the environment, including the channel information of the satellite-user transmission links, the satellite positions, the ground-user positions, and the satellite transmit power. The actions are the joint scheduling of satellite transmit-power allocation, user handover, hybrid NOMA/OMA switching, and subcarrier allocation required by the satellite virtualized scheduling mechanism. The reward depends on the actual communication rate obtained after the users access the satellite communication network: the greater the users' total rate, the greater the reward.
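One plausible instantiation of such a rate-based reward (an assumption for illustration; the patent does not spell out the exact expression) computes the per-subcarrier sum rate from the standard two-user downlink NOMA formulas with SIC, or the single-user OMA rate:

```python
# Illustrative rate-based reward; g_* are channel gains, p_* transmit powers,
# n0 the noise power. All numeric values below are arbitrary examples.
import numpy as np

def noma_pair_rate(g_strong, g_weak, p_strong, p_weak, n0):
    """Weak user decodes its signal treating the strong user's as interference;
    the strong user cancels the weak user's signal via SIC before decoding."""
    r_weak = np.log2(1.0 + p_weak * g_weak / (p_strong * g_weak + n0))
    r_strong = np.log2(1.0 + p_strong * g_strong / n0)
    return r_weak + r_strong

def oma_rate(g, p, n0):
    """Single user occupying the orthogonal resource block alone."""
    return np.log2(1.0 + p * g / n0)

reward = noma_pair_rate(g_strong=2.0, g_weak=0.5, p_strong=0.3, p_weak=0.7, n0=0.1)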
In this method, the DDQN algorithm concretely realizes the satellite virtualization network scheduling technique. For the satellite communication environment, a state s(t) at time t is defined; the Agent selects an action a(t) by random exploration; after the environment executes a(t), the state changes from s(t) to the next-moment state s(t+1), and the corresponding reward r(t) is obtained from the users' rate feedback. Two networks Q_A and Q_B are designed inside the agent; the reward r(t) is used to update the evaluation value (Q value) of the state-action pair (s(t), a(t)), and the quality of that pair is judged accordingly.
Thus, in the DDQN algorithm, a large data set {s(t), a(t), r(t), s(t+1)} can be generated by random exploration, and the two networks Q_A and Q_B are then updated iteratively against each other through the Q-value updates. Because the number of state-action combinations in a satellite communication network is enormous, building a table to record the Q values is practically impossible; a deep neural network is therefore employed instead of a table. The network takes the state as input and outputs the actions' values, and the state-to-action mapping inside the neural network carries the Q values.
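As a sketch of this table-to-network replacement, the small PyTorch multilayer perceptron below maps a state vector to one Q value per discrete joint action; the layer sizes are assumptions, not values from the patent.

```python
# Illustrative Q network: state vector in, one Q value per joint action out.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),   # Q(s, a) for every discrete joint action a
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```

The specific algorithm flow is as follows: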
1. Initialize the deep neural networks Q_A and Q_B (i.e., initialize the Q values).
2. Start the loop:
2.1 For the state s(t) at time t, obtain the action a(t) by Q_A-network prediction or random exploration.
2.2 Execute action a(t) to obtain the environment's feedback reward r(t) and the next-moment state s(t+1).
2.3 Generate the training data {s(t), a(t), r(t), s(t+1)}.
2.4 After every T training data are generated, update Q_A with the existing data set through the update formula of Q_A.
2.5 After every N training updates of Q_A, copy the Q values of Q_A to Q_B.
3. Terminate the loop after reaching the preset number of iterations or the convergence target (a condensed sketch of this loop is given below).
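The condensed sketch below strings these numbered steps together, reusing the QNetwork sketched above. `SatelliteEnv` is hypothetical: any object whose reset() returns a NumPy state vector and whose step(a) returns (next_state, reward, done) can stand in for the satellite communication environment; all hyperparameter defaults are illustrative assumptions.

```python
# Illustrative DDQN training loop for steps 1-3 above (not the patent's code).
import random
import numpy as np
import torch
import torch.nn.functional as F

def train(env, state_dim, n_actions, iters=500, T=64, N=10,
          delta=1e-3, tau=0.9, eps=0.1):
    q_a = QNetwork(state_dim, n_actions)             # step 1: initialize Q_A, Q_B
    q_b = QNetwork(state_dim, n_actions)
    q_b.load_state_dict(q_a.state_dict())
    opt = torch.optim.Adam(q_a.parameters(), lr=delta)
    buffer, s = [], env.reset()
    for it in range(1, iters + 1):                   # step 2: the loop
        for _ in range(T):
            if random.random() < eps:                # 2.1 random exploration
                a = random.randrange(n_actions)
            else:                                    # 2.1 Q_A-network prediction
                with torch.no_grad():
                    a = int(q_a(torch.as_tensor(s, dtype=torch.float32)).argmax())
            s_next, r, done = env.step(a)            # 2.2 reward and next state
            buffer.append((s, a, r, s_next))         # 2.3 training data
            s = env.reset() if done else s_next
        batch = random.sample(buffer, min(T, len(buffer)))   # 2.4 update Q_A
        ss = torch.as_tensor(np.stack([b[0] for b in batch]), dtype=torch.float32)
        aa = torch.as_tensor([b[1] for b in batch], dtype=torch.int64)
        rr = torch.as_tensor([b[2] for b in batch], dtype=torch.float32)
        sn = torch.as_tensor(np.stack([b[3] for b in batch]), dtype=torch.float32)
        with torch.no_grad():
            a_star = q_a(sn).argmax(dim=1, keepdim=True)     # action chosen by Q_A
            target = rr + tau * q_b(sn).gather(1, a_star).squeeze(1)  # valued by Q_B
        q_sa = q_a(ss).gather(1, aa.unsqueeze(1)).squeeze(1)
        loss = F.mse_loss(q_sa, target)              # mean squared TD error over T samples
        opt.zero_grad(); loss.backward(); opt.step()
        if it % N == 0:                              # 2.5 periodically copy Q_A -> Q_B
            q_b.load_state_dict(q_a.state_dict())
    return q_a                                       # step 3: trained Agent network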
The tests of the method are shown in figs. 3-4. The results show the performance of the proposed algorithm against the no-power-allocation, no-subcarrier-allocation, and no-power-no-subcarrier-allocation baselines at different satellite altitudes, for convenient comparison. In addition, the method offers a free choice among centralized, hybrid centralized/distributed, and distributed operation and gives the corresponding rate-performance comparison.
From the results for the centralized case in fig. 3, the proposed DDQN-based algorithm performs best, greatly increasing the total communication rate compared with no power allocation (equally divided power). The effect of subcarrier selection is insignificant because, in long-range satellite communication, small-scale fading is much weaker than large-scale fading, so the gain of subcarrier selection is small compared with that of power allocation. From the results in fig. 4, the centralized mode performs best, the hybrid centralized/distributed mode is second, and the distributed mode is lowest. This matches the logic of the scenario design: global optimization is at least as strong as the combination of local optimizations.
Therefore, the invention can guarantee the communication rate of a multi-LEO multi-user communication system while accounting for delayed channel information. Meanwhile, the low-complexity neural network saves computing resources during real-time computation.
In summary, the invention considers communication between multiple satellites and multiple users under time-delay effects, and improves the total communication sum rate during satellite access by using the DDQN deep reinforcement learning algorithm to jointly optimize power allocation, NOMA/OMA mode switching, subcarrier selection, and user pairing. Under different virtualized-network information-acquisition conditions (centralized, hybrid centralized/distributed, and distributed), the invention adaptively adjusts the optimization scheme to improve the communication rate.
The embodiment described above is only one specific embodiment of the invention, not all embodiments. Any other embodiment obtained by a person skilled in the art from this embodiment without inventive effort falls within the protection scope of the invention.
Claims (4)
1. A hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method is characterized by comprising the following steps:
(1) Defining the satellite communication state at time t as s(t), wherein s(t) comprises the channel information of the satellite-user transmission links in the communication environment, the satellite positions, the ground-user positions, and the satellite transmit power;
(2) An artificial-intelligence Agent initially selecting an action a(t) by random exploration, wherein an action is the joint scheduling of satellite transmit-power allocation, user handover, hybrid non-orthogonal/orthogonal multiple-access switching, and subcarrier allocation required by the satellite virtualization scheduling mechanism;
(3) The Agent interacting with the environment using the action of step (2) and obtaining a preset reward value r(t), wherein the reward value is determined by the users' actual communication rate;
(4) Repeating steps (1)-(3) within the time T and collecting the corresponding training data {s(t), a(t), r(t), s(t+1)}; then updating the Q values with a double deep Q-network, wherein a Q value is the evaluation value of an action for the state at each moment;
(5) Repeating steps (1)-(4) until convergence to obtain the trained Agent;
(6) Under the three conditions of centralized, distributed, and hybrid centralized/distributed information acquisition, performing iterative training in the manner of steps (1)-(5) to obtain the corresponding Agents, which execute the satellite virtualization intelligent scheduling.
2. The hybrid non-orthogonal/orthogonal multiple access virtualized intelligent scheduling method of claim 1, wherein in step (1) the channel information in the state s(t) is outdated channel information that accounts for time delay, modeled as:

ĥ_AB = ρ_AB · h_AB + √(1 − ρ_AB²) · ω_AB

wherein ĥ_AB denotes the outdated channel information, ρ_AB is the Jakes autocorrelation model parameter, ω_AB is a circularly symmetric complex Gaussian random variable, and h_AB is the actual channel parameter; the virtualized intelligent scheduling method obtains the outdated channel information in step (1) and establishes the relation between the outdated channel information and the actual communication rate during the training of step (4).
3. The hybrid non-orthogonal/orthogonal multiple access virtualized intelligent scheduling method of claim 1, wherein in step (4) the double deep Q-network has two neural networks Q_A and Q_B; the Q value in network Q_A is updated by:

Q_A(s(t), a(t)) = Q_A(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )

the Q value in network Q_B is updated by:

Q_B(s(t), a(t)) = Q_B(s(t), a(t)) + δ · ( r(s(t), a(t)) + τ · Q_A(s(t+1), argmax_a Q_B(s(t+1), a)) − Q_B(s(t), a(t)) )

wherein Q_A(s(t), a(t)) and Q_B(s(t), a(t)) denote the Q values of networks Q_A and Q_B for the state-action pair (s(t), a(t)); δ is the learning rate of the double deep Q-network, which affects the learning speed of the network; r(s(t), a(t)) is the reward value for the state-action pair (s(t), a(t)); τ is the discount factor, which weights the newly obtained reward against the Q value of the original policy in the new policy; and argmax_a{·} denotes selecting the action a that maximizes the Q value in {·};

the double deep Q-network first finds, in the current network Q_A, the action corresponding to the maximum Q value, then uses that action to obtain the corresponding Q value of network Q_B at the next state, and then updates the Q value of network Q_A through the update formula; Q_B stays fixed while Q_A is trained and is periodically overwritten as a whole by Q_A;

the training loss function of each neural network in the double deep Q-network is the mean squared temporal-difference error over a training batch; for Q_A:

L_A = (1/T) Σ_{t=1}^{T} ( r(s(t), a(t)) + τ · Q_B(s(t+1), argmax_a Q_A(s(t+1), a)) − Q_A(s(t), a(t)) )²

wherein T is the number of training data used in one training pass of the neural network;

after convergence, for any satellite communication state, the double deep Q-network directly takes the action corresponding to the maximum Q value as the scheduling decision.
4. The hybrid non-orthogonal/orthogonal multiple access virtualized intelligent scheduling method of claim 1, wherein in step (6) centralized means that multiple satellites are scheduled by a unified central control node; distributed means that each satellite is controlled individually and no information is shared between satellites; and hybrid centralized/distributed means that some satellites are uniformly scheduled by a central control node while the other satellites are controlled individually.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211568877.1A CN116170052B (en) | 2022-12-08 | 2022-12-08 | Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211568877.1A CN116170052B (en) | 2022-12-08 | 2022-12-08 | Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116170052A (publication) | 2023-05-26 |
| CN116170052B (grant) | 2024-11-26 |
Family
ID=86410153
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202211568877.1A Active CN116170052B (en) | 2022-12-08 | 2022-12-08 | Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116170052B (en) |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210078735A1 (en) * | 2019-09-18 | 2021-03-18 | Bae Systems Information And Electronic Systems Integration Inc. | Satellite threat mitigation by application of reinforcement machine learning in physics based space simulation |
| CN111867104A (en) * | 2020-07-15 | 2020-10-30 | 中国科学院上海微系统与信息技术研究所 | A power distribution method and power distribution device for low-orbit satellite downlink |
| CN114362810A (en) * | 2022-01-11 | 2022-04-15 | 重庆邮电大学 | Low-orbit satellite beam hopping optimization method based on migration depth reinforcement learning |
| CN114698045A (en) * | 2022-03-30 | 2022-07-01 | 西安交通大学 | Serial Q learning distributed switching method and system under large-scale LEO satellite network |
Non-Patent Citations (1)
| Title |
|---|
| ZHOU Biying et al.: "Reinforcement-learning-based satellite network resource scheduling mechanism" (基于强化学习的卫星网络资源调度机制), Computer Engineering & Science (计算机工程与科学), no. 12, 15 December 2019 (2019-12-15) |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116600387A (en) * | 2023-06-02 | 2023-08-15 | 中国人民解放军军事科学院系统工程研究院 | Multidimensional resource allocation method |
| CN116600387B (en) * | 2023-06-02 | 2024-02-06 | 中国人民解放军军事科学院系统工程研究院 | Multidimensional resource allocation method |
| CN119783554A (en) * | 2025-03-10 | 2025-04-08 | 北京邮电大学 | Training method of satellite routing prediction model and low-orbit satellite routing method |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116170052B (en) | 2024-11-26 |
Similar Documents
| Publication | Title |
|---|---|
| Rahman et al. | Deep reinforcement learning based computation offloading and resource allocation for low-latency fog radio access networks | |
| Wang et al. | Intelligent cognitive radio in 5G: AI-based hierarchical cognitive cellular networks | |
| Abouaomar et al. | Federated deep reinforcement learning for open RAN slicing in 6G networks | |
| CN113163451A (en) | D2D communication network slice distribution method based on deep reinforcement learning | |
| CN109474980A (en) | A wireless network resource allocation method based on deep reinforcement learning | |
| CN114665952A (en) | A beam-hopping optimization method for low-orbit satellite networks based on satellite-ground fusion architecture | |
| CN112383922A (en) | Deep reinforcement learning frequency spectrum sharing method based on prior experience replay | |
| CN107919986A (en) | VM migrates optimization method between MEC nodes in super-intensive network | |
| CN118042528B (en) | Self-adaptive load balancing ground user access method for unmanned aerial vehicle auxiliary network | |
| CN116170052B (en) | Hybrid non-orthogonal/orthogonal multiple access satellite virtualization intelligent scheduling method | |
| CN114173421B (en) | LoRa logical channel and power allocation method based on deep reinforcement learning | |
| CN110519849B (en) | Communication and computing resource joint allocation method for mobile edge computing | |
| CN101820671A (en) | Particle swarm algorithm-based distributed power distributing method used for OFDMA system | |
| CN115633033B (en) | Collaborative energy-saving computing migration method integrating RF energy harvesting | |
| CN118828603B (en) | A method for user association and resource allocation in cellular networks based on deep reinforcement learning | |
| CN114217638A (en) | Dynamic deployment method for information acquisition of mobile station under task driving | |
| Su et al. | A power allocation scheme based on deep reinforcement learning in HetNets | |
| CN115633402A (en) | Resource scheduling method for mixed service throughput optimization | |
| Liang et al. | Decentralized bit, subcarrier and power allocation with interference avoidance in multicell OFDMA systems using game theoretic approach | |
| Geng et al. | The study on anti-jamming power control strategy based on Q-learning | |
| Wang et al. | Distributed Q-learning for interference mitigation in self-organised femtocell networks: Synchronous or asynchronous? | |
| CN116133142B (en) | Intelligent framework-based high-energy-efficiency uplink resource allocation method under QoS constraint | |
| Luong et al. | Resource allocation in UAV-Assisted wireless networks using reinforcement learning | |
| CN115052356A (en) | Resource allocation method and system based on imitation learning | |
| Anzaldo et al. | Training Effect on AI-based Resource Allocation in small-cell networks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |