
CN112637965A - Game-based Q learning competition window adjusting method, system and medium - Google Patents


Info

Publication number
CN112637965A
CN112637965A (application CN202011620219.3A; granted as CN112637965B)
Authority
CN
China
Prior art keywords: node, network, learning, nodes, competition window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011620219.3A
Other languages
Chinese (zh)
Other versions
CN112637965B (en)
Inventor
俞晖
毛中杰
王政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiao Tong University
Original Assignee
Shanghai Jiao Tong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiao Tong University filed Critical Shanghai Jiao Tong University
Priority to CN202011620219.3A
Publication of CN112637965A
Application granted
Publication of CN112637965B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W74/00: Wireless channel access
    • H04W74/08: Non-scheduled access, e.g. ALOHA
    • H04W74/0808: Non-scheduled access, e.g. ALOHA using carrier sensing, e.g. carrier sense multiple access [CSMA]
    • H04W74/0816: Non-scheduled access, e.g. ALOHA using carrier sensing, e.g. carrier sense multiple access [CSMA] with collision avoidance
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W74/00: Wireless channel access
    • H04W74/08: Non-scheduled access, e.g. ALOHA
    • H04W74/0833: Random access procedures, e.g. with 4-step access
    • H04W74/0841: Random access procedures, e.g. with 4-step access with collision treatment
    • H04W74/085: Random access procedures, e.g. with 4-step access with collision treatment collision avoidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract



The present invention provides a game-based Q-learning contention window adjustment method, system and medium, comprising: Step 1: initialize the network node settings, perform ad hoc networking and establish a routing table; Step 2: every node in the network learns its number of one-hop neighbor nodes from the routing table and broadcasts it to its neighbors; Step 3: calculate each node's weight in the network and broadcast it; Step 4: adopt different backoff strategies according to the network difference; Step 5: each node in the network performs Q-learning on the contention window state set produced in Step 4, outputs the optimal contention window interval, and communicates accordingly; Step 6: repeat Steps 2-5 after the network topology changes or the traffic load fluctuates greatly. The invention uses game theory to analyze the network scenario, determines the Q-learning state sets of different nodes, and then uses the Q-learning algorithm to generate decisions and update the contention window interval, so as to optimize the overall network performance.


Description

Game-based Q learning competition window adjusting method, system and medium
Technical Field
The invention relates to the technical field of mobile ad hoc networks, in particular to a method, a system and a medium for adjusting a Q learning competition window based on a game.
Background
In recent years, with the rapid development of wireless communication technology, mobile ad hoc networks have been widely applied in many fields, such as earthquake relief and remote reconnaissance in areas without public network coverage, by virtue of their high mobility, strong adaptability and low cost. At the same time, the strong dynamics and large topology differences of mobile ad hoc networks pose challenges to network design, and in particular place high demands on the design of the backoff algorithm in the access protocol.
Currently, the access technologies widely applied in mobile ad hoc networks are the Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol and the Binary Exponential Backoff (BEB) algorithm. However, the existing backoff algorithm cannot adapt to the high dynamics and large topology differences of mobile ad hoc networks. Because a network protocol has many parameters and complicated algorithms, the way a given parameter influences a final performance index is hard to express with a mathematical formula that admits a closed-form optimum, so a large body of research optimizes mobile ad hoc networks with machine learning methods. Research on vehicle-mounted MAC-layer channel access based on multi-agent Q-learning introduces the Q-learning algorithm into vehicular ad hoc networks and proposes a dynamic contention window adjustment algorithm based on Q-learning: during communication, a vehicle node always selects the CW value that maximizes the accumulated reward for backoff, thereby improving the transmission success rate and reducing delay; however, it does not consider the large topology differences found in mobile ad hoc networks.
In a mobile ad hoc network, multi-hop communication often occurs because of node mobility and the complexity of the network scenario. In such a distributed communication network, the load of different nodes varies with factors such as their geographical location. A node in the network center has more neighbor nodes and correspondingly carries more forwarding traffic, so its traffic load is larger; a node at the network edge has fewer neighbor nodes and often carries little or no forwarding traffic, so its traffic load is smaller. A game-based reinforcement-learning backoff window design scheme is proposed to address these node differences, improving overall network performance by setting targeted backoff window intervals for different nodes.
Patent document CN107426772A (application number: CN201710537493.6) discloses a dynamic contention window adjustment method, device and apparatus based on Q learning, the method includes: A. initializing channel access parameters and initial annealing temperature; B. transmitting and acquiring a first throughput of data packet transmission under the size of an initial contention window; C. under the first throughput, adopting a simulated annealing algorithm to obtain the size of a first competition window; D. acquiring a second throughput of data packet transmission under the size of the first contention window; E. updating the Q value table according to a preset formula for the second throughput and the target throughput, and updating the initial annealing temperature by using preset conditions to obtain the updated annealing temperature; and when the updated annealing temperature is higher than the minimum threshold value, taking the updated annealing temperature as the initial annealing temperature, executing the steps B to E, repeatedly updating the Q value table and obtaining the updated annealing temperature, and stopping updating the Q value table until the updated annealing temperature is lower than or equal to the minimum threshold value, thus obtaining the optimal competition window.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a method, a system and a medium for adjusting a game-based Q learning competition window.
The game-based Q learning competition window adjusting method provided by the invention comprises the following steps:
Step 1: initializing the network node settings, including the communication protocol, network topology, service arrival model and physical layer standard; after communication starts, the network performs ad hoc networking and establishes a routing table by broadcasting routing information; in the initial state a node backs off according to the default contention window size, and thereafter adjusts the contention window size for backoff according to the action output by the Q-learning algorithm;
Step 2: every node in the network learns its number of one-hop neighbor nodes from the routing table and broadcasts it to its neighbors via RTS/CTS signaling; meanwhile each node receives the RTS/CTS information broadcast by its neighbors, from which it calculates the average neighbor count of its neighbor nodes; each node also counts the traffic load per unit time by querying the traffic waiting in its buffer and derives the corresponding load factor;
Step 3: calculating the node's weight in the network and broadcasting it; the node computes a network difference index from the maximum and minimum weights within its one-hop range and then plays the game according to this index;
Step 4: if the network difference is greater than a preset threshold, adopting a balanced backoff strategy in which the node with the largest weight in the network uses a smaller contention window state set in reinforcement learning and the other nodes use a larger contention window state set; otherwise adopting the default backoff strategy, in which all nodes in the network use the same contention window state set;
Step 5: each node in the network performs Q-learning on the contention window state set produced in Step 4, outputs the optimal contention window interval, and communicates accordingly;
Step 6: repeating Steps 2-5 after the network topology changes or the traffic load fluctuates greatly.
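For orientation, the per-node control flow of steps 1-6 can be sketched as follows. This is a minimal illustration, not the patented implementation: the node object and its helper methods (count_one_hop_neighbors, run_q_learning, and so on), as well as the weight and difference-index formulas used here, are placeholders introduced only for this example; the candidate state sets {7, 15, 31}, {63, 127, 255} and the default window left endpoint 15 are the values given later in the embodiment.

```python
# Structural sketch of the per-node adaptation loop (steps 2-6).
# All helper methods and the formulas below are illustrative placeholders,
# not the patent's exact definitions.

def adjust_contention_window(node, G_threshold=2.0):
    while node.is_running():
        # Step 2: neighbor counts and load factor
        n_k = node.count_one_hop_neighbors()        # from the routing table
        node.broadcast_neighbor_count(n_k)          # via RTS/CTS signaling
        avg_nbr = node.average_neighbor_count()     # from received RTS/CTS
        load_factor = node.compute_load_factor()    # from buffered traffic

        # Step 3: weight and difference index (assumed forms)
        weight = node.compute_weight(load_factor, n_k, avg_nbr)
        node.broadcast_weight(weight)
        y_max, y_min = node.one_hop_weight_extremes()
        G = y_max / max(y_min, 1e-9)                # assumed form of the index

        # Step 4: game decision on the contention window state set
        if G > G_threshold and weight == y_max:
            state_set = [7, 15, 31]                 # smaller CW_min values
        elif G > G_threshold:
            state_set = [63, 127, 255]              # larger CW_min values
        else:
            state_set = [15]                        # default strategy: CW_min fixed at 15

        # Step 5: Q-learning over the chosen state set, then communicate
        best_cw_min = node.run_q_learning(state_set)
        node.communicate_with(best_cw_min)

        # Step 6: wait until topology or load changes, then repeat
        node.wait_for_topology_or_load_change()
```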
Preferably, the importance of a node in the network is confirmed by querying the routing table and by traffic statistics. For any node k, the number of one-hop neighbor nodes N_k is obtained by querying the routing table, the neighbor counts of the neighbors within the one-hop communication range, summing to Σ_i N_i, are obtained from the RTS/CTS additional bit information, and the average neighbor count of the neighbor nodes is calculated as (Σ_i N_i)/N_k.
For any node k, the traffic load L_k in the current time period is counted, and whether the node is a heavy-service node is decided from the size of L_k relative to the channel transmission rate R_k, expressed by the load factor l_k (formula given as an image in the original and not reproduced here).
The weight is calculated from the load factor l_k, the neighbor count N_k and the average neighbor count of the neighbor nodes (formula given as an image in the original), wherein b is a constant that guarantees a basic weight for edge nodes in the network.
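A small numeric sketch of this weight computation is given below. The average-neighbor expression follows directly from the description above; the load-factor and weight expressions, however, are not reproduced in the extracted text, so the forms used here (a linear/exponential split around L_k = R_k and a simple product-plus-constant weight) are assumptions made only for illustration.

```python
import math

def average_neighbor_count(neighbor_counts):
    """Mean neighbor count of a node's one-hop neighbors: (sum_i N_i) / N_k."""
    return sum(neighbor_counts) / len(neighbor_counts)

def load_factor(L_k, R_k):
    """Illustrative load factor: linear for heavy-service nodes (l_k >= 1),
    exponentially damped below 1 for lightly loaded nodes.
    The patent's exact expression is not reproduced in the source text."""
    ratio = L_k / R_k
    return ratio if ratio >= 1.0 else math.exp(ratio - 1.0)

def weight(l_k, N_k, avg_neighbors, b=0.1):
    """Illustrative weight: grows with load factor and relative connectivity;
    b guarantees a basic weight for edge nodes (assumed functional form)."""
    return l_k * N_k / max(avg_neighbors, 1.0) + b

# Example: a central node with 8 one-hop neighbors, each of which has 1 neighbor.
nbr_counts = [1] * 8
avg = average_neighbor_count(nbr_counts)       # 1.0
l_c = load_factor(L_k=3.0, R_k=1.0)            # heavy-service node -> 3.0
print(weight(l_c, N_k=8, avg_neighbors=avg))   # large weight for the central node
```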
Preferably, the node with the maximum weight within its one-hop communication range broadcasts its weight y_i through RTS/CTS control signaling, learns the minimum node weight within the one-hop range, and calculates a difference index G from the maximum and minimum weights in that range (the expression for G is given as an image in the original); the degree of difference among the network nodes is determined from the value of G;
if the G value exceeds a preset threshold value, judging that the difference of the network is large, and adopting a balanced backoff strategy: the node with the largest weight adopts a smaller competition window interval, namely a smaller left value under the same right value; the node with the smallest weight adopts a larger competition window interval, namely a larger left value under the same right value.
Preferably, the node serves as the agent, and the left endpoint of the contention window interval in the backoff algorithm serves as the environment state set, i.e. CW_min = 2^s - 1; the value taken by the contention window left endpoint CW_min in each unit time serves as the action set (given as an image in the original).
The network transmission success rate and the average time delay are taken as optimization targets to learn, and the updating formula is as follows:
Q(S,A) ← Q(S,A) + α[r + γ·max_a Q(S',a) - Q(S,A)]
wherein γ is the discount factor, representing how strongly past actions influence the current action; r is the reward, the value of the reward function obtained by taking action A in the current state S, evaluated from the transmission success rate and average delay indices; α is the convergence factor, the main factor affecting the convergence speed; and S' is the past state.
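A compact sketch of this tabular Q-learning update is shown below. The state is the exponent s (so that CW_min = 2^s - 1); the action set is assumed to be {decrease, keep, increase} on s, since the exact action set is not reproduced in the extracted text, and the reward r is supplied by the caller from the measured transmission success rate and average delay.

```python
import random

class ContentionWindowAgent:
    """Tabular Q-learning over the contention-window exponent s (CW_min = 2**s - 1)."""

    def __init__(self, s_values, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.s_values = list(s_values)         # allowed exponents (state set from the game)
        self.actions = [-1, 0, +1]             # assumed action set: shrink / keep / grow s
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = {(s, a): 0.0 for s in self.s_values for a in self.actions}

    def choose_action(self, s):
        if random.random() < self.epsilon:     # epsilon-greedy exploration
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def step(self, s, a):
        """Apply action a to state s, clipped to the allowed state set."""
        idx = self.s_values.index(s)
        idx = min(max(idx + a, 0), len(self.s_values) - 1)
        return self.s_values[idx]

    def update(self, s, a, r, s_next):
        """Q(S,A) <- Q(S,A) + alpha * [r + gamma * max_a' Q(S',a') - Q(S,A)]."""
        best_next = max(self.Q[(s_next, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.Q[(s, a)])

# Usage: s in {3, 4, 5} corresponds to CW_min in {7, 15, 31} (the small state set).
agent = ContentionWindowAgent(s_values=[3, 4, 5])
s = 4
a = agent.choose_action(s)
s_next = agent.step(s, a)
agent.update(s, a, r=0.2, s_next=s_next)   # r measured from PDR/delay changes
print("CW_min =", 2 ** s_next - 1)
```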
The game-based Q learning competition window adjusting system provided by the invention comprises:
module M1: initializing network node settings, including a communication protocol, a network topology, a service arrival model and a physical layer standard, performing ad hoc networking and establishing a routing table by a network in a mode of broadcasting routing information after communication is started, and in an initial state, performing backoff by the node according to the size of a default contention window, and then adjusting the size of the contention window according to an action output by a Q learning algorithm to perform backoff;
module M2: the whole network node acquires the number of one-hop neighbor nodes through a routing table, broadcasts the number to the neighbor nodes through RTS/CTS signaling, receives RTS/CTS information broadcast by the neighbor nodes, acquires and calculates the average number of the neighbor nodes, and calculates the size of a service load in unit time by inquiring the service to be sent in a cache region and calculates a corresponding load factor;
module M3: calculating the weight of the node in the network, broadcasting, calculating a network difference index by the node according to the maximum weight and the minimum weight in a one-hop range, and then playing games according to the network difference index;
module M4: if the network difference is larger than a preset threshold value, a balanced backoff strategy is adopted, the node with the maximum weight in the network adopts a smaller competition window state set in the reinforcement learning, and other nodes adopt a larger competition window state set in the reinforcement learning; otherwise, adopting a default backoff strategy, and adopting the same competition window state set by all nodes in the network;
module M5: each node in the network performs Q learning according to the competition window state set generated by the module M4, outputs an optimal competition window interval and performs communication according to the optimal competition window interval;
module M6: after the network topology changes or the traffic load greatly fluctuates, the modules M2-M5 are called in sequence.
Preferably, the importance of a node in the network is confirmed by querying the routing table and by traffic statistics. For any node k, the number of one-hop neighbor nodes N_k is obtained by querying the routing table, the neighbor counts of the neighbors within the one-hop communication range, summing to Σ_i N_i, are obtained from the RTS/CTS additional bit information, and the average neighbor count of the neighbor nodes is calculated as (Σ_i N_i)/N_k.
For any node k, the traffic load L_k in the current time period is counted, and whether the node is a heavy-service node is decided from the size of L_k relative to the channel transmission rate R_k, expressed by the load factor l_k (formula given as an image in the original).
The weight is calculated from the load factor l_k, the neighbor count N_k and the average neighbor count of the neighbor nodes (formula given as an image in the original), wherein b is a constant that guarantees a basic weight for edge nodes in the network.
Preferably, the node with the maximum weight within its one-hop communication range broadcasts its weight y_i through RTS/CTS control signaling, learns the minimum node weight within the one-hop range, and calculates a difference index G from the maximum and minimum weights in that range (the expression for G is given as an image in the original); the degree of difference among the network nodes is determined from the value of G;
if the G value exceeds a preset threshold value, judging that the difference of the network is large, and adopting a balanced backoff strategy: the node with the largest weight adopts a smaller competition window interval, namely a smaller left value under the same right value; the node with the smallest weight adopts a larger competition window interval, namely a larger left value under the same right value.
Preferably, the node serves as the agent, and the left endpoint of the contention window interval in the backoff algorithm serves as the environment state set, i.e. CW_min = 2^s - 1; the value taken by the contention window left endpoint CW_min in each unit time serves as the action set (given as an image in the original).
The network transmission success rate and the average time delay are taken as optimization targets to learn, and the updating formula is as follows:
Q(S,A) ← Q(S,A) + α[r + γ·max_a Q(S',a) - Q(S,A)]
wherein γ is the discount factor, representing how strongly past actions influence the current action; r is the reward, the value of the reward function obtained by taking action A in the current state S, evaluated from the transmission success rate and average delay indices; α is the convergence factor, the main factor affecting the convergence speed; and S' is the past state.
According to the present invention, a computer-readable storage medium is provided, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as described above.
Compared with the prior art, the invention has the following beneficial effects:
the method analyzes the network scene by using the game theory, determines the state set of Q learning of different nodes, generates a decision by using a Q learning algorithm, and updates the competition window interval so as to optimize the overall network performance.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a diagram of a cooperative gaming system model;
FIG. 2 is a cooperative gaming simulation topology;
FIG. 3 is a transmission success rate curve for a cooperative game;
FIG. 4 is a graph of average delay for a cooperative game;
fig. 5 is a transmission success rate curve of a central node of the cooperative game;
FIG. 6 is a graph of average delay for a central node of a cooperative game;
fig. 7 is a transmission success rate curve of the edge nodes of the cooperative game;
FIG. 8 is a graph of average delay for the edge nodes of the cooperative game;
fig. 9 is a flow diagram of a cooperative gaming implementation.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention; all such changes and modifications fall within the scope of the present invention.
Example:
the invention provides a design scheme for using a game-based reinforcement learning competition window in a mobile ad hoc network, which comprises the following steps: the network node inquires the number of neighbor nodes and the load size of the network node by inquiring a routing table and a traffic statistic mode, broadcasts the information by an RTS/CTS mechanism, further determines the difference of the network, judges which game strategy is adopted according to the difference of the network, determines a state set of reinforcement learning according to the corresponding game strategy, then performs reinforcement learning, finally determines the interval of a competition window, and performs communication according to the state set of reinforcement learning. The method adopts a mode that the system carries out communication and training at the same time, namely, the system carries out reinforcement learning and communication according to the current state set, when the network load changes or the topology changes, the network node carries out decision-making again to determine the state set, then carries out reinforcement learning and communication according to the new state set, and continuously repeats the processes until the iteration is terminated.
The invention adopts the Q-Learning algorithm as the reinforcement learning method, and the optimization object is the left endpoint of the contention window CW, namely CW_min. Aiming at improving the system transmission success rate and reducing the average delay, the communication environment is modeled as follows:
State S: the current minimum contention window size of a communicating node, CW_min = 2^s - 1. The minimum contention window size directly determines the lower bound of the backoff time and indirectly affects its average value, and it significantly influences channel collisions and data delay; the set of states s is given by the game strategy.
Action A: the action in the present invention adjusts the size of the state s (the action set is given as an image in the original).
in a communication system, a network node needs to communicate according to existing parameters for a period of time before communication performance can be obtained, so that the change of actions is periodic, and after equal time intervals, the state s is adjusted.
Reward R: the invention takes the index variable quantity obtained by the periodic communication of the network nodes as the reward, and the index is the transmission success rate PDR and the average DELAY DELAY. In order to effectively reflect the degree of change of indexes with different dimensions, the indexes are normalized as follows:
Figure BDA0002872142670000062
Figure BDA0002872142670000063
R=μ*PDR+(1-μ)*DELAY
in the formula, mu is a weight factor and represents the proportion of the transmission success rate variation in the reward; PDRnow,PDRpastRespectively indicating the transmission success rate in the current unit time and the transmission success rate in the previous unit time; delaynow,delaypastRespectively, the average time delay in the current unit time and the average time delay in the previous unit time.
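The reward combination R = μ*PDR + (1 - μ)*DELAY can be sketched as below. The exact normalization of the two index changes is not reproduced in the extracted text, so the relative-change normalization used here is an assumption; it only has to make the two terms dimensionless and positive when the corresponding index improves.

```python
def reward(pdr_now, pdr_past, delay_now, delay_past, mu=0.5):
    """Illustrative reward: weighted sum of normalized index changes.
    The normalization below (relative change) is an assumed stand-in for the
    patent's formulas; a positive term means the index improved."""
    pdr_term = (pdr_now - pdr_past) / max(pdr_past, 1e-9)          # higher PDR is better
    delay_term = (delay_past - delay_now) / max(delay_past, 1e-9)  # lower delay is better
    return mu * pdr_term + (1.0 - mu) * delay_term

# Example: PDR rises from 0.90 to 0.95, delay drops from 4.0 ms to 3.0 ms.
print(reward(0.95, 0.90, 3.0, 4.0, mu=0.6))  # positive reward
```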
Transition probability P: in the invention, all nodes independently perform the Q learning algorithm, and the selected action can be accurately executed by the communication equipment, so that the state transition probability is 1.
In a distributed network scenario, the traffic loads of different nodes differ, which amplifies the influence of the choice of the state set s on network performance: if the state-set interval is large, the reinforcement learning of different nodes may oscillate repeatedly; if the state sets differ too little, the backoff algorithm degenerates into a fixed-parameter mechanism that cannot adapt to changes in the environment. The invention therefore proposes a contention window design scheme based on the Cournot duopoly model of game theory to adapt to differentiated network topologies.
The Cournot duopoly model from the theory of industrial organization states that market capacity is limited and that the output chosen by each firm determines the product price and, in turn, the firms' profits. This is analogous to the channel access problem: if the nodes' channel access probabilities are low, the channel is under-utilized, while if every node accesses the channel frequently, collisions occur and channel utilization drops.
The influence of the backoff window on average delay and transmission success rate can be observed: under a given traffic load, a larger backoff window lowers the collision probability but increases the average delay, so the transmission success rate and the average delay trade off against each other. A Cournot duopoly model is therefore used to describe the decision behavior of two nodes, then extended to the game behavior of multiple nodes, and finally a cooperative game model is designed for network scenarios with large differences.
Modeling the game theory:
Let q_1 and q_2 denote the channel-access capabilities of nodes 1 and 2, respectively (set by adjusting the backoff parameters), and let Q = q_1 + q_2 denote the total access capability in the network. Let k(a - Q) denote the capability of the network to transmit successfully under these backoff parameters, where a represents the maximum transmission capability of the channel and k is a constant. Let the signaling overhead of the network be a constant c with c < a. Following the Cournot assumption, the two nodes decide their access capabilities simultaneously.
Assuming that a node's profit is positively and linearly correlated with the number of data packets it transmits successfully per unit time, in a standard two-participant game the profit u_i of participant i can be written as:
u_i(q_i, q_j) = q_i[k(a - Q - c)]    (1)
In the Cournot duopoly model, a pair of decisions (q_1*, q_2*) is a Nash equilibrium if, for each node i, q_i* solves the maximization problem
max_{0 ≤ q_i < ∞} u_i(q_i, q_j*) = q_i[k(a - q_i - q_j* - c)]    (2)
Under Nash equilibrium, the access parameters of the nodes must satisfy the first-order conditions
q_i* = (a - q_j* - c)/2,  i, j ∈ {1, 2}, i ≠ j    (3)
Solving this system of equations yields
q_1* = q_2* = (a - c)/3
which satisfies the above assumptions.
The physical meaning of this equilibrium is that each network node, wishing to monopolize the channel, selects the parameter q_i that maximizes its own profit π_i(q_i, q_j). Before Q = q_1 + q_2 reaches the equilibrium value, each node tends to increase q_i to raise its own gain; after the equilibrium value is reached a node could still increase q_i, but unilaterally deviating from the equilibrium solution reduces that node's own profit, so under the non-cooperation premise such equilibrium-breaking behavior does not occur.
In fact, a distributed network may contain n network nodes. Let q_i denote the access capability of node i and Q = q_1 + q_2 + ... + q_n the total access capability in the network, with inverse demand function P(Q) = a - Q and a fixed signaling overhead c. No node knows the other nodes' decisions; each decides independently, and the decision process is as follows.
For the i-th network node, the goal is to maximize its own revenue, i.e.:
max_{0 ≤ q_i < ∞} u_i(q_i, q_{-i}) = q_i[k(a - Q - c)]    (4)
Solving equation (4) yields:
q_i* = (a - c)/(n + 1),  i = 1, ..., n    (5)
the above results show that in the non-cooperative game model in which n nodes share a channel, when nash equilibrium is reached, all nodes will have the same access capability, and the sum of the access capabilities of all nodes will not exceed the upper limit of the channel carrying capability. In the distributed network, each node calculates the own equilibrium strategy according to the number of the own one-hop neighbor nodes, and the parameter is used as a state set to perform reinforcement learning to access a channel.
The cooperative game model: in the game model above, each node aims to maximize its own communication performance and selects the corresponding equilibrium strategy; however, under unbalanced node loads such an equilibrium strategy may cause the performance of some nodes to deteriorate rapidly. A cooperative game model is therefore proposed to analyze the equilibrium strategy of each node with the goal of optimizing the overall performance of the network. To better analyze the problem, the following assumptions are made:
Each node is able to count its own traffic load. Let the traffic load of node i be L_i and the channel transmission rate be R_i. Whether the node is a heavy-service node is decided from the size of the traffic load relative to the channel transmission rate and expressed by a load factor l_i, defined by equation (6) (given as an image in the original).
In general, a node is considered a heavy-service node when l_i ≥ 1, and the load factor then grows linearly with the traffic load; when l_i < 1 the node is an ordinary node and the load factor grows exponentially with the traffic load, so that a nearly unloaded node has a very small load factor with a very small growth trend.
Node i obtains the number N_i of nodes within its one-hop range by querying the routing table; it obtains the neighbor counts of the surrounding one-hop nodes by broadcasting RTS/CTS control signaling and calculates the average neighbor count (Σ_j N_j)/N_i. Combining this information with the load factor l_i, node i obtains its weight in the network from equation (7) (given as an image in the original). In equation (7), b is a constant whose physical meaning is to guarantee a basic weight for edge nodes in the network.
For node i, the benefit is defined as:
u_i(q_i, q_{-i}) = y_i q_i (a - Q)    (8)
The overall revenue function is the sum of the revenues of all nodes, i.e.:
Σ_i u_i(q_i, q_{-i}) = Σ_i y_i q_i (a - Q)    (9)
In a quasi-static network, the backoff window of each node must exist, so q_i ≥ p, where p is the minimum value set by the backoff window, and the weight of a node in the network remains unchanged over a short time, i.e. y_i is a constant. Assuming further that the channel transmission capacity a is constant, the revenue function can be bounded and its maximum solved for, giving the result in equation (10) (given as an image in the original).
the above equation shows that when the overall performance tends to be optimal, the capacities of the access channels of the nodes with non-maximum weights are all the minimum value p, and the access channel capacity of the node with the maximum weight can be obtained by calculation according to the actual situation.
The system model of the present invention is shown in fig. 1, the implementation flow is shown in fig. 9, and the following describes the implementation of the present invention with reference to specific examples:
Step 1: For any node k, obtain the number of one-hop neighbor nodes N_k by querying the routing table, obtain the neighbor counts of the neighbors within the one-hop communication range (summing to Σ_i N_i) from the additional bit information of the RTS/CTS control signaling, and calculate the average neighbor count (Σ_i N_i)/N_k.
Step 2: For any node k, count the traffic load L_k in the current time period, decide whether the node is a heavy-service node from the size of L_k relative to the channel transmission rate R_k, and express it by the load factor l_k (formula given as an image in the original).
Step 3: For any node k, calculate the weight from the load factor l_k, the neighbor count N_k and the average neighbor count of the neighbor nodes (formula given as an image in the original).
Step 4: For any node k, enter the decision stage and make the following decisions according to the node's situation.
For the node with the maximum weight within its one-hop communication range: broadcast the weight y_i through RTS/CTS control signaling (an indication bit is added to the control signaling, with a corresponding increase in overhead), obtain the minimum node weight within the one-hop range, calculate the difference index G (formula given as an image in the original), and, given the difference index threshold G_t, determine the degree of node difference from the value of G and decide as follows:
(1) If the network nodes differ greatly, i.e. G > G_t, then the node with the largest weight decides as follows:
if the network traffic load is heavy, adopt the default backoff strategy and set the backoff window to (15, 1023);
if the network traffic load is light, adopt the balanced backoff strategy, set the reinforcement-learning parameter set of the minimum backoff window to CW_min ∈ {7, 15, 31}, and use the CW_min obtained after reinforcement learning for its decision;
after the node with the maximum weight has made its decision, it informs the other neighbor nodes by broadcasting control information.
For the nodes whose weight is not the maximum within the one-hop range, once the maximum-weight neighbor has made its decision, each decides as follows:
if the network traffic load is heavy, adopt the default backoff strategy and set the backoff window to (15, 1023);
if the network traffic load is light, adopt the balanced backoff strategy, set the reinforcement-learning parameter set of the minimum backoff window to CW_min ∈ {63, 127, 255}, and use the CW_min obtained after reinforcement learning for the decision;
all nodes repeat the above actions until the whole-network decision is completed.
(2) If the network node difference is small, i.e. G ≤ G_t, the decision is as follows:
all nodes adopt the default backoff window size (15, 1023).
Step 5: Repeat steps 1-4 after the network topology changes or the traffic load fluctuates greatly.
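The step-4 decision logic of this embodiment can be condensed into a small function. The numeric thresholds for "heavy" versus "light" network traffic load and for G_t are not given in the extracted text, so they appear here as parameters; the contention window sets {7, 15, 31} and {63, 127, 255} and the default window (15, 1023) are the values stated above.

```python
def backoff_decision(G, is_max_weight, load_is_heavy, G_t=2.0):
    """Return (strategy, CW choice) for one node, using the embodiment's values.
    G_t and the heavy-load test are assumed parameters; the patent gives no
    numeric values for them in the extracted text."""
    if G <= G_t:
        return ("default", (15, 1023))     # small node difference: default window
    if load_is_heavy:
        return ("default", (15, 1023))     # large difference but heavy network load
    if is_max_weight:
        return ("balanced", [7, 15, 31])   # RL state set for the heaviest node
    return ("balanced", [63, 127, 255])    # RL state set for the other nodes

print(backoff_decision(G=3.5, is_max_weight=True, load_is_heavy=False))
print(backoff_decision(G=1.2, is_max_weight=False, load_is_heavy=False))
```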
Simulation setting and performance analysis:
The NS3 network simulator is used for the simulation, considering a network scenario with large node differences, as shown in FIG. 2. The MAC protocol is CSMA/CA, the related parameters follow the IEEE 802.11a standard, 9 nodes participate in communication, the maximum number of communication hops is 2, and the load of the central node is clearly higher than that of the other nodes. The network topology is shown in FIG. 2 and the parameter settings are listed in Table 1.
TABLE 1 (simulation parameter settings; rendered as an image in the original and not reproduced here)
In the simulation scenario shown in FIG. 2, nodes 0 through 8 participate in the communication. Node 0 can communicate only with node 1, node 1 can communicate with all nodes, and nodes 2-8 are neighbors of node 1. The configured communication links are: 0→1, 1→4, 2→0, 3→0, 4→5, 5→6, 6→7, 7→8, and all communication links carry the same load. Node 1 is a heavy-service node whose traffic load accounts for 3/9 of the total, and it belongs to the nodes with larger weight.
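For the topology just described, the per-node one-hop neighbor counts can be tabulated as below. Only the adjacencies stated in the text are used (node 1 is a neighbor of every other node; node 0 communicates only with node 1); whether nodes 2-8 are neighbors of one another is not stated, and the sketch assumes they are not, so the counts for those nodes are illustrative.

```python
# One-hop adjacency assembled from the stated relations only (star around node 1).
adjacency = {0: {1}, 1: set(range(0, 9)) - {1}}
for n in range(2, 9):
    adjacency[n] = {1}          # assumption: edge nodes see only the central node

for k in sorted(adjacency):
    N_k = len(adjacency[k])
    nbr_counts = [len(adjacency[j]) for j in adjacency[k]]
    avg = sum(nbr_counts) / N_k
    print(f"node {k}: N_k={N_k}, average neighbor count of neighbors={avg:.1f}")
# Node 1 has N_k = 8 while every edge node has N_k = 1, which is what drives
# the weight gap exploited by the balanced backoff strategy.
```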
Simulation result analysis:
according to the invention, a competition window design scheme adopting a game theory is simulated and compared with a default competition window design scheme, the transmission success rate and the average time delay are selected as judgment indexes, and a simulation result can be analyzed from three aspects of a whole body, a central node and a marginal node.
As shown in FIG. 3 and FIG. 4, under network loads of no more than 3 Mbps the balancing strategy applies game decisions that give different backoff windows to different nodes; compared with the default backoff strategy, the average delay increases and the transmission success rate also improves, but both changes are small.
As for the central node (node 1), the network performance is shown in FIG. 5 and FIG. 6. Although the traffic load of the central node is heavier, after the balanced backoff strategy is adopted the central node obtains more opportunities to access the channel because the backoff windows of the other nodes are extended; its average delay is reduced from the previous 3.7 ms to 1.7 ms, a clear improvement, while the transmission success rate is maintained at 100%.
For the edge node (node 0), the network performance is shown in FIG. 7 and FIG. 8. For the edge nodes, the balanced backoff strategy greatly improves the transmission success rate, by about 10%, under low-to-medium traffic load, while the average delay increases by only 2-3 ms, which is negligible for delay-insensitive services.
In summary, under low traffic load the network adopts the balanced backoff strategy: the average delay of the network is almost the same as with the default strategy while the transmission success rate increases slightly. For the edge nodes, the balancing strategy effectively improves the transmission success rate; for the central node, it effectively reduces the average delay. Under medium and high loads the network switches back to the default backoff strategy, and performance remains consistent with it. The backoff algorithm based on the cooperative game model can therefore improve the transmission success rate of the overall network and of the edge nodes in partially loaded, differentiated networks while hardly affecting the average delay, and it can further reduce the delay of the central node while maintaining its high transmission success rate. This game strategy suits networks in which the central node is delay-sensitive and the other nodes are delay-insensitive.
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (9)

1. A game-based Q-learning contention window adjustment method, characterized by comprising:
Step 1: initializing the network node settings, including the communication protocol, network topology, service arrival model and physical layer standard; after communication starts, the network performs ad hoc networking and establishes a routing table by broadcasting routing information; in the initial state a node backs off according to the default contention window size, and thereafter adjusts the contention window size for backoff according to the action output by the Q-learning algorithm;
Step 2: every node in the network learns its number of one-hop neighbor nodes from the routing table and broadcasts it to its neighbors via RTS/CTS signaling; meanwhile each node receives the RTS/CTS information broadcast by its neighbors, from which it calculates the average neighbor count of its neighbor nodes; each node also counts the traffic load per unit time by querying the traffic waiting in its buffer and derives the corresponding load factor;
Step 3: calculating the node's weight in the network and broadcasting it; the node computes a network difference index from the maximum and minimum weights within its one-hop range and then plays the game according to this index;
Step 4: if the network difference is greater than a preset threshold, adopting a balanced backoff strategy in which the node with the largest weight in the network uses a smaller contention window state set in reinforcement learning and the other nodes use a larger contention window state set; otherwise adopting the default backoff strategy, in which all nodes in the network use the same contention window state set;
Step 5: each node in the network performs Q-learning on the contention window state set produced in Step 4, outputs the optimal contention window interval, and communicates accordingly;
Step 6: repeating Steps 2-5 after the network topology changes or the traffic load fluctuates greatly.

2. The game-based Q-learning contention window adjustment method according to claim 1, characterized in that the importance of a node in the network is confirmed by querying the routing table and by traffic statistics; for any node k, the number of one-hop neighbor nodes N_k is obtained by querying the routing table, the neighbor counts of the neighbors within the one-hop communication range, summing to Σ_i N_i, are obtained from the RTS/CTS additional bit information, and the average neighbor count of the neighbor nodes is calculated;
for any node k, the traffic load L_k in the current time period is counted, and whether the node is a heavy-service node is decided from the size of L_k relative to the channel transmission rate R_k, expressed by the load factor l_k (formula given as an image in the original);
the weight is calculated from the load factor l_k, the neighbor count N_k and the average neighbor count of the neighbor nodes (formula given as an image in the original), wherein b is a constant that guarantees a basic weight for edge nodes in the network.
3. The game-based Q-learning contention window adjustment method according to claim 1, characterized in that the node with the largest weight within its one-hop communication range broadcasts its weight y_i through RTS/CTS control signaling, learns the minimum node weight within the one-hop range, calculates the difference index G (formula given as an image in the original), and determines the degree of network node difference from the value of G;
if the value of G exceeds a preset threshold, the network difference is judged to be large and the balanced backoff strategy is adopted: the node with the largest weight uses a smaller contention window interval, i.e. a smaller left endpoint under the same right endpoint; the node with the smallest weight uses a larger contention window interval, i.e. a larger left endpoint under the same right endpoint.
4. The game-based Q-learning contention window adjustment method according to claim 1, characterized in that the node serves as the agent and the left endpoint of the contention window interval in the backoff algorithm serves as the environment state set, i.e. CW_min = 2^s - 1; the value taken by the contention window left endpoint CW_min in each unit time serves as the action set (given as an image in the original); learning is performed with the network transmission success rate and the average delay as the optimization targets, and the update formula is:
Q(S,A) ← Q(S,A) + α[r + γ·max_a Q(S',a) - Q(S,A)]
wherein γ is the discount factor, representing how strongly past actions influence the current action; r is the reward, the value of the reward function obtained by taking action A in the current state S, evaluated from the transmission success rate and average delay indices; α is the convergence factor, the main factor affecting the convergence speed; and S' is the past state.
5. A game-based Q-learning contention window adjustment system, characterized by comprising:
Module M1: initialize the network node settings, including the communication protocol, network topology, service arrival model and physical layer standard; after communication starts, the network performs ad hoc networking and establishes a routing table by broadcasting routing information; in the initial state a node backs off according to the default contention window size, and thereafter adjusts the contention window size for backoff according to the action output by the Q-learning algorithm;
Module M2: every node in the network learns its number of one-hop neighbor nodes from the routing table and broadcasts it to its neighbors via RTS/CTS signaling; meanwhile each node receives the RTS/CTS information broadcast by its neighbors, from which it calculates the average neighbor count of its neighbor nodes; each node also counts the traffic load per unit time by querying the traffic waiting in its buffer and derives the corresponding load factor;
Module M3: calculate the node's weight in the network and broadcast it; the node computes a network difference index from the maximum and minimum weights within its one-hop range and then plays the game according to this index;
Module M4: if the network difference is greater than a preset threshold, adopt a balanced backoff strategy in which the node with the largest weight in the network uses a smaller contention window state set in reinforcement learning and the other nodes use a larger contention window state set; otherwise adopt the default backoff strategy, in which all nodes in the network use the same contention window state set;
Module M5: each node in the network performs Q-learning on the contention window state set produced by Module M4, outputs the optimal contention window interval, and communicates accordingly;
Module M6: after the network topology changes or the traffic load fluctuates greatly, Modules M2-M5 are invoked in sequence.

6. The game-based Q-learning contention window adjustment system according to claim 5, characterized in that the importance of a node in the network is confirmed by querying the routing table and by traffic statistics; for any node k, the number of one-hop neighbor nodes N_k is obtained by querying the routing table, the neighbor counts of the neighbors within the one-hop communication range, summing to Σ_i N_i, are obtained from the RTS/CTS additional bit information, and the average neighbor count of the neighbor nodes is calculated;
for any node k, the traffic load L_k in the current time period is counted, and whether the node is a heavy-service node is decided from the size of L_k relative to the channel transmission rate R_k, expressed by the load factor l_k (formula given as an image in the original);
the weight is calculated from the load factor l_k, the neighbor count N_k and the average neighbor count of the neighbor nodes (formula given as an image in the original), wherein b is a constant that guarantees a basic weight for edge nodes in the network.
7. The game-based Q-learning contention window adjustment system according to claim 5, characterized in that the node with the largest weight within its one-hop communication range broadcasts its weight y_i through RTS/CTS control signaling, learns the minimum node weight within the one-hop range, calculates the difference index G (formula given as an image in the original), and determines the degree of network node difference from the value of G;
if the value of G exceeds a preset threshold, the network difference is judged to be large and the balanced backoff strategy is adopted: the node with the largest weight uses a smaller contention window interval, i.e. a smaller left endpoint under the same right endpoint; the node with the smallest weight uses a larger contention window interval, i.e. a larger left endpoint under the same right endpoint.
8. The game-based Q-learning contention window adjustment system according to claim 5, characterized in that the node serves as the agent and the left endpoint of the contention window interval in the backoff algorithm serves as the environment state set, i.e. CW_min = 2^s - 1; the value taken by the contention window left endpoint CW_min in each unit time serves as the action set (given as an image in the original); learning is performed with the network transmission success rate and the average delay as the optimization targets, and the update formula is:
Q(S,A) ← Q(S,A) + α[r + γ·max_a Q(S',a) - Q(S,A)]
wherein γ is the discount factor, representing how strongly past actions influence the current action; r is the reward, the value of the reward function obtained by taking action A in the current state S, evaluated from the transmission success rate and average delay indices; α is the convergence factor, the main factor affecting the convergence speed; and S' is the past state.
9. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 4 are implemented.
CN202011620219.3A 2020-12-30 2020-12-30 Game-based Q-learning competition window adjustment method, system and medium Active CN112637965B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011620219.3A CN112637965B (en) 2020-12-30 2020-12-30 Game-based Q-learning competition window adjustment method, system and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011620219.3A CN112637965B (en) 2020-12-30 2020-12-30 Game-based Q-learning competition window adjustment method, system and medium

Publications (2)

Publication Number Publication Date
CN112637965A true CN112637965A (en) 2021-04-09
CN112637965B CN112637965B (en) 2022-06-10

Family

ID=75287341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011620219.3A Active CN112637965B (en) 2020-12-30 2020-12-30 Game-based Q-learning competition window adjustment method, system and medium

Country Status (1)

Country Link
CN (1) CN112637965B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180367286A1 (en) * 2017-06-19 2018-12-20 Mitsubishi Electric Research Laboratories, Inc. System for Coexistence of Wi-Fi HaLow Network and Low-Rate Wireless Personal Area Network (LR-WPAN)
CN107426772A (en) * 2017-07-04 2017-12-01 北京邮电大学 A kind of dynamic contention window method of adjustment, device and equipment based on Q study
WO2020068127A1 (en) * 2018-09-28 2020-04-02 Ravikumar Balakrishnan System and method using collaborative learning of interference environment and network topology for autonomous spectrum sharing
CN110336620A (en) * 2019-07-16 2019-10-15 沈阳理工大学 A QL-UACW Backoff Method Based on MAC Layer Fair Access
CN111867139A (en) * 2020-07-06 2020-10-30 上海交通大学 Implementation method and system of deep neural network adaptive backoff strategy based on Q-learning

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113438744A (en) * 2021-06-23 2021-09-24 嘉兴学院 Sectional type backoff algorithm based on weighted reinforcement learning
CN113438744B (en) * 2021-06-23 2022-07-05 嘉兴学院 A Segmented Backoff Algorithm Based on Weighted Reinforcement Learning
CN113923794A (en) * 2021-11-12 2022-01-11 中国人民解放军国防科技大学 Distributed dynamic spectrum access method based on multi-agent reinforcement learning
CN116390077A (en) * 2023-03-16 2023-07-04 西安电子科技大学 Unmanned aerial vehicle neighbor node discovery method based on DQN network under 6G space-air-ground integrated network
CN116634598A (en) * 2023-07-25 2023-08-22 中国人民解放军军事科学院系统工程研究院 Method for adjusting cluster broadcasting business competition window of unmanned aerial vehicle based on potential game
CN116634598B (en) * 2023-07-25 2023-10-31 中国人民解放军军事科学院系统工程研究院 Method for adjusting cluster broadcasting business competition window of unmanned aerial vehicle based on potential game
CN118139201A (en) * 2024-05-08 2024-06-04 上海朗力半导体有限公司 Channel competition optimization method, system, equipment and medium based on reinforcement learning
CN118139201B (en) * 2024-05-08 2024-07-16 上海朗力半导体有限公司 Channel competition optimization method, system, equipment and medium based on reinforcement learning
CN118227815A (en) * 2024-05-24 2024-06-21 浙江邦盛科技股份有限公司 A dynamic intermediate state aggregation graph calculation method, device, equipment and storage medium
CN118227815B (en) * 2024-05-24 2024-08-13 浙江邦盛科技股份有限公司 A dynamic intermediate state aggregation graph calculation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN112637965B (en) 2022-06-10


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant