
CN109195135A - Base station selection method based on deep reinforcement learning in LTE-V - Google Patents

Base station selection method based on deep reinforcement learning in LTE-V

Info

Publication number
CN109195135A
CN109195135A (application number CN201810885951.XA)
Authority
CN
China
Prior art keywords
base station
lte
dqn
function
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810885951.XA
Other languages
Chinese (zh)
Other versions
CN109195135B (en)
Inventor
郭爱煌
谢浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN201810885951.XA priority Critical patent/CN109195135B/en
Publication of CN109195135A publication Critical patent/CN109195135A/en
Application granted granted Critical
Publication of CN109195135B publication Critical patent/CN109195135B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/0289 Congestion control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control
    • H04W28/08 Load balancing or load distribution
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W48/00 Access restriction; Network selection; Access point selection
    • H04W48/20 Selecting an access point

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present invention relates to a base station selection method based on deep reinforcement learning in LTE-V, comprising the following steps: 1) constructing a Q function according to the LTE-V network communication characteristics and the base station selection performance indicators; 2) a mobility management entity acquires the state information of the vehicles in the network, builds a state matrix, and stores it in an experience replay pool; 3) taking the experience replay pool as samples and, based on the constructed Q function, training with the dueling-double training method to obtain a main DQN for selecting the optimal access base station; 4) processing the input information with the trained main DQN and outputting the selected access base station. Compared with the prior art, the present invention takes into account both the delay performance and the load-balancing performance of the communication, so that vehicles can communicate in a timely and reliable manner, and has the advantages of high base station selection efficiency and high accuracy.

Description

Base station selection method based on deep reinforcement learning in LTE-V
Technical field
The present invention relates to LTE-V communication technology and deep reinforcement learning (DRL) technology, and in particular to a base station selection method based on sequential decision-making with neural networks, for reducing the LTE-V network congestion rate.
Background technique
LTE-V (Long Term Evolution-Vehicle) is a V2X technology for which China holds independent intellectual property rights. It is an intelligent transportation system (ITS) solution based on Time Division-Long Term Evolution (TD-LTE) and belongs to an important application branch of the subsequent evolution of LTE. In February 2015, the LTE-V standardization work of the 3GPP working group formally started; the proposal of Release 14 marked the formal start of LTE-V technical standard development in the 3GPP work plan, and compatibility and substantially improved performance will also be obtained in 5G. The LTE V2V core part was completed at the end of 2016 and the LTE V2X core part at the beginning of 2017. V2V is the core of LTE-V and is expected to be completed by the end of 2018; systems and equipment based on the LTE-V technical standard are expected to start commercialization after 2020.
At peak times and on congested road sections, road-safety and traffic-efficiency applications generate periodic broadcast messages that impose a very large load. Without a reasonable congestion control scheme, the load caused by these messages leads to serious message delays and poses a severe test to LTE network capacity. In addition, vehicles select the base station with the best channel conditions through random contention, which easily causes network congestion when the traffic flow is large. It is therefore necessary to design an effective and robust eNB (evolved Node B) selection algorithm for LTE-V.
Summary of the invention
The purpose of the present invention is to remedy the deficiencies in delay performance and network congestion of cellular communication networks that introduce the LTE-V communication technology, and to provide a base station selection method based on deep reinforcement learning in LTE-V.
The purpose of the present invention can be achieved through the following technical solutions:
A base station selection method based on deep reinforcement learning in LTE-V, comprising the following steps:
1) constructing a Q function according to the LTE-V network communication characteristics and the base station selection performance indicators;
2) a mobility management entity acquires the state information of the vehicles in the network, builds a state matrix, and stores it in an experience replay pool;
3) taking the experience replay pool as samples and, based on the constructed Q function, training with the dueling-double training method to obtain a main DQN for selecting the optimal access base station;
4) processing the input information with the trained main DQN and outputting the selected access base station.
Further, the LTE-V network communication characteristics include the communication bandwidth and the signal-to-noise ratio, and the base station selection performance indicators include the user receiving rate and the base station load.
Further, the Q function is specifically constructed as follows:
R = w1·μ − w2·L
Q(st, at) ← Q(st, at) + α[R + γ·max_a' Q(st+1, a') − Q(st, at)]
In the formulas, μ denotes the user receiving rate, L denotes the base station load, R denotes the reward function, α denotes the learning rate, Q(st, at) denotes the expected reward obtainable by taking action a in state s at time t, the subscript s' denotes the next state entered after taking action a in state s, γ ∈ [0,1] is the discount factor, w1 and w2 are weight coefficients, and max_a' Q(st+1, a') denotes the maximum expected reward obtainable by taking different actions in the state at time t+1.
Further, in the dueling-double training method:
A target DQN and a main DQN are established based on the Q function; the main DQN selects the base station, and the maximum Q-function value of that base station is calculated and generated by the target DQN.
Further, in the dueling-double training method, whether the loss function converges is used as the basis for judging whether training has finished, the loss function being:
Loss = (rt+1 + γ·Qtarget − Qmain)²
In the formula, rt+1 denotes the reward obtained after taking action a in state s at time t+1, Qtarget denotes the maximum Q-function value generated by the target DQN, Qmain denotes the maximum Q-function value generated by the main DQN, and γ ∈ [0,1] is the discount factor.
Further, in the dueling-double training method, each training iteration uses the ε-greedy algorithm to select the access base station, while the network parameters are updated using the back-propagation algorithm and the adaptive moment estimation algorithm.
Further, the exploration probability of the ε-greedy algorithm is as follows:
εt+1(s) = δ × f(s, a, σ) + (1 − δ) × εt(s)
In the formula, δ is the total number of actions selectable in the current state, f(s, a, σ) characterizes the uncertainty of the environment, σ ∈ [0,1] denotes the directional sensitivity, and εt+1(s) denotes the probability of taking the action a generated by the DQN in state s at time t+1.
Further, in the dueling-double training method, the optimal hyperparameters are selected using the cross-validation method.
Further, the capacity of the experience replay pool is T; when the number of stored state matrices is greater than T, the earliest stored state matrix is deleted first.
Compared with the prior art, the present invention jointly considers the delay performance and the load-balancing performance of the communication, so that vehicles can communicate in a timely and reliable manner, and has the following advantages:
1) The present invention designs a Q function related to the communication characteristics of LTE-V, so that the congestion control problem is converted into an optimal decision-making problem in reinforcement learning, improving the efficiency of base station selection.
2) The present invention uses the MME (Mobility Management Entity) as the agent and designs the reward function by jointly considering the network congestion probability on the base station side and the receiving rate at the receiving end in the Internet of Vehicles. Combined with the characteristics of vehicle communication in LTE-V, the Q (action-value) function is modeled and an eNB selection method based on deep reinforcement learning is proposed, keeping the congestion probability of the network below a maximum value so as to guarantee the load balancing of the whole network.
3) The present invention uses a dueling-double deep Q network (Dueling-Double Deep Q Network) to fit the Q function modeled for the LTE-V network, and takes the reception delay and the network congestion probability as the base station selection criteria, selecting for the vehicle the base station least likely to suffer network congestion, thereby guaranteeing the delay performance and load balancing of the LTE-V network and improving communication performance.
4) In the present invention, each training iteration uses the ε-greedy algorithm to select the access base station, while the back-propagation algorithm and the adaptive moment estimation (Adaptive moment estimation, Adam) algorithm are used to update the network parameters, which effectively enriches the action space.
5) The present invention selects the hyperparameters using the cross-validation method, so that a better network model can be obtained, thereby improving the accuracy of base station selection.
Detailed description of the invention
Fig. 1 is a schematic diagram of an application scenario of the invention;
Fig. 2 is a flow diagram of the invention.
Specific embodiment
The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment. The present embodiment is implemented on the premise of the technical solution of the present invention; a detailed implementation and a specific operation process are given, but the protection scope of the present invention is not limited to the following embodiment.
The present invention addresses the problem that, under Long Term Evolution-Vehicle (LTE-V), vehicles access the network through random contention, which easily causes network congestion, and provides a base station selection method based on deep reinforcement learning in LTE-V that jointly considers the delay performance and the load-balancing performance of the communication, allowing vehicles to communicate in a timely and reliable manner; the application scenario is shown in Fig. 1. The present invention uses the Mobility Management Entity (MME) in the LTE core network as the agent, and considers both the network-side load and the receiving rate at the receiving end to complete the matching of vehicles and eNBs, reducing the network congestion probability and the network delay. A dueling-double deep Q network (Dueling-Double Deep Q Network, DQN) is used to fit the target action-value function, completing the conversion from a high-dimensional state input to a low-dimensional action output.
As shown in Fig. 2, the method includes the following steps:
Step 1: construct the Q function according to the LTE-V network communication characteristics and the base station selection performance indicators.
The LTE-V network communication characteristics include the communication bandwidth Bandwidth and the signal-to-noise ratio SINR; the base station selection performance indicators include the user receiving rate μ and the base station load L. The Q function is then specifically constructed as follows:
μ = Bandwidth × log2(1 + SINR)
R = w1·μ − w2·Lk
Q(st, at) ← Q(st, at) + α[R + γ·max_a' Q(st+1, a') − Q(st, at)]
In the formulas, μ denotes the user receiving rate, L denotes the base station load, R denotes the reward function, α denotes the learning rate, Q(st, at) denotes the expected reward obtainable by taking action a in state s at time t, the subscript s' denotes the next state entered after taking action a in state s, the subscript k indicates the k-th base station, γ ∈ [0,1] is the discount factor, w1 and w2 are weight coefficients, and max_a' Q(st+1, a') denotes the maximum expected reward obtainable by taking different actions in the state at time t+1.
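For illustration only (not part of the original specification), the following Python sketch shows one way the quantities above could be computed; the linear reward form w1·μ − w2·L, the default weight values, and the function names are assumptions.

```python
import numpy as np

def user_rate(bandwidth_hz: float, sinr_linear: float) -> float:
    """Shannon-type user receiving rate: mu = Bandwidth * log2(1 + SINR)."""
    return bandwidth_hz * np.log2(1.0 + sinr_linear)

def reward(mu: float, load: float, w1: float = 1.0, w2: float = 1.0) -> float:
    """Assumed reward trading off the receiving rate against the base-station load."""
    return w1 * mu - w2 * load

def q_target(r: float, q_next_max: float, gamma: float = 0.9) -> float:
    """One-step target: r + gamma * max_a' Q(s', a'), matching the update above."""
    return r + gamma * q_next_max
```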
Step 2: the mobility management entity obtains the state information of the vehicles in the network, builds a state matrix, and stores it in the experience replay pool. The capacity of the experience replay pool is T; when the number of stored state matrices is greater than T, the earliest stored state matrix is deleted first.
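A minimal sketch of such an experience replay pool is given below; the transition layout (state, action, reward, next state) and the class name are illustrative assumptions, not part of the patent.

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool of capacity T; the oldest entries are
    discarded first once the pool is full, as described above."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)  # maxlen drops the oldest entry automatically

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int):
        """Draw a random mini-batch of transitions for DQN training."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```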
Step 3: at each training iteration, a batch of samples is randomly drawn from the experience replay pool D and fed to the DQN for learning. Taking the experience replay pool as samples and based on the constructed Q function, a main DQN for selecting the optimal access base station is obtained by training with the dueling-double training method.
The present invention uses a dueling-double deep Q network (Dueling-Double Deep Q Network, DQN) to fit the target action-value function, completing the conversion from a high-dimensional state input to a low-dimensional action output. The dueling-double training method is specifically as follows: a target DQN and a main DQN are established based on the Q function; the main DQN selects the eNB through its maximum Q-function value (Q value for short), and the Q value of that selected action is then obtained from the target DQN. In this way the main network is responsible for selecting the eNB, while the Q value of the chosen eNB is generated by the target DQN.
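As an illustrative sketch of the dueling architecture named above (assuming PyTorch, an arbitrary hidden-layer width, and a fully connected trunk), the Q value can be decomposed into a state-value stream and an advantage stream and recombined as Q = V + (A − mean(A)):

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Dueling Q-network: shared trunk, then separate value and advantage streams."""

    def __init__(self, state_dim: int, n_base_stations: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                      # V(s)
        self.advantage = nn.Linear(hidden, n_base_stations)    # A(s, a), one entry per base station

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.trunk(state)
        v = self.value(h)
        a = self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)             # Q(s, a)
```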
In the dueling-double training method, whether the loss function converges is used as the basis for judging whether training has finished; the loss function is:
Loss = (rt+1 + γ·Qtarget − Qmain)²
In the formula, rt+1 denotes the reward obtained after taking action a in state s at time t+1, Qtarget denotes the Q value generated by the target DQN, Qmain denotes the Q value generated by the main DQN, and γ ∈ [0,1] is the discount factor.
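The decoupling of selection and evaluation described above can be sketched as follows; the batch layout, the use of PyTorch, and applying the squared-error Loss above as a mean over a mini-batch are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def double_dqn_loss(main_dqn, target_dqn, batch, gamma: float = 0.9):
    """Double-DQN loss: the main network selects the next base station (action),
    the target network evaluates it, matching the decoupling described above."""
    states, actions, rewards, next_states = batch  # tensors; actions is an int64 vector [B]
    q_main = main_dqn(states).gather(1, actions.unsqueeze(1)).squeeze(1)       # Q_main(s, a)
    with torch.no_grad():
        best_next = main_dqn(next_states).argmax(dim=1, keepdim=True)          # a* chosen by main DQN
        q_target_next = target_dqn(next_states).gather(1, best_next).squeeze(1)  # evaluated by target DQN
        td_target = rewards + gamma * q_target_next                            # r_{t+1} + gamma * Q_target
    return F.mse_loss(q_main, td_target)
```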
In order to enrich the action space, each training iteration uses the ε-greedy algorithm to select the access base station, while the back-propagation algorithm and the adaptive moment estimation algorithm are used to update the network parameters.
The ε-greedy algorithm takes, in each state, the action generated by the DQN with probability ε (exploitation) and a random action with probability 1−ε (exploration), the aim being to expand the selectable action space. In the training process of the present invention, whether to explore is decided according to the probability ε: if exploring, a base station is selected at random; otherwise, the base station corresponding to the maximum Q-function value is selected.
The exploration probability of the ε-greedy algorithm is as follows:
εt+1(s) = δ × f(s, a, σ) + (1 − δ) × εt(s)
In the formula, δ is the total number of actions selectable in the current state, f(s, a, σ) characterizes the uncertainty of the environment, σ ∈ [0,1] denotes the directional sensitivity, and εt+1(s) denotes the probability of taking the action a generated by the DQN in state s at time t+1.
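A sketch of the ε-greedy selection and the ε update follows; the convention that ε is the probability of taking the DQN-generated action follows the description above, while the function names are assumed and how f(s, a, σ) is computed is application specific.

```python
import random
import numpy as np

def select_base_station(q_values: np.ndarray, epsilon: float) -> int:
    """With probability epsilon take the DQN-generated action (exploit),
    otherwise pick a random base station (explore), as described above."""
    if random.random() < epsilon:
        return int(np.argmax(q_values))          # exploit the DQN-generated action
    return random.randrange(len(q_values))       # explore at random

def update_epsilon(eps_t: float, delta: float, f_value: float) -> float:
    """eps_{t+1}(s) = delta * f(s, a, sigma) + (1 - delta) * eps_t(s)."""
    return delta * f_value + (1.0 - delta) * eps_t
```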
The forward propagation of the neural network, i.e. the inference process, uses the input to compute the loss function Loss; a deep neural network can be regarded as a multi-layer nested function, and back-propagation differentiates each variable of this function using the chain rule and updates the variables using the gradients.
Adam is an adaptive learning-rate optimization algorithm: the exponentially weighted average of the first-order gradients of the variables corrects the direction and magnitude of the gradient update, while the second-order gradient moment adjusts the learning rate of each update, so that variable updates slow down where the gradient changes rapidly.
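As a concrete rendering of this description (a numpy-based illustration only, with the commonly used default hyperparameter values assumed), one Adam update step looks as follows:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: the first-moment average corrects the update direction,
    the second-moment average scales the per-parameter learning rate."""
    m = beta1 * m + (1 - beta1) * grad            # exponentially weighted first moment
    v = beta2 * v + (1 - beta2) * grad ** 2       # exponentially weighted second moment
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```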
In the dueling-double training method, the optimal hyperparameters are selected using the cross-validation method. Cross-validation is a hyperparameter selection algorithm, i.e. so-called parameter tuning. The training data is divided into K parts; K−1 of them are used for training and the remaining one as the test set, and this is repeated K times, the average value on the test sets being taken as the performance of the model under the current hyperparameter set. Repeating this M times, the optimal model can be obtained from the M candidate models.
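A sketch of hyperparameter selection by K-fold cross-validation follows; train_fn_factory, eval_fn, and the convention that a higher score is better are hypothetical placeholders, not part of the patent.

```python
import numpy as np

def k_fold_score(train_fn, eval_fn, data, k: int = 5) -> float:
    """Average performance of one hyperparameter set over K folds:
    train on K-1 folds, evaluate on the held-out fold, repeat K times."""
    folds = np.array_split(np.arange(len(data)), k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn([data[j] for j in train_idx])
        scores.append(eval_fn(model, [data[j] for j in test_idx]))
    return float(np.mean(scores))

def select_hyperparams(candidates, train_fn_factory, eval_fn, data, k: int = 5):
    """Pick the hyperparameter set whose cross-validated score is best."""
    return max(candidates, key=lambda hp: k_fold_score(train_fn_factory(hp), eval_fn, data, k))
```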
Step 4: the main DQN obtained by training processes the input information and outputs the selected access base station.
After the DQN parameters have converged, only the main DQN needs to be retained in practical application; the selected access base station is output directly according to its forward propagation.
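Deployment can be sketched as a single forward pass of the retained main DQN (assuming the PyTorch model from the earlier sketches and a flat state vector):

```python
import torch

def choose_access_base_station(main_dqn, state_vector) -> int:
    """Deployment-time selection: one forward pass of the trained main DQN;
    the index of the largest Q value is the selected access base station."""
    with torch.no_grad():
        q = main_dqn(torch.as_tensor(state_vector, dtype=torch.float32).unsqueeze(0))
    return int(q.argmax(dim=1).item())
```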
The preferred embodiment of the present invention has been described in detail above. It should be understood that those skilled in the art can make many modifications and variations according to the concept of the present invention without creative work. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning, or a limited number of experiments on the basis of the prior art and under the concept of the present invention shall fall within the protection scope determined by the claims.

Claims (9)

1. A base station selection method based on deep reinforcement learning in LTE-V, characterized by comprising the following steps:
1) constructing a Q function according to the LTE-V network communication characteristics and the base station selection performance indicators;
2) a mobility management entity acquiring the state information of the vehicles in the network, building a state matrix, and storing it in an experience replay pool;
3) taking the experience replay pool as samples and, based on the constructed Q function, training with the dueling-double training method to obtain a main DQN for selecting the optimal access base station;
4) processing the input information with the trained main DQN and outputting the selected access base station.

2. The base station selection method based on deep reinforcement learning in LTE-V according to claim 1, characterized in that the LTE-V network communication characteristics include the communication bandwidth and the signal-to-noise ratio, and the base station selection performance indicators include the user receiving rate and the base station load.

3. The base station selection method based on deep reinforcement learning in LTE-V according to claim 2, characterized in that the Q function is specifically constructed as follows:
R = w1·μ − w2·L
Q(st, at) ← Q(st, at) + α[R + γ·max_a' Q(st+1, a') − Q(st, at)]
where μ denotes the user receiving rate, L denotes the base station load, R denotes the reward function, α denotes the learning rate, Q(st, at) denotes the expected reward obtainable by taking action a in state s at time t, the subscript s' denotes the next state entered after taking action a in state s, γ ∈ [0,1] is the discount factor, w1 and w2 are weight coefficients, and max_a' Q(st+1, a') denotes the maximum expected reward obtainable by taking different actions in the state at time t+1.

4. The base station selection method based on deep reinforcement learning in LTE-V according to claim 1, characterized in that in the dueling-double training method:
a target DQN and a main DQN are established based on the Q function; the main DQN selects the base station, and the maximum Q-function value of that base station is calculated and generated by the target DQN.

5. The base station selection method based on deep reinforcement learning in LTE-V according to claim 1, characterized in that in the dueling-double training method, whether the loss function converges is used as the basis for judging whether training has finished, the loss function being:
Loss = (rt+1 + γ·Qtarget − Qmain)²
where rt+1 denotes the reward obtained after taking action a in state s at time t+1, Qtarget denotes the maximum Q-function value generated by the target DQN, Qmain denotes the maximum Q-function value generated by the main DQN, and γ ∈ [0,1] is the discount factor.

6. The base station selection method based on deep reinforcement learning in LTE-V according to claim 1, characterized in that in the dueling-double training method, each training iteration uses the ε-greedy algorithm to select the access base station, while the network parameters are updated using the back-propagation algorithm and the adaptive moment estimation algorithm.

7. The base station selection method based on deep reinforcement learning in LTE-V according to claim 6, characterized in that the exploration probability of the ε-greedy algorithm is as follows:
εt+1(s) = δ × f(s, a, σ) + (1 − δ) × εt(s)
where δ is the total number of actions selectable in the current state, f(s, a, σ) characterizes the uncertainty of the environment, σ ∈ [0,1] denotes the directional sensitivity, and εt+1(s) denotes the probability of taking the action a generated by the DQN in state s at time t+1.

8. The base station selection method based on deep reinforcement learning in LTE-V according to claim 1, characterized in that in the dueling-double training method, the optimal hyperparameters are selected using the cross-validation method.

9. The base station selection method based on deep reinforcement learning in LTE-V according to claim 1, characterized in that the capacity of the experience replay pool is T, and when the number of stored state matrices is greater than T, the earliest stored state matrix is deleted first.
CN201810885951.XA 2018-08-06 2018-08-06 Base station selection method based on deep reinforcement learning in LTE-V Active CN109195135B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810885951.XA CN109195135B (en) 2018-08-06 2018-08-06 Base station selection method based on deep reinforcement learning in LTE-V

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810885951.XA CN109195135B (en) 2018-08-06 2018-08-06 Base station selection method based on deep reinforcement learning in LTE-V

Publications (2)

Publication Number Publication Date
CN109195135A true CN109195135A (en) 2019-01-11
CN109195135B CN109195135B (en) 2021-03-26

Family

ID=64920254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810885951.XA Active CN109195135B (en) 2018-08-06 2018-08-06 Base station selection method based on deep reinforcement learning in LTE-V

Country Status (1)

Country Link
CN (1) CN109195135B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105245608A (en) * 2015-10-23 2016-01-13 同济大学 Telematics network node screening and accessibility routing construction method based on self-encoding network
US20180052825A1 (en) * 2016-08-16 2018-02-22 Microsoft Technology Licensing, Llc Efficient dialogue policy learning
US20180089553A1 (en) * 2016-09-27 2018-03-29 Disney Enterprises, Inc. Learning to schedule control fragments for physics-based character simulation and robots using deep q-learning
CN106910351A (en) * 2017-04-19 2017-06-30 大连理工大学 A traffic signal adaptive control method based on deep reinforcement learning
CN107705557A (en) * 2017-09-04 2018-02-16 清华大学 Road network signal control method and device based on depth enhancing network
CN107832836A (en) * 2017-11-27 2018-03-23 清华大学 Model-free depth enhancing study heuristic approach and device
CN108365874A (en) * 2018-02-08 2018-08-03 电子科技大学 Based on the extensive MIMO Bayes compressed sensing channel estimation methods of FDD

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MATTEO GADALETA: "D-DASH: A Deep Q-Learning Framework for DASH Video Streaming", IEEE Transactions on Cognitive Communications and Networking *
LIU Quan: "A Survey of Deep Reinforcement Learning" (深度强化学习综述), Chinese Journal of Computers (计算机学报) *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109743210A (en) * 2019-01-25 2019-05-10 电子科技大学 Multi-user access control method for UAV network based on deep reinforcement learning
CN109803338A (en) * 2019-02-12 2019-05-24 南京邮电大学 A kind of dual link base station selecting method based on degree of regretting
CN109803338B (en) * 2019-02-12 2021-03-12 南京邮电大学 A regret-based selection method for dual-connected base stations
CN114287145B (en) * 2019-07-03 2025-02-21 诺基亚通信公司 Reinforcement learning based load balancing among radio access technologies under multi-carrier dynamic spectrum sharing
CN114287145A (en) * 2019-07-03 2022-04-05 诺基亚通信公司 Reinforcement learning based inter-radio access technology load balancing under multi-carrier dynamic spectrum sharing
WO2021002866A1 (en) * 2019-07-03 2021-01-07 Nokia Solutions And Networks Oy Reinforcement learning based inter-radio access technology load balancing under multi-carrier dynamic spectrum sharing
CN110493826A (en) * 2019-08-28 2019-11-22 重庆邮电大学 A kind of isomery cloud radio access network resources distribution method based on deeply study
CN110493826B (en) * 2019-08-28 2022-04-12 重庆邮电大学 Heterogeneous cloud wireless access network resource allocation method based on deep reinforcement learning
CN110717600B (en) * 2019-09-30 2021-01-26 京东城市(北京)数字科技有限公司 Sample pool construction method and device, and algorithm training method and device
CN110717600A (en) * 2019-09-30 2020-01-21 京东城市(北京)数字科技有限公司 Sample pool construction method and device, and algorithm training method and device
CN110809306A (en) * 2019-11-04 2020-02-18 电子科技大学 A terminal access selection method based on deep reinforcement learning
CN111065131A (en) * 2019-12-16 2020-04-24 深圳大学 Switching method, device and electronic device
CN111065131B (en) * 2019-12-16 2023-04-18 深圳大学 Switching method and device and electronic equipment
CN111083767B (en) * 2019-12-23 2021-07-27 哈尔滨工业大学 A Heterogeneous Network Selection Method Based on Deep Reinforcement Learning
CN111083767A (en) * 2019-12-23 2020-04-28 哈尔滨工业大学 Heterogeneous network selection method based on deep reinforcement learning
CN111181618A (en) * 2020-01-03 2020-05-19 东南大学 An intelligent reflective surface phase optimization method based on deep reinforcement learning
CN111243299B (en) * 2020-01-20 2020-12-15 浙江工业大学 A Single Intersection Signal Control Method Based on 3DQN_PSER Algorithm
CN111243299A (en) * 2020-01-20 2020-06-05 浙江工业大学 Single cross port signal control method based on 3 DQN-PSER algorithm
CN112468984A (en) * 2020-11-04 2021-03-09 国网上海市电力公司 Method for selecting address of power wireless private network base station and related equipment
CN112468984B (en) * 2020-11-04 2023-02-10 国网上海市电力公司 Method for selecting address of power wireless private network base station and related equipment
CN113507503A (en) * 2021-06-16 2021-10-15 华南理工大学 Internet of vehicles resource allocation method with load balancing function
CN114584951A (en) * 2022-03-08 2022-06-03 南京航空航天大学 A joint computing offloading and resource allocation method based on multi-agent DDQN
CN114584951B (en) * 2022-03-08 2024-12-10 南京航空航天大学 A joint computation offloading and resource allocation method based on multi-agent DDQN

Also Published As

Publication number Publication date
CN109195135B (en) 2021-03-26

Similar Documents

Publication Publication Date Title
CN109195135A (en) Base station selection method based on deep reinforcement learning in LTE-V
CN103327082B (en) A kind of colony evacuation optimal change method
CN112954651B (en) A low-latency and high-reliability V2V resource allocation method based on deep reinforcement learning
WO2021169577A1 (en) Wireless service traffic prediction method based on weighted federated learning
CN114071661A (en) Base station energy-saving control method and device
TWI700649B (en) Deep reinforcement learning based beam selection method in wireless communication networks
CN107690176A (en) A kind of network selecting method based on Q learning algorithms
CN112927505B (en) An adaptive control method for signal lights based on multi-agent deep reinforcement learning in the Internet of Vehicles environment
CN103781146A (en) Wireless sensor network optimal route path establishing method based on ant colony algorithm
Xu et al. Fuzzy Q-learning based vertical handoff control for vehicular heterogeneous wireless network
CN109068350A (en) A terminal autonomous network selection system and method for a wireless heterogeneous network
CN113469425A (en) Deep traffic jam prediction method
CN107483079A (en) Double population genetic Ant Routing algorithms of low-voltage powerline carrier communication
CN114449482A (en) Heterogeneous vehicle networking user association method based on multi-agent deep reinforcement learning
CN117596700A (en) A transmission scheduling method for Internet of Vehicles based on transfer reinforcement learning
CN111586809B (en) Heterogeneous wireless network access selection method and system based on SDN
CN117749692A (en) Wireless route optimization method and network system based on deep contrast reinforcement learning
CN104883388A (en) Car networking road-side unit deployment method based on genetic algorithm
CN118348997A (en) Global path planning method for robot inspection of booster stations in smart wind farms
CN105844370B (en) Urban road vehicle degree of communication optimization method based on particle swarm algorithm
CN110021168B (en) Grading decision method for realizing real-time intelligent traffic management under Internet of vehicles
CN113115355B (en) Power distribution method based on deep reinforcement learning in D2D system
CN116867025A (en) Sensor node clustering method and device in wireless sensor network
CN114419884B (en) Self-adaptive signal control method and system based on reinforcement learning and phase competition
CN106028345A (en) A Small Cell Capacity and Coverage Optimization Method Based on Adaptive Tabu Search

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant