Intersection entrance road mixed vehicle team collaborative guiding control method
Technical Field
The invention relates to a control method in the field of traffic information engineering and control, in particular to a method for controlling the collaborative guidance of mixed vehicles at an intersection entrance road.
Background
Intersections are key nodes of the urban road network, and their operating condition directly affects the efficiency of the whole network. With the development of vehicle-road cooperation and intelligent connected-vehicle technology, fine-grained control of traffic flow has become a common means of relieving queuing congestion at intersections, reducing the overall delay of vehicle fleets, reducing traffic conflicts and avoiding unnecessary energy waste. Compared with a single-lane scene, a multi-lane scene is harder to control because vehicles with different driving targets interweave: they exhibit not only longitudinal car-following behavior but also lateral lane-changing decisions made according to those targets. Given the current progress of intelligent connected-vehicle technology, connected and automated vehicles (CAVs) are gradually replacing manually driven vehicles (RVs), but CAVs cannot reach a 100% penetration rate in the short term, so mixed fleets composed of CAVs and RVs will exist for a long time as a new traffic state. At a multi-lane intersection entrance road, because of the constraint of the fixed signal phase sequence, random lane changes made by RVs pursuing different driving targets can cause conflicts among vehicles and reduce the running safety and stability of the mixed fleet. At the same time, how well the arrival distribution of vehicles with different driving targets matches the signal timing scheme of the intersection affects the traffic efficiency of the fleet.
Therefore, the research on the collaborative guiding control method for the traffic of the mixed motorcade at the multi-lane entrance road of the intersection is significant for ensuring the operation safety and stability of the motorcade and improving the traffic efficiency of the intersection.
Multi-agent reinforcement learning learns a control strategy directly from experience data through trial and error in the game among multiple agents. The computational burden lies mainly in the offline training process, and the learned driving strategy can be executed quickly in real time. Some multi-agent reinforcement learning methods have already been applied to fleet control to improve the running stability, energy economy and safety of the fleet, and they show good control performance in complex traffic environments. The control problem of a mixed fleet at a multi-lane intersection entrance road places high real-time demands on the solution method, and multi-agent reinforcement learning computes relatively quickly, so it can meet the real-time requirement of fleet control while providing the optimal solution. Existing research that applies multi-agent reinforcement learning to the ecological control of fleets, however, mostly optimizes the running trajectory of every single vehicle in the fleet in real time, so the computational load is large and the globally optimal state is difficult to reach.
Disclosure of Invention
The technical problem to be solved by the invention is the large computational load and the difficulty of reaching a globally optimal state in the prior art. To this end, the invention provides a collaborative guidance control method for a mixed vehicle fleet at an intersection entrance road.
In order to solve the technical problems, the invention is realized by adopting the following technical scheme that the method for controlling the collaborative guidance of the mixed vehicle fleet at the intersection entrance lane comprises the following steps:
1) Acquiring road information:
the road information acquisition means that a control scene is determined, and a control range, a control object and a control target are defined;
2) Dividing an intelligent agent:
dividing agents means grouping the vehicles into different agents according to the road traffic information and the car-following behavior among vehicles;
3) Establishing a multi-agent ecological cooperative control model of the intersection entrance road mixed vehicle team:
Firstly, establishing a single-agent speed track planning model, calculating the optimal passing track of the single-agent, then taking the overall operation efficiency of a vehicle team as an optimization target on the basis, establishing a multi-agent cooperative control model, realizing cooperative control among a plurality of agents, and training the model through a reinforcement learning method after the model is established;
4) The controllable agent executes the optimized track:
And issuing the established and trained model to each controllable intelligent agent, adopting a rolling time domain updating strategy, calculating the speed track of each intelligent agent passing through the intersection by using road traffic information, and inputting the speed track to the CAV controller for execution.
The road information acquisition method comprises the following steps:
1) The control scene of the invention is a single three-lane entrance road of a conventional urban cross intersection. The lane attributes comprise left turn, straight through and right turn, and the lane information is parameterized as follows:
$L_{int} = \{l_1, l_2, l_3\}$
where $l_1$, $l_2$ and $l_3$ are the lane attributes, with different values representing different attributes: -1 denotes a left-turn lane, 0 denotes a straight-through lane, and 1 denotes a right-turn lane.
2) In the control scene of the invention, the intersection signal control type is fixed four-phase signal timing, and the signal timing scheme is expressed as follows:
Wherein sig int represents the intersection signal timing scheme information set, The timing information set representing the phase k (k has the values of 1,2,3 and 4) is specifically shown as follows:
Wherein: In the intersection signal control of the invention, the right turn traffic flow is not constrained by a signal timing scheme, and T k phase represents red light and green light duration information of the phase k, and the specific information is as follows:
Wherein: And Representing the red light duration and green light duration of the signal phase k, respectively.
3) The control range of the multi-agent reinforcement learning-based intersection multi-lane entrance road mixed vehicle team collaborative guiding control model is from an intersection parking line to a position 800m away from the intersection parking line, the control object is a mixed vehicle team consisting of CAV and RV in the control range, and the information of each lane mixed vehicle team is expressed as follows:
where $F_{int}$ denotes the traffic flow information set within the control range, which is made up of the mixed traffic flow information sets of the individual lanes, each expressed as follows:
where $C_{i\_type}$ denotes the type of vehicle i (i = 1, 2, ..., I) on lane l (l = 1, 2, 3) and takes the value 0 or 1, with 0 denoting a CAV and 1 denoting an RV.
4) The invention obtains the running information of CAV in the control range of the intersection, comprising the speed, acceleration and position of the vehicle, and the CAV can obtain the running information of surrounding vehicles and the traffic condition information of the intersection through V2V and V2I communication.
Wherein: representing the total set of CAV operation information on the lane of l L, A running information set of CAVh (H has a value of 1,2, H.),The current position of the CAV; The current speed is CAV; is the current acceleration of CAV.
The control target of the invention is to optimize the speed space-time trajectories of the CAVs in the traffic flow and thereby adjust the overall distribution and running state of the flow, so that the arrival distribution of the different traffic streams better matches the fixed signal timing scheme of the intersection and the traffic passes through the intersection in a safer, more stable, more efficient and more energy-saving manner.
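The information gathered in this step can be pictured as a small data model. The following is a minimal sketch only: the class and field names (`Phase`, `Vehicle`, `Scene`, `cavs_on_lane`) are hypothetical and chosen for illustration, while the lane codes (-1, 0, 1), the vehicle type codes (0 for CAV, 1 for RV), the four fixed phases and the 800 m control range are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import List

LEFT, STRAIGHT, RIGHT = -1, 0, 1   # lane attribute codes from the description
CAV, RV = 0, 1                     # vehicle type codes from the description


@dataclass
class Phase:
    red_s: float     # red duration of the phase, in seconds
    green_s: float   # green duration of the phase, in seconds


@dataclass
class Vehicle:
    lane: int                 # index into Scene.lanes
    vtype: int                # CAV or RV
    position_m: float = 0.0   # position inside the control range
    speed_mps: float = 0.0
    accel_mps2: float = 0.0


@dataclass
class Scene:
    lanes: List[int]                  # e.g. [LEFT, STRAIGHT, RIGHT]
    phases: List[Phase]               # fixed four-phase timing
    control_range_m: float = 800.0    # measured back from the stop line
    vehicles: List[Vehicle] = field(default_factory=list)

    def cavs_on_lane(self, lane: int) -> List[Vehicle]:
        """Return the CAVs currently on the given lane."""
        return [v for v in self.vehicles if v.lane == lane and v.vtype == CAV]


scene = Scene(
    lanes=[LEFT, STRAIGHT, RIGHT],
    phases=[Phase(red_s=90.0, green_s=30.0) for _ in range(4)],  # illustrative durations
    vehicles=[Vehicle(lane=1, vtype=CAV, speed_mps=12.0), Vehicle(lane=1, vtype=RV)],
)
```

The phase durations in the usage lines are placeholders, not values from the invention.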
The method for dividing the intelligent agent comprises the following steps:
1) Based on the information acquired in the road information step, and in order to make the guiding effect of CAVs on RVs explicit in the subsequent model, the vehicles are divided, for each arrival distribution, into vehicle groups of different sizes and compositions according to the car-following behavior among them, and each group is then abstracted as an agent;
According to the composition of the vehicle group, the invention divides the vehicle groups on each lane within the control range into three kinds of agents. A vehicle group composed of one or more CAVs is defined as a CAV agent;
A vehicle group consisting of a CAV as the head vehicle and several RVs following it is defined as a minimum ecological control unit; this group is abstracted as an agent and defined as the minimum ecological control unit (ECU) agent;
A vehicle group formed by one or more RVs that travel independently and are not influenced by the vehicles ahead is defined as an RV agent;
Because they contain CAVs, the CAV agent and the ECU agent are defined as controllable agents, and the RV agent is defined as an uncontrollable agent.
When the gap between an RV and the vehicle ahead lies within the indifference region, the invention judges that the RV and the vehicle ahead belong to the same control unit. The maximum and minimum bounds of the indifference region are expressed as follows:
where bound = max or min; $T_{bound}$ is the desired time headway under the bound condition, $S_0$ is the standstill spacing, $v_i(t)$ is the speed of the RV at time t, $\Delta v_i(t)$ is the speed difference between the RV and the preceding vehicle at time t, $\alpha_{max}$ is the maximum acceleration of the vehicle, and $\beta$ is the comfortable deceleration of the vehicle;
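The grouping rule above can be sketched as a single pass over one lane, ordered from the stop line backwards. This is an illustrative sketch under stated assumptions, not the invention's procedure: the `follows` flag stands for "the gap to the vehicle ahead lies inside the indifference region" and is assumed to be computed elsewhere; the same flag is assumed to apply to CAVs; and when an RV follows a run of several CAVs, the last CAV is assumed to split off and head the ECU. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

CAV, RV = 0, 1  # vehicle type codes from the description


@dataclass
class Vehicle:
    vid: int
    vtype: int     # CAV or RV
    follows: bool  # assumed input: gap to the vehicle ahead is inside the indifference region


def classify(group: List[Vehicle]) -> str:
    """Label a group as a CAV agent, an ECU agent or an RV agent."""
    if all(v.vtype == CAV for v in group):
        return "CAV"
    if group[0].vtype == CAV:
        return "ECU"
    return "RV"


def divide_agents(lane: List[Vehicle]) -> List[Tuple[str, List[int]]]:
    """Split one lane (front vehicle first) into labelled agents."""
    groups: List[List[Vehicle]] = []
    for veh in lane:
        if not groups or not veh.follows:
            groups.append([veh])          # a free-running vehicle starts a new group
            continue
        cur = groups[-1]
        cur_has_rv = any(v.vtype == RV for v in cur)
        if veh.vtype == CAV:
            if cur_has_rv:
                groups.append([veh])      # a CAV never joins a group behind an RV
            else:
                cur.append(veh)           # consecutive CAVs form one CAV agent
        elif not cur_has_rv and len(cur) > 1:
            head = cur.pop()              # assumption: the last CAV heads the new ECU
            groups.append([head, veh])
        else:
            cur.append(veh)               # the RV joins the ECU or RV group ahead
    return [(classify(g), [v.vid for v in g]) for g in groups]


lane = [
    Vehicle(1, CAV, False), Vehicle(2, CAV, True), Vehicle(3, RV, True),
    Vehicle(4, RV, True), Vehicle(5, RV, False), Vehicle(6, RV, True),
    Vehicle(7, CAV, True),
]
agents = divide_agents(lane)
```

In the usage lines, vehicles 2 to 4 become an ECU agent headed by CAV 2, while RVs 5 and 6 form an uncontrollable RV agent because RV 5 does not follow the vehicle ahead.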
2) Based on the information acquired in the road information step, the lane-changing behavior that an RV may exhibit is predicted. In the control scene of the invention, a CAV enters the control range in its target lane and therefore has no need to change lanes while driving, whereas an RV is an uncontrollable vehicle whose driving behavior has a certain uncertainty.
The lane-change probability indicates how likely the RV is to change lanes; the larger its value, the more likely the RV is to change lanes under the current traffic conditions. $\psi_{nece}$ and $\psi_{safe}$ denote, respectively, the necessity of the lane change derived from the driving target of the RV and the safety condition for executing the lane change, and the two parameters are expressed as follows:
Wherein p line is the longitudinal critical position of RV lane change, p i (t) is the position of vehicle i at time t, lambda i is the number of lane changes required by vehicle i, D i,j (t) is the distance between vehicle i and vehicle j at time t, D i,j (t) is the safe distance between vehicle i and vehicle j at time t, ρ is the safe headway, a i,max,brake is the maximum braking deceleration of vehicle i, a i,min,brake is the minimum braking deceleration of vehicle i, a i,brake (t) is the braking deceleration of vehicle i at time t, and v i,max is the maximum speed of vehicle i on the road.
The method for establishing the multi-agent ecological cooperative control model of the intersection entrance road hybrid vehicle team comprises the following steps:
1) Based on the step of dividing the intelligent agents, dividing the vehicle teams on each lane to obtain controllable intelligent agents, and establishing a single intelligent agent speed track planning model by using the minimum accumulated energy consumption of the speed track of the single intelligent agent passing through the intersection as an optimization target;
2) Taking the overall traffic efficiency of the mixed traffic flow into consideration, establishing a multi-agent cooperative control model;
3) Training a multi-agent cooperative control model;
The single-agent speed trajectory planning model established in step 1) and the multi-agent cooperative control model established in step 2) are trained with an improved twin delayed deep deterministic policy gradient (TD3) algorithm. The aim of training is to let the simulated fleet, driven by the optimization results of the models established in the invention, interact repeatedly with the environment, so as to obtain the overall operating benefit of the fleet for different agent decisions under different environments and vehicle operation information. The optimal benefit is found through repeated interaction and stored, together with the corresponding decision, as known experience. During actual fleet operation, if an operating condition matches known experience, the corresponding optimal scheme is obtained immediately, which improves optimization efficiency.
For the controllable agents obtained by dividing the fleets on each lane in the agent division step, the single-agent speed trajectory planning model is established with the minimum cumulative energy consumption of the speed trajectory of a single agent passing through the intersection as the optimization target. The specific steps are as follows:
(1) Calculate the individual optimal speed of the single agent. The individual optimal speed aims to minimize the running energy consumption of all vehicles in the agent. Because a single agent may contain several fuel vehicles (FV) and several electric vehicles (EV), whose types of energy consumption differ and must be considered separately, the calculation formula is as follows:
$Z = \min E_{agent}$
where $\mu_{fv}$ and $\mu_{ev}$ are the unifying coefficients for fuel consumption and electricity consumption, respectively; m and n are the numbers of FVs and EVs in the agent; $T_0$ is the start time; $T_{d\_i}$ is the time at which the vehicle passes the stop line of the intersection; $v_i(t)$ and $a_i(t)$ are the speed and acceleration of the vehicle, with $v_i(t)$ between the minimum speed limit $v_{min}$ and the maximum speed limit $v_{max}$ and $a_i(t)$ between the minimum acceleration $a_{min}$ and the maximum acceleration $a_{max}$; and $F_{f\_i}$ and $F_{e\_i}$ are the instantaneous fuel consumption of the fuel vehicle and the instantaneous electricity consumption of the electric vehicle, respectively;
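As an illustration of how such an objective might be evaluated numerically, the sketch below assumes that the agent energy is the weighted sum of each vehicle's instantaneous consumption integrated from the start time to its stop-line crossing time, which is what the symbol definitions above suggest. The function names and the constant consumption rates in the usage lines are hypothetical placeholders, not models from the invention.

```python
from typing import Callable, Sequence, Tuple

Trace = Sequence[Tuple[float, float]]  # (speed, acceleration) samples, one per time step
Rate = Callable[[float, float], float]  # instantaneous consumption as a function of (v, a)


def cumulative_consumption(trace: Trace, rate: Rate, dt: float) -> float:
    """Rectangle-rule approximation of the time integral of rate(v(t), a(t))."""
    return sum(rate(v, a) * dt for v, a in trace)


def agent_energy(
    fv_traces: Sequence[Trace],
    ev_traces: Sequence[Trace],
    fuel_rate: Rate,
    elec_rate: Rate,
    mu_fv: float,
    mu_ev: float,
    dt: float,
) -> float:
    """Weighted total of fuel use (FVs) and electricity use (EVs) for one agent."""
    fuel = sum(cumulative_consumption(tr, fuel_rate, dt) for tr in fv_traces)
    elec = sum(cumulative_consumption(tr, elec_rate, dt) for tr in ev_traces)
    return mu_fv * fuel + mu_ev * elec


# Usage with placeholder constant rates: one FV sampled 10 times, one EV sampled 4 times.
fv = [[(10.0, 0.0)] * 10]
ev = [[(10.0, 0.0)] * 4]
energy = agent_energy(fv, ev, lambda v, a: 1.0, lambda v, a: 2.0, mu_fv=2.0, mu_ev=0.5, dt=0.5)
```

A candidate speed for the agent would then be scored by generating the traces it implies and comparing the resulting energy values.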
(2) Based on the ecological speed obtained in step (1), and in order to let the CAV guide the RVs smoothly, the speed trajectory of the agent from the current speed to the ecological speed is calculated through a trigonometric function, with the following calculation formula:
where $v_{opt}(t)$ is the speed of the vehicle at time t after the trigonometric optimization, $v_{aim}$ is the average target speed of the vehicle, $v_{dis}$ is the difference between the average target speed of the vehicle and the initial speed $v_0$, and $\omega_{tr}$ is the rate of change of acceleration/deceleration while the vehicle speed lies between the current speed $v_{con}$ and the average target speed $v_{aim}$; $T_f$ is the ratio of the target distance $D_{int}$ to the average target speed $v_{aim}$;
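To give a feel for a trigonometric transition of this kind, the sketch below uses a simple half-cosine ramp from the initial speed to the target speed. This is an assumed stand-in chosen for illustration, not the formula of the invention, which is not reproduced in the text above; the function name and the parameter `omega` (playing the role of the rate parameter) are hypothetical.

```python
import math


def cosine_ramp_speed(t: float, v0: float, v_aim: float, omega: float) -> float:
    """Speed at time t on a half-cosine ramp from v0 to v_aim.

    The ramp lasts pi / omega seconds, and the acceleration is zero at both
    ends of the ramp, so the transition has no abrupt change in acceleration.
    """
    if omega <= 0.0:
        raise ValueError("omega must be positive")
    if t <= 0.0:
        return v0
    if t >= math.pi / omega:
        return v_aim
    v_dis = v_aim - v0  # speed difference to cover
    return v0 + 0.5 * v_dis * (1.0 - math.cos(omega * t))


# Usage: accelerate from 8 m/s to 12 m/s with omega = 0.5 rad/s, sampled every second.
profile = [cosine_ramp_speed(float(k), 8.0, 12.0, 0.5) for k in range(8)]
```

A larger `omega` shortens the ramp and raises the peak acceleration, which equals `0.5 * abs(v_dis) * omega` at the midpoint of the ramp.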
(3) Taking the uncertainty of the driving behavior of the front vehicle into consideration, the single-agent speed track is constrained through the virtual track, and the specific steps are as follows:
a. If the front vehicle is in a driving state, the dynamic virtual track is adopted for constraint, so that the safe following distance between the intelligent body and the front vehicle is ensured, and the dynamic virtual track is calculated according to the following formula
Wherein: And Τ is the reaction time of the driver;
b. If the front vehicle is in a queuing state, after the front vehicle is in a queuing state and dissipates, the intelligent agent can pass through the intersection, and the intelligent agent is restrained by adopting a static virtual track s p_i (t);
(4) The lane-change probabilities of the RVs inside and around each controllable agent are calculated as in the agent division step. According to these probabilities, the influence on the overall ecology of the fleet and on the operating benefit of the intersection is analyzed for three kinds of lane change: an internal RV leaving the agent, a surrounding RV entering the agent, and a surrounding RV cutting in ahead of the agent. If the lane change has no positive influence, the CAV concerned suppresses it by compressing the gap that the RV could use to change lanes; otherwise the CAV concerned secures a safe lane-changing gap so that the RV can change lanes safely. Steps (1), (2) and (3) of this model are then repeated to recalculate the single-agent speed trajectory;
For an RV that cuts in ahead of the agent through a lane change, the calculation formula of the dynamic virtual trajectory described in a. of step (3) is converted into the following formula:
where x and y are the longitudinal and lateral displacements of the agent at time t, and $p_i$ and $q_j$ are coefficients that determine the shape of the trajectory. Because the longitudinal and lateral displacements are expressed as functions of time, the velocities $v_x$ and $v_y$ and the accelerations $a_x$ and $a_y$ at each time node are readily obtained by taking the first and second derivatives.
Step 1) of establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet thus yields the optimal speed trajectory of each agent, and this trajectory is the initial state input of the subsequent multi-agent cooperative control model.
In the technical scheme, the multi-agent cooperative control model is established by considering the overall traffic efficiency of the mixed traffic flow, and the specific steps are as follows:
(1) To reduce the computational load and to link the single-agent speed trajectory planning model with the multi-agent cooperative control model, the time at which an agent passes through the intersection is used as the action space of the multi-agent cooperative control model:
$a_j = T_{d\_j}$
The time at which each agent passes through the intersection is calculated from the ecological speed trajectory obtained by the single-agent speed trajectory planning model in 1) and is used as the initial action of the multi-agent cooperative control model. After the action of an agent is updated, the speed trajectory corresponding to the new action is output through the smoothness constraint given by the trigonometric function in the single-agent speed trajectory planning model;
(2) In the invention, the state space of an agent is defined as the speed and position of the agent under the current action:
$s_j = (v_j(t), p_j(t))$
where $p_j(t)$ is the position of the agent at the current time;
(3) In order to ensure that the training result of the intelligent agent is as close to the optimal solution as possible, the invention designs the reward function as follows:
a. Firstly, calculating a reward function of a single agent, taking the safety of the agent in the motion process into consideration, and taking the maximum available deceleration as a safety index of the agent operation, wherein the calculation formula of the safe reward function is as follows:
Wherein a safe is the maximum available deceleration of the head car to avoid collision with the front car, and gamma safe is a penalty term;
b. In the step of establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed vehicle team, the optimal ecological speed track of a single agent is calculated in the step 1), so that the invention considers the action which is closer to the initial action of the agent, and the whole energy consumption of the corresponding speed track is smaller, thereby obtaining an ecological rewarding function calculation formula:
c. The safe rewarding function and the ecological rewarding function are integrated according to a certain weight, so that the rewarding value of a single agent can be obtained, and the calculation formula is as follows:
Wherein w 1 and w 2 are weight coefficients;
d. the number of vehicles passing through the stop line of the intersection in each lane of the target entrance lane in one period is used as an efficiency rewarding function of the intersection to reflect the passing efficiency of the whole entrance lane, and the calculation formula of the efficiency rewarding function of the entrance lane is as follows:
where $N_{pass\_l_i}$ is the number of vehicles on lane $l_i$ that pass through the intersection in one signal period;
e. After adding the single agent rewarding values of all the controllable agents in the control range, synthesizing the single agent rewarding values with the inlet road efficiency rewarding function by a certain weight coefficient, and obtaining the rewarding value of the whole motorcade, wherein the calculation formula is as follows:
Wherein w 3 and w 4 are weight coefficients.
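The way the reward terms in steps c to e are combined can be sketched as follows. The sketch assumes plain weighted sums and assumes that the entrance road efficiency reward is the total count of vehicles passing over all lanes; the safety and ecological terms are taken as given inputs because their formulas are not reproduced above. Function names and the numeric values are hypothetical.

```python
from typing import Sequence


def single_agent_reward(r_safe: float, r_eco: float, w1: float, w2: float) -> float:
    """Weighted combination of the safety and ecological rewards of one agent."""
    return w1 * r_safe + w2 * r_eco


def efficiency_reward(n_pass_per_lane: Sequence[int]) -> float:
    """Assumed form: vehicles passing the stop line in one cycle, summed over lanes."""
    return float(sum(n_pass_per_lane))


def fleet_reward(
    agent_rewards: Sequence[float],
    n_pass_per_lane: Sequence[int],
    w3: float,
    w4: float,
) -> float:
    """Sum of controllable-agent rewards combined with the efficiency reward."""
    return w3 * sum(agent_rewards) + w4 * efficiency_reward(n_pass_per_lane)


# Usage: two controllable agents and a three-lane entrance road (illustrative values).
r1 = single_agent_reward(r_safe=-1.0, r_eco=0.5, w1=0.6, w2=0.4)
r2 = single_agent_reward(r_safe=0.0, r_eco=1.0, w1=0.6, w2=0.4)
total = fleet_reward([r1, r2], n_pass_per_lane=[3, 5, 4], w3=1.0, w4=0.1)
```

The weights decide how strongly throughput is traded against the safety and energy terms of the individual agents.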
In the technical scheme, the single-agent speed trajectory planning model established in step 1) and the multi-agent cooperative control model established in step 2) of establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet are trained with an improved twin delayed deep deterministic policy gradient (TD3) algorithm. The aim of training is to let the simulated fleet, driven by the optimization results of the models established in the invention, interact repeatedly with the environment, so as to obtain the overall operating benefit of the fleet for different agent decisions under different environments and vehicle operation information. The optimal benefit is found through repeated interaction and stored, together with the corresponding decision, as known experience. During actual fleet operation, if an operating condition matches known experience, the corresponding optimal scheme is obtained immediately, which improves optimization efficiency. The specific steps are as follows:
(1) First, the single-agent speed trajectory planning model is solved with a DQN solver, in the following steps:
a. Initialize the experience pool and its capacity $N_{single}$, the action-value function, and the target network;
b. Acquire the environmental state information, which contains the historical state information; $t_s$ takes values between 0 and the total number of cycles $M_{single}$;
c. Select an action and execute it: in exploration mode, an action is selected at random with probability $\epsilon_{single}$; in exploitation mode, the action that maximizes the action-value function in the current state is selected, calculated as follows:
d. The action interacts with the environment to obtain a reward, and it is judged whether the next state triggers a termination condition;
e. Store the experience in the experience pool;
f. Repeat steps b to e until the experience pool is full;
g. Sample a mini-batch of experiences from the experience pool and compute the target value with the target network, as follows:
where $\gamma_{single}$ is the discount factor;
h. Calculate the loss function:
i. Minimize the loss function of step h by gradient descent, thereby updating the network weights;
j. At every fixed interval $T_{single}$, repeat steps g to i and update the target network parameters;
k. After the training of the single-agent speed trajectory planning model is completed, the model is solved; the speed trajectory obtained is the optimal speed trajectory of the single agent, and the time at which each agent passes through the intersection is calculated from this trajectory and used as the initial action when the multi-agent cooperative control model is trained;
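The core calculations of steps c, g and h can be sketched in a few lines. The sketch assumes the standard DQN forms (epsilon-greedy selection, a bootstrapped target from the target network, and a mean squared error loss); the exact expressions of the invention are not reproduced above, and the function names are hypothetical. Q-values are passed in as plain lists so that no neural network library is needed.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_target(reward: float, next_q_target: Sequence[float], gamma: float, done: bool) -> float:
    """Standard DQN target: reward plus discounted best target-network value."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def mse_loss(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Mean squared error between target values and current Q estimates."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


# Usage on one hand-made transition (illustrative numbers only).
rng = random.Random(0)
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng)   # pure exploitation
y = dqn_target(reward=1.0, next_q_target=[0.5, 2.0], gamma=0.9, done=False)
loss = mse_loss([y], [2.0])
```

In a full solver the predictions would come from the trained network and the loss would be minimized by gradient descent as in step i.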
(2) The invention adopts an improved TD3 algorithm to train a multi-agent cooperative control model, and comprises the following specific steps:
a. Initialize the Actor network $\mu(\theta^{\mu})$, the Critic1 network and the Critic2 network, the target Actor network $\mu'(\theta^{\mu'})$, the target Critic1 network and the target Critic2 network, and the experience pool and its capacity $N_{single}$;
b. Acquire the environmental state information, which contains the historical state information; $t_s$ takes values between 0 and the total number of cycles $M_{multi}$;
c. Select an action and execute it. To improve the efficiency with which the agent searches for actions during training, the invention designs the following strategy, which uses the passing time corresponding to the speed trajectory of the single agent as the starting point of the search, namely:
If the reward value obtained by shifting the action leftwards shows an ascending trend, the action is updated leftwards; otherwise it is updated rightwards;
d. The action interacts with the environment to obtain the single-agent reward; the overall reward value of the intersection is then calculated, and it is judged whether the next state triggers a termination condition;
e. Store the experience in the experience pool;
f. Repeat steps b to e of step (2) until the experience pool is full;
g. Sample a mini-batch of experiences from the experience pool and use the target Actor network $\mu'(\theta^{\mu'})$ to calculate the action in the next state; the action is regularized by adding noise based on the target policy, which limits the complexity of the model and avoids overfitting as far as possible:
h. Based on the dual network concept, a target value is calculated:
where $\gamma_{single}$ is the discount factor.
i. Calculate the loss function:
j. Minimize the loss function of step i by gradient descent, and update the parameters of the Critic1 network and the Critic2 network;
k. After the Critic1 network and the Critic2 network have been updated $d_{critic}$ times, use the target Actor network $\mu'(\theta^{\mu'})$ to calculate the action in the sampled state:
l. Use the Critic1 network and the Critic2 network to calculate the evaluation value of that action:
m. Maximize the evaluation value of step l by gradient ascent, which completes the update of the Actor network $\mu(\theta^{\mu})$;
n. Take the weighted average of the new and old target network parameters through the learning rate $\kappa$ to complete the update of the target network parameters:
$\theta^{\mu'} = \kappa\theta^{\mu} + (1-\kappa)\theta^{\mu'}$
Repeat the above steps until the target network converges or the total number of cycles $M_{multi}$ is reached, then end training;
Step 3) of establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet thus completes the training of the multi-agent cooperative control model.
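Three of the calculations in the TD3 procedure above lend themselves to a short sketch: the noise-smoothed target action of step g, the dual-critic target value of step h, and the weighted-average target update of step n. The first two are written in the standard TD3 form (clipped Gaussian noise, minimum of the two target critics), which is an assumption because the invention's own expressions and its improvements are not reproduced above; the soft update follows the formula given in step n. Function names and numbers are hypothetical.

```python
import random
from typing import List, Sequence


def clip(x: float, lo: float, hi: float) -> float:
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def smoothed_target_action(
    target_action: float, sigma: float, noise_clip: float,
    a_min: float, a_max: float, rng: random.Random,
) -> float:
    """Target policy smoothing: add clipped Gaussian noise, then clip to the action range."""
    noise = clip(rng.gauss(0.0, sigma), -noise_clip, noise_clip)
    return clip(target_action + noise, a_min, a_max)


def td3_target(reward: float, q1_next: float, q2_next: float, gamma: float, done: bool) -> float:
    """Dual-critic target: bootstrap from the smaller of the two target critic values."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)


def soft_update(target: Sequence[float], online: Sequence[float], kappa: float) -> List[float]:
    """theta' <- kappa * theta + (1 - kappa) * theta', applied element-wise."""
    return [kappa * w + (1.0 - kappa) * w_t for w, w_t in zip(online, target)]


# Usage: the action is a passing time in seconds (illustrative numbers only).
rng = random.Random(42)
a_next = smoothed_target_action(30.0, sigma=5.0, noise_clip=2.0, a_min=20.0, a_max=40.0, rng=rng)
y = td3_target(reward=1.0, q1_next=3.0, q2_next=2.0, gamma=0.5, done=False)
new_target = soft_update(target=[0.0, 10.0], online=[1.0, 0.0], kappa=0.1)
```

Taking the minimum of the two critics counteracts value overestimation, and a small `kappa` makes the target networks change slowly, which stabilizes training.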
In the technical scheme, the step in which the controllable agents execute the optimized trajectories comprises the following steps:
1) Transmit the trained multi-agent cooperative control model to the server of the intersection control area;
2) Acquire the information parameters of the intersection as in the road information step and upload them to the server; when a vehicle enters the control area, acquire the vehicle composition information of the mixed fleet and upload it to the fleet operation information storage;
3) Based on the acquired information, divide the mixed fleet on each lane of the control area into agents of different sizes and types;
4) The fleet operation information storage uploads the mixed fleet information of the current control area to the intersection server, which performs step 1) of establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet, optimizes the initial ecological speed trajectory with which each controllable agent entering the control area passes through the intersection, and issues it to the controllers of the CAVs in each controllable agent for execution;
5) Step 2) of establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet is performed: the time at which each agent passes through the intersection under its current speed trajectory is calculated, and the intersection server, based on the running condition of every fleet at the intersection, performs a secondary optimization of the speed trajectory of each controllable agent that takes into account the energy consumption, the exhaust emission level and the traffic efficiency of the intersection during fleet operation, and outputs the optimization result to each CAV controller for execution;
6) The fleet operation information storage uploads the stored information to the historical database of the intersection server at regular intervals, and the server uses the updated information to repeat the steps of acquiring road information, dividing agents and establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet, thereby ensuring the running safety and stability of the mixed fleet while maintaining the traffic efficiency of the intersection entrance road.
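The execution steps above form a loop that is repeated with fresh information, in line with the rolling time domain update strategy. A minimal sketch of such a loop is given below. Every callable (`sense`, `divide`, `plan_single`, `coordinate`, `dispatch`) is a hypothetical placeholder for the corresponding step of the method, and the stub implementations in the usage lines exist only to make the sketch runnable.

```python
from typing import Callable, Dict, List


def rolling_horizon_control(
    steps: int,
    sense: Callable[[int], dict],
    divide: Callable[[dict], List[str]],
    plan_single: Callable[[str, dict], float],
    coordinate: Callable[[Dict[str, float], dict], Dict[str, float]],
    dispatch: Callable[[Dict[str, float]], None],
) -> List[Dict[str, float]]:
    """Repeat sense -> divide -> single-agent plan -> multi-agent refinement -> dispatch."""
    log: List[Dict[str, float]] = []
    for k in range(steps):
        state = sense(k)                                         # road and fleet information
        agents = divide(state)                                   # controllable agent ids
        initial = {a: plan_single(a, state) for a in agents}     # initial passing times
        refined = coordinate(initial, state)                     # secondary optimization
        dispatch(refined)                                        # send to CAV controllers
        log.append(refined)
    return log


# Usage with stub steps: passing times in seconds, refined by a fixed 1 s shift.
sent: List[Dict[str, float]] = []
log = rolling_horizon_control(
    steps=2,
    sense=lambda k: {"step": k},
    divide=lambda state: ["ecu_1", "cav_1"],
    plan_single=lambda agent, state: 30.0 + state["step"],
    coordinate=lambda init, state: {a: t + 1.0 for a, t in init.items()},
    dispatch=sent.append,
)
```

Because each pass starts from newly sensed information, a lane change that splits or merges agents is picked up at the next update rather than requiring a separate mechanism.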
Compared with the prior art, the invention has the beneficial effects that:
Aiming at the requirement of safe and efficient running of mixed traffic flow at a multi-lane entrance road of an urban intersection, the invention designs a multi-agent cooperative control method considering the uncertainty of RV driving behavior.
1. To reduce the computational load and better reflect the guiding effect of CAVs on RVs, the collaborative guidance control method defines a vehicle group formed through car-following behavior within the fleet as an agent, which also simplifies model building. During driving, CAVs and RVs change lanes with a certain probability because of their respective driving targets, and a lane change involves the splitting and recombination of the agents before and after it. According to the vehicle combination, three types of agents are defined: the CAV agent, the RV agent and the minimum ecological control unit agent. The RV agent is regarded as an uncontrollable agent, the CAV agent and the minimum ecological control unit agent are regarded as controllable agents, and a multi-agent cooperative control optimization model is established on this basis.
2. For a controllable agent, the method first calculates the individual optimal speed based on the powertrain composition of the vehicles in the agent and outputs the optimal speed of the agent for the current optimization period. It then uses a trigonometric function to generate a speed trajectory through the intersection that transitions smoothly from the current speed to the target speed, predicts the lane-change likelihood of the relevant surrounding RVs from traffic information, and takes their lane-change trajectories as constraints on the trajectory of the controllable agent.
3. The method can effectively improve the safety and stability of mixed traffic flow at a multi-lane intersection entrance road and reduce the safety risk during operation, while also maintaining vehicle traffic efficiency and avoiding unnecessary waiting at red lights. It overcomes the large-scale queuing congestion at current urban intersections and facilitates the realization and application of urban intelligent transportation systems in a future intelligent connected environment.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a flow chart diagram of a method for controlling collaborative guidance of a mixed fleet at an intersection entrance;
FIG. 2 is a schematic diagram of RV agent in the intersection entrance road mixed vehicle team collaborative guiding control method according to the invention;
FIG. 3 is a schematic diagram of the minimum ecological control unit agent in the intersection entrance road mixed vehicle team collaborative guiding control method according to the invention;
FIG. 4 is a schematic diagram of CAV agent in the method for controlling the collaborative guidance of a mixed fleet at an intersection entrance;
FIG. 5 is a schematic diagram of an intersection setup based on a traffic simulation platform in an embodiment of the intersection entrance lane mixed fleet collaborative guiding control method according to the present invention;
FIG. 6 is a schematic diagram illustrating an operation of a control model in a traffic simulation platform in an embodiment of a method for controlling co-guidance of a mixed fleet at an intersection entrance;
FIG. 7 is a vehicle motion trajectory diagram under the action of the proposed control model in an embodiment of the intersection entrance road mixed fleet collaborative guiding control method according to the present invention.
Detailed Description
The invention is described in detail below with reference to the attached drawing figures:
Referring to FIG. 1, the method for controlling the collaborative guidance of the mixed fleet at the entrance road of the intersection comprises the following steps:
1. acquiring road information
The road information acquisition means that a control scene is determined, a control range, a control object and a control target are defined, and the method comprises the following steps:
1) The control scene of the invention is a single three-lane entrance road of a conventional urban cross intersection. The lane attributes comprise left turn, straight and right turn, and the lane information is parameterized as:
$L_{int} = \{l_1, l_2, l_3\}$
where $l_1$, $l_2$ and $l_3$ are the lane attributes, with different values denoting different attributes: -1 denotes a left-turn lane, 0 a straight lane and 1 a right-turn lane.
2) In the control scene of the invention, the intersection signal control type is fixed four-phase signal timing, and the signal timing scheme is expressed as follows:
where $sig_{int}$ denotes the set of signal timing scheme information of the intersection, and the timing information set of phase k (k = 1, 2, 3, 4) is given as follows:
where the phase-type value of phase k takes 1 for the north-south straight phase, -1 for the east-west straight phase, 2 for the north-south left-turn phase and -2 for the east-west left-turn phase. In the signal control considered by the invention, right-turning traffic is not constrained by the signal timing scheme. The red and green duration information of phase k is expressed as follows:
where the two components are the red duration and the green duration of signal phase k, respectively.
3) The control range of the multi-agent reinforcement learning-based intersection multi-lane entrance road mixed vehicle team collaborative guiding control model is from an intersection parking line to a position 800m away from the intersection parking line, the control object is a mixed vehicle team consisting of CAV and RV in the control range, and the information of each lane mixed vehicle team is expressed as follows:
where $F_{int}$ denotes the traffic flow information set within the control range, composed of the mixed traffic flow information sets of the individual lanes, each of which is given by:
where $C_{i\_type}$ denotes the type of vehicle i (i = 1, 2, ..., I) on lane l (l = 1, 2, 3), taking the value 0 for a CAV and 1 for an RV.
4) The invention obtains the running information of CAV in the control range of the intersection, comprising the speed, acceleration and position of the vehicle, and the CAV can obtain the running information of surrounding vehicles and the traffic condition information of the intersection through V2V and V2I communication.
where the formula gives the total set of CAV operation information on lane l, made up of the operation information set of each CAV h (h = 1, 2, ..., H), which contains the current position, the current speed and the current acceleration of that CAV;
The control target of the invention is to optimize the speed space-time trajectories of the CAVs in the traffic flow so as to adjust the overall distribution and running state of the flow, guiding the arrival distribution of the different movements to better match the fixed signal timing scheme of the intersection, so that the traffic passes through the intersection more safely, stably, efficiently and energy-efficiently;
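As an illustration of how the road information of step 1 might be held in software, the following minimal sketch encodes the lane attributes, phase types and vehicle types with the integer codes given above. The container names `Phase` and `Vehicle`, their field names and the helper `cav_share` are hypothetical and are not part of the method.

```python
from dataclasses import dataclass

# Integer codes taken from the text of step 1.
LANE_ATTR = {-1: "left", 0: "straight", 1: "right"}
PHASE_TYPE = {1: "north-south straight", -1: "east-west straight",
              2: "north-south left", -2: "east-west left"}
CAV, RV = 0, 1  # vehicle type codes: 0 = CAV, 1 = RV


@dataclass
class Phase:  # hypothetical container for the timing of one signal phase
    phase_type: int
    red_s: float
    green_s: float


@dataclass
class Vehicle:  # hypothetical container for one vehicle in the control range
    lane: int
    vtype: int
    position_m: float
    speed_mps: float
    accel_mps2: float = 0.0


def cav_share(vehicles):
    """Fraction of the vehicles in the control range that are CAVs."""
    if not vehicles:
        return 0.0
    return sum(v.vtype == CAV for v in vehicles) / len(vehicles)


if __name__ == "__main__":
    fleet = [Vehicle(1, CAV, 120.0, 12.0), Vehicle(1, RV, 105.0, 11.5)]
    print(cav_share(fleet))
```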
2. dividing intelligent agent
Referring to fig. 2, 3 and 4, the dividing agent is to divide different agents by following behavior between vehicles according to road traffic information, and includes the following steps:
1) Based on the information obtained in step 1 (acquiring road information), and in order to simplify subsequent modelling and reflect the guiding effect of CAVs on RVs, vehicles are divided, for each vehicle arrival distribution, into vehicle groups of different sizes and compositions according to the following behavior between vehicles, and each group is abstracted into an agent;
Based on the composition of the vehicle groups, the invention divides the vehicle groups on each lane within the control range into three kinds of agents. A vehicle group composed of one or more CAVs is defined as the CAV agent;
A vehicle group consisting of a CAV as the lead vehicle and the RVs following it is defined as a minimum ecological control unit; abstracted into an agent, it is defined as the minimum ecological control unit agent (ECU agent);
A vehicle group formed by one or more RVs that run independently and are not influenced by a preceding vehicle is defined as the RV agent;
Because their vehicle groups contain a CAV, the invention defines the CAV agent and the ECU agent as controllable agents and the RV agent as an uncontrollable agent.
When the distance between an RV and its preceding vehicle lies within the no-difference region, the invention judges that the RV and the preceding vehicle belong to the same control unit. The maximum and minimum bounds of the no-difference region are given by:
where Bound = max or min, $T_{Bound}$ is the desired time headway under the Bound condition, $S_0$ is the standstill spacing, $v_i(t)$ is the speed of the RV at time t, $\Delta v_i(t)$ is the speed difference between the RV and its preceding vehicle at time t, $\alpha_{max}$ is the maximum acceleration of the vehicle, and $\beta$ is the comfortable deceleration of the vehicle;
2) Based on the information obtained in step 1 (acquiring road information), the lane-change behavior that an RV may exhibit is predicted. In the control scene of the invention, a CAV enters the control range in its target lane, so it has no need to change lanes while driving. An RV is an uncontrollable vehicle whose driving behavior is uncertain, and its lane-change probability is calculated by the following formula:
where the resulting value indicates how likely the RV is to change lanes; the larger the value, the more likely the RV is to change lanes under the current traffic conditions. $\psi_{nece}$ and $\psi_{safe}$ denote, respectively, the necessity of the lane change derived from the driving target of the RV and the safety condition for executing it, and are expressed as follows:
where $p_{line}$ is the longitudinal critical position for an RV lane change, $p_i(t)$ is the position of vehicle i at time t, $\lambda_i$ is the number of lane changes required by vehicle i, $D_{i,j}(t)$ is the distance between vehicle i and vehicle j at time t, with a corresponding safe distance between the two vehicles at time t, $\rho$ is the safe headway, $a_{i,max,brake}$ is the maximum braking deceleration of vehicle i, $a_{i,min,brake}$ is the minimum braking deceleration of vehicle i, $a_{i,brake}(t)$ is the braking deceleration of vehicle i at time t, and $v_{i,max}$ is the maximum speed of vehicle i on the road section;
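The agent division of sub-step 1) of step 2 can be sketched as a single pass over the vehicles of one lane, ordered from front to back. The sketch below is illustrative only. It assumes that a Boolean `following` flag has already been computed for each vehicle from the no-difference region, and it assumes that any group containing both CAVs and RVs is labelled as an ECU agent; the function names are hypothetical.

```python
CAV, RV = 0, 1  # vehicle type codes from step 1


def classify(types):
    """Label a vehicle group by its composition."""
    if all(t == CAV for t in types):
        return "CAV"
    if all(t == RV for t in types):
        return "RV"
    return "ECU"  # assumption: any CAV-led mixed group counts as an ECU agent


def divide_agents(vehicles):
    """Split one lane into agents.

    vehicles: list of (vtype, following) tuples ordered from front to back,
    where following is True when the gap to the vehicle ahead lies inside
    the no-difference region (assumed to be computed elsewhere).
    Returns a list of (agent_type, size) tuples.
    """
    groups = []
    for vtype, following in vehicles:
        if groups and following:
            group = groups[-1]
            # An RV joins the group it follows; a CAV joins only a pure-CAV
            # group, otherwise it becomes the lead vehicle of a new agent.
            if vtype == RV or all(t == CAV for t in group):
                group.append(vtype)
                continue
        groups.append([vtype])
    return [(classify(g), len(g)) for g in groups]


if __name__ == "__main__":
    lane = [(CAV, False), (RV, True), (RV, True),
            (RV, False), (CAV, True), (CAV, True)]
    print(divide_agents(lane))
```

Because a CAV never joins a group that already contains an RV, every mixed group produced by this sketch is led by a CAV, which matches the definition of the minimum ecological control unit.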
3. Establishing multi-agent ecological cooperative control model of intersection entrance road mixed vehicle team
First, a single-agent speed trajectory planning model is established and the optimal passing trajectory of each single agent is calculated. On this basis, a multi-agent cooperative control model is established with the overall operating efficiency of the fleet as the optimization target, realizing cooperative control among the agents. After the model is established, it is trained by a reinforcement learning method. The steps are as follows:
1) Based on step 2 (dividing agents), for the controllable agents obtained by dividing the fleets on each lane, a single-agent speed trajectory planning model is established with the objective of minimizing the cumulative energy consumption of the speed trajectory along which a single agent passes through the intersection. The specific steps are as follows:
(1) Calculate the individual optimal speed of the single agent. The individual optimal speed minimizes the driving energy consumption of all vehicles in the agent. Because a single agent may contain several fuel vehicles (FV) and several electric vehicles (EV), whose energy consumption types differ, the two are accounted for separately, and the calculation formula is as follows:
$Z = \min E_{agent}$
where $\mu_{fv}$ and $\mu_{ev}$ are the unifying coefficients for fuel consumption and electricity consumption respectively, m and n are the numbers of FVs and EVs in the agent, $T_0$ is the start time, $T_{d\_i}$ is the time at which vehicle i passes the stop line of the intersection, $v_i(t)$ and $a_i(t)$ are the speed and acceleration of the vehicle, with $v_i(t)$ bounded by the minimum speed limit $v_{min}$ and the maximum speed limit $v_{max}$ and $a_i(t)$ bounded by the minimum acceleration $a_{min}$ and the maximum acceleration $a_{max}$, and $F_{f\_i}$ and $F_{e\_i}$ are the instantaneous fuel consumption of a fuel vehicle and the instantaneous electricity consumption of an electric vehicle respectively;
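The symbol definitions above suggest that the agent energy is a weighted sum of the time-integrated fuel and electricity consumption of its vehicles. The following minimal sketch evaluates such a sum numerically. The weighted-sum form, the equally spaced sampling and the function names are assumptions made for illustration, not the formula of the invention.

```python
def trapz(values, dt):
    """Trapezoidal integral of equally spaced samples."""
    if len(values) < 2:
        return 0.0
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def agent_energy(fv_rates, ev_rates, dt, mu_fv=1.0, mu_ev=1.0):
    """Assumed form of the agent energy: unified fuel plus electricity use.

    fv_rates: one list of instantaneous fuel-consumption samples per FV,
              taken from the start time to the stop-line passing time.
    ev_rates: one list of instantaneous electricity-consumption samples per EV.
    dt:       sampling interval in seconds.
    """
    fuel = sum(trapz(r, dt) for r in fv_rates)
    electricity = sum(trapz(r, dt) for r in ev_rates)
    return mu_fv * fuel + mu_ev * electricity


if __name__ == "__main__":
    print(agent_energy([[1.0, 1.0, 1.0]], [[2.0, 2.0]], dt=1.0, mu_ev=0.5))
```

A candidate speed for the agent can then be scored by generating the consumption samples for each vehicle at that speed and choosing the speed with the smallest value of `agent_energy`.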
(2) Based on the ecological speed obtained in sub-step (1), and so that the CAV guides the RVs smoothly, the speed trajectory of the agent from its current speed to the ecological speed is calculated with a trigonometric function, and the calculation formula is as follows:
where $v_{opt}(t)$ is the trigonometrically optimized speed of the vehicle at time t, $v_{aim}$ is the average target speed of the vehicle, $v_{dis}$ is the difference between the average target speed and the initial speed $v_0$, $\omega_{tr}$ is the rate of change of acceleration or deceleration while the vehicle speed lies between the current speed $v_{con}$ and the average target speed $v_{aim}$, and $T_f$ is the ratio of the target distance $D_{int}$ to the average target speed $v_{aim}$;
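To illustrate what a trigonometric transition between two speeds looks like, the sketch below uses a half-cosine ramp from the initial speed to the target speed. This is a simplified stand-in chosen for illustration and is not the formula of the invention: the function name, the single parameter `omega` and the choice to hold the target speed after the ramp are all assumptions.

```python
import math


def smooth_speed(t, v0, v_aim, omega):
    """Half-cosine ramp from v0 to v_aim (illustrative assumption).

    The ramp lasts pi / omega seconds; the acceleration is zero at both
    ends of the ramp, and v_aim is held afterwards.
    """
    if t <= 0.0:
        return v0
    t_ramp = math.pi / omega
    if t >= t_ramp:
        return v_aim
    return v0 + (v_aim - v0) * 0.5 * (1.0 - math.cos(omega * t))


if __name__ == "__main__":
    for t in (0.0, 2.0, 4.0, 6.0, 8.0):
        print(t, round(smooth_speed(t, v0=10.0, v_aim=14.0, omega=0.5), 3))
```

The point of a profile of this kind is that the speed changes without a jump in acceleration, which is what allows the RVs following a CAV to track it comfortably.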
(3) Taking the uncertainty of the driving behavior of the front vehicle into consideration, the single-agent speed track is constrained through the virtual track, and the specific steps are as follows:
a. If the vehicle ahead is moving, a dynamic virtual trajectory is used as the constraint so that a safe following distance is kept between the agent and the vehicle ahead. The dynamic virtual trajectory is calculated by the following formula:
where $\tau$ is the reaction time of the driver;
b. If the vehicle ahead is queuing, the agent can pass through the intersection only after the queue ahead has dissipated, and the agent is constrained by the static virtual trajectory $s_{p\_i}(t)$;
(4) Using sub-step 2) of step 2, calculate the lane-change probabilities of the RVs inside the controllable agent and of the RVs surrounding it. From these probabilities, analyze how three kinds of lane change affect the overall ecology of the fleet and the operating benefit of the intersection: an internal RV leaving the agent, a surrounding RV joining the agent, and a surrounding RV cutting in ahead of the agent. If the lane change has no positive influence, the relevant CAV suppresses it by compressing the gap available to the RV for changing lanes; otherwise, the relevant CAV yields a safe lane-change gap so that the RV can change lanes safely. Because a lane change alters the internal composition of the agent, the procedure returns to sub-steps (1), (2) and (3) of step 1) of step 3 and regenerates the single-agent speed trajectory;
For an RV that cuts in ahead of the agent through a lane change, the dynamic virtual trajectory formula of item a in sub-step (3) is replaced by the following formula:
where x and y are the longitudinal and lateral displacements of the agent at time t, and $p_i$ and $q_j$ are the coefficients that determine the shape of the trajectory. Because both displacements are expressed as functions of time, the velocities $v_x$ and $v_y$ and the accelerations $a_x$ and $a_y$ at each time node follow directly from the first and second derivatives.
In step 1) of step 3 (establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet), the optimal speed trajectory of each agent is thus obtained; this trajectory is the initial state input to the subsequent multi-agent cooperative control model;
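The remark that velocities and accelerations follow from derivatives can be made concrete with a time polynomial. The sketch below evaluates a polynomial given its coefficients and differentiates it term by term. The quintic lateral profile used in the example (a lane change of width `w` completed in `T` seconds with zero lateral speed at both ends) is an assumed illustration and is not taken from the invention.

```python
def poly_eval(coeffs, t):
    """Evaluate sum(c_k * t**k), with coeffs ordered from k = 0 upward."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


def poly_deriv(coeffs):
    """Coefficients of the first derivative of the polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def quintic_lane_change(w, T):
    """Assumed lateral profile y(t) = w * (10 s^3 - 15 s^4 + 6 s^5), s = t / T."""
    return [0.0, 0.0, 0.0, 10 * w / T ** 3, -15 * w / T ** 4, 6 * w / T ** 5]


if __name__ == "__main__":
    q = quintic_lane_change(w=3.5, T=4.0)   # lateral coefficients q_j
    vq = poly_deriv(q)                      # lateral velocity v_y
    aq = poly_deriv(vq)                     # lateral acceleration a_y
    for t in (0.0, 2.0, 4.0):
        print(t, poly_eval(q, t), poly_eval(vq, t), poly_eval(aq, t))
```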
2) Taking the overall traffic efficiency of the mixed traffic flow into consideration, a multi-agent cooperative control model is established, and the specific steps are as follows:
(1) To reduce computation and to link the single-agent speed trajectory planning model with the multi-agent cooperative control model, the time at which an agent passes through the intersection is taken as the action space of the multi-agent cooperative control model:
$a_j = T_{d\_j}$
The time at which each agent passes through the intersection is calculated from the ecological speed trajectory obtained from the single-agent speed trajectory planning model in 1) and is used as the initial action of the multi-agent cooperative control model. After an agent's action is updated, the speed trajectory corresponding to the new action is output through the trigonometric smoothness constraint of the single-agent speed trajectory planning model;
(2) In the invention, the state space of an agent is defined as its speed and position under the current action:
$s_j = (v_j(t), p_j(t))$
where $v_j(t)$ and $p_j(t)$ are the speed and position of the agent at the current moment;
(3) In order to ensure that the training result of the intelligent agent is as close to the optimal solution as possible, the invention designs the reward function as follows:
a. First, the reward function of a single agent is calculated. To account for the safety of the agent while it is moving, the maximum available deceleration is taken as the safety index of agent operation, and the safety reward function is calculated as follows:
where $a_{safe}$ is the maximum available deceleration with which the lead vehicle can avoid colliding with the vehicle ahead, and $\gamma_{safe}$ is a penalty term;
b. Because step 1) of step 3 has already computed the optimal ecological speed trajectory of a single agent, the invention takes the view that the closer an action is to the agent's initial action, the lower the overall energy consumption of the corresponding speed trajectory, which gives the ecological reward function:
c. The safety reward function and the ecological reward function are combined with given weights to obtain the reward value of a single agent, calculated as follows:
where $w_1$ and $w_2$ are weight coefficients;
d. The number of vehicles passing the intersection stop line in each lane of the target entrance road within one signal cycle is used as the efficiency reward function, reflecting the traffic efficiency of the whole entrance road. The entrance-road efficiency reward function is calculated as follows:
where the summand is the number of vehicles on lane $l_i$ that pass through the intersection in one signal cycle;
e. The single-agent reward values of all controllable agents within the control range are summed and then combined with the entrance-road efficiency reward function using weight coefficients, giving the reward value of the whole fleet, calculated as follows:
where $w_3$ and $w_4$ are weight coefficients;
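The way the reward terms of items a to e are combined can be sketched as follows. The sketch assumes plain linear weighted sums and takes the safety and ecological rewards as given inputs, since their exact expressions are defined by the formulas of items a and b; the function names are hypothetical.

```python
def single_agent_reward(r_safe, r_eco, w1, w2):
    """Weighted combination of the safety and ecological rewards (item c)."""
    return w1 * r_safe + w2 * r_eco


def efficiency_reward(passed_per_lane):
    """Vehicles passing the stop line in one signal cycle, summed over lanes (item d)."""
    return sum(passed_per_lane)


def fleet_reward(agent_rewards, passed_per_lane, w3, w4):
    """Summed single-agent rewards combined with the efficiency reward (item e)."""
    return w3 * sum(agent_rewards) + w4 * efficiency_reward(passed_per_lane)


if __name__ == "__main__":
    r1 = single_agent_reward(r_safe=-1.0, r_eco=0.5, w1=0.6, w2=0.4)
    r2 = single_agent_reward(r_safe=0.0, r_eco=0.8, w1=0.6, w2=0.4)
    print(fleet_reward([r1, r2], passed_per_lane=[3, 4, 5], w3=0.5, w4=0.1))
```

The weights set the trade-off: raising `w4` relative to `w3` favors throughput of the entrance road over the safety and energy terms of the individual agents.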
3) Multi-agent cooperative control model training
The single-agent speed trajectory planning model established in step 1) of step 3 and the multi-agent cooperative control model established in step 2) are trained with an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. The purpose of training is to let a simulated fleet interact repeatedly with the environment on the basis of the optimization results of the model established by the invention, thereby obtaining the overall operating benefit of the fleet for different agent decisions under different environments and vehicle operation information. The optimal benefit found through these repeated interactions is stored, together with the corresponding decision, as known experience; if a situation matching the known experience arises during actual fleet operation, the corresponding optimal scheme can be obtained immediately. The specific steps are as follows:
(1) First, the single-agent speed trajectory planning model is solved with a DQN solver, with the following specific steps:
a. Initialize the experience pool and its capacity $N_{single}$, the action-value function and the target network;
b. Acquire the environmental state information, which contains the historical state information; $t_s$ takes values between 0 and the total number of cycles $M_{single}$;
c. Select an action and execute it. In exploration mode, an action is selected at random with probability $\epsilon_{single}$; in experience mode, the action that maximizes the action-value function in the current state is selected, calculated as follows:
d. The action interacts with the environment to obtain a reward, and the next state is checked against the termination condition;
e. Store the experience in the experience pool;
f. Repeat steps b through e until the experience pool is full;
g. Sample a mini-batch of experiences from the experience pool and calculate the target value with the target network, as follows:
where $\gamma_{single}$ is the discount factor;
h. Calculate the loss function:
i. Minimize the loss function of step h by gradient descent, thereby updating the weights of the target network;
j. Repeat steps g to i at every fixed interval $T_{single}$ to update the target network parameters;
k. After the single-agent speed trajectory planning model has been trained, solve the model. The resulting speed trajectory is the optimal speed trajectory of the single agent, and the time at which each agent passes through the intersection is calculated from it and used as the initial action when training the multi-agent cooperative control model;
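Steps c, g and h above correspond to three small computations found in standard DQN: epsilon-greedy action selection, the bootstrapped target value and the squared-error loss. The sketch below shows the textbook forms of these computations on plain Python lists. It is an assumption that the solver of the invention uses exactly these forms, and the function names are hypothetical.

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else take argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, gamma, done):
    """Standard DQN target: r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def mse_loss(targets, predictions):
    """Mean squared error between target values and predicted Q-values."""
    return sum((y - q) ** 2 for y, q in zip(targets, predictions)) / len(targets)


if __name__ == "__main__":
    action = select_action([0.1, 0.9, 0.3], epsilon=0.1)
    y = td_target(reward=1.0, next_q_values=[2.0, 3.0], gamma=0.9, done=False)
    print(action, y, mse_loss([y], [3.0]))
```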
(2) The invention adopts an improved TD3 algorithm to train a multi-agent cooperative control model, and comprises the following specific steps:
a. Initialize the Actor network $\mu(\theta^{\mu})$, the Critic1 network, the Critic2 network, the target Actor network $\mu'(\theta^{\mu'})$, the target Critic1 network, the target Critic2 network, and the experience pool and its capacity $N_{single}$;
b. Acquire the environmental state information, which contains the historical state information; $t_s$ takes values between 0 and the total number of cycles $M_{multi}$;
c. Select an action and execute it. To improve the efficiency of action search during training, the invention designs the following strategy, using the passing time corresponding to the single-agent speed trajectory as the starting point of the search, namely:
If the reward obtained by shifting the action to the left shows an upward trend, the action is updated to the left; otherwise it is updated to the right;
d. The action interacts with the environment to obtain the single-agent reward; the overall reward value of the intersection is then calculated, and the next state is checked against the termination condition;
e. Store the experience in the experience pool;
f. Repeat steps b to e of sub-step (2) until the experience pool is full;
g. Sample a mini-batch of experiences from the experience pool and use the target Actor network $\mu'(\theta^{\mu'})$ to compute the corresponding action, with regularization and added noise based on the target policy; the aim is to limit the complexity of the model and so avoid overfitting as far as possible:
h. Based on the twin-network idea, calculate the target value:
where $\gamma_{single}$ is the discount factor.
i. Calculate the loss function:
j. Minimize the loss function of step i by gradient descent and update the parameters of the Critic1 network and the Critic2 network;
k. After the Critic1 network and the Critic2 network have been updated $d_{critic}$ times, use the target Actor network $\mu'(\theta^{\mu'})$ to compute the action in the given state:
l. Use the Critic1 network and the Critic2 network to compute the evaluation value of that action:
m. Maximize the evaluation value of step l by gradient ascent, completing the update of the Actor network $\mu(\theta^{\mu})$;
n. Take a weighted average of the new and old target network parameters using the learning rate $\kappa$ to complete the update of the target network parameters:
$\theta^{\mu'} = \kappa\theta^{\mu} + (1-\kappa)\theta^{\mu'}$
Repeat the above steps until the target network converges or the total number of cycles $M_{multi}$ is reached, then end training;
This completes step 3) of step 3 (establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet), namely the training of the multi-agent cooperative control model proposed by the invention;
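Three ingredients of the TD3 procedure in sub-step (2) can be shown compactly: the noisy, clipped target action, the target value built from the smaller of the two target critic estimates, and the weighted-average update of the target parameters. The sketch below follows the standard TD3 formulation with the networks represented as plain callables. The improvements made by the invention are not reproduced, and all names, default values and action bounds are assumptions.

```python
import random


def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, target_actor, target_q1, target_q2,
               gamma, sigma=0.2, noise_clip=0.5, a_min=0.0, a_max=1.0,
               done=False, rng=random):
    """Standard TD3 target: r + gamma * min(Q1', Q2') at a smoothed action."""
    if done:
        return reward
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = clip(rng.gauss(0.0, sigma), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, a_min, a_max)
    # Clipped double-Q: take the smaller of the two target critic values.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * q_min


def soft_update(target_params, online_params, kappa):
    """theta' = kappa * theta + (1 - kappa) * theta', element by element."""
    return [kappa * p + (1.0 - kappa) * tp
            for tp, p in zip(target_params, online_params)]


if __name__ == "__main__":
    y = td3_target(1.0, next_state=(12.0, 150.0),
                   target_actor=lambda s: 0.4,
                   target_q1=lambda s, a: 10.0 * a,
                   target_q2=lambda s, a: 5.0,
                   gamma=0.5)
    print(y, soft_update([0.0, 1.0], [1.0, 1.0], kappa=0.1))
```

Taking the minimum of the two critic estimates counteracts the overestimation of action values, and the small coefficient in `soft_update` keeps the target networks changing slowly, which stabilizes training.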
4. Track after controllable agent execution optimization
The established and trained model is issued to each controllable intelligent agent, a rolling time domain updating strategy is adopted, the speed track of each intelligent agent passing through an intersection is calculated by utilizing road traffic information, and the speed track is input to a CAV controller for execution, and the method comprises the following steps:
1) Transmitting the trained multi-agent cooperative control model to a server in an intersection control area;
2) Perform step 1 to acquire the information parameters of the intersection and upload them to the server; when a vehicle enters the control area, acquire the vehicle composition information of the mixed fleet and upload it to the fleet operation information storage;
3) Based on the obtained information, divide the mixed fleets on each lane in the control area into agents of different sizes and types;
4) The fleet operation information storage uploads the mixed fleet information of the current control area to the intersection server, which performs step 3 (establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet) to optimize the initial ecological speed trajectory along which each controllable agent entering the control area passes through the intersection, and issues it to the controllers of the CAVs in each controllable agent for execution;
5) Perform step 2) of step 3: calculate the time at which each agent passes through the intersection under its current speed trajectory. The intersection server then performs a secondary optimization of the speed trajectory of each controllable agent on the basis of the running condition of each fleet at the intersection, taking into account energy consumption, exhaust emission level and intersection traffic efficiency during fleet operation, and outputs the result to each CAV controller for execution;
6) The fleet operation information storage periodically uploads the stored information to the historical database of the intersection server. The server uses the updated information to repeat step 1 (acquiring road information), step 2 (dividing agents) and step 3 (establishing the multi-agent ecological cooperative control model of the intersection entrance road mixed fleet), ensuring the traffic efficiency of the entrance road while maintaining the running safety and stability of the mixed fleet.
Example 1
Referring to FIGS. 5, 6 and 7, an embodiment of the intersection entrance road mixed fleet collaborative guiding control method according to the invention is described below.
The implementation scene is an urban cross intersection built in a traffic simulation platform. All entrances and exits have three lanes, and the signals use conventional fixed timing with a cycle length of 70 s, a red time of 50 s and a green time of 20 s. The intersection setup is shown in FIG. 5, and the multi-agent ecological cooperative control model provided by the invention is implemented in MATLAB through the COM interface of the traffic simulation platform.
FIG. 6 is a schematic diagram of the multi-agent cooperative control model running in the traffic simulation platform. To demonstrate the effectiveness of the multi-agent reinforcement learning based collaborative guiding control method, this embodiment evaluates the optimization result of the model using the average delay, average travel time and average energy consumption of the fleet as evaluation indices. Let $T_{0\_1}, \ldots, T_{0\_i}, \ldots, T_{0\_n}$ denote the times at which the vehicles enter the control area, $T_{d\_1}, \ldots, T_{d\_i}, \ldots, T_{d\_n}$ the times at which they leave the intersection stop line, $F_{f\_i}$ and $F_{e\_i}$ the instantaneous fuel consumption of an FV and the instantaneous electricity consumption of an EV respectively, $n_1$ the number of fuel vehicles that pass, and $n_2$ the number of electric vehicles that pass; then:
Average travel time:
Average delay:
Average energy consumption:
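As an illustration of how the time-based indices could be computed from the recorded entry and exit times, the sketch below averages the travel time over all vehicles and derives a delay from it. Defining the delay as the travel time in excess of a free-flow travel time is an assumption made for this sketch, and `free_flow_time` is a hypothetical parameter.

```python
def average_travel_time(entry_times, exit_times):
    """Mean of (time leaving the stop line - time entering the control area)."""
    return sum(td - t0 for t0, td in zip(entry_times, exit_times)) / len(entry_times)


def average_delay(entry_times, exit_times, free_flow_time):
    """Assumed definition: mean travel time in excess of the free-flow time."""
    return average_travel_time(entry_times, exit_times) - free_flow_time


if __name__ == "__main__":
    t0 = [0.0, 10.0]
    td = [40.0, 60.0]
    print(average_travel_time(t0, td), average_delay(t0, td, free_flow_time=30.0))
```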
To verify the effectiveness of the invention, vehicles are released 300 m upstream of the intersection stop line in the experimental scene, and the three evaluation indices above are compared against an uncontrolled method; the results are shown in the table below. In the uncontrolled method, automated vehicles have no connectivity function and are driven in the same way as manually driven vehicles.
TABLE 1 Comparison of the indices of the proposed method and the uncontrolled method under different CAV penetration rates and traffic volumes
As can be seen from the table, for a fixed traffic volume, the average delay and average energy consumption of the fleet at the intersection decrease gradually as the CAV penetration rate rises, and both the delay and the average energy consumption are lower with the proposed method than with the uncontrolled method. The reason is that as the number of CAVs increases, the number of controllable agents in the fleet increases, and delay and energy consumption fall accordingly. When the CAV penetration rate reaches 100% and the traffic volume is 1200 pcu/h, the average delay of the proposed method is 16.31% lower than that of the uncontrolled method, because when all vehicles are CAVs, every vehicle follows the optimized result of the proposed method.
When the CAV penetration rate is fixed, the average delay and average energy consumption of the fleet first decrease and then increase as the traffic volume grows. At low volumes the vehicles influence one another little; as the number of vehicles on the road increases, each CAV has a positive influence on the RVs following it, so delay and energy consumption fall, and the indices reach their minimum when the traffic volume reaches 1000 pcu/h. As the volume rises further, queuing becomes increasingly pronounced, vehicle speeds are further restricted and the CAVs can no longer execute the optimization strategy, so the control effect drops sharply.
FIG. 7 shows the vehicle trajectories under the control of the multi-agent cooperative control model provided by the invention, where the dark curves are CAV trajectories and the light curves are RV trajectories. The trajectories inside the outlined box show that the leading agent can pass through the intersection completely, whereas the following agent cannot pass within the current green time. If it simply kept following the leading agent, it would have to stop at the intersection and wait for the next green, which would increase the overall energy consumption of the fleet and reduce traffic efficiency. Under the multi-agent ecological cooperative control model provided by the invention, however, the following agent adjusts its speed trajectory to avoid a long idle wait during the red time, reaches the stop line at the start of the next green time and passes through completely, improving the overall traffic efficiency of the fleet.