
CN120048123A - Vehicle-road collaborative dynamic scheduling system and method for multi-agent reinforcement learning - Google Patents

Vehicle-road collaborative dynamic scheduling system and method for multi-agent reinforcement learning

Info

Publication number
CN120048123A
CN120048123A
Authority
CN
China
Prior art keywords
vehicle
network
strategy
module
collaborative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510521087.5A
Other languages
Chinese (zh)
Other versions
CN120048123B (en)
Inventor
鲜利
邓艺舟
田文
谭双娅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Kairui Robot Technology Co ltd
Original Assignee
Chongqing Kairui Robot Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Kairui Robot Technology Co ltd filed Critical Chongqing Kairui Robot Technology Co ltd
Priority to CN202510521087.5A priority Critical patent/CN120048123B/en
Publication of CN120048123A publication Critical patent/CN120048123A/en
Application granted granted Critical
Publication of CN120048123B publication Critical patent/CN120048123B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 Traffic data processing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/092 Reinforcement learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/041 Abduction
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0108 Measuring and analyzing of parameters relative to traffic conditions based on the source of data
    • G08G1/0112 Measuring and analyzing of parameters relative to traffic conditions based on the source of data from the vehicle, e.g. floating car data [FCD]
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0108 Measuring and analyzing of parameters relative to traffic conditions based on the source of data
    • G08G1/0116 Measuring and analyzing of parameters relative to traffic conditions based on the source of data from roadside infrastructure, e.g. beacons
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/0104 Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0137 Measuring and analyzing of parameters relative to traffic conditions for specific applications
    • G08G1/0145 Measuring and analyzing of parameters relative to traffic conditions for specific applications for active traffic flow control
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/07 Controlling traffic signals
    • G08G1/08 Controlling traffic signals according to detected number or speed of vehicles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H04W4/44 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a vehicle-road collaborative dynamic scheduling system and method for multi-agent reinforcement learning, relating to the fields of artificial intelligence and intelligent traffic, and comprising a perception layer, a decision layer and an execution layer. The perception layer includes a roadside perception module and a vehicle perception module that collect environmental state information and vehicle information. The decision layer receives this information: a topological dynamic state characterization unit maps it to a multidimensional topological space and realizes nonlinear dimensionality reduction into a manifold space representation; a dual-network collaborative decision unit, comprising a vehicle dynamic allocation network VDDPG and a roadside dynamic scheduling network RDDPG, generates vehicle and roadside scheduling strategies respectively and converts them into a low-rank representation through strategy matrix decomposition; a multi-agent collaborative optimization unit constructs an agent relationship graph, optimizes the communication strategy based on information entropy, coordinates the scheduling strategies through variational inference, and generates an optimized collaborative scheduling scheme. The execution layer receives the scheme, and its vehicle execution module and roadside control module execute the instructions and feed the results back to the perception layer, forming closed-loop control.

Description

Vehicle-road collaborative dynamic scheduling system and method for multi-agent reinforcement learning
Technical Field
The invention relates to the fields of artificial intelligence and intelligent traffic, and in particular to a vehicle-road collaborative dynamic scheduling system and method based on multi-agent reinforcement learning, used to realize collaborative optimization and management of intelligent vehicles and roadside facilities.
Background
With accelerating urbanization, urban traffic congestion has become increasingly severe, seriously affecting residents' travel efficiency and quality of life. Traditional traffic management relies mainly on fixed-cycle signal control and manual intervention, cannot adapt to dynamically changing traffic flow, and therefore makes poor use of traffic resources. With the rapid development of Internet of Vehicles technology and artificial intelligence in recent years, vehicle-road cooperation is emerging as the core technology of next-generation intelligent transportation systems.
At present, vehicle-road cooperation technologies fall into three main categories: centralized control methods, which uniformly dispatch vehicles and roadside facilities through a central server; distributed control methods, which realize cooperation through local decisions of vehicles and roadside facilities; and reinforcement-learning methods, which learn an optimal control strategy through interaction between agents and the environment. The prior art has corresponding defects: centralized control has high computational complexity and struggles to meet real-time requirements; distributed control cannot guarantee a global optimum; and existing reinforcement-learning methods have difficulty handling the high-dimensional state space and multi-agent collaborative decision problems of the vehicle-road environment.
In order to solve the above problems, a multi-agent reinforcement learning system capable of efficiently handling cooperative vehicle-road decisions in complex traffic environments is needed.
Disclosure of Invention
The invention aims to provide a vehicle-road collaborative dynamic scheduling system and a vehicle-road collaborative dynamic scheduling method for multi-agent reinforcement learning, and aims to solve the problems that in the prior art, complex traffic environment state characterization, vehicle-road collaborative decision, multi-agent information interaction and the like are difficult to process efficiently.
The invention discloses a vehicle-road collaborative dynamic scheduling system for multi-agent reinforcement learning, which comprises:
The sensing layer comprises a road side sensing module and a vehicle sensing module and is used for collecting environment state information and vehicle information;
The decision layer is in communication connection with the perception layer and comprises:
The topological dynamic state characterization unit is used for receiving the environment state information and the vehicle information, mapping the environment state information and the vehicle information to a multidimensional topological space to form a topological state representation, and forming a manifold space representation through nonlinear dimension reduction;
The dual-network collaborative decision-making unit is configured to receive the manifold space representation, and includes a vehicle dynamic allocation network VDDPG and a roadside dynamic scheduling network RDDPG, where the VDDPG is configured to generate a vehicle scheduling policy, the RDDPG is configured to generate a roadside scheduling policy, and convert the vehicle scheduling policy and the roadside scheduling policy into a low-rank representation through policy matrix decomposition;
the multi-agent collaborative optimization unit is used for constructing an agent relation diagram, optimizing a communication strategy in the agent relation diagram based on information entropy, coordinating the vehicle scheduling strategy and the road side scheduling strategy through a variation inference method, and generating an optimized collaborative scheduling scheme;
And the execution layer is in communication connection with the decision layer and comprises a vehicle execution module and a road side control module, and is used for receiving the cooperative scheduling scheme, executing a corresponding scheduling instruction and feeding back an execution result to the perception layer to form closed-loop control.
Preferably, the topology dynamic state characterization unit includes:
The state acquisition subunit is used for receiving the environment state information and the vehicle information, and preprocessing the environment state information and the vehicle information to generate standardized data;
the topology characterization subunit is used for mapping the standardized data to a multidimensional topology space, extracting topology features and establishing a local coordinate system;
And the dimension conversion subunit is used for performing nonlinear dimension reduction on the data of the multidimensional topological space and generating manifold space representation which keeps the key topological relation.
Preferably, the dual-network collaborative decision-making unit includes:
A VDDPG Actor network, for receiving the manifold space representation associated with vehicles and generating the vehicle scheduling strategy;
A VDDPG Critic network, for evaluating the value of the vehicle scheduling strategy;
An RDDPG Actor network, for receiving the manifold space representation related to the roadside and generating the roadside scheduling strategy;
An RDDPG Critic network, for evaluating the value of the roadside scheduling strategy;
the strategy characterization subunit is used for converting the vehicle scheduling strategy and the road side scheduling strategy into strategy matrixes and executing matrix decomposition to generate low-rank representation;
And the collaborative optimization subunit is used for identifying potential conflict between the vehicle scheduling strategy and the road side scheduling strategy and adjusting strategy parameters through a conjugate gradient method.
Preferably, the multi-agent cooperative optimization unit includes:
The agent relationship modeling subunit is used for constructing an agent relationship graph based on the current traffic conditions and identifying the conditional dependency relationships among the agents;
the message transmission optimizing subunit is used for calculating the information entropy of each communication channel in the intelligent agent relation diagram, generating an optimal message transmission strategy and distributing communication resources;
The global consistency optimization subunit is used for decomposing the global optimization target into local sub-targets, coordinating the local decisions through a variation inference method, and verifying the global consistency of the final decisions.
Preferably, the sensing layer further includes:
the data preprocessing module is used for filtering, denoising and standardizing the environment state information and the vehicle information;
the state buffer module is used for storing historical environment state information and vehicle information and supporting time sequence analysis;
And the attention distribution module is used for adjusting the distribution strategy of the perceived resources according to the feedback of the decision layer.
Preferably, the execution layer further includes:
the instruction analysis module is used for converting the collaborative scheduling scheme into a specific control instruction;
the execution monitoring module is used for monitoring the execution condition of the control instruction and identifying an abnormal state;
The execution history recording module is used for maintaining execution history data for reference of a decision layer;
and the degradation processing module is used for executing a preset degradation strategy under the condition of communication interruption or equipment failure.
Preferably, the vehicle execution module and the roadside control module are provided with:
a timing synchronization unit for ensuring time synchronization of the vehicle control instruction and the road side control instruction;
the conflict detection unit is used for detecting potential conflicts in the execution process in real time and triggering emergency treatment;
and the cooperative effect evaluation unit is used for quantitatively evaluating the execution effect of the vehicle-road cooperative control and generating an effect evaluation report.
Preferably, the VDDPG network and the RDDPG network exchange information through a shared hidden layer, wherein:
The shared hidden layer receives common features of the manifold space representation and outputs intermediate feature representations;
the VDDPG network and the RDDPG network respectively receive the intermediate feature representations and generate corresponding strategy output by combining respective specific input features;
the VDDPG network and the RDDPG network realize collaborative parameter updating through a gradient locking mechanism, so as to ensure policy coordination consistency.
Preferably, the system further comprises:
the experience playback module is used for storing historical data of system and environment interaction and supporting offline batch learning;
the target network updating module is used for copying parameters from the main network to the target network at regular intervals, so as to ensure learning stability;
the self-adaptive learning rate adjusting module is used for dynamically adjusting the network learning rate according to training progress;
the model evaluation and deployment module is used for evaluating the performance of the model and deploying the trained model to the production environment.
The vehicle-road collaborative dynamic scheduling method for multi-agent reinforcement learning is applied to the vehicle-road collaborative dynamic scheduling system for multi-agent reinforcement learning, and comprises the following steps:
initializing a system, including initializing VDDPG network and RDDPG network parameters, establishing communication connection, loading road network topology structure and historical traffic data;
Collecting state information, including collecting environmental state information through a road side sensing module and collecting vehicle information through a vehicle sensing module;
Performing topology state characterization, including mapping state information to a multidimensional topology space, extracting topology features, forming manifold space representations by nonlinear dimension reduction;
Generating dual-network decisions, including the VDDPG Actor network generating a vehicle dispatching strategy, the RDDPG Actor network generating a roadside dispatching strategy, and executing strategy matrix decomposition to generate a low-rank representation;
Executing collaborative optimization, including constructing an agent relation graph, optimizing a message passing strategy, coordinating local decisions through a variation inference method, and verifying global consistency;
Executing the scheduling scheme, including converting the optimized decisions into specific execution instructions, with the vehicles and roadside facilities executing the control instructions;
Collecting execution feedback, including monitoring execution results, updating environmental states, updating network parameters and experience pools based on the execution results;
iterative optimization, repeatedly executing the steps, and continuously optimizing the scheduling effect.
The invention adopts three innovative mechanisms, namely topological dynamic state characterization, matrix-decomposition-driven dual-network collaborative optimization, and probabilistic-graph-driven multi-agent collaborative decision-making, to construct a complete multi-agent reinforcement learning vehicle-road collaborative dynamic scheduling system with the following beneficial effects:
1) The high-dimensional complex traffic environment state is mapped to the low-dimensional manifold space through the topology dynamic state representation mechanism, key topology features are reserved, the calculation complexity is reduced, and the adaptability of the system to complex traffic scenes is improved.
2) The cooperative decision of the vehicle network and the road side network is realized through a matrix decomposition driven double-network cooperative optimization mechanism, the conflict problem of the vehicle strategy and the road side strategy is solved, and the scheduling efficiency is improved.
3) The information interaction strategy among agents is optimized through the probabilistic-graph-driven multi-agent collaborative decision mechanism, reducing communication cost and ensuring consistency between local decisions and the global optimization objective.
4) The system is designed in a layering and modularization way, has good expandability and adaptability, and can adapt to urban traffic environments with different scales and complexities.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments will be briefly described below.
FIG. 1 is a schematic diagram of the overall architecture of a vehicle-road collaborative dynamic scheduling system for multi-agent reinforcement learning according to the present invention.
FIG. 2 is a schematic diagram of the topology dynamic state characterization unit of the present invention.
Fig. 3 is a schematic diagram of the structure of the dual-network collaborative decision-making unit of the present invention.
FIG. 4 is a schematic diagram of the architecture of the multi-agent co-optimization unit of the present invention.
FIG. 5 is a flow chart of the vehicle-road cooperative dynamic scheduling method of the invention.
FIG. 6 is a schematic diagram of a topology state characterization process in an embodiment of the present invention.
Fig. 7 is a schematic diagram of a dual-network collaborative optimization process in an embodiment of the invention.
FIG. 8 is a schematic diagram of a multi-agent collaborative decision-making process in an embodiment of the invention.
Detailed Description
The following detailed description of the preferred embodiments of the present invention is provided in conjunction with the accompanying drawings, it being understood that the preferred embodiments described herein are merely illustrative and explanatory of the invention, and are not restrictive of the invention.
Referring to fig. 1, the vehicle-road collaborative dynamic scheduling system for multi-agent reinforcement learning provided by the invention comprises three main parts of a perception layer 101, a decision layer 102 and an execution layer 103.
The sensing layer 101 includes a road side sensing module 111 and a vehicle sensing module 112 for collecting environmental status information and vehicle information. The road side sensing module 111 may be various sensing devices distributed in the urban road network, such as cameras, radars, signal lamp controllers, etc., for collecting environmental information such as traffic flow, signal lamp states, road network structures, etc. The vehicle awareness module 112 may be various sensors and communication devices mounted on the vehicle for collecting vehicle information such as vehicle position, speed, direction, destination, etc.
The decision layer 102 is the core part of the system and comprises a topology dynamic state characterization unit 121, a dual-network collaborative decision unit 122 and a multi-agent collaborative optimization unit 123. The topology dynamic state characterization unit 121 is configured to receive the environmental state information and the vehicle information, map this information to a multidimensional topology space to form a topology state representation, and form the manifold space representation through nonlinear dimensionality reduction. The dual-network collaborative decision unit 122 includes a vehicle dynamic allocation network VDDPG and a roadside dynamic scheduling network RDDPG, which are respectively used to generate a vehicle scheduling policy and a roadside scheduling policy, and converts these policies into a low-rank representation through policy matrix decomposition. The multi-agent collaborative optimization unit 123 is configured to construct an agent relationship graph, optimize the communication policy, coordinate the decisions of each agent, and generate the final collaborative scheduling scheme.
The execution layer 103 includes a vehicle execution module 131 and a roadside control module 132, which are configured to receive the cooperative scheduling scheme, execute the corresponding scheduling instructions, and feed the execution results back to the perception layer 101 to form closed-loop control. The vehicle execution module 131 is responsible for controlling the traveling behavior of vehicles, such as adjusting speed and rerouting. The roadside control module 132 is responsible for controlling the operating state of roadside facilities, such as adjusting signal light timing and limiting entry.
Referring to fig. 2, the topology dynamic state characterization unit 121 includes a state acquisition subunit 1211, a topology characterization subunit 1212, and a dimension conversion subunit 1213.
The state acquisition subunit 1211 is configured to receive the environmental state information and the vehicle information from the sensing layer 101, and preprocess these raw data to generate standardized data. In one embodiment of the invention, the preprocessing includes denoising, filtering and normalization. For example, for vehicle speed data, average filtering may be used to remove noise, followed by min-max normalization to map the speed values to the [0,1] interval, as in the sketch below.
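A minimal sketch of this preprocessing step follows; the raw speed series and the smoothing window are illustrative assumptions, not values fixed by the invention.

```python
# Minimal sketch: moving-average denoising followed by min-max normalization.
import numpy as np

speeds = np.array([12.1, 13.4, 50.0, 13.0, 12.8, 13.2])  # raw data with a spike
window = 3  # assumed smoothing window
smoothed = np.convolve(speeds, np.ones(window) / window, mode="valid")
normalized = (smoothed - smoothed.min()) / (smoothed.max() - smoothed.min())
print(normalized)  # speed values mapped to the [0, 1] interval
```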
The topology characterization subunit 1212 is configured to map the standardized data to a multidimensional topology space, extract topology features, and establish a local coordinate system. Specifically, a state space $S$ of the vehicle-road environment is first defined, wherein the vehicle states (position, speed, direction) and the roadside states (traffic light state, road network topology) respectively form subspaces $S_v$ and $S_r$, so that $S = S_v \times S_r$. For each state point $s \in S$, an accurate expression of the vehicle-road environment is realized through a local coordinate chart. Preferably, the topology characterization subunit 1212 also builds a diffeomorphic mapping $\phi$ from the state space $S$ to the policy space $A$, which preserves topological invariance between state changes and strategy adjustments, so that the system remains stable in complex traffic environments. The diffeomorphic mapping can be defined as follows:

$\phi(s) = \sigma(W s + b)$,

wherein $s$ is a state point, $W$ is a weight matrix, $\sigma$ is a nonlinear activation function, and $b$ is a bias vector. Through this mapping, points with similar topological structure in the state space are mapped to nearby points in the policy space, ensuring the continuity and stability of the policy.
The dimension conversion subunit 1213 is configured to perform nonlinear dimension reduction on the data of the multidimensional topological space to generate a manifold space representation that retains the key topological relationships. In one embodiment of the invention, a Locally Linear Embedding (LLE) algorithm may be employed to achieve the nonlinear dimension reduction. The core idea of the LLE algorithm is to preserve the local linear relationships of the data points while reducing the dimension globally and nonlinearly. Specifically, for each data point $x_i$, its $k$ nearest neighbors $x_j$ are first found, and a weight matrix $W$ is then computed such that $x_i$ can be reconstructed from the linear combination of its neighbors by minimizing:

$\varepsilon(W) = \sum_i \left\| x_i - \sum_j W_{ij} x_j \right\|^2$,

wherein $W_{ij}$ represents the contribution weight of point $x_j$ to point $x_i$ and satisfies $\sum_j W_{ij} = 1$. Representations $y_i$ in the low-dimensional space that satisfy the same weight relationships are then found by minimizing:

$\Phi(Y) = \sum_i \left\| y_i - \sum_j W_{ij} y_j \right\|^2$.

This yields a representation of the original high-dimensional data on the low-dimensional manifold. In practical application, the number of neighbors $k$ can be dynamically adjusted according to the complexity of the traffic scene: in the case of large traffic flows, a larger $k$ value can be selected to capture more interactions, while in the case of small traffic flows, a smaller $k$ value can be selected to reduce the amount of computation.
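As an illustration, the following is a minimal sketch of this dimension-reduction step using scikit-learn's LocallyLinearEmbedding; the state matrix and the parameter values (n_neighbors, n_components) are assumptions for the example, not values fixed by the invention.

```python
# Minimal sketch of the LLE dimension-reduction step (assumed parameters).
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Hypothetical standardized state matrix: 500 state points, 64 raw features.
traffic_states = np.random.rand(500, 64)

# n_neighbors (k) would be adjusted with traffic density in the real system.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=8)
manifold_repr = lle.fit_transform(traffic_states)  # shape: (500, 8)
```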
The policy characterization subunit 1225 is configured to convert the vehicle scheduling policy and the roadside scheduling policy into a policy matrix, and perform matrix decomposition to generate a low-rank representation. Specifically, the policies of the two networks are represented as a higher-order tensor $\mathcal{P}$, which is then decomposed into a combination of a core tensor and factor matrices by a Tucker decomposition:

$\mathcal{P} = \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}$,

wherein $\mathcal{G}$ is the core tensor, $U^{(n)}$ is the factor matrix of the $n$-th mode, and $\times_n$ denotes the tensor-matrix product along the $n$-th mode. In this way, the complexity of the policy representation can be substantially reduced: reducing a large policy tensor to a low multilinear rank can cut the parameter count by orders of magnitude, greatly lowering the storage and computation requirements.
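For illustration, a minimal sketch of such a Tucker decomposition using the TensorLy library is shown below; the tensor shape and target ranks are assumed values for the example.

```python
# Minimal sketch of the Tucker decomposition of a policy tensor
# (shape and ranks are illustrative assumptions).
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Hypothetical joint policy tensor: (vehicle actions, roadside actions, states).
policy_tensor = tl.tensor(np.random.rand(32, 16, 64))

# Decompose to a low multilinear rank; the small core plus factor matrices
# replace the full tensor as a compact policy representation.
core, factors = tucker(policy_tensor, rank=[4, 4, 8])
print(core.shape)                  # (4, 4, 8)
print([f.shape for f in factors])  # [(32, 4), (16, 4), (64, 8)]
```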
The collaborative optimization subunit 1226 is configured to identify potential conflicts between the vehicle scheduling policy and the roadside scheduling policy, and adjust the policy parameters by a conjugate gradient method. In one particular embodiment, the collaborative optimization may be expressed as the following optimization problem:

$\min_{\theta_V, \theta_R} L(\theta_V, \theta_R) = L_V(\theta_V) + L_R(\theta_R) + \lambda L_{joint}(\theta_V, \theta_R)$,

wherein $\theta_V$ and $\theta_R$ are the parameters of the VDDPG and RDDPG networks respectively, $L_V$ and $L_R$ are the independent loss functions of the two networks, $L_{joint}$ is a joint loss function used to measure the degree of coordination between the strategies output by the two networks, and $\lambda$ is a trade-off coefficient adjusting the ratio of independent optimization to collaborative optimization. The joint loss can be written as:

$L_{joint} = \sum_{i,j} p_{ij} \, C(a_i^V, a_j^R)$,

wherein $a_i^V$ represents the action of vehicle $i$, $a_j^R$ represents the action of roadside facility $j$, $C(\cdot,\cdot)$ represents a measure of conflict between the two actions, and $p_{ij}$ represents the probability of interaction between vehicle $i$ and roadside facility $j$.
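A minimal numpy sketch of this joint loss is given below; the conflict measure (squared difference of scalar actions) and the interaction probabilities are illustrative assumptions, since the invention does not fix a particular conflict function.

```python
# Minimal sketch of the joint conflict loss L_joint (assumed conflict measure).
import numpy as np

def joint_loss(vehicle_actions, roadside_actions, interaction_prob):
    """L_joint = sum_ij p_ij * C(a_i^V, a_j^R); C is taken here as the
    squared difference of scalar actions, an assumption for illustration."""
    conflict = (vehicle_actions[:, None] - roadside_actions[None, :]) ** 2
    return float(np.sum(interaction_prob * conflict))

a_v = np.array([0.8, 0.2, 0.5])        # hypothetical vehicle actions
a_r = np.array([0.6, 0.4])             # hypothetical roadside actions
p = np.array([[0.9, 0.1],              # hypothetical interaction probabilities
              [0.2, 0.7],
              [0.5, 0.5]])
print(joint_loss(a_v, a_r, p))
```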
Referring to fig. 4, the multi-agent collaborative optimization unit 123 includes an agent relationship creation subunit 1231, a message passing optimization subunit 1232, and a global consistency optimization subunit 1233.
The agent relationship creation subunit 1231 is used to construct an agent relationship graph based on the current traffic conditions and identify the conditional dependency relationships between agents. Specifically, the vehicles and roadside devices are modeled as a Markov random field $G = (V, E)$, where a node in $V$ represents an agent (e.g., a vehicle or a signal light) and an edge in $E$ represents an interaction relationship between agents (e.g., a following relationship between vehicles, or a control relationship between a vehicle and a signal light). Through conditional random field theory, the dependency relationships among the multiple agents can be captured and formally expressed as:

$P(S \mid O) = \frac{1}{Z} \prod_{c} \psi_c(S_c, O_c)$,

wherein $S$ represents the set of agent states, $O$ represents the observed variables, $Z$ is a normalization factor, and $\psi_c$ is a potential function defined on clique $c$. In the present system, the position, speed, etc. of a vehicle may be used as state variables, and the environment-awareness information may be used as observation variables. The message transmission optimizing subunit 1232 is configured to calculate the information entropy of each communication channel in the agent relationship graph, generate an optimal message transmission policy, and allocate communication resources. Based on information entropy theory, the information value of each communication channel can be estimated as:

$I(i \to j) = H(s_j) - H(s_j \mid s_i)$,

wherein $H(s_j \mid s_i)$ is the conditional entropy of the state of agent $j$ given the state of agent $i$: the lower this conditional entropy, the higher the communication value. Based on this, a communication policy can be designed that preferentially allocates resources to communication channels with high information value.
For example, in practical application, when two vehicles are close together and traveling fast, the communication value between them is high and should be guaranteed preferentially; when two vehicles are far apart or on different road sections, the communication frequency can be reduced to save resources. Experiments show that this information-entropy-based communication strategy can reduce communication overhead by more than 40% while preserving 90% of the communication effect, as illustrated by the sketch below.
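The sketch below shows one way to estimate this communication value from an empirical joint distribution of two discretized agent states; the binning into two states and the sample distribution are assumptions for the example.

```python
# Minimal sketch: communication value I(i->j) = H(s_j) - H(s_j | s_i),
# estimated from an empirical joint distribution (assumed discretization).
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def communication_value(joint):
    """joint[a, b] = empirical P(s_i = a, s_j = b).
    Uses H(s_j | s_i) = H(s_i, s_j) - H(s_i)."""
    h_j = entropy(joint.sum(axis=0))
    h_i = entropy(joint.sum(axis=1))
    h_ij = entropy(joint.flatten())
    return h_j - (h_ij - h_i)

joint = np.array([[0.30, 0.05],    # hypothetical joint distribution over
                  [0.05, 0.60]])   # two binary agent states
print(communication_value(joint))  # higher value -> prioritize this channel
```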
The global consistency optimization subunit 1233 is configured to decompose the global optimization objective into local sub-objectives, coordinate the local decisions by a variational inference method, and verify the global consistency of the final decisions. In particular, by the variational Bayesian method, the complex posterior distribution $p(S \mid O)$ can be approximated by a simpler factorized distribution:

$q(S) = \prod_i q_i(s_i)$,

with the goal of minimizing the KL divergence between the two distributions:

$q^* = \arg\min_q \mathrm{KL}\left( q(S) \,\|\, p(S \mid O) \right)$.

By iteratively optimizing the local distributions $q_i(s_i)$ of the agents, a globally consistent decision is finally reached.
In one embodiment of the invention, the global optimization objective may be set to minimize system overall delay time, maximize traffic flow, minimize energy consumption, etc. The target can be decomposed into local targets of all the intelligent agents, and the consistency of local decisions and global targets is ensured through a variation inference method. For example, for the goal of minimizing the overall delay time of the system, it may be broken down into local goals of minimizing the transit delay at each intersection, optimizing the routing of each vehicle, etc. Through a messaging mechanism, these local decisions can be coordinated to jointly achieve a global goal.
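For concreteness, the following is a minimal sketch of the mean-field coordinate-ascent update behind this kind of coordination; the number of agents, the two-state discretization, and the random potentials are illustrative assumptions.

```python
# Minimal sketch: mean-field variational updates q_i(s_i) for a tiny
# pairwise model (sizes and potentials are illustrative assumptions).
import numpy as np

n_agents, n_states = 3, 2
rng = np.random.default_rng(0)
unary = rng.normal(size=(n_agents, n_states))                     # theta_i(s)
pair = rng.normal(size=(n_agents, n_agents, n_states, n_states))  # theta_ij(s,t)

q = np.full((n_agents, n_states), 1.0 / n_states)  # factorized initialization
for _ in range(50):                                # coordinate ascent sweeps
    for i in range(n_agents):
        logits = unary[i].copy()
        for j in range(n_agents):
            if j != i:
                logits += pair[i, j] @ q[j]        # expectation under q_j
        q[i] = np.exp(logits - logits.max())       # normalize q_i
        q[i] /= q[i].sum()
print(q)  # per-agent approximate marginals after convergence
```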
In one embodiment of the present invention, the perception layer 101 further includes a data preprocessing module, a state caching module, and an attention distribution module.
The data preprocessing module is used for filtering, denoising and standardizing the environmental state information and the vehicle information. In a specific implementation, a Kalman filter can be used to filter vehicle position and speed data, average filtering can be used to smooth traffic flow data, and min-max or Z-score standardization is then applied, so that data of different types and scales can be processed under the same framework.
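As a sketch of the Kalman filtering step, a one-dimensional constant-velocity filter over noisy position measurements is given below; the noise covariances and the measurement sequence are assumed values.

```python
# Minimal sketch: constant-velocity Kalman filter for vehicle data
# (process/measurement noise values are assumptions for the example).
import numpy as np

dt = 0.2                                  # sampling interval in seconds
F = np.array([[1, dt], [0, 1]])           # state transition for [pos, speed]
H = np.array([[1.0, 0.0]])                # only position is measured
Q = np.eye(2) * 1e-3                      # assumed process noise
R = np.array([[0.5]])                     # assumed measurement noise

x = np.array([0.0, 10.0])                 # initial [position, speed] estimate
P = np.eye(2)
for z in [2.1, 4.0, 5.8, 8.2, 10.1]:      # noisy position measurements
    x, P = F @ x, F @ P @ F.T + Q                 # predict
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    x = x + K @ (np.array([z]) - H @ x)           # update with measurement
    P = (np.eye(2) - K @ H) @ P
print(x)  # filtered [position, speed]
```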
The state buffer module is used for storing historical environment state information and vehicle information and supporting time sequence analysis. By maintaining a state sequence in a time window, the system can analyze the variation trend of traffic parameters, predict the future traffic condition and make scheduling decisions in advance. For example, by analyzing traffic flow changes over the past 30 minutes, one can predict the flow trend for the next 15 minutes, adjusting the signal timing accordingly.
The attention allocation module is configured to adjust an allocation policy of the perceived resource according to feedback from the decision layer 102. In the case of limited resources, the perceived need for different regions, different times, is different. The attention distribution module can dynamically adjust the distribution of the perceived resources according to the current traffic conditions and decision requirements, such as increasing sampling frequency in areas with large traffic flow, improving data precision at key intersections, and the like.
In another embodiment of the present invention, the execution layer 103 further includes an instruction parsing module, an execution monitoring module, an execution history module, and a degradation processing module.
The instruction analysis module is used for converting the cooperative scheduling scheme into specific control instructions. For example, a high-level command to reduce traffic flow is converted into concrete control parameters, such as setting a signal light's red phase to 30 seconds and its green phase to 45 seconds.
The execution monitoring module is used for monitoring the execution of control instructions and identifying abnormal states. An emergency handling mechanism may be triggered when the system detects that the execution deviation exceeds a threshold. For example, when a vehicle's actual deceleration is smaller than the commanded value, the system may issue a warning and adjust subsequent instructions.
The execution history module is used for maintaining execution history data for reference by the decision layer 102. By analyzing the historical execution data, the system can learn the actual effect of instruction execution and optimize the decision model. For example, by analyzing the actual effects of different timing schemes, the relationship between traffic flow and signal timing is learned.
The degradation processing module is used for executing a preset degradation strategy under the condition of communication interruption or equipment failure. The system designs a multi-level degradation strategy to ensure that basic functionality is maintained in various abnormal situations. For example, when the vehicle road communication is interrupted, the vehicle can switch to a local decision mode, and when the central server fails, the road side controller can adopt a preset fixed timing scheme.
In still another embodiment of the present invention, a timing synchronization unit, a collision detection unit, and a synergistic effect evaluation unit are provided between the vehicle execution module 131 and the roadside control module 132.
The timing synchronization unit is used for ensuring time synchronization of the vehicle control command and the road side control command. In a distributed system, time synchronization of different nodes is a critical issue. The time sequence synchronization unit adopts Network Time Protocol (NTP) to realize clock synchronization of each node in the system, and ensures that the control instruction is executed according to the correct time sequence.
The conflict detection unit is used for detecting potential conflicts in the execution process in real time and triggering emergency treatment. For example, when the system detects that a plurality of vehicles possibly simultaneously drive into the same road section to cause congestion, the scheduling scheme can be adjusted in advance, so that collision is avoided.
The cooperative effect evaluation unit is used for quantitatively evaluating the execution effect of the vehicle-road cooperative control and generating an effect evaluation report. The system designs multidimensional evaluation indexes including average delay time, energy consumption, system throughput and the like, and comprehensively evaluates the scheduling effect through the indexes to provide basis for system optimization.
In one embodiment of the invention, information exchange is achieved between VDDPG and RDDPG networks through a shared hidden layer.
In particular, the shared hidden layer receives common features of the manifold spatial representation, outputting an intermediate feature representation. The VDDPG network and RDDPG network each receive this intermediate feature representation and, in combination with the respective specific input features, generate corresponding policy outputs. This design enables two networks to share a basic understanding of the environment while maintaining their own expertise.
In addition, the VDDPG network and the RDDPG network realize collaborative parameter updating through a gradient locking mechanism, ensuring policy coordination consistency. The core idea of the gradient locking mechanism is to take the interaction of the two networks into account during the gradient update:

$\theta_V \leftarrow \theta_V - \alpha \left( \nabla_{\theta_V} L_V + \beta \, \nabla_{\theta_V} L_{joint} \right)$,

$\theta_R \leftarrow \theta_R - \alpha \left( \nabla_{\theta_R} L_R + \beta \, \nabla_{\theta_R} L_{joint} \right)$,

wherein $\beta$ is a factor between 0 and 1 used to control the intensity of the co-optimization. In practical application, $\beta$ can be dynamically adjusted according to the stability of the system: a smaller value (e.g., 0.1) is used in the early stage of training to ensure convergence, and the value is gradually increased to 0.5 or higher as training proceeds to enhance the synergistic effect.
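A minimal PyTorch sketch of such a coupled update follows; the tiny linear stand-in networks and the placeholder losses are assumptions, intended only to show how the joint term contributes to both networks' gradients.

```python
# Minimal sketch of the gradient-locking update (toy networks, assumed losses).
import torch
import torch.nn as nn

vddpg_actor = nn.Linear(8, 2)   # stand-in for the vehicle policy network
rddpg_actor = nn.Linear(8, 2)   # stand-in for the roadside policy network
opt = torch.optim.SGD(list(vddpg_actor.parameters()) +
                      list(rddpg_actor.parameters()), lr=1e-3)

state = torch.randn(16, 8)      # hypothetical manifold-space batch
a_v, a_r = vddpg_actor(state), rddpg_actor(state)

loss_v = (a_v ** 2).mean()               # placeholder independent losses
loss_r = (a_r ** 2).mean()
loss_joint = ((a_v - a_r) ** 2).mean()   # placeholder conflict measure

beta = 0.1                               # co-optimization strength
total = loss_v + loss_r + beta * loss_joint
opt.zero_grad()
total.backward()   # the joint term enters both networks' gradients
opt.step()
```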
In one embodiment of the invention, the system further comprises an experience playback module, a target network update module, an adaptive learning rate adjustment module, and a model evaluation and deployment module.
The experience playback module is used for storing historical data of system and environment interaction and supporting offline batch learning. Specifically, the current state of the agent's interaction with the environment, the selected action, the next state, and the rewards obtained are stored as a quadruple (s, a, s', r) in an experience pool. In the training process, batch data are randomly extracted from the experience pool for learning, so that time sequence correlation among samples is broken, and learning stability and learning efficiency are improved.
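A minimal sketch of such a replay buffer follows; the capacity and batch size are assumed values.

```python
# Minimal sketch of an experience replay buffer storing (s, a, s', r) tuples.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest samples drop out

    def push(self, state, action, next_state, reward):
        self.buffer.append((state, action, next_state, reward))

    def sample(self, batch_size=64):
        # Random sampling breaks the temporal correlation between samples.
        return random.sample(self.buffer, batch_size)
```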
The target network updating module is used for periodically copying parameters from the main network to the target network, ensuring learning stability. In deep reinforcement learning, using a single network for both value estimation and target calculation may lead to instability, so a slowly updated target network is typically maintained. The parameter update of the target network adopts a soft update strategy:

$\theta' \leftarrow \tau \theta + (1 - \tau) \, \theta'$,

wherein $\tau$ is a soft update coefficient, typically taking a small value such as 0.01 to ensure the stability of the target network.
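A short PyTorch sketch of this soft update follows, assuming main_net and target_net share the same architecture.

```python
# Minimal sketch of the soft target update theta' <- tau*theta + (1-tau)*theta'.
import torch

@torch.no_grad()
def soft_update(main_net, target_net, tau=0.01):
    for p, tp in zip(main_net.parameters(), target_net.parameters()):
        tp.mul_(1.0 - tau).add_(tau * p)
```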
The self-adaptive learning rate adjustment module is used for dynamically adjusting the network learning rate according to training progress. In the initial stage of training, a larger learning rate (e.g., 0.001) can be used for rapid exploration; as training progresses, the learning rate is gradually reduced (e.g., to 0.0001) for fine adjustment. In addition, the learning rate can be adapted to the trend of the loss function, for example reduced when the loss has not dropped for several consecutive rounds.
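One way to realize the loss-plateau rule is PyTorch's built-in ReduceLROnPlateau scheduler, sketched below; the decay factor, patience, and the constant placeholder loss are assumptions for the example.

```python
# Minimal sketch: lower the learning rate when the loss stops improving.
import torch

model = torch.nn.Linear(8, 2)   # stand-in for a policy or value network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)

for epoch in range(20):
    loss = torch.tensor(1.0)    # placeholder for the real training loss
    scheduler.step(loss)        # decays lr after 5 non-improving epochs
```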
The model evaluation and deployment module is used for evaluating the performance of the model and deploying the trained model to the production environment. The system designed a complete model evaluation flow, including offline evaluation and online a/B testing. Before deployment, the model is subjected to pressure test and safety evaluation, so that the model can work normally under various conditions.
The invention also provides a vehicle-road collaborative dynamic scheduling method for multi-agent reinforcement learning, which is applied to the vehicle-road collaborative dynamic scheduling system for multi-agent reinforcement learning and comprises the following steps:
1. Initializing the system, including initializing VDDPG network and RDDPG network parameters, establishing communication connections, loading the road network topology and historical traffic data. In this step, network parameters may be initialized randomly or from a pre-trained model, stable communication is ensured by adopting standard protocols such as MQTT, and the road network data and historical traffic data are used for preliminary system configuration and model pre-training.
2. Collecting state information, including collecting environmental state information through the roadside sensing module and collecting vehicle information through the vehicle sensing module. The environmental state information includes traffic flow, signal lamp states, road network structure, etc., and the vehicle information includes position, speed, direction, destination, etc. The system acquires these data in real time through a sensor network, with the sampling frequency dynamically adjusted according to scene complexity, generally 5-10 Hz.
3. Performing topology state characterization, including mapping the state information to a multidimensional topology space, extracting topology features, and forming the manifold space representation through nonlinear dimensionality reduction. This step adopts the topological dynamic state characterization mechanism described in detail above to compress the high-dimensional, complex traffic environment state into a low-dimensional representation while retaining the key topological features.
4. Generating a dual network decision includes VDDPGActor a network generating a vehicle scheduling policy, RDDPGActor a network generating a roadside scheduling policy, and performing a policy matrix decomposition to generate a low-rank representation. The method adopts the double-network collaborative decision mechanism described in detail above, respectively processes the vehicle strategy and the road side strategy through two special networks, and reduces the complexity through matrix decomposition.
5. Performing collaborative optimization, including building an agent relationship graph, optimizing a message passing strategy, coordinating local decisions by a variation inference method, and verifying global consistency. The step adopts the multi-agent cooperative decision mechanism described in detail above to ensure that the local decisions of all agents can be coordinated and consistent, and the global optimization target is realized together.
6. Executing the scheduling scheme, wherein the scheduling scheme comprises the step of converting the optimized decision into a specific execution instruction and a vehicle and road side facility execution control instruction. The vehicle control instructions comprise speed adjustment, path planning and the like, and the road side control instructions comprise signal lamp timing adjustment, lane allocation and the like. The system ensures that the instructions are executed at the correct timing and monitors the execution in real time.
7. The execution feedback is collected, including monitoring execution results, updating environmental conditions, updating network parameters and experience pools based on the execution results. The system collects the environmental change and the vehicle state after execution through the sensor network, calculates the difference between the actual effect and the expected effect, and generates a reward signal for updating network parameters.
8. Iterative optimization, repeatedly executing the steps, and continuously optimizing the scheduling effect. The system continuously operates, continuously learns and adapts to the changed traffic environment, and gradually improves the dispatching efficiency and effect.
The multi-agent reinforcement learning vehicle-road collaborative dynamic scheduling system provided by the invention has good performance in practical application. Through test verification in a plurality of urban traffic scenes, the system can obviously improve traffic efficiency, reduce congestion and reduce energy consumption.
Specifically, compared with traditional fixed-timing signal control, the system reduces average vehicle delay by more than 30%; compared with simple adaptive signal control, delay is reduced by a further 15%. In complex traffic scenes, the system's computing resource requirement is 50% lower and its communication bandwidth requirement 40% lower than a centralized decision architecture, greatly improving the scalability of the system.
In addition, the system has good adaptability and robustness, and can cope with abnormal conditions such as abrupt change of traffic flow, equipment failure, communication interruption and the like, and ensure the stable operation of the traffic system.
The foregoing description is only of the preferred embodiments of the invention and is not intended to limit the invention thereto. Various obvious changes and modifications to the present invention may be made by those skilled in the art without departing from the scope of the invention, and such changes and modifications are intended to be within the scope of the invention.

Claims (10)

1.多智能体强化学习的车路协同动态调度系统,其特征在于,包括:1. A multi-agent reinforcement learning vehicle-road cooperative dynamic scheduling system, characterized by including: 感知层,包括路侧感知模块和车辆感知模块,用于采集环境状态信息和车辆信息;The perception layer includes a roadside perception module and a vehicle perception module, which are used to collect environmental status information and vehicle information; 决策层,与所述感知层通信连接,包括:The decision layer is connected to the perception layer in communication, and includes: 拓扑动态状态表征单元,用于接收所述环境状态信息和所述车辆信息,将所述环境状态信息和所述车辆信息映射到多维拓扑空间形成拓扑状态表示,并通过非线性降维生成流形空间表示;A topological dynamic state representation unit, configured to receive the environmental state information and the vehicle information, map the environmental state information and the vehicle information to a multidimensional topological space to form a topological state representation, and generate a manifold space representation by nonlinear dimensionality reduction; 双网络协同决策单元,用于接收所述流形空间表示,包括车辆动态分配网络VDDPG和路侧动态调度网络RDDPG,其中所述VDDPG用于生成车辆调度策略,所述RDDPG用于生成路侧调度策略,并通过策略矩阵分解将所述车辆调度策略和所述路侧调度策略转换为低秩表示形式;A dual-network collaborative decision-making unit, used to receive the manifold space representation, including a vehicle dynamic allocation network VDDPG and a roadside dynamic dispatch network RDDPG, wherein the VDDPG is used to generate a vehicle dispatch strategy, and the RDDPG is used to generate a roadside dispatch strategy, and convert the vehicle dispatch strategy and the roadside dispatch strategy into a low-rank representation through strategy matrix decomposition; 多智能体协同优化单元,用于构建智能体关系图,基于信息熵优化所述智能体关系图中的通信策略,通过变分推断方法协调所述车辆调度策略和所述路侧调度策略,生成优化后的协同调度方案;A multi-agent collaborative optimization unit, used to construct an agent relationship graph, optimize the communication strategy in the agent relationship graph based on information entropy, coordinate the vehicle scheduling strategy and the roadside scheduling strategy through a variational inference method, and generate an optimized collaborative scheduling solution; 执行层,与所述决策层通信连接,包括车辆执行模块和路侧控制模块,用于接收所述协同调度方案,执行相应的调度指令,并将执行结果反馈至所述感知层形成闭环控制。The execution layer is in communication with the decision layer, and includes a vehicle execution module and a roadside control module, which are used to receive the collaborative scheduling scheme, execute corresponding scheduling instructions, and feed back the execution results to the perception layer to form a closed-loop control. 2.根据权利要求1所述的多智能体强化学习的车路协同动态调度系统,其特征在于,所述拓扑动态状态表征单元包括:2. The multi-agent reinforcement learning vehicle-road cooperative dynamic scheduling system according to claim 1, characterized in that the topological dynamic state representation unit comprises: 状态采集子单元,用于接收所述环境状态信息和所述车辆信息,对所述环境状态信息和所述车辆信息进行预处理生成标准化数据;A state acquisition subunit, used for receiving the environmental state information and the vehicle information, and preprocessing the environmental state information and the vehicle information to generate standardized data; 拓扑表征子单元,用于将所述标准化数据映射到多维拓扑空间,提取拓扑特征,建立局部坐标系;A topological characterization subunit, used to map the standardized data into a multidimensional topological space, extract topological features, and establish a local coordinate system; 维度转换子单元,用于对所述多维拓扑空间的数据执行非线性降维,生成保留关键拓扑关系的流形空间表示。The dimension conversion subunit is used to perform nonlinear dimensionality reduction on the data in the multidimensional topological space to generate a manifold space representation that retains key topological relationships. 3.根据权利要求1所述的多智能体强化学习的车路协同动态调度系统,其特征在于,所述双网络协同决策单元包括:3. 
The vehicle-road cooperative dynamic scheduling system based on multi-agent reinforcement learning according to claim 1, characterized in that the dual-network collaborative decision-making unit comprises: VDDPGActor网络,用于接收与车辆相关的流形空间表示,生成车辆调度策略;VDDPGActor network, used to receive the manifold space representation related to vehicles and generate vehicle scheduling strategies; VDDPGCritic网络,用于评估所述车辆调度策略的价值;VDDPGCritic network, used to evaluate the value of the vehicle dispatching strategy; RDDPGActor网络,用于接收与路侧相关的流形空间表示,生成路侧调度策略;RDDPGActor network, used to receive the manifold space representation related to the roadside and generate the roadside dispatch strategy; RDDPGCritic网络,用于评估所述路侧调度策略的价值;RDDPGCritic network, used to evaluate the value of the roadside dispatch strategy; 策略表征子单元,用于将所述车辆调度策略和所述路侧调度策略转换为策略矩阵,并执行矩阵分解生成低秩表示;A strategy representation subunit, used for converting the vehicle dispatch strategy and the roadside dispatch strategy into a strategy matrix, and performing matrix decomposition to generate a low-rank representation; 协同优化子单元,用于识别所述车辆调度策略与所述路侧调度策略的潜在冲突,通过共轭梯度法调整策略参数。The collaborative optimization subunit is used to identify potential conflicts between the vehicle scheduling strategy and the roadside scheduling strategy, and adjust strategy parameters through the conjugate gradient method. 4.根据权利要求1所述的多智能体强化学习的车路协同动态调度系统,其特征在于,所述多智能体协同优化单元包括:4. The vehicle-road cooperative dynamic scheduling system based on multi-agent reinforcement learning according to claim 1, characterized in that the multi-agent collaborative optimization unit comprises: 智能体关系建模子单元,用于基于当前交通状况构建智能体关系图,识别智能体间的条件依赖关系;The agent relationship modeling subunit is used to build an agent relationship graph based on the current traffic conditions and identify the conditional dependencies between agents; 消息传递优化子单元,用于计算智能体关系图中各通信渠道的信息熵,生成最优消息传递策略,分配通信资源;The message transmission optimization subunit is used to calculate the information entropy of each communication channel in the agent relationship graph, generate the optimal message transmission strategy, and allocate communication resources; 全局一致性优化子单元,用于将全局优化目标分解为局部子目标,通过变分推断方法协调局部决策,验证最终决策的全局一致性。The global consistency optimization subunit is used to decompose the global optimization objective into local sub-objectives, coordinate local decisions through variational inference methods, and verify the global consistency of the final decision. 5.根据权利要求1所述的多智能体强化学习的车路协同动态调度系统,其特征在于,所述感知层还包括:5. The multi-agent reinforcement learning vehicle-road cooperative dynamic scheduling system according to claim 1, characterized in that the perception layer also includes: 数据预处理模块,用于对所述环境状态信息和所述车辆信息进行滤波、去噪和标准化处理;A data preprocessing module, used for filtering, denoising and standardizing the environmental status information and the vehicle information; 状态缓存模块,用于存储历史环境状态信息和车辆信息,支持时序分析;The state cache module is used to store historical environment state information and vehicle information and support time series analysis; 注意力分配模块,用于根据决策层的反馈调整感知资源的分配策略。The attention allocation module is used to adjust the allocation strategy of perception resources according to the feedback from the decision layer. 6.根据权利要求1所述的多智能体强化学习的车路协同动态调度系统,其特征在于,所述执行层还包括:6. 
The multi-agent reinforcement learning vehicle-road cooperative dynamic scheduling system according to claim 1, characterized in that the execution layer also includes: 指令解析模块,用于将所述协同调度方案转换为具体的控制指令;An instruction parsing module, used to convert the collaborative scheduling scheme into specific control instructions; 执行监控模块,用于监控控制指令的执行情况,识别异常状态;An execution monitoring module is used to monitor the execution of control instructions and identify abnormal conditions; 执行历史记录模块,用于维护执行历史数据,供决策层参考;Execution history module, used to maintain execution history data for reference by decision-makers; 降级处理模块,用于在通信中断或设备故障情况下,执行预设的降级策略。The degradation processing module is used to execute the preset degradation strategy in the event of communication interruption or equipment failure. 7.根据权利要求1所述的多智能体强化学习的车路协同动态调度系统,其特征在于,所述车辆执行模块与所述路侧控制模块之间设有:7. The multi-agent reinforcement learning vehicle-road cooperative dynamic scheduling system according to claim 1, characterized in that the vehicle execution module and the roadside control module are provided with: 时序同步单元,用于确保车辆控制指令与路侧控制指令的时间同步;A timing synchronization unit, used to ensure the time synchronization between vehicle control instructions and roadside control instructions; 冲突检测单元,用于实时检测执行过程中的潜在冲突,触发应急处理;Conflict detection unit, used to detect potential conflicts in the execution process in real time and trigger emergency processing; 协同效果评估单元,用于定量评估车路协同控制的执行效果,生成效果评估报告。The collaborative effect evaluation unit is used to quantitatively evaluate the execution effect of vehicle-road collaborative control and generate an effect evaluation report. 8.根据权利要求1所述的多智能体强化学习的车路协同动态调度系统,其特征在于,所述VDDPG网络与所述RDDPG网络之间通过共享隐藏层实现信息交换,其中:8. The multi-agent reinforcement learning vehicle-road cooperative dynamic scheduling system according to claim 1, characterized in that the VDDPG network and the RDDPG network exchange information by sharing a hidden layer, wherein: 所述共享隐藏层接收所述流形空间表示的共同特征,输出中间特征表示;The shared hidden layer receives the common features represented in the manifold space and outputs an intermediate feature representation; 所述VDDPG网络和所述RDDPG网络分别接收所述中间特征表示,结合各自特定的输入特征,生成相应的策略输出;The VDDPG network and the RDDPG network respectively receive the intermediate feature representation, and generate corresponding strategy outputs in combination with their respective specific input features; 所述VDDPG网络和所述RDDPG网络通过梯度锁定机制实现协同参数更新,确保策略协调一致。The VDDPG network and the RDDPG network implement collaborative parameter updates through a gradient locking mechanism to ensure strategy coordination and consistency. 9.根据权利要求1所述的多智能体强化学习的车路协同动态调度系统,其特征在于,所述系统还包括:9. The vehicle-road cooperative dynamic scheduling system based on multi-agent reinforcement learning according to claim 1, characterized in that the system further comprises: 经验回放模块,用于存储系统与环境交互的历史数据,支持离线批量学习;The experience replay module is used to store historical data of the interaction between the system and the environment and supports offline batch learning; 目标网络更新模块,用于定期从主网络复制参数到目标网络,确保学习稳定性;The target network update module is used to periodically copy parameters from the main network to the target network to ensure learning stability; 自适应学习率调整模块,用于根据训练进展动态调整网络学习率;Adaptive learning rate adjustment module, used to dynamically adjust the network learning rate according to the progress of training; 模型评估与部署模块,用于评估模型性能,将训练好的模型部署到生产环境。The model evaluation and deployment module is used to evaluate model performance and deploy the trained model to the production environment. 
10. A vehicle-road cooperative dynamic scheduling method based on multi-agent reinforcement learning, applied to the vehicle-road cooperative dynamic scheduling system based on multi-agent reinforcement learning according to any one of claims 1 to 9, characterized in that it comprises the following steps:
initializing the system, including initializing the VDDPG network and RDDPG network parameters, establishing communication connections, and loading the road network topology and historical traffic data;
collecting state information, including collecting environmental state information through the roadside perception module and vehicle information through the vehicle perception module;
performing topological state representation, including mapping the state information into a multidimensional topological space, extracting topological features, and generating a manifold space representation through nonlinear dimensionality reduction;
generating dual-network decisions, including generating a vehicle scheduling strategy with the VDDPG Actor network, generating a roadside scheduling strategy with the RDDPG Actor network, and performing strategy matrix decomposition to generate a low-rank representation;
performing collaborative optimization, including constructing the agent relationship graph, optimizing the message-passing strategy, coordinating local decisions through variational inference, and verifying global consistency;
executing the scheduling scheme, including converting the optimized decisions into specific execution instructions, with the vehicles and roadside facilities executing the control instructions;
collecting execution feedback, including monitoring execution results, updating the environment state, and updating the network parameters and experience pool based on the execution results (an illustrative sketch of this update step follows below);
iteratively optimizing, repeating the above steps to continuously improve the scheduling effect.
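Claim 10's feedback step (updating the network parameters and the experience pool) pairs with the experience replay and target network update modules of claim 9. The sketch below shows the generic DDPG-style bookkeeping such modules imply — a uniform replay buffer and a soft (Polyak) target update. It is a framework-free assumption for illustration only; the buffer capacity, batch size, and tau value are not taken from the specification.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience pool: stores (state, action, reward, next_state)."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size=64):
        # Uniform sampling; never request more items than are stored.
        return random.sample(self.buf, min(batch_size, len(self.buf)))

def soft_update(target_params, main_params, tau=0.005):
    """Polyak averaging: target <- tau * main + (1 - tau) * target.
    Parameters are plain floats here to keep the sketch framework-free;
    in practice these would be the network weight tensors."""
    return [tau * m + (1.0 - tau) * t
            for m, t in zip(main_params, target_params)]

# Usage: push a dummy transition, sample a batch, nudge the target network.
pool = ReplayBuffer()
pool.push(([0.1, 0.2], [1.0], 0.5, [0.15, 0.25]))
batch = pool.sample(batch_size=1)
target = soft_update(target_params=[0.0, 0.0], main_params=[1.0, 1.0])
print(len(batch), target)   # 1 [0.005, 0.005]
```

The slow-moving target network decouples the bootstrapping target from the rapidly changing main network, which is the "learning stability" rationale stated in claim 9.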
CN202510521087.5A 2025-04-24 2025-04-24 Vehicle-road collaborative dynamic scheduling system and method for multi-agent reinforcement learning Active CN120048123B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202510521087.5A CN120048123B (en) 2025-04-24 2025-04-24 Vehicle-road collaborative dynamic scheduling system and method for multi-agent reinforcement learning


Publications (2)

Publication Number Publication Date
CN120048123A true CN120048123A (en) 2025-05-27
CN120048123B CN120048123B (en) 2025-07-04

Family

ID=95760053


Country Status (1)

Country Link
CN (1) CN120048123B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169631A (en) * 2011-04-21 2011-08-31 福州大学 Manifold-learning-based traffic jam event cooperative detecting method
US20230252895A1 (en) * 2020-12-11 2023-08-10 China Intelligent And Connected Vehicles (Beijing) Research Institute Co., Ltd Method, device and electronic equipment for vehicle cooperative decision-making and computer storage medium
CN114757092A (en) * 2022-03-24 2022-07-15 南京大学 System and method for training multi-agent cooperative communication strategy based on teammate perception
WO2024016386A1 (en) * 2022-07-19 2024-01-25 江苏大学 Multi-agent federated reinforcement learning-based vehicle-road collaborative control system and method under complex intersection
US20240298294A1 (en) * 2023-03-02 2024-09-05 Hong Kong Applied Science And Technology Research Institute Co., Ltd. System And Method For Road Monitoring
WO2025074369A1 (en) * 2023-10-03 2025-04-10 Telefonaktiebolaget Lm Ericsson (Publ) System and method for efficient collaborative marl training using tensor networks
CN117935562A (en) * 2024-03-22 2024-04-26 山东双百电子有限公司 Traffic light control method and system based on deep learning
CN119428755A (en) * 2024-10-29 2025-02-14 天津大学 A collaborative decision-making method for intelligent connected vehicles based on dual interactive perception
CN119233227A (en) * 2024-12-02 2024-12-31 南京信息职业技术学院 Internet of vehicles multi-parameter monitoring method and system based on 5G multi-access edge calculation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHU Yingda: "Research on Distributed Clustering and Inference Algorithms Based on Multi-Agent Consensus Theory", China Master's Theses Full-text Database (Information Science and Technology), 15 January 2021 (2021-01-15), pages 140-124 *
LIN Anya: "Dynamic Consensus Averaging Algorithms in Multi-Agent Networks and Their Applications", China Master's Theses Full-text Database (Information Science and Technology), 15 January 2017 (2017-01-15), pages 140-9 *
HU Jun; WANG Kai: "Attribute Reduction of Continuous-Valued Distributed Data Based on Neighborhood Rough Sets", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), no. 06, 15 December 2017 (2017-12-15) *

Also Published As

Publication number Publication date
CN120048123B (en) 2025-07-04

Similar Documents

Publication Publication Date Title
Zhang et al. Cooperative multi-agent actor–critic control of traffic network flow based on edge computing
CN119147094B (en) Underground facility vibration monitoring method and system based on distributed sensing network
CN109547431A (en) A kind of network security situation evaluating method based on CS and improved BP
CN118859731B (en) State control method, device, equipment and storage medium of monitoring terminal
CN112598150A (en) Method for improving fire detection effect based on federal learning in intelligent power plant
CN114995119B (en) Urban traffic signal cooperative control method based on multi-agent deep reinforcement learning
CN112990485A (en) Knowledge strategy selection method and device based on reinforcement learning
CN110969295B (en) Train section delay prediction error control method
CN114360266A (en) Intersection reinforcement learning signal control method for sensing detection state of internet connected vehicle
CN118611038A (en) Coordinated control method and system of distribution network intelligent agent group based on deep reinforcement learning
CN119598403A (en) A real-time data monitoring and security assessment system and method driven by intelligent perception
DE202025102042U1 (en) Intelligent traffic management using reinforcement learning
CN117369391A (en) System and method for optimizing process parameters of end-edge cloud cooperation
CN120048123A (en) Vehicle-road collaborative dynamic scheduling system and method for multi-agent reinforcement learning
Khan et al. Communication in multi-agent reinforcement learning: a survey
CN120496339A (en) Traffic signal control method and system based on large language model intelligent agent
CN118053308B (en) A method and terminal for optimizing traffic operation status based on the city brain
CN119811108A (en) Traffic signal light intelligent control method and system
CN119292079A (en) Multi-robot collaborative control method and system based on edge computing
CN116738279A (en) A microgrid data processing method and system based on federated learning
CN115719478A (en) End-to-end automatic driving method for accelerated reinforcement learning independent of irrelevant information
CN120340272B (en) Traffic signal cooperative control method based on multi-agent reinforcement learning
CN119129641B (en) Multi-agent cooperative control method for traffic scenarios
Liang et al. State evaluation method for complex task network models
Dai et al. Parallel System-Based Predictive Control for Traffic Signals in Large-Scale Road Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant