WO2019047646A1 - Vehicle obstacle avoidance method and apparatus (车辆避障方法和装置) - Google Patents
Vehicle obstacle avoidance method and apparatus (车辆避障方法和装置)
- Publication number
- WO2019047646A1 (PCT/CN2018/098637)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- obstacle avoidance
- data
- historical
- vehicle
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0231—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
- G05D1/0234—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using optical markers or beacons
- G05D1/0236—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using optical markers or beacons in combination with a laser
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0231—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
- G05D1/0238—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
- G05D1/024—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
Definitions
- the present application relates to the field of vehicle control technologies, specifically to the field of vehicle driving safety control technologies, and in particular to a vehicle obstacle avoidance method and apparatus.
- Autopilot is a technology that uses a computer to automatically and safely operate a motor vehicle without active human operation.
- when an obstacle is encountered while the vehicle is traveling, existing automatic driving technology must first identify the obstacle using target recognition technology, and then determine an obstacle avoidance strategy based on predefined rules or on a deep learning system derived from imitation learning.
- the obstacle avoidance operations of existing automatic driving technology can only cover road conditions in limited situations and have no reasoning capability; it is difficult to cope with the various complicated scenes on actual roads, and the obstacle avoidance success rate needs to be improved.
- embodiments of the present application provide a vehicle obstacle avoidance method and apparatus.
- an embodiment of the present application provides a vehicle obstacle avoidance method, including: acquiring travel data collected by onboard sensors of a vehicle, where the travel data includes obstacle information of the travel path and sensor data of the vehicle; determining an obstacle avoidance control instruction using an obstacle avoidance strategy model based on the travel data, where the obstacle avoidance strategy model is trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and sending the obstacle avoidance control instruction to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation.
- in some embodiments, the method further includes the step of training the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records, where the historical obstacle avoidance records include historical obstacle avoidance results and the historical driving data and historical manipulation data corresponding to the historical obstacle avoidance results; the step of training the obstacle avoidance strategy model based on the historical obstacle avoidance records using the deep reinforcement learning algorithm includes: obtaining a historical evaluation index for each historical obstacle avoidance result; and performing deep reinforcement learning based on the historical driving data, the historical evaluation indexes, and the historical manipulation data to obtain an obstacle avoidance strategy model that optimizes the obstacle avoidance results.
- in some embodiments, performing deep reinforcement learning based on the historical driving data, the historical evaluation indexes, and the historical manipulation data to derive the policy parameters of an obstacle avoidance strategy model that optimizes the obstacle avoidance results includes: for each historical obstacle avoidance record, taking the historical driving data as state data, the historical manipulation data as operation data, and the historical evaluation index as value data; constructing a learning trajectory from the historical obstacle avoidance records, the learning trajectory including a plurality of trajectory points in one-to-one correspondence with the historical obstacle avoidance records, each trajectory point including the corresponding state data, operation data, and value data; calculating the gradient of the expectation of the total value data with respect to the policy parameters based on the learning trajectory and the total value data, where the policy parameters are used to map state data to operation data; and adjusting the policy parameters based on this gradient to generate the obstacle avoidance strategy model.
- in some embodiments, determining the obstacle avoidance control instruction using the obstacle avoidance strategy model based on the driving data includes: inputting the driving data collected by the onboard sensors into the obstacle avoidance strategy model as the current state data; and mapping the current state data to the current operation data as the obstacle avoidance manipulation command based on the adjusted policy parameters.
- in some embodiments, after the obstacle avoidance manipulation command is sent to the corresponding control system, the method further includes: obtaining an evaluation index of the vehicle's current obstacle avoidance result and taking it as the current value data; updating the learning trajectory, the total value data, and the expectation of the total value data based on the current state data, the current operation data, and the current value data; calculating the gradient of the updated expectation of the total value data with respect to the policy parameters based on the updated learning trajectory and the sum of the value data; and adjusting the policy parameters based on this gradient.
- an embodiment of the present application provides a vehicle obstacle avoidance device, including: an acquisition unit configured to acquire travel data collected by the onboard sensors of the vehicle, where the travel data includes obstacle information of the travel path and sensor data of the vehicle; a determining unit configured to determine the obstacle avoidance manipulation instruction using the obstacle avoidance strategy model based on the driving data, where the obstacle avoidance strategy model is trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and a sending unit configured to send the obstacle avoidance control command to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation.
- in some embodiments, the apparatus further includes a training unit configured to train the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records, where the historical obstacle avoidance records include historical obstacle avoidance results and the historical driving data and historical manipulation data corresponding to the historical obstacle avoidance results; the training unit is configured to train the obstacle avoidance strategy model by: obtaining a historical evaluation index for each historical obstacle avoidance result; and performing deep reinforcement learning based on the historical driving data, the historical evaluation indexes, and the historical manipulation data to obtain an obstacle avoidance strategy model that optimizes the obstacle avoidance results.
- in some embodiments, the training unit is further configured to perform deep reinforcement learning to obtain the policy parameters of an obstacle avoidance strategy model that optimizes the obstacle avoidance results by: for each historical obstacle avoidance record, taking the historical driving data as state data, the historical manipulation data as operation data, and the historical evaluation index as value data; constructing a learning trajectory from the historical obstacle avoidance records, the learning trajectory including a plurality of trajectory points in one-to-one correspondence with the historical obstacle avoidance records, each trajectory point including the corresponding state data, operation data, and value data; calculating the gradient of the expectation of the total value data with respect to the policy parameters based on the learning trajectory and the total value data, where the policy parameters are used to map state data to operation data; and adjusting the policy parameters based on this gradient to generate the obstacle avoidance strategy model.
- the determining unit is further configured to determine the obstacle avoidance manipulation instruction as follows: input the driving data collected by the onboard sensors into the obstacle avoidance strategy model as the current state data; and, based on the adjusted policy parameters, map the current state data to the current operation data as the obstacle avoidance manipulation command.
- the apparatus further includes an update unit configured to: after the obstacle avoidance manipulation command is sent to the corresponding control system, obtain an evaluation index of the vehicle's current obstacle avoidance result and take it as the current value data; update the learning trajectory, the total value data, and the expectation of the total value data based on the current state data, the current operation data, and the current value data; calculate the gradient of the updated expectation of the total value data with respect to the policy parameters based on the updated learning trajectory and the sum of the value data; and adjust the policy parameters based on this gradient.
- an embodiment of the present application provides an apparatus, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the vehicle obstacle avoidance method described above.
- the vehicle obstacle avoidance method and device of the embodiments of the present application acquire the travel data collected by the onboard sensors of the vehicle, where the travel data includes obstacle information of the travel path and sensor data of the vehicle; then determine the obstacle avoidance control command using the obstacle avoidance strategy model based on the travel data, where the obstacle avoidance strategy model is trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and then send the obstacle avoidance control command to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation. This enables automatic obstacle avoidance and ensures safe driving in unmanned scenarios; moreover, an obstacle avoidance model trained by deep reinforcement learning can cope with complicated road scenes and improve the obstacle avoidance success rate.
- FIG. 1 is an exemplary system architecture diagram to which the present application can be applied.
- FIG. 2 is a flow chart of one embodiment of a vehicle obstacle avoidance method according to the present application.
- FIG. 3 is a flow chart of another embodiment of a vehicle obstacle avoidance method according to the present application.
- FIG. 4 is a schematic flowchart of a specific implementation of the method for training the obstacle avoidance strategy model using a deep reinforcement learning algorithm according to the present application.
- FIG. 5 is a schematic structural view of an embodiment of a vehicle obstacle avoidance device of the present application.
- FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server of an embodiment of the present application.
- FIG. 1 illustrates an exemplary system architecture 100 of an embodiment of a vehicle obstacle avoidance method or vehicle obstacle avoidance device to which the present application may be applied.
- the system architecture 100 can include a vehicle 101, an onboard sensor 102 and an onboard control unit 103 mounted on the vehicle 101, and a server 104.
- the vehicle 101 can be an unmanned vehicle.
- the onboard control unit 103 can be connected to the server 104 via a network, which can include various types of connections, such as wired, wireless communication links, fiber optic cables, and the like.
- the onboard sensor 102 can collect road data and vehicle state data during vehicle travel.
- the in-vehicle sensor 102 may include an in-vehicle camera, a lidar sensor, a millimeter wave radar sensor, a collision sensor, a speed sensor, an air pressure sensor, and the like.
- the in-vehicle control unit 103 may be an ECU (Electronic Control Unit) or may be an on-board computer for analyzing and controlling the working state of each component of the vehicle.
- the onboard control unit 103 can acquire data collected by the onboard sensor 102, process and respond to the data, and can also control storage and transmission of data collected by the onboard sensor.
- the server 104 can establish a connection with the in-vehicle control unit 103 via a network, and the in-vehicle control unit 103 can transmit sensor data to the server 104.
- the server 104 can analyze the sensor data and feed back the analysis result to the in-vehicle control unit 103.
- the in-vehicle control unit 103 can respond according to the received analysis result.
- the vehicle 101 usually encounters obstacles during driving, and the in-vehicle control unit 103 can acquire information about an obstacle through the onboard sensor 102, for example using the in-vehicle camera to collect images of the obstacle and using an infrared or ultrasonic detector to detect information such as the obstacle's distance and shape. The in-vehicle control unit 103 can then analyze the obstacle information and make a manipulation decision, or it can transmit the obstacle information to the server 104 for analysis and decision-making, with the server 104 passing the steering decision back to the in-vehicle control unit 103.
- the in-vehicle control unit 103 controls the corresponding components of the vehicle 101 in accordance with the steering decision.
- the vehicle obstacle avoidance method provided by the embodiment of the present application may be performed by the in-vehicle control unit 103 or the server 104. Accordingly, the vehicle obstacle avoidance device may be disposed in the in-vehicle control unit 103 or the server 104.
- with continued reference to FIG. 2, a flow 200 of one embodiment of the vehicle obstacle avoidance method according to the present application is shown. The vehicle obstacle avoidance method includes the following steps:
- Step 201 Acquire driving data collected by an onboard sensor of the vehicle.
- the electronic device on which the vehicle obstacle avoidance method runs can obtain the travel data collected by the onboard sensors through a wired or wireless connection with the onboard sensors, or by connecting to a storage device that stores the onboard sensor data and issuing a data acquisition request.
- the travel data includes obstacle information of the travel route and sensor data of the vehicle.
- the obstacle information may be information about pedestrians, vehicles, roadblock markers, and the like on the road
- the vehicle sensor data may include data from each sensor of the current vehicle, which may include but is not limited to pressure sensor data, image sensor data, lidar sensor data, ABS (antilock brake system) sensor data, speed sensor data, and position and rotation speed sensor data.
- the vehicle sensor data herein may include status data of the vehicle acquired by the sensor and data on the road on which the vehicle is located that is perceptible by the vehicle sensor.
- the vehicle in this embodiment may be an unmanned vehicle, or a vehicle having an automatic driving function.
- while an unmanned vehicle is traveling, its sensors typically collect data continuously; the electronic device can connect directly to each vehicle sensor and acquire the collected data in real time, or it can obtain, via the onboard control unit, the sensor data that the vehicle sensors transmit to the onboard control unit.
- Step 202 Determine an obstacle avoidance manipulation instruction by using an obstacle avoidance strategy model based on the driving data.
- the obstacle avoidance strategy model is used to map the input travel data to the obstacle avoidance manipulation command.
- the obstacle avoidance strategy model is trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm.
- the historical obstacle avoidance record may be a related record of multiple obstacle avoidance processes that have occurred during the running of the vehicle.
- the driving data may be input into the trained obstacle avoidance strategy model, and the model is used to determine the corresponding obstacle avoidance manipulation command.
- the obstacle avoidance strategy model can determine the manipulation component to be called and the manipulation instruction to be sent to the manipulation component to be called based on the obstacle information and the sensor data of the vehicle.
- the above obstacle avoidance strategy model can be trained in an end-to-end manner based on DRL (Deep Reinforcement Learning) technology.
- the end-to-end manner refers to mapping directly acquired input data, such as images and audio, to the desired output result.
- in the scenario of the embodiments of the present application, end-to-end means mapping the input driving data to manipulation instructions.
- unlike traditional control decision methods, the end-to-end approach fuses data cleaning, feature extraction, encoding/decoding, and other processes together, with a deep neural network model establishing the mapping between the input data and the final target output data.
- training the obstacle avoidance strategy model in an end-to-end manner is the process of continuously learning the obstacle avoidance control strategy with the historical driving data from the historical obstacle avoidance records as the model input and the corresponding obstacle avoidance manipulation commands as the model output.
- the obstacle avoidance strategy model is trained in an end-to-end manner; during training, only the raw driving data collected by the onboard sensors and the labeled values of the obstacle avoidance results obtained from the obstacle avoidance control need to be provided, and the obstacle avoidance strategy model can optimize its model parameters through deep reinforcement learning. An illustrative network is sketched below.
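To make the end-to-end mapping concrete, the following is a minimal sketch, assuming Python with PyTorch, of a policy network that maps a flattened sensor-data vector directly to control outputs. The class name, layer sizes, and discrete-command output are illustrative assumptions, not a design specified by the application.

```python
import torch
import torch.nn as nn

class ObstacleAvoidancePolicy(nn.Module):
    """Illustrative end-to-end policy: raw sensor features in, command logits out."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        # Hypothetical architecture; the application only requires an
        # end-to-end mapping from driving data to manipulation commands.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),  # one logit per discrete command
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```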
- DRL combines deep learning with reinforcement learning algorithms so that the trained model has both perception and decision-making capabilities.
- specifically, deep reinforcement learning algorithms may include value-function-based deep reinforcement learning, policy-gradient-based deep reinforcement learning, and search-and-supervision-based deep reinforcement learning.
- for example, value-function-based deep reinforcement learning combines a convolutional neural network with the Q-learning algorithm from reinforcement learning, using a DQN (Deep Q Network) model in which the input data passes through multiple neural network and connection layers, and the Q value of each action is generated after nonlinear transformations. When training the DQN model, a deep neural network with parameters w is first constructed as the Q-value network; the error function of the Q value is then computed, the gradient of the error function with respect to w is calculated, and the network parameters w are updated by stochastic gradient descent.
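As a rough illustration of the DQN training loop just described (a Q-value network with parameters w, an error function over Q values, and stochastic gradient descent), a sketch follows; the batch layout, discount factor, and target network are standard DQN assumptions rather than details taken from the application.

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One DQN step: TD targets from a frozen target network, MSE error, SGD update."""
    states, actions, rewards, next_states, dones = batch  # assumed tensor batch
    # Q(s, a; w) for the actions actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # TD target: r + gamma * max_a' Q(s', a'; w-), zeroed at episode end.
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q * (1.0 - dones)
    loss = F.mse_loss(q_values, targets)  # the Q-value error function
    optimizer.zero_grad()
    loss.backward()   # gradient of the error function with respect to w
    optimizer.step()  # stochastic-gradient update of w
    return loss.item()
```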
- Step 203 Send the obstacle avoidance manipulation command to the corresponding control system, so that the control system performs the corresponding obstacle avoidance operation.
- the obstacle avoidance manipulation command can be sent to the corresponding control system.
- the steering system may include a brake system, an EPS (Electric Power Steering) system, an ESP (Electronic Stability Program), and the like.
- after receiving the obstacle avoidance control command, the control system performs the corresponding deceleration, steering, and other operations so that the vehicle avoids the obstacle.
- the obstacle avoidance manipulation command may include a brake command, a steering command, an alarm command, and the like; the brake command may be sent to the brake system, the steering command to the EPS system, and the alarm command to the warning lights and horn.
- each control system of the vehicle is connected to a CAN (Controller Area Network) bus, and the obstacle avoidance operation commands can be transmitted to the corresponding control system through the CAN bus.
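As an illustration of dispatching a command over the CAN bus, here is a sketch using the third-party python-can package; the arbitration ID and one-byte payload encoding are hypothetical, since the application does not specify a message format.

```python
import can  # third-party "python-can" package

BRAKE_CMD_ID = 0x1A0  # hypothetical arbitration ID for the brake ECU

bus = can.interface.Bus(channel="can0", bustype="socketcan")

def send_brake_command(pressure_pct: int) -> None:
    """Broadcast a one-byte brake-pressure command (0-100) on the CAN bus."""
    msg = can.Message(arbitration_id=BRAKE_CMD_ID,
                      data=[max(0, min(100, pressure_pct))],
                      is_extended_id=False)
    bus.send(msg)
```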
- the vehicle obstacle avoidance method of the above embodiment of the present application acquires the travel data collected by the onboard sensors of the vehicle, the travel data including obstacle information of the travel path and sensor data of the vehicle; then determines an obstacle avoidance manipulation command using an obstacle avoidance strategy model based on the travel data, the model being trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and then sends the command to the corresponding control system for it to perform the corresponding obstacle avoidance operation. This enables automatic obstacle avoidance of the vehicle and ensures safe driving in unmanned scenarios, and the obstacle avoidance model trained by deep reinforcement learning can cope with complicated road scenes and improve the obstacle avoidance success rate.
- FIG. 3 shows a flow chart of another embodiment of a vehicle obstacle avoidance method according to the present application.
- the process 300 of the vehicle obstacle avoidance method of this embodiment may include the following steps:
- Step 301 Train the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm.
- the historical obstacle avoidance record may include historical obstacle avoidance results, historical travel data corresponding to historical obstacle avoidance results, and historical manipulation data.
- the historical obstacle avoidance result may be the outcome of an obstacle avoidance operation during a past trip of the vehicle and may characterize the state of the vehicle after the operation; it may include, for example, "avoided perfectly", "barely avoided", "avoidance failed", and the like.
- the historical travel data may include obstacle information of the historical travel route and historical sensor data of the vehicle.
- the historical manipulation data may include a historical obstacle avoidance manipulation command issued by the above-mentioned electronic device (such as an in-vehicle control unit) when the vehicle is obstacle-avoided.
- the obstacle avoidance strategy model is used to map the input travel data to the obstacle avoidance manipulation command.
- the above electronic device can train the obstacle avoidance strategy model as follows: obtain the historical evaluation indexes of the historical obstacle avoidance results, and perform deep reinforcement learning based on the historical driving data, the historical evaluation indexes, and the historical manipulation data to obtain an obstacle avoidance strategy model that optimizes the obstacle avoidance results.
- the historical obstacle avoidance result of each historical obstacle avoidance record can be evaluated to obtain an evaluation index.
- for example, a professional driver can score each historical obstacle avoidance result, with better results receiving higher scores, and the score is used as the historical evaluation index corresponding to that result.
- the historical driving data, historical manipulation data, and historical evaluation indexes are then input into the model for deep reinforcement learning.
- the end-to-end manner here means that during learning the historical driving data serves as the model input and the historical manipulation data as the model output, an "end-to-end" learning mode from the data directly collected by the vehicle to the manipulation commands.
- by using the historical evaluation index as the performance metric of the learned model, the model parameters can be continuously adjusted during learning so that the obstacle avoidance results approach those with higher evaluation indexes.
- in this way, by quantifying the obstacle avoidance results as evaluation indexes, the model can more easily learn the logic of obstacle avoidance strategies with better results, accelerating learning and improving the effectiveness of the obstacle avoidance strategy model.
- step 301 can be performed as follows: sample from the historical obstacle avoidance records to construct a sample data set, the sample data set including multiple pieces of sample data, each corresponding to one historical obstacle avoidance record and including the corresponding historical driving data, historical evaluation index, and historical manipulation data; then train a constructed deep reinforcement learning model on the sample data set to obtain the obstacle avoidance strategy model.
- for example, the sample data set may be constructed by random sampling from the historical obstacle avoidance records (see the sketch below); a DQN model is then constructed and trained on the sample data set, the parameters of each convolutional layer and connection layer in the DQN model are adjusted according to the evaluation index corresponding to each sample, and the parameters that maximize the evaluation index are derived, thereby generating the obstacle avoidance strategy model.
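A minimal sketch of the random-sampling step, assuming each historical record is a dict holding the driving data, control data, and evaluation index under hypothetical keys:

```python
import random

def build_sample_set(history_records, n_samples):
    """Randomly sample historical obstacle-avoidance records into a training set.

    Each record is assumed to hold the keys "driving", "control", and "score"
    (historical driving data, control data, and evaluation index).
    """
    return random.sample(history_records, min(n_samples, len(history_records)))
```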
- Step 302 Acquire driving data collected by an onboard sensor of the vehicle.
- the electronic device on which the vehicle obstacle avoidance method runs can obtain the travel data collected by the onboard sensors through a wired or wireless connection with the onboard sensors, or by connecting to a storage device that stores the onboard sensor data and issuing a data acquisition request.
- the travel data includes obstacle information of the travel route and sensor data of the vehicle.
- the obstacle information may be information about pedestrians, vehicles, roadblock markers, and the like on the road
- the vehicle sensor data may include data from various sensors of the current vehicle, including pressure sensor data, image sensor data, lidar sensor data, ABS (antilock brake system) sensor data, speed sensor data, and position and rotation speed sensor data.
- the vehicle sensor data herein may include status data of the vehicle acquired by the sensor and data on the road on which the vehicle is located that is perceptible by the vehicle sensor.
- Step 303 Determine an obstacle avoidance manipulation instruction by using an obstacle avoidance strategy model based on the travel data.
- the obstacle avoidance strategy model trained in the above step 301 can be used to process the acquired travel data to obtain the obstacle avoidance manipulation instruction.
- the obstacle avoidance manipulation command may be an instruction for performing an obstacle avoidance operation for controlling a steering unit of the vehicle.
- Step 304 Send the obstacle avoidance manipulation command to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation.
- the obstacle avoidance manipulation command can be sent to the corresponding control system.
- the obstacle avoidance manipulation command may include a brake command, a steering command, an alarm command, and the like
- the control system may include a brake system, an EPS (Electric Power Steering) system, an ESP (Electronic Stability Program), and the like.
- after receiving the obstacle avoidance control command, the control system performs the corresponding deceleration, steering, and other operations so that the vehicle avoids the obstacle.
- steps 302, 303, and 304 in the foregoing method are the same as steps 201, 202, and 203 in the foregoing embodiment; the descriptions of steps 201, 202, and 203 also apply to steps 302, 303, and 304 of this embodiment and are not repeated here.
- on the basis of the embodiment shown in FIG. 2, the embodiment shown in FIG. 3 adds the step of training the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm.
- the vehicle obstacle avoidance method of this embodiment quantifies the historical obstacle avoidance results as evaluation indexes, so that the electronic device serving as the training entity can use the evaluation index as the model's performance metric to continuously learn the internal obstacle avoidance logic, thereby further optimizing the obstacle avoidance effect.
- the step of performing deep reinforcement learning based on the historical driving data, the historical evaluation indexes, and the historical manipulation data to obtain an obstacle avoidance strategy model that optimizes the obstacle avoidance results may include: for each historical obstacle avoidance record, taking the historical driving data as state data, the historical manipulation data as operation data, and the historical evaluation index as value data; constructing a learning trajectory from the historical obstacle avoidance records, the learning trajectory including a plurality of trajectory points in one-to-one correspondence with the historical obstacle avoidance records, each trajectory point including the corresponding state data, operation data, and value data; calculating the gradient of the expectation of the total value data with respect to the policy parameters based on the learning trajectory and the sum of the value data, where the policy parameters are used to map state data to operation data; and adjusting the policy parameters based on this gradient to generate the obstacle avoidance strategy model.
- FIG. 4 is a schematic flowchart of a specific implementation of the method for training the obstacle avoidance strategy model using a deep reinforcement learning algorithm according to the present application.
- Step 401 Construct a learning trajectory of the deep reinforcement learning algorithm based on the historical obstacle avoidance records.
- the learning track includes a plurality of track points, each of which corresponds to a historical obstacle avoidance record.
- Each track point includes state data, operational data, and value data for the corresponding historical obstacle avoidance record.
- the state data s_i of the i-th historical obstacle avoidance record is the historical driving data, the operation data a_i is the historical manipulation data, and the value data r_i is the historical evaluation index, where i is an integer, 0 ≤ i ≤ T−1, and T is the number of historical obstacle avoidance records.
- the constructed learning trajectory is τ = (s_0, a_0, r_0, s_1, a_1, r_1, …, s_{T−1}, a_{T−1}, r_{T−1}).
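A sketch of the trajectory construction under the same assumptions (discrete actions, dict-based records with hypothetical keys):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrajectoryPoint:
    state: list    # s_i: historical driving data
    action: int    # a_i: historical manipulation data (assumed discrete)
    value: float   # r_i: historical evaluation index

def build_trajectory(records) -> List[TrajectoryPoint]:
    """tau = (s_0, a_0, r_0, ..., s_{T-1}, a_{T-1}, r_{T-1})."""
    return [TrajectoryPoint(r["driving"], r["control"], r["score"])
            for r in records]
```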
- Step 402 Calculate the gradient of the expectation of the total value data with respect to the policy parameters based on the learning trajectory and the total value data.
- specifically, let the policy parameter be θ. The total value data R can be calculated from the individual value data, for example R = r_0 + r_1 + … + r_{T−1}.
- the expectation of the total value data is E[R|π_θ], where π_θ denotes the policy that maps state data s_i to operation data a_i. The gradient of this expectation with respect to the policy parameter can be expressed as g = ∇_θ E[R|π_θ] = E[∑_{t=0}^{T−1} ∇_θ log π(a_t|s_t; θ) · R], where π(a_t|s_t; θ), also written π_θ(a_t|s_t), denotes the policy that maps state data s_t to operation data a_t.
- Step 403 Adjust the policy parameters based on the gradient of the expectation of the total value data with respect to the policy parameters.
- the specific adjustment is to update θ to θ + αg, where α is a parameter controlling the update rate of the policy parameters. Finally, the policy parameters that maximize the expectation of the total value data are obtained, which constitute the obstacle avoidance strategy model.
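Under the reconstruction above, one update step θ ← θ + αg can be sketched as follows; the REINFORCE-style estimator and discrete action space are assumptions consistent with, but not mandated by, the text, and the optimizer's learning rate plays the role of α.

```python
import torch

def policy_gradient_update(policy, optimizer, trajectory):
    """theta <- theta + alpha * g, with g estimated as
    sum_t grad_theta log pi_theta(a_t | s_t) * R (REINFORCE)."""
    total_value = sum(p.value for p in trajectory)  # R = r_0 + ... + r_{T-1}
    log_probs = []
    for p in trajectory:
        logits = policy(torch.as_tensor(p.state, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        log_probs.append(dist.log_prob(torch.tensor(p.action)))
    # Minimising the negative objective performs gradient ascent on E[R].
    loss = -torch.stack(log_probs).sum() * total_value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```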
- determining the obstacle avoidance operation instruction using the obstacle avoidance strategy model based on the driving data may include: inputting the travel data collected by the onboard sensors into the obstacle avoidance strategy model as the current state data; and mapping the current state data to the current operation data as the obstacle avoidance manipulation command based on the adjusted policy parameters.
- specifically, the travel data collected by the onboard sensors can be used as the state data s_T at the current time T, and the policy parameter θ in the model maps the state data s_T to the current operation data a_T, which is the obstacle avoidance operation instruction.
- after the obstacle avoidance manipulation command is sent to the corresponding control system, the obstacle avoidance method of the vehicle further includes:
- Step 404 Obtain the evaluation index of the vehicle's current obstacle avoidance result and use it as the current value data r_T, yielding a trajectory point comprising the current state data s_T, the current operation data a_T, and the current value data r_T.
- Step 405 Use this trajectory point to update the learning trajectory τ, and thereby the total value data R and the expectation E[R|π_θ] of the total value data. The gradient of the updated expectation with respect to the policy parameter θ can then be calculated, and θ is adjusted according to the learning rate α, yielding the obstacle avoidance strategy model after learning from the current obstacle avoidance operation. A short sketch of this online update follows.
- the obstacle avoidance strategy model training method of this embodiment uses a reinforcement learning algorithm to construct a learning trajectory based on state data, operation data, and value data, then learns from each trajectory point along the learning trajectory to derive the optimal obstacle avoidance strategy parameters and generate the obstacle avoidance strategy model.
- moreover, after each obstacle avoidance operation, the obstacle avoidance strategy model can be updated according to the obstacle avoidance result, further optimizing the obstacle avoidance strategy model.
- the present application provides an embodiment of a vehicle obstacle avoidance device; the device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.
- the vehicle obstacle avoidance device 500 of the present embodiment includes an acquisition unit 501, a determination unit 502, and a transmission unit 503.
- the obtaining unit 501 may be configured to acquire driving data collected by the onboard sensors of the vehicle, where the driving data includes obstacle information of the driving path and sensor data of the vehicle;
- the determining unit 502 may be configured to determine the obstacle avoidance control instruction using the obstacle avoidance strategy model based on the driving data, where the obstacle avoidance strategy model is trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm;
- the sending unit 503 may be configured to send the obstacle avoidance manipulation instruction to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation.
- the acquiring unit 501 can acquire the driving data collected by the in-vehicle sensor through a wired or wireless connection with the on-vehicle sensor, and can also acquire the driving data collected by the in-vehicle sensor through a connection with a storage device for storing the in-vehicle sensor data.
- the travel data includes obstacle information of the travel route and sensor data of the vehicle.
- the determining unit 502 can determine the obstacle avoidance manipulation instruction by using an obstacle avoidance strategy model trained in an end-to-end manner based on the DRL technique.
- the obstacle avoidance strategy model can map the driving data to the corresponding obstacle avoidance control command and can be trained using historical driving data and historical manipulation data.
- the determining unit 502 can input the driving data acquired by the obtaining unit 501 into the obstacle avoidance strategy model, and then obtain a corresponding obstacle avoidance manipulation instruction.
- the sending unit 503 can send the obstacle avoidance control command determined by the determining unit 502 to the corresponding control system, and the control system can include a braking system, an EPS (Electric Power Steering) system, and an ESP (Electronic Stability Program). System) and so on.
- after receiving the obstacle avoidance control command, the control system performs the corresponding deceleration, steering, and other operations so that the vehicle avoids the obstacle.
- the apparatus 500 may further include a training unit configured to train the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm, where the historical obstacle avoidance records include historical obstacle avoidance results and the historical driving data and historical manipulation data corresponding to the historical obstacle avoidance results.
- the training unit is configured to train the obstacle avoidance strategy model as follows: obtain a historical evaluation index for each historical obstacle avoidance result; and perform deep reinforcement learning based on the historical driving data, the historical evaluation indexes, and the historical manipulation data to obtain an obstacle avoidance strategy model that optimizes the obstacle avoidance results.
- the training unit is further configured to perform deep reinforcement learning as follows to obtain the policy parameters of an obstacle avoidance strategy model that optimizes the obstacle avoidance results: for each historical obstacle avoidance record, take the historical driving data as state data, the historical manipulation data as operation data, and the historical evaluation index as value data; construct a learning trajectory from the historical obstacle avoidance records, the learning trajectory including a plurality of trajectory points in one-to-one correspondence with the historical obstacle avoidance records, each trajectory point including the corresponding state data, operation data, and value data; calculate the gradient of the expectation of the total value data with respect to the policy parameters based on the learning trajectory and the total value data, where the policy parameters are used to map state data to operation data; and adjust the policy parameters based on this gradient to generate the obstacle avoidance strategy model.
- the determining unit is further configured to determine the obstacle avoidance manipulation command as follows: input the driving data collected by the onboard sensors into the obstacle avoidance strategy model as the current state data; and, based on the adjusted policy parameters, map the current state data to the current operation data as the obstacle avoidance manipulation command.
- the apparatus further includes an updating unit configured to: after the obstacle avoidance manipulation command is sent to the corresponding control system, obtain an evaluation index of the vehicle's current obstacle avoidance result and take it as the current value data; update the learning trajectory, the total value data, and the expectation of the total value data based on the current state data, the current operation data, and the current value data; calculate the gradient of the updated expectation of the total value data with respect to the policy parameters based on the updated learning trajectory and the sum of the value data; and adjust the policy parameters based on this gradient.
- the vehicle obstacle avoidance device 500 of the above embodiment of the present application acquires the travel data collected by the onboard sensors of the vehicle through the acquisition unit, then determines the obstacle avoidance manipulation instruction through the determination unit using the obstacle avoidance strategy model trained in an end-to-end manner based on historical obstacle avoidance records with a deep reinforcement learning algorithm, and then sends the obstacle avoidance manipulation instruction to the corresponding control system through the sending unit so that the control system performs the corresponding obstacle avoidance operation. This realizes automatic obstacle avoidance of the vehicle, can cope with complicated road scenes, improves the obstacle avoidance success rate, and improves vehicle safety.
- the apparatus 500 corresponds to the steps in the methods described with reference to FIGS. 2, 3, and 4; thus, the operations and features described above for the methods are equally applicable to the apparatus 500 and the units contained therein and are not repeated here.
- referring to FIG. 6, a block diagram of a computer system 600 suitable for implementing a terminal device or server of an embodiment of the present application is shown.
- the terminal device or server shown in FIG. 6 is merely an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
- computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 602 or a program loaded from the storage portion 608 into random access memory (RAM) 603.
- in the RAM 603, various programs and data required for the operation of the system 600 are also stored.
- the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
- An input/output (I/O) interface 605 is also coupled to bus 604.
- the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
- a drive 610 is also coupled to the I/O interface 605 as needed.
- a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage portion 608 as needed.
- an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer readable medium, the computer program comprising program code for executing the method illustrated by the flowchart.
- the computer program can be downloaded and installed from the network via communication portion 609, and/or installed from removable media 611.
- when the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed.
- the computer readable medium described herein may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
- the computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain or store a program, which can be used by or in connection with an instruction execution system, apparatus or device.
- a computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying computer readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
- the computer readable signal medium can also be any computer readable medium other than a computer readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with the instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium can be transmitted by any suitable medium, including but not limited to wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
- each block of the flowcharts or block diagrams can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logic functions.
- it should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved.
- each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments of the present application may be implemented by software or by hardware.
- the described unit may also be provided in the processor, for example, as a processor including an acquisition unit, a determination unit, and a transmission unit.
- in some cases, the names of these units do not constitute a limitation on the units themselves.
- the acquisition unit can also be described as “a unit that acquires travel data collected by the onboard sensor of the vehicle”.
- the present application also provides a computer readable medium, which may be included in the apparatus described in the above embodiments, or may be separately present and not incorporated into the apparatus.
- the computer readable medium carries one or more programs, when the one or more programs are executed by the device, causing the device to: obtain travel data collected by an onboard sensor of the vehicle, the travel data including an obstacle of a travel path Information and sensor data of the vehicle; determining an obstacle avoidance manipulation command based on the driving data using an obstacle avoidance strategy model, the obstacle avoidance strategy model is based on a historical obstacle avoidance record, and training in an end-to-end manner using a depth enhanced learning algorithm Obtaining; the obstacle avoidance manipulation instruction is sent to a corresponding control system for the control system to perform a corresponding obstacle avoidance operation.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Optics & Photonics (AREA)
- Electromagnetism (AREA)
- Traffic Control Systems (AREA)
- Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
Abstract
A vehicle obstacle avoidance method and apparatus, including: acquiring travel data collected by an onboard sensor (102) of a vehicle (101) (201, 302), where the travel data includes obstacle information of the travel path and sensor data of the vehicle (101); determining an obstacle avoidance control instruction using an obstacle avoidance strategy model based on the travel data (202, 303), where the obstacle avoidance strategy model is trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm (301); and sending the obstacle avoidance control instruction to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation (203, 304). This can improve the obstacle avoidance success rate of the vehicle (101).
Description
This patent application claims priority to Chinese Patent Application No. 201710790602.5, filed on September 5, 2017 by Baidu Online Network Technology (Beijing) Co., Ltd. and entitled "Vehicle Obstacle Avoidance Method and Apparatus", the entirety of which is incorporated herein by reference.
The present application relates to the field of vehicle control technologies, specifically to the field of vehicle driving safety control technologies, and in particular to a vehicle obstacle avoidance method and apparatus.
The rapid development of deep learning technology and in-depth research in the field of artificial intelligence have brought revolutionary changes to the automotive industry.
Autopilot is a technology that uses a computer to automatically and safely operate a motor vehicle without active human operation. When an obstacle is encountered while the vehicle is traveling, existing automatic driving technology must first identify the obstacle using target recognition technology and then determine an obstacle avoidance strategy based on predefined rules or on a deep learning system derived from imitation learning. However, the obstacle avoidance operations of existing automatic driving technology can only cover road conditions in limited situations and have no reasoning capability; it is difficult to cope with the various complicated scenes on actual roads, and the obstacle avoidance success rate needs to be improved.
Summary of the Invention
In order to solve one or more of the technical problems mentioned in the Background section above, embodiments of the present application provide a vehicle obstacle avoidance method and apparatus.
In a first aspect, an embodiment of the present application provides a vehicle obstacle avoidance method, including: acquiring travel data collected by onboard sensors of a vehicle, the travel data including obstacle information of the travel path and sensor data of the vehicle; determining an obstacle avoidance control instruction using an obstacle avoidance strategy model based on the travel data, the obstacle avoidance strategy model being trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and sending the obstacle avoidance control instruction to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation.
In some embodiments, the method further includes the step of training the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm, where the historical obstacle avoidance records include historical obstacle avoidance results and the historical travel data and historical control data corresponding to the historical obstacle avoidance results. The step of training the obstacle avoidance strategy model based on the historical obstacle avoidance records using the deep reinforcement learning algorithm includes: obtaining a historical evaluation index for each historical obstacle avoidance result; and performing deep reinforcement learning based on the historical travel data, the historical evaluation indexes, and the historical control data to obtain an obstacle avoidance strategy model that optimizes the obstacle avoidance results.
In some embodiments, performing deep reinforcement learning based on the historical travel data, the historical evaluation indexes, and the historical control data to obtain the policy parameters of the obstacle avoidance strategy model that optimizes the obstacle avoidance results includes: for each historical obstacle avoidance record, taking the historical travel data as state data, the historical control data as operation data, and the historical evaluation index as value data; constructing a learning trajectory from the historical obstacle avoidance records, the learning trajectory including a plurality of trajectory points in one-to-one correspondence with the historical obstacle avoidance records, each trajectory point including the corresponding state data, operation data, and value data; calculating the gradient of the expectation of the total value data with respect to the policy parameters based on the learning trajectory and the total value data, where the policy parameters are used to map state data to operation data; and adjusting the policy parameters based on this gradient to generate the obstacle avoidance strategy model.
In some embodiments, determining the obstacle avoidance control instruction using the obstacle avoidance strategy model based on the travel data includes: inputting the travel data collected by the onboard sensors into the obstacle avoidance strategy model as the current state data; and mapping the current state data to the current operation data as the obstacle avoidance control instruction based on the adjusted policy parameters.
In some embodiments, after the obstacle avoidance control instruction is sent to the corresponding control system, the method further includes: obtaining an evaluation index of the vehicle's current obstacle avoidance result and taking it as the current value data; updating the learning trajectory, the total value data, and the expectation of the total value data based on the current state data, the current operation data, and the current value data; calculating the gradient of the updated expectation of the total value data with respect to the policy parameters based on the updated learning trajectory and the sum of the value data; and adjusting the policy parameters based on this gradient.
In a second aspect, an embodiment of the present application provides a vehicle obstacle avoidance apparatus, including: an acquisition unit configured to acquire travel data collected by onboard sensors of a vehicle, the travel data including obstacle information of the travel path and sensor data of the vehicle; a determination unit configured to determine an obstacle avoidance control instruction using an obstacle avoidance strategy model based on the travel data, the obstacle avoidance strategy model being trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and a sending unit configured to send the obstacle avoidance control instruction to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation.
In some embodiments, the apparatus further includes a training unit configured to train the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm, where the historical obstacle avoidance records include historical obstacle avoidance results and the historical travel data and historical control data corresponding to the historical obstacle avoidance results. The training unit is configured to train the obstacle avoidance strategy model by: obtaining a historical evaluation index for each historical obstacle avoidance result; and performing deep reinforcement learning based on the historical travel data, the historical evaluation indexes, and the historical control data to obtain an obstacle avoidance strategy model that optimizes the obstacle avoidance results.
In some embodiments, the training unit is further configured to perform deep reinforcement learning to obtain the policy parameters of the obstacle avoidance strategy model that optimizes the obstacle avoidance results by: for each historical obstacle avoidance record, taking the historical travel data as state data, the historical control data as operation data, and the historical evaluation index as value data; constructing a learning trajectory from the historical obstacle avoidance records, the learning trajectory including a plurality of trajectory points in one-to-one correspondence with the historical obstacle avoidance records, each trajectory point including the corresponding state data, operation data, and value data; calculating the gradient of the expectation of the total value data with respect to the policy parameters based on the learning trajectory and the total value data, where the policy parameters are used to map state data to operation data; and adjusting the policy parameters based on this gradient to generate the obstacle avoidance strategy model.
In some embodiments, the determination unit is further configured to determine the obstacle avoidance control instruction by: inputting the travel data collected by the onboard sensors into the obstacle avoidance strategy model as the current state data; and mapping the current state data to the current operation data as the obstacle avoidance control instruction based on the adjusted policy parameters.
In some embodiments, the apparatus further includes an update unit configured to: after the obstacle avoidance control instruction is sent to the corresponding control system, obtain an evaluation index of the vehicle's current obstacle avoidance result and take it as the current value data; update the learning trajectory, the total value data, and the expectation of the total value data based on the current state data, the current operation data, and the current value data; calculate the gradient of the updated expectation of the total value data with respect to the policy parameters based on the updated learning trajectory and the sum of the value data; and adjust the policy parameters based on this gradient.
In a third aspect, an embodiment of the present application provides a device, including: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the vehicle obstacle avoidance method described above.
The vehicle obstacle avoidance method and apparatus provided by the embodiments of the present application acquire travel data collected by the onboard sensors of a vehicle, the travel data including obstacle information of the travel path and sensor data of the vehicle; then determine an obstacle avoidance control instruction using an obstacle avoidance strategy model based on the travel data, the obstacle avoidance strategy model being trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and then send the obstacle avoidance control instruction to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation. This enables automatic obstacle avoidance of the vehicle and ensures safe driving in unmanned scenarios; moreover, an obstacle avoidance model trained by deep reinforcement learning can cope with complicated road scenes and improve the obstacle avoidance success rate.
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
FIG. 2 is a flowchart of one embodiment of a vehicle obstacle avoidance method according to the present application;
FIG. 3 is a flowchart of another embodiment of a vehicle obstacle avoidance method according to the present application;
FIG. 4 is a schematic flowchart of a specific implementation of the method for training the obstacle avoidance strategy model using a deep reinforcement learning algorithm according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of a vehicle obstacle avoidance apparatus of the present application;
FIG. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present application.
The present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are merely used to explain the relevant invention and do not limit that invention. It should also be noted that, for ease of description, only the parts related to the relevant invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the vehicle obstacle avoidance method or vehicle obstacle avoidance apparatus of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include a vehicle 101, an onboard sensor 102 and an onboard control unit 103 mounted on the vehicle 101, and a server 104. The vehicle 101 may be an unmanned vehicle, and the onboard control unit 103 may be connected to the server 104 via a network, which may include various connection types such as wired or wireless communication links or fiber optic cables.
The onboard sensor 102 can collect road data and vehicle state data while the vehicle is traveling. The onboard sensor 102 may include an onboard camera, a lidar sensor, a millimeter-wave radar sensor, a collision sensor, a speed sensor, an air pressure sensor, and the like. The onboard control unit 103 may be an ECU (Electronic Control Unit) or an onboard computer for analyzing and controlling the working state of each component of the vehicle. The onboard control unit 103 can acquire the data collected by the onboard sensor 102, process and respond to the data, and can also control the storage and transmission of the data collected by the onboard sensor.
The server 104 can establish a connection with the onboard control unit 103 via the network, and the onboard control unit 103 can send sensor data to the server 104. The server 104 can analyze the sensor data and feed the analysis results back to the onboard control unit 103, and the onboard control unit 103 can respond according to the received analysis results.
The vehicle 101 usually encounters obstacles while traveling. The onboard control unit 103 can acquire information about an obstacle through the onboard sensor 102, for example collecting images of the obstacle with the onboard camera and detecting information such as the obstacle's distance and shape with an infrared or ultrasonic detector. The onboard control unit 103 can then analyze the obstacle information and make a control decision, or it can transmit the obstacle information to the server 104 for analysis and decision-making, with the server 104 passing the control decision back to the onboard control unit 103. The onboard control unit 103 controls the corresponding components of the vehicle 101 according to the control decision.
It should be noted that the vehicle obstacle avoidance method provided by the embodiments of the present application may be executed by the onboard control unit 103 or the server 104; accordingly, the vehicle obstacle avoidance apparatus may be provided in the onboard control unit 103 or the server 104.
It should be understood that the numbers of vehicles, onboard sensors, onboard control units, and servers in FIG. 1 are merely illustrative; there may be any number of each according to implementation needs.
With continued reference to FIG. 2, a flow 200 of one embodiment of the vehicle obstacle avoidance method according to the present application is shown. The vehicle obstacle avoidance method includes the following steps:
Step 201: Acquire travel data collected by the onboard sensors of the vehicle.
In this embodiment, the electronic device on which the vehicle obstacle avoidance method runs can obtain the travel data collected by the onboard sensors through a wired or wireless connection with the onboard sensors, or by connecting to a storage device that stores the onboard sensor data and issuing a data acquisition request. The travel data includes obstacle information of the travel path and sensor data of the vehicle. The obstacle information may be information about pedestrians, vehicles, roadblock markers, and the like on the road; the vehicle sensor data may include data from each sensor of the current vehicle, which may include but is not limited to pressure sensor data, image sensor data, lidar sensor data, ABS (antilock brake system) sensor data, speed sensor data, and position and rotation speed sensor data. The vehicle sensor data here may include state data of the vehicle acquired by the sensors, as well as data on the road where the vehicle is located that can be perceived by the vehicle sensors.
The vehicle in this embodiment may be an unmanned vehicle, that is, a vehicle with an automatic driving function. While an unmanned vehicle is traveling, its sensors typically collect data continuously; the electronic device can connect directly to each vehicle sensor and acquire its data in real time, or it can obtain, via the onboard control unit, the sensor data that the sensors transmit to the onboard control unit.
Step 202: Determine an obstacle avoidance control instruction using an obstacle avoidance strategy model based on the travel data.
In this embodiment, the obstacle avoidance strategy model maps the input travel data to an obstacle avoidance control instruction. The obstacle avoidance strategy model is trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm, where the historical obstacle avoidance records may be records of multiple obstacle avoidance processes that have occurred while the vehicle was traveling.
In this embodiment, the travel data can be input into the trained obstacle avoidance strategy model, and the model is used to determine the corresponding obstacle avoidance control instruction. The obstacle avoidance strategy model can determine, from the obstacle information and the vehicle's sensor data, which control components need to be invoked and which control instructions should be sent to them.
The obstacle avoidance strategy model can be trained in an end-to-end manner based on DRL (Deep Reinforcement Learning) technology.
The end-to-end manner refers to mapping directly acquired input data, such as images and audio, to the desired output result. In the scenario of the embodiments of the present application, end-to-end means mapping the input travel data to control instructions. Unlike traditional control decision methods, the end-to-end approach fuses data cleaning, feature extraction, encoding/decoding, and other processes together, with a deep neural network model establishing the mapping from the input data to the final target output data. Training the obstacle avoidance strategy model end-to-end is the process of continuously learning the obstacle avoidance control strategy with the historical travel data from the historical obstacle avoidance records as the model input and the corresponding obstacle avoidance control instructions as the model output.
In this embodiment, the obstacle avoidance strategy model is trained end-to-end; during training, only the raw travel data collected by the onboard sensors and the labeled values of the obstacle avoidance results obtained from the obstacle avoidance control need to be provided, and the model can optimize its parameters through deep reinforcement learning.
DRL combines deep learning with reinforcement learning algorithms so that the trained model has both perception and decision-making capabilities. Specifically, deep reinforcement learning algorithms may include value-function-based deep reinforcement learning, policy-gradient-based deep reinforcement learning, and search-and-supervision-based deep reinforcement learning.
For example, value-function-based deep reinforcement learning combines a convolutional neural network with the Q-learning algorithm from reinforcement learning, using a DQN (Deep Q Network) model in which the input data passes through multiple neural network and connection layers, and the Q value of each action is generated after nonlinear transformations. When training the DQN model, a deep neural network with parameters w is first constructed as the Q-value network; the error function of the Q value is then computed, the gradient of the error function with respect to w is calculated, and the network parameters w are updated by stochastic gradient descent.
Step 203: Send the obstacle avoidance control instruction to the corresponding control system for the control system to perform the corresponding obstacle avoidance operation.
After the obstacle avoidance strategy model has produced an obstacle avoidance control instruction, the instruction can be sent to the corresponding control system. Here, the control system may include the brake system, the EPS (Electric Power Steering) system, the ESP (Electronic Stability Program), and the like. After receiving the instruction, the control system performs the corresponding deceleration, steering, and other operations so that the vehicle avoids the obstacle.
Specifically, the obstacle avoidance control instruction may include a brake instruction, a steering instruction, an alarm instruction, and the like; the brake instruction may be sent to the brake system, the steering instruction to the EPS system, and the alarm instruction to the warning lights and horn. Optionally, each control system of the vehicle is connected to a CAN (Controller Area Network) bus, and these obstacle avoidance instructions can be transmitted to the corresponding control system through the CAN bus.
The vehicle obstacle avoidance method of the above embodiment of the present application acquires the travel data collected by the onboard sensors of the vehicle, the travel data including obstacle information of the travel path and sensor data of the vehicle; then determines an obstacle avoidance control instruction using an obstacle avoidance strategy model based on the travel data, the model being trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and then sends the instruction to the corresponding control system for it to perform the corresponding obstacle avoidance operation. This enables automatic obstacle avoidance of the vehicle and ensures safe driving in unmanned scenarios, and the obstacle avoidance model trained by deep reinforcement learning can cope with complicated road scenes and improve the obstacle avoidance success rate.
Referring to FIG. 3, a flowchart of another embodiment of the vehicle obstacle avoidance method according to the present application is shown. As shown in FIG. 3, the flow 300 of the vehicle obstacle avoidance method of this embodiment may include the following steps:
Step 301: training an obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records, using a deep reinforcement learning algorithm.
In this embodiment, the historical obstacle avoidance records may include historical avoidance results, and the historical driving data and historical control data corresponding to those results. A historical avoidance result may be the outcome of an avoidance operation in the vehicle's past trips and may characterize the state of the vehicle after the avoidance operation; for example, it may include "perfectly avoided", "barely avoided" and "avoidance failed". The historical driving data may include obstacle information of the historical driving path and historical sensor data of the vehicle. The historical control data may include historical obstacle avoidance control instructions issued by the above electronic device (such as the on-board control unit) during avoidance.
In this embodiment, the obstacle avoidance strategy model is used to map input driving data to obstacle avoidance control instructions. The electronic device may train the model as follows: acquiring historical evaluation indices of the historical avoidance results, and performing deep reinforcement learning based on the historical driving data, the historical evaluation indices and the historical control data, to obtain the obstacle avoidance strategy model that optimizes the avoidance result.
Specifically, the historical avoidance result of each historical obstacle avoidance record may be evaluated to obtain an evaluation index. For example, professional drivers may score the historical avoidance results, with better results receiving higher scores, and the score serves as the historical evaluation index of that result. The historical driving data, historical control data and historical evaluation indices are then input into the model for deep reinforcement learning. The end-to-end manner means that, during learning, the historical driving data serves as the model input and the historical control data as the model output, i.e., "end-to-end" learning from the data directly collected by the vehicle to the control instructions. By using the historical evaluation index as the performance measure of the learned model, the model parameters can be continuously adjusted during learning so that the avoidance results approach those with higher evaluation indices. In this way, by quantifying avoidance results into evaluation indices for learning, the model can more easily learn the logic of strategies that yield good avoidance results, which speeds up learning and improves the performance of the obstacle avoidance strategy model.
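For example, a minimal sketch of such quantification might map the graded results to scores as follows (the score values are assumptions, standing in for a professional driver's grading; better results get higher scores):

```python
# Hypothetical mapping from graded avoidance results to evaluation indices.
RESULT_SCORES = {
    "perfectly avoided": 1.0,
    "barely avoided":    0.4,
    "avoidance failed": -1.0,
}

def evaluation_index(historical_result: str) -> float:
    return RESULT_SCORES[historical_result]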
In some optional implementations of this embodiment, step 301 may be performed as follows: sampling from the historical obstacle avoidance records to construct a sample data set, where the sample data set includes multiple pieces of sample data, each piece corresponding to one historical obstacle avoidance record and including the corresponding historical driving data, historical evaluation index and historical control data; and training a constructed deep reinforcement learning model based on the sample data set to obtain the obstacle avoidance strategy model.
Specifically, the sample data set may be constructed, for example, by randomly sampling from the historical obstacle avoidance records; a DQN model is then constructed and trained on the sample data set, and the parameters of each convolutional layer and connection layer of the DQN model are adjusted according to the evaluation index corresponding to each sample, so as to obtain the parameters that maximize the evaluation index, thereby generating the obstacle avoidance strategy model.
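A minimal sketch of this sampling step might look as follows (the record field names are assumptions; each piece of sample data carries one record's historical driving data, evaluation index and control data):

```python
# Sketch of building the sample data set by random sampling of records.
import random
from dataclasses import dataclass

@dataclass
class SampleData:
    driving_data: list       # historical driving data (model input / state)
    control_data: int        # historical control instruction (model output / action)
    evaluation_index: float  # graded historical avoidance result

def build_sample_set(history: list, n: int) -> list:
    """Each piece of sample data corresponds to one historical record."""
    return random.sample(history, k=min(n, len(history)))
```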
Step 302: acquiring driving data collected by the on-board sensor of the vehicle.
In this embodiment, the electronic device on which the vehicle obstacle avoidance method runs may acquire the driving data collected by the on-board sensor through a wired or wireless connection with the on-board sensor, or by connecting to a storage device that stores on-board sensor data and issuing a data acquisition request. The driving data includes obstacle information of the driving path and sensor data of the vehicle. The obstacle information may be information about pedestrians, vehicles, roadblock markers and the like on the road; the vehicle sensor data may include data from the various sensors of the current vehicle, including pressure sensor data, image sensor data, lidar sensor data, ABS (anti-lock braking system) sensor data, speed sensor data, and position and rotational speed sensor data. The vehicle sensor data here may include vehicle state data collected by the sensors as well as data on the road on which the vehicle is traveling that can be perceived by the vehicle sensors.
Step 303: determining an obstacle avoidance control instruction based on the driving data by using the obstacle avoidance strategy model.
In this embodiment, the obstacle avoidance strategy model trained in step 301 may be used to process the acquired driving data and obtain an obstacle avoidance control instruction, which may be an instruction for controlling the vehicle's control units to perform an avoidance operation.
Step 304: sending the obstacle avoidance control instruction to the corresponding control system, so that the control system executes the corresponding obstacle avoidance operation.
After the obstacle avoidance strategy is formulated with the obstacle avoidance strategy model and the obstacle avoidance control instruction is obtained, the instruction may be sent to the corresponding control system. The obstacle avoidance control instructions may include braking instructions, steering instructions, alarm instructions and the like; the control systems may include the braking system, the EPS (Electric Power Steering) system, the ESP (Electronic Stability Program) system, and so on. After receiving the instruction, the control system performs the corresponding deceleration, steering or other operations so that the vehicle avoids the obstacle.
Steps 302, 303 and 304 of the above method flow are respectively the same as steps 201, 202 and 203 of the foregoing embodiment; the descriptions of steps 201, 202 and 203 above also apply to steps 302, 303 and 304 of this embodiment and are not repeated here.
On the basis of the embodiment shown in FIG. 2, the embodiment shown in FIG. 3 adds the step of training the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm. The vehicle obstacle avoidance method of this embodiment can quantify historical avoidance results into evaluation indices, so that the electronic device acting as the training entity can use the evaluation index as the model's performance measure to continuously learn the underlying avoidance logic, thereby further optimizing the avoidance effect.
In some optional implementations of this embodiment, the above step of performing deep reinforcement learning based on the historical driving data, the historical evaluation indices and the historical control data, to obtain the obstacle avoidance strategy model that optimizes the avoidance result, may include: for each historical obstacle avoidance record, taking the historical driving data as state data, the historical control data as operation data, and the historical evaluation index as value data; constructing a learning trajectory from the historical records, the learning trajectory including multiple trajectory points in one-to-one correspondence with the historical records, each trajectory point including the corresponding state data, operation data and value data; computing the gradient of the expectation of the total value data with respect to the policy parameter based on the learning trajectory and the sum of the value data, where the policy parameter is used to map state data to operation data; and adjusting the policy parameter based on this gradient to generate the obstacle avoidance strategy model.
Specifically, referring to FIG. 4, a schematic flowchart of a specific implementation of the method for training the obstacle avoidance strategy model with a deep reinforcement learning algorithm according to the present application is shown.
As shown in FIG. 4, in step 401, a learning trajectory for the deep reinforcement learning algorithm is constructed based on the historical obstacle avoidance records.
The learning trajectory includes multiple trajectory points, each corresponding to one historical obstacle avoidance record, and each including the state data, operation data and value data of that record. In this embodiment, for the i-th historical record, the state data s_i is the historical driving data, the operation data a_i is the historical control data, and the value data r_i is the historical evaluation index, where i is an integer with 0 ≤ i ≤ T−1 and T is the number of historical records. The constructed learning trajectory is then

τ = (s_0, a_0, r_0, s_1, a_1, r_1, …, s_{T−1}, a_{T−1}, r_{T−1}).
Then, in step 402, the gradient of the expectation of the total value data with respect to the policy parameter is computed based on the learning trajectory and the total value data.
Specifically, assuming the policy parameter is θ, the total value data R can be computed from the individual value data, for example as

R = Σ_{t=0}^{T−1} r_t.    (1)

The expectation of the total value data is E[R | π_θ], where π_θ denotes the policy mapping state data s_i to operation data a_i. The gradient of the expectation of the total value data with respect to the policy parameter can be expressed as

g = ∇_θ E[R | π_θ] = E[ Σ_{t=0}^{T−1} ∇_θ log π(a_t | s_t; θ) · R ],    (2)

where π(a_t | s_t; θ), which may also be written π_θ(a_t | s_t), denotes the policy mapping state data s_t to operation data a_t.
Then, in step 403, the policy parameter is adjusted based on the gradient of the expectation of the total value data with respect to the policy parameter. The specific adjustment is to update θ to θ + αg, where α is a parameter controlling the update rate of the policy parameter.
Finally, the policy parameter that maximizes the expectation of the total value data can be obtained; this constitutes the above obstacle avoidance strategy model.
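A minimal policy-gradient sketch of steps 401-403, matching equations (1) and (2), might look as follows (assuming PyTorch and a softmax policy over discrete control instructions; all dimensions are illustrative and not taken from the application):

```python
# REINFORCE-style update: R sums the value data along the trajectory,
# g estimates the gradient of E[R | πθ] w.r.t. θ, and θ <- θ + αg.
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS = 16, 4
policy = nn.Linear(STATE_DIM, NUM_ACTIONS)   # policy parameters θ: state -> action logits
ALPHA = 1e-2                                 # α controls the policy parameter update rate

def update_from_trajectory(states, actions, rewards):
    """states: (T, STATE_DIM), actions: (T,) long, rewards: (T,) float tensors."""
    R = rewards.sum()                                     # total value data, eq. (1)
    log_probs = torch.log_softmax(policy(states), dim=1)  # log π(a | s; θ)
    log_pi = log_probs[torch.arange(len(actions)), actions]
    params = list(policy.parameters())
    g = torch.autograd.grad(log_pi.sum() * R, params)     # estimate of eq. (2)
    with torch.no_grad():                                 # θ <- θ + α g (gradient ascent)
        for p, grad in zip(params, g):
            p.add_(ALPHA * grad)
```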
In a further embodiment, determining the obstacle avoidance control instruction based on the driving data using the obstacle avoidance strategy model may include: inputting the driving data collected by the on-board sensor into the obstacle avoidance strategy model as the current state data; and mapping the current state data to the current operation data based on the adjusted policy parameter, as the obstacle avoidance control instruction. Specifically, after the obstacle avoidance strategy model is obtained by the above method, the driving data collected by the on-board sensor may be taken as the state data s_T at the current time T, and the policy parameter θ in the model is used to map the state data s_T to the current operation data a_T, which is the obstacle avoidance control instruction.
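Continuing the illustrative sketch above (and reusing the `policy` defined there), the trained policy maps the current state data s_T directly to the operation data a_T:

```python
def avoidance_instruction(current_state: torch.Tensor) -> int:
    """Map the current state data s_T to the current operation data a_T."""
    with torch.no_grad():
        return int(policy(current_state).argmax())   # index of the control instruction
```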
Further, as shown in FIG. 4, after the obstacle avoidance control instruction is sent to the corresponding control system for the control system to execute the corresponding avoidance operation, the above vehicle obstacle avoidance method further includes:
Step 404: acquiring the evaluation index of the vehicle's current avoidance result, and taking the evaluation index of the current avoidance result as the current value data r_T, thereby obtaining a trajectory point comprising the current state data s_T, the current operation data a_T and the current value data r_T; in step 405, this trajectory point is used to update the above learning trajectory τ, and thereby the total value data R and the expectation of the total value data E[R | π_θ].
The gradient of the updated expectation E[R | π_θ] with respect to the policy parameter θ can then be computed according to equation (2) above, and the policy parameter θ is adjusted according to the learning rate α, yielding an obstacle avoidance strategy model that has learned from the current avoidance operation.
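Continuing the same sketch, the online update of steps 404 and 405 could be expressed as follows (again an illustration under the assumptions stated above):

```python
# After the current avoidance result has been graded (r_T), the new trajectory
# point (s_T, a_T, r_T) extends the learning trajectory, and θ is adjusted
# once more from the updated trajectory.
def learn_from_current_avoidance(states, actions, rewards, s_T, a_T, r_T):
    states = torch.cat([states, s_T.unsqueeze(0)])
    actions = torch.cat([actions, torch.tensor([a_T])])
    rewards = torch.cat([rewards, torch.tensor([float(r_T)])])
    update_from_trajectory(states, actions, rewards)  # recompute R and the gradient of E[R | πθ]
    return states, actions, rewards
```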
As can be seen from FIG. 4, the obstacle avoidance strategy model training method of this embodiment uses a reinforcement learning algorithm to construct a learning trajectory of state data, operation data and value data, and then learns from each trajectory point along the trajectory to obtain the optimal avoidance policy parameter, thereby generating the obstacle avoidance strategy model. Moreover, after each avoidance operation, the model can be updated according to the avoidance result, further optimizing the obstacle avoidance strategy model.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a vehicle obstacle avoidance apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus may be applied in various electronic devices.
As shown in FIG. 5, the vehicle obstacle avoidance apparatus 500 of this embodiment includes an acquisition unit 501, a determination unit 502 and a sending unit 503. The acquisition unit 501 may be configured to acquire driving data collected by an on-board sensor of a vehicle, the driving data including obstacle information of the driving path and sensor data of the vehicle; the determination unit 502 may be configured to determine an obstacle avoidance control instruction based on the driving data using an obstacle avoidance strategy model trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; the sending unit 503 may be configured to send the obstacle avoidance control instruction to the corresponding control system, so that the control system executes the corresponding avoidance operation.
In this embodiment, the acquisition unit 501 may acquire the driving data collected by the on-board sensor through a wired or wireless connection with the on-board sensor, or through a connection to a storage device that stores on-board sensor data. The driving data includes obstacle information of the driving path and sensor data of the vehicle.
The determination unit 502 may determine the obstacle avoidance control instruction using the obstacle avoidance strategy model trained end-to-end based on DRL technology. The obstacle avoidance strategy model can map driving data to the corresponding obstacle avoidance control instruction and may be trained from historical driving data and historical control data. The determination unit 502 may input the driving data acquired by the acquisition unit 501 into the obstacle avoidance strategy model to obtain the corresponding obstacle avoidance control instruction.
The sending unit 503 may send the obstacle avoidance control instruction determined by the determination unit 502 to the corresponding control system, which may include the braking system, the EPS (Electric Power Steering) system, the ESP (Electronic Stability Program) system, and so on. After receiving the instruction, the control system performs the corresponding deceleration, steering or other operations so that the vehicle avoids the obstacle.
In some embodiments, the apparatus 500 may further include a training unit configured to train the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm, where the historical obstacle avoidance records include historical avoidance results and the historical driving data and historical control data corresponding to those results. The training unit is configured to train the model as follows: acquiring the historical evaluation indices of the historical avoidance results; and performing deep reinforcement learning based on the historical driving data, the historical evaluation indices and the historical control data, to obtain the obstacle avoidance strategy model that optimizes the avoidance result.
In a further embodiment, the training unit is further configured to perform deep reinforcement learning as follows, to obtain the policy parameter of the obstacle avoidance strategy model that optimizes the avoidance result: for each historical obstacle avoidance record, taking the historical driving data as state data, the historical control data as operation data, and the historical evaluation index as value data; constructing a learning trajectory from the historical records, the trajectory including multiple trajectory points in one-to-one correspondence with the records, each trajectory point comprising the corresponding state data, operation data and value data; computing the gradient of the expectation of the total value data with respect to the policy parameter based on the learning trajectory and the total value data, where the policy parameter maps state data to operation data; and adjusting the policy parameter based on this gradient to generate the obstacle avoidance strategy model.
In a further embodiment, the determination unit is further configured to determine the obstacle avoidance control instruction as follows: inputting the driving data collected by the on-board sensor into the obstacle avoidance strategy model as the current state data; and mapping the current state data to the current operation data based on the adjusted policy parameter, as the obstacle avoidance control instruction.
In a further embodiment, the apparatus further includes an updating unit configured to: after the obstacle avoidance control instruction is sent to the corresponding control system, acquire the evaluation index of the vehicle's current avoidance result and take it as the current value data; update the learning trajectory, the total value data and the expectation of the total value data based on the current state data, the current operation data and the current value data; compute the gradient of the updated expectation of the total value data with respect to the policy parameter based on the updated learning trajectory and the sum of the value data; and adjust the policy parameter based on this gradient.
In the vehicle obstacle avoidance apparatus 500 of the above embodiment of the present application, the acquisition unit acquires the driving data collected by the vehicle's on-board sensors; the determination unit then determines an obstacle avoidance control instruction based on the driving data, using the obstacle avoidance strategy model trained end-to-end on historical obstacle avoidance records with a deep reinforcement learning algorithm; and the sending unit sends the instruction to the corresponding control system for execution of the corresponding avoidance operation. This achieves automatic obstacle avoidance, can cope with complex road scenes, and improves the avoidance success rate and the safety of the vehicle.
It should be understood that the units recorded in the apparatus 500 correspond to the steps of the methods described with reference to FIGS. 2, 3 and 4. Therefore, the operations and features described above for the methods also apply to the apparatus 500 and the units contained therein, and are not repeated here.
Referring now to FIG. 6, a schematic structural diagram of a computer system 600 adapted to implement a terminal device or server of embodiments of the present application is shown. The terminal device or server shown in FIG. 6 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, ROM 602 and RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609 and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the method of the present application are executed. It should be noted that the computer-readable medium described in the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquisition unit, a determination unit and a sending unit. The names of these units do not in some cases limit the units themselves; for example, the acquisition unit may also be described as "a unit for acquiring driving data collected by an on-board sensor of a vehicle".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire driving data collected by an on-board sensor of a vehicle, the driving data including obstacle information of the driving path and sensor data of the vehicle; determine an obstacle avoidance control instruction based on the driving data using an obstacle avoidance strategy model, the model being trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and send the obstacle avoidance control instruction to the corresponding control system, so that the control system executes the corresponding avoidance operation.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features disclosed in (but not limited to) the present application that have similar functions.
Claims (12)
- A vehicle obstacle avoidance method, characterized in that the method comprises: acquiring driving data collected by an on-board sensor of a vehicle, the driving data comprising obstacle information of the driving path and sensor data of the vehicle; determining an obstacle avoidance control instruction based on the driving data by using an obstacle avoidance strategy model, the obstacle avoidance strategy model being trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and sending the obstacle avoidance control instruction to a corresponding control system, so that the control system executes a corresponding obstacle avoidance operation.
- The method according to claim 1, characterized in that the method further comprises the step of training the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm, wherein the historical obstacle avoidance records comprise historical avoidance results and the historical driving data and historical control data corresponding to the historical avoidance results; the step of training the obstacle avoidance strategy model based on historical obstacle avoidance records using a deep reinforcement learning algorithm comprises: acquiring historical evaluation indices of the historical avoidance results; and performing deep reinforcement learning based on the historical driving data, the historical evaluation indices and the historical control data, to obtain an obstacle avoidance strategy model that optimizes the avoidance result.
- The method according to claim 2, characterized in that performing deep reinforcement learning based on the historical driving data, the historical evaluation indices and the historical control data to obtain the policy parameter of the obstacle avoidance strategy model that optimizes the avoidance result comprises: for each historical obstacle avoidance record, taking the historical driving data as state data, the historical control data as operation data, and the historical evaluation index as value data; constructing a learning trajectory from the historical obstacle avoidance records, the learning trajectory comprising multiple trajectory points in one-to-one correspondence with the historical records, each trajectory point comprising the corresponding state data, operation data and value data; computing the gradient of the expectation of the total value data with respect to the policy parameter based on the learning trajectory and the total value data, wherein the policy parameter is used to map the state data to the operation data; and adjusting the policy parameter based on the gradient of the expectation of the total value data with respect to the policy parameter, to generate the obstacle avoidance strategy model.
- The method according to claim 3, characterized in that determining an obstacle avoidance control instruction based on the driving data by using the obstacle avoidance strategy model comprises: inputting the driving data collected by the on-board sensor into the obstacle avoidance strategy model as current state data; and mapping the current state data to current operation data based on the adjusted policy parameter, as the obstacle avoidance control instruction.
- The method according to claim 4, characterized in that after the obstacle avoidance control instruction is sent to the corresponding control system, the method further comprises: acquiring an evaluation index of the vehicle's current avoidance result, and taking the evaluation index of the current avoidance result as current value data; updating the learning trajectory, the total value data and the expectation of the total value data based on the current state data, the current operation data and the current value data; computing the gradient of the updated expectation of the total value data with respect to the policy parameter based on the updated learning trajectory and the sum of the value data; and adjusting the policy parameter based on the gradient of the updated expectation of the total value data with respect to the policy parameter.
- A vehicle obstacle avoidance apparatus, characterized in that the apparatus comprises: an acquisition unit configured to acquire driving data collected by an on-board sensor of a vehicle, the driving data comprising obstacle information of the driving path and sensor data of the vehicle; a determination unit configured to determine an obstacle avoidance control instruction based on the driving data by using an obstacle avoidance strategy model, the obstacle avoidance strategy model being trained in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm; and a sending unit configured to send the obstacle avoidance control instruction to a corresponding control system, so that the control system executes a corresponding obstacle avoidance operation.
- The apparatus according to claim 6, characterized in that the apparatus further comprises: a training unit configured to train the obstacle avoidance strategy model in an end-to-end manner based on historical obstacle avoidance records using a deep reinforcement learning algorithm, wherein the historical obstacle avoidance records comprise historical avoidance results and the historical driving data and historical control data corresponding to the historical avoidance results; the training unit is configured to train the obstacle avoidance strategy model as follows: acquiring historical evaluation indices of the historical avoidance results; and performing deep reinforcement learning based on the historical driving data, the historical evaluation indices and the historical control data, to obtain an obstacle avoidance strategy model that optimizes the avoidance result.
- The apparatus according to claim 7, characterized in that the training unit is further configured to perform deep reinforcement learning as follows, to obtain the policy parameter of the obstacle avoidance strategy model that optimizes the avoidance result: for each historical obstacle avoidance record, taking the historical driving data as state data, the historical control data as operation data, and the historical evaluation index as value data; constructing a learning trajectory from the historical obstacle avoidance records, the learning trajectory comprising multiple trajectory points in one-to-one correspondence with the historical records, each trajectory point comprising the corresponding state data, operation data and value data; computing the gradient of the expectation of the total value data with respect to the policy parameter based on the learning trajectory and the total value data, wherein the policy parameter is used to map the state data to the operation data; and adjusting the policy parameter based on the gradient of the expectation of the total value data with respect to the policy parameter, to generate the obstacle avoidance strategy model.
- The apparatus according to claim 8, characterized in that the determination unit is further configured to determine the obstacle avoidance control instruction as follows: inputting the driving data collected by the on-board sensor into the obstacle avoidance strategy model as current state data; and mapping the current state data to current operation data based on the adjusted policy parameter, as the obstacle avoidance control instruction.
- The apparatus according to claim 9, characterized in that the apparatus further comprises an updating unit configured to: after the obstacle avoidance control instruction is sent to the corresponding control system, acquire an evaluation index of the vehicle's current avoidance result, and take the evaluation index of the current avoidance result as current value data; update the learning trajectory, the total value data and the expectation of the total value data based on the current state data, the current operation data and the current value data; compute the gradient of the updated expectation of the total value data with respect to the policy parameter based on the updated learning trajectory and the sum of the value data; and adjust the policy parameter based on the gradient of the updated expectation of the total value data with respect to the policy parameter.
- A device, characterized by comprising: one or more processors; and a storage apparatus for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-5.
- A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-5.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710790602.5A CN107491072B (zh) | 2017-09-05 | 2017-09-05 | Vehicle obstacle avoidance method and apparatus |
| CN201710790602.5 | 2017-09-05 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019047646A1 true WO2019047646A1 (zh) | 2019-03-14 |
Family
ID=60651604
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2018/098637 Ceased WO2019047646A1 (zh) | Vehicle obstacle avoidance method and apparatus | 2017-09-05 | 2018-08-03 |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN107491072B (zh) |
| WO (1) | WO2019047646A1 (zh) |
Families Citing this family (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107491072B (zh) * | 2017-09-05 | 2021-03-30 | 百度在线网络技术(北京)有限公司 | 车辆避障方法和装置 |
| US10983524B2 (en) * | 2018-04-12 | 2021-04-20 | Baidu Usa Llc | Sensor aggregation framework for autonomous driving vehicles |
| CN108710370B (zh) * | 2018-05-28 | 2021-03-16 | 广东工业大学 | 一种无人驾驶汽车的控制方法及系统 |
| DE102019113114A1 (de) | 2018-06-19 | 2019-12-19 | Nvidia Corporation | Verhaltensgesteuerte wegplanung in autonomen maschinenanwendungen |
| US11966838B2 (en) * | 2018-06-19 | 2024-04-23 | Nvidia Corporation | Behavior-guided path planning in autonomous machine applications |
| CN108984275A (zh) * | 2018-08-27 | 2018-12-11 | 洛阳中科龙网创新科技有限公司 | 基于Unity3D和深度增强学习的智能无人农用驾驶训练方法 |
| CN111104247A (zh) * | 2018-10-25 | 2020-05-05 | 伊姆西Ip控股有限责任公司 | 管理数据复制的方法、设备和计算机程序产品 |
| CN109697458A (zh) * | 2018-11-27 | 2019-04-30 | 深圳前海达闼云端智能科技有限公司 | 控制设备移动的方法、装置、存储介质及电子设备 |
| CN109583384A (zh) * | 2018-11-30 | 2019-04-05 | 百度在线网络技术(北京)有限公司 | 用于无人驾驶车的避障方法和装置 |
| CN111413957B (zh) * | 2018-12-18 | 2021-11-02 | 北京航迹科技有限公司 | 用于确定自动驾驶中的驾驶动作的系统和方法 |
| CN109782775B (zh) * | 2019-03-13 | 2020-04-14 | 刘乐 | 一种基于热图像的汽车避障系统 |
| CN109993106A (zh) * | 2019-03-29 | 2019-07-09 | 北京易达图灵科技有限公司 | 避障方法和装置 |
| CN110361709B (zh) * | 2019-06-28 | 2021-04-20 | 清矽微电子(南京)有限公司 | 一种基于动态虚警概率的车载毫米波雷达目标识别方法 |
| CN110658820A (zh) * | 2019-10-10 | 2020-01-07 | 北京京东乾石科技有限公司 | 无人驾驶车辆的控制方法及装置、电子设备、存储介质 |
| CN111301404B (zh) * | 2020-02-06 | 2022-02-18 | 北京小马慧行科技有限公司 | 车辆的控制方法及装置、存储介质及处理器 |
| CN111959496B (zh) * | 2020-06-29 | 2022-12-30 | 北京百度网讯科技有限公司 | 用于车辆横向控制的模型生成方法、装置及电子设备 |
| CN111731326B (zh) * | 2020-07-02 | 2022-06-21 | 知行汽车科技(苏州)有限公司 | 避障策略确定方法、装置及存储介质 |
| CN112269385B (zh) * | 2020-10-23 | 2021-09-07 | 北京理工大学 | 云端无人车动力学控制系统和方法 |
| CN115328154B (zh) * | 2022-09-05 | 2025-03-18 | 中煤科工集团重庆研究院有限公司 | 一种煤矿井下履带车避障学习方法 |
| CN115993821A (zh) * | 2022-12-07 | 2023-04-21 | 北京百度网讯科技有限公司 | 自动驾驶车辆的决策方法、装置、设备及自动驾驶车辆 |
| CN117389937B (zh) * | 2023-12-11 | 2024-03-08 | 上海建工一建集团有限公司 | 一种车辆避障数据的计算方法、计算机及可读存储介质 |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DE4001493A1 (de) * | 1990-01-19 | 1991-07-25 | Pietzsch Ibp Gmbh | Method and device for the automatic control of movable apparatuses |
| CN106292704A (zh) * | 2016-09-07 | 2017-01-04 | 四川天辰智创科技有限公司 | Method and apparatus for avoiding obstacles |
| CN106548645B (zh) * | 2016-11-03 | 2019-07-12 | 济南博图信息技术有限公司 | Vehicle path optimization method and system based on deep learning |
| CN106873566B (zh) * | 2017-03-14 | 2019-01-22 | 东北大学 | Driverless logistics vehicle based on deep learning |
- 2017-09-05: CN application CN201710790602.5A, patent CN107491072B (zh), active (Active)
- 2018-08-03: WO application PCT/CN2018/098637, publication WO2019047646A1 (zh), not active (Ceased)
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2001078951A1 (en) * | 2000-04-13 | 2001-10-25 | Zhimin Lin | Semi-optimal path finding in a wholly unknown environment |
| CN104317297A (zh) * | 2014-10-30 | 2015-01-28 | 沈阳化工大学 | Robot obstacle avoidance method in unknown environments |
| CN105139072A (zh) * | 2015-09-09 | 2015-12-09 | 东华大学 | Reinforcement learning algorithm applied to an obstacle avoidance system for non-line-tracking smart vehicles |
| CN106558058A (zh) * | 2016-11-29 | 2017-04-05 | 北京图森未来科技有限公司 | Segmentation model training method, road segmentation method, vehicle control method and apparatus |
| CN107065890A (zh) * | 2017-06-02 | 2017-08-18 | 北京航空航天大学 | Intelligent obstacle avoidance method and system for unmanned vehicles |
| CN107491072A (zh) * | 2017-09-05 | 2017-12-19 | 百度在线网络技术(北京)有限公司 | Vehicle obstacle avoidance method and apparatus |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110196605A (zh) * | 2019-04-26 | 2019-09-03 | 大连海事大学 | Reinforcement-learning method for a UAV swarm to cooperatively search for multiple dynamic targets in an unknown sea area |
| CN110196605B (zh) * | 2019-04-26 | 2022-03-22 | 大连海事大学 | Reinforcement-learning method for a UAV swarm to cooperatively search for multiple dynamic targets in an unknown sea area |
| CN113614713A (zh) * | 2021-06-29 | 2021-11-05 | 华为技术有限公司 | Human-machine interaction method and apparatus, device, and vehicle |
| CN114118276A (zh) * | 2021-11-29 | 2022-03-01 | 北京触达无界科技有限公司 | Network training method, control method, and apparatus |
| CN114488980A (zh) * | 2022-01-21 | 2022-05-13 | 上海擎朗智能科技有限公司 | Robot scheduling method and apparatus, electronic device, and storage medium |
| CN116823700A (zh) * | 2022-03-21 | 2023-09-29 | 北京沃东天骏信息技术有限公司 | Image quality determination method and apparatus |
| CN114839969A (zh) * | 2022-04-02 | 2022-08-02 | 达闼机器人股份有限公司 | Method, apparatus, storage medium and electronic device for controlling device movement |
| CN115140091A (zh) * | 2022-06-29 | 2022-10-04 | 中国第一汽车股份有限公司 | Autonomous driving decision-making method, apparatus, vehicle, and storage medium |
| CN116009548A (zh) * | 2023-01-03 | 2023-04-25 | 四川化工职业技术学院 | Data transmission method in an automated control system for vehicle obstacle avoidance |
| CN116469069A (zh) * | 2023-03-24 | 2023-07-21 | 北京百度网讯科技有限公司 | Scene coding model training method, apparatus and medium for autonomous driving |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107491072B (zh) | 2021-03-30 |
| CN107491072A (zh) | 2017-12-19 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2019047646A1 (zh) | | Vehicle obstacle avoidance method and apparatus |
| CN109213134B (zh) | | Method and apparatus for generating an autonomous driving strategy |
| US11308391B2 (en) | | Offline combination of convolutional/deconvolutional and batch-norm layers of convolutional neural network models for autonomous driving vehicles |
| CN110654381B (zh) | | Method and apparatus for controlling a vehicle |
| WO2019047651A1 (zh) | | Driving behavior prediction method and apparatus, and driverless vehicle |
| WO2020107974A1 (zh) | | Obstacle avoidance method and apparatus for a driverless vehicle |
| WO2019047650A1 (zh) | | Data collection method and apparatus for a driverless vehicle |
| US12168456B2 (en) | | Vehicle placement on aerial views for vehicle control |
| EP3570214B1 (en) | | Automobile image processing method and apparatus, and readable storage medium |
| WO2019000391A1 (zh) | | Vehicle control method, apparatus and device |
| CN114771533B (zh) | | Control method, apparatus, device, vehicle and medium for an autonomous vehicle |
| US11634156B1 (en) | | Aerial view generation for vehicle control |
| CN114771534B (zh) | | Control method and training method for an autonomous vehicle, vehicle, device, and medium |
| WO2019047643A1 (zh) | | Control method and apparatus for a driverless vehicle |
| CN110320883A (zh) | | Vehicle automatic driving control method and apparatus based on a reinforcement learning algorithm |
| CN112622923B (zh) | | Method and apparatus for controlling a vehicle |
| CN114291099B (zh) | | Parking method and apparatus for an autonomous vehicle |
| CN114802250B (zh) | | Data processing method, apparatus, device, autonomous vehicle, and medium |
| CN110654380B (zh) | | Method and apparatus for controlling a vehicle |
| CN117539253A (zh) | | Autonomous driving method, apparatus and vehicle capable of following instructions to get out of a trapped state autonomously |
| CN114581865A (zh) | | Confidence measurement in deep neural networks |
| CN116088538B (zh) | | Vehicle trajectory information generation method, apparatus, device and computer-readable medium |
| CN111332279B (zh) | | Parking path generation method and apparatus |
| WO2022204867A1 (zh) | | Lane line detection method and apparatus |
| CN115214647B (zh) | | Trajectory planning method and apparatus for an autonomous vehicle |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18853431; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05/08/2020) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18853431; Country of ref document: EP; Kind code of ref document: A1 |