
WO2020177876A1 - System and method for training a model performing human-like driving - Google Patents


Info

Publication number
WO2020177876A1
WO2020177876A1 (PCT/EP2019/055786)
Authority
WO
WIPO (PCT)
Legal status
Ceased
Application number
PCT/EP2019/055786
Other languages
French (fr)
Inventor
Nicolas VIGNARD
Dengxin DAI
Simon Hecker
Luc Van Gool
Current Assignee
Toyota Motor Europe NV SA
Eidgenoessische Technische Hochschule Zurich ETHZ
Original Assignee
Toyota Motor Europe NV SA
Eidgenoessische Technische Hochschule Zurich ETHZ
Application filed by Toyota Motor Europe NV SA and Eidgenoessische Technische Hochschule Zurich ETHZ
Priority to PCT/EP2019/055786
Publication of WO2020177876A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24133: Distances to prototypes
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • a deep neural network is trained to predict the steering angle s and speed v for a future time step. All data inputs are synchronized and sampled at the same sampling rate f, meaning the vehicle makes a driving decision every 1/f seconds. The inputs and outputs are represented in this discretized form. The time stamp t is used so that all data can be indexed over time. For example, I_t indicates the current video frame and v_t the vehicle's current speed. Similarly, I_{t-k} is the k-th previous video frame and s_{t-k} is the k-th previous steering angle.
  • the k recent video frames are denoted by I_[t-k+1, t] = (I_{t-k+1}, ..., I_t), and the k recent map representations by M_[t-k+1, t] = (M_{t-k+1}, ..., M_t).
  • the goal is to train a deep network that predicts desired driving actions from the visual observations and the planned route.
  • the learning task can be defined as F : (I_[t-k+1, t], M_[t-k+1, t]) -> S_{t+1} × V_{t+1} (1), where S_{t+1} represents the steering angle space and V_{t+1} the speed space for future time t + 1. S and V can be defined at several levels of granularity.
  • v ∈ V with 0 ≤ v ≤ 180 for speed and s ∈ S for the steering angle, where kilometer per hour (km/h) is the unit of v and degree (°) the unit of s.
  • M_t is either a rendered video frame from the TomTom route planner (cf. S. Hecker et al., 2018), or the engineered features for the numerical maps from HERE Technologies (as described below), or the combination of both.
  • the synchronized data (I, M) may be denoted as D.
  • the training data are assumed to consist of a long sequence of driving data with T frames in total. The basic driving model then learns the prediction function for the steering angle and the velocity by minimizing a regression loss (e.g. the L2 loss) over all frames, such as sum_t ((ŝ_{t+1} - s_{t+1})^2 + (v̂_{t+1} - v_{t+1})^2), where ŝ and v̂ are the predicted values, and s and v are the ground truth values.
  • the used comfort component aims at reducing jerk by imposing a temporal smoothness constraint on the longitudinal and lateral oscillations, by minimizing the second derivative of consecutive steering angle and speed predictions.
  • Eq. 4 is reformulated. If the number of consecutive predictions that need to be optimized jointly is denoted by O, then minimizing Eq. 4 is equivalent to minimizing the summed squared second differences over each window of O consecutive steering angle and speed predictions, e.g. sum_i ((s_{i+1} - 2 s_i + s_{i-1})^2 + (v_{i+1} - 2 v_i + v_{i-1})^2).
  • An adversarial learning method consists of a generator and discriminator.
  • the drivelet at t is forwarded to G to obtain the predicted driving actions B̂_t.
  • to make autonomous driving more human-like is equivalent to letting the distribution of B̂_t approximate that of the human driving actions B_t.
  • the loss for human-like driving according to the present disclosure is defined as an adversarial loss L_adv = -log D(B̂_t), where D(B̂_t) ∈ [0, 1] is the probability of classifying B̂_t as human driving.
  • λ_2 is a trade-off parameter to control the contributions of the costs.
  • the training is conducted under the following min-max criterion: min_G max_D E_B[log D(B)] + E_B̂[log(1 - D(B̂))].
  • the set of video data according to the present disclosure may be provided by panoramic videos recorded by a vehicle comprising one or several cameras oriented toward the environment at the front, the sides, and/or the back of the vehicle, e.g. by the Drive360 video data set.
  • Drive360 features 60 hours of real-world driving data over 3000 km.
  • Drive360 is e.g. augmented with HERE Technologies map data.
  • Drive360 offers a time stamped GPS trace for each route recorded.
  • a path-matcher is used based on a hidden Markov model employing the Viterbi algorithm (cf. e.g. G. D. Forney. The Viterbi algorithm).
  • AlexNet (cf. e.g. A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012) may be used to process the visual map representation from the TomTom Go App.
  • the respective loss may be computed according to Eq. 7, and gradients are back-propagated to adjust the driving network.
  • a fully-connected, three-layer discriminator network may be used to model human-like driving.
  • the loss may be computed according to Eq. 9 to adjust the driving network.
  • FIG. 2 shows a schematic block diagram of a system according to embodiments of the present disclosure.
  • a system 200 for training a model has been represented.
  • This system 200 which may be a computer, comprises a processor 201 and a non-volatile memory 202.
  • the system 200 may also comprise, be configured to be integrated in or form a part of a vehicle 400.
  • the system 200 may not only be configured for training a human-like generative driving model for a vehicle but also to apply the trained model to autonomously drive a vehicle (in particular in case it is part of a vehicle 400).
  • the system 200 may further be connected to a (passive) optical sensor 300, in particular a digital camera (e.g. integrated into the vehicle and being oriented to at least one of the front, the sides and the back).
  • the digital camera 300 is configured such that it can record a scene in front of the vehicle 400, and in particular output digital data providing appearance (color, e.g. RGB) information of the scene.
  • the camera 300 desirably generates image data comprising a 2D or 3D image of the environment.
  • the output of the camera 300 may be used as video data of driving scenes for training the model (cf. step S01 of the method described above) and/or as input for a trained model, based on which the trained model autonomously controls driving the vehicle.
  • in the non-volatile memory 202, a set of instructions is stored, and this set of instructions comprises instructions to perform a method for training a model.
  • these instructions and the processor 201 may respectively form a plurality of modules:
  • a module C for training (S03) a generative driving model using the set of video data, and the data set of human driving maneuvers,
  • training in module C is augmented with an adversarial training scheme such that the prediction of the trained generative driving model becomes more human like.
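As a hedged illustration of the comfort term mentioned in the bullets above (minimizing the second derivative of consecutive steering angle and speed predictions to reduce jerk), a discrete second-difference penalty can be sketched in a few lines; the function name and exact weighting are illustrative, not from the disclosure:

```python
def smoothness_penalty(values):
    """Comfort term sketch: sum of squared second differences of
    consecutive predictions (a discrete second derivative), which
    penalizes jerk in the steering or speed sequence."""
    return sum(
        (values[i + 1] - 2 * values[i] + values[i - 1]) ** 2
        for i in range(1, len(values) - 1)
    )
```

A perfectly linear sequence (constant acceleration-free change) incurs zero penalty, while oscillating predictions are penalized.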


Abstract

The invention relates to a method and system for training a human-like generative driving model for a vehicle, comprising: a - obtaining (S01) a set of video data of driving scenes performed by a human driven vehicle, b - obtaining (S02) a data set of human driving maneuvers carried out during the driving scenes, c - training (S03) a generative driving model using the set of video data and the data set of human driving maneuvers, wherein the training step (S03) of c is augmented with an adversarial training scheme such that the prediction of the trained generative driving model becomes more human-like.

Description

SYSTEM AND METHOD FOR TRAINING A MODEL PERFORMING
HUMAN-LIKE DRIVING
FIELD OF THE DISCLOSURE
[0001] The present disclosure is related to the field of image processing, in particular to a method for training a human-like generative driving model for a vehicle.
BACKGROUND OF THE DISCLOSURE
[0002] The prospect of deploying autonomously driven cars is imminent owing to the advances in perception, robotics and sensor technologies. However, it is believed that autonomous vehicles are more likely to be accepted if they drive accurately, comfortably and drive the same way as human drivers would do. This is especially true for the near future when autonomous vehicles and human-driven vehicles need to share the same road.
[0003] Classical approaches require the recognition of all driving-relevant objects, such as lanes, traffic signs, traffic lights, cars and pedestrians, and then perform motion planning, which is further used for final vehicle control, cf. e.g.:
C. Urmson et al. Autonomous driving in urban environments: Boss and the Urban Challenge. Journal of Field Robotics, Special Issue on the 2007 DARPA Urban Challenge, Part I, 25(8):425-466, June 2008.
[0004] These types of systems are sophisticated and represent the current state-of-the-art for autonomous driving, but they are hard to maintain and prone to error accumulation over the pipeline. Most systems also need to use diverse sensors, such as cameras, laser scanners, radar, GPS and high-definition maps.
[0005] End-to-end mapping methods on the other hand construct a direct mapping from the sensory input to the maneuvers. In this regard the last years have seen tremendous progress in academia on learning driving models, cf. e.g.:
F. Codevilla, M. Muller, A. Lopez, V. Koltun, and A. Dosovitskiy. End-to-end driving via conditional imitation learning. 2018, and
S. Hecker, D. Dai, and L. Van Gool. End-to-end learning of driving models with surround-view cameras and route planners. In ECCV, 2018.
[0006] However, many of these systems are deficient in terms of the sensors used, when compared to the driving systems developed by large companies. For instance, many algorithms only use a front-facing camera. Maps are exploited only for simple directional commands or rendered videos. While these setups are sufficient to allow the community to study many challenges, developing algorithms for fully autonomous cars requires the use of numerical maps of high fidelity.
[0007] Current driving algorithms, e.g. those cited above, mostly treat driving as a regression problem with i.i.d. individual training samples, e.g. regressing the low-level steering angle and speed for a given data sample. Yet, driving is a continuous sequence of events over time. Longitudinal and lateral control need to be coupled, and these coupled operations need to be combined over time for a comfortable ride. Thus, driving models need to be learned with continuous data sequences, and proper passenger comfort measures need to be embedded into the learning system.
[0008] Other contributions have chosen the middle ground between traditional pipelined methods and the monolithic end-to-end approach. They learn driving models from compact intermediate representations, called affordance indicators, such as distance to the front car and existence of a traffic light, cf. e.g.
A. Sauer, N. Savinov, and A. Geiger. Conditional affordance learning for driving in urban environments. In Conference on Robot Learning, 2018.
[0009] While research on passenger comfort started to receive some attention, it hardly did so in learning driving models, cf. e.g.:
M. Elbanhawi, M. Simic, and R. Jazar. In the passenger seat: Investigating ride comfort measures in autonomous cars. IEEE Intelligent Transportation Systems Magazine, 7(3):4-17, 2015.
[0010] A large body of work has studied human driving styles, cf. e.g.:
G. A. M. Meiring and H. C. Myburgh. A review of intelligent driving style analysis systems and related artificial intelligence algorithms. In Sensors, 2015.
[0011] Statistical approaches have been employed to evaluate human drivers and to suggest improvements, cf. e.g.:
H. Zhao, H. Zhou, C. Chen, and J. Chen. Join driving: A smart phone-based driving behavior evaluation system. In IEEE Global Communications Conference (GLOBECOM), 2013.
However, human-like driving is hard to quantify.
SUMMARY OF THE DISCLOSURE
[0012] Currently, it remains desirable to provide a system and a method for training a human-like generative driving model for a vehicle, in particular for learning to drive accurately, comfortably and to drive the same way as human drivers would do.
[0013] Therefore, according to the embodiments of the present disclosure, a (desirably computer-implemented) method for training a human-like generative driving model for a vehicle is provided. The method comprises the steps of: a - obtaining a set of video data of driving scenes performed by a human driven vehicle,
b - obtaining a data set of human driving maneuvers carried out during the driving scenes,
c - training a generative driving model using the set of video data and the data set of human driving maneuvers,
wherein the training step of c is augmented with an adversarial training scheme such that the prediction of the trained generative driving model becomes more human like.
[0014] By providing such a method, it becomes possible to take advantage of the advance of adversary learning to learn human-like driving. Specifically, a discriminator may be trained, together with the driving model, to distinguish human driving and machine driving. The driving model is trained to be accurate, comfortable, and at the same time to fool the discriminator so that it believes that the driving performed by the method was by a human driver. A new evaluation criterion is proposed to score the human-likeness of a driving model.
[0015] As a further advantage, the learning procedure is desirably improved from a pointwise prediction to a sequence-based prediction.
[0016] The generative driving model desirably outputs predicted driving maneuvers. Accordingly, the model is desirably configured to autonomously steer a vehicle based on said outputted predicted driving maneuvers. For example, the driving maneuvers may comprise any kind of maneuvers for driving control of the vehicle, e.g. simple maneuvers such as steering or braking, or more complex maneuvers like taking a turn by a combination of braking, steering and re-accelerating.
[0017] The adversarial training scheme may comprise: c1 - training a discriminator model based on the predicted driving maneuvers outputted by the generative driving model and corresponding human driving maneuvers of the data set to discriminate between human and machine maneuvers, and
c2 - forcing the generative driving model to learn more human like driving (e.g. by penalizing the model using an adversary loss).
[0018] For example, the standard L1 and/or L2 loss may be augmented by an adversary loss which is based on a discriminator model trained to distinguish human driving and machine driving.
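The augmented objective of [0018] can be sketched as a standard L2 term plus an adversary term. The function names and the trade-off weight `lam` below are illustrative assumptions, not from the disclosure:

```python
import math

def l2_loss(pred, target):
    """Standard L2 regression loss over predicted vs. ground-truth
    driving values (e.g. steering angle and speed)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def adversary_loss(d_prob):
    """Adversary term: penalize predictions the discriminator can tell
    apart from human driving. d_prob is the discriminator's probability
    that the predicted maneuver is human."""
    return -math.log(max(d_prob, 1e-12))

def augmented_loss(pred, target, d_prob, lam=0.1):
    """Predefined loss augmented by the adversary loss; lam is a
    hypothetical trade-off weight between the two costs."""
    return l2_loss(pred, target) + lam * adversary_loss(d_prob)
```

When the discriminator is fully fooled (d_prob = 1), the adversary term vanishes and only the regression error remains.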
[0019] The step of obtaining the set of video data may further comprise: a1 - obtaining a route planning data set representing route information according to which the human driven vehicle has performed the driving scenes of the set of video data,
a2 - enriching the set of video data by the route planning data (set), such that the accuracy of the predicted driving maneuvers outputted by the generative driving model is increased.
For example, the set of video data may be enriched with numerical map data from HERE Technologies.
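The enrichment step a2 amounts to joining timestamped video samples with map records. A minimal sketch of such a nearest-in-time join is shown below; the field names and data layout are illustrative assumptions, not the disclosure's actual data format:

```python
def enrich_with_map(video_samples, map_samples):
    """Join each (timestamp, frame) video sample with the map record
    whose timestamp is nearest in time (sketch of step a2)."""
    enriched = []
    for t, frame in video_samples:
        # pick the map record closest in time to this video frame
        nearest_t, nearest_map = min(map_samples, key=lambda m: abs(m[0] - t))
        enriched.append({"t": t, "frame": frame, "map": nearest_map})
    return enriched
```

A production pipeline would resample both streams to the common rate f instead, but the nearest-neighbour join conveys the synchronization idea.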
[0020] The step of training the generative driving model may comprise: training the generative driving model based on a predefined loss function (e.g. the L1 and/or the L2 loss) augmented by an adversary loss which is based on the output of the discriminator model.
[0021] The generative driving model may receive as an input video data and data of human driving maneuvers of past time steps, and output predicted driving maneuvers for future time steps.
[0022] In case the set of video data are enriched by the route planning data before being inputted to the generative driving model, the generative driving model may receive as a further input the vehicle location in past time steps.
[0023] For example, given the video I, the map information M, and the vehicle's location L, a deep neural network may be trained to predict the steering angle s and speed v for a future time step. All data inputs may be synchronized and sampled at the same sampling rate f, meaning the vehicle makes a driving decision every 1/f seconds. The inputs and outputs may be represented in this discretized form.
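The discretized setup of [0023] can be sketched as a windowing function that slices the synchronized streams into (k past observations, next-step target) training pairs. Function and field names here are illustrative, not from the disclosure:

```python
def make_training_pairs(frames, speeds, steers, k):
    """Slice synchronized streams (all sampled at the same rate f) into
    training pairs: the k most recent observations are the input, and
    the next-step (steering, speed) pair is the target."""
    pairs = []
    for t in range(k - 1, len(frames) - 1):
        past = {
            "frames": frames[t - k + 1 : t + 1],   # I_{t-k+1}, ..., I_t
            "speeds": speeds[t - k + 1 : t + 1],   # v_{t-k+1}, ..., v_t
            "steers": steers[t - k + 1 : t + 1],   # s_{t-k+1}, ..., s_t
        }
        target = (steers[t + 1], speeds[t + 1])    # (s_{t+1}, v_{t+1})
        pairs.append((past, target))
    return pairs
```

With T frames and window size k, this yields T - k training pairs, matching the sequence-based (rather than pointwise) view of driving.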
[0024] The generative driving model may be a deep neural network.
[0025] The discriminator model may be a deep neural network.
[0026] For example, the driving model developed in S. Hecker, D. Dai, and L. Van Gool. End-to-end learning of driving models with surround-view cameras and route planners, ECCV, 2018, may be adopted.
[0027] The present disclosure further relates to a system for training a human-like generative driving model for a vehicle, the system comprises:
a module A for obtaining a set of video data of driving scenes performed by a human driven vehicle,
a module B for obtaining a data set of human driving maneuvers carried out during the driving scenes,
a module C for training a generative driving model using the set of video data, and the data set of human driving maneuvers,
wherein training in module C is augmented with an adversarial training scheme such that the prediction of the trained generative driving model becomes more human like.
[0028] The system may comprise further (sub-) modules and features corresponding to the features of the method described above.
[0029] Moreover, the present disclosure relates to a system for predicting human-like driving maneuvers of a vehicle, comprising the (trained) model of step c or of module C, as described above.
[0030] Furthermore, the present disclosure relates to a computer program including instructions for executing the steps of a method, as described above, when said program is executed by a computer.
[0031] This program can use any programming language and take the form of source code, object code or a code intermediate between source code and object code, such as a partially compiled form, or any other desirable form.
[0032] Finally, the present disclosure relates to a recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method, as described above.
[0033] The information medium can be any entity or device capable of storing the program. For example, the medium can include storage means such as a ROM, for example a CD ROM or a microelectronic circuit ROM, or magnetic storage means, for example a diskette (floppy disk) or a hard disk.
[0034] Alternatively, the information medium can be an integrated circuit in which the program is incorporated, the circuit being adapted to execute the method in question or to be used in its execution.
[0035] It is intended that combinations of the above-described elements and those within the specification may be made, except where otherwise contradictory.
[0036] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure, as claimed.
[0037] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, and serve to explain the principles thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
[0038] Fig. 1 shows a schematic flow chart of the steps of a method for training a human-like generative driving model according to embodiments of the present disclosure; and
[0039] Fig. 2 shows a schematic block diagram of a system according to embodiments of the present disclosure.
DESCRIPTION OF THE EMBODIMENTS
[0040] Reference will now be made in detail to exemplary embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
[0041] End-to-end driving allows developing promising driving models based on camera data, cf. e.g.:
the driving model developed in S. Hecker, D. Dai, and L. Van Gool. End-to-end learning of driving models with surround-view cameras and route planners, ECCV, 2018.
[0042] The focus, though, has mainly been on perception, not so much on navigation. Thus far, the representations for navigation are either primitive directional commands in a simulation environment or rendered videos of planned routes in real-world environments.
[0043] Fig. 1 shows a schematic flow chart of the steps of a method for training a human-like generative driving model according to embodiments of the present disclosure. For example, the driving model developed in the publication cited above (S. Hecker et al., 2018) may be adopted in the present disclosure.
[0044] In particular, the used core model may consist of a fine-tuned ResNet34 CNN (cf. e.g. K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016) to process sequences of front-facing camera images, followed by two regression networks to predict steering wheel angle and vehicle speed. The architecture may thus be similar to the baseline model from the publication cited above (S. Hecker et al., 2018).
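The two-headed structure of [0044] (a shared visual backbone feeding separate steering-angle and speed regressors) can be sketched generically. The helper below is a structural stand-in, not the actual ResNet34 pipeline; the component names are assumptions:

```python
def make_driving_model(backbone, steer_head, speed_head):
    """Compose a two-headed driving model: a shared feature backbone
    followed by separate steering and speed regression heads."""
    def model(image_sequence):
        features = backbone(image_sequence)        # shared representation
        return steer_head(features), speed_head(features)
    return model
```

Usage with trivial stand-in components (a real system would plug in the CNN backbone and the two regression networks):

```python
model = make_driving_model(sum, lambda f: f * 2, lambda f: f + 1)
steer, speed = model([1, 2, 3])
```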
[0045] In a first step S01, a set of video data of driving scenes performed by a human-driven vehicle is obtained. The set of video data may be e.g. the Drive360 dataset, as described in the publication cited above (S. Hecker et al., 2018).
[0046] In an optional step S01a (not shown), a route planning data set representing the route information according to which the human-driven vehicle has performed the driving scenes of the set of video data may additionally be obtained, e.g. map data from HERE Technologies.
[0047] In a further optional step S01b (not shown), the set of video data may be enriched by the route planning data, such that the accuracy of the predicted driving maneuvers outputted by the generative driving model is increased.
[0048] It is hence proposed by the present disclosure to (1) augment real-world driving data with numerical map data from, e.g., HERE Technologies; and (2) design map features believed to be relevant for driving and integrate them into a driving model.
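One possible way to engineer such map features is to normalize a few per-location attributes into a numeric vector. The field names and normalization constants below are purely illustrative assumptions, not the actual HERE Technologies schema:

```python
def map_features(record):
    # Turn a raw map record (a dict) into a small numeric feature vector.
    # Defaults are used when an attribute is missing for a road segment.
    return [
        record.get("speed_limit", 50.0) / 120.0,                      # normalized legal speed
        min(record.get("dist_to_intersection", 1e3), 200.0) / 200.0,  # clipped distance ahead
        record.get("curvature", 0.0),                                 # road curvature
    ]

print(map_features({"speed_limit": 60.0, "dist_to_intersection": 100.0}))  # → [0.5, 0.5, 0.0]
```

Vectors of this kind can be concatenated over several look-ahead points along the planned route and fed to the driving network alongside the camera features.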
[0049] In a further step S02 (which may be carried out before, after or at the same time as step S01), a data set of human driving maneuvers carried out during the driving scenes is obtained.
[0050] Accordingly, said data set of human driving maneuvers is a further input for the model, providing information regarding the driving control gestures of humans when driving the vehicles in the driving scenes of the set of video data.
[0051] In a subsequent step S03, a generative driving model is trained using the set of video data and the data set of human driving maneuvers. Said training is augmented with an adversarial training scheme such that the prediction of the trained generative driving model becomes more human-like.
[0052] In particular, in an optional step S03a (not shown), a discriminator model may be trained, based on the predicted driving maneuvers outputted by the generative driving model and the corresponding human driving maneuvers of the data set, to discriminate between human and machine maneuvers. The generative driving model may thereby be forced to learn more human-like driving in a further optional step S03b (not shown).
[0053] For example, given the video I, the map information M, and the vehicle's location L, a deep neural network is trained to predict the steering angle s and speed v for a future time step. All data inputs are synchronized and sampled at the same sampling rate f, meaning the vehicle makes a driving decision every 1/f seconds. The inputs and outputs are represented in this discretized form. The index t denotes the time stamp, such that all data can be indexed over time. For example, I_t indicates the current video frame and v_t the vehicle's current speed. Similarly, I_{t-k} is the k-th previous video frame and s_{t-k} is the k-th previous steering angle. Since predictions need to rely on data of previous time steps, the k recent video frames are denoted by I_{[t-k+1,t]} ≡ (I_{t-k+1}, ..., I_t), and the k recent map representations by M_{[t-k+1,t]} ≡ (M_{t-k+1}, ..., M_t). The goal is to train a deep network that predicts desired driving actions from the visual observations and the planned route. The learning task can be defined as:

F : (I_{[t-k+1,t]}, M_{[t-k+1,t]}, L_t) → S_{t+1} × V_{t+1},    (1)

where S_{t+1} represents the steering angle space and V_{t+1} the speed space for the future time t + 1. S and V can be defined at several levels of granularity. The continuous values directly recorded from the car's CAN bus may be considered, where V = {v | 0 ≤ v ≤ 180} for speed and S = {s | -720 ≤ s ≤ 720} for steering angle in this case. Here, kilometer per hour (km/h) is the unit of v, and degree (°) the unit of s. M_t is either a rendered video frame from the TomTom route planner (cf. S. Hecker et al., 2018), or the engineered features for the numerical maps from HERE Technologies (as described below), or the combination of both.
[0054] In order to keep notations concise, the synchronized data (I, M) may be denoted as D. Without loss of generality, the training data are assumed to consist of a long sequence of driving data with T frames in total. Then the basic driving model is to learn the prediction function for the steering angle

ŝ_{t+1} = F_s(D_{[t-k+1,t]})    (2)

and the velocity

v̂_{t+1} = F_v(D_{[t-k+1,t]})    (3)

with the objective

min Σ_{t=k}^{T-1} [ (ŝ_{t+1} - s_{t+1})² + (v̂_{t+1} - v_{t+1})² ],    (4)

where ŝ and v̂ are the predicted values, and s and v are the ground-truth values.
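A plain-Python sketch of the squared-error objective of Eq. 4, summed over a batch of predictions (the equal weighting of steering and speed errors is an assumption of this sketch):

```python
def accuracy_loss(s_hat, s, v_hat, v):
    # Sum of squared errors between predicted and ground-truth
    # steering angles and speeds (cf. Eq. 4).
    return (sum((a - b) ** 2 for a, b in zip(s_hat, s))
            + sum((a - b) ** 2 for a, b in zip(v_hat, v)))

print(accuracy_loss([1.0, 2.0], [1.0, 1.0], [3.0], [5.0]))  # → 5.0
```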
[0055] The learning under Eq. 4 is straightforward and can be implemented with any standard deep network. This objective, however, assumes that the driving decisions at each time step are independent from each other. It is believed that this may be an over-simplification, because driving decisions indeed exhibit strong temporal dependencies within a relatively short time range. In the following sections, the objective according to the present disclosure is reformulated by introducing a ride comfort cost and a human-likeness score to better model the temporal dependency of driving actions.
Accurate and Comfortable Driving
[0056] Multiple concepts relating to driving comfort have been proposed and discussed, such as apparent safety, motion sickness, level of controllability and resulting force. While those are all relevant, some are hard to quantify. It is hence chosen to reduce motion sickness, which has been shown to be largely caused by the vehicle's longitudinal and lateral oscillations, cf. e.g.: M. Turner and M. J. Griffin. Motion sickness in public road transport: the effect of driver, route and vehicle. Ergonomics, 42(12):1646-64, 1999.
[0057] Due to the short-term predictive nature of most end-to-end driving models, substantial jerking is an inherent problem. The comfort component used here aims at reducing jerk by imposing a temporal smoothness constraint on the longitudinal and lateral oscillations, i.e. by minimizing the second derivative of consecutive steering angle and speed predictions.
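The smoothness constraint can be sketched as a finite-difference second derivative over consecutive predictions; a perfectly linear ramp (constant rate of change, zero jerk) incurs no penalty:

```python
def comfort_loss(preds):
    # Penalize the discrete second derivative of a sequence of
    # consecutive predictions (steering angles or speeds).
    return sum((preds[i + 2] - 2 * preds[i + 1] + preds[i]) ** 2
               for i in range(len(preds) - 2))

print(comfort_loss([0.0, 1.0, 2.0, 3.0]))  # linear ramp → 0.0
print(comfort_loss([0.0, 1.0, 0.0]))       # oscillation → 4.0
```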
[0058] Before introducing ride comfort and human-like driving, Eq. 4 is reformulated. If the number of consecutive predictions that need to be optimized jointly is denoted by O, then minimizing Eq. 4 is equivalent to minimizing

min Σ_{t=k}^{T-O} Σ_{o=1}^{O} [ (ŝ_{t+o} - s_{t+o})² + (v̂_{t+o} - v_{t+o})² ].    (5)

[0059] Then, for every O consecutive frames starting at time t, the loss of driving accuracy is

l_a(t) = Σ_{o=1}^{O} [ (ŝ_{t+o} - s_{t+o})² + (v̂_{t+o} - v_{t+o})² ].    (6)
[0060] The objective function for accurate and comfortable driving can now be presented as

min Σ_t [ l_a(t) + z_1 · l_c(t) ],    (7)

where the comfort loss l_c penalizes the discrete second derivative of the consecutive predictions,

l_c(t) = Σ_{o=1}^{O-2} [ (ŝ_{t+o+2} - 2ŝ_{t+o+1} + ŝ_{t+o})² + (v̂_{t+o+2} - 2v̂_{t+o+1} + v̂_{t+o})² ],    (8)

and z_1 is a trade-off parameter to balance the two costs. By optimizing under the objective in Eq. 7, consecutive predictions are learned and optimized together for accurate and comfortable driving.

Accurate, Comfortable & Human-like Driving
[0061] If autonomous cars behave differently from human-driven cars, it is hard for humans to predict their future actions. This unpredictability can cause accidents. Thus, it is argued that it is important to design human-like driving algorithms from the very start. Hence, a human-likeness score is introduced: the higher the value, the closer to human driving. Since it is hard to manually define what a human driving style is, as was done for the general comfort measures, adversarial learning is adopted to model it.
[0062] An adversarial learning method consists of a generator and a discriminator. The driving model of the present disclosure, as defined in Eq. 4 or in Eq. 7, is the generator G. Now the training objective for the discriminator will be described. For convenience, the short trajectories of O frames described above are named drivelets. Given the outputs of the driving model for a drivelet, B̂_t = (ŝ_{t+1}, ..., ŝ_{t+O}, v̂_{t+1}, ..., v̂_{t+O}), and its corresponding ground truth from the human driver, B_t = (s_{t+1}, ..., s_{t+O}, v_{t+1}, ..., v_{t+O}), the goal is to train a fully-connected discriminator D using the cross-entropy loss to classify the two classes (i.e. machine and human).
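The alternation between discriminator and generator updates described here can be organized as in the following structural sketch, where the two step callables are assumed to perform one parameter update each and return their scalar loss:

```python
def train_adversarial(generator_step, discriminator_step, batches, d_steps=1):
    # Alternate updates: first the discriminator learns to tell human
    # drivelets from machine drivelets, then the generator (the driving
    # model) is updated to fool it while staying accurate.
    history = []
    for batch in batches:
        for _ in range(d_steps):
            d_loss = discriminator_step(batch)
        g_loss = generator_step(batch)
        history.append((d_loss, g_loss))
    return history

# Dummy usage with constant-loss stand-ins for the real update functions.
log = train_adversarial(lambda b: 0.5, lambda b: 1.5, batches=[1, 2, 3])
```

The `d_steps` parameter (an assumed knob, common in adversarial training) allows giving the discriminator extra updates per batch.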
[0063] The drivelet at t is forwarded to G to obtain the driving actions B̂_t. Making autonomous driving more human-like is then equivalent to letting the distribution of B̂_t approximate that of B_t. Thus, the loss for human-like driving according to the present disclosure is defined as an adversarial loss:

l_h(t) = -log D(B̂_t)_1,    (9)

where D(B̂_t)_1 is the probability of classifying B̂_t as human driving.
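In code, the adversarial loss of Eq. 9 is simply the negative log of the discriminator's "human" probability; the numerical floor `eps` is an implementation assumption to avoid log(0):

```python
import math

def human_likeness_loss(p_human):
    # -log of the probability that the discriminator assigns to the
    # "human driving" class for a generated drivelet (cf. Eq. 9).
    eps = 1e-12
    return -math.log(max(p_human, eps))

loss = human_likeness_loss(0.5)  # equals log 2: the discriminator is unsure
```

Driving the loss toward zero means the discriminator classifies the machine drivelet as human with probability close to one.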
[0064] Putting everything together, the objective for accurate, comfortable and human-like driving according to the present disclosure is as follows:

Z(I, M) = Σ_t [ l_a(t) + z_1 · l_c(t) + z_2 · l_h(t) ],    (10)

where z_2 is a trade-off parameter to control the contributions of the costs. In keeping with adversarial learning, the training is conducted under the following min-max criterion:

max_D min_G Z(I, M).    (11)

Obtaining HERE Map Data
[0065] The set of video data according to the present disclosure may be provided by panoramic videos recorded by a vehicle comprising one or several cameras oriented toward the environment at at least one of the front, the sides, and the back of the vehicle, e.g. by the Drive360 video data set. Drive360 features 60 hours of real-world driving data over 3000 km. Drive360 is, e.g., augmented with HERE Technologies map data. Drive360 offers a time-stamped GPS trace for each route recorded. A path-matcher based on a hidden Markov model employing the Viterbi algorithm (cf. e.g. G. D. Forney. The Viterbi algorithm. Proceedings of the IEEE, 61(3):268-278, 1973) is used to calculate the most likely path traveled by the vehicle during dataset recording, snapping the GPS trace to the underlying road network. This improves the localization accuracy significantly, especially in urban environments where the GPS signal may be weak and noisy. Through the path-matcher, a map-matched GPS coordinate is obtained for each time stamp, which is then used to query the HERE Technologies map database to obtain the various types of navigation data.
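The map-matching step can be illustrated with a textbook Viterbi decoder over a toy road network of two abstract segments "A" and "B". The probabilities below are illustrative only; the actual path-matcher scores HERE road-network candidates against GPS fixes:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    # Most likely hidden state sequence (road segments) for a sequence
    # of noisy observations (GPS fixes), computed in log-space.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] + math.log(trans_p[p][s]) + math.log(emit_p[s][obs[t]]), p)
                for p in states)
            V[t][s] = prob
            back[t][s] = prev
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

states = ["A", "B"]
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}}
emit = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.2, "b": 0.8}}
print(viterbi(["a", "a", "b"], states, start, trans, emit))  # → ['A', 'A', 'B']
```

In the real setting, the transition probabilities favor staying on connected road segments and the emission probabilities decay with the distance between a GPS fix and a candidate segment.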
[0066] Following S. Hecker et al., 2018, a fine-tuned AlexNet (cf. e.g. A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012) may be used to process the visual map representation from the TomTom Go App.
[0067] For learning comfortable driving, no extra network is needed. The respective loss may be computed according to Eq. 7, and gradients are back-propagated to adjust the driving network. In order to learn human-like driving, a fully-connected, three-layer discriminator network may be used. The loss may be computed according to Eq. 9 to adjust the driving network.
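A fully-connected, three-layer discriminator of this kind can be sketched as follows. The layer widths and the 20-dimensional drivelet input (e.g. O = 10 steering values plus 10 speed values) are illustrative assumptions:

```python
import numpy as np

def mlp_discriminator(x, params):
    # Three fully-connected layers; the final sigmoid yields the
    # probability that the input drivelet is human driving.
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])  # ReLU
    h = np.maximum(0.0, h @ params["W2"] + params["b2"])  # ReLU
    logit = h @ params["W3"] + params["b3"]
    return 1.0 / (1.0 + np.exp(-logit))                   # sigmoid

rng = np.random.default_rng(1)
dims = [20, 32, 16, 1]  # input width, two hidden widths, scalar output
params = {}
for i in range(1, 4):
    params[f"W{i}"] = rng.normal(scale=0.1, size=(dims[i - 1], dims[i]))
    params[f"b{i}"] = np.zeros(dims[i])

p_human = float(mlp_discriminator(rng.normal(size=20), params)[0])
```

During training, this probability feeds both the discriminator's cross-entropy loss and the generator's adversarial loss of Eq. 9.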
[0068] Fig. 2 shows a schematic block diagram of a system according to embodiments of the present disclosure.
[0069] In this figure, a system 200 for training a model has been represented. This system 200, which may be a computer, comprises a processor 201 and a non-volatile memory 202. The system 200 may also comprise, be configured to be integrated in, or form a part of a vehicle 400. The system 200 may not only be configured for training a human-like generative driving model for a vehicle but also for applying the trained model to autonomously drive a vehicle (in particular in case it is part of a vehicle 400).
[0070] The system 200 may further be connected to a (passive) optical sensor 300, in particular a digital camera (e.g. integrated into the vehicle and oriented to at least one of the front, the sides and the back). The digital camera 300 is configured such that it can record a scene in front of the vehicle 400, and in particular output digital data providing appearance (color, e.g. RGB) information of the scene. The camera 300 desirably generates image data comprising a 2D or 3D image of the environment. There may also be provided a set of monocular cameras which generate a panoramic 2D or 3D image. The output of the camera 300 may be used as video data of driving scenes for training the model (cf. step S01 of the method described above) and/or as input for a trained model, based on which the trained model autonomously controls driving of the vehicle.
[0071] In the non-volatile memory 202, a set of instructions is stored and this set of instructions comprises instructions to perform a method for training a model.
[0072] In particular, these instructions and the processor 201 may respectively form a plurality of modules:
a module A for obtaining (S01) a set of video data of driving scenes performed by a human-driven vehicle,
a module B for obtaining a data set of human driving maneuvers carried out during the driving scenes,
a module C for training (S03) a generative driving model using the set of video data and the data set of human driving maneuvers,
wherein the training in module C is augmented with an adversarial training scheme such that the prediction of the trained generative driving model becomes more human-like.
[0073] Throughout the description, including the claims, the term "comprising a" should be understood as being synonymous with "comprising at least one" unless otherwise stated. In addition, any range set forth in the description, including the claims, should be understood as including its end value(s) unless otherwise stated. Specific values for described elements should be understood to be within accepted manufacturing or industry tolerances known to one of skill in the art, and any use of the terms "substantially" and/or "approximately" and/or "generally" should be understood to mean falling within such accepted tolerances.
[0074] Although the present disclosure herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present disclosure.
[0075] It is intended that the specification and examples be considered as exemplary only, with a true scope of the disclosure being indicated by the following claims.

Claims

1. A method for training a human-like generative driving model for a vehicle, comprising the steps of:
a - obtaining (S01) a set of video data of driving scenes performed by a human-driven vehicle,
b - obtaining (S02) a data set of human driving maneuvers carried out during the driving scenes,
c - training (S03) a generative driving model using the set of video data and the data set of human driving maneuvers,
wherein the training step (S03) of c is augmented with an adversarial training scheme such that the prediction of the trained generative driving model becomes more human-like.
2. The method according to claim 1, wherein
the generative driving model outputs predicted driving maneuvers.
3. The method according to claim 1 or 2, wherein
the adversarial training scheme comprises:
c1 - training (S03a) a discriminator model based on the predicted driving maneuvers outputted by the generative driving model and corresponding human driving maneuvers of the data set to discriminate between human and machine maneuvers, and
c2 - forcing (S03b) the generative driving model to learn more human-like driving.
4. The method according to any one of the preceding claims, wherein the step of obtaining (SOI) the set of video data further comprises:
a1 - obtaining (S01a) a route planning data set representing route information according to which the human-driven vehicle has performed the driving scenes of the set of video data,
a2 - enriching (S01b) the set of video data by the route planning data, such that the accuracy of the predicted driving maneuvers outputted by the generative driving model is increased.
5. The method according to any one of the preceding claims, wherein the step of training (S03) the generative driving model comprises:
training the generative driving model based on a predefined loss function augmented by an adversary loss which is based on the output of the discriminator model.
6. The method according to any one of the preceding claims, wherein the generative driving model receives as an input video data and data of human driving maneuvers of past time steps, and
outputs predicted driving maneuvers for future time steps.
7. The method according to the preceding claim, wherein
in case the set of video data are enriched by the route planning data before being inputted to the generative driving model, the generative driving model receives as a further input the vehicle location in past time steps.
8. The method according to the preceding claim, wherein
the generative driving model is a deep neural network, and/or
the discriminator model is a deep neural network.
9. A system for training a human-like generative driving model for a vehicle, comprising:
a module A for obtaining a set of video data of driving scenes performed by a human driven vehicle,
a module B for obtaining a data set of human driving maneuvers carried out during the driving scenes,
a module C for training a generative driving model using the set of video data, and the data set of human driving maneuvers,
wherein training in module C is augmented with an adversarial training scheme such that the prediction of the trained generative driving model becomes more human-like.
10. A system for predicting human-like driving maneuvers of a vehicle, comprising the model of step c of any one of claims 1 to 8 or of module C of claim 9.
11. A computer program including instructions for executing the steps of a method according to any one of claims 1 to 8 when said program is executed by a computer.
12. A recording medium readable by a computer and having recorded thereon a computer program including instructions for executing the steps of a method according to any one of claims 1 to 8.
PCT/EP2019/055786 2019-03-07 2019-03-07 System and method for training a model performing human-like driving Ceased WO2020177876A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/055786 WO2020177876A1 (en) 2019-03-07 2019-03-07 System and method for training a model performing human-like driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/055786 WO2020177876A1 (en) 2019-03-07 2019-03-07 System and method for training a model performing human-like driving

Publications (1)

Publication Number Publication Date
WO2020177876A1 true WO2020177876A1 (en) 2020-09-10

Family

ID=65817972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/055786 Ceased WO2020177876A1 (en) 2019-03-07 2019-03-07 System and method for training a model performing human-like driving

Country Status (1)

Country Link
WO (1) WO2020177876A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112947466A (en) * 2021-03-09 2021-06-11 湖北大学 Parallel planning method and equipment for automatic driving and storage medium
CN113635909A (en) * 2021-08-19 2021-11-12 崔建勋 An automatic driving control method based on adversarial generative imitation learning
CN115830862A (en) * 2022-11-18 2023-03-21 吉林大学 Intelligent automobile man-changing track generation method based on diffusion model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070136040A1 (en) * 2005-12-14 2007-06-14 Tate Edward D Jr Method for assessing models of vehicle driving style or vehicle usage model detector
US20180348763A1 (en) * 2017-06-02 2018-12-06 Baidu Usa Llc Utilizing rule-based and model-based decision systems for autonomous driving control


Non-Patent Citations (13)

* Cited by examiner, † Cited by third party
Title
A. KRIZHEVSKY; I. SUTSKEVER; G. E. HINTON: "Imagenet classification with deep convolutional neural networks", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2012
A. SAUER; N. SAVINOV; A. GEIGER: "Conditional affordance learning for driving in urban environments", CONFERENCE ON ROBOT LEARNING, 2018
C. URMSON: "Autonomous driving in urban environments: Boss and the urban challenge", JOURNAL OF FIELD ROBOTICS SPECIAL ISSUE ON THE 2007 DARPA URBAN CHALLENGE, vol. 25, no. 8, June 2008 (2008-06-01), pages 425 - 466, XP055169612, DOI: doi:10.1002/rob.20255
F. CODEVILLA; M. MÜLLER; A. LOPEZ; V. KOLTUN; A. DOSOVITSKIY: "End-to-end driving via conditional imitation learning", 2018
G. A. M. MEIRING; H. C. MYBURGH: "A review of intelligent driving style analysis systems and related artificial intelligence algorithms", SENSORS, 2015
G. D. FORNEY: "The viterbi algorithm", PROCEEDINGS OF THE IEEE, vol. 61, no. 3, 1973, pages 268 - 278
M. TURNER; M. J. GRIFFIN: "Motion sickness in public road transport: the effect of driver, route and vehicle", ERGONOMICS, vol. 42, no. 12, 1999, pages 1646 - 64
H. ZHAO; H. ZHOU; C. CHEN; J. CHEN: "Join driving: A smart phone-based driving behavior evaluation system", IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM, 2013
JONATHAN HO ET AL: "Generative Adversarial Imitation Learning", 10 June 2016 (2016-06-10), XP055639480, Retrieved from the Internet <URL:https://papers.nips.cc/paper/6391-generative-adversarial-imitation-learning.pdf> [retrieved on 20191112] *
K. HE; X. ZHANG; S. REN; J. SUN: "Deep residual learning for image recognition", 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR, 2016
LUONA YANG ET AL: "Real-to-Virtual Domain Unification for End-to-End Autonomous Driving", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 10 January 2018 (2018-01-10), XP081195420 *
M. ELBANHAWI; M. SIMIC; R. JAZAR: "In the passenger seat: Investigating ride comfort measures in autonomous cars", IEEE INTELLIGENT TRANSPORTATION SYSTEMS MAGAZINE, vol. 7, no. 3, 2015, pages 4 - 17, XP011664184, DOI: doi:10.1109/MITS.2015.2405571
S. HECKER; D. DAI; L. VAN GOOL: "End-to-end learning of driving models with surround-view cameras and route planners", ECCV, 2018


Similar Documents

Publication Publication Date Title
Ly et al. Learning to drive by imitation: An overview of deep behavior cloning methods
Zhang et al. End-to-end urban driving by imitating a reinforcement learning coach
US20230280702A1 (en) Hybrid reinforcement learning for autonomous driving
CN110796856B (en) Vehicle Lane Change Intention Prediction Method and Lane Change Intention Prediction Network Training Method
US12118461B2 (en) Methods and systems for predicting dynamic object behavior
Hecker et al. Learning accurate, comfortable and human-like driving
CN113044064B (en) Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning
CN111923928A (en) Decision making method and system for automatic vehicle
CN110304075A (en) Vehicle Trajectory Prediction Method Based on Hybrid Dynamic Bayesian Network and Gaussian Process
US20240208546A1 (en) Predictive models for autonomous vehicles based on object interactions
CN112947466B (en) Parallel planning method and equipment for automatic driving and storage medium
Yu et al. Baidu driving dataset and end-to-end reactive control model
US12313727B1 (en) Object detection using transformer based fusion of multi-modality sensor data
JP2020123346A (en) Method and device for performing seamless parameter switching by using location based algorithm selection to achieve optimized autonomous driving in each of regions
CN114670867A (en) Multi-vehicle trajectory prediction system based on hierarchical learning and potential risk model
WO2020177876A1 (en) System and method for training a model performing human-like driving
CN113920484A (en) Monocular RGB-D feature and reinforcement learning based end-to-end automatic driving decision method
CN116729433A (en) End-to-end automatic driving decision planning method and equipment combining element learning multitask optimization
US20220300851A1 (en) System and method for training a multi-task model
CN118861965A (en) Cascaded deep reinforcement learning safety decision-making method based on multimodal spatiotemporal representation
US20220269948A1 (en) Training of a convolutional neural network
CN119903336A (en) Computer-implemented method for providing annotated perceptual models for training data
Thu et al. An end-to-end motion planner using sensor fusion for autonomous driving
Uppuluri et al. CuRLA: Curriculum Learning Based Deep Reinforcement Learning for Autonomous Driving
CN116048096B (en) Unmanned vehicle movement planning method based on hierarchical depth perception

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19711840

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19711840

Country of ref document: EP

Kind code of ref document: A1