DE112020006532T5

DE112020006532T5 - COMPUTER SYSTEM AND METHOD WITH END-TO-END MODELING FOR A SIMULATED TRAFFIC AGENT IN A SIMULATION ENVIRONMENT

Info

Publication number: DE112020006532T5
Application number: DE112020006532.4T
Authority: DE
Inventors: Muhammad Saad Zia; Faizan Mehmood
Original assignee: Automotive Artificial Intelligence Aai GmbH; Automotive Artificial Intelligence AAI GmbH
Current assignee: Automotive Artificial Intelligence Aai GmbH; Automotive Artificial Intelligence AAI GmbH
Priority date: 2020-02-13
Filing date: 2020-02-13
Publication date: 2022-11-17
Also published as: WO2021160273A1

Abstract

The present invention relates to a computer-implemented method for training a traffic agent that navigates a road vehicle in a simulation environment using end-to-end modeling, to a corresponding training computer system, and to a computer system for simulating a road driving environment for one or more vehicles, which comprises or consists of one or more processors and uses the traffic agent trained according to the invention.

Description

TECHNICAL FIELD:

The present invention relates to a computer-implemented method for training a traffic agent that navigates a road vehicle in a simulation environment using end-to-end modeling, to a corresponding training computer system, and to a computer system for simulating a road driving environment for one or more vehicles, which comprises or consists of one or more processors and uses the traffic agent trained according to the invention.

STATE OF THE ART:

Before the driving characteristics of road vehicles are tested in reality, computer simulations of certain driving situations, e.g. braking, are performed. Since the prediction horizon of these models is usually only up to 2 seconds, they cannot predict complex driving situations such as those that arise when overtaking.

The problem of developing a system that can safely control a car in a wide variety of traffic situations has been studied extensively and is of obvious interest for the development of autonomous vehicles. The main focus in this research area is on safe and efficient decisions under real-time conditions. However, the simulated safe and efficient decisions may not reflect human driving decisions in natural traffic.

Human driving decisions on the road can essentially consist of several abstract levels or phases that form a driving stack. Starting from a given road situation, a driver may decide to carry out a certain high-level maneuver, e.g. overtaking, formulate a corresponding motion plan (also called a "trajectory"), and apply control inputs to the actuators (throttle, brake, steering) to execute the decision.

It is therefore becoming increasingly important to simulate human driving decisions in natural traffic. These decisions are influenced by many factors and can be viewed at different levels. For example, depending on their mental state, human drivers may make different decisions in the same situation, e.g. overtaking, following a vehicle ahead, or changing lanes.

Many existing models use a hierarchical structure in the sense that more abstract decisions (e.g. which route to take) are computed first and then passed down to different layers that, based on this input, deal with the driving process at an increasing level of detail. The driving stack is divided into several phases intended to reflect the components that are actually relevant to the various approaches, e.g. in the context of simulation environments rather than the driving behavior of an autonomous vehicle.

Such phases can be considered as follows:

Perception/map generally refers to the input about the environment that is available to the other components.

Traffic rules generally refers to any component that imposes legal constraints on high-level decisions.

Mission planning generally refers to a long-term strategy of when to be where (e.g. lane-level route planning).

Traffic-free reference path generally refers to planning an "optimal" reference path that ignores other road users.

Behavior planning generally refers to producing a behavior plan, i.e. deciding exactly when actions such as lane changes are to be carried out, taking other road users into account.

Decision post-processing generally refers to correcting the decisions of the preceding components, where necessary, so that they comply with basic safety rules.

Motion/trajectory planning generally refers to planning the exact future trajectory over a short time horizon (up to 2 seconds).

Command conversion generally refers to computing the final commands to be sent to a (real or simulated) vehicle, such as steering instructions.

Vehicle dynamics/physics generally refers to simulating the vehicle behavior that results from the generated commands.

Position update generally refers to computing the resulting new position of the vehicle in the simulation.

The use of these terms varies drastically in the literature.
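The hierarchical hand-off between phases described above can be illustrated with a minimal Python sketch. It is not taken from the cited literature: the three stage functions, the `VehicleState` fields, and the 20 m gap threshold are hypothetical and reduce the stack to behavior planning, motion planning, and position update. The point it shows is that each stage only consumes what the stage above hands down.

```python
from dataclasses import dataclass


@dataclass
class VehicleState:
    s: float      # longitudinal position along the road [m]
    lane: int     # current lane index
    speed: float  # speed [m/s]


def behavior_planning(state: VehicleState, gap_ahead: float) -> str:
    """High-level decision (hypothetical rule): change lane if the gap is small."""
    return "change_lane" if gap_ahead < 20.0 else "keep_lane"


def motion_planning(state: VehicleState, maneuver: str, horizon: float = 2.0) -> dict:
    """Short-horizon target (up to 2 s) derived from the maneuver handed down."""
    target_lane = state.lane + 1 if maneuver == "change_lane" else state.lane
    return {"s": state.s + state.speed * horizon, "lane": target_lane}


def position_update(state: VehicleState, target: dict) -> VehicleState:
    """Move the simulated vehicle to the planned target."""
    return VehicleState(s=target["s"], lane=target["lane"], speed=state.speed)


def drive_one_step(state: VehicleState, gap_ahead: float):
    """Run the stages strictly top-down; lower stages cannot revise the maneuver."""
    maneuver = behavior_planning(state, gap_ahead)
    target = motion_planning(state, maneuver)
    return maneuver, position_update(state, target)
```

In this sketch the motion planner has no way to report back that a lane change is infeasible, which is the kind of one-way information flow a strictly hierarchical stack implies.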

It can be argued that these hierarchical models have certain limitations, such as the fact that they cannot make high-level decisions that must be modified or even rejected by "lower" components such as a motion planner (a component that decides, for example, on the timing of accelerations and lane changes) (see Junqing Wei, Jarrod M. Snider, Tianyu Gu, John Dolan and Bakhtiar Litkouhi. A behavioral planning framework for autonomous driving. Pages 458-464, 06 2014). Accordingly, the hierarchical models offer only limited realism in reproducing human driving behavior.

End-to-end learning (also "e2e" or "E2E") with neural networks has proven its capabilities many times over in various fields. In the autonomous driving industry, the e2e approach is popular for constructing robust models for various driving controls, e.g. steering and pedal control, in such a way that sensory inputs (e.g. image pixels) are mapped directly to control outputs. This direct mapping eliminates the need for extensive training data annotated with lane markings, road boundaries, etc., and allows salient features to be extracted through a goal-directed learning approach.

Bojarski has shown that the decision-making processes of a human driver during lane keeping can be modeled in a deep neural network (see M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang et al., "End to end learning for self-driving cars", arXiv preprint arXiv:1604.07316, 2016). The authors attempt to map raw images from driving footage to the car's steering commands, thereby implicitly embedding the levels of the driving stack in the layers of a neural network, similar to how a human mind does. The model is trained to learn the lane-keeping behavior of human drivers; lane changing was not modeled. The method uses only steering commands and no information about the longitudinal movement of the vehicle (i.e. acceleration/deceleration). The model is implemented in an autonomously driving car and not as a simulated traffic agent in a simulation environment. Muller presents the same approach, used to train on data from a remote-controlled car in order to automate its driving behavior (see U. Muller, J. Ben, E. Cosatto, B. Flepp, and Y. L. Cun, "Off-road obstacle avoidance through end-to-end learning," in Advances in neural information processing systems, 2006, pp. 739-746).

Xu and Gao use end-to-end deep learning to map raw images from a large body of footage of human drivers on the road both to high-level actions such as "stop" and "go" and to steering angle commands (see H. Xu, Y. Gao, F. Yu and T. Darrell, "End-to-end learning of driving models from large-scale video datasets", in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2174-2182). The intended behavior of the model can approximately cover the lane-following, obstacle-avoidance and lane-changing behavior of human drivers. The work provides only a distribution over vehicle controls, e.g. steering, and therefore does not claim to be precise enough to control a vehicle in a simulation or a real driving scenario. The method does not model acceleration/deceleration commands, only high-level decisions to stop and go. The model is not implemented in a simulated traffic agent in a simulation environment.

Codevilla uses the same approach to learn the mapping for executing control commands in the longitudinal and lateral directions (steering and acceleration) of a car from CARLA driving simulation data (see F. Codevilla, M. Müller, A. Lopez, V. Koltun and A. Dosovitskiy, "End-to-End driving via conditional imitation learning", in 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2018, pp. 1-9). The model is trained to learn nearly all aspects of driving behavior, i.e. lane following, adaptive cruise control, obstacle avoidance and lane changing. The model was evaluated as a traffic agent in a simulation environment. Chen presents a similar solution in the TORCS racing car simulation environment and additionally claims that the end-to-end model explicitly learns to focus on interpretable perception elements, such as the distance to the lane and the road boundary, the distance to other vehicles in the vicinity, and the angular deviation from the road, as part of a more interpretable solution for modeling driving behavior (see C. Chen, A. Seff, A. Kornhauser and J. Xiao, "Deepdriving: Learning affordance for direct perception in autonomous driving," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2722-2730). Both solutions, however, are trained on simulation data from a computer-controlled driver and are therefore unable to exhibit truly human-like behavior.

In particular, the prior art works mentioned above either restrict control of the vehicle to lateral steering commands only or target one specific function associated with human driving, e.g. lane following or lane changing. This restriction does not allow truly human-like driving behavior to be exhibited in a simulated traffic environment.

Furthermore, previous solutions rely on implicitly learning perception elements such as the positions of lane boundaries, the positions of traffic vehicles, etc. from the visual input (image), which leads to less accurate information about the vehicle's surroundings.

In view of the shortcomings of the prior art, the object of the present invention is to provide a computer system and a method for simulating a road driving environment in a driving situation for one or more vehicles such that the decision of a traffic agent reflects human-like (naturalistic) behavior, i.e. controls the longitudinal and lateral position of the vehicle, preferably steering and acceleration, in a manner that exhibits naturalistic driving behavior in general and in particular for high-level decisions such as lane-changing behavior, e.g. during an overtaking maneuver.

BRIEF DESCRIPTION OF THE INVENTION:

The aforementioned object is achieved at least in part by the claimed subject matter of the invention. Advantages (preferred embodiments) are set out in the detailed description below and/or the accompanying figures as well as in the dependent claims.

Accordingly, a first aspect of the invention relates to a computer-implemented method for training a traffic agent to navigate a road vehicle in a simulation environment. The method comprises or consists of the following steps:

a. Providing driving data for one or more time windows t_i = [t₁, t₂, ... t_n] for one or more road vehicles as ego vehicles, each driven by a human in a realistic situation on a road, and providing map data for the respective road at the given time windows t_i,
b. Processing at least part of the driving data and the map data from step a) into one or more corresponding perception frames P_i = [p₁, p₂, ... p_n] per given time window t_i, wherein each perception frame P_i contains corresponding perception information on (i) the traffic situation, (ii) the ego vehicle's own state and (iii) the road geometry,
c. Processing at least part of the driving data and the map data from step a) into one or more corresponding ground-truth vehicle control frames C_i = [c₁, c₂, ... c_n] per given time window t_i, wherein each vehicle control frame C_i contains longitudinal and lateral positions of the respective ego vehicles,
d. Training a decider computer model of the traffic agent with the one or more perception frames P_i per given time window t_i from step b) as input to the model and with the one or more ground-truth vehicle control frames C_i per given time window t_i from step c) as labels for training the model, wherein the decider uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉ_i = [c₁, c₂, ... c_n] comprising the longitudinal and lateral positions of the respective ego vehicle by matching the predicted vehicle control frames Ĉ_i against the respective ground-truth vehicle control frames C_i,

where i is any number such that i ∈ [1,2,... n] and where n represents the limit of driven frames.

Method steps b) and c) of the method according to the first aspect of the invention can be carried out simultaneously or one after the other in any order.
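Steps b) to d) can be sketched in Python as follows. This is a minimal illustration under loudly stated assumptions, not the claimed implementation: the decider here is a single linear layer trained with a squared-error loss and plain gradient steps, standing in for the one or more neural networks, and the class names, the one-value-per-field frame layout, and the hyperparameters are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PerceptionFrame:
    """One p_i from step b); each field holds normalised values (assumed layout)."""
    traffic: List[float]        # (i) traffic situation, e.g. gap to lead vehicle
    ego_state: List[float]      # (ii) ego vehicle state, e.g. speed
    road_geometry: List[float]  # (iii) road geometry, e.g. curvature

    def features(self) -> List[float]:
        return self.traffic + self.ego_state + self.road_geometry


@dataclass
class ControlFrame:
    """One c_i from step c): longitudinal and lateral position of the ego vehicle."""
    longitudinal: float
    lateral: float


class LinearDecider:
    """Stand-in for the neural network: one linear layer with two outputs."""

    def __init__(self, n_features: int) -> None:
        self.w = [[0.0] * n_features for _ in range(2)]
        self.b = [0.0, 0.0]

    def predict(self, p: PerceptionFrame) -> ControlFrame:
        x = p.features()
        out = [sum(wj * xj for wj, xj in zip(row, x)) + bk
               for row, bk in zip(self.w, self.b)]
        return ControlFrame(out[0], out[1])

    def train_step(self, p: PerceptionFrame, c: ControlFrame, lr: float) -> float:
        """One gradient step; returns the squared-error loss before the update."""
        x = p.features()
        pred = self.predict(p)
        errors = [pred.longitudinal - c.longitudinal, pred.lateral - c.lateral]
        for k, err in enumerate(errors):
            for j, xj in enumerate(x):
                self.w[k][j] -= lr * err * xj
            self.b[k] -= lr * err
        return 0.5 * sum(err * err for err in errors)


def train(decider: LinearDecider, frames: List[PerceptionFrame],
          labels: List[ControlFrame], epochs: int = 200, lr: float = 0.05) -> float:
    """Step d): perception frames as input, ground-truth control frames as labels.

    Returns the mean per-frame loss of the final epoch.
    """
    loss = 0.0
    for _ in range(epochs):
        total = sum(decider.train_step(p, c, lr) for p, c in zip(frames, labels))
        loss = total / len(frames)
    return loss
```

Calling `train(LinearDecider(3), frames, labels)` with equally long lists of frames and labels fits the stand-in model; in a fuller implementation the linear layer would be replaced by a neural network and the frames would be derived from recorded human driving data and map data as in steps a) to c).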

A second aspect of the invention relates to a computer system for training a traffic agent that navigates a road vehicle in a simulation environment, comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent as a decider in simulated driving situations using one or more neural networks with end-to-end modeling, which are stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first inventive aspect.

A third aspect of the invention relates to a computer system for simulating a road driving environment in driving situations for one or more vehicles, comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent that uses one or more neural networks as a decider in simulated driving situations, wherein one or more neural networks with end-to-end modeling are used, which are stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent has been trained, in accordance with the computer-implemented training method according to the first inventive aspect, to predict as an action one or more vehicle control frames Ĉ_i containing longitudinal and lateral positions to be applied to a simulated vehicle in the simulation environment, where i is any number such that i ∈ [1,2,... n] and where n represents the limit of driven frames.
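How a trained agent might be used in the simulation can be sketched as a simple rollout loop in Python. The sketch makes several assumptions that are not in the source: the perception tuple layout, the `rollout` function, and the rule-based `placeholder_decider`, which merely stands in for the trained neural network so that the example runs on its own.

```python
from typing import Callable, List, Tuple

# Assumed layout: (gap to lead vehicle, ego longitudinal pos., ego lateral pos.)
Perception = Tuple[float, float, float]
# Predicted control frame: (longitudinal position, lateral position)
Control = Tuple[float, float]


def placeholder_decider(perception: Perception) -> Control:
    """Hypothetical stand-in for the trained decider network."""
    gap, s, d = perception
    step = min(1.0, max(0.0, gap - 5.0))  # advance up to 1 m, keep a 5 m gap
    return (s + step, 0.5 * d)            # move halfway back to the lane centre


def rollout(decider: Callable[[Perception], Control],
            start: Control,
            lead_positions: List[float]) -> List[Control]:
    """Apply each predicted control frame to the simulated vehicle in turn."""
    s, d = start
    trajectory = [(s, d)]
    for lead_s in lead_positions:
        perception = (lead_s - s, s, d)  # build the perception frame
        s, d = decider(perception)       # predicted longitudinal/lateral position
        trajectory.append((s, d))        # position update in the simulation
    return trajectory
```

For example, `rollout(placeholder_decider, (0.0, 1.0), [10.0, 10.5, 11.0])` returns `[(0.0, 1.0), (1.0, 0.5), (2.0, 0.25), (3.0, 0.125)]`. Passing the decider in as a callable keeps the loop independent of how the model was trained.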

The inventive aspects of the present invention as disclosed herein may comprise any possible (sub)combination of the preferred inventive embodiments as set out in the dependent claims or as disclosed in the following detailed description and/or the accompanying figures, provided the resulting combination of features makes sense to a person skilled in the art.

List of figures

Further features and advantages of the present invention will become apparent from the accompanying drawings, wherein

Figures 1a) to 1c) show schematic representations of (parts of) the E2E vehicle control model of the computer systems according to the invention in training (Figures 1a) and 1b)) and in deployment (Figure 1c)).
Figure 2 shows a schematic representation of six-vehicle neighborhood information.
Figure 3 shows a schematic representation of a semicircular road geometry and the corresponding displacement vectors.
Figure 4 shows a distribution diagram of the error/frame in Δ bearing against the DFS ground truth validation data in a lane following module according to the invention.
Figure 5 shows a distribution diagram of the error/frame in Δ acceleration against the DFS ground truth validation data in a lane following module according to the invention.
Figure 6 shows a distribution diagram of the lane center deviation in the real DFS traffic data for the lane following module.
Figure 7 shows a distribution diagram of the lane center deviation of the lane following module when it is run in the simulation.
Figures 8a) and 8b) show distribution diagrams of the relative speed against the relative distance to the vehicle in front for the model run in the simulation (Figure 8a)) and for the DFS ground truth data (Figure 8b)).

DETAILED DESCRIPTION OF THE INVENTION:

As explained in more detail below, the inventors of the various aspects of the present invention have found that the computer-implemented systems and methods according to the present invention enable a traffic agent navigating a road vehicle in a simulation environment to make simulated driving decisions both at a high level (e.g. lane changes, overtaking maneuvers) and at a low / operational level (trajectory and motion planning) that reflect human-like (naturalistic) behavior, i.e. to control the longitudinal and lateral position of the vehicle, preferably bearing and acceleration, in a manner that exhibits naturalistic driving behavior in every driving situation.

Thus, the present invention successfully demonstrates, in the simulation environment, the naturalistic decision-making behavior from the source data with regard to planning, safety procedures and compliance with traffic rules.

According to the present invention, the respective naturalistic driving and map data are processed into one or more perception frames per time window, which contain corresponding perception information for (i) the traffic situation, (ii) the self-state of the ego vehicle and (iii) the road geometry. In addition, the respective naturalistic driving and map data are processed to form one or more respective vehicle control frames per time window, each vehicle control frame containing the longitudinal and lateral position of the respective ego vehicle. The use of these three categories in the perception frame is fundamental to enabling effective generalization of the computer model according to the invention.

According to the present invention, the decision maker computer model of the simulated traffic agent is trained with the respective one or more perception frames as input to the model and with the one or more ground truth vehicle control frames as the label for training the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames containing longitudinal and lateral positions of the respective ego vehicle by matching the predicted vehicle control frames to the respective ground truth vehicle control frames. In other words, in matching the predicted vehicle control frames to the respective ground truth vehicle control frames, the predicted vehicle control frames are brought closer to the respective ground truth vehicle control frames. The training method of the model according to the invention is based on a data-driven approach in which the model is configured to learn implicitly from the naturalistic ground truth data.

In the context of the present invention, the expression "an additional or alternative preferred embodiment" or "an additional or alternative further preferred embodiment" or "an additional or alternative way of configuring this embodiment" means that the feature or combination of features disclosed in this preferred embodiment can be combined in addition to, or as an alternative to, the features of the subject matter of the invention, including any preferred embodiment of any of the inventive aspects, provided that the resulting combination of features makes sense to a person skilled in the art.

In the context of the present invention, the terms "comprising" and "containing" are to be understood as having a broad meaning, similar to the term "including", and as implying the inclusion of a stated integer or step or group of integers or steps, but not the exclusion of any other integer or step or group of integers or steps. This definition also applies to variants of the term "comprising" such as "comprise" and "comprises" and to variants of the term "containing" such as "contain" and "contains".

In the context of the present invention, the term "configured" is also to be understood in connection with systems and computer program components. That a system of one or more computers is configured to perform particular operations or actions means that the system has software, firmware, hardware, or a combination thereof installed on it that, in operation, cause the system to perform the operations or actions. That one or more computer programs are configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by a data processing apparatus, cause the apparatus to perform the operations or actions.

To achieve the objects, advantages and aims of the invention, the present invention as disclosed herein is directed to systems and methods that use computer hardware and software to train a virtual traffic agent that navigates through a simulation environment using reinforcement learning algorithms and techniques. A virtual traffic agent (also called "traffic agent" in the context of the present invention) can be, for example, a car, a truck, a bus, a bicycle or a motorcycle. After a virtual traffic agent that reproduces human driving behavior, in particular in complex lane-change driving situations, has been trained according to the present invention, one or more trained virtual traffic agents can be injected into a simulation environment with complex driving situations. Such an embodiment is preferred because the trained traffic agents can interact with, cooperate with and challenge an autonomous vehicle system that controls an autonomous vehicle under test. A further advantage is that such an embodiment is suitable for testing the limits and weaknesses of the autonomous vehicle system, in particular in complex driving situations that can be attributed to assertive or aggressive driving behavior.

Thus, the systems and methods according to the invention additionally have the technical effect and advantage of improving computer technology for autonomous vehicles, since the autonomous vehicle is trained in the simulation environment according to the invention, which reflects human-like/naturalistic driving scenarios.

According to the first aspect of the present invention, a computer-implemented training method of a traffic agent for navigating a road vehicle in a simulation environment is provided, characterized in that the method comprises or consists of the following steps:

According to step a), the training method according to the invention provides driving data at one or more time windows t_i = [t₁, t₂, ... t_n] for one or more road vehicles as ego vehicles, each driven by a human in a realistic situation on a road, and provides map data of the respective road at the given time windows t_i, where i is any number such that i ∈ [1,2,...n] and where n represents the limit of the frames driven. The driving data generally represent trajectory data of the respective ego vehicles.

In an additional or alternative preferred embodiment, the driving data in step a) for each of the given road vehicles comprise or consist of one or more state features of the respective ego vehicles per given time window t_i, preferably comprising or consisting of the longitudinal speed, the longitudinal acceleration and the position of the respective road vehicle in X and Y coordinates per given time window t_i.

In an additional or alternative preferred embodiment, the map data of step a) contain corresponding road information that comprises or consists of i) the number of lanes of the respective road and ii) the position of the lanes in X, Y coordinates, optionally in X, Y and Z coordinates, in each case for given time windows t_i.

According to step b), the training method according to the invention processes at least part of the driving data and map data from step a) into one or more respective perception frames P_i = [p₁, p₂, ... p_n] per given time window t_i, wherein each perception frame P_i contains corresponding perception information for (i) the traffic situation, (ii) the self-state information of the ego vehicle and (iii) the road geometry, where i is any number such that i ∈ [1,2,...n] and where n represents the limit of the frames driven.

In an additional or alternative preferred embodiment, the traffic situation in step b) comprises or consists of six-vehicle neighborhood information, wherein each represented vehicle of the six positions comprises or consists of i) the relative distance of the respective vehicle to the ego vehicle and ii) the relative speed of the respective vehicle with respect to the speed of the ego vehicle. With respect to the six-vehicle neighborhood, the vehicle roles according to the present invention are defined as follows:

- The car in front of the ego vehicle (in the same lane).
- The car following the ego vehicle (in the same lane).
- The two cars ahead of the center point of the ego vehicle as projected onto the two adjacent lanes.
- The two cars behind the center point of the ego vehicle as projected onto the two adjacent lanes.

Each of these items may or may not be present for a particular time/ego-vehicle combination, and this is taken into account in the model.
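As an illustration of how six optional neighbor slots could be turned into a fixed-length model input, the following Python sketch flattens them into one vector. The slot names, the `Neighbor` type, the presence flag and the zero-filling of empty slots are all assumptions of this sketch; the source only states that missing vehicles are taken into account, not how.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical names for the six neighborhood roles (same lane front/rear,
# plus front and rear on each of the two adjacent lanes).
SLOTS = ("front", "rear", "left_front", "right_front", "left_rear", "right_rear")


@dataclass
class Neighbor:
    rel_distance: float  # relative distance to the ego vehicle
    rel_speed: float     # neighbor speed relative to the ego speed


def encode_neighborhood(neighbors: Dict[str, Optional[Neighbor]]) -> List[float]:
    """Flatten the six slots into a vector of length 18.

    Each slot contributes (present, rel_distance, rel_speed). Empty slots are
    encoded as (0, 0, 0); this encoding is an assumption, not the patent's.
    """
    features: List[float] = []
    for slot in SLOTS:
        n = neighbors.get(slot)
        if n is None:
            features.extend([0.0, 0.0, 0.0])
        else:
            features.extend([1.0, n.rel_distance, n.rel_speed])
    return features


# Only two of the six slots are occupied in this example.
example = encode_neighborhood(
    {"front": Neighbor(25.0, -1.5), "left_rear": Neighbor(-12.0, 3.0)}
)
```

The explicit presence flag lets a model distinguish an empty slot from a vehicle that happens to be at zero distance and zero relative speed.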

In an additional or alternative preferred embodiment, the self-state information of the respective ego vehicles in step b) comprises or consists of the longitudinal speed, the longitudinal acceleration and the bearing with respect to the road direction (angular deviation Ad). In the context of the present invention, the term "bearing" of an ego vehicle stands for the orientation of the ego vehicle with respect to the global x / y axes. As an example, the angular deviation can be defined as

$Ad^{i} = \theta_{road}^{i} - \theta_{ego}^{i}$

where $\theta_{road}^{i}$ and $\theta_{ego}^{i}$ represent the bearing of the road and of the ego vehicle at any given time window t_i, where i is any number such that i ∈ [1,2,...n] and where n represents the limit of the frames driven. To increase accuracy, the bearing of the road can be replaced by the bearing of the lane, so that the angular deviation is defined as

$Ad^{i} = \theta_{lane}^{i} - \theta_{ego}^{i}$

where $\theta_{lane}^{i}$ and $\theta_{ego}^{i}$ represent the bearing of the lane and of the ego vehicle at any given time window t_i.
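The angular deviation above can be sketched in a few lines of Python. The function name is hypothetical, angles are assumed to be in radians, and wrapping the result to [-π, π) is an assumption of this sketch: the source gives only the plain difference, but without wrapping a road bearing of 10° and an ego bearing of 350° would yield -340° instead of 20°.

```python
import math


def angular_deviation(theta_ref: float, theta_ego: float) -> float:
    """Return Ad = theta_ref - theta_ego in radians, wrapped to [-pi, pi).

    theta_ref is the bearing of the road (or of the lane), theta_ego the
    bearing of the ego vehicle, both measured against the global axes.
    The wrapping step is an assumption added for numerical robustness.
    """
    diff = theta_ref - theta_ego
    return (diff + math.pi) % (2.0 * math.pi) - math.pi


# Road bearing 10 degrees, ego bearing 350 degrees.
ad = angular_deviation(math.radians(10.0), math.radians(350.0))
```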

In an additional or alternative preferred embodiment, the road geometry in step b) comprises or consists of a numerical representation of a respective lane geometry with respect to the ego vehicle, wherein the numerical representation is preferably selected from a circular or a semicircular geometry.

The circular or semicircular numerical representation of the respective lane geometry with two lane boundaries is given, for example, in the form of a vector of displacements D_j for each of the two lane boundaries at any time window t_i, with

$D_{j} = [d_{1}, d_{2}, \dots, d_{n}]$

where each entry of D_j is part of a sequence of displacement points relative to the position of the ego vehicle, which are divided, on the basis of their relative bearing values with respect to the ego position, into intervals of 1° or more around the circular or semicircular area in front of and/or behind the ego vehicle, and where the length n of the displacement vector D_j is from 1 to 360 for the circular geometry and from 1 to 180 for the semicircular geometry.

With the semicircular geometry, the area in front of the ego vehicle is represented over 180°, whereas with the circular geometry both the front and the rear areas are represented over 360°.
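One possible way to build such a displacement vector for a single lane boundary is sketched below in Python. The function name and parameters are hypothetical. The sketch assumes that the boundary is given as a list of (x, y) points, that each entry holds the Euclidean distance to the nearest boundary point falling into that bearing bin, and that empty bins receive a fill value; none of these details is specified in the source.

```python
import math
from typing import List, Sequence, Tuple


def displacement_vector(
    ego_xy: Tuple[float, float],
    ego_heading: float,                      # radians, global frame
    boundary: Sequence[Tuple[float, float]],
    step_deg: float = 1.0,
    semicircle: bool = True,
    empty: float = float("inf"),
) -> List[float]:
    """Bin boundary points by relative bearing and keep the nearest per bin."""
    span = 180.0 if semicircle else 360.0
    n_bins = math.ceil(span / step_deg)
    d = [empty] * n_bins
    for bx, by in boundary:
        dx, dy = bx - ego_xy[0], by - ego_xy[1]
        # Bearing of the point relative to the ego heading, in [-180, 180).
        rel = math.degrees(math.atan2(dy, dx) - ego_heading)
        rel = (rel + 180.0) % 360.0 - 180.0
        # Shift so that bin 0 starts at the edge of the covered area.
        shifted = rel + span / 2.0
        if not 0.0 <= shifted < span:
            continue  # point lies behind the ego vehicle (semicircle only)
        k = min(int(shifted // step_deg), n_bins - 1)
        d[k] = min(d[k], math.hypot(dx, dy))
    return d


points = [(10.0, 0.0), (20.0, 0.0), (1.0, -5.0), (-3.0, -0.5)]
d_front = displacement_vector((0.0, 0.0), 0.0, points)                  # 180 bins
d_full = displacement_vector((0.0, 0.0), 0.0, points, semicircle=False)  # 360 bins
```

In use, the function would be called once per lane boundary, giving two vectors per time window for a lane with two boundaries.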

According to step c), the training method according to the invention processes at least part of the driving data and map data from step a) into one or more corresponding ground truth vehicle control frames C_i = [c₁, c₂, ... c_n] per given time window t_i, wherein each vehicle control frame C_i contains longitudinal and lateral positions of the respective ego vehicles, where i is any number such that i ∈ [1,2,...n] and where n represents the limit of the frames driven.

In an additional or alternative preferred embodiment, the longitudinal and lateral positions of the respective ego vehicles in step c) and step d) comprise or consist of acceleration and bearing values, and preferably comprise or consist of changes in acceleration (Δacceleration) and bearing (Δbearing) that are applied to the respective ego vehicles in time window t_i. The use of changes in the acceleration and bearing values is preferred because the changes in steering, accelerator pedal depth and brake pedal depth can be determined directly from these values.
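As a minimal sketch of how such control frames might be derived from a recorded trajectory, the Python function below takes per-time-window acceleration and bearing samples and returns their first differences. The function name is hypothetical, and defining Δ as the difference between consecutive time windows is an assumption; the source does not spell out the exact definition.

```python
from typing import List, Tuple


def control_frames(
    accel: List[float], bearing: List[float]
) -> List[Tuple[float, float]]:
    """Return (delta_acceleration, delta_bearing) between consecutive samples.

    For n input samples this yields n - 1 control frames. Bearings are
    differenced directly, without angle wrapping, to keep the sketch short.
    """
    if len(accel) != len(bearing):
        raise ValueError("accel and bearing must have the same length")
    return [
        (accel[i + 1] - accel[i], bearing[i + 1] - bearing[i])
        for i in range(len(accel) - 1)
    ]


frames = control_frames([0.0, 0.5, 0.25], [0.0, 0.1, 0.1])
```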

Processing steps b) and c) can be carried out simultaneously or one after the other in any order.

According to step d), the method according to the invention trains a decision maker computer model of the traffic agent with the one or more perception frames P_i per given time window t_i from step b) as input to the model and with the one or more ground truth vehicle control frames C_i for the given time windows t_i from step c) as the label for training the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict the corresponding vehicle control frames Ĉ_i = [c₁, c₂, ... c_n], which contain longitudinal and lateral positions of the respective ego vehicle, by comparing the predicted vehicle control frames Ĉ_i with the respective ground truth vehicle control frames C_i, where i is any number such that i ∈ [1,2,...n] and where n represents the limit of the frames driven.
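One common way to make the comparison between predicted and ground truth control frames concrete is a regression objective. The formulation below is only an illustrative assumption: the source does not name a loss function, and the symbols f_θ (the decision maker model with parameters θ) and the choice of mean squared error are introduced here for illustration.

```latex
\hat{C}_i = f_{\theta}(P_i), \qquad
\mathcal{L}(\theta) = \frac{1}{n} \sum_{i=1}^{n} \bigl\lVert \hat{C}_i - C_i \bigr\rVert_2^2
```

Minimizing such an objective over θ brings the predicted frames Ĉ_i closer to the labels C_i, which is the sense in which the perception frames act as input and the ground truth control frames act as labels.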

In other words, the perception frames P_i of the naturalistic data from step b) are used as input to the computer model, and the naturalistic ground-truth vehicle control frames C_i from step c) are used as labels for training purposes. In contrast, when deployed in a computer system that simulates a driving environment according to the third aspect of the invention, the traffic agent trained according to the invention does not use the naturalistic vehicle control frames of step c) and replaces the naturalistic perception frames of step b) with simulated perception frames. During deployment, the decision maker of the simulation computer system according to the third aspect of the invention predicts, as an action, one or more vehicle control frames Ĉ_i containing longitudinal and lateral positions of a simulated vehicle in the simulation environment, where i is any number such that i ∈ [1,2,...n] and n is the limit of controlled frames.
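The supervised setup described above (perception frames P_i as input, ground-truth control frames C_i as labels, predictions Ĉ_i compared against C_i) can be illustrated with a minimal sketch. It deliberately swaps the neural networks of the description for a single linear model trained by stochastic gradient descent on a squared error; the function names, the toy data and the choice of loss are assumptions made for illustration and are not taken from the patent.

```python
import random


def predict(weights, bias, p_frame):
    """Return a predicted control frame (one value per output) for one perception frame."""
    return [sum(w * x for w, x in zip(row, p_frame)) + b
            for row, b in zip(weights, bias)]


def train_decision_maker(p_frames, c_frames, epochs=2000, lr=0.1, seed=0):
    """Fit a linear stand-in for the decision maker (assumption: squared-error loss)."""
    rng = random.Random(seed)
    n_in, n_out = len(p_frames[0]), len(c_frames[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    for _ in range(epochs):
        for p, c in zip(p_frames, c_frames):
            c_hat = predict(weights, bias, p)
            for k in range(n_out):
                err = c_hat[k] - c[k]  # predicted value minus ground-truth label
                for j in range(n_in):
                    weights[k][j] -= lr * err * p[j]
                bias[k] -= lr * err
    return weights, bias


def mean_squared_error(weights, bias, p_frames, c_frames):
    """Average squared difference between predicted and ground-truth control values."""
    total, count = 0.0, 0
    for p, c in zip(p_frames, c_frames):
        for value_hat, value in zip(predict(weights, bias, p), c):
            total += (value_hat - value) ** 2
            count += 1
    return total / count


# Toy data (hypothetical): two perception features, two control values per frame.
P = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
C = [[0.2 * p[0] - 0.1 * p[1], 0.05 * p[1]] for p in P]

W, B = train_decision_maker(P, C)
print(mean_squared_error(W, B, P, C))
```

At deployment only `predict` would be called, with simulated perception frames as input; the labels C are needed during training only.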

As already discussed above in relation to the preferred embodiment of step c), the ground-truth vehicle control frames C_i and the predicted vehicle control frames Ĉ_i comprise or consist of acceleration and bearing values, preferably changes in acceleration (Δacceleration) and bearing (Δbearing), that are applied to the respective ego vehicles at time window t_i.

In an additional or preferred embodiment, the decision-maker computer model of the traffic agent in step d) uses three or more neural networks, wherein preferably at least some or all of the neural networks are combined in a branched architecture.

In an additional or alternative preferred embodiment, at least some or all of the neural networks are deep neural networks, wherein preferably at least some or all of the deep neural networks independently of one another comprise or consist of one, two or more layers, each layer independently having a number of neurons in the range from 1 to 512, and wherein the number of neurons per layer in the deep neural network preferably differs.
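As a hedged illustration of the layer constraint stated above (each layer having between 1 and 512 neurons, with layer widths that may differ), the following sketch builds a small fully connected network and rejects out-of-range widths. The initialization scheme, the tanh activation and the example widths 64, 32 and 2 are assumptions chosen for illustration; the description does not specify them.

```python
import math
import random


def build_mlp(n_inputs, layer_sizes, seed=0):
    """Create weights for a fully connected network with the given layer widths."""
    for size in layer_sizes:
        if not 1 <= size <= 512:
            raise ValueError(f"layer width {size} is outside the range 1 to 512")
    rng = random.Random(seed)
    layers = []
    fan_in = n_inputs
    for size in layer_sizes:
        scale = 1.0 / math.sqrt(fan_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(size)]
        layers.append((weights, [0.0] * size))
        fan_in = size
    return layers


def forward(layers, x):
    """Run one input vector through the network (tanh on hidden layers only)."""
    for index, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if index < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


# Hypothetical configuration: six perception features, two control outputs.
network = build_mlp(n_inputs=6, layer_sizes=[64, 32, 2])
output = forward(network, [0.1] * 6)
```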

Accordingly, in one embodiment, the training method according to the invention further comprises processing the driving data from step a) of the respective ego vehicles per given time window t_i into corresponding binary ground-truth situation categories of "follow lane"

${SC}_{LF}^{i} = [{SC}_{LF}^{1}, {SC}_{LF}^{2}, \dots, {SC}_{LF}^{n}]$

or "change lane"

${SC}_{LC}^{i} = [{SC}_{LC}^{1}, {SC}_{LC}^{2}, \dots, {SC}_{LC}^{n}]$

and wherein the decision-maker computer model of the traffic agent in step d) comprises i) a lane-following neural network, ii) a lane-changing neural network and iii) a function-classifier neural network, wherein

- the one or more perception frames P_i are each used as input for the "follow lane", "change lane" and "function classifier" neural networks,
- the one or more ground-truth vehicle control frames C_i are each used as labels for independently training the lane-following and lane-changing neural networks by matching the predicted vehicle control frames Ĉ_i against the respective ground-truth vehicle control frames C_i, and
- the respectively applicable ground-truth situation category ${SC}_{LF}^{i}$ or ${SC}_{LC}^{i}$ per given time window t_i is used as a label to independently train the function-classifier neural network to predict a corresponding situation category "follow lane" ${\hat{SC}}_{LF}^{i} = [{\hat{SC}}_{LF}^{1}, {\hat{SC}}_{LF}^{2}, \dots, {\hat{SC}}_{LF}^{n}]$ or "change lane" ${\hat{SC}}_{LC}^{i} = [{\hat{SC}}_{LC}^{1}, {\hat{SC}}_{LC}^{2}, \dots, {\hat{SC}}_{LC}^{n}]$ by matching the predicted situation category ${\hat{SC}}_{LF}^{i}$ or ${\hat{SC}}_{LC}^{i}$ against the respective actual situation category ${SC}_{LF}^{i}$ and ${SC}_{LC}^{i}$, where i is any number such that i ∈ [1,2,...n] and n is the limit of driven frames.

In other words, when the predicted situation categories ${\hat{SC}}_{LF}^{i}$ or ${\hat{SC}}_{LC}^{i}$ are matched against the respective ground-truth situation categories ${SC}_{LF}^{i}$ and ${SC}_{LC}^{i}$, the predicted situation categories are approximated to the respective ground-truth situation categories. Put differently, for each time window t_i and ego vehicle, the training computer system according to the invention is configured to determine whether the respective ego vehicle is following the lane or changing lanes, where i is any number such that i ∈ [1,2,...n] and n is the limit of driven frames.
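The description does not state how the binary situation categories are derived from the driving data. One possible labeling rule, shown here purely as an assumption, marks a time window as "change lane" whenever the lane identifier of the ego vehicle differs from that of the previous window, and as "follow lane" otherwise. The function name and the lane-identifier input are hypothetical.

```python
def situation_categories(lane_ids):
    """Return (sc_lf, sc_lc): binary labels per time window.

    Assumption: a window counts as a lane change when its lane identifier
    differs from the identifier of the preceding window.
    """
    sc_lf, sc_lc = [], []
    previous = lane_ids[0]
    for lane in lane_ids:
        changing = lane != previous
        sc_lc.append(1 if changing else 0)
        sc_lf.append(0 if changing else 1)
        previous = lane
    return sc_lf, sc_lc


# Hypothetical lane identifiers of one ego vehicle over six time windows.
sc_lf, sc_lc = situation_categories([2, 2, 2, 3, 3, 2])
print(sc_lf)  # [1, 1, 1, 0, 1, 0]
print(sc_lc)  # [0, 0, 0, 1, 0, 1]
```

The two lists are complementary by construction, which matches the description of the categories as binary and mutually exclusive per time window.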

All features and embodiments disclosed in relation to the first aspect of the present invention can be combined, alone or in (sub)combination, with the second aspect or the third aspect of the present invention, including each of the preferred embodiments thereof, provided that the resulting combination of features is reasonable to a person skilled in the art.

According to the second aspect of the invention, a computer system for training a traffic agent navigating a road vehicle in a simulation environment is provided, which comprises or consists of one or more processors, a memory device coupled to the one or more processors, and a traffic agent for decision-making in simulated driving situations using one or more neural networks with end-to-end modeling, stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is configured to execute the computer-implemented training method according to the first aspect of the invention.

According to a further or alternative preferred embodiment, the training computer system of the second aspect can be configured such that the traffic agent comprises separate modules, so that the respective naturalistic driving data and map data can be processed appropriately. In particular, the traffic agent can comprise a module A for processing at least part of the naturalistic driving data and map data in order to generate the respective perception frames P_i = [p₁, p₂, ... p_n] per given time window t_i, wherein each perception frame P_i contains corresponding perception information on (i) the traffic situation, (ii) the ego vehicle's own state and (iii) the road geometry. Specific embodiments thereof have already been discussed in relation to the first aspect of the invention and also apply to the training computer system of the second aspect. In addition, the traffic agent can comprise a module B for processing at least part of the naturalistic driving data and the map data in order to generate one or more corresponding ground-truth vehicle control frames C_i = [c₁, c₂, ... c_n] per given time window t_i, wherein each vehicle control frame C_i contains longitudinal and lateral positions of the respective ego vehicles. In an additional or alternative preferred embodiment, the longitudinal and lateral positions of the respective ego vehicles comprise or consist of acceleration and bearing values, preferably of changes in acceleration (Δacceleration) and bearing (Δbearing), that are applied to the respective ego vehicles at time window t_i. Using changes in acceleration and bearing values is preferred because the changes in steering, accelerator pedal depth and brake pedal depth can be determined directly from these values.
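A minimal sketch of how module B might turn recorded acceleration and bearing values into Δacceleration and Δbearing entries is given below. It assumes that a "change" means the difference between consecutive time windows and that bearings are given in degrees, so that differences are wrapped into the interval [-180, 180); neither detail is specified in the description, and the function names are hypothetical.

```python
def wrap_degrees(angle):
    """Map an angle difference in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def control_frames(accelerations, bearings):
    """Return one (delta_acceleration, delta_bearing) pair per consecutive window pair."""
    frames = []
    for i in range(1, len(accelerations)):
        d_acc = accelerations[i] - accelerations[i - 1]
        d_bearing = wrap_degrees(bearings[i] - bearings[i - 1])
        frames.append((d_acc, d_bearing))
    return frames


# Hypothetical recording: acceleration in m/s^2, bearing in degrees.
frames = control_frames([0.0, 0.5, 0.25], [359.0, 1.0, 0.0])
print(frames)  # [(0.5, 2.0), (-0.25, -1.0)]
```

Wrapping the bearing difference keeps a small turn across north (359° to 1°) from being recorded as a change of -358°.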

Furthermore, according to the second aspect of the invention, the traffic agent can comprise module C, also called the E2E decision-maker (E2EDM) computer model, which comprises one or more neural networks with end-to-end modeling. The outputs of modules A and B are used as input information to train the one or more E2E neural networks of module C.

In an additional or preferred embodiment, module C of the traffic agent uses three or more neural networks, wherein preferably at least some or all of the neural networks are combined in a branched architecture. The independent neural networks are preferably trained independently of one another.

In an additional or alternative preferred embodiment, at least some or all of the neural networks are deep neural networks, wherein preferably at least some or all of the deep neural networks independently of one another comprise or consist of one, two or more layers, each layer independently having a number of neurons in the range from 1 to 512, and wherein more preferably the number of neurons per layer in the deep neural network differs.

Accordingly, in one embodiment, module C comprises i) a lane-following neural network (module C2), ii) a lane-changing neural network (module C3) and iii) a function-classification neural network (module C1). In this case, the training computer system can also comprise a module D, which is configured to process the naturalistic driving data and the map data for the respective ego vehicles in given time windows t_i into binary ground-truth situation categories "follow lane"

${SC}_{LF}^{i} = [{SC}_{LF}^{1}, {SC}_{LF}^{2}, \dots, {SC}_{LF}^{n}]$

or "change lane"

${SC}_{LC}^{i} = [{SC}_{LC}^{1}, {SC}_{LC}^{2}, \dots, {SC}_{LC}^{n}]$.

In other words, for each time window t_i and ego vehicle, the training computer system according to the invention is configured to determine whether the respective ego vehicle is following the lane or changing lanes, where i is any number such that i ∈ [1,2,...n] and n is the limit of driven frames.

With regard to the training method of the first aspect of the invention, the training computer system of the second aspect of the invention is configured such that

- the one or more perception frames P_i are each used as input for the "follow lane" (module C2), "change lane" (module C3) and "function classifier" (module C1) neural networks,
- the one or more ground-truth vehicle control frames C_i are each used as labels for independently training the lane-following (module C2) and lane-changing (module C3) neural networks by matching the predicted vehicle control frames Ĉ_i against the respective ground-truth vehicle control frames C_i, and
- the respective ground-truth situation categories ${SC}_{LF}^{i}$ and ${SC}_{LC}^{i}$ per given time window t_i are used as labels to independently train the function-classifier neural network (module C1) to predict the corresponding situation categories "follow lane" ${\hat{SC}}_{LF}^{i} = [{\hat{SC}}_{LF}^{1}, {\hat{SC}}_{LF}^{2}, \dots, {\hat{SC}}_{LF}^{n}]$ or "change lane" ${\hat{SC}}_{LC}^{i} = [{\hat{SC}}_{LC}^{1}, {\hat{SC}}_{LC}^{2}, \dots, {\hat{SC}}_{LC}^{n}]$ by matching the predicted situation category ${\hat{SC}}_{LF}^{i}$ or ${\hat{SC}}_{LC}^{i}$ against the respective ground-truth situation category ${SC}_{LF}^{i}$ or ${SC}_{LC}^{i}$.

The training computer system according to the invention is further configured such that the output of the function classifier (module C1), i.e. the respective situation category of the ego vehicle at time window t_i, initiates either the "follow lane" neural network (module C2) or the "change lane" neural network (module C3).
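The branching described above, in which the function classifier decides which of the two control networks is run for a given perception frame, can be sketched as follows. The three networks are replaced by simple stand-in callables, and the classifier is assumed to return a lane-change probability that is compared against a threshold of 0.5; these names, return values and the threshold are illustrative assumptions, not details from the description.

```python
def select_branch(p_lane_change, threshold=0.5):
    """Turn the classifier output into one of the two situation categories."""
    return "lane_change" if p_lane_change >= threshold else "lane_follow"


def decide(perception_frame, classifier, lane_follow_net, lane_change_net):
    """Run the classifier (C1), then only the branch it selects (C2 or C3)."""
    category = select_branch(classifier(perception_frame))
    net = lane_change_net if category == "lane_change" else lane_follow_net
    return category, net(perception_frame)


# Stand-in networks (hypothetical): each maps a perception frame to a result.
def toy_classifier(frame):
    return 0.9 if abs(frame["lateral_offset"]) > 0.5 else 0.1


def toy_lane_follow(frame):
    return (0.1, 0.0)  # (delta acceleration, delta bearing)


def toy_lane_change(frame):
    return (0.0, 2.0)


follow_result = decide({"lateral_offset": 0.0}, toy_classifier, toy_lane_follow, toy_lane_change)
change_result = decide({"lateral_offset": 0.8}, toy_classifier, toy_lane_follow, toy_lane_change)
```

Only one of the two control networks is evaluated per time window, which is the property the branched arrangement is meant to provide.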

An advantage of the computer system according to the invention for training a traffic agent is that the traffic agent is trained to predict both longitudinal and lateral positions of a vehicle in a simulated environment, with the prediction reflecting naturalistic driving behavior.

All features and embodiments disclosed in relation to the second aspect of the present invention can be combined, alone or in (sub)combination, with the first aspect or the second aspect of the present invention, including each of the preferred embodiments thereof, provided that the resulting combination of features is reasonable to a person skilled in the art.

According to the third aspect of the invention, a computer system for simulating a road driving environment in driving situations for one or more vehicles is provided, which comprises or consists of one or more processors, a memory device coupled to the one or more processors, and a traffic agent that uses one or more neural networks for decision-making in simulated driving situations, wherein one or more neural networks with end-to-end modeling are used, which are stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is trained according to the computer-implemented training method of the first aspect of the invention to predict, as an action, one or more vehicle control frames Ĉ_i containing longitudinal and lateral positions of a simulated vehicle in the simulation environment per given time window t_i, where i is any number such that i ∈ [1,2,...n] and n is the limit of controlled frames. In other words, the traffic agent used in the simulation computer system of the third aspect was trained according to the training method of the invention before being deployed in a simulation environment, wherein the driving environment (simulation) is to provide environment data for an ego vehicle comprising (i) map information, (ii) traffic information and (iii) traffic rules. These data are then processed, and control commands are generated by the E2E decision maker trained according to the invention and returned to the environment for the position update. Such a computer system is also referred to as an integrated system.

As already mentioned above, the simulation computer system according to the invention does not use the naturalistic driving and map data that serve as input information for the training method. The simulation computer system therefore does not need to comprise a module B' corresponding to module B of the training computer system. Instead, the simulated driving and map data of the simulated road user, which can be made available to the perception-forming module A' by module S1' and/or module S2', are used as input information in the simulation computer system to generate the respective perception frames P_i for the respective time windows t_i in module A'. In other words, module A' is configured to generate the respective perception frames P_i for the respective time windows t_i based on the simulation data supplied by module S1' and/or S2'. The perception frames P_i per given time window t_i are used as input information for the E2E decision-maker computer model according to the invention (module C'). Module C' is configured to predict one or more vehicle control frames Ĉ_i containing longitudinal and lateral positions, preferably the changes in longitudinal and lateral position, more preferably the changes in acceleration and bearing per given time window t_i, that are to be applied to a simulated vehicle in the simulation environment, where i is any number such that i ∈ [1,2,...n] and n is the limit of driven frames.
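The deployment loop described above (simulation data, perception frame from module A', predicted control frame from module C', position update in the environment) can be sketched as below. The vehicle state fields, the simple kinematic update, the time step of 0.1 s and the bearing convention (0° pointing along the positive y axis, increasing clockwise) are all assumptions made for illustration; the description does not prescribe a particular vehicle model.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float
    y: float
    speed: float
    acceleration: float
    bearing_deg: float


def perceive(state):
    """Hypothetical stand-in for module A': build a perception frame from simulated data."""
    return [state.speed, state.acceleration, state.bearing_deg]


def step(state, control, dt):
    """Apply one control frame (delta acceleration, delta bearing) and update the position."""
    d_acc, d_bearing = control
    acceleration = state.acceleration + d_acc
    bearing = (state.bearing_deg + d_bearing) % 360.0
    speed = max(0.0, state.speed + acceleration * dt)
    heading = math.radians(bearing)
    return VehicleState(
        x=state.x + speed * math.sin(heading) * dt,
        y=state.y + speed * math.cos(heading) * dt,
        speed=speed,
        acceleration=acceleration,
        bearing_deg=bearing,
    )


def simulate(state, decision_maker, n_frames, dt=0.1):
    """Run the perceive, decide, update cycle for n_frames time windows."""
    trajectory = [state]
    for _ in range(n_frames):
        control = decision_maker(perceive(state))
        state = step(state, control, dt)
        trajectory.append(state)
    return trajectory


# Stand-in decision maker (hypothetical): keep acceleration and bearing unchanged.
trajectory = simulate(VehicleState(0.0, 0.0, 10.0, 0.0, 0.0), lambda frame: (0.0, 0.0), n_frames=10)
```

In this sketch no ground-truth control frames appear anywhere in the loop, mirroring the statement that a counterpart to module B is not needed at deployment.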

In an additional or preferred embodiment, the decision-maker computer model (module C') of the traffic agent uses three or more neural networks, wherein preferably at least some or all of the neural networks are combined in a branched architecture.

Accordingly, in one embodiment, the decision-maker computer model (module C') of the traffic agent comprises i) a lane-following neural network (module C2'), ii) a lane-changing neural network (module C3'), and iii) a neural network Network as a function classifier (module C1'), which are configured in such a way that

- one or more perception frames P _i of the simulated vehicles for each given time window t _i as input for the neural networks "follow lane" (module C2'), "change lane" (module C3') and "function classifier" (module C3') be used,
- the function classifier (module C1') is configured in such a way that it assigns the one or more perception frames P _i of the simulated vehicles to the situation category "follow lane" for each given time window t _i ${\hat{S C}}_{L f}^{i} = [{\hat{S C}}_{LF}^{1}, {\hat{S C}}_{LF}^{2}, ... {\hat{S C}}_{L f}^{n}]$
or "lane change" ${\hat{S C}}_{L C}^{i} = [{\hat{S C}}_{LC}^{1}, {\hat{S C}}_{LC}^{2}, ... {\hat{S C}}_{L C}^{n}]$
classified. Depending on the respective classification for each given time window t _i , ie either the class "follow lane" or the class "change lane", the function classifier (module C1') initiates the neural network of either "follow lane" (module C2') or "Change lane" (module C3'). An example: If the function classifier (module C1') has a perception frame P ₁ for the time window t ₁ with the situation category "follow lane" ${\hat{S C}}_{LF}^{1}$
classified, then the functional classifier (module C1') is configured to initiate the "follow lane" neural network to predict the vehicle control frame Ĉ ₁ containing longitudinal and lateral positions, preferably the changes in longitudinal and lateral position, more preferably the changes in acceleration and bearing to be applied to each simulated vehicle at time window t ₁ . Alternatively, if the function classifier (module C1') has a perception frame P ₂ for the time window t ₂ with the situation category "change lane" ${\hat{S C}}_{LC}^{2}$
classified, then the functional classifier (module C1') is configured to initiate the "change lane" neural network to predict the vehicle control frame Ĉ ₂ containing longitudinal and lateral positions, preferably the longitudinal and lateral position changes, more preferably the Acceleration and bearing changes to be applied to each simulated vehicle at time window t ₂ .
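
The branched dispatch described above can be illustrated with a minimal sketch. All names (`decide`, `ControlFrame`, the toy stand-in functions) are hypothetical and not taken from the source; the three callables merely stand in for trained networks C1', C2' and C3', and the control frame is assumed to carry changes in acceleration and bearing.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical category codes for "follow lane" and "change lane".
LANE_FOLLOW = "LF"
LANE_CHANGE = "LC"

Perception = Sequence[float]  # one flattened perception frame P_i (assumed layout)


@dataclass(frozen=True)
class ControlFrame:
    """Predicted control frame for one time window (assumed fields)."""
    delta_acceleration: float
    delta_bearing: float


def decide(
    perception: Perception,
    classifier: Callable[[Perception], str],            # stands in for module C1'
    lane_follow: Callable[[Perception], ControlFrame],  # stands in for module C2'
    lane_change: Callable[[Perception], ControlFrame],  # stands in for module C3'
) -> ControlFrame:
    """Route one perception frame to the branch selected by the classifier."""
    category = classifier(perception)
    if category == LANE_FOLLOW:
        return lane_follow(perception)
    if category == LANE_CHANGE:
        return lane_change(perception)
    raise ValueError(f"unknown situation category: {category!r}")


# Toy stand-ins for trained networks, for illustration only.
def toy_classifier(p: Perception) -> str:
    return LANE_CHANGE if p[0] > 0.5 else LANE_FOLLOW


def toy_lane_follow(p: Perception) -> ControlFrame:
    return ControlFrame(delta_acceleration=0.1, delta_bearing=0.0)


def toy_lane_change(p: Perception) -> ControlFrame:
    return ControlFrame(delta_acceleration=0.0, delta_bearing=2.0)
```

Only the branch chosen by the classifier is evaluated for a given frame, which mirrors the statement that the function classifier initiates either one network or the other.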

The output of module C' is provided to the simulated driving environment module (module S2') so that it can be applied to the simulated traffic agent in the simulation environment. Module S2' is configured to supply module S1' with the respectively changed simulated environment data, which comprise driving data and map data of the simulated road user, so that module S1' supplies module A' with a changed environment data set for generating the next perception frame.

The present invention is described below with reference to exemplary embodiments, which serve only as examples and are not intended to limit the scope of protection.

DETAILED DESCRIPTION OF THE FIGURES

Further features and advantages of the present invention will become apparent from the following description of exemplary embodiments of the aspects of the invention with reference to the accompanying figures.

All features disclosed below in relation to the exemplary embodiments and/or the accompanying figures can be combined, alone or in any sub-combination, with features of the two aspects of the present invention, including features of preferred embodiments thereof, provided that the resulting combination of features makes sense to a person skilled in the art.

Fig. 1a shows a schematic representation of the traffic agent for decision-making in simulated driving situations (also called "E2E vehicle control model") 1, which is stored in the memory device and configured to comprise one or more neural networks with end-to-end modeling and to execute the computer-implemented training method according to the invention. The computer system according to the invention for training a traffic agent that controls a road vehicle in a simulation environment also comprises or consists of one or more processors and a memory device coupled to the one or more processors, which are not shown separately in Fig. 1a.

According to Fig. 1a, the naturalistic driving data and map data (not shown separately) are used as input information for module A (perception building) and module B (vehicle control building), which are shown in Fig. 1a as a combined module 11. Alternatively, modules A and B can also be present as separate modules. The output information generated by modules A and B in module 11 is used as input information to train the traffic agent decision-maker 12 (also called "E2E decision-maker" or module C) according to the training method of the invention, which is described in detail herein.

The traffic agent 1 according to the invention comprises, for example, a combined module 11 with module A and module B. Module A is configured to process at least part of the naturalistic driving data and map data in order to generate the respective perception frames P_i = [p₁, p₂, ..., p_n] per given time window t_i, each perception frame P_i containing corresponding perception information on (i) the traffic situation, (ii) the ego vehicle's own state and (iii) the road geometry. Specific exemplary embodiments have already been discussed in relation to the first aspect of the invention and also apply to the training computer system of the second aspect of the invention. Module B is configured to process at least part of the naturalistic driving data and the map data in order to generate one or more corresponding ground truth vehicle control frames C_i = [c₁, c₂, ..., c_n] per given time window t_i, each ground truth vehicle control frame C_i comprising longitudinal and lateral positions, preferably changes in longitudinal and lateral positions, e.g. changes in acceleration (Δacceleration) and bearing (Δbearing), to be applied to the respective ego vehicles per given time window t_i. Using changes in the acceleration and bearing values is preferred because the changes in steering, accelerator pedal depth and brake pedal depth can be determined directly from these values.
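
As a rough illustration of how ground truth control frames expressed as Δacceleration and Δbearing might be derived, the following sketch differences logged per-window values. The function names, the choice of degrees for bearing and the pairing of consecutive windows are assumptions made for illustration; the source does not specify how module B computes these values.

```python
from typing import List, Tuple


def wrap_deg(angle: float) -> float:
    """Wrap an angle difference into the interval [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def ground_truth_control_frames(
    accelerations: List[float],   # logged acceleration per time window (assumed)
    bearings_deg: List[float],    # logged bearing per time window, in degrees (assumed)
) -> List[Tuple[float, float]]:
    """Return (delta_acceleration, delta_bearing) for each pair of consecutive windows."""
    if len(accelerations) != len(bearings_deg):
        raise ValueError("inputs must have the same length")
    frames = []
    for i in range(len(accelerations) - 1):
        d_acc = accelerations[i + 1] - accelerations[i]
        # Wrapping keeps a turn from 350 to 10 degrees at +20 rather than -340.
        d_bearing = wrap_deg(bearings_deg[i + 1] - bearings_deg[i])
        frames.append((d_acc, d_bearing))
    return frames
```

Wrapping the bearing difference is a design choice of this sketch: without it, a small turn across the 0/360 degree boundary would appear as a very large change.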

In addition, the combined module 11 can also comprise an additional module D (not shown in Fig. 1b), which is configured to classify at least part of the perception frames P_i, based on the naturalistic driving data and the map data, into a binary situation category of either "follow lane" or "change lane" per given time window t_i.

According to an alternative or additional preferred embodiment, module 12 (module C) of the traffic agent 1 uses three or more neural networks, wherein preferably at least some or all of the neural networks are combined in a branched architecture. Preferably, the independent neural networks are trained independently of one another. An example of such a configuration, in which the E2E decision-maker 12 comprises three neural networks 121 (function classifier), 122 (follow lane) and 123 (change lane) combined in a branched architecture, is shown in Fig. 1b. The output data of module 11 are used as input information.

Fig. 1b shows by way of example that the E2E decision-maker 12 comprises i) a neural network 122 (module C2) for lane following, ii) a neural network 123 (module C3) for lane changing and iii) a neural network 121 (module C1) for function classification. In this case, the traffic agent 1 comprises the module D (not shown in Fig. 1b) for classifying situation categories, which is configured to process the naturalistic driving data and map data for the respective ego vehicles per given time window t_i into corresponding binary ground truth situation categories "follow lane" $SC_{LF}^{i} = [SC_{LF}^{1}, SC_{LF}^{2}, \dots, SC_{LF}^{n}]$ or "change lane" $SC_{LC}^{i} = [SC_{LC}^{1}, SC_{LC}^{2}, \dots, SC_{LC}^{n}]$. In other words, for each time window t_i and ego vehicle, the training computer system according to the invention is configured to determine whether the respective ego vehicle follows the lane or changes lane, where i is any number such that i ∈ [1, 2, ..., n] and n represents the limit of the frames driven.
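
One conceivable way to obtain such binary ground truth labels is sketched below. The labeling rule (a window counts as "change lane" when the map-matched lane identifier differs in the next window) and the name `label_situations` are assumptions for illustration only; the source does not state how module D derives the categories.

```python
from typing import List


def label_situations(lane_ids: List[int]) -> List[str]:
    """Label window i as "LC" if the lane id changes between window i and i + 1, else "LF".

    `lane_ids` is a hypothetical per-window sequence of map-matched lane identifiers
    for one ego vehicle; the last window has no successor and receives no label.
    """
    labels = []
    for current, following in zip(lane_ids, lane_ids[1:]):
        labels.append("LC" if following != current else "LF")
    return labels
```

A real labeling step would likely mark several windows before and after the lane boundary crossing as part of the maneuver; this sketch marks only the crossing itself.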

According to this example, the E2E decision-maker 12 of the traffic agent 1 according to the invention is configured such that

- the one or more perception frames P_i are each used as input information for the neural networks "follow lane 122" (module C2), "change lane 123" (module C3) and "function classifier 121" (module C1),
- the one or more ground truth vehicle control frames C_i are each used as labels for independently training the neural networks "follow lane 122" (module C2) and "change lane 123" (module C3) by matching the predicted vehicle control frames Ĉ_i with the respective ground truth vehicle control frames C_i, and
- the respective ground truth situation categories $SC_{LF}^{i}$ and $SC_{LC}^{i}$ per given time window t_i are used as labels for independently training the function classifier neural network 121 (module C1) by matching the predicted situation category "follow lane" $\hat{SC}_{LF}^{i} = [\hat{SC}_{LF}^{1}, \hat{SC}_{LF}^{2}, \dots, \hat{SC}_{LF}^{n}]$ or "change lane" $\hat{SC}_{LC}^{i} = [\hat{SC}_{LC}^{1}, \hat{SC}_{LC}^{2}, \dots, \hat{SC}_{LC}^{n}]$ with the corresponding ground truth category $SC_{LF}^{i}$ or $SC_{LC}^{i}$.
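
The matching of predictions against ground truth labels can be pictured as minimizing a loss per network. The sketch below assumes a mean squared error for the two control networks and a binary cross-entropy for the function classifier; the source does not name a loss function, so both choices and all function names are illustrative assumptions.

```python
import math
from typing import List, Tuple

Frame = Tuple[float, float]  # (delta_acceleration, delta_bearing), assumed layout


def control_loss(predicted: List[Frame], ground_truth: List[Frame]) -> float:
    """Mean squared error between predicted and ground truth control frames."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for (p_acc, p_bear), (t_acc, t_bear) in zip(predicted, ground_truth):
        total += (p_acc - t_acc) ** 2 + (p_bear - t_bear) ** 2
    return total / (2 * len(predicted))


def classifier_loss(p_lane_change: List[float], labels: List[int]) -> float:
    """Binary cross-entropy; label 1 means "change lane", 0 means "follow lane"."""
    if not labels or len(p_lane_change) != len(labels):
        raise ValueError("inputs must be non-empty and of equal length")
    eps = 1e-12  # keeps log() finite for probabilities of exactly 0 or 1
    total = 0.0
    for p, y in zip(p_lane_change, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

Because each network has its own loss and its own labels, the three networks can be trained independently of one another, as the text describes.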

The E2E decision-maker 12 is furthermore configured such that the output of the function classifier 121 (module C1), i.e. the respective situation category $\hat{SC}_{LF}^{i}$ or $\hat{SC}_{LC}^{i}$ of the ego vehicle at time window t_i, initiates either the neural network follow lane 122 (module C2) or the neural network change lane 123 (module C3).

One advantage of the training computer system according to the invention is that the traffic agent 1 is trained to predict both longitudinal and lateral positions, preferably changes in longitudinal and lateral positions, to be applied to a vehicle in a simulated environment, with the prediction reflecting natural driving behavior. According to one example, the changes in longitudinal and lateral positions may take the form of changes in acceleration and changes in bearing to be applied to the simulated vehicle in a given time window.

Fig. 1c shows a schematic representation of an integrated simulation computer system 01' according to the invention, which employs a traffic agent 1' designed according to the invention. The traffic agent 1' comprises a module 11' (module A') for perception building based on the simulated environment data provided by module 21' (module S1'), an E2E decision-maker model 12', and one or more processors and a memory device coupled to the one or more processors (not shown separately in Fig. 1c). The driving environment module S2' (simulation) is intended to supply, in module S1', environment data for an ego vehicle that contain (i) map information, (ii) traffic information and (iii) traffic rules. These data are then processed in module 11' to generate perceptions, and control commands are generated by the E2E decision-maker 12' and returned to the environment 22' for position updating.

Module 11' is configured to generate, for the respective simulated vehicle per time window, perception frames containing information on (i) the traffic situation, (ii) the simulated vehicle's own state and (iii) the road geometry, and to provide the generated perception frames as input information for the E2E decision-maker module 12' (module C'). The E2E decision-maker module 12' has been trained according to the training method of the invention. The E2E decision-maker module 12' is thus configured to predict, as an action, one or more vehicle control frames Ĉ_i containing longitudinal and lateral positions, preferably changes in longitudinal and lateral positions, e.g. changes in acceleration and bearing, to be applied to the simulated vehicle in the simulation environment.

As already mentioned above, the simulation computer system 01' according to the invention, which employs the traffic agent 1' according to the invention, does not use the naturalistic driving and map data that serve as input information for the training method. The simulation computer system 01' according to the invention therefore does not need a module B' corresponding to module B of the training computer system. Instead, the simulated driving and map data of the simulated traffic agent 1', which are supplied by module 21' (module S1') to module 11' (module A'), are used as input information in the simulation computer system 01' according to the invention in order to generate the respective perception frames P_i for the respective time windows t_i in module 11' (module A'). In other words, module 11' (module A') is configured to generate the respective perception frames P_i for the respective time windows t_i based on the simulation data provided by module 21' (module S1'). The perception frames P_i per given time window t_i generated by module 11' are used as input information for the E2E decision-maker computer model 12' (module C') according to the invention. Module 12' (module C') is configured to predict the longitudinal and lateral position, preferably the changes in longitudinal and lateral position, particularly preferably the changes in acceleration and bearing, to be applied to the simulated road user per given time window t_i.

In an additional or preferred embodiment, the decision-maker computer model 12' (module C') of the deployed traffic agent 1' uses three or more neural networks, wherein preferably at least some or all of the neural networks are combined in a branched architecture.

An example configuration of the deployed E2E decision-maker 12' comprises the configuration analogous to that of the E2E decision-maker 12 shown in Fig. 1b. Accordingly, the corresponding details and preferred embodiments described above also apply.

So umfasst das Entscheider-Computermodell 12' (Modul C') des eingesetzten Verkehrsagenten 1' i) ein neuronales Netz 122' (Modul C2') für die Spurfolge, ii) ein neuronales Netz 123' (Modul C3') für den Spurwechsel und iii) ein neuronales Netz 121' (Modul C1') für die Funktionsklassifizierung, die so konfiguriert sind, dass

- ein oder mehrere Wahrnehmungsrahmen P_i der simulierten Fahrzeuge je gegebenem Zeitfenster t_i jeweils als Eingabe für die neuronalen Netze "Fahrspur folgen 122''' (Modul C2'), „Fahrspur wechseln 123''' (Modul C3') und „Funktionsklassifikator 121''' (Modul C3') verwendet werden,
- das neuronale Netz Funktionsklassifikator 121' (Modul C1') so konfiguriert ist, dass es die ein oder mehreren Wahrnehmungsrahmen P_i der simulierten Fahrzeuge je gegebenem Zeitfenster t_i in die Situationskategorie „Fahrspur folgen" ${\hat{S C}}_{L F}^{i} = [{\hat{S C}}_{LF}^{1}, {\hat{S C}}_{LF}^{2}, \dots {\hat{S C}}_{L F}^{n}]$

oder „Fahrspur wechseln" ${\hat{S C}}_{L C}^{i} = [{\hat{S C}}_{LC}^{1}, {\hat{S C}}_{LC}^{2}, \dots {\hat{S C}}_{L C}^{n}]$
klassifiziert. Abhängig von der jeweiligen Klassifizierung je gegebenem Zeitfenster t_i, d.h. entweder der Klasse „Fahrspur folgen“ oder der Klasse „Fahrspur wechseln“, initiiert der Funktionsklassifikator 121' (Modul C1') das neuronale Netz von entweder Fahrspur folgen 122' (Modul C2') oder Fahrspur wechseln 123' (Modul C3'). Ein Beispiel: Wenn der Funktionsklassifikator 121' (Modul C1') einen Wahrnehmungsrahmen P₁ zum Zeitfenster t₁ mit der Situationskategorie „Fahrspur folgen“ ${\hat{S C}}_{LF}^{1}$
klassifiziert, dann ist der Funktionsklassifikator 121' (Modul C1') so konfiguriert, dass er das neuronale Netz „Fahrspur folgen“ 122' initiiert, um den Fahrzeugsteuerungsrahmen Ĉ₁ vorherzusagen, der die longitudinalen und lateralen Positionen, vorzugsweise die Änderungen der longitudinalen und lateralen Position, noch bevorzugter die Änderungen der Beschleunigung und der Peilung enthält, die auf das jeweilige simulierte Fahrzeug zum Zeitfenster t₁ anzuwenden sind. Falls alternativ der Funktionsklassifikator 121' (Modul C1') einen Wahrnehmungsrahmen P₂ zum Zeitfenster t₂ mit der Situationskategorie „Fahrspurwechsel“ ${\hat{S C}}_{LC}^{2}$
klassifiziert, dann ist der Funktionsklassifikator 121' (Modul C1') so konfiguriert, dass er das neuronale Netz „Fahrspurwechsel“ 123' initiiert, um den Fahrzeugsteuerungsrahmen Ć̂₂ vorherzusagen die Änderungen der longitudinalen und lateralen Position, vorzugsweise die Änderungen der longitudinalen und lateralen Position, noch bevorzugter die Änderungen der Beschleunigung und der Peilung, die auf das simulierte Fahrzeug zum Zeitfenster anzuwenden sind t₂.

The decision-maker computer model 12' (module C') of the traffic agent 1' comprises i) a neural network 122' (module C2') for lane following, ii) a neural network 123' (module C3') for lane changing and iii) a neural network 121' (module C1') for function classification, configured so that

- one or more perception frames P_i of the simulated vehicles for each given time window t_i are used as input for the neural networks "follow lane" 122' (module C2'), "change lane" 123' (module C3') and "function classifier" 121' (module C1'),
- the function classifier neural network 121' (module C1') classifies the one or more perception frames P_i of the simulated vehicles for each given time window t_i into the situation category "follow lane" $\hat{SC}_{LF}^{i} = [\hat{SC}_{LF}^{1}, \hat{SC}_{LF}^{2}, \dots, \hat{SC}_{LF}^{n}]$ or "change lane" $\hat{SC}_{LC}^{i} = [\hat{SC}_{LC}^{1}, \hat{SC}_{LC}^{2}, \dots, \hat{SC}_{LC}^{n}]$.

Depending on the classification for each given time window t_i, i.e. either the "follow lane" class or the "change lane" class, the function classifier 121' (module C1') initiates either the follow lane neural network 122' (module C2') or the change lane neural network 123' (module C3'). As an example: if the function classifier 121' (module C1') classifies a perception frame P₁ for time window t₁ into the situation category "follow lane" $\hat{SC}_{LF}^{1}$, it is configured to initiate the "follow lane" neural network 122' to predict the vehicle control frame Ĉ₁, which contains the longitudinal and lateral positions, preferably the changes in longitudinal and lateral position, more preferably the changes in acceleration and bearing, to be applied to the respective simulated vehicle at time window t₁. If, alternatively, the function classifier 121' (module C1') classifies a perception frame P₂ for time window t₂ into the situation category "change lane" $\hat{SC}_{LC}^{2}$, it is configured to initiate the "change lane" neural network 123' to predict the vehicle control frame Ĉ₂, which contains the longitudinal and lateral positions, preferably the changes in longitudinal and lateral position, more preferably the changes in acceleration and bearing, to be applied to the simulated vehicle at time window t₂.
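The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decide`, the 0.5 decision threshold and the toy stand-in models are hypothetical and are not taken from the patent; the real modules 121', 122' and 123' would be trained neural networks.

```python
from typing import Callable, Sequence, Tuple

PerceptionFrame = Sequence[float]
ControlFrame = Tuple[float, float]  # (delta acceleration, delta bearing)


def decide(
    frame: PerceptionFrame,
    classify: Callable[[PerceptionFrame], float],
    follow_lane: Callable[[PerceptionFrame], ControlFrame],
    change_lane: Callable[[PerceptionFrame], ControlFrame],
    threshold: float = 0.5,  # assumed cut-off, not stated in the source
) -> Tuple[str, ControlFrame]:
    """Route one perception frame to the network selected by the classifier."""
    p_lane_change = classify(frame)  # probability of "change lane"
    if p_lane_change >= threshold:
        return "change_lane", change_lane(frame)
    return "follow_lane", follow_lane(frame)


# Hypothetical stand-ins for the trained networks 121', 122' and 123'.
def toy_classifier(frame: PerceptionFrame) -> float:
    return 0.9 if frame[0] > 0.0 else 0.1


def toy_follow_lane(frame: PerceptionFrame) -> ControlFrame:
    return (0.1, 0.0)


def toy_change_lane(frame: PerceptionFrame) -> ControlFrame:
    return (0.0, 2.0)


category, control = decide([1.0], toy_classifier, toy_follow_lane, toy_change_lane)
print(category, control)
```

Only one of the two control networks is evaluated per time window, which mirrors the "initiates either ... or ..." wording of the description.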

The output of module 12' (module C') is provided to the simulated driving environment 22' (module S2') and applied to the simulated traffic agent 1' in the simulation environment. Module 22' (module S2') is configured to supply module 21' (module S1') with the updated simulated environment data, comprising the driving data and map data of the simulated traffic agent 1', so that module 21' (module S1') in turn supplies module 11' (module A') with the updated environment data to generate the next perception frame.

Experimental part

For the training according to the present invention, the inventors used the commercial driving data set DataFromSky (DFS, acquired from RCE systems s.r.o., Czech Republic), which comprises driving data of vehicles driven by humans over a period of six hours on a section (500 m) of the A9 motorway in Germany. In particular, the DFS data set included the following features: time stamp (in seconds, s), longitudinal velocity (in meters/second, m/s), longitudinal acceleration (in meters/second squared, m/s²) and global coordinates of the respective vehicle (road user) (in x, y coordinates).

In addition, the OpenDRIVE digital map (downloaded from http://www.opendrive.org/) was used as map data in the simulation to generate lane points relative to each ego position; these were used to construct the road geometry data that serve as input to the model. The lane points can be described as

$L_{i} = [l_{i}^{1}, l_{i}^{2}, l_{i}^{3}]$

for the current and the two adjacent lanes of a subject/ego vehicle in a time interval t_i, where each lane

$l_{i}^{j} = [x_{1}, x_{2}, \dots, x_{n}]$

is a set of coordinates

$x_{j} = [\begin{matrix} x_{j} \\ y_{j} \end{matrix}]$,

such that x_n is the last point on the lane $l_{i}^{j}$ that lies within a maximum distance of 400 m from the ego/subject vehicle position at t_i.
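The 400 m cut-off on the lane points can be illustrated with a short Python sketch. It assumes that the points of each lane are ordered along the driving direction starting near the ego vehicle and that the distance is Euclidean; neither detail is stated in the source, and the function name `lane_points_in_range` is hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in global coordinates
MAX_RANGE_M = 400.0  # maximum distance given in the text


def lane_points_in_range(
    lane: List[Point], ego_xy: Point, max_range: float = MAX_RANGE_M
) -> List[Point]:
    """Keep the lane points up to the last one within max_range of the ego."""
    ex, ey = ego_xy
    kept = []
    for x, y in lane:
        # Assumes points are ordered, so we can stop at the first one too far.
        if math.hypot(x - ex, y - ey) > max_range:
            break
        kept.append((x, y))
    return kept


# L_i for the current lane and the two adjacent lanes (toy straight lanes).
ego = (0.0, 0.0)
lanes = [[(float(x), float(offset)) for x in range(0, 600, 100)]
         for offset in (0.0, 3.5, -3.5)]
L_i = [lane_points_in_range(lane, ego) for lane in lanes]
print([len(lane) for lane in L_i])
```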

As set out in the detailed description above, the perception frame used in the present invention is divided into three categories:

1. Traffic situation: The input data (DFS and OpenDRIVE) are processed to create the six-vehicle neighborhood information relative to each ego/subject vehicle, where each vehicle represented in the six positions provides two pieces of information: (i) the relative distance d to the ego vehicle and (ii) the relative speed v_r with respect to the speed v_e of the ego vehicle. FIG. 2 shows a schematic representation of the six-vehicle neighborhood information at a specific time window.

As set out above, the vehicle roles in a six-vehicle neighborhood according to the present invention are defined as follows:

- The car 311 in front of the ego vehicle 3 (in the same lane).
- The car 312 following the ego vehicle 3 from behind (in the same lane).
- The two cars 321, 322 ahead of the center of the ego vehicle, projected onto the two adjacent lanes.
- The two cars 331, 332 behind the ego vehicle 3, projected onto the two adjacent lanes.

According to FIG. 2, all six positions are occupied at the time window shown. All six vehicles 311, 312, 321, 322, 331 and 332 in the neighborhood of the ego vehicle 3 have the same distance d to the ego vehicle 3. The relative speed v_r is the speed v_n of the respective one of the six neighboring vehicles 311, 312, 321, 322, 331 and 332 minus the speed v_e of the ego vehicle 3 (assuming that the vehicles move in the same direction).

2. Ego state information: This includes the longitudinal velocity, longitudinal acceleration, angular deviation and bearing of the ego vehicle with respect to the lane direction. As mentioned above, the angular deviation (Ad) is defined as

$Ad^{i} = \theta_{lane}^{i} - \theta_{ego}^{i}$

where $\theta_{lane}^{i}$ and $\theta_{ego}^{i}$ represent the global direction/orientation of the lane and of the ego vehicle at any given time window t_i.

3. Road geometry: In the present example experiment, the DFS driving data and the OpenDRIVE map data are processed to obtain a semicircular numerical representation of the road geometry in the form of a vector of displacements, D_j, to each of the two lane boundaries LB1 and LB2 at each time window t_i, with

$D_{j} = [d_{1}, d_{2}, \dots, d_{n}]$

where each entry of D_j belongs to a sequence of displacement points relative to the ego position, distributed at 5° intervals over the semicircular area in front of the ego vehicle according to their bearing relative to the ego position. The length n of the displacement vector D_j is therefore fixed in these experiments to

$n = 180 / 5 = 36$

Such a semicircular road geometry is shown schematically in FIG. 3, where, for the sake of clarity, only some of the 36 displacements of the vector D_j of the ego vehicle 3 are shown.
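The three perception categories above can be sketched in Python as follows. The relative speed and angular deviation follow the definitions in the text directly. The binning of boundary points into the 36 displacement entries is an assumption made for illustration: each boundary point is assigned to a 5° bin by its bearing relative to the ego heading, the nearest point per bin is kept, and empty bins hold a fill value. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

BIN_DEG = 5
N_BINS = 180 // BIN_DEG  # 36 entries, as in the text


def relative_speed(v_n: float, v_e: float) -> float:
    """v_r: neighbor speed minus ego speed (same driving direction assumed)."""
    return v_n - v_e


def angular_deviation(theta_lane: float, theta_ego: float) -> float:
    """Ad: global lane direction minus global ego direction."""
    return theta_lane - theta_ego


def displacement_vector(
    boundary_points: Sequence[Tuple[float, float]],
    ego_xy: Tuple[float, float],
    ego_heading_deg: float,
    fill: float = float("inf"),  # assumed placeholder for empty bins
) -> List[float]:
    """Distances to one lane boundary, binned over the frontal semicircle."""
    ex, ey = ego_xy
    d = [fill] * N_BINS
    for x, y in boundary_points:
        dx, dy = x - ex, y - ey
        bearing = math.degrees(math.atan2(dy, dx)) - ego_heading_deg
        bearing = (bearing + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        if not -90.0 <= bearing < 90.0:
            continue  # point lies behind the ego vehicle
        k = int((bearing + 90.0) // BIN_DEG)
        d[k] = min(d[k], math.hypot(dx, dy))
    return d


D = displacement_vector([(10.0, 0.0), (3.0, 4.0), (-1.0, 0.0)], (0.0, 0.0), 0.0)
print(len(D), D[18], D[28])
```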

The present inventors examined two possible approaches to modeling the E2E decision maker according to the invention using neural networks in a simulation environment:

- Single network model, in which the decision processes are learned in a single neural network with a sequence of n layers, where n ∈ ℤ⁺, n > 3, is determined empirically in experiments.
- Functionally branched networks, in which the decision processes are divided according to basic driving functions, e.g. following a lane and changing lanes. Three neural networks are used to model this approach, each built from a sequence of n layers, where n ∈ ℤ⁺, n > 3, is determined empirically during the experiments. These are as follows:
  - lane following module 122, used to control the vehicle in general lane following scenarios,
  - lane change module 123, used to control the vehicle in general lane change scenarios,
  - function classifier module 121, used for binary classification of whether the situation is one of lane following or lane changing, thereby initiating one of the two corresponding models 122 or 123.

Each of these sub-modules was trained independently.

In the present experiment, the lane following module 122 targets two abstract scenarios:

- Adaptive cruise control (ACC): control of the vehicle's throttle/acceleration with respect to the vehicle in front.
- Traffic-free steering: control of the vehicle's steering in order to stay in lane.

A branched neural network architecture, split into two completely separate networks with no shared layers for Δ acceleration and Δ bearing, was used for training with the DFS driving data and the OpenDRIVE map data. The network was optimized using the following loss function:

$L = \frac{1}{k} \sum_{i = 1}^{k} \left( (y_{acc}^{i} - \hat{y}_{acc}^{i})^{2} + (y_{bear}^{i} - \hat{y}_{bear}^{i})^{2} \right)$

where k ∈ ℤ⁺ is any number of data samples, y_acc and ŷ_acc are the ground truth labels and predicted values of Δ acceleration, and y_bear and ŷ_bear are the ground truth labels and predicted values of Δ bearing. A special mechanism was required to solve the problem of error cascading during test runs of the model in the simulation: tiny errors in each frame added up to states that were rare in the lane following training data, so that the model could not control the steering well enough to stay in lane and the vehicle eventually left the lane. The correction mechanism consisted of filtering the training data so that, in each training iteration, the proportion of situations was increased in which the vehicle was displaced to either side of the lane center and in which the Δ bearing was such that the distance to the lane center decreased.
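A minimal Python sketch of the loss above and of a possible selection rule for the corrective samples is given here. The loss follows the formula term by term. The predicate `is_corrective` is hypothetical: the source does not give the filtering criterion in detail, so the sign convention (positive offset means left of the lane center, positive Δ bearing means turning left) and the `min_offset` threshold are assumptions for illustration only.

```python
from typing import Sequence


def mse_loss(
    y_acc: Sequence[float],
    y_acc_hat: Sequence[float],
    y_bear: Sequence[float],
    y_bear_hat: Sequence[float],
) -> float:
    """Mean over k samples of squared errors in delta acceleration and delta bearing."""
    k = len(y_acc)
    total = 0.0
    for a, a_hat, b, b_hat in zip(y_acc, y_acc_hat, y_bear, y_bear_hat):
        total += (a - a_hat) ** 2 + (b - b_hat) ** 2
    return total / k


def is_corrective(
    lateral_offset: float, delta_bearing: float, min_offset: float = 0.2
) -> bool:
    """True if the vehicle is off-center and steering back toward the center.

    Assumed convention: offset > 0 is left of center, delta_bearing > 0 turns left.
    """
    return abs(lateral_offset) >= min_offset and lateral_offset * delta_bearing < 0


print(mse_loss([1.0, 2.0], [1.0, 0.0], [0.0, 0.0], [0.0, 1.0]))
print(is_corrective(0.5, -0.1), is_corrective(0.5, 0.1))
```

Samples for which such a predicate holds could then be drawn more often in each training iteration, which is one way to raise their proportion as the text describes.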

The test results for the lane following module 122, which was trained independently on the real DFS traffic data, are shown in FIGS. 4 to 7. The diagrams in FIGS. 4 and 5 show the per-frame error of the lane following module 122 relative to the reference data set for the Δ bearing and Δ acceleration values.

The diagram in FIG. 6 shows the distribution of the average lane center deviation of the vehicles in the real data, while the diagram in FIG. 7 shows the same distribution for the lane following model when run in the simulation environment. The lane center deviation appears to be larger in the DFS data, possibly because of positional errors in the recorded data, which were reported to be up to ≈0.5 m. The lane following module 122, by contrast, shows a comparatively smaller deviation owing to the correction mechanism used during training.

The lane change module 123 targeted specific situations in which the vehicle was expected to change to one of the two adjacent lanes. The model was to learn to predict Δ bearing and Δ acceleration values such that the higher-level decision about the direction of the lane change was implicitly embedded, at each frame, in the lower-level output of Δ bearing and Δ acceleration values. The long-term effect of this is a smooth transition to the implicitly chosen lane. This process is handled within the neural network itself, which is why the term "implicit" decision is used.

The same perception vector is used as input to the model, except that the road geometry displacement vector is computed with respect to the center points of the adjacent lanes rather than the current lane. The network in this case has the same branched architecture as described for the "follow lane" model and was optimized with the following loss function:

$L = \frac{1}{k} \sum_{i = 1}^{k} \left( | y_{acc}^{i} - \hat{y}_{acc}^{i} | + | y_{bear}^{i} - \hat{y}_{bear}^{i} | \right)$

where k ∈ ℤ⁺ is any number of data samples, y_acc and ŷ_acc are the ground truth labels and predicted values of Δ acceleration, and y_bear and ŷ_bear are the ground truth labels and predicted values of Δ bearing. Experiments showed that, owing to the small ground truth Δ bearing values in the data of the lane change scenarios, the mean squared error according to Equation 2 did not converge well, so the mean absolute error shown in Equation 3 was preferred.
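The mean absolute error above can be written out in Python as follows. The small comparison at the end is only an illustration of one plausible reason for the reported convergence behavior, not an explanation given in the source: for residuals well below 1, squaring shrinks the error signal much more than taking the absolute value does. The function name `mae_loss` and the sample values are assumptions.

```python
from typing import Sequence


def mae_loss(
    y_acc: Sequence[float],
    y_acc_hat: Sequence[float],
    y_bear: Sequence[float],
    y_bear_hat: Sequence[float],
) -> float:
    """Mean over k samples of absolute errors in delta acceleration and delta bearing."""
    k = len(y_acc)
    total = 0.0
    for a, a_hat, b, b_hat in zip(y_acc, y_acc_hat, y_bear, y_bear_hat):
        total += abs(a - a_hat) + abs(b - b_hat)
    return total / k


# Illustrative residual of 0.01 on both outputs (made-up values).
residual = 0.01
absolute_term = abs(residual) + abs(residual)  # contribution to the MAE
squared_term = residual ** 2 + residual ** 2   # contribution to the MSE
print(absolute_term, squared_term)
```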

Table 1 below shows the distribution of lane change directions in the real data and the distribution produced by the lane change model 123 during a test run in the simulation. The percentages of the respective lane changes are very similar for both sources, which can be taken as a rough indication that the behavior of the model resembles that of real human drivers.

Table 1: Percentage of lane change directions chosen by the lane change module and in the real traffic data.

| | Left lane change | Right lane change |
|---|---|---|
| Ground truth data | 56.2 % | 43.8 % |
| Model in the simulation run | 58.2 % | 41.7 % |

The function classifier 121 is intended to act as a moderator for the two main modules, follow lane 122 and change lane 123. A subset of the perception information, namely (i) the traffic situation and (ii) the ego state information, is used as input to this model to predict the single-valued probability that the scenario is a lane change $\hat{SC}_{LC}^{i}$, P(X = LaneChange), or lane following $\hat{SC}_{LF}^{i}$, 1 - P(X = LaneChange), extended by the direction of the lane change. A single, fully connected neural network was trained on the DFS data and optimized with the following cost function, known as log loss:

$L = -\frac{1}{k} \sum_{i = 1}^{k} \left( y^{i} \log(\hat{y}^{i}) + (1 - y^{i}) \log(1 - \hat{y}^{i}) \right)$

where k ∈ ℤ⁺ is any number of data samples, and y and ŷ are the ground truth and predicted outputs of the model, expressed as the probability that the scenario is a lane change (the probability of lane following is accordingly 1 - ŷ).
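For reference, the standard binary log loss (binary cross-entropy), in which the negative sign applies to the whole sum, can be computed as in the following sketch. The clipping constant `eps` is an assumption added to keep the logarithm finite; it is not part of the source.

```python
import math
from typing import Sequence


def log_loss(
    y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-12
) -> float:
    """Binary cross-entropy; y_prob is the predicted probability of a lane change."""
    k = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / k


# 1 = lane change, 0 = lane following (toy labels and predictions).
print(log_loss([1, 0, 1], [0.9, 0.2, 0.6]))
```

With this sign convention the loss is non-negative and decreases as the predicted probabilities approach the labels.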

Table 2 shows the confusion matrix for the classification of lane following and lane change scenarios by the function classifier model compared with the real traffic data. The numbers in the table are the recorded instances (148,501 in total) in the real data. For the ground truth label "follow lane", the function classifier model 121 according to the invention correctly predicts the situation category "follow lane" $\hat{SC}_{LF}$ in 96,578 cases and incorrectly predicts the situation category "change lane" $\hat{SC}_{LC}$ in only 8,592 cases. For the ground truth label "change lane", the function classifier model 121 correctly predicts the situation category "change lane" $\hat{SC}_{LC}$ in 35,778 cases and incorrectly predicts the situation category "follow lane" $\hat{SC}_{LF}$ in only 7,553 cases. The function classifier model 121 thus has a precision of 0.92, a recall of 0.93 and hence a notable F1 score of 0.925.

Table 2: Confusion matrix for the function classification compared with the reference data.

| | Predicted: follow lane | Predicted: change lane |
|---|---|---|
| Ground truth: follow lane | 96578 | 8592 |
| Ground truth: change lane | 7553 | 35778 |
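The metrics can be recomputed from the four counts of Table 2 with a few lines of Python. The source does not state which class is treated as positive; the sketch assumes "follow lane". Under that assumption the computed precision and recall round to about 0.93 and 0.92, i.e. close to the reported values but in the opposite order, which may simply reflect a different reading of rows and columns. The function name `binary_metrics` is hypothetical.

```python
from typing import Tuple


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float, float, float]:
    """Return precision, recall, F1 and accuracy from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, accuracy


# Counts from Table 2, with "follow lane" as the positive class (assumption).
tp, fn = 96578, 8592   # ground truth: follow lane
fp, tn = 7553, 35778   # ground truth: change lane

precision, recall, f1, accuracy = binary_metrics(tp, fn, fp, tn)
print(tp + fn + fp + tn)
print(round(precision, 3), round(recall, 3), round(f1, 3), round(accuracy, 3))
```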

The general behavior of the E2E decision-maker model, when run in the simulation environment, was found to be similar to that of the human drivers in the real DFS traffic data. The diagrams in FIGS. 8a) and 8b) show the behavioral trend in maintaining speed and distance to the vehicle in front during lane following scenarios, both for the model tested in the simulation and for the real data. The behavioral trend of the data produced by the traffic agent trained according to the invention is similar to that of the natural DFS data.

The final E2E decision-maker module, trained on the real traffic data (DFS), was also evaluated with regard to safety compliance in terms of collisions with vehicles of the surrounding traffic. The E2E decision-maker module was tested in the simulation environment and caused only 4 minor collisions (frontal collisions at low speeds) during a 30-minute drive in dense traffic.

Claims

Computer-implemented training method for a traffic agent for navigating a road vehicle in a simulation environment, characterized in that the method comprises or consists of the following steps:
a. providing driving data for one or more time windows t_i = [t₁, t₂, ... t_n] of one or more road vehicles as ego vehicles, each driven by a human in a real situation on a road, and providing map data of the respective road for the given time windows t_i,
b. processing at least part of the driving data and the map data from step a) into one or more corresponding perception frames P_i = [p₁, p₂, ... p_n] per given time window t_i, each perception frame P_i containing corresponding perception information on (i) the traffic situation, (ii) the state of the ego vehicle and (iii) the road geometry,
c. processing at least part of the driving data and map data from step a) into one or more corresponding ground truth vehicle control frames C_i = [c₁, c₂, ... c_n] per given time window t_i, each vehicle control frame C_i containing longitudinal and lateral positions of the respective ego vehicle,
d. training a decision-maker computer model of the traffic agent with the one or more perception frames P_i per given time window t_i from step b) as input to the model and with the one or more ground truth vehicle control frames C_i per given time window t_i from step c) as labels for training the model, wherein the decision maker uses one or more neural networks with end-to-end modeling and is configured to predict corresponding vehicle control frames Ĉ_i = [ĉ₁, ĉ₂, ... ĉ_n] comprising the longitudinal and lateral positions of the respective ego vehicle by matching the predicted vehicle control frames Ĉ_i with the respective ground truth vehicle control frames C_i,
where i is any number such that i ∈ [1,2,...n] and where n represents the limit of the frames driven.

Training method according to claim 1, wherein the driving data from step a) of the respective ego vehicles are additionally processed, for each given time window t_i, into corresponding binary ground truth situation categories "follow lane" $SC_{LF}^{i} = [SC_{LF}^{1}, SC_{LF}^{2}, \ldots, SC_{LF}^{n}]$ or "change lane" $SC_{LC}^{i} = [SC_{LC}^{1}, SC_{LC}^{2}, \ldots, SC_{LC}^{n}]$, and wherein the decision-maker computer model of the traffic agent in step d) comprises i) a follow-lane neural network, ii) a change-lane neural network and iii) a function classifier neural network, wherein
- the one or more perceptual frames P_i are each used as input to the follow-lane, change-lane and function classifier neural networks,
- the one or more ground truth vehicle control frames C_i are used as labels for independently training the follow-lane and change-lane neural networks by matching the predicted vehicle control frames Ĉ_i with the respective ground truth vehicle control frames C_i, and
- the respectively applicable ground truth situation category $SC_{LF}^{i}$ or $SC_{LC}^{i}$ is used as a label for each given time window t_i to independently train the function classifier neural network to predict a corresponding situation category "follow lane" $\widehat{SC}_{LF}^{i} = [\widehat{SC}_{LF}^{1}, \widehat{SC}_{LF}^{2}, \ldots, \widehat{SC}_{LF}^{n}]$ or "change lane" $\widehat{SC}_{LC}^{i} = [\widehat{SC}_{LC}^{1}, \widehat{SC}_{LC}^{2}, \ldots, \widehat{SC}_{LC}^{n}]$ by matching the predicted situation category $\widehat{SC}_{LF}^{i}$ or $\widehat{SC}_{LC}^{i}$ with the respective ground truth situation category $SC_{LF}^{i}$ or $SC_{LC}^{i}$,
where i is any number such that i ∈ [1, 2, ..., n] and where n represents the limit of frames driven.
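The three-network arrangement can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the claim specifies only that the networks are trained independently, so both the grouping of training pairs by situation category and the inference rule in which the classifier output selects the acting branch are assumptions, as are all names, the threshold and the stand-in functions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = Sequence[float]
Control = Tuple[float, float]  # hypothetical: (acceleration change, bearing change)

def split_by_category(frames: List[Frame], controls: List[Control],
                      categories: List[str]) -> Dict[str, list]:
    """Group (P_i, C_i) pairs by ground truth situation category ("LF" or "LC")."""
    groups: Dict[str, list] = {"LF": [], "LC": []}
    for p, c, sc in zip(frames, controls, categories):
        groups[sc].append((p, c))
    return groups

def decide(frame: Frame,
           classifier: Callable[[Frame], float],
           follow_lane: Callable[[Frame], Control],
           change_lane: Callable[[Frame], Control],
           threshold: float = 0.5) -> Tuple[str, Control]:
    """Assumed inference rule: the classifier output selects which branch acts."""
    if classifier(frame) >= threshold:
        return "LC", change_lane(frame)
    return "LF", follow_lane(frame)

# Hypothetical stand-ins for the three trained networks.
classifier = lambda p: 1.0 if p[0] < 10.0 else 0.0  # close leader -> change lane
follow_lane = lambda p: (0.1, 0.0)
change_lane = lambda p: (0.0, 2.5)
```

For example, `decide([5.0], classifier, follow_lane, change_lane)` routes the frame to the change-lane branch, while a frame with a distant leader is handled by the follow-lane branch.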

Training method according to claim 1 or 2, wherein the driving data in step a) comprise or consist of, for each of the given road vehicles, one or more state features of the respective ego vehicle for each given time window t_i, preferably comprising or consisting of the longitudinal velocity, the longitudinal acceleration and the position of the respective road vehicle in X and Y coordinates per given time window t_i.

Training method according to any one of claims 1 to 3, wherein the map data from step a) contain corresponding road information which comprises or consists of i) the number of lanes of the respective road and ii) the position of the lanes in X and Y coordinates for the given time windows t_i.

Training method according to one of the Claims 1 until 4 , wherein the traffic situation in step b) includes or consists of six-vehicle neighborhood information, each vehicle shown in the six positions i) the relative distance of the respective vehicle to the ego vehicle and ii) the relative speed of the respective vehicle to the speed of the ego -Vehicle includes or consists of.

Training method according to any one of claims 1 to 5, wherein the ego state information of the respective ego vehicles in step b) comprises or consists of the longitudinal velocity, the longitudinal acceleration and the bearing relative to the road direction (angular deviation Ad).

Training method according to claim 6, wherein the angular deviation Ad is defined as $Ad^{i} = \theta_{road}^{i} - \theta_{ego}^{i}$, where $\theta_{road}^{i}$ and $\theta_{ego}^{i}$ represent the bearing of the road and of the ego vehicle, respectively, at any given time window t_i, where i is any number such that i ∈ [1, 2, ..., n] and where n represents the limit of frames driven.
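As a worked example, the angular deviation, read here as the road bearing minus the ego bearing, can be computed as below. Expressing bearings in degrees and wrapping the result into the interval [-180, 180) are assumptions added for the illustration; the claim itself states only the difference.

```python
def angular_deviation(theta_road_deg: float, theta_ego_deg: float) -> float:
    """Ad = theta_road - theta_ego, wrapped into [-180, 180) degrees (assumed)."""
    diff = theta_road_deg - theta_ego_deg
    return (diff + 180.0) % 360.0 - 180.0
```

With a road bearing of 90° and an ego bearing of 85° the deviation is 5°. The wrapping matters near the 0°/360° seam: a road bearing of 5° and an ego bearing of 355° give 10° rather than -350°.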

Training method according to any one of claims 1 to 7, wherein the road geometry in step b) comprises or consists of a numerical representation of a respective lane geometry in relation to the ego vehicle, wherein the numerical representation is preferably selected from a circular or semi-circular geometry.

Training method according to claim 8, wherein the circular or semi-circular numerical representation of the respective lane geometry with two lane boundaries takes the form of a vector of displacements D_j to each of the two lane boundaries at any given time window t_i, with $D_{j} = [d_{1}, d_{2}, \ldots, d_{n}]$, where each entry of D_j is part of a sequence of displacement points relative to the ego vehicle position, calculated on the basis of their relative bearings to the ego position at intervals of 1° or more around the circular or semi-circular region in front of and/or behind the ego vehicle, and where the length n of the displacement vector D_j is 1 to 360 for the circular geometry and 1 to 180 for the semi-circular geometry.
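One possible way to build such a displacement vector is sketched below. It is an interpretation, not the claimed procedure: it assumes lane-boundary points already expressed in an ego-centered frame (x forward, y to the left), bins them by relative bearing in steps of `step_deg`, and keeps the nearest distance per bin. The function name, the coordinate convention and the value used for empty bins are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

def displacement_vector(boundary_points: Sequence[Tuple[float, float]],
                        step_deg: int = 1,
                        semicircle: bool = True,
                        empty: float = 0.0) -> List[float]:
    """Distances to one lane boundary, indexed by relative bearing to the ego."""
    span = 180 if semicircle else 360
    vec: List[Optional[float]] = [None] * (span // step_deg)
    for x, y in boundary_points:
        bearing = math.degrees(math.atan2(y, x))  # 0 deg = straight ahead
        # Semi-circle: map bearings in [-90, 90) onto [0, 180); circle: [0, 360).
        shifted = bearing + 90.0 if semicircle else bearing % 360.0
        if not 0.0 <= shifted < span:
            continue  # point lies outside the represented region
        k = int(shifted // step_deg)
        d = math.hypot(x, y)
        if vec[k] is None or d < vec[k]:
            vec[k] = d  # keep the nearest boundary point per bearing bin
    return [empty if d is None else d for d in vec]

# Hypothetical samples of a left lane boundary 2 m beside the ego vehicle.
left_boundary = [(3.0, 2.0), (6.0, 2.0), (-1.0, 2.0)]
front = displacement_vector(left_boundary)                    # n = 180
around = displacement_vector(left_boundary, semicircle=False)  # n = 360
```

With 1° steps the vector has 180 entries for the semi-circle in front of the ego vehicle and 360 for the full circle, matching the lengths stated in the claim; the point behind the ego vehicle appears only in the circular variant.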

Training method according to any one of claims 1 to 9, wherein the longitudinal and lateral positions of the respective ego vehicles in step c) and step d) comprise or consist of acceleration and bearing values, preferably changes in acceleration and bearing values, which are applied to the respective ego vehicles in the time window t_i.

Computer system for training a traffic agent to navigate a road vehicle in a simulation environment, comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent acting as a decision maker in simulated driving situations using one or more neural networks with end-to-end modeling, stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent is configured to execute the computer-implemented training method according to any one of claims 1 to 10.

Computer system for simulating a road driving environment in driving situations for one or more vehicles, comprising or consisting of one or more processors, a memory device coupled to the one or more processors, and a traffic agent acting as a decision maker in simulated driving situations using one or more neural networks with end-to-end modeling, stored in the memory device and configured to be executed by the one or more processors, characterized in that the traffic agent has been trained according to the computer-implemented training method according to any one of claims 1 to 10 to predict, as an action, one or more vehicle control frames Ĉ_i containing longitudinal and lateral positions to be applied to a simulated vehicle in the simulation environment, where i is any number such that i ∈ [1, 2, ..., n] and where n represents the limit of frames driven.

Computer system for training a traffic agent according to claim 10 or 11, or for simulating a road traffic environment in a driving situation according to claim 12, wherein three or more neural networks are used, and wherein preferably at least some or all of the neural networks are combined in a branched architecture.

Computer system according to claim 13, wherein at least some or all of the neural networks are deep neural networks, wherein preferably at least some or all of the deep neural networks, independently of one another, comprise or consist of one, two or more layers, each layer independently having a number of neurons in the range of 1 to 512, and wherein further preferably the number of neurons differs from layer to layer within a deep neural network.