
NL2035943B1 - Socially-compliant automated driving in mixed traffic - Google Patents


Info

Publication number
NL2035943B1
Authority
NL
Netherlands
Prior art keywords
vehicle
cost value
future
determining
road user
Application number
NL2035943A
Other languages
Dutch (nl)
Inventor
Dong Yongqi
Van Arem Bartholomeus
Zhang Li
Farah Haneen
Original Assignee
Univ Delft Tech
Application filed by Univ Delft Tech
Priority to NL2035943A
Priority to PCT/NL2024/050537 (WO2025075500A1)
Application granted
Publication of NL2035943B1

Classifications

    • B60W60/0017: Planning or execution of driving tasks specially adapted for safety of other traffic participants
    • B60W30/12: Lane keeping
    • B60W60/00272: Planning or execution of driving tasks using trajectory prediction for other traffic participants relying on extrapolation of current movement
    • B62D15/025: Active steering aids, e.g. helping the driver by actively influencing the steering system after environment evaluation
    • G06N20/00: Machine learning
    • G06N3/092: Reinforcement learning
    • B60W2710/207: Steering angle of wheels
    • B60W2720/106: Longitudinal acceleration
    • B60W2754/20: Lateral distance (spatial relation or speed relative to objects)
    • B60W2754/30: Longitudinal distance (spatial relation or speed relative to objects)


Abstract

Methods and systems are disclosed for controlling a vehicle. The method comprises receiving or determining information about a current state of the vehicle, about a target state of the vehicle and about a current state of an other traffic participant. The method further comprises iteratively optimising a set of control parameters for the vehicle based on an overall cost value associated with the set of control parameter values. The optimisation comprises: determining a potential future state of the vehicle, based on the current state of the vehicle and the set of control parameter values; determining a first cost value using a first cost function, the first cost value being based on a difference between the potential future state and the target state of the vehicle; predicting a future state of the other traffic participant, based on the current state of the other traffic participant; determining a second cost value using a second cost function, the second cost value being based on a perceived risk posed by the vehicle in the potential future state of the vehicle to the other traffic participant in the future state of the other traffic participant; and determining the overall cost value associated with the set of control parameter values, based on the first cost value and the second cost value. Based on the optimised set of control parameters, control signals may be determined that configure the vehicle to adjust its speed and/or steering angle. + Fig. 1

Description

NL36996/Sv-TD
Socially-compliant automated driving in mixed traffic
Technical field
This disclosure relates to automated driving, and in particular, though not exclusively, to methods and systems for controlling an automated vehicle, and to a computer program product enabling a computer system to perform such methods.
Background
Fully autonomous vehicles on roads have been demonstrated to be beneficial to road safety and efficiency. However, the gradual development and deployment of automated vehicles (AVs) and advanced driver assistance systems (ADAS) at various levels results in mixed traffic conditions, where automated vehicles need to interact with human driven vehicles (HDVs). Thus, making automated vehicles’ behaviour understandable, expected, and accepted by human drivers through so-called social-aware driving models is critical for road safety and efficiency under various manoeuvres, especially challenging ones, e.g., driving on weaving sections, highly curved roads, and driving through roundabouts.
One approach to social-aware driving is to use a model-based method. For example, a game-theoretic decision-making approach can be combined with Model Predictive Control (MPC) under the dynamic bicycle model to build a complete architecture tackling scenarios such as lane changing, overtaking, etc. This approach requires estimation of model parameters for different environments and is not robust to different scenarios.
US 2021/0146984 A1 describes a method comprising an autonomous vehicle estimating the social value orientation of another vehicle and adapting to its driving style, in order to increase the efficiency and safety of the autonomous vehicle. However, this requires observing the other vehicle for some amount of time in order to obtain a reasonable estimate of the other vehicle's behaviour, and is therefore not robust and not easily applicable to situations other than those it has been trained for.
There is therefore a need in the art for a device and method that increase efficiency and safety of autonomous vehicles.
Summary
It is an aim of embodiments in this disclosure to provide a system and method for controlling a vehicle that avoids, or at least reduces the drawbacks of the prior art.
In an aspect, this disclosure relates to a computer-implemented method for controlling a vehicle. The method comprises receiving or determining information about a current state of the vehicle, receiving or determining information about one or more target states of the vehicle at one or more respective future time steps, and receiving or determining information about a current state of an other traffic participant. The method further comprises iteratively optimising a set of control parameters for the vehicle based on an overall cost value associated with the set of control parameter values. The optimisation comprises: determining one or more potential future states of the vehicle for each of the one or more future time steps, based on the current state of the vehicle and the set of control parameter values; determining a first cost value using a first cost function, the first cost value being based on a difference between the one or more potential future states of the vehicle and the one or more target states of the vehicle at the one or more future time steps; predicting a future state of the other traffic participant at the one or more future time steps, based on the current state of the other traffic participant; determining a second cost value using a second cost function, the second cost value being based on a perceived risk posed by the vehicle in the one or more future states of the vehicle to the other traffic participant in the respective one or more future states of the other traffic participant at the one or more future time steps; and determining the overall cost value associated with the set of control parameter values, based on the first cost value and the second cost value. Based on the optimised set of control parameters, control signals may be determined that configure the vehicle to adjust its speed and/or steering angle.
Thus, optimised control parameters may be determined that balance achievement of a target state for the ego vehicle (e.g., following a planned trajectory at optimal speed) with minimisation of a perceived risk (and hence, hindrance) for at least one other traffic participant. This is achieved by determining a future trajectory for the ego vehicle (for a given set of control parameters), and determining two cost values for this future trajectory: a first cost value J_ego that essentially quantifies the deviation from a reference trajectory for the ego vehicle, and a second cost value J_other that essentially quantifies the perceived risk or potential hindrance of the future trajectory to the other traffic participant. By taking both cost values into account, the advantage to the ego vehicle may be balanced against the disadvantage to the other traffic participant.
The ego vehicle is typically an automated vehicle. The other traffic participant can be a road user, such as a pedestrian or another vehicle. The other vehicle can be either an automated vehicle or a human driven vehicle. Hereinafter, the term “other vehicle” may be used to refer to any traffic participant, including those not commonly considered vehicles, such as pedestrians. The (ego) vehicle may be a road vehicle such as a car, e.g., a passenger car, a truck, a bus, et cetera. The vehicle can also be a waterborne or airborne vehicle, such as a vessel, a submarine, a plane, a drone, et cetera.
In an embodiment, determining the overall cost value comprises determining a weighted sum of the first cost value and the second cost value, preferably the overall cost value being determined by J_total = cos α · J_ego + sin α · J_other, wherein J_total represents the overall cost value, J_ego represents the first cost value, J_other represents the second cost value, and α represents a social value orientation score of a behaviour of the vehicle, wherein preferably 0° < α < 90°.
Balancing the cost to oneself with the cost to another based on a parametric angle α is also known as Social Value Orientation (SVO). Social value orientation is a known social psychology-derived approach, and is utilized to measure how individuals make the trade-off between personal benefits and the benefits to others. Different human drivers possess different priorities concerning safety, efficiency, and attitudes toward other vehicles, reflecting their different driving styles, e.g., aggressive and defensive. These driving styles may be encoded in a known manner based on the parameter α.
Furthermore, the needs of passengers of the ego vehicle may vary from time to time, and case by case. For example, for daily commuters and those in a hurry, the efficiency of their journey may be assigned a higher priority, whereas an elderly or sick person in the vehicle may place more weight on comfort level, and be more willing to give precedence to others to ensure safety. This can be achieved by adjusting the parameter α.
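By way of illustration, the SVO-based combination can be expressed in a few lines of Python; this is a minimal sketch, and the function and variable names below are illustrative assumptions rather than terms from this disclosure.

    import math

    def overall_cost(j_ego: float, j_other: float, alpha_deg: float) -> float:
        """Combine the ego cost and the other-road-user cost with an SVO angle alpha.

        alpha_deg = 0  -> purely egotistic (only the ego cost counts)
        alpha_deg = 90 -> purely prosocial (only the cost to the other counts)
        """
        alpha = math.radians(alpha_deg)
        return math.cos(alpha) * j_ego + math.sin(alpha) * j_other

    # The same candidate trajectory evaluated with two driving styles
    # (cost values are assumed to be similarly scaled, as noted below).
    j_ego, j_other = 2.0, 5.0
    print(overall_cost(j_ego, j_other, alpha_deg=15.0))  # more egotistic / aggressive
    print(overall_cost(j_ego, j_other, alpha_deg=60.0))  # more prosocial / defensive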
The first and the second cost values may have a similar scale (magnitude), or may be rescaled to have a similar scale.
In an embodiment, the first cost value is based on a difference between the current state of the vehicle and the future state of the vehicle, preferably on a difference between a current acceleration and a future acceleration and/or between a current steering angle and a future steering angle and/or between a current speed and a future speed and/or between a current heading and a future heading.
Passenger comfort can be increased by minimizing changes in vehicle velocity (speed and heading), and by minimizing jerk, i.e., changes in acceleration and steering angle. Minimising a cost function that grows with changes in one or more of the abovementioned quantities results in a smoother, more comfortable trajectory.
In an embodiment, the difference between the one or more future states of the vehicle and the corresponding one or more target states of the vehicle comprises one or more of: a lateral difference between a future position and a corresponding target position of the vehicle, a longitudinal difference between the future position and the corresponding target position of the vehicle, a difference between a future heading and a corresponding target heading of the vehicle, and a difference between a travelling distance between a current position and the future position and a corresponding target travelling distance.
Separating a difference between a predicted trajectory and a target (or reference) trajectory of the ego vehicle into a longitudinal component (along the target trajectory) and a lateral component (normal to the target trajectory) can decrease the computational load.
Moreover, this separation allows for different weights to be assigned to the different components; generally, a lateral deviation should be penalised more heavily, as it may result in the vehicle going outside its driving lane or even off-road.
In an embodiment, determining the second cost value comprises: determining a longitudinal distance between a future position of the vehicle and a future position of the other traffic participant along a predicted trajectory of the other traffic participant; determining a lateral distance between a future position of the vehicle and the predicted trajectory of the other traffic participant; and determining the second cost value based on the determined longitudinal distance and the determined lateral distance.
For example, the second cost value may decay as a polynomial function of the longitudinal distance. The second cost value may decay as an exponential function of the lateral distance. More in particular, the second cost value J_other may be proportional to

J_{other} \propto (s - s_0)^2 \exp(-d^2 / (2\sigma^2)),

wherein s represents the longitudinal distance, s_0 represents a reference longitudinal distance, d represents the lateral distance, and σ represents a lateral scaling factor. The lateral scaling factor σ may depend on, e.g., the longitudinal distance s, and may depend on the sign of the lateral distance d.
Thus, a risk field for the other traffic participant may be determined. Field-based planning and control can reduce the occurrence of hazards and is highly robust to different scenarios. Such a risk field has been shown to mimic human risk assessment.
In an embodiment, the second cost value is based on a combined mass of the vehicle and the other traffic participant and/or on a difference in velocity between the vehicle and the other traffic participant, preferably the second cost value being proportional to the combined mass of the vehicle and the other traffic participant and/or to the difference in velocity between the vehicle and the other traffic participant.
The seriousness of a collision may increase with the momentum and/or kinetic energy associated with the collision. Hence, the second cost function may take the momentum and/or the kinetic energy into account, associating a higher cost value with a higher momentum and/or kinetic energy. The momentum is proportional to the product of the combined mass of the vehicles and the magnitude of the velocity difference, whereas the kinetic energy is proportional to the product of the combined mass and the square of the magnitude of the velocity difference.
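A hedged sketch of such a severity ('gravity') weighting in Python, with illustrative numbers that are not taken from this disclosure:

    def momentum_factor(m_ego: float, m_other: float, dv: float) -> float:
        """Severity weight proportional to the combined mass times the relative speed."""
        return (m_ego + m_other) * abs(dv)

    def kinetic_energy_factor(m_ego: float, m_other: float, dv: float) -> float:
        """Severity weight proportional to the combined mass times the squared relative speed."""
        return (m_ego + m_other) * dv ** 2

    # Example: a 1500 kg car and a 2000 kg van closing at 5 m/s.
    print(momentum_factor(1500.0, 2000.0, 5.0))        # 17500.0
    print(kinetic_energy_factor(1500.0, 2000.0, 5.0))  # 87500.0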
In an embodiment, the information about the current state of the vehicle comprises one or more of: a current position of the vehicle with respect to a lane in which the vehicle is moving, a current heading of the vehicle with respect to the lane, a current position of the vehicle with respect to an external coordinate system, a current heading of the vehicle with respect to an external coordinate system, a current position of the vehicle with respect to a planned trajectory, a current heading of the vehicle with respect to a planned trajectory, a current speed or velocity of the vehicle, a current steering angle of the vehicle relative to the heading of the vehicle, and a mass of the vehicle.
These parameters may be used to determine a future trajectory of the ego vehicle, for example, using the so-called kinematic bicycle model and model predictive control (MPC), and/or to determine the severity of a collision between the ego vehicle and the other vehicle.
In an embodiment, the information about the target state of the vehicle comprises one or more of: a target position of the vehicle with respect to a lane in which the vehicle is moving, a target heading of the vehicle with respect to the lane, a target position of the vehicle with respect to an external coordinate system, a target heading of the vehicle with respect to an external coordinate system, a target position of the vehicle with respect to a planned trajectory, a target heading of the vehicle with respect to a planned trajectory, a target speed or velocity of the vehicle, a target steering angle of the vehicle relative to the heading of the vehicle.
These parameters may be used to determine, e.g., a deviation from the planned trajectory.
In an embodiment, the information about the other traffic participant comprises one or more of: a position of the other traffic participant with respect to the vehicle, a heading of the other traffic participant with respect to the vehicle, a speed or velocity of the other traffic participant, a steering angle of the other traffic participant, a size of the other traffic participant, a vehicle type of the other traffic participant, and a mass of the other traffic participant.
These parameters may be used to determine a future trajectory of the other traffic participant, for example using the so-called kinematic bicycle model (if the other traffic participant is a road vehicle), and/or to determine a severity of a collision between the ego vehicle and the other traffic participant.
It is noted that for an automated vehicle, the entire optimisation is preferably done essentially in real time. This may be understood as the system having a total reaction time that is at least on par with that of a human driver. This includes data acquisition, data (pre)processing, optimisation, model solving, and actuating the results. Hence, the methods as described herein may strike a balance between accuracy and computational burden.
In general, there are several challenges related to mixed traffic comprising both automated vehicles and human driven vehicles. The first one is to ensure the safety and comfort of all users on the road. It is important to understand the intention of human drivers correctly and to cooperate with the human driven vehicles accordingly. Machines and humans do not understand danger/risk in the same way. Thus, for automated vehicles to cooperate better with human drivers, they need to “think” more like humans and anticipate possible dangers in order to interact with other human-driven vehicles safely.
Furthermore, for social-aware driving, the automated vehicle's objective should be to balance its own benefits and the benefits of other vehicles. The automated vehicle may consider the different driving styles and characteristics of human drivers, thus making the automated vehicle accepted by human driven vehicles. Different human drivers possess different priorities concerning safety, efficiency, and attitudes toward other vehicles, reflecting their different driving styles, e.g., aggressive and defensive. Similarly, the driving style of an automated vehicle may be determined by the needs of its passengers, which may vary from time to time, and case by case. For example, for daily commuters and those in a hurry, the efficiency of their journey typically has a high priority. In contrast, an elderly or sick person in the vehicle will probably place more weight on comfort level and be more willing to give precedence to others to ensure safety.
To tackle these challenges, this disclosure describes an integrated social-aware planning and control algorithm. The algorithm comprises a first model to determine a cost for efficient route planning for the automated vehicle. This first model can be based, e.g., on Model Predictive Contouring Control (MPCC). The algorithm comprises a second model to estimate the risk posed to others by a proposed trajectory. This second model can be based, e.g., on the Driver's Risk Field (DRF) model, which allows modelling of the surrounding drivers' perceived risk when interacting with the automated vehicle. A third model balances the advantages for the automated vehicle with the risk to the other road user. The third model can be based, e.g., on Social Value Orientation (SVO). Social Value Orientation is a social psychology-derived approach, which can be utilized to measure how individuals make the trade-off between personal benefits and the benefits to others.
The integration of a weighted risk to other vehicles into a model predictive control (MPC) framework allows the integration of both planning and control. This integration avoids approaching the motion planning and feedback control hierarchically, and therefore brings more stability to the system. By changing the parameter(s) of the third model, different driving styles may be implemented, i.e., egotistic and prosocial. An egotistic vehicle will minimise any increase in its own cost, whereas a prosocial vehicle will accept a minor increase in its own cost, or surrender part of its benefits, to reduce the danger to other vehicles. For example, when approaching an intersection at roughly the same time as another vehicle, an egotistic vehicle may accelerate and take precedence, whereas a prosocial vehicle may decelerate and give way.
The methods according to embodiments described herein can also handle complex manoeuvres, e.g., driving through roundabouts (both single-lane and two-lane) with large curvature, which is one of the most accident-prone scenarios.
In an embodiment, the first cost value, the second cost value, and/or the overall cost value is determined using a model with trained hyperparameters, wherein the trained hyperparameters have been trained by iteratively performing the steps of: selecting a set of hyperparameters of the model; simulating an environment comprising the vehicle and the other traffic participant; determining input parameters for the model based on the simulated environment; determining the optimised set of control parameters using the model with the selected set of hyperparameters; updating the simulated environment, based on the optimised set of control parameters; and determining a reward associated with the hyperparameters, based on the updated simulated environment. Training the model may further comprise selecting a set of trained hyperparameters based on the rewards associated with the hyperparameters.
Such a training method may also be referred to as deep reinforcement learning. Deep reinforcement learning may result in a well-trained model.
In an aspect, this disclosure relates to a method of training one or more first hyperparameters of a first cost function for determining a first cost value associated with a set of control parameter values, the first cost value being based on a difference between one or more potential future states of a vehicle and one or more target states of the vehicle at one or more future time steps, and/or one or more second hyperparameters of a second cost function for determining a second cost value associated with the set of control parameter values, the second cost value being based on a perceived risk posed by the vehicle in the one or more future states of the vehicle to another traffic participant in respective predicted one or more future states of the other traffic participant at the one or more future time steps, and/or one or more third hyperparameters of a third cost function for determining an overall cost value associated with the set of control parameter values, the overall cost value being based on the first cost value and the second cost value. The method comprises iteratively performing the steps of: selecting a set of first, second, and/or third hyperparameters; simulating an environment comprising the vehicle and the other traffic participant; determining input parameters for the model based on the simulated environment; determining the optimised set of control parameters using the first, second, and/or third cost function with the selected set of first, second, and/or third hyperparameters; updating the simulated environment, based on the optimised set of control parameters; and determining a reward associated with the selected set of first, second, and/or third hyperparameters, based on the updated simulated environment. The method further comprises determining a set of trained first, second, and/or third hyperparameters based on the rewards associated with the first, second, and/or third hyperparameters.
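A minimal sketch of such a training loop is given below. The simulator interface (simulate_episode) and the hyperparameter names are assumptions for illustration only; in a full deep-reinforcement-learning setup the random selection would be replaced by a learned policy that proposes hyperparameters based on the observed rewards.

    import random

    def train_hyperparameters(simulate_episode, n_iterations: int = 100):
        """Search for cost-function hyperparameters that maximise a simulation reward.

        simulate_episode(params) is assumed to run the controller with the given
        hyperparameters in a simulated mixed-traffic environment, update that
        environment with the optimised control parameters, and return a scalar
        reward (e.g., combining progress, comfort and safety margins).
        """
        best_params, best_reward = None, float("-inf")
        for _ in range(n_iterations):
            # Select a candidate set of hyperparameters (here: plain random search).
            params = {
                "q_c": random.uniform(0.1, 10.0),        # contouring-error weight
                "q_l": random.uniform(0.1, 10.0),        # lag-error weight
                "q_o": random.uniform(0.1, 10.0),        # orientation-error weight
                "alpha_deg": random.uniform(0.0, 90.0),  # SVO angle
            }
            reward = simulate_episode(params)
            if reward > best_reward:
                best_params, best_reward = params, reward
        return best_params, best_reward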
In an aspect, this disclosure relates to a control system for an automated vehicle comprising a processor and a computer readable storage medium storing executable program code, communicatively coupled to the processor, wherein responsive to executing the executable program code, the processor is configured to perform executable operations, the executable operations comprising a method as described above.
In an embodiment, the control system further comprises one or more sensors communicatively coupled to the processor for providing information about the current state of the vehicle and/or information about the target state of the vehicle and/or information about the current state of the other vehicle.
In an embodiment, the control system further comprises one or more actuators communicatively coupled to the processor, the one or more actuators being configured to, in response to receiving the control signals, adjust the vehicle’s speed and/or steering angle.
In an aspect, this disclosure relates to an automated vehicle, comprising a control system as described above.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system”. Functions described in this disclosure may be implemented as an algorithm executed by a microprocessor of a computer. Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied, e.g., stored, thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fibre, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fibre, cable,
RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including a functional or an object oriented programming language such as Java, Scala, C++, Python or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer, server or virtualized server.
In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.
These computer program instructions may be provided to a processor, in particular a microprocessor or central processing unit (CPU), or graphics processing unit (GPU), of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer, other programmable data processing apparatus, or other devices create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Brief description of the drawings
The embodiments will be further illustrated with reference to the attached schematic drawings, in which:
Fig. 1 schematically depicts a method according to an embodiment;
Fig. 2 illustrates a kinematic bicycle model as may be used in an embodiment;
Fig. 3A and 3B illustrate a model predictive contouring model as may be used in an embodiment;
Fig. 4A-C illustrate a driver’s risk field model as may be used in an embodiment;
Fig. 5 illustrates a social value orientation model as may be used in an embodiment;
Fig. 6 schematically depicts a method according to an embodiment;
Fig. 7A-G are graphs showing experimental results obtained with embodiments of the invention;
Fig. 8 is a flowchart of a method according to an embodiment;
Fig. 9 depicts a block diagram of a system according to an embodiment; and,
Fig. 10 depicts a block diagram illustrating an exemplary data processing system configured to perform one or more method steps according to an embodiment.
Detailed description
In Kolekar et al., ‘Human-like driving behaviour emerges from a risk-based driver model’, Nature Communications 11 (2020) art. 4850, pp. 1-13, the authors developed a Driver's Risk Field (DRF) model to quantify the risks perceived by drivers. By coupling the Driver's Risk Field to a controller that maintains the perceived risk below a predefined threshold, they generated human-like driving behaviour. The required model parameters of the human driver were obtained through simulation. In addition, the model does not require real-time parameter estimation, improving the robustness regarding different environments.
Although field-based planning and control can reduce the occurrence of hazards and is highly robust to different scenarios, little consideration is given to social cooperation and the impact of different driving styles on social compliance to surrounding human driven vehicles.
On the other hand, the capability of Model Predictive Control-based controllers to handle multiple-input multiple-output (MIMO) systems with various constraints makes these controllers particularly suitable for real-world automated (or even autonomous) vehicle planning and control.
MPC methods assume a finite look-ahead horizon for which control signals are calculated to optimize an objective function. MPC allows direct planning and control of the vehicle, whether driving on the highway or parking in low-speed scenarios, with different prediction models. Current methods tend to see the human driven vehicle simply as an obstacle, and an optimization goal of the MPC is to move away from the obstacle on the highway. This can lead to unexpected scenarios where vehicles are seen as dangerous objects even if they are driving in the same direction with no conflicts (e.g., on a parallel lane), and it is hard to tackle uncertain environments such as an intersection. To improve on these models, the partially observable Markov decision process (POMDP) has been employed for decision-making before using the MPC based model, allowing it to handle more uncertain scenarios.
Furthermore, several models have been developed in the art that optimise for multiple vehicles simultaneously. However, these models only enable cooperation between connected automated vehicles, and they generally struggle in the presence of other users on the road. In particular, they fail to deliver social-aware driving.
Thus, one disadvantage of MPC based models is that it is difficult to take into account the risks faced by other vehicles on the road, while using the aforementioned social cooperation-based path planning method alone can result in a less flexible and less reliable path.
There is therefore a need in the art for a social-aware driving algorithm that can safely control the motion of a vehicle even in complex manoeuvres, such as driving through a roundabout, while being able to handle potential conflicts with surrounding human driven vehicles and considering different levels of interests of other road users.
Fig. 1 schematically depicts a method according to an embodiment. The method optimizes control parameters of a vehicle, based on a planned trajectory of the vehicle and taking into account effects on another vehicle. The vehicle for which the control parameters are being optimized may be referred to as the “ego vehicle”. The vehicle is typically an automated vehicle, e.g., a car, more specifically a passenger car. The other vehicle may be any type of vehicle, e.g., a car, a truck, a motorcycle, a bicycle, et cetera. The other vehicle can be an automated vehicle (AV) or a human-driven vehicle (HDV).
Thus, the method comprises various parts to determine a total cost value for a given set of control parameters 102. Each of these parts will be discussed in more detail below.
First, there is a vehicle model 120 that determines a response of the vehicle to a set of potential control parameter values. In combination with a current state of the vehicle 130, this results in a (potential) future position or future trajectory of the vehicle 104. Subsequently, for each potential position or trajectory, a first cost function determines a first cost value based on the planned trajectory 106, and a second cost function determines a second cost value based on the (predicted) position or trajectory of the other vehicle 108. Finally, the first and second cost values are combined into a total or overall cost value 110, balancing the optimal trajectory for the ego vehicle with the potential risk or discomfort for the other vehicle. By minimizing the cost value, a set of output control parameters 112 may be obtained.
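The sketch below mirrors this pipeline in Python under simplifying assumptions; all callables passed in are hypothetical placeholders for the models discussed in the remainder of this description, not interfaces defined by it. A candidate control sequence is rolled out with the vehicle model, scored against the reference trajectory and against the predicted trajectory of the other vehicle, and the two scores are combined into the overall cost that a numerical optimiser would minimise.

    import math
    from typing import Callable, Sequence

    def total_cost(controls: Sequence,      # candidate control inputs over the horizon
                   ego_state,               # current state of the ego vehicle
                   reference_trajectory,    # target/reference states per time step
                   other_trajectory,        # predicted states of the other road user
                   step_model: Callable,    # vehicle model: (state, control) -> next state
                   ego_cost: Callable,      # per-step cost w.r.t. the reference (J_ego term)
                   other_cost: Callable,    # per-step perceived-risk cost (J_other term)
                   alpha: float) -> float:  # SVO angle in radians
        """Evaluate the overall cost of one candidate control sequence."""
        j_ego = j_other = 0.0
        state = ego_state
        for k, u in enumerate(controls):
            state = step_model(state, u)                       # potential future ego state
            j_ego += ego_cost(state, reference_trajectory[k])  # deviation from the target state
            j_other += other_cost(state, other_trajectory[k])  # perceived risk to the other
        return math.cos(alpha) * j_ego + math.sin(alpha) * j_other

    # An external optimiser (e.g., scipy.optimize.minimize over the flattened control
    # sequence) would call total_cost repeatedly and return the optimised controls.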
Various models can be used to model the vehicle, for example the Two-Track Model or the Kinematic Ackermann Model (KAM). In general, a vehicle model comprises a state X(t) and a control input U(t), where the evolution of the state is governed by the control parameters, i.e., X(t + Δt) = f(X(t), U(t)). Typically, this model is discretised over time, such that X_{i+1} = f(X_i, U_i). In the current application, the vehicle state typically comprises at least a position (x, y) and a velocity (\dot{x}, \dot{y}), or equivalent parameters. Here, a dot denotes a time derivative. For many vehicles, it is more natural to express the change in position as a heading ψ and a speed v (i.e., essentially in polar coordinates), which may be related, for example, as follows:

\dot{x} = v \cos\psi, \qquad \dot{y} = v \sin\psi
v = \sqrt{\dot{x}^2 + \dot{y}^2}, \qquad \psi = \mathrm{arctan2}(\dot{y}, \dot{x})
For such vehicles, the speed can typically be influenced by accelerating/decelerating, i.e. by action of the engine(s) and/or brakes, whereas the heading is typically influenced by steering.
Although the examples herein are typically restricted to two spatial coordinates, it is noted that for some vehicles, motion is essentially one-dimensional (e.g., rail vehicles), whereas for other vehicles, motion is essentially three-dimensional (e.g., submarines, planes, drones). For such vehicles, the vehicle state may be defined in, respectively, one or three spatial dimensions.
A much-used example of a vehicle model is the so-called kinematic bicycle model, in which a vehicle is described by a state X_i = [x_i, y_i, ψ_i, v_i], whose evolution is governed by an input state U_i = [a_i, δ_i]. Here, (x_i, y_i) represents the position of the centre of mass of the vehicle (in some initial Cartesian coordinate system) at time step i, ψ_i and v_i represent the inertial heading and speed, respectively, a_i represents the acceleration, and δ_i represents the steering angle of the front wheels.
The kinematic bicycle model may be given by:

\dot{x} = v \cos(\psi + \beta), \qquad \dot{y} = v \sin(\psi + \beta)    (1)
\dot{\psi} = \frac{v}{l_r} \sin\beta, \qquad \dot{v} = a    (2)

where β is the slip angle of the current velocity of the centre of mass with respect to the longitudinal axis of the vehicle, given by:

\beta = \tan^{-1}\left( \frac{l_r}{l_r + l_f} \tan\delta \right)    (3)

Here, l_r and l_f are the distances between the centre of mass of the vehicle and the rear and front axle, respectively. These parameter values may be stored in a memory of the vehicle (based on, e.g., an empty vehicle), or they may be determined dynamically based on, e.g., front wheel load and rear wheel load.
Despite its name, the kinematic bicycle model can also be applied to other vehicles such as cars. Several variations are known in the art, e.g., based on the reference position of the vehicle (centre front axle, centre rear axle, centre of gravity), and the presence or absence of front-wheel steering and rear-wheel steering. These variations lead to slightly different equations. Some models are further simplified by neglecting the slip angle β (such that β disappears from eq. (1) and the change in heading reduces to \dot{\psi} = v \tan\delta / (l_r + l_f)). Other, more complicated models may also include, e.g., wheel slip. In principle, any suitable vehicle model may be used. A more detailed description of the kinematic bicycle model is given below with reference to Fig. 2.
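For illustration, a discretised forward-Euler step of this kinematic bicycle model could be written as follows; the integration scheme, axle distances and time step are assumptions chosen for the sketch, not values prescribed by this disclosure.

    import math

    def bicycle_step(x, y, psi, v, a, delta, l_f=1.2, l_r=1.6, dt=0.02):
        """One Euler step of the kinematic bicycle model, eqs. (1)-(3).

        (x, y): centre-of-mass position, psi: heading, v: speed,
        a: acceleration input, delta: front-wheel steering angle input,
        l_f / l_r: distance from the centre of mass to the front / rear axle.
        """
        beta = math.atan(l_r / (l_f + l_r) * math.tan(delta))  # slip angle, eq. (3)
        x += v * math.cos(psi + beta) * dt                     # eq. (1)
        y += v * math.sin(psi + beta) * dt
        psi += v / l_r * math.sin(beta) * dt                   # eq. (2)
        v += a * dt
        return x, y, psi, v

    # Example: driving at 10 m/s with a small steering input and mild acceleration.
    print(bicycle_step(0.0, 0.0, 0.0, 10.0, a=0.5, delta=0.05))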
In general, it is an aim of the models described herein to determine optimised control inputs. Hence, the output of the full model depends on the vehicle model being used. In the current example, the control inputs are the acceleration a and the steering angle δ.
As discussed, for the above-described kinematic bicycle model, the input parameters 130 are the current position (x, y), the current speed v, and the current heading ψ. These parameters may be obtained from, e.g., a navigation module, which may use, e.g., satellite navigation. Additionally or alternatively, other inputs may be used, e.g., the vehicle's speedometer, cameras, LIDAR systems, RADAR systems, et cetera. These systems may determine the position and velocity of the vehicle with respect to, e.g., driving lanes, the road surface, et cetera. Other vehicle models may use additional or different input parameters, such as each wheel's angle with respect to the vehicle frame, roll, pitch, and yaw sensors, et cetera.
Based on the vehicle model 120 and the current vehicle state 130, a future vehicle state may be predicted 104 for any given set of control parameters. As will be discussed in more detail below, instead of a single future vehicle state, a sequence of future vehicle states may be determined. A group of one or more vehicle states will be referred to as a vehicle trajectory.
Based on the predicted vehicle trajectory and a reference trajectory 132, a cost value may be determined 106, using a cost function that associates a cost with a difference between the predicted trajectory and the reference trajectory. The reference trajectory may be determined based on the current state of the vehicle and a planned path obtained from, e.g. a navigation module and a lane detection module. For example, the navigation module may provide high-level trajectory planning information on which lane is to be driven in, and the lane detection module may determine a position within the lane.
An example of such a cost function is provided by a path-following model, e.g., a model known as model predictive path-following control (MPFC). Another example of such a cost function is a nonlinear MPC model, e.g., a model predictive contouring control (MPCC) model. In this model, a longitudinal error, a lateral error, and an orientation error are determined (estimated) as illustrated in Fig. 3A-C. A reference position (x_ref, y_ref) may be determined, e.g., based on the current position (x, y) of the vehicle, the planned path, and a reference speed v_ref > 0. The reference speed may be thought of as an ‘ideal speed’, and may be determined based on local circumstances, e.g., the local maximum speed, current visibility, curvature of the road, et cetera. The reference heading ψ_ref is typically chosen as the tangent of the path at the reference position.
In an example, the respective errors may be defined (in some Cartesian coordinate system) as:

E_c = -(x - x_{ref}) \sin\psi_{ref} + (y - y_{ref}) \cos\psi_{ref}
E_l = (x - x_{ref}) \cos\psi_{ref} + (y - y_{ref}) \sin\psi_{ref}
E_o = 1 - (\cos\psi \cos\psi_{ref} + \sin\psi \sin\psi_{ref})

Here, (x, y) is the actual position of the (estimated) centre of mass of the ego vehicle, (x_ref, y_ref) is the corresponding reference position, ψ is the actual heading, and ψ_ref is the reference heading. E_c is the lateral error (also known as the contouring error), which indicates how far the vehicle deviates from the planned path, E_l is the longitudinal error indicating how far the vehicle is behind, and E_o is the orientation error. Evidently, the choice of coordinate system is arbitrary. In some implementations, a coordinate system centred on the ego vehicle may be used (such that x, y, and ψ are always zero). Instead of the centre of mass of the vehicle, a different reference point may be used, e.g., the front of the vehicle.
A cost function may be defined as a weighted sum of these errors:

J_{cont} = q_c E_c + q_l E_l + q_o E_o

where q_c, q_l, and q_o are the respective weights for the lateral, longitudinal, and orientation errors. These weights 122 (and any other cost function parameters, when applicable) are typically stored in a memory. These weights may be trained using a deep reinforcement learning method (DRL), which will be explained in more detail below with reference to Fig. 6.
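A direct transcription of these error terms and their weighted sum is sketched below. The weight values are illustrative assumptions, and whether the errors are squared before weighting is left open here; the sketch follows the weighted sum as written above.

    import math

    def contouring_errors(x, y, psi, x_ref, y_ref, psi_ref):
        """Lateral (contouring), longitudinal (lag) and orientation errors."""
        e_c = -(x - x_ref) * math.sin(psi_ref) + (y - y_ref) * math.cos(psi_ref)
        e_l = (x - x_ref) * math.cos(psi_ref) + (y - y_ref) * math.sin(psi_ref)
        e_o = 1.0 - (math.cos(psi) * math.cos(psi_ref) + math.sin(psi) * math.sin(psi_ref))
        return e_c, e_l, e_o

    def j_cont(x, y, psi, x_ref, y_ref, psi_ref, q_c=5.0, q_l=1.0, q_o=1.0):
        """Weighted sum of the errors; the lateral error is typically weighted most heavily."""
        e_c, e_l, e_o = contouring_errors(x, y, psi, x_ref, y_ref, psi_ref)
        return q_c * e_c + q_l * e_l + q_o * e_o

    # Example: vehicle slightly to the left of and behind the reference point, heading east.
    print(j_cont(x=0.0, y=0.5, psi=0.0, x_ref=1.0, y_ref=0.0, psi_ref=0.0))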
In many applications, a so-called progress variable θ is added, which measures the arc length of the planned path. This results in a cost function

J = q_c E_c + q_l E_l + q_o E_o - q_v \theta

or

J = q_c E_c + q_l E_l + q_o E_o - q_v \dot{\theta}

In some applications, θ̇ may be implemented as an optimisable control parameter.
In order to improve the forward visibility of the vehicle, a so-called look-ahead error may be introduced, which corresponds to a contouring error at a reference point further away, referred to as the look-ahead point (x_la, y_la) with associated look-ahead heading ψ_la:

E_{la} = -(x - x_{la}) \sin\psi_{la} + (y - y_{la}) \cos\psi_{la}

Together with the previously defined cost function, this leads to a cost function:

J = q_c E_c + q_l E_l + q_o E_o + q_{la} E_{la} - q_v \dot{\theta}
In general, results may be further improved using model predictive control. This is an optimisation technique wherein, in each optimisation step, a plurality of future time steps is optimised. A model defining a relation between control parameters and system state is used to predict the future behaviour of the system. Thus, in each time step, the first optimised time step is implemented, and the following time steps are reoptimized based on updated system inputs. This generally leads to more stable behaviour and may reduce, e.g., overcorrection due to system inertia. In many applications, each step in the plurality of steps has the same weight, but in other applications, different (generally decreasing) weights may be used. This way, it is possible to account for an increase in uncertainty in the predicted system behaviour. Again, the weights may be learned through deep reinforcement learning to match the properties of ego vehicle with different driving styles and conditions.
Application of model predictive control to the above-defined model results in a cost function:

J_{MPCC} = \sum_{k=1}^{N_p} \left( q_c E_{c,k} + q_l E_{l,k} + q_o E_{o,k} + q_{la} E_{la,k} \right) - \sum_{k=1}^{N_p} q_v \dot{\theta}_k    (4)

where k represents a time step, N_p represents the so-called step horizon, and a subscript k on a parameter represents that parameter for the system state at the corresponding time step.
In a typical embodiment, N_p may be between about 5 and about 50, for instance about 10-20, e.g., about 15 (within a certain range, the larger N_p is, the more accurate the control will be, but the more computational time will be required), and Δt may correspond to a time step of about 5-50 ms, e.g., about 20 ms. The look-ahead time is typically larger than the optimisation horizon, i.e., Δt_la > N_p Δt. Depending on the implementation, the look-ahead position and orientation may be the same for all optimisation time steps, or may increase with the time steps.
In order to increase comfort for the passengers in the vehicle, a balance may be found between limiting changes in acceleration and steering angle on the one hand and maximising path accuracy and speed on the other hand. Depending on the control variables, changes in other quantities may be similarly limited. If the vector of control variables at step k is represented by u_k, an additional term may be included in the cost function, e.g.,

J_{comf} = \| u_{t+\Delta t} - u_t \| \qquad \text{or} \qquad J_{comf} = \sum_{k=1}^{N_p} \| u_{k+1} - u_k \|

for implementations without and with model predictive control, respectively. Here, ‖·‖ represents a suitable norm, e.g., the 2-norm.
The above cost functions may be combined to yield a total cost function for the ego vehicle, e.g.,

J_{ego} = \beta_1 J_{MPCC} + \beta_2 J_{comf}

where β_1 and β_2 are weight factors to adjust the relative weights of path optimisation and passenger comfort. These weights may be related via, e.g., β_2 = 1 - β_1, or β_1 = cos θ, β_2 = sin θ. The weights may be fixed or adjustable, based on, e.g., passenger input.
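A sketch of the comfort term and the combined ego cost, assuming the 2-norm and illustrative weights (neither is prescribed above):

    import math

    def j_comf(controls):
        """Penalise changes between consecutive control inputs (acceleration, steering angle)."""
        cost = 0.0
        for (a0, d0), (a1, d1) in zip(controls[:-1], controls[1:]):
            cost += math.hypot(a1 - a0, d1 - d0)  # 2-norm of the change in control input
        return cost

    def j_ego(j_mpcc, controls, beta1=0.8, beta2=0.2):
        """Balance path/progress cost against passenger comfort."""
        return beta1 * j_mpcc + beta2 * j_comf(controls)

    # Example over a short horizon of (acceleration, steering angle) pairs.
    controls = [(0.5, 0.00), (0.6, 0.02), (0.4, 0.01)]
    print(j_ego(j_mpcc=3.2, controls=controls))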
In addition to an ego cost value 106, which is used to optimise the control parameters without consideration of other vehicles, a cost value is determined 108 that quantifies the effect of varying the control parameters on another vehicle. The other vehicle can be any type of vehicle that may be encountered by the ego vehicle, e.g., a car, truck, motorcycle, bicycle, et cetera, and may include, in this context, other traffic participants such as pedestrians. The other vehicle can be an autonomous vehicle or a human-driven vehicle. In some embodiments, the cost value for the other vehicle is computed for each of the (zero or more) other vehicles within a predetermined range from the ego vehicle.
The cost for the other vehicle may be based on a perceived risk posed by the ego vehicle to the other vehicle. This perceived risk typically depends both on the state and future trajectory of the ego vehicle and on the state and future trajectory of the other vehicle.
Therefore, the method typically comprises receiving a vehicle model 124 of the other vehicle.
This can be the same model as for the ego vehicle (e.g., the same kinematic bicycle model), or a different model. The model may depend on the type of the other vehicle; for example, different models may be used for cars and pedestrians. The vehicle model may be limited by the availability of parameters describing the state of the other vehicle, and by the accuracy with which the parameters may be estimated.
Typically, the parameters 134 describing the other vehicle comprise at least the position of the other vehicle, its orientation, and its speed. These parameters may be determined relative to the ego vehicle or relative to some external (inertial) reference frame.
If the ego vehicle and the other vehicle share a data connection, the parameters may be shared via the data connection. Otherwise, the parameters may be obtained using one or more sensors, such as, auditory sensors, LIDAR, RADAR, cameras, vehicle connectivity (vehicle-to-vehicle, vehicle-to-infrastructure communication), et cetera, and appropriate signal processing software. Other potentially relevant parameters include steering angle and/or radius of curvature if the other car is not driving in a straight line, acceleration, vehicle size, vehicle type, and vehicle mass.
The perceived risk may be modelled using the so-called Driver's Risk Field (DRF) model 114, which computes a risk based on the position of the ego vehicle relative to the (predicted) trajectory of the other vehicle. The longitudinal distance (along the trajectory) and the lateral distance (in a direction normal to the trajectory) are typically treated separately. In particular, the risk may decay as a polynomial function of the longitudinal distance and as an exponential function of the lateral distance.
In a typical embodiment, the risk ρ is proportional to

\rho \propto (s - s_0)^2 \exp(-d^2 / (2\sigma^2)),    (5)

wherein s represents the longitudinal distance between the other vehicle and the ego vehicle, s_0 represents a reference longitudinal distance, d represents the lateral distance between the other vehicle and the ego vehicle, and σ represents a factor that scales the width of the risk field (which may be different for the left and right sides of the trajectory). In the embodiment of eq. (5), σ represents the width (standard deviation) of the Gaussian distribution.
The reference longitudinal distance $s_0$ may depend on the speed of the other vehicle, e.g., as $s_0 = v_{other}\, t_{la}$, where $v_{other}$ is the speed of the other vehicle and $t_{la}$ is a look-ahead time. This look-ahead time is, in principle, unrelated to the look-ahead distance used in the MPCC model described above. The look-ahead time may depend on the type of vehicle; e.g., a heavy vehicle such as a truck may have a larger look-ahead time, taking into account the longer braking distance. The risk may be scaled with a scaling factor $p$ which represents, in a sense, the risk-averseness of the driver of the other vehicle (i.e., a higher value of $p$ corresponds to a higher perceived risk), leading to:

$$h(s, v_{other}) = p\,(s - v_{other}\, t_{la})^2,$$

wherein $h$ represents the height of the risk field, and hence the risk-averseness of the driver.
As noted, the variable $d$ represents the (lateral) distance between the centre of mass of the ego vehicle and the current trajectory of the other vehicle. If the other vehicle is making a turn, $d$ may be given by:

$$d = \sqrt{(x - x_c)^2 + (y - y_c)^2} - R,$$

where $(x_c, y_c)$ is the centre of the curve and $R$ the radius of curvature; $(x, y)$ is again the position of the centre of mass of the ego vehicle. The variable $\sigma_i$ represents the width of the risk field. When the other vehicle is not driving straight (but making a curve), the width may be different on the inside and outside of the curve. For example, one may define:

$$\sigma_i = (\mu + k_i\,|\delta_{other}|)\, s + c, \qquad i \in \{\text{inner}, \text{outer}\},$$

wherein $\mu$ represents the slope of the widening of the driver's risk field when driving straight and $c$ represents a base width of the driver's risk field. The base width may depend on, e.g., the width and type of the other vehicle. The parameters $k_i$ allow for (side-dependent) deviations in shape when steering, where $\delta_{other}$ represents the steering angle of the other vehicle (it is noted that "inner" and "outer" are only defined for $\delta_{other} \neq 0$, but for $\delta_{other} = 0$, the contribution of $k_i$ vanishes).
Thus, in this example, the risk may be given by:

$$DRF_{other} = h(s, v_{other})\, \exp\!\left(-\frac{d^2}{2\sigma_i^2}\right), \qquad \text{with } h(s, v_{other}) = p\,(s - v_{other}\, t_{la})^2.$$
As noted, $p$, $\mu$, $k_1$, $k_2$, $t_{la}$, and $c$ are parameters representing the behaviour of the driver of the other vehicle. These parameters 126 can be fixed (e.g., based on experiments or literature), trained (e.g., using a deep reinforcement learning method as described with reference to Fig. 6), or estimated (e.g., based on car type: one might assume a different profile for sports cars than for family cars; $c$ may depend on an estimated width of the other car, $t_{la}$ may depend on an estimated braking distance of the other car, et cetera). Estimated parameter values can also (explicitly) depend on other circumstances; e.g., $p$ and $\mu$ might depend on visibility, e.g., having higher values with lower visibility; trained values may implicitly have a similar dependence, based on the training scenarios.
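As an illustration, a minimal Python sketch of the driver's risk field of the other vehicle is given below. The function name, the argument layout, and the convention used to select the inner or outer width are assumptions for illustration only; the default parameter values are the example values discussed later with reference to Fig. 4A-D.

```python
import numpy as np

def drf_other(s, d, v_other, delta_other,
              p=0.0064, t_la=3.0, mu=0.01, k_inner=0.05, k_outer=0.5, c=1.0):
    """Sketch of the perceived risk DRF_other at longitudinal distance s and
    lateral distance d from the other vehicle's predicted trajectory."""
    # Parabolic height: largest near the other vehicle (s = 0) and decaying
    # to zero at the reference distance s0 = v_other * t_la.
    h = p * (s - v_other * t_la) ** 2

    # Width grows with s; when steering, the inner and outer sides of the
    # curve widen at different rates (assumed sign convention: d < 0 is the
    # inner side of the curve).
    k = k_inner if (d < 0 and delta_other != 0) else k_outer
    sigma = (mu + k * abs(delta_other)) * s + c

    # Gaussian decay in the lateral direction.
    return h * np.exp(-d ** 2 / (2.0 * sigma ** 2))
```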
The driver's risk field model is discussed in more detail below with reference to Fig. 4A-4C. A more complete overview of the driver's risk field is provided by S.B. Kolekar, 'Driver's risk field: A step towards a unified driver model,' PhD thesis (TU Delft, 2021), which is hereby incorporated by reference in its entirety.
The severity of a collision increases with the mass of the vehicles involved in the collision, and furthermore increases with the difference in velocity between the involved vehicles. Therefore, the collision risk may be multiplied by a 'gravity' (severity) factor based on, e.g., the momentum exchange or the kinetic energy of a collision, which may be given by, respectively:

$$I = (m + m_{other})\,\big|\vec{v} - \vec{v}_{other}\big|$$

$$I = \tfrac{1}{2}\,(m + m_{other})\,\big|\vec{v} - \vec{v}_{other}\big|^2$$

where $\vec{v}$ and $\vec{v}_{other}$ represent the velocities of the ego vehicle and the other vehicle, respectively, i.e., $\vec{v} = [\dot{x}, \dot{y}]^T$ and $\vec{v}_{other} = [\dot{x}_{other}, \dot{y}_{other}]^T$ (both measured in the same coordinate system). In a coordinate system centred on the ego vehicle, $\vec{v} = 0$ and the speed $v_{other}$ of the other vehicle may be used.
Furthermore, $m$ represents the mass of the ego vehicle. This may be estimated based on, e.g., an (empty) vehicle mass stored in a memory of the vehicle, possibly in combination with some sensor information (e.g., tire pressure, number of closed seat belts, et cetera). Similarly, $m_{other}$ represents the mass of the other vehicle. This may be estimated based on, e.g., the size of the other vehicle (possibly using size categories), image recognition used to identify the brand/type of car combined with a stored database, et cetera.
Taken together, this may result in a cost function of the form:

$$J_{other} = I \times DRF_{other},$$

for example,

$$J_{other} = I(m + m_{other}, |\vec{v} - \vec{v}_{other}|) \times DRF_{other}(s, d, v_{other})$$

as described above. When several other vehicles are sufficiently close to the ego vehicle to be taken into account, the cost value $J_{other}$ may contain contributions from each of the other vehicles, which may be determined in the same manner.
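Continuing the sketch above, the cost for the other vehicle could be assembled as follows; the function name and the choice of the momentum-exchange factor (rather than the kinetic-energy variant) are illustrative assumptions.

```python
import numpy as np

def other_vehicle_cost(v_ego_vec, v_other_vec, m_ego, m_other, drf_value):
    """J_other = I x DRF_other, with I the momentum-exchange severity factor."""
    dv = np.linalg.norm(np.asarray(v_ego_vec) - np.asarray(v_other_vec))
    severity = (m_ego + m_other) * dv  # kinetic-energy variant: 0.5 * (m_ego + m_other) * dv**2
    return severity * drf_value
```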
When the trajectory for the ego vehicle is optimised over a plurality of time steps (as explained above with respect to model predictive control), the perceived risk may be determined for each time step, based on the modelled position of the ego vehicle at that time step and the corresponding position of the other vehicle at the same time step. Similarly, the velocity difference may be determined for each time step. In general, the behaviour of the other vehicle may be assumed to be independent of the behaviour of the ego vehicle, such that the trajectory of the other vehicle needs to be predicted only once (rather than for each optimisation step).
Finally, an overall cost value $J_{total}$ may be determined based on the cost value $J_{self}$ for the routing of the ego vehicle (i.e., the ego cost value $J_{ego}$ determined above) and the cost value $J_{other}$ for the risk for the other vehicle, for example:

$$J_{total} = \alpha_1\, J_{self} + \alpha_2\, J_{other},$$

where $\alpha_1$ and $\alpha_2$ are weight factors to adjust the relative weights of path optimisation and passenger comfort on the one hand, and the (perceived) risk to others on the other hand. These weights may be related via, e.g., $\alpha_2 = 1 - \alpha_1$, or $\alpha_1 = \cos\alpha$, $\alpha_2 = \sin\alpha$. This last option is also known as social value orientation; in the current context, typically $0° < \alpha < 90°$.
The weights 128 may be fixed or adjustable, based on, e.g., passenger input. The weights can also be trained, for instance using deep reinforcement learning as described with reference to Fig. 6. The social value orientation model is described in more detail with reference to Fig. 5A and 5B.
Again, the total cost value may be determined for each of a plurality of time steps, in order to enable model predictive control.
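A minimal sketch of this weighting, summed over the prediction horizon, is shown below; the function name is an assumption, and the per-step cost contributions are assumed to have been computed with the models described above.

```python
import numpy as np

def svo_total_cost(j_self_steps, j_other_steps, alpha_deg=60.0):
    """Combine per-time-step ego costs and costs to others with an SVO angle
    (0 deg = fully egoistic, 90 deg = fully altruistic)."""
    alpha = np.deg2rad(alpha_deg)
    return (np.cos(alpha) * np.sum(j_self_steps)
            + np.sin(alpha) * np.sum(j_other_steps))
```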
For an automated vehicle, the entire optimisation must be done essentially in real time. This may be understood as having a total reaction time that is at least on par with that of a human driver, including data acquisition, data (pre)processing, optimisation, and actuating the results.
Fig. 2 illustrates a kinematic bicycle model as may be used in an embodiment. The kinematic bicycle model may be used as a simplified model for four-wheeled vehicles, wherein the two front wheels and the two rear wheels are represented by a single front wheel 202 and a single rear wheel 204, respectively, each with a single steering angle $\delta_f$ and $\delta_r$, respectively. The steering angle represents the angle between the longitudinal axis 206 of the vehicle and the wheel(s). For many vehicles, the rear wheels do not steer, such that $\delta_r = 0$.
The figure shows the vehicle 200 in a global Cartesian coordinate system $(X, Y)$ with coordinates $(x, y)$ representing the position of a reference point of the vehicle. In the current example, the reference point is the centre of gravity of the vehicle. Other reference points are also used in the art, such as the centre of the front or rear axle. The distance between the reference point and the front axle is denoted with $l_f$, and the distance between the reference point and the rear axle is denoted with $l_r$. In this coordinate system, the (inertial) heading $\psi$ represents the angle between the longitudinal axis 206 of the vehicle and the $X$-axis. The velocity $\vec{v}$ represents the direction and speed of travel of the vehicle (i.e., $\vec{v} = (\dot{x}, \dot{y})$, where an overhead dot represents a time derivative). The angle between the velocity of the vehicle $\vec{v}$ and the longitudinal axis of the vehicle is called the slip angle and is denoted by $\beta$. For small slip angles, $\beta \approx \frac{l_r}{l_f + l_r}\tan\delta_f$.
When the vehicle is not driving in a straight line, a turning circle can be defined with centre O 210 and radius $R$, which can be used to derive the relations between $\dot{\psi}$, $\beta$, and $\delta_f$. The resulting equations have been given above in eqs. (1)-(3). As already noted, in this simplified model, the vehicle can be controlled based on the acceleration $a$ in the direction of the velocity (i.e., $a = \dot{v}$ with $v = |\vec{v}|$) and the front steering angle $\delta_f$, without taking into account vehicle-dependent quantities such as the vehicle's aerodynamics, rolling resistance, cornering stiffness of the tires, et cetera. Inclusion of such parameters might require the model to be trained for each individual vehicle.
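For illustration, a minimal discrete-time sketch of the standard kinematic bicycle model is given below; the exact form of eqs. (1)-(3) and the discretisation used in the embodiment may differ. The wheelbase values and time step are the example values used in the simulations described with reference to Fig. 7A-G.

```python
import numpy as np

def bicycle_step(state, a, delta_f, l_f=2.46, l_r=2.49, dt=0.02):
    """One explicit-Euler step of the kinematic bicycle model.

    state   : [x, y, psi, v] - position [m], heading [rad], speed [m/s]
    a       : acceleration along the velocity direction [m/s^2]
    delta_f : front steering angle [rad] (rear steering assumed zero)
    """
    x, y, psi, v = state
    beta = np.arctan(l_r / (l_f + l_r) * np.tan(delta_f))  # slip angle
    x   += v * np.cos(psi + beta) * dt
    y   += v * np.sin(psi + beta) * dt
    psi += v / l_r * np.sin(beta) * dt
    v   += a * dt
    return np.array([x, y, psi, v])
```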
Fig. 3A-C illustrate a model predictive contouring model as may be used in an embodiment.
As an example, the basis of the cost function for the ego vehicle may be provided by the model predictive contouring control (MPCC) formulation. The main idea of this approach is to track the position of the ego vehicle relative to a reference point on the path and to introduce a new state quantity $\theta$ which measures the progress, so that it is intuitively possible to balance the maximization of progress along the path with the minimization of lateral, longitudinal and angular offset from the reference path.
Even better results may be obtained by adding a look-ahead point, or "far point", which is used mainly as a second reference point to minimize contouring error. It may be defined in a similar way to the lateral error from the reference path.
The progress variable $\theta$ can be seen as the distance that the vehicle has moved along the path; that is, it is a variable that parametrises the reference trajectory. In an embodiment, $\theta$ is chosen such that it measures the path length $s$, i.e., such that $\mathrm{d}s/\mathrm{d}\theta = 1$.
MPC methods assume a finite look-ahead horizon for which control signals are calculated to optimize an objective function.
Compared with MPC, the state vector in MPCC is extended with $\theta$ to $\xi_{MPCC} = [x, y, \psi, v, \theta]^T$ and the input of the model is extended with the progress rate: $u_{MPCC} = [a, \delta, v_\theta]^T$. The goal of MPCC is to maximize the progress $\theta$ and track the reference trajectory.
The contouring error $E_c$ and the longitudinal (lag) error $E_l$ are also linked to the progress. To improve the efficiency, an approximation is adopted to calculate the two errors. The contouring error $E_c$ is defined as the normal deviation from the desired path, and can be expressed as

$$E_c = -(x - x_{ref})\,\sin\psi_{ref} + (y - y_{ref})\,\cos\psi_{ref}$$

with

$$\psi_{ref} = \arctan\!\left(\frac{\partial y_{ref}}{\partial x_{ref}}\right),$$

where the reference quantities are evaluated at $\theta_P(x, y)$, the value of the path parameter for which the distance between the point $(x_{ref}(\theta), y_{ref}(\theta))$ and $(x, y)$ is minimal, as shown in Fig. 3A. The multi-objective control problem involves selecting the control input such that the solution traverses near the desired geometric path, minimising the contouring error while maximising the path speed.
It is assumed that the desired path $(x_{ref}(\theta), y_{ref}(\theta))$ is parameterised by arc length, i.e., $\mathrm{d}s/\mathrm{d}\theta = 1$, where $s$ denotes the distance travelled along the path. Arc length parameterisation of general curves is nontrivial; however, techniques exist in the literature for approximate arc length parameterisation. The vehicle model is augmented with the following dynamics:

$$\theta_{k+1} = \theta_k + v_{\theta,k}, \qquad v_{\theta,k} \in [0, v_{\theta,\max}], \quad v_{\theta,\max} > 0, \qquad (6)$$

where $v_{\theta,k}$ is a virtual input to be determined by the controller and $\theta_k$ denotes the value of the path parameter at time step $k$. Since the path is parameterised by arc length, $v_\theta$ is directly proportional to the path speed. Also, non-reversal of the path is guaranteed, since $v_{\theta,k} \geq 0$.
It is proposed to use $\theta_k$, whose evolution is governed by eq. (6), as an approximation to $\theta_P(x_k, y_k)$. The contouring error is then approximated by

$$\hat{E}_c = -(x - x_{ref}(\theta_k))\,\sin\psi_{ref}(\theta_k) + (y - y_{ref}(\theta_k))\,\cos\psi_{ref}(\theta_k).$$
Let $E_l$ denote the path distance by which $(x_{ref}(\theta_k), y_{ref}(\theta_k))$ lags $(x_{ref}(\theta_P), y_{ref}(\theta_P))$ and approximate $E_l$ as

$$\hat{E}_l = (x - x_{ref}(\theta_k))\,\cos\psi_{ref}(\theta_k) + (y - y_{ref}(\theta_k))\,\sin\psi_{ref}(\theta_k).$$

Refer to Fig. 3B for a graphical interpretation of $E_c$, $E_l$ and their approximations.
From Fig. 3B, it can be observed that $\theta_k = \theta_P(x_k, y_k)$ if $\hat{E}_l(\xi_k, \theta_k) = 0$. Therefore, to aid in the problem formulation, it is desired to select $v_{\theta,k}$ such that $\hat{E}_l = 0$. Note that while $\theta_P$ in Fig. 3A is not necessarily unique, the smooth evolution of $\theta_k$ enforced by the constraint on $v_{\theta,k}$ ensures that the system follows the path smoothly, provided $v_{\theta,\max}$ is chosen to be sufficiently small.

Model predictive control involves minimisation of a cost function over a prediction horizon of $N_p$ time steps. The cost function represents the control objectives and their relative importance. In the context of contouring control, the competing objectives include minimising the contouring error while maximising the path distance travelled at each time step in the horizon. In addition, to allow $\theta_k$ to be used as an approximation to $\theta_P$, it is desired that $\hat{E}_l(\xi_k, \theta_k) = 0$.
As noted above, the model may be further extended with an orientation error $E_\psi$, which can be estimated as

$$E_\psi = 1 - (\cos\psi\,\cos\psi_{ref} + \sin\psi\,\sin\psi_{ref}) = 1 - \cos(\psi - \psi_{ref}).$$
In order to improve the forward visibility of the vehicle, a so-called look-ahead error may be introduced, which corresponds to a contouring error at a reference point further away, referred to as the look-ahead point $(x_{la}, y_{la})$ with associated look-ahead heading $\psi_{la}$:

$$E_{la} = -(x - x_{la})\,\sin\psi_{la} + (y - y_{la})\,\cos\psi_{la}.$$
Together with the previously defined error terms, this leads to a cost function (for a single time step):

$$J = q_c E_c^2 + q_l E_l^2 + q_\psi E_\psi^2 + q_{la} E_{la}^2 - q_v v_\theta.$$
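A minimal Python sketch of this single-time-step cost is given below; the function layout and the weight values are placeholders for illustration, not the values used in the embodiment.

```python
import numpy as np

def mpcc_stage_cost(x, y, psi, v_theta, ref, la,
                    q_c=1.0, q_l=1.0, q_psi=0.5, q_la=0.5, q_v=0.1):
    """Single-step MPCC cost: penalise contouring, lag, heading and look-ahead
    errors, and reward progress along the reference path.

    ref : (x_ref, y_ref, psi_ref) evaluated at the progress variable theta_k
    la  : (x_la, y_la, psi_la) evaluated at the look-ahead point
    """
    x_ref, y_ref, psi_ref = ref
    x_la, y_la, psi_la = la

    e_c = -(x - x_ref) * np.sin(psi_ref) + (y - y_ref) * np.cos(psi_ref)
    e_l = (x - x_ref) * np.cos(psi_ref) + (y - y_ref) * np.sin(psi_ref)
    e_psi = 1.0 - np.cos(psi - psi_ref)
    e_la = -(x - x_la) * np.sin(psi_la) + (y - y_la) * np.cos(psi_la)

    return (q_c * e_c**2 + q_l * e_l**2 + q_psi * e_psi**2
            + q_la * e_la**2 - q_v * v_theta)
```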
Fig. 4A-D illustrate a driver's risk field model as may be used in an embodiment. Fig. 4A illustrates some parameters that may be used to compute the driver's risk field of a vehicle. In the context of this application, the driver's risk field is computed (or perhaps more properly, estimated) for the other vehicle, but to improve legibility and maintain generality, the subscript 'other' is omitted in the discussion of Fig. 4A-D. The current state of the vehicle may be represented by $[x, y, \delta, v]$. These parameters may be obtained from the vehicle, or estimated using sensors. These parameters allow the kinematic bicycle model to be used to model the behaviour of the vehicle, as described above with reference to Fig. 2. Again, if the vehicle is turning, a turning circle may be defined with centre $(x_c, y_c)$ and radius $R$. Other embodiments may use different parameters and/or a different vehicle model.
Based on these parameters and/or different parameters, a trajectory may be predicted for the vehicle. If no other data is available, the velocity and steering angle of the vehicle may be assumed to remain constant during a look-ahead time $t_{la}$. This leads to a look-ahead distance $s_0$ (along the vehicle's trajectory), which may be referred to as the reference longitudinal distance.
Fig. 4B is a contour plot of an exemplary driver's risk field in a global coordinate system $(x, y)$. A reference point of the vehicle (e.g., the centre front, or centre of gravity) is positioned at coordinates $(0, 0)$. In this example, the vehicle's speed is $v = 10$ m/s and its steering angle is $\delta = -20°$.
The driver's risk field 420 follows the curve of the predicted trajectory 422. The driver's risk field has a characteristic length 424 and a characteristic breadth 426, which is, in this example, larger on the inside of the curve than on the outside of the curve. A grid is overlain representing longitudinal and lateral coordinates (s, d).
Here, $t_{la}$ is a fixed look-ahead time; based on it, the look-ahead distance increases linearly with the velocity of the vehicle. The parameter $p$ defines the steepness of the parabola. The width of the driver's risk field at the location of the vehicle, $c$, is related to the car width, and $\mu$ defines the slope of the widening of the driver's risk field when driving straight.

The parameters $k_1$ and $k_2$, which represent the inner and outer edges of the driver's risk field, respectively, affect the width of the driver's risk field and can be used to generate asymmetric driver's risk fields. With this modelling method, the risk grows linearly with increasing steering angle. This resembles a human driver controlling the steering of the vehicle, paying more attention to the environment in the direction of the turn, so that a higher risk is attributed to the other direction. The increase in the driver's risk field is proportional to $\delta$, leading to a higher risk when driving through sharp curves with increasingly smaller radii.
In this example, the following parameter values were used: $p = 0.0064$, $t_{la} = 3$ s, $\mu = 0.01$, $k_1 = 0.05$, $k_2 = 0.5$, and $c = 1$.
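Using the drf_other sketch introduced earlier (this example is not self-contained; it assumes that function is available) with these example values and the speed and steering angle of Fig. 4B, the perceived risk at a point 10 m ahead and 1 m to the inside of the predicted trajectory could be evaluated as follows, purely for illustration.

```python
import numpy as np

risk = drf_other(s=10.0, d=-1.0, v_other=10.0, delta_other=np.deg2rad(-20.0),
                 p=0.0064, t_la=3.0, mu=0.01, k_inner=0.05, k_outer=0.5, c=1.0)
print(risk)
```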
Thus, all the hyperparameters of the driver's risk field are related to the driver rather than to the environment. In this work, the driver's risk field is utilized to obtain the possible risk perceived by the other vehicles interacting with the ego vehicle. Therefore, the coordinates $(s, d)$ in the DRF-related equations are from the other vehicle's perspective, while the other parameters represent those of the (human) driver in the other vehicle. The human driver parameters may be identified through, e.g., simulations. The parameters used to create this figure are from a 25-year-old male volunteer driver.
Modelling the risk perceived by the driver of the other vehicle allows the ego vehicle to put itself, as it were, in the shoes of other drivers and to be informed of what they perceive as the probability of danger. This better reflects the considerations of socially-aware driving.
Fig. 4C is a longitudinal cross section of the driver's risk field shown in Fig. 4B. The solid line 442 represents the magnitude of the driver's risk field at $d = 0$ (i.e., along the predicted trajectory 422), while the dashed lines represent the magnitude of the driver's risk field at $d = -1$ m 444, $d = -2.5$ m 446, and $d = -5$ m 448, respectively, where the negative values represent the inside of the curve. Further to the side, the risk field becomes lower, and the maximum shifts towards larger $s$.
Fig. 4D is a lateral cross section of the driver's risk field shown in Fig. 4B. The solid line 452 represents the magnitude of the driver's risk field at $s = 10$ m (i.e., at the position of arrow 426), while the dashed lines represent the magnitude of the driver's risk field at $s = 0$ m 454, $s = 5$ m 456, and $s = 15$ m 458, respectively. For larger values of $s$ (i.e., further in front of the vehicle), the risk field becomes lower and broader. This asymmetry increases with increasing longitudinal distance $s$. In this plot, the negative values represent the inside of the curve. On the inside, the risk field drops off more slowly than on the outside of the curve.
Fig. 5A and 5B illustrate a social value orientation model as may be used in an embodiment. Fig. 5A shows the social value orientation circle with several experimental observations, and Fig. 5B is a zoomed-in version of the positive quadrant. Figure reproduced from N. Buckman et al., 'Sharing is Caring: Socially-Compliant Autonomous Intersection Negotiation,' 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2019) pp. 6136-6143.
Social Value Orientation (SVO), a metric from social psychology, is a parameter that describes how much a person is willing to consider the benefits of other people versus his/her own. In psychology, each individual wants to maximize the reward and minimize the cost when considering only himself or herself. However, as social road users, some of our planning needs to take into account the welfare of others. The SVO concept allows each individual's social preferences to be modelled by expressing their cost function as a combination of two terms, the cost to self $J_{self}$ and the cost to others $J_{other}$:

$$J_{total} = \cos\alpha\, J_{self} + \sin\alpha\, J_{other},$$

where the angle $\alpha$ indicates the value of the SVO. It reflects the selfishness or altruism of each individual. As shown in Fig. 5B, when this angle is 0°, the system is completely egotistic (individualistic), while when the angle is 90°, the system is completely altruistic towards other systems. Fig. 5B shows that most people's SVO values lie between 0° and 60°, as illustrated by the black points.
In the results discussed below, two different styles are compared: prosocial with $\alpha = 60°$ and individualistic with $\alpha = 15°$. In order to take other vehicles into account, $\alpha$ should be non-zero. For most practical implementations, $0° < \alpha < 90°$, but for some applications, such as car races, negative values with $-90° < \alpha < 0°$ may be considered. As noted before, the weight $\alpha$ may be received as an input, but can also be trained, similar to other model parameters.
Fig. 6 schematically depicts a method according to an embodiment. In particular, Fig. 6 depicts a method for training a model as described above. Training the model comprises determining values for the weights of the respective (sub)models, for instance, weights $q_c$, $q_l$, $q_\psi$, $q_{la}$, $q_v$ of the extended model predictive contouring model, weights $p$, $\mu$, $k_1$, $k_2$, $t_{la}$, $c$ of the driver's risk field model, and/or weight $\alpha$ of the social value orientation model.
The aforementioned model parameters can be optimised under the deep reinforcement learning (DRL) framework. The deep reinforcement learning framework comprises an iterative optimisation loop. Based on a current state 602 of the ego vehicle and (where applicable) one or more other vehicles, the ego vehicle (the 'agent' in Fig. 6), in a step 604, determines optimised control parameters, using the model that is being trained. The optimised control parameters define an action 606. In the current example, the action comprises an acceleration $a$ (throttle/brake) and a steering angle $\delta$. The action affects a (simulated) environment 608 of the ego vehicle. Based on the changes in the environment, a reward 610 is determined. The reward can be based on, e.g., the safety of the ego vehicle and/or the other vehicles, the efficiency (travel speed) of the ego vehicle and/or the other vehicles, the comfort level of the ego vehicle and/or the other vehicles, the social compliance of the ego vehicle, the overall energy consumption of the ego vehicle and/or the other vehicles, et cetera. The contributions of the effect on the ego vehicle and the other vehicles may be weighted, e.g., in a way similar to the social value orientation model discussed above. The various reward components may have different weights; e.g., safety (collision avoidance) may be given a relatively large weight.
The changed environment (i.e., the result of the action of the ego vehicle and the behaviour of the other vehicles in the environment) leads to an updated state 602, and hence new input for the agent, et cetera. Various deep reinforcement learning algorithms are known in the art, such as Asynchronous Advantage Actor-Critic (A3C), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), et cetera.
Various simulation platforms are available in the art for road vehicle simulation, such as the highway-env simulation platform.
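By way of illustration, a minimal training loop in the style of such platforms is sketched below; the environment id, the gymnasium-style API, and the use of a random policy as a stand-in for the DRL agent are assumptions, and the reward shaping of the embodiment is not reproduced.

```python
import gymnasium as gym
import highway_env  # noqa: F401  (importing registers the highway-env environments)

env = gym.make("roundabout-v0")  # assumed environment id

obs, info = env.reset()
for step in range(1_000):
    # A trained DRL policy would map the observation to model hyperparameters
    # and/or control outputs; a random action stands in for it here.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # The reward would combine safety, efficiency, comfort, energy and social
    # terms, e.g., weighted with an SVO-style angle as discussed above.
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```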
In combination with a Markov Chain Monte Carlo (MCMC) simulation experiment, the model hyperparameters, together with the vehicle control outputs, can be provided as outputs of the deep reinforcement learning model. As a consequence, they can be trained to match different driving properties of the ego automated vehicle, different driving styles of surrounding vehicles, and different driving conditions.
The model's hyperparameters may be trained separately for separate values of $\alpha$, representing different driving styles, or $\alpha$ may be included as a trainable hyperparameter.
Furthermore, in the described training framework, a reward decay mechanism using, e.g., the Bellman equation may be employed to generate long- and medium-term trajectory planning and guidance (e.g., a reference trajectory), while the MP(C)C module may account for short-term tracking and control as described above. In short, deep reinforcement learning-based training can generate and match different social properties of the ego vehicle with different driving styles of surrounding vehicles and different traffic conditions.
For training, real-world datasets (e.g., INTERACTION, CitySim) can be utilized to aid the generation of driving scenarios. The use of DRL for training is described in more detail in Y. Dong et al., 'Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers,' arXiv preprint (2023) arXiv:2306.11466; H. Yuan et al., 'Safe, Efficient, Comfort, and Energy-saving Automated Driving through Roundabout Based on Deep Reinforcement Learning,' arXiv preprint (2023) arXiv:2306.11465; and M. Zhu et al., 'Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving,' Transportation Research Part C: Emerging Technologies 117 (2020) 102662, which are hereby incorporated by reference.
Fig. 7A-G are graphs showing experimental results obtained with embodiments of the invention. The experiments show comparisons with a relatively simple Nonlinear Model Predictive Control (NMPC) based on the kinematic bicycle model as described with reference to Fig. 2. Compared to this reference model, the model according to an embodiment includes the extended MPCC cost function described in more detail with reference to Fig. 3, including a look-ahead term taking into account a "far point". This makes the vehicle trajectory more stable over curves with large curvature.
Moreover, the model according to an embodiment takes the benefits/costs of surrounding vehicles into consideration based on the driver's risk field as described with reference to Fig. 4, and this was balanced with the MPCC cost function using the social value orientation model described with reference to Fig. 5.
Thus, the full cost function of the embodiment model is given by:

$$J_{total} = \cos\alpha\, J_{self} + \sin\alpha\, J_{other} \qquad (7)$$

$$J_{self} = \sum_{k=1}^{N_p}\left(q_c E_{c,k}^2 + q_l E_{l,k}^2 + q_\psi E_{\psi,k}^2 + q_{la} E_{la,k}^2\right) - \sum_{k=1}^{N_p} q_v v_{\theta,k} + \sum_{k=1}^{N_p} \|u_k - u_{k-1}\|^2 \qquad (8)$$

$$J_{other} = (m + m_{other})\,\big|\vec{v} - \vec{v}_{other}\big| \times p\,(s - v_{other}\, t_{la})^2 \exp\!\left(-\frac{d^2}{2\sigma_i^2}\right) \qquad (9)$$

with $u_k = [a_k, \delta_k, v_{\theta,k}]^T$, a time step size of 20 ms, and $N_p = 15$. This model may also be referred to as the DRF-SVO-MPCC model.
Since the embodiment model integrates and outputs both planning and control simultaneously, two test cases are carried out in the simulation experiments. Fig. 7A-C show that the control accuracy of the embodiment model is very high, and outperforms two baseline models, viz., the pure NMPC and the well-established trajectory tracking method of a pure pursuit controller combined with a PID controller, which is simply referred to as the PP controller in this application. The PP controller is described in more detail in R. Coulter, 'Implementation of the Pure Pursuit Path Tracking Algorithm,' (Carnegie Mellon University, Pittsburgh, Pennsylvania, 1990). The control accuracy is assessed by testing a single-lane roundabout scenario with no other vehicles.
Fig. 7D-G show that the embodiment method considers other vehicles' benefits/costs and that it can generate different driving styles under different SVO parameter values, i.e., different values for $\alpha$ in eq. (7). This is done by testing on single-lane and two-lane roundabout scenarios with the ego vehicle interacting with other vehicles in two different situations.
The simulations are performed in Python with the highway-env simulation platform, which is widely used in the field, to test the proposed approach. The highway-env platform is described in more detail in E. Leurent, 'An Environment for Autonomous Driving Decision Making,' (2018), available online at: https://github.com/eleurent/highway-env
In the simulations, the radius of the (one- or two-lane) roundabout is 22 m, while the connection between the straight road and the roundabout is made with a curve fitted by a sine function. The ego vehicle travels from west to east (left to right), while the other vehicle travels from south to north (bottom to top) at a random speed of 3-7 m/s. The vehicle model parameters of the vehicles that appear in all the scenarios are: $l_f = 2.46$ m, $l_r = 2.49$ m, $m = 2020$ kg, and the vehicles' width is set at 2.0 m. Because of the road peculiarities of roundabouts, vehicles are generally not allowed to pass through them at very high speeds, so the maximum velocity limit in the simulation is 15 m/s. The initial speed of the ego vehicle $v_0$ is set randomly within 0-3 m/s.
In the simulations, the two baseline controllers, i.e., the PP and NMPC controllers, together with the proposed society-aware DRF-SVO-MPCC, were tested. In the PP controller, only a look-ahead distance needs to be set, which is set to 5 m. The parameters of the reference model (NMPC) and the embodiment model (DRF-SVO-MPCC) are set as: $v_{\max} = 15.0$ m/s, $a_{\min/\max} = \pm 3.0$ m/s², $\delta_{\min/\max} = \pm 30°$, $\dot{\delta}_{\min/\max} = \pm 30°/\mathrm{s}$. Both models are solved with the optimisation solver framework CasADi, which is described in more detail in J.A. Andersson et al., 'CasADi: a software framework for nonlinear optimization and optimal control,' Mathematical Programming Computation, vol. 11, no. 1 (2019) pp. 1-36.
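A heavily simplified CasADi sketch of how such a horizon optimisation can be posed is shown below; the variable layout, the placeholder objective, and the omission of the vehicle-model and bound constraints are assumptions for illustration only and do not reproduce the embodiment's actual formulation.

```python
import casadi as ca

N_P = 15                                   # prediction horizon
u = ca.SX.sym("u", 3, N_P)                 # per-step inputs [a, delta, v_theta]
x0 = ca.SX.sym("x0", 5)                    # initial state [x, y, psi, v, theta]

# Placeholder objective: in the embodiment this would be J_total of eq. (7),
# built by rolling the kinematic bicycle + progress model out over the horizon.
objective = ca.sumsqr(u)

nlp = {"x": ca.vec(u), "p": x0, "f": objective}
solver = ca.nlpsol("solver", "ipopt", nlp)
sol = solver(x0=ca.DM.zeros(3 * N_P), p=[0.0, 0.0, 0.0, 1.0, 0.0])
# In the embodiment, bounds on a, delta and the steering rate, and the vehicle
# dynamics over the horizon, would be added as constraints.
u_opt = ca.reshape(sol["x"], 3, N_P)
```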
To test and verify the performance of the embodiment model, three main scenarios are implemented. The first scenario focuses on comparing only the control performance of the three controllers, with no other vehicles present in the roundabout. Consequently, the contribution $J_{other} = 0$ and hence $J_{total} = \cos\alpha\, J_{self}$ (i.e., the model does not consider social factors).
In the second scenario, another vehicle is merging from another lane of the roundabout, i.e., the other vehicle enters the roundabout after the ego vehicle. In the third scenario, the other vehicle travels from north to south (top to bottom) and enters the roundabout first.
The simulations consider two different driving styles of the ego vehicle and compare their differences in motion planning. The common parameters of the DRF are: $p = 0.0064$, $t_{la} = 3$ s, $\mu = 0.001$, $k_1 = 0.0$, $k_2 = 1.3$, and $c = 0.5$ m. The bottom line for both driving styles is that no collisions may occur, so the AV driving model needs to consider the other vehicle's safety cost at least to some degree, which means that the SVO parameter $\alpha$ cannot be set to 0°. Manoeuvres of driving through both single-lane and two-lane roundabouts are simulated.
In the first testing scenario, this study focuses on comparing the control accuracy and performance of the three controllers: the PP controller, NMPC, and the controller according to an embodiment (DRF-SVO-MPCC). Fig. 7A-C show the trajectories controlled by the three controllers. All three models can follow the reference path 702, which corresponds to the centerline of the driving lane, to some degree.
Fig. 7A shows that the PP controller has the worst tracking performance, with a large error between the followed trajectory 704 and the reference path 702. The maximum positional error is about 3 m, which means that the bodywork of the vehicle is partly outside of the lane. Due to the large curvature of the roundabout, the PP controller has difficulties in trajectory tracking, resulting in large fluctuations in the steering angle $\delta$, especially around $x = \pm 20$ m (i.e., when entering and exiting the roundabout).
Fig. 7B shows that, compared to the PP controller, the optimization-based reference method, NMPC, results in a trajectory 706 that tracks the reference trajectory 702 much better, except for two instances of inappropriate steering, again around $x = \pm 20$ m. This is probably due to the lack of proper judgment of the future path (i.e., the 'far point'). Unlike the PP controller, the NMPC is a laterally and longitudinally coupled control, and therefore the acceleration $a$ exhibits fluctuations during steering at $x = -18$ m and $x = 20$ m.
Fig. 7C shows that the embodiment model provides a good solution to the above problems. As the roads are stitched together using aggregate shapes (circles and sines), they are not completely smooth at the road joints. As shown in Fig. 7C, the trajectory 708 followed by the embodiment model is a smoother curve than the reference trajectory 702. Although not shown, it was observed that the embodiment model maintains a smooth acceleration $a$ during steering with high curvature at around $x = \pm 20$ m.
These results are summarized in the top part of Table I.
Table I. Main results of the embodiment model in comparison with baseline models
Scenario                          Method          Driving style   Max. error [m]   Mean error [m]   Collision
Single-lane,                      PP controller   -               3.08             1.37             -
no other vehicle                  NMPC            -               1.27             0.65             -
                                  embodiment      (irrelevant)    0.23             0.12             -
Single-lane,                      NMPC            -               -                -                Yes
with other vehicle                embodiment      α = 60°         0.19             0.09             No
                                  embodiment      α = 15°         0.28             0.16             No
Two-lane,                         NMPC            -               -                -                Yes
with other vehicle                embodiment      α = 60°         0.26             0.17             No
                                  embodiment      α = 15°         0.34             0.22             No
In the second scenario, an 'aggressive' other vehicle is added which attempts to enter the roundabout even though the ego vehicle is already on the roundabout and approaching from its left.
Two driving styles, viz., prosocial and egotistic, corresponding to $\alpha = 60°$ and $\alpha = 15°$, respectively, with reference velocities $v_{ref} = 5.0$ m/s and $v_{ref} = 6.8$ m/s, respectively, are tested, with Fig. 7D-E showing the acceleration of the ego vehicle under the two driving styles.
As shown in Fig. 7D, under the prosocial driving style, the ego vehicle will first actively slow down with $a = -1.02$ m/s² to avoid the other vehicle, minimizing the risk to which the other vehicle is exposed, and then it will accelerate to $v_{ref}$. Conversely, an egotistic ego vehicle with a small SVO (e.g., $\alpha = 15°$) will be more biased towards minimizing its own costs. Thus, as in Fig. 7E, the ego vehicle decides to accelerate with $a = 0.43$ m/s², driving through the junction before the other vehicle to avoid a collision and improve its efficiency through the roundabout. These statistics show that the embodiment model can generate different driving styles while maintaining safety.
A two-lane roundabout simulation was performed to test the performance of the embodiment model when the two vehicles are in different lanes. An extra lane is added, with the ego vehicle driving in the inner lane and the other vehicle driving in the outer lane. Fig. 7F shows that the prosocial ego vehicle will still give precedence to the other vehicle by braking with $a = -0.52$ m/s², waiting to maintain a safe distance from the other vehicle before accelerating back to $v_{ref}$ to pass the roundabout safely. The choice of braking behind the other vehicle was made because it was calculated that there would be a greater risk to the other vehicle if $v_{ref}$ were maintained.
Comparing Fig. 7F and Fig. 7D, it can be seen that, compared to the case where the other vehicle drives in the adjacent lane, the ego vehicle will brake more sharply when the other vehicle wants to merge into the same lane. This is caused by the other vehicle blocking the ego vehicle's trajectory when in the same lane, which potentially poses a greater risk to both the other vehicle and the ego vehicle. The simulation demonstrates the embodiment model's capability to handle interactions with other vehicles in different lanes separately.
Fig. 7G shows that, similar to the single-lane roundabout case shown in Fig. 7E, when the driving style is egotistic, the ego vehicle will accelerate aggressively, trying to change to the right lane just before the other vehicle, and then exit the two-lane roundabout without any deceleration throughout the whole process. This helps the ego vehicle maintain a low cost and high benefits while sacrificing the benefits of the other vehicle. Furthermore, it may be dangerous if the other vehicle is (even) more egotistic and more aggressive, at which point a collision might occur.
In the last scenario, the other vehicle enters the roundabout first and the ego vehicle plans to merge into the roundabout afterwards. Because of safety and traffic rules, the ego vehicle brakes in both driving styles to avoid a collision with the other vehicle; however, the driving style, as parametrized by $\alpha$, leads to a different braking behaviour. The egotistic ego vehicle slows down as late as possible, at a distance of 13.87 m, keeping only a minimum of 3.65 m from the other vehicle for safety and maintaining a higher velocity (3.17 m/s vs 1.47 m/s) compared to the prosocial driving style. The prosocial ego vehicle, in contrast, starts slowing down earlier, at 18.22 m from the other vehicle, and keeps a longer distance of 8.49 m from the other vehicle. The results show that the prosocial ego vehicle focuses more on minimizing the risk, and that it places more weight on the benefit of other vehicles. The egotistic ego vehicle, on the contrary, aims to minimize its own costs while ensuring the safety of both vehicles.
Fig. 8 is a flowchart of a method according to an embodiment. Although the current flowchart refers to another vehicle, the method applies equally to other traffic participants, such as pedestrians. A step 802 comprises receiving or determining information about a current state of the ego vehicle. The information about the current state may comprise one or more of: a position, a heading, a speed, a steering angle, a size, or a mass of the ego vehicle. The position and heading may be defined relative to, e.g., a driving lane, or a global coordinate system. The information may be obtained from, e.g., sensors in the ego vehicle and/or a navigation system.
A step 804 comprises receiving or determining information about one or more target states of the ego vehicle at one or more respective future time steps. The information about the one or more target states of the ego vehicle may comprise one or more of: a target position, a target heading, a reference trajectory, a target speed, or a maximum speed. The target position and target heading may be defined with the same reference system as the current position and heading. The information may be obtained from, e.g., a navigation system. A target position and target heading relative to a driving lane can be determined, for example, based on imaging data and an image processing system.
A step 806 comprises receiving or determining information about a current state of an other vehicle. The information about the current state of the other vehicle may comprise one or more of: a position, a heading, a speed, a steering angle, a size, a mass, a vehicle type, or status of signalling lights.
The method further comprises iteratively optimising a set of control parameters for the ego vehicle based on an overall cost value associated with the set of control parameter values. Therefore, a step 808 comprises selecting initial control parameter values. These can be based on, e.g., the control parameter values of the previous time step.
A step 810 comprises determining one or more potential future states of the ego vehicle for each of the one or more future time steps, based on the current state of the ego vehicle and the set of control parameter values. The one or more potential future states of the ego vehicle may be determined using a suitable vehicle model, e.g., a kinematic bicycle model as described herein.
A step 812 comprises determining a first cost value using a first cost function, the first cost value being based on a difference between the one or more potential future states of the ego vehicle and the one or more target states of the ego vehicle at the one or more future time steps. The first cost value may be determined using a model predictive control model, e.g., an (extended) model predictive contouring control model as described herein. The first cost value may further be based on a passenger comfort cost, which may be based on a difference between two successive states (e.g., speed, heading, linear acceleration, change in heading).
A step 814 comprises predicting a future state of the other vehicle at the one or more future time steps, based on the current state of the other vehicle. The one or more potential future states of the other vehicle may be predicted using a suitable vehicle model, e.g., the kinematic bicycle model as described herein.
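As a minimal illustration of step 814, a constant-velocity, zero-steering prediction is sketched below; the function name and the assumption of constant speed and heading are illustrative only, and a kinematic bicycle rollout can be used instead when the steering angle is known.

```python
import numpy as np

def predict_other(state, n_steps=15, dt=0.02):
    """Predict future (x, y) positions of the other road user assuming constant
    speed and heading over the prediction horizon."""
    x, y, psi, v = state
    t = dt * np.arange(1, n_steps + 1)
    return np.stack([x + v * np.cos(psi) * t,
                     y + v * np.sin(psi) * t], axis=1)
```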
A step 816 comprises determining a second cost value using a second cost function, the second cost value being based on a perceived risk posed by the ego vehicle in the one or more future states of the ego vehicle to the other vehicle in the respective one or more future states of the other vehicle at the one or more future time steps. The second cost value may be determined based on, e.g., the driver's risk model, possibly extended with a collision severity weight based on the momentum and/or kinetic energy involved.
A step 818 comprises determining the overall cost value associated with the set of control parameter values, based on the first cost value and the second cost value. The overall cost value may be based on the social value orientation model as described herein.
A step 820 comprises evaluating a stop criterion based on the determined overall cost value. Some examples of a stop criterion are: the overall cost value being smaller than a predetermined absolute or relative threshold, the overall cost value decreasing less than a predetermined absolute or relative threshold over the last one or more iterations, or a maximum number of iterations having been reached.
If the stop criterion has not been met, a step 822 comprises updating the control parameter values and repeating steps 810-820 with the updated control parameters. The updated parameter values may be based on the control parameters of the current optimisation step and on the overall cost value and/or a derivative thereof. The updated control parameter values may be limited by constraints based on, e.g., limitations of the car (such as $v_{\max}$, $a_{\min}$, $a_{\max}$, $\delta_{\max}$) and/or legal limitations (such as $v_{\max}$).
If the stop criterion has been met, a step 824 comprises determining control signals that configure the ego vehicle to adjust its speed and/or steering angle based on the optimised set of control parameters. Optionally, the method may further comprise actuating the control signals to configure the ego vehicle to adjust its speed and/or steering angle.
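To make the loop of steps 808-824 concrete, a self-contained toy version is sketched below. It replaces the MPCC, DRF and vehicle models described above with trivial stand-ins (a point-mass rollout, a quadratic tracking cost and a Gaussian proximity penalty) and uses finite-difference gradient descent instead of an NLP solver; it only illustrates the structure of the iterative optimisation and its stop criterion.

```python
import numpy as np

N_P, DT = 15, 0.02  # horizon and time step as used in the simulations above

def rollout(ego0, u):
    """Toy point-mass rollout; u = [a_1..a_Np, delta_1..delta_Np]."""
    x, y, psi, v = ego0
    traj = []
    for a, delta in zip(u[:N_P], u[N_P:]):
        psi += v * np.tan(delta) * DT          # crude heading update (stand-in)
        v = max(v + a * DT, 0.0)
        x += v * np.cos(psi) * DT
        y += v * np.sin(psi) * DT
        traj.append((x, y))
    return np.array(traj)

def total_cost(u, ego0, targets, other_traj, alpha=np.deg2rad(60)):
    traj = rollout(ego0, u)
    j_self = np.sum((traj - targets) ** 2)                    # steps 810-812
    gaps = np.linalg.norm(traj - other_traj, axis=1)
    j_other = np.sum(np.exp(-gaps ** 2 / 8.0))                # step 816 (toy risk)
    return np.cos(alpha) * j_self + np.sin(alpha) * j_other   # step 818

def optimise(ego0, targets, other_traj, iters=50, lr=0.01, tol=1e-6, eps=1e-4):
    u, prev = np.zeros(2 * N_P), np.inf                       # step 808
    for _ in range(iters):
        c0 = total_cost(u, ego0, targets, other_traj)
        if abs(prev - c0) < tol:                              # step 820 (stop criterion)
            break
        grad = np.zeros_like(u)
        for i in range(u.size):                               # finite differences
            du = np.zeros_like(u)
            du[i] = eps
            grad[i] = (total_cost(u + du, ego0, targets, other_traj) - c0) / eps
        u -= lr * grad                                        # step 822
        prev = c0
    return u                                                  # step 824: map to control signals
```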
Fig. 9 depicts a block diagram of a system according to an embodiment. The system 900 comprises a central controller 902, e.g., a data processing system as described below with reference to Fig. 10. The central controller comprises a processor 904, a memory 906 communicatively connected to the processor, and a communication interface 908 communicatively connected to the processor. The memory comprises executable instructions configuring the processor to execute any of the method steps described above, e.g., with reference to Fig. 8.
The processor 904 may be configured to receive, via the communication interface 908, information about a current state of the vehicle comprising the system 900. To that end, the system may comprise one or more sensors 912-1 to 912-3, e.g., one or more optical sensors such as RGB cameras and/or IR cameras, one or more microphones, a LIDAR system, a RADAR system, et cetera. The one or more sensors may be coupled directly to the central controller 902, but more typically, at least some of the sensor data is collected and (pre)processed by a sensing and perception module 910. Mixed arrangements are also possible. The information may include, e.g., a position of the vehicle relative to a driving lane. The processor may process the received information to derive the necessary parameters.
The processor 904 may, additionally or alternatively, be configured to receive information about the current state of the vehicle from the memory 906. This is typically the case for vehicle-specific information, such as the dimensions of the vehicle and possibly the
(empty) mass of the vehicle.
The processor 904 may, additionally or alternatively, be configured to receive information about the current state of the vehicle from a vehicle controller 920, which may receive data from one or more further sensors 912-4, e.g., a speed from a speedometer, a steering angle from a steering sensor, et cetera.
The processor 904 may be further configured to receive and/or determine information about a target state of the vehicle. This information may be received from, e.g., a navigation module 914, which may in turn receive information from a further sensor, e.g., a satellite positioning system sensor.
The processor 904 may be further configured to receive, via the communication interface 908, information about another vehicle, e.g. from the sensing and perception module 910. The processor 904 may be further configured to receive, via the communication interface 908, information about another vehicle from the communication module 916. The communication module may be configured to communicate with, e.g., the other vehicle or with a central traffic control system.
The processor 904 is configured to determine control parameters as described above.
The processor may be further configured to output these control parameters to the vehicle controller 920. The vehicle controller may be configured to translate the control parameters into control signals to control, e.g., a steering module 922, a motor 924 (e.g., an electric motor and/or an internal combustion engine), and/or brakes 926.
The processor 904 may be further configured to receive information from an optional user interface 918, e.g., to determine a driving style.
In some embodiments, the vehicle controller, the communication module, the navigation module, and/or the sensing and perception module may be integrated into the central controller.
Fig. 10 depicts a block diagram illustrating an exemplary data processing system that may perform the method as described with reference to Fig. 8.
As shown in Fig. 10, the data processing system 1000 may include at least one processor 1002 coupled to memory elements 1004 through a system bus 1006. As such, the data processing system may store program code within memory elements 1004. Further, the processor 1002 may execute the program code accessed from the memory elements 1004 via a system bus 1006. In one aspect, the data processing system may be implemented as a computer that is suitable for storing and/or executing program code. It should be appreciated, however, that the data processing system 1000 may be implemented in the form of any system including a processor and a memory that is capable of performing the functions described within this specification. The data processing system may be an Internet/cloud server, for example.
The memory elements 1004 may include one or more physical memory devices such as, for example, local memory 1008 and one or more bulk storage devices 1010. The local memory may refer to random access memory or other non-persistent memory device(s) generally used during actual execution of the program code. A bulk storage device may be implemented as a hard drive or other persistent data storage device. The processing system 1000 may also include one or more cache memories (not shown) that provide temporary storage of at least some program code in order to reduce the number of times program code must be retrieved from the bulk storage device 1010 during execution. The processing system 1000 may also be able to use memory elements of another processing system, e.g. if the processing system 1000 is part of a cloud-computing platform.
Input/output (I/O) devices depicted as an input device 1012 and an output device 1014 optionally can be coupled to the data processing system. Examples of input devices may include, but are not limited to, a keyboard, a pointing device such as a mouse, a microphone (e.g., for voice and/or speech recognition), or the like. Examples of output devices may include, but are not limited to, a monitor or a display, speakers, or the like. Input and/or output devices may be coupled to the data processing system either directly or through intervening I/O controllers.
In an embodiment, the input and the output devices may be implemented as a combined input/output device (illustrated in Fig. 10 with a dashed line surrounding the input device 1012 and the output device 1014). An example of such a combined device is a touch sensitive display, also sometimes referred to as a "touch screen display" or simply "touch screen". In such an embodiment, input to the device may be provided by a movement of a physical object, such as, e.g., a stylus or a finger of a user, on or near the touch screen display.
A network adapter 1016 may also be coupled to the data processing system to enable it to become coupled to other systems, computer systems, remote network devices, and/or remote storage devices through intervening private or public networks. The network adapter may comprise a data receiver for receiving data that is transmitted by said systems, devices and/or networks to the data processing system 1000, and a data transmitter for transmitting data from the data processing system 1000 to said systems, devices and/or networks. Modems, cable modems, and Ethernet cards are examples of different types of network adapter that may be used with the data processing system 1000.
As pictured in Fig. 10, the memory elements 1004 may store an application 1018. In various embodiments, the application 1018 may be stored in the local memory 1008, the one or more bulk storage devices 1010, or separate from the local memory and the bulk storage devices. It should be appreciated that the data processing system 1000 may further execute an operating system (not shown in Fig. 10) that can facilitate execution of the application
1018. The application 1018, being implemented in the form of executable program code, can be executed by the data processing system 1000, e.g., by the processor 1002. Responsive to executing the application, the data processing system 1000 may be configured to perform one or more operations or method steps described herein.
Various embodiments of the invention may be implemented as a program product for use with a computer system, where the program(s) of the program product define functions of the embodiments (including the methods described herein). In one embodiment, the program(s) can be contained on a variety of non-transitory computer-readable storage media, where, as used herein, the expression “non-transitory computer readable storage media” comprises all computer-readable media, with the sole exception being a transitory, propagating signal. In another embodiment, the program(s) can be contained on a variety of transitory computer-readable storage media. Illustrative computer-readable storage media include, but are not limited to: (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM disks readable by a CD-ROM drive, ROM chips or any type of solid-state non-volatile semiconductor memory) on which information is permanently stored; and (ii) writable storage media (e.g., flash memory, floppy disks within a diskette drive or hard-disk drive or any type of solid-state random-access semiconductor memory) on which alterable information is stored. The computer program may be run on the processor 1002 described herein.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention.
The embodiments were chosen and described in order to best explain the principles and the practical application, and to enable others of ordinary skill in the art to understand the various embodiments with various modifications as are suited to the particular use contemplated.

Claims (17)

1. A computer-implemented method for controlling a vehicle, the method comprising: receiving or determining information regarding a current state of the vehicle; receiving or determining information regarding one or more target states of the vehicle at one or more respective future time steps; receiving or determining information regarding a current state of another road user; iteratively optimising a set of control parameters for the vehicle based on an overall cost value associated with the set of control parameter values, the optimisation comprising: determining one or more potential future states of the vehicle for each of the one or more future time steps, based on the current state of the vehicle and the set of control parameter values; determining a first cost value using a first cost function, the first cost value being based on a difference between the one or more potential future states of the vehicle and the one or more target states of the vehicle at the one or more future time steps; predicting a future state of the other road user at the one or more future time steps, based on the current state of the other road user; determining a second cost value using a second cost function, the second cost value being based on a perceived risk posed by the vehicle in the one or more future states of the vehicle to the other road user in the respective one or more future states of the other road user at the one or more future time steps; and determining the overall cost value associated with the set of control parameter values, based on the first cost value and the second cost value; and based on the optimised set of control parameters, determining control signals that configure the vehicle to adjust its speed and/or steering angle.

2. The method according to claim 1, wherein determining the overall cost value comprises determining a weighted sum of the first cost value and the second cost value, the overall cost value preferably being determined by $J_{total} = \cos\alpha\, J_{ego} + \sin\alpha\, J_{other}$, wherein $J_{total}$ represents the overall cost value, $J_{ego}$ represents the first cost value, $J_{other}$ represents the second cost value, and $\alpha$ represents a social score of the vehicle's behaviour, preferably with $0° < \alpha < 90°$.

3. The method according to claim 1 or 2, wherein the first cost value is based on a difference between the current state of the vehicle and the future state of the vehicle, preferably on a difference between a current speed and a future speed and/or between a current acceleration and a future acceleration and/or between a current steering angle and a future steering angle.

4. The method according to any one of the preceding claims, wherein the difference between the one or more future states of the vehicle and the corresponding one or more target states of the vehicle comprises one or more of: a lateral difference between a future position and a corresponding target position of the vehicle, a longitudinal difference between the future position and the corresponding target position of the vehicle, a difference between a future heading and a corresponding target heading of the vehicle, and a difference between a travel distance between a current and the future position and a corresponding target travel distance.

5. The method according to any one of the preceding claims, wherein determining the second cost value comprises: determining a longitudinal distance between a future position of the vehicle and a future position of the other road user along a predicted trajectory of the other road user; determining a lateral distance between a future position of the vehicle and the predicted trajectory of the other road user; and determining the second cost value based on the determined longitudinal distance and the determined lateral distance, wherein the second cost value preferably decreases according to a polynomial function of the longitudinal distance and according to an exponential function of the lateral distance, wherein more preferably the second cost value $J_{other}$ is proportional to $(s - s_0)^2 \exp(-d^2/(2\sigma^2))$, wherein $s$ represents the longitudinal distance, $s_0$ represents a longitudinal reference distance, $d$ represents the lateral distance, and $\sigma$ represents a lateral scale factor.

6. The method according to any one of the preceding claims, wherein the second cost value is based on a combined mass of the vehicle and the other road user and/or on a difference in speed between the vehicle and the other road user, wherein the second cost value is preferably proportional to the combined mass of the vehicle and the other road user and/or to the difference in speed between the vehicle and the other road user.

7. The method according to any one of the preceding claims, wherein the information regarding the current state of the vehicle comprises one or more of: a current position of the vehicle relative to a lane in which the vehicle is moving, a current heading of the vehicle relative to the lane, a current position of the vehicle relative to an external coordinate system, a current heading of the vehicle relative to an external coordinate system, a current position of the vehicle relative to a planned trajectory, a current heading of the vehicle relative to a planned trajectory, a current speed or velocity, a current steering angle of the vehicle relative to the heading of the vehicle, and a mass of the vehicle.

8. The method according to any one of the preceding claims, wherein the information regarding the target state of the vehicle comprises one or more of: a target position of the vehicle relative to a lane in which the vehicle is moving, a target heading of the vehicle relative to the lane, a target position of the vehicle relative to an external coordinate system, a target heading of the vehicle relative to an external coordinate system, a target position of the vehicle relative to a planned trajectory, a target heading of the vehicle relative to a planned trajectory, a target speed or velocity, and a target steering angle of the vehicle relative to the heading of the vehicle.
The method of any preceding claim, wherein the information regarding the target state of the vehicle comprises one or more of: a target position of the vehicle relative to a lane in which the vehicle is moving, a target direction of the vehicle relative to the lane, a target position of the vehicle relative to an external coordinate system, a target direction of the vehicle relative to an external coordinate system, a target position of the vehicle relative to a planned trajectory, a target direction of the vehicle relative to a planned trajectory, a target speed or velocity, and a target steering angle of the vehicle relative to the direction of the vehicle. 9. De werkwijze volgens één der de voorgaande conclusies, waarbij de informatie betreffende de andere verkeersdeelnemer één of meer omvat van: een positie van de andere verkeersdeelnemer ten opzichte van het voertuig, een richting van de andere verkeersdeelnemer ten opzichte van het voertuig, een vaart of snelheid van de andere verkeersdeelnemer, een stuurhoek van de andere verkeersdeelnemer,9. The method according to any one of the preceding claims, wherein the information regarding the other road user comprises one or more of: a position of the other road user relative to the vehicle, a direction of the other road user relative to the vehicle, a speed or velocity of the other road user, a steering angle of the other road user, een grootte van de andere verkeersdeelnemer, een voertuigsoort van de andere verkeersdeelnemer, en een massa van de andere verkeersdeelnemer.a size of the other road user, a vehicle type of the other road user, and a mass of the other road user. 10. De werkwijze volgens één der de voorgaande conclusies, waarbij de eerste kostwaarde, de tweede kostwaarde, en/of de algehele kostwaarde bepaald worden gebruikmakend van een model met getrainde hyperparameters, waarbij de getrainde hyperparameters getraind zijn door het iteratief uitvoeren van de stappen van: -— het selecteren van een verzameling hyperparameters van het model; — het simuleren van een omgeving die het voertuig en de andere verkeersdeelnemer omvat; - het bepalen van invoerparameters voor het model gebaseerd op de gesimuleerde omgeving; — het bepalen van de geoptimaliseerde verzameling van stuurparameters gebruikmakend van het model met de geselecteerde verzameling van hyperparameters; - het updaten van de gesimuleerde omgeving, gebaseerd op de geoptimaliseerde verzameling van stuurparameters; en -— het bepalen van een beloning die geassocieerd is met de hyperparameters, gebaseerd op de geüpdatete gesimuleerde omgeving; en het selecteren van een verzameling van getrainde hyperparameters gebaseerd op de beloningen die geassocieerd zijn met de hyperparameters.10. 
The method according to any one of the preceding claims, wherein the first cost value, the second cost value, and/or the overall cost value are determined using a model with trained hyperparameters, the trained hyperparameters being trained by iteratively performing the steps of: -— selecting a set of hyperparameters of the model; - simulating an environment comprising the vehicle and the other road user; - determining input parameters for the model based on the simulated environment; - determining the optimized set of control parameters using the model with the selected set of hyperparameters; - updating the simulated environment based on the optimized set of control parameters; and -— determining a reward associated with the hyperparameters based on the updated simulated environment; and selecting a set of trained hyperparameters based on the rewards associated with the hyperparameters. 11. De werkwijze volgens één der de voorgaande conclusies, waarbij het voertuig een auto is, bij voorkeur een passagiersauto.11. The method according to any of the preceding claims, wherein the vehicle is a car, preferably a passenger car. 12. Een werkwijze voor het trainen van één of meer eerste hyperparameters van een eerste kostfunctie voor het bepalen van een eerste kostwaarde die geassocieerd is met een verzameling van stuurparameterwaardes, waarbij de eerste kostwaarde gebaseerd is op een verschil tussen één of meer potentiële toekomstige toestanden van een voertuig en één of meer doeltoestanden van het voertuig op één of meer toekomstige tijdsstappen, en/of één of meer tweede hyperparameters van een tweede kostfunctie voor het bepalen van een tweede kostwaarde die geassocieerd is met de verzameling van stuurparameterwaardes, waarbij de tweede kostwaarde gebaseerd is op een waargenomen risico dat gevormd wordt door het voertuig in de één of meer toekomstige toestanden van het voertuig voor de andere verkeersdeelnemer in respectievelijke voorspelde één of meer toekomstige toestanden van de andere verkeersdeelnemer op de één of meer toekomstige tijdsstappen, en/of één of meer derde hyperparameters van een derde kostfunctie voor het bepalen van een algehele kostwaarde die geassocieerd is met de verzameling van stuurparameterwaardes, waarbij de algehele kostwaarde gebaseerd is op de eerste kostwaarde en de tweede kostwaarde, waarbij de werkwijze het iteratief uitvoeren omvat van de stappen van: — het selecteren van een verzameling van eerste, tweede en/of derde hyperparameters; -— het simuleren van een omgeving die het voertuig en de andere verkeersdeelnemer omvat; - het bepalen van invoerparameters voor het model gebaseerd op de gesimuleerde omgeving; — het bepalen van de geoptimaliseerde verzameling van stuurparameters gebruikmakend van de eerste, tweede en/of derde kostfunctie met de geselecteerde verzameling van eerste, tweede en/of derde hyperparameters; -— het updaten van de gesimuleerde omgeving, gebaseerd op de geoptimaliseerde verzameling van stuurparameters; en -— het bepalen van een beloning die geassocieerd is met de geselecteerde verzameling van eerste, tweede en/of derde hyperparameters, gebaseerd op de gelipdatete gesimuleerde omgeving; en het bepalen van een verzameling getrainde eerste, tweede en/of derde hyperparameters gebaseerd op de beloningen die geassocieerd zijn met de eerste, tweede of derde hyperparameters.12. 
A method for training one or more first hyperparameters of a first cost function to determine a first cost value associated with a set of steering parameter values, the first cost value being based on a difference between one or more potential future states of a vehicle and one or more target states of the vehicle at one or more future time steps, and/or one or more second hyperparameters of a second cost function to determine a second cost value associated with the set of steering parameter values, the second cost value being based on a perceived risk posed by the vehicle in the one or more future states of the vehicle to the other road user in respective predicted one or more future states of the other road user at the one or more future time steps, and/or one or more third hyperparameters of a third cost function to determine an overall cost value associated with the set of steering parameter values, the overall cost value being based on the first cost value and the second cost value, the method comprising iteratively performing the steps of: — selecting a set of first, second and/or third hyperparameters; — simulating an environment comprising the vehicle and the other road user; — determining input parameters for the model based on the simulated environment; — determining the optimised set of control parameters using the first, second and/or third cost function with the selected set of first, second and/or third hyperparameters; — updating the simulated environment based on the optimised set of control parameters; and — determining a reward associated with the selected set of first, second and/or third hyperparameters based on the updated simulated environment; and determining a set of trained first, second and/or third hyperparameters based on the rewards associated with the first, second or third hyperparameters. 13. Een besturingssysteem voor een geautomatiseerd voertuig dat een processor en een door een computer leesbaar opslagmedium omvat, welk opslagmedium uitvoerbare programmacode opgeslagen heeft en communicatief gekoppeld is aan de processor, waarbij in reactie op het uitvoeren van de uitvoerbare programmacode, de processor ingericht is voor het uitvoeren van uitvoerbare operaties, waarbij de uitvoerbare operaties een werkwijze omvatten volgens één der conclusies 1 tot en met 12.13. An operating system for an automated vehicle comprising a processor and a computer readable storage medium, the storage medium having executable program code stored therein and being communicatively coupled to the processor, wherein in response to execution of the executable program code, the processor is adapted to perform executable operations, the executable operations comprising a method as claimed in any one of claims 1 to 12. 14. Het besturingssysteem volgens conclusie 13, verder omvattende: één of meer sensors die communicatief gekoppeld zijn aan de processor voor het verschaffen van informatie betreffende de huidige toestand van het voertuig en/of informatie betreffende de doeltoestand van het voertuig en/of informatie betreffende de huidige toestand van het andere voertuig; en/of één of meer actuators die communicatief gekoppeld zijn aan de processor, waarbij de één of meer actuators ingericht zijn voor, in reactie op het ontvangen van de stuursignalen, het aanpassen van de vaart en/of stuurhoek van het voertuig.14. 
The control system of claim 13, further comprising: one or more sensors communicatively coupled to the processor for providing information regarding the current state of the vehicle and/or information regarding the target state of the vehicle and/or information regarding the current state of the other vehicle; and/or one or more actuators communicatively coupled to the processor, the one or more actuators being configured to, in response to receiving the control signals, adjust the speed and/or steering angle of the vehicle. 15. Een geautomatiseerd voertuig, dat een besturingssysteem volgens conclusie 13 of 14 omvat.15. An automated vehicle comprising a control system according to claim 13 or 14. 16. Een computerprogrammaproduct dat softwarecodeporties omvat die ingericht zijn voor, wanneer deze uitgevoerd worden in het geheugen van een computer, het uitvoeren van de werkwijzestappen volgens één der conclusies 1-13.16. A computer program product comprising software code portions adapted, when executed in the memory of a computer, to perform the method steps of any one of claims 1 to 13. 17. Een niet-transitief opslagmedium dat een computerprogrammaproduct volgens conclusie 16 opgeslagen heeft.17. A non-transitive storage medium storing a computer program product according to claim 16.
NL2035943A 2023-10-02 2023-10-02 Socially-compliant automated driving in mixed traffic NL2035943B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
NL2035943A NL2035943B1 (en) 2023-10-02 2023-10-02 Socially-compliant automated driving in mixed traffic
PCT/NL2024/050537 WO2025075500A1 (en) 2023-10-02 2024-10-02 Socially-compliant automated driving in mixed traffic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
NL2035943A NL2035943B1 (en) 2023-10-02 2023-10-02 Socially-compliant automated driving in mixed traffic

Publications (1)

Publication Number Publication Date
NL2035943B1 true NL2035943B1 (en) 2025-04-10

Family

ID=89474424

Family Applications (1)

Application Number Title Priority Date Filing Date
NL2035943A NL2035943B1 (en) 2023-10-02 2023-10-02 Socially-compliant automated driving in mixed traffic

Country Status (2)

Country Link
NL (1) NL2035943B1 (en)
WO (1) WO2025075500A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120886869B (en) * 2025-10-09 2025-12-09 南京信息工程大学 An Autonomous Racing Car Control Method Based on Integral Compensation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210114617A1 (en) * 2019-10-18 2021-04-22 Uatc, Llc Method for Using Lateral Motion to Optimize Trajectories for Autonomous Vehicles
US20210146964A1 (en) 2019-11-15 2021-05-20 Massachusetts Institute Of Technology Social behavior for autonomous vehicles
US20220032960A1 (en) * 2020-07-29 2022-02-03 Toyota Research Institute, Inc. Game-theoretic planning for risk-aware interactive agents
US20230242157A1 (en) * 2022-01-31 2023-08-03 Gm Cruise Holdings Llc Dynamic adjustment of autonomous vehicle system based on deep learning optimizations
DE102022002253B3 (en) * 2022-06-21 2023-08-24 Mercedes-Benz Group AG Method for planning a target trajectory for an automated vehicle

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210114617A1 (en) * 2019-10-18 2021-04-22 Uatc, Llc Method for Using Lateral Motion to Optimize Trajectories for Autonomous Vehicles
US20210146964A1 (en) 2019-11-15 2021-05-20 Massachusetts Institute Of Technology Social behavior for autonomous vehicles
US20220032960A1 (en) * 2020-07-29 2022-02-03 Toyota Research Institute, Inc. Game-theoretic planning for risk-aware interactive agents
US20230242157A1 (en) * 2022-01-31 2023-08-03 Gm Cruise Holdings Llc Dynamic adjustment of autonomous vehicle system based on deep learning optimizations
DE102022002253B3 (en) * 2022-06-21 2023-08-24 Mercedes-Benz Group AG Method for planning a target trajectory for an automated vehicle

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
BUCKMAN NOAM ET AL: "Sharing is Caring: Socially-Compliant Autonomous Intersection Negotiation", 2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), IEEE, 3 November 2019 (2019-11-03), pages 6136 - 6143, XP033695568, DOI: 10.1109/IROS40897.2019.8967997 *
H. YUAN ET AL.: "Safe, Efficient, Comfort, and Energy-saving Automated Driving through Roundabout Based on Deep Reinforcement Learning", ARXIV:2306.11465, 2023
J.A. ANDERSSON ET AL.: "Casadi: a software framework for nonlinear optimization and optimal control", MATHEMATICAL PROGRAMMING COMPUTATION, vol. 11, no. 1, 2019, pages 1 - 36, XP036728308, DOI: 10.1007/s12532-018-0139-4
KOLEKAR ET AL.: "`Human-like driving behaviour emerges from a risk-based driver model", NATURE COMMUNICATIONS, vol. 4850, 2020, pages 1 - 13
M. ZHU ET AL.: "Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving", TRANSPORTATION RESEARCH PART C: EMERGING TECHNOLOGIES, vol. 117, 2020, pages 102662, XP086212393, DOI: 10.1016/j.trc.2020.102662
N. BUCKMAN ET AL.: "Sharing is Caring: Socially-Compliant Autonomous Intersection Negotiation", 2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019, pages 6136 - 6143, XP033695568, DOI: 10.1109/IROS40897.2019.8967997
R. COULTER: "Implementation of the Pure Pursuit Path Tracking Algorithm", 1990, CARNEGIE MELLON UNIVERSITY
Y. DONG ET AL.: "Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers", ARXIV:2306.11466, 2023

Also Published As

Publication number Publication date
WO2025075500A1 (en) 2025-04-10

Similar Documents

Publication Publication Date Title
CN112389427B (en) Vehicle track optimization method and device, electronic equipment and storage medium
US11970168B2 (en) Vehicle trajectory modification for following
Huang et al. Toward safe and personalized autonomous driving: Decision-making and motion control with DPF and CDT techniques
US11493926B2 (en) Offline agent using reinforcement learning to speedup trajectory planning for autonomous vehicles
US11467591B2 (en) Online agent using reinforcement learning to plan an open space trajectory for autonomous vehicles
US11794732B2 (en) Allocation of safety system resources based on probability of intersection
US11225247B2 (en) Collision prediction and avoidance for vehicles
Gu et al. Safe-state enhancement method for autonomous driving via direct hierarchical reinforcement learning
US11565709B1 (en) Vehicle controller simulations
US11433885B1 (en) Collision detection for vehicles
US12187324B2 (en) Trajectory prediction based on a decision tree
US20210055733A1 (en) Collision zone detection for vehicles
Chae et al. Virtual target-based overtaking decision, motion planning, and control of autonomous vehicles
US11409284B2 (en) Relaxation optimization model to plan an open space trajectory for autonomous vehicles
Siboo et al. An empirical study of DDPG and PPO-based reinforcement learning algorithms for autonomous driving
US11970164B1 (en) Adverse prediction planning
Li et al. A decision-making approach for complex unsignalized intersection by deep reinforcement learning
CN118348998A (en) A trajectory planning and tracking control method for dynamic obstacle avoidance of autonomous driving vehicles
NL2035943B1 (en) Socially-compliant automated driving in mixed traffic
Zhao et al. Knowledge Distillation-Enhanced Behavior Transformer for Decision-Making of Autonomous Driving
CN119384376A (en) Vehicle safety systems
CN112590815B (en) Construction method of self-driving predictive energy-saving cognitive model based on ACT-R
US11807233B1 (en) Procedurally generated safety system determination
Vijayakumar et al. A holistic safe planner for automated driving considering interaction with human drivers
Liu et al. UDMC: Unified Decision-Making and Control Framework for Urban Autonomous Driving With Motion Prediction of Traffic Participants