US12127085B2 - Systems and methods for mitigating vehicle pose error across an aggregated feature map - Google Patents
- Publication number
- US12127085B2 (application Ser. No. 17/150,998)
- Authority
- US
- United States
- Prior art keywords
- autonomous vehicle
- vehicle
- environmental representation
- intermediate environmental
- pose
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/40—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
- H04W4/46—Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P] for vehicle-to-vehicle communication [V2V]
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
- B60W60/0027—Planning or execution of driving tasks using trajectory prediction for other traffic participants
- B60W60/00272—Planning or execution of driving tasks using trajectory prediction for other traffic participants relying on extrapolation of current movement
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
- B60W60/0027—Planning or execution of driving tasks using trajectory prediction for other traffic participants
- B60W60/00276—Planning or execution of driving tasks using trajectory prediction for other traffic participants for two or more other traffic participants
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0088—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0231—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
- G05D1/0238—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
- G05D1/024—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0268—Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means
- G05D1/0274—Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means using mapping information stored in a memory device
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/20—Control system inputs
- G05D1/24—Arrangements for determining position or orientation
- G05D1/246—Arrangements for determining position or orientation using environment maps, e.g. simultaneous localisation and mapping [SLAM]
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/80—Arrangements for reacting to or preventing system or operator failure
- G05D1/81—Handing over between on-board automatic and on-board manual control
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/30—Services specially adapted for particular environments, situations or purposes
- H04W4/38—Services specially adapted for particular environments, situations or purposes for collecting sensor information
Definitions
- the present disclosure relates generally to performing autonomous vehicle operations.
- the present disclosure relates to mitigating vehicle pose errors across an aggregated feature map used for performing autonomous vehicle operations.
- An autonomous vehicle can be capable of sensing its environment and navigating with little to no human input.
- an autonomous vehicle can observe its surrounding environment using a variety of sensors and can attempt to comprehend the environment by performing various processing techniques on data collected by the sensors. Given such knowledge, an autonomous vehicle can navigate through the environment.
- the method includes obtaining, by a computing system including one or more computing devices onboard an autonomous vehicle, sensor data associated with an environment of a first autonomous vehicle.
- the method includes obtaining, by the computing system, estimated location data indicative of a first estimated pose of the first autonomous vehicle.
- the method includes determining, by the computing system, a first intermediate environmental representation of at least a first portion of the environment of the first autonomous vehicle based, at least in part, on the sensor data.
- the method includes obtaining, by the computing system, a first message from a second autonomous vehicle.
- the first message can include a second intermediate environmental representation of at least a second portion of the environment of the first autonomous vehicle and second estimated location data indicative of a second estimated pose of the second autonomous vehicle.
- the method can include determining, by the computing system, a first relative pose between the first autonomous vehicle and the second autonomous vehicle based, at least in part, on the first intermediate environmental representation and the second intermediate environmental representation.
- the method can include generating, by the computing system, an updated intermediate environmental representation based, at least in part, on the first intermediate environmental representation, the second intermediate environmental representation, and the first relative pose.
- the method can include generating, by the computing system, an autonomy output for the first autonomous vehicle based, at least in part, on the updated intermediate environmental representation.
- the computing system includes one or more processors and one or more tangible, non-transitory, computer readable media that collectively store instructions that when executed by the one or more processors cause the computing system to perform operations.
- the operations include obtaining sensor data associated with an environment of a first autonomous vehicle.
- the operations include obtaining estimated location data indicative of a first estimated pose of the first autonomous vehicle.
- the operations include determining, via a first portion of a machine-learned detection and prediction model, a first intermediate environmental representation of at least a first portion of the environment of the first autonomous vehicle based, at least in part, on the sensor data.
- the operations include obtaining a first message from a second autonomous vehicle.
- the first message includes a second intermediate environmental representation of at least a second portion of the environment of the first autonomous vehicle and second estimated location data indicative of a second estimated pose of the second autonomous vehicle.
- the operations include determining, via a machine-learned regression model, a first relative pose between the first autonomous vehicle and the second autonomous vehicle based, at least in part, on the first intermediate environmental representation and the second intermediate environmental representation. The operations also include generating, via a second portion of the machine-learned detection and prediction model, an autonomy output for the first autonomous vehicle based, at least in part, on the first intermediate environmental representation, the second intermediate environmental representation, and the first relative pose.
- the autonomous vehicle includes one or more sensors, one or more processors, and one or more tangible, non-transitory, computer readable media that collectively store instructions that when executed by the one or more processors cause the one or more processors to perform operations.
- the operations include obtaining, via the one or more sensors, sensor data associated with a surrounding environment of the autonomous vehicle.
- the operations include determining a first intermediate environmental representation of at least a first portion of the surrounding environment of the autonomous vehicle based, at least in part, on the sensor data.
- the operations include obtaining a plurality of messages from a plurality of respective autonomous vehicles.
- Each respective message of the plurality of messages includes a respective intermediate environmental representation of at least another portion of the surrounding environment associated with a respective autonomous vehicle of the plurality of respective autonomous vehicles.
- the operations include determining a plurality of relative poses based, at least in part, on the first intermediate environmental representation and the respective intermediate environmental representations.
- the plurality of relative poses include a respective relative pose between the autonomous vehicle and each of the plurality of respective autonomous vehicles.
- the operations include determining a plurality of absolute poses based, at least in part, on the plurality of relative poses.
- the plurality of absolute poses include a respective absolute pose for the autonomous vehicle and each of the plurality of respective autonomous vehicles.
- the operations include generating an updated intermediate environmental representation based, at least in part, on the first intermediate environmental representation, the respective intermediate environmental representations, and at least one of the plurality of absolute poses.
- the operations include generating an autonomy output for the autonomous vehicle based, at least in part, on the updated intermediate environmental representation.
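One simple way to realize the step of deriving absolute poses from relative poses is to compose the ego vehicle's absolute pose with each estimated relative pose. The sketch below assumes an SE(2) pose convention of (x, y, yaw); the patent's jointly refined, learned pose estimation is not reproduced here.

```python
import numpy as np

def compose(pose, rel):
    """Compose an absolute SE(2) pose (x, y, yaw) with a relative pose
    to obtain another agent's absolute pose. This is one minimal way to
    realize the 'absolute poses from relative poses' step; the learned,
    jointly optimized refinement described in the text is omitted."""
    x, y, t = pose
    rx, ry, rt = rel
    c, s = np.cos(t), np.sin(t)
    # Rotate the relative offset into the world frame, then translate.
    return (x + c * rx - s * ry, y + s * rx + c * ry, t + rt)
```

For example, if the ego vehicle faces +y (yaw of 90 degrees) and another vehicle is one meter ahead in the ego frame, its absolute position is one meter along +y in the world frame.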
- the autonomous vehicle technology described herein can help improve the safety of passengers of an autonomous vehicle, improve the safety of the surroundings of the autonomous vehicle, improve the experience of the rider and/or operator of the autonomous vehicle, as well as provide other improvements as described herein.
- the autonomous vehicle technology of the present disclosure can help improve the ability of an autonomous vehicle to effectively provide vehicle services to others and support the various members of the community in which the autonomous vehicle is operating, including persons with reduced mobility and/or persons that are underserved by other transportation options.
- the autonomous vehicle of the present disclosure may reduce traffic congestion in communities as well as provide alternate forms of transportation that may provide environmental benefits.
- FIG. 1 depicts a block diagram of an example system for an autonomous vehicle according to example embodiments of the present disclosure.
- FIG. 2 depicts an example geographic area with a vehicle ecosystem according to example embodiments of the present disclosure.
- FIG. 3 depicts an architecture of example machine-learned models according to example embodiments of the present disclosure.
- FIG. 4 depicts a visualization of an example intermediate environmental representation according to example embodiments of the present disclosure.
- FIG. 5 depicts an example diagram of a multi-agent scenario with one or more estimated pose errors according to example embodiments of the present disclosure.
- FIG. 6 depicts an architecture of example machine-learned models for correcting pose errors according to example embodiments of the present disclosure.
- FIG. 7 depicts a flow diagram of a method for correcting pose errors according to example embodiments of the present disclosure.
- FIG. 8 depicts an example system with various means for performing operations and functions according to example implementations of the present disclosure.
- FIG. 9 depicts an example computing system according to example embodiments of the present disclosure.
- aspects of the present disclosure are directed to improved systems and methods for detecting and compensating for pose errors associated with information received during communications among autonomous vehicles and/or other systems. More particularly, communication technology of the present disclosure allows autonomous vehicles and/or other systems to share intermediate feature representations and intelligently aggregate this information to improve autonomous vehicle/system operations.
- a transmitter autonomous vehicle can obtain sensor data (e.g., Light Detection and Ranging (LIDAR) data, etc.) indicative of the vehicle's surrounding environment.
- the transmitter autonomous vehicle can generate an intermediate environmental representation (e.g., an encoded feature map, etc.) based, at least in part, on the sensor data (e.g., voxelized LIDAR data).
- a recipient autonomous vehicle can receive the intermediate environmental representation and utilize the intermediate environmental representation to improve its autonomous operations.
- the recipient autonomous vehicle can utilize a machine-learned aggregation model to generate an updated intermediate environmental representation based, at least in part, on the intermediate environmental representation.
- the updated intermediate environmental representation can pool information detected by sensors of both the transmitter and recipient vehicles.
- the updated intermediate environmental representation can be input into a machine-learned model (e.g., a joint machine-learned detection and prediction model) at the recipient autonomous vehicle to identify objects detected by the transmitter vehicle, the recipient vehicle, or both. This can lead to more accurate estimates of an object's position, size, shape, and predicted future trajectory, and improve the ability of the recipient autonomous vehicle to safely plan its motion through its environment. These advantages, however, rely on accurate pose estimates of the associated vehicles.
- an autonomous vehicle can generate an intermediate environmental representation (e.g., a feature map) representative of a portion of its surrounding environment based, at least in part, on sensor data associated with the environment.
- the autonomous vehicle can receive additional intermediate environmental representations of the environment from other associated vehicles.
- Each intermediate environmental representation can be associated with an estimated pose (e.g., spatial coordinates/orientation) determined by a respective autonomous vehicle.
- the autonomous vehicle can determine a relative pose between each of the autonomous vehicles and determine an absolute pose (e.g., updated spatial coordinates/orientation) for each vehicle based, at least in part, on the relative poses.
- the autonomous vehicle can generate an updated intermediate environmental representation by aggregating the information provided by one or more of the intermediate environmental representations based, at least in part, on the determined absolute poses.
- An autonomy output can be generated based, at least in part, on the updated intermediate environmental representation. In this way, machine-learned models can be trained to identify and correct pose errors in information provided by associated autonomous vehicles.
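The correction-and-aggregation flow outlined above can be sketched numerically. In this illustration the relative pose is computed in closed form from (assumed noise-free) absolute poses, whereas the disclosure regresses it from the intermediate environmental representations themselves, and the aggregation is a naive mean-pool rather than a learned model; the (x, y, yaw) convention is also an assumption.

```python
import numpy as np

def relative_pose(pose_a, pose_b):
    """Relative SE(2) pose of vehicle B expressed in vehicle A's frame.

    Each pose is (x, y, yaw) in a shared world frame. In the described
    pipeline this quantity is regressed by a machine-learned model from
    the two intermediate environmental representations; here it is
    derived in closed form purely for illustration."""
    xa, ya, ta = pose_a
    xb, yb, tb = pose_b
    dx, dy = xb - xa, yb - ya
    c, s = np.cos(-ta), np.sin(-ta)
    # Rotate the world-frame offset into A's frame.
    return (c * dx - s * dy, s * dx + c * dy, tb - ta)

def aggregate(rep_a, rep_b, weight_b=0.5):
    """Naive mean-pool of two H x W x C feature maps that are already
    warped into a common frame (a stand-in for the learned aggregation
    model described in the text)."""
    return (1.0 - weight_b) * rep_a + weight_b * rep_b
```

With vehicle A at the origin facing +y and vehicle B one meter along +y with the same heading, `relative_pose` reports B one meter ahead in A's frame with zero heading offset.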
- This provides an improvement to computing technology by increasing the accuracy of information derived from autonomous vehicle communications, thereby increasing the speed, efficiency, and reliability of estimating an object's position, size, and shape using knowledge provided by third-parties (e.g., other vehicles, etc.).
- the technology of this disclosure is described within the context of autonomous vehicles for example purposes only.
- the technology is not limited to an autonomous vehicle setting and can be implemented within other robotic and/or computing systems, such as those utilizing object detection and/or prediction machine-learned models.
- the technology of this disclosure can be described, at times, within the context of vehicle-to-vehicle communications for example purposes only.
- the technology can be utilized by a variety of types of computing systems that may communicate with other computing systems of the same or different type.
- a ground-based autonomous vehicle, aerial-based autonomous vehicle, or any other robotic system can utilize the technology described herein for communicating with another type of vehicle (e.g., ground vehicle, aerial vehicle, etc.) and/or an infrastructure element (e.g., a stationary sensor suite on a building, traffic light, etc.).
- An autonomous vehicle can include an onboard vehicle computing system with a variety of components for operating with minimal and/or no interaction from a human operator.
- the computing system can be located onboard the autonomous vehicle and include one or more sensors (e.g., cameras, LIDAR, Radio Detection and Ranging (RADAR), etc.), an autonomy computing system (e.g., for determining autonomous navigation), one or more vehicle control systems (e.g., for controlling braking, steering, powertrain), etc.
- the vehicle computing system can include sub-systems that cooperate to perceive the surrounding environment of the autonomous vehicle and determine a motion plan for controlling the motion of the autonomous vehicle.
- the vehicle computing system can include a joint perception and prediction system configured to perceive object(s) within the surrounding environment of the autonomous vehicle and to predict motion of the object(s) within the surrounding environment of the autonomous vehicle.
- the vehicle computing system can separate these perception and prediction functions into separate systems.
- the vehicle computing system can include a motion planning system configured to plan the motion of the autonomous vehicle with respect to the object(s) within the surrounding environment of the autonomous vehicle.
- Autonomous vehicles can operate within geographic areas or have operating domains that can include other autonomous vehicles.
- a plurality of autonomous vehicles can be located within a geographic area.
- the geographic area can include one or more travel ways (e.g., roadways, etc.) and one or more geographic features (e.g., cross walks, lane boundaries, etc.).
- the geographic area can include infrastructure elements that include computing systems with communication technology capable of communicating with one or more of the autonomous vehicles within the geographic area, as further described herein.
- the geographic area can also include one or more objects.
- the objects can include, for example, static object(s) (e.g., lampposts, parking meters, etc.) and/or dynamic actor objects (e.g., pedestrians, vehicles, bicycles/bicyclists, etc.) that are and/or may be in motion.
- Each of the plurality of autonomous vehicles can include a communication system that allows the respective vehicle's computing system to communicate with system(s) that are remote from the autonomous vehicle.
- an autonomous vehicle can utilize its communication system to send and receive messages (e.g., via an internet connection) from a cloud-based server system that helps support the autonomous vehicle.
- This can include, for example, an offboard service assignment system and routing system that matches the autonomous vehicle to a request for a vehicle service (e.g., rideshare service) and provides the autonomous vehicle with a route for completing the vehicle service.
- Each autonomous vehicle can also have a communication range that allows the autonomous vehicle to communicate with computing systems nearby the autonomous vehicle.
- a first autonomous vehicle can have a first communication range that is based at least in part on the vehicle's communication hardware (e.g., antenna, etc.) and the communication protocol utilized by the first autonomous vehicle.
- the first communication range can be represented by a radial distance from the first autonomous vehicle.
- the first autonomous vehicle can communicate to an ecosystem of autonomous vehicles within the first communication range.
- the first autonomous vehicle (a “recipient autonomous vehicle”) can receive data from a second, different autonomous vehicle (“a first transmitter autonomous vehicle”) that is within the first communication range of the first autonomous vehicle.
- the first autonomous vehicle can receive data from any number of autonomous vehicles (e.g., a third autonomous vehicle (e.g., “a second transmitter vehicle”)) within the first communication range of the first autonomous vehicle.
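A radial communication range, as described above, reduces to a simple distance test. The sketch below is a minimal illustration; the 50 m default is purely an assumption, since the actual range depends on the vehicle's antenna hardware and the communication protocol used.

```python
import math

def within_range(ego_xy, other_xy, comm_range_m=50.0):
    """Return True if `other_xy` lies within the ego vehicle's (assumed
    radial) communication range. The default range is illustrative."""
    dx = other_xy[0] - ego_xy[0]
    dy = other_xy[1] - ego_xy[1]
    return math.hypot(dx, dy) <= comm_range_m
```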
- the systems and methods of the present disclosure can allow the ecosystem of autonomous vehicles to provide inter-vehicle communications that improve the vehicles' autonomous operations while reducing the communication bandwidth and potential information loss associated with doing so.
- a transmitter (e.g., a first, second, third, etc.) autonomous vehicle can obtain sensor data associated with an environment (e.g., a portion of a surrounding environment of another, recipient autonomous vehicle) of the transmitter autonomous vehicle.
- the sensor data can include one or more types of sensor data associated with one or more sensor modalities.
- the sensor data can include three-dimensional data representative of the surrounding environment of the transmitter autonomous vehicle.
- the sensor data can include three-dimensional point cloud data (e.g., LIDAR point cloud data).
- the sensor data can include a fusion of different types of sensor data.
- the sensor data can include a first type of sensor data (e.g., camera image data) associated with a first sensor modality (e.g., stereo camera) and a second type of sensor data (e.g., LIDAR data) associated with a second sensor modality (e.g., LIDAR system).
- the sensor data can include data acquired by multiple different autonomous vehicles.
- the sensor data can include a first set of sensor data (e.g., a first set of LIDAR data) acquired by the transmitter autonomous vehicle and a second set of sensor data (e.g., a second set of LIDAR data) that was acquired by another autonomous vehicle in its ecosystem and sent to the transmitter autonomous vehicle.
- the transmitter autonomous vehicle can obtain estimated location data indicative of an estimated pose of the transmitter autonomous vehicle.
- the estimated pose, for example, can be indicative of one or more estimated spatial coordinates and an estimated orientation of the transmitter autonomous vehicle.
- the autonomous vehicle can include one or more location measurement sensors such as, for example, one or more global positioning systems (e.g., GPSs), inertial measurement units (e.g., IMUs), and/or any other sensors configured to determine a current location and/or orientation of the autonomous vehicle.
- the autonomous vehicle can be configured to obtain the estimated location data from the one or more location measurement sensors.
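The estimated pose matters because it places the vehicle's sensor data into a shared frame. The sketch below, which assumes a 2D (x, y, yaw) pose and a standard rotate-then-translate convention, shows how points in the vehicle frame map into the world frame; even a small error in the estimated yaw displaces distant points by a large lateral offset, which motivates the pose-error correction described below.

```python
import numpy as np

def points_to_world(points_xy, pose):
    """Transform an N x 2 array of points from the vehicle frame into
    the world frame using an estimated pose (x, y, yaw). Frame and
    rotation conventions here are assumptions for illustration."""
    x, y, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])  # 2D rotation by yaw
    return points_xy @ R.T + np.array([x, y])
```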
- the transmitter autonomous vehicle can generate an intermediate representation for use by another autonomous vehicle (e.g., a recipient autonomous vehicle).
- the intermediate representation can be generated based, at least in part, on the sensor data and can be descriptive of at least a portion of an environment of the other autonomous vehicle.
- a transmitter autonomous vehicle can determine an intermediate representation of at least a portion of the surrounding environment of the transmitter autonomous vehicle based at least in part on the sensor data.
- the intermediate representation can include a feature map indicative of at least a portion of the environment.
- the transmitter autonomous vehicle can generate voxelized sensor data by voxelizing the three-dimensional point cloud data of the sensor data.
- the transmitter autonomous vehicle can extract raw features from its LIDAR sensor data and transform them into a bird's eye view (BEV).
- the features can include, for example, a heading and/or shape of an object indicated in the intermediate environmental representation.
- the transmitter autonomous vehicle can voxelize the LIDAR point clouds (e.g., into 15 cm³ voxels, etc.), apply several convolutional layers, and output feature maps of shape H×W×C, where H×W denotes the scene range in BEV and C is the number of feature channels.
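As an illustrative sketch of this voxelization step (the grid extents, 15 cm resolution, and occupancy-style height channels here are assumptions for the example, not the patent's parameters):

```python
import numpy as np

def voxelize_bev(points, voxel_size=0.15,
                 x_range=(0.0, 48.0), y_range=(-24.0, 24.0), z_range=(-2.0, 1.0)):
    """Bin 3D LIDAR points (N, 3) into a BEV occupancy grid of shape H x W x C,
    where C is the number of discretized height slices (feature channels).
    Ranges and resolution are illustrative assumptions."""
    H = int((x_range[1] - x_range[0]) / voxel_size)
    W = int((y_range[1] - y_range[0]) / voxel_size)
    C = int((z_range[1] - z_range[0]) / voxel_size)
    grid = np.zeros((H, W, C), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / voxel_size).astype(int)
    iy = ((points[:, 1] - y_range[0]) / voxel_size).astype(int)
    iz = ((points[:, 2] - z_range[0]) / voxel_size).astype(int)
    # Discard points that fall outside the scene range before scattering.
    keep = (ix >= 0) & (ix < H) & (iy >= 0) & (iy < W) & (iz >= 0) & (iz < C)
    grid[ix[keep], iy[keep], iz[keep]] = 1.0
    return grid
```

A downstream backbone would then consume this H×W×C tensor to produce the BEV feature map.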
- the intermediate environmental representation can be generated based, at least in part, on a machine-learned model (e.g., a joint detection and prediction model) associated with the recipient autonomous vehicle.
- the machine-learned model associated with the recipient autonomous vehicle can include a machine-learned model utilized by a recipient vehicle computing system of the recipient autonomous vehicle to perceive (e.g., identify) one or more objects within its surrounding environment and/or predict the motion of the one or more objects.
- the transmitter autonomous vehicle can generate the intermediate environmental representation, via a first portion of the machine-learned model (e.g., a convolutional neural network, etc.), based, at least in part, on the sensor data.
- the first portion, for example, can include one or more convolutional layers of the machine-learned model.
- the transmitter autonomous vehicle can input the voxelized sensor data into the machine-learned model.
- the machine-learned model can be configured to apply one or more convolutional layers to the voxelized sensor data.
- the machine-learned model can utilize a plurality of layers (e.g., three layers, etc.) of 3 ⁇ 3 convolution filters (e.g., with strides of 2, 1, 2 respectively) to produce the intermediate representation.
- the intermediate environmental representation can be, for example, a 4× downsampled spatial feature map.
- the feature map can be represented as a matrix generated from the array(s) of the sensor data.
- the transmitter autonomous vehicle can receive the intermediate environmental representation (e.g., an activation of intermediate neural network layers, etc.) as an output of the first portion (e.g., one or more first (e.g., intermediate) neural network layers) of the machine-learned model.
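The backbone described above (a few 3×3 convolution layers with strides 2, 1, 2 producing a 4× downsampled feature map) can be sketched roughly as follows; the channel counts, random weights, and plain-NumPy convolution are illustrative stand-ins for the trained model:

```python
import numpy as np

def conv2d(x, w, stride):
    """Naive 3x3 convolution + ReLU. x: (H, W, Cin); w: (3, 3, Cin, Cout);
    fixed padding of 1 matches the 3x3 kernel size."""
    k = w.shape[0]
    x = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    H = (x.shape[0] - k) // stride + 1
    W = (x.shape[1] - k) // stride + 1
    out = np.zeros((H, W, w.shape[3]), dtype=x.dtype)
    for i in range(H):
        for j in range(W):
            patch = x[i*stride:i*stride+k, j*stride:j*stride+k, :]
            out[i, j] = np.maximum(
                np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])), 0.0)
    return out

def backbone(voxel_grid, channels=(16, 16, 32), rng=np.random.default_rng(0)):
    """Three 3x3 conv layers with strides (2, 1, 2): the output spatial size is
    4x downsampled relative to the input, as in the described feature map."""
    x = voxel_grid
    for c_out, stride in zip(channels, (2, 1, 2)):
        w = rng.standard_normal((3, 3, x.shape[2], c_out)).astype(np.float32) * 0.1
        x = conv2d(x, w, stride)
    return x
```

With a 32×32 input the output is 8×8, i.e. the 4× downsampled spatial feature map the text describes.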
- the transmitter autonomous vehicle can select a recipient autonomous vehicle for which to communicate data indicative of the intermediate environmental representation from among a plurality of autonomous vehicles.
- the transmitter autonomous vehicle can select a recipient autonomous vehicle based at least in part on a communication range of the transmitter autonomous vehicle. For example, the transmitter autonomous vehicle can determine that it will communicate the data indicative of the intermediate environmental representations to one or more of the autonomous vehicles within its communication range.
- the transmitter autonomous vehicle can select a recipient autonomous vehicle based at least in part on data indicating that the recipient autonomous vehicle is capable of processing the intermediate environmental representations. For example, there can be a plurality of autonomous vehicles within the communication range of the transmitter autonomous vehicle. In some implementations, only a subset of those autonomous vehicles may be capable of processing the intermediate environmental representation.
- the transmitter autonomous vehicle may communicate with the other autonomous vehicles (e.g., when entering the vehicle's communication range, periodically, etc.) to determine which of the autonomous vehicles are able to utilize intermediate environmental representations.
- This can include, for example, an exchange of identifiers, information, and/or other data indicating that a respective autonomous vehicle is able to transmit, receive, and/or process compressed intermediate environmental representations in the manner described herein.
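A minimal sketch of this capability-aware recipient selection, assuming a hypothetical `Peer` record populated by the handshake described above:

```python
from dataclasses import dataclass

@dataclass
class Peer:
    vehicle_id: str
    distance_m: float
    supports_intermediate_reps: bool  # learned via the capability handshake

def select_recipients(peers, comm_range_m=100.0):
    """Pick vehicles that are both within communication range and able to
    process compressed intermediate environmental representations."""
    return [p.vehicle_id for p in peers
            if p.distance_m <= comm_range_m and p.supports_intermediate_reps]
```

The 100 m range is an illustrative default, not a value from the patent.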
- a first (e.g., recipient) autonomous vehicle can obtain one or more intermediate environmental representations from one or more different autonomous vehicles.
- the first autonomous vehicle can generate a first intermediate environmental representation based at least in part on sensor data indicative of its surrounding environment.
- the first autonomous vehicle can determine an updated intermediate environmental representation by aggregating the first intermediate environment representation with each of the one or more intermediate environmental representations.
- Each intermediate environmental representation can be aggregated based, at least in part, on an estimated pose of the transmitter autonomous vehicle associated with the representation. Inaccurate estimated poses of the transmitter autonomous vehicles can interfere with the accuracy of the updated intermediate environment representation.
- the systems and methods disclosed herein prevent inaccuracies with the updated intermediate environmental representations by detecting, correcting, and compensating for inaccurate pose estimates.
- a first (e.g., recipient) autonomous vehicle can obtain (e.g., via one or more sensors of an onboard vehicle computing system) sensor data (e.g., the sensor data described above with reference to the transmitter autonomous vehicle(s)) associated with an environment (e.g., a surrounding environment) of the first autonomous vehicle.
- the first autonomous vehicle can obtain estimated location data (e.g., in the manner described above with reference to the transmitter autonomous vehicle) indicative of a first estimated pose of the first autonomous vehicle.
- the first autonomous vehicle can determine a first intermediate environmental representation (e.g., in the manner described above with reference to the transmitter autonomous vehicle) of at least a first portion of the environment (e.g., surrounding environment) of the first autonomous vehicle based, at least in part, on the sensor data.
- the first autonomous vehicle (and/or vehicle computing system thereof) can obtain one or more messages from one or more transmitter autonomous vehicle(s) within the environment of the first autonomous vehicle.
- Each message can include an intermediate environmental representation of at least a portion of the environment of the first autonomous vehicle (e.g., from the perspective of a respective transmitter autonomous vehicle) and/or estimated location data indicative of an estimated pose of a respective transmitter autonomous vehicle.
- the first autonomous vehicle can obtain a first message from a second (e.g., a first transmitter) autonomous vehicle.
- the first message can include a second intermediate environmental representation of at least a second portion of the environment of the first autonomous vehicle and/or second estimated location data indicative of a second estimated pose of the second autonomous vehicle.
- the first autonomous vehicle can obtain a second message from a third autonomous vehicle.
- the second message can include a third intermediate environmental representation of at least a third portion of the environment of the first autonomous vehicle and/or third estimated location data indicative of a third estimated pose of the third autonomous vehicle.
- the first autonomous vehicle i can have a noisy estimate of its own pose (e.g., first estimated pose), denoted ξ̂_i, and can receive the noisy poses ξ̂_j (e.g., second estimated pose, third estimated pose, etc.) of neighboring autonomous vehicles j (e.g., second autonomous vehicle, third autonomous vehicle, etc.) as part of the messages (e.g., first message, second message, etc.).
- These noisy poses can be used to compute noisy relative transformations between the first autonomous vehicle i and each neighboring vehicle j, denoted ξ̂_ji.
- the first autonomous vehicle can determine one or more relative poses between each of the associated autonomous vehicles. For example, the first autonomous vehicle can determine a first relative pose between the first autonomous vehicle and the second autonomous vehicle based, at least in part, on the intermediate environmental representation and the second intermediate environmental representation. As another example, the first autonomous vehicle can determine a second relative pose between the first autonomous vehicle and the third autonomous vehicle based, at least in part, on the first intermediate environmental representation and the third intermediate environmental representation. In some implementations, the first autonomous vehicle can determine a third relative pose between the second autonomous vehicle and the third autonomous vehicle based, at least in part, on the second intermediate environmental representation and the third intermediate environmental representation.
- Each relative pose can identify a displacement between a pair of autonomous vehicles.
- the first relative pose can be indicative of a first displacement between the first autonomous vehicle and the second autonomous vehicle.
- the second relative pose can be indicative of a second displacement between the first autonomous vehicle and the third autonomous vehicle.
- the third relative pose can be indicative of a third displacement between the second autonomous vehicle and the third autonomous vehicle.
- Each relative pose can be determined based, at least in part, on a pair of intermediate environmental representations and/or a machine-learned regression model.
- the first intermediate environmental representation can include a first feature map encoded with a first plurality of encoded features representative of the first portion of the environment.
- the second intermediate environmental representation can include a second feature map encoded with a second plurality of encoded features representative of the second portion of the environment.
- the third intermediate environmental representation can include a third feature map encoded with a third plurality of encoded features representative of the third portion of the environment.
- Each relative pose can be determined by inputting data indicative of a pair of intermediate environmental representations into the machine-learned regression model.
- the first autonomous vehicle can determine, via the machine-learned regression model, the first relative pose between the first autonomous vehicle and the second autonomous vehicle. To do so, the first autonomous vehicle can generate an intermediate environmental representation pair by concatenating the first plurality of encoded features and the second plurality of encoded features along a features dimension.
- the first autonomous vehicle can input the intermediate environmental representation pair to the machine-learned regression model to obtain the first relative pose.
- the machine-learned regression model can be configured to output the relative pose based, at least in part, on the intermediate environmental representation pair.
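The pairing-and-regression step can be sketched as follows; the global average pooling and linear head are illustrative stand-ins for the trained convolutional regression model:

```python
import numpy as np

def regress_relative_pose(feat_i, feat_j, head_weights):
    """Concatenate two H x W x C feature maps along the feature (channel)
    dimension, pool, and apply a linear head outputting (dx, dy, dyaw).
    head_weights stands in for a trained regression model's parameters."""
    pair = np.concatenate([feat_i, feat_j], axis=-1)   # H x W x 2C pair
    pooled = pair.mean(axis=(0, 1))                    # global average pool -> 2C
    return pooled @ head_weights                       # 2C x 3 head -> 3-vector
```

In the described system, a convolutional network rather than a single linear layer would map the concatenated pair to the relative pose.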
- a convolutional neural network (e.g., the machine-learned regression model) can output an independent prediction (e.g., a relative pose) for each pair of intermediate environmental representations.
- the first autonomous vehicle can determine one or more absolute poses for one or more of the received intermediate environmental representations.
- Each absolute pose, for example, can include updated spatial coordinates and/or orientations for each of the transmitter autonomous vehicles.
- the first autonomous vehicle can determine a first absolute pose indicative of one or more first updated spatial coordinates and/or a first updated orientation for the first autonomous vehicle.
- the first autonomous vehicle can determine a second absolute pose indicative of one or more second updated spatial coordinates and/or a second updated orientation for the second autonomous vehicle.
- the first autonomous vehicle can determine a third absolute pose indicative of one or more third updated spatial coordinates and/or a third updated orientation for the third autonomous vehicle.
- the first autonomous vehicle can determine the first, second, and/or third absolute pose based, at least in part, on the first, second, and/or third relative poses.
- the relative pose estimates from the machine-learned regression model can be refined by finding a set of globally consistent absolute poses among the autonomous vehicles (e.g., the first, second, third, etc. autonomous vehicles).
- the absolute poses can be determined using a Markov random field, where each absolute pose is a node conditioned on the estimated relative poses. The distribution of absolute poses conditioned on these relative pose estimates can have a heavy tail due to outliers.
- Each absolute pose ξ_i can follow a multivariate Student's t-distribution with mean μ_i ∈ ℝ³ and scale Σ_i ∈ ℝ^(3×3), given the relative poses.
- the pairwise potential can include likelihoods, weights, and/or weight priors defined below. Unary potentials may not be used.
- the pairwise potential can be a weighted product of likelihood terms: ψ(i, j) = p(ξ̂_ji ∘ ξ_j)^(w_ji) · p(ξ̂_ij⁻¹ ∘ ξ_j)^(w_ij)
- the likelihood terms p(ξ̂_ji ∘ ξ_j) and p(ξ̂_ij⁻¹ ∘ ξ_j), both t-distributed and centered at ξ_i, can encourage the result of the relative transformation from transmitter vehicles (e.g., second, third, etc. autonomous vehicles) to stay close to the recipient vehicle (e.g., first autonomous vehicle). Both directions are included due to the symmetry of rigid transformations. In some implementations, not all pairwise transformations provide the same amount of information. Since the machine-learned regression module can produce heavy-tailed errors, the edge potentials can down-weight erroneous pose regression outputs.
- a scalar weight w_ji can be attached to each term in the pairwise potential, e.g., p(ξ̂_ji ∘ ξ_j)^(w_ji), so that zero-weighted terms can be ignored.
- a prior distribution for each w_ji can be used, where the mean of the distribution is o_ji, the fraction of spatial overlap between the two messages.
- the absolute pose prediction can be trusted more if two messages have more spatial overlap.
- a Gamma prior can be used: p(w_ji) = Γ(w_ji; α, β), with its mean set to the overlap fraction o_ji.
- the first autonomous vehicle can estimate values of the absolute poses ⁇ i , the scale parameters ⁇ i , and the weights w ji that maximize the product of the pairwise potentials. This can be done using iterated conditional modes described by the algorithm:
- the maximization step on Line 4 can happen simultaneously for all nodes via weighted expectation maximization (EM) for the t-distribution.
- the maximization step on Line 5 can be computed using the closed form:
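The overall iterated-conditional-modes loop can be sketched in simplified form. This sketch uses plain weighted averaging of neighbor-implied poses in place of the weighted Student-t EM and closed-form weight updates, and treats poses as additive (x, y, yaw) triples, so it is only a structural illustration:

```python
import numpy as np

def icm_poses(rel, weights, n_nodes, iters=10):
    """Iterated-conditional-modes sketch: each node's absolute pose (x, y, yaw)
    is re-estimated as the weighted average of the poses implied by its
    neighbors' current estimates plus the noisy relative transformations.
    rel[(j, i)] is the pose of node i as predicted from node j; Gaussian
    averaging stands in for the weighted Student-t EM update in the text."""
    poses = {i: np.zeros(3) for i in range(n_nodes)}
    for _ in range(iters):
        for i in range(n_nodes):
            num, den = np.zeros(3), 0.0
            for (j, k), delta in rel.items():
                if k != i or j == i:
                    continue
                w = weights.get((j, i), 1.0)       # zero-weighted terms drop out
                num += w * (poses[j] + delta)      # neighbor j's estimate of pose i
                den += w
            if den > 0:
                poses[i] = num / den
    return poses
```

Without a fixed anchor the absolute values are determined only up to a common offset, but the refined poses become mutually consistent with the relative estimates.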
- the first autonomous vehicle (e.g., its onboard vehicle computing system) can generate an updated intermediate environmental representation based, at least in part, on the first intermediate environmental representation, the second intermediate environmental representation, and/or the third environmental representation.
- the updated intermediate environmental representation can be generated based, at least in part, on the first, second, and/or third relative poses.
- the updated intermediate environmental representation can be generated based, at least in part, on the first, second, and/or third absolute poses.
- the updated intermediate environmental representation can be generated using a machine-learned aggregation model (e.g., one or more portions of the machine-learned detection and prediction model, etc.).
- the machine-learned aggregation model used to create the updated intermediate environmental representation can include a neural network (and/or one or more layers thereof).
- the machine-learned aggregation model can be configured to aggregate a plurality of intermediate environmental representations from a plurality of autonomous vehicles.
- the first autonomous vehicle can determine an updated intermediate environmental representation based at least in part on the first intermediate environmental representation, the second intermediate environmental representation generated by the first transmitter autonomous vehicle, and/or the third intermediate environmental representation generated by the second transmitter autonomous vehicle.
- the first autonomous vehicle e.g., its onboard vehicle computing system
- the machine-learned aggregation model can be configured to aggregate the intermediate environmental representation(s) to create the updated intermediate environmental representation.
- the first autonomous vehicle can obtain the updated intermediate environmental representation as an output of the machine-learned aggregation model.
- the machine-learned aggregation model can include a graph neural network that includes a plurality of nodes.
- Each node of the graph neural network can correspond to a respective autonomous vehicle (e.g., first, second, third, etc. autonomous vehicle) of a plurality of autonomous vehicles within the environment of the first autonomous vehicle. This can include, for example, vehicles within the communication range of the first autonomous vehicle.
- Each node can represent a respective set of spatial coordinates (e.g., an estimated, relative, or absolute pose) of a respective autonomous vehicle (e.g., the first, second, etc. transmitter vehicles).
- the machine-learned aggregation model can be configured to transform inputted intermediate environmental representation(s) based at least in part on a set of spatial coordinates associated with the transmitter autonomous vehicle.
- the machine-learned aggregation model can be configured to initialize a node state of at least one node of the graph neural network and update the node state of the at least one node based at least in part on a spatial transformation.
- each node can maintain a state representation.
- messages can be sent between nodes of the graph and the node states for each node can be updated based, at least in part, on the aggregated received information (e.g., of the messages) using a neural network.
- Graphs for different vehicles can be different, as each vehicle can receive communications from one or more different vehicles (the set of vehicles within one vehicle's communication range may be different than the vehicles within another vehicle's communication range).
- the machine-learned aggregation model can generate a second transformed intermediate environmental representation by transforming the second intermediate environmental representation based, at least in part, on the absolute pose of the second autonomous vehicle.
- the machine-learned aggregation model can generate a second transformed intermediate environmental representation by transforming the second intermediate environmental representation based, at least in part, on the one or more second updated spatial coordinates and/or the second updated orientation for the second autonomous vehicle.
- the machine-learned aggregation model can generate a third transformed intermediate environmental representation by transforming the third intermediate environmental representation based, at least in part, on the absolute pose of the third autonomous vehicle.
- the machine-learned aggregation model can generate the third transformed intermediate environmental representation by transforming the third intermediate environmental representation based, at least in part, on the one or more third updated spatial coordinates and/or the third updated orientation for the third autonomous vehicle.
- each vehicle node of the neural network can represent the updated spatial coordinates and/or orientation of each respective autonomous vehicle.
- the first autonomous vehicle (e.g., the machine-learned aggregation model) can perform message passing to share features between the vehicle nodes.
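One round of this message passing can be sketched as follows, with a simple mean-and-mix update standing in for the learned node-update network:

```python
import numpy as np

def message_passing_step(states, edges):
    """One round of graph message passing: each node aggregates (means) the
    states sent along incoming edges and mixes the result into its own state.
    The 0.5/0.5 residual mix is a stand-in for the learned update network."""
    new_states = {}
    for node, state in states.items():
        incoming = [states[src] for src, dst in edges if dst == node]
        if incoming:
            agg = np.mean(incoming, axis=0)
            new_states[node] = 0.5 * state + 0.5 * agg
        else:
            new_states[node] = state  # no neighbors: state unchanged
    return new_states
```

Because each vehicle's edge set reflects only the vehicles in its own communication range, different vehicles run this step over different graphs, as noted above.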
- the second portion of the machine-learned detection and prediction model can include a machine-learned attention model configured to weigh a plurality of intermediate environmental representations before generating the updated intermediate environmental representation.
- the machine-learned attention model can determine a first weight for the second intermediate environmental representation and/or a second weight for the third intermediate environmental representation.
- Each weight can be indicative of a predicted accuracy of an estimated pose.
- the first weight can be indicative of a first predicted accuracy of the second estimated pose and/or the second weight can be indicative of a second predicted accuracy of the third estimated pose.
- the first autonomous vehicle (e.g., via the machine-learned aggregation model) can assign the first weight to the second transformed intermediate environmental representation and/or the second weight to the third transformed intermediate environmental representation.
- the updated intermediate environmental representation can be generated, via the machine-learned aggregation model (e.g., a second portion of the machine-learned detection and prediction model), based, at least in part, on the first and second weights.
- the machine-learned aggregation model (e.g., via the attention model) can prioritize messages (e.g., intermediate environmental representations) to focus on clean messages and ignore noisy ones. To do so, a weight can be assigned to each transformed intermediate environmental representation before the representations are averaged, to suppress the remaining noisy representations.
- the normalized weight a_ji ∈ [0, 1] can be computed by: a_ji = s_ji / (ε + Σ_{k ∈ adj(i)} s_ki)
- the learned parameter ε can allow the model to ignore all incoming messages if needed. In this manner, if all the incoming messages are noisy, the resulting weights can be small after the normalization.
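A minimal sketch of the normalization above; note how a learned ε in the denominator keeps all weights small when every incoming score is small:

```python
def normalized_weights(scores, eps=1.0):
    """a_ji = s_ji / (eps + sum_k s_ki). With eps in the denominator,
    uniformly small (noisy) scores yield uniformly small weights, letting
    the model effectively ignore all incoming messages."""
    total = eps + sum(scores.values())
    return {j: s / total for j, s in scores.items()}
```

For example, scores {v2: 1.0, v3: 3.0} with ε = 1 give weights 0.2 and 0.6, which do not sum to 1: the remaining mass is absorbed by ε.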
- the first autonomous vehicle can generate an autonomy output for the first autonomous vehicle based at least in part on the updated intermediate environmental representation.
- the autonomy output can be indicative of a bounding shape associated with an object within the environment of the recipient autonomous vehicle and one or more predicted future locations of the object.
- the autonomy output can include one or more bounding boxes indicative of one or more objects within the surrounding environment of the first autonomous vehicle.
- the first autonomous vehicle can utilize the machine-learned detection and prediction model (and/or a second portion of the model).
- the machine-learned detection and prediction model can be configured to generate the autonomy output(s).
- the joint detection and prediction model can include several memories and networks.
- the joint detection and prediction model can include a plurality of network branches (e.g., a branch for detection, a branch for prediction, etc.), a plurality of memories (e.g., an appearance memory, object path memory, etc.) as well as an object path proposal network and a path refinement network.
- the first autonomous vehicle can input the updated intermediate environmental representation into the machine-learned detection and prediction model and obtain the autonomy output as an output of the machine-learned detection and prediction model.
- the first autonomous vehicle can apply a set of four convolutional blocks to extract a high-level representation suitable for a perception and prediction task.
- these convolutional blocks can help capture multi-scale context efficiently, which can be helpful for the prediction function.
- a feature map can be used with two network branches that output detection and motion forecasting estimates, respectively.
- the output of the detection branch can be parameterized as (x, y, w, h, ⁇ ), which denotes the position, size, and orientation of objects. This can be represented as a bounding shape associated with an object within the environment of the first autonomous vehicle.
- the output of the prediction branch can be parameterized as (x t , y t ), which denotes the object's location at future time step t. This can indicate one or more future location(s) of the object.
- the autonomy output can be indicative of the output of the detection branch and the prediction branch (e.g., a bounding shape and predicted future location(s)).
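Decoding the detection branch's (x, y, w, h, θ) parameterization into a bounding shape can be sketched as:

```python
import math

def box_corners(x, y, w, h, theta):
    """Convert a detection (x, y, w, h, theta) into the four corners of the
    oriented bounding shape in the BEV frame, by rotating the axis-aligned
    half-extents and translating to the object's position."""
    c, s = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((w/2, h/2), (w/2, -h/2), (-w/2, -h/2), (-w/2, h/2)):
        corners.append((x + c*dx - s*dy, y + s*dx + c*dy))
    return corners
```

The prediction branch's per-timestep (x_t, y_t) outputs can be decoded the same way if future boxes, rather than just future centers, are needed.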
- the first autonomous vehicle (e.g., its onboard vehicle computing system) can generate a motion plan for the first autonomous vehicle based at least in part on the autonomy output.
- the first autonomous vehicle can include a motion planning system.
- the motion planning system can determine a motion plan and generate motion plan data for the first vehicle based at least in part on the autonomy output.
- the motion plan can be generated based at least in part on the autonomy output in that it can consider an object (and/or its future location(s)) described in the autonomy output when planning the motion of the first autonomous vehicle, whether that consideration is overridden by other factors (e.g., other objects, unexpected occurrences, etc.) or eventually affects the actual motion of the vehicle.
- the motion plan data can include vehicle actions, trajectories, waypoints, etc.
- the motion planning system can include one or more machine-learned models/optimization algorithms that consider cost data associated with a vehicle action as well as other objective functions (e.g., cost functions based, at least in part, on speed limits, traffic lights, and/or other aspects of the environment), if any, to determine optimized variables that make up the motion plan data.
- the motion planning system can determine that the vehicle can perform a certain action (e.g., stop for an object) without increasing the potential risk to the vehicle and/or violating any traffic laws (e.g., speed limits, lane boundaries, signage, etc.).
- the motion plan data can include a planned trajectory, velocity, acceleration, and/or other actions.
- the first autonomous vehicle (e.g., its onboard vehicle computing system) can initiate a motion control of the first autonomous vehicle based at least in part on the motion plan.
- a motion control can include an action to implement, change, continue, and/or otherwise affect the motion of the first autonomous vehicle.
- the motion planning system can provide the motion plan data indicative of the vehicle actions, a planned trajectory, and/or other operating parameters to the vehicle control systems to implement the motion plan data for the vehicle.
- the vehicle can include a vehicle interface configured to translate the motion plan data into instructions.
- the vehicle interface can translate motion plan data into instructions for controlling the first vehicle including adjusting the steering of the vehicle “X” degrees and/or applying a certain magnitude of braking force to avoid interfering with an object indicated in the autonomy output.
- the vehicle interface can send one or more control signals to the responsible vehicle control component (e.g., braking control system, steering control system, and/or acceleration control system) to execute the instructions and implement the motion plan data.
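A hypothetical sketch of the vehicle-interface translation described above (the function name and the simple proportional braking rule are assumptions for illustration):

```python
def plan_to_controls(current_heading_deg, target_heading_deg,
                     current_speed, target_speed, max_brake=1.0):
    """Translate one motion-plan step (target heading and speed) into a
    steering adjustment in degrees and a normalized braking force, as a
    stand-in for the vehicle interface's instruction translation."""
    steer_deg = target_heading_deg - current_heading_deg
    brake = 0.0
    if target_speed < current_speed and current_speed > 0:
        # Proportional braking: brake harder the larger the required slowdown.
        brake = min(max_brake, (current_speed - target_speed) / current_speed)
    return {"steer_deg": steer_deg, "brake": brake}
```

A real interface would emit these as control signals to the steering and braking control systems rather than returning a dictionary.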
- the first vehicle can account for and control its motion with respect to object(s) outside of the normal field of view of the first vehicle but within the field of view of the transmitter autonomous vehicle(s).
- the machine-learned model(s) of the described system can be trained in several stages. For instance, a sensor backbone (e.g., LIDAR backbone for helping to create the intermediate environmental representation) and output headers can be pre-trained on a real-LIDAR single-vehicle dataset, bypassing the cross-vehicle aggregation stage.
- the loss function can be cross-entropy on a vehicle classification output and smooth ℓ1 on the bounding box parameters.
- Hard-negative mining can be applied as well to improve performance.
- the sensor backbone (e.g., LIDAR backbone), cross-vehicle aggregation, and output header models can be jointly fine-tuned on a simulated vehicle-to-vehicle dataset (described below) with synchronized inputs (e.g., no time delay) using the same loss function.
- the training can avoid the use of a temporal warping function at this stage.
- the number of connected vehicles can be randomly sampled uniformly on [0, min(c, 6)], where c is the number of candidate vehicles available. This can help ensure that the architecture of the machine-learned model(s) can handle arbitrary graph connectivity while also making sure the fraction of vehicles in the scene/ecosystem on the network remains reasonable.
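The sampling rule above can be sketched as:

```python
import random

def sample_num_connected(num_candidates, cap=6, rng=None):
    """Sample the number of connected vehicles for a training example,
    uniformly on the integers [0, min(c, cap)], where c is the number of
    candidate vehicles available."""
    rng = rng or random
    return rng.randint(0, min(num_candidates, cap))
```

Capping at 6 (as in the text) keeps the fraction of connected vehicles per scene reasonable while still exposing the model to varying graph connectivity.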
- the machine-learned detection and prediction model and machine-learned attention model can be trained together. For example, identifying noisy examples can be treated as a supervised binary classification task, where clean examples can be assigned a high value and noisy examples can be assigned a low value.
- strong noise can be generated and applied to some vehicles and weak pose noise to others within one scene.
- the noise can be generated via n_i ~ D_w or n_i ~ D_s, where D_w is a distribution of weak pose noises and D_s is a distribution of strong pose noises.
- the noise can have two translational components and a rotational component; thus n_i ∈ ℝ³.
- a fixed proportion p of autonomous vehicles can receive noise from the strong distribution while the rest can receive noise from the weak distribution.
- the representation is considered clean when both autonomous vehicles have noise from the weak distribution and considered noisy when either vehicle has noise from the strong distribution.
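The noise-assignment scheme can be sketched as follows; the zero-mean Gaussian distributions here stand in for the unspecified weak and strong noise distributions D_w and D_s:

```python
import random

def sample_pose_noise(strong, rng, weak_sigma=0.02, strong_sigma=0.5):
    """Draw a pose noise n_i = (dx, dy, dyaw) from the weak distribution D_w
    or the strong distribution D_s (illustrative zero-mean Gaussians)."""
    sigma = strong_sigma if strong else weak_sigma
    return tuple(rng.gauss(0.0, sigma) for _ in range(3))

def assign_noise(num_vehicles, p_strong, rng):
    """A fixed proportion p of vehicles receives strong noise; the rest weak.
    A representation is labeled clean only when both endpoints drew weak
    noise, and noisy when either endpoint drew strong noise."""
    n_strong = round(p_strong * num_vehicles)
    flags = [True] * n_strong + [False] * (num_vehicles - n_strong)
    rng.shuffle(flags)
    return [(sample_pose_noise(f, rng), f) for f in flags]
```

The sigma values are illustrative assumptions, not values from the patent.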
- the labeling can include:
- the function can produce smooth labels to temper the attention model's predictions so that the attention weights are not just 0 or 1.
- the loss for the joint training task can be defined as follows:
- the machine-learned detection and prediction model and the machine-learned attention model can be frozen and only the regression model can be trained using L c .
- all autonomous vehicle(s) can be assigned noise from the strong noise distribution D s .
- the machine-learned regression model can be trained using a loss which is a sum of losses over each coordinate:
- L_sl1 can be the smooth ℓ1 loss.
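The per-coordinate smooth ℓ1 term can be written as:

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth l1 (Huber-style) loss summed over coordinates: quadratic for
    small errors, linear for large ones, so heavy-tailed pose errors do not
    dominate the gradient."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total
```

Summing this term over the two translational coordinates and the rotational coordinate gives a loss of the form described above.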
- the machine-learned detection and prediction model and the machine-learned regression model can be trained end-to-end via backpropagation.
- the simulated vehicle-to-vehicle dataset (used for training) can be created using a sensor simulation system.
- the sensor simulation system can use a large catalog of 3D static scenes and dynamic objects that are built upon real-world data collections to provide a set of rich and diverse assets from which to simulate new scenarios.
- the sensor simulation system can apply raycasting and machine learning to generate a realistic sensor point cloud such as, for example, a LIDAR point cloud.
- the sensor simulation system can allow for the creation of vehicle-to-vehicle scenes where a percentage of the vehicles are autonomous vehicles and generate realistic sensor data (e.g., LIDAR data) at different vehicle locations.
- the simulations can be based, at least in part, on snippets (e.g., 25-second snippets, etc.) of labeled data recorded by an autonomous vehicle in the real world, which contains temporal tracks of the bounding boxes of all agents in the scene with respect to the recording vehicle.
- a real-world snippet can be recreated in a simulated virtual world using these ground-truth tracks.
- the simulation system can replicate realistic traffic and vehicle motion for vehicle-to-vehicle dataset generation.
- actor 3D assets can be placed into the virtual scene according to the real-world labels, and the sensor data (e.g., LIDAR point cloud data, etc.) can be generated at different candidate vehicle locations.
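The snippet-replay procedure above might be sketched as follows, with ``render_lidar`` standing in for the raycasting/machine-learning sensor simulator; all names are hypothetical:

```python
def recreate_snippet(ground_truth_tracks, candidate_locations, render_lidar):
    """Replay a labeled real-world snippet in a simulated virtual world.

    ``ground_truth_tracks`` maps each timestamp to a list of
    (actor_id, bounding_box) pairs taken from the real-world labels;
    ``render_lidar`` generates sensor data for a scene at one location.
    """
    frames = []
    for t in sorted(ground_truth_tracks):
        # Place actor assets according to the real-world labels at time t.
        scene = {actor_id: box for actor_id, box in ground_truth_tracks[t]}
        # Generate sensor data at each candidate vehicle location.
        frames.append({loc: render_lidar(scene, loc) for loc in candidate_locations})
    return frames
```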
- each autonomous vehicle within the ecosystem can perform the functions of a transmitter autonomous vehicle and the functions of a recipient autonomous vehicle as described above.
- an autonomous vehicle can not only communicate an intermediate environmental representation to other autonomous vehicles but can also receive intermediate environmental representations from other autonomous vehicles.
- the autonomous vehicle(s) can also, or alternatively, send and receive autonomy outputs (e.g., of the joint detection/prediction model) and/or sensor data among one another.
- a computing system can include a machine-learned model that can enable the system to perceive its environment based, at least in part, on sensor data descriptive of the system's environment.
- the system can receive, via a communication channel, sensor data from associated autonomous vehicles that can be susceptible to poor localization.
- pose errors at an associated autonomous vehicle can cause the machine-learned model to output inaccurate positions and classes of objects within the system's environment.
- the systems and methods of the present disclosure describe techniques for training the machine-learned model to identify pose errors associated with messages received over a communication channel from one or more autonomous vehicles before using the information for perceiving a system's surrounding environment.
- the information can be transformed and prioritized based, at least in part, on the identified pose errors before being used for perceiving a system's surrounding environment.
- the systems and methods described herein provide a practical improvement to autonomous vehicle safety, navigation, perception, and prediction by enabling a system to detect and compensate for pose errors common in the realm of autonomous vehicle computing technology.
- Example aspects of the present disclosure can provide an improvement to computing technology, such as autonomous vehicle computing technology.
- the present disclosure describes a computing system configured to obtain sensor data associated with an environment of the system and estimated location data indicative of a first estimated pose of the system.
- the computing system can determine a first intermediate environmental representation of at least a first portion of the environment of the first autonomous vehicle based, at least in part, on the sensor data.
- the computing system can obtain a first message from a second autonomous vehicle with a second intermediate environmental representation of at least a second portion of the environment of the first autonomous vehicle and second estimated location data indicative of a second estimated pose of the second autonomous vehicle.
- the computing system can determine a first relative pose between the system and the second autonomous vehicle based, at least in part, on the first intermediate environmental representation and the second intermediate environmental representation.
- the computing system can generate an updated intermediate environmental representation based, at least in part, on the first intermediate environmental representation, the second intermediate environmental representation, and the first relative pose. And the computing system can generate an autonomy output for the system based, at least in part, on the updated intermediate environmental representation.
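The sequence of operations described above can be sketched end to end; every callable and name below is an illustrative placeholder rather than the patent's actual models:

```python
def collaborative_perception_step(sensor_data, est_pose, message,
                                  encode, regress_correction,
                                  aggregate, detect_and_predict):
    """One recipient-vehicle update, with the learned models passed in
    as callables (all names are hypothetical)."""
    # 1. Encode the ego sensor data into an intermediate environmental representation.
    rep_1 = encode(sensor_data)
    # 2. Unpack the transmitter's representation and its (possibly noisy) estimated pose.
    rep_2, est_pose_2 = message
    # 3. The estimated poses give an initial relative pose; a learned
    #    regression predicts a correction from the two representations.
    initial = tuple(b - a for a, b in zip(est_pose, est_pose_2))
    rel_pose = tuple(i + c for i, c in zip(initial, regress_correction(rep_1, rep_2)))
    # 4. Fuse the representations using the corrected relative pose.
    fused = aggregate(rep_1, rep_2, rel_pose)
    # 5. Produce the autonomy output via joint detection/prediction.
    return detect_and_predict(fused)
```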
- the computing system can accumulate and utilize newly available information in the form of one or more relative and/or absolute poses to provide a practical improvement to machine-learning technology (e.g., machine-learning training technology).
- the relative and/or absolute poses can be used to correct pose errors inherent in autonomous vehicle communications.
- the computing system can improve the accuracy and usability of autonomous vehicle communications.
- This improves the functioning of machine-learning systems and autonomous vehicle computing technology in general by increasing the reliability and accuracy of information shared by autonomous vehicles during collaborative detection and prediction tasks.
- the computing techniques disclosed herein result in more accurate machine-learned systems; thereby improving an autonomous vehicle's perception, prediction, and motion through its environment and enhancing the safety of self-driving systems.
- a computing system can include data acquisition unit(s), intermediate representation generation unit(s), relative pose unit(s), absolute pose unit(s), aggregation unit(s), detection/prediction unit(s), and/or other means for performing the operations and functions described herein.
- one or more of the units may be implemented separately.
- one or more units may be a part of or included in one or more other units.
- These means can include processor(s), microprocessor(s), graphics processing unit(s), logic circuit(s), dedicated circuit(s), application-specific integrated circuit(s), programmable array logic, field-programmable gate array(s), controller(s), microcontroller(s), and/or other suitable hardware.
- the means can also, or alternately, include software control means implemented with a processor or logic circuitry, for example.
- the means can include or otherwise be able to access memory such as, for example, one or more non-transitory computer-readable storage media, such as random-access memory, read-only memory, electrically erasable programmable read-only memory, erasable programmable read-only memory, flash/other memory device(s), data registrar(s), database(s), and/or other suitable hardware.
- the means can be programmed to perform one or more algorithm(s) for carrying out the operations and functions described herein.
- the means (e.g., data acquisition unit(s), etc.) can be configured to obtain, via one or more sensors, sensor data associated with a surrounding environment of the autonomous vehicle.
- the means (e.g., data acquisition unit(s), etc.) can be configured to obtain first estimated location data indicative of a first estimated pose of the autonomous vehicle.
- the means (e.g., data acquisition unit(s), etc.) can be configured to obtain a first message from a second autonomous vehicle.
- the first message can include a second intermediate environmental representation of at least a second portion of the surrounding environment of the autonomous vehicle and second estimated location data indicative of a second estimated pose of the second autonomous vehicle.
- the means can be configured to obtain a second message from a third autonomous vehicle.
- the second message can include a third intermediate environmental representation of at least a third portion of the surrounding environment of the autonomous vehicle and third estimated location data indicative of a third estimated pose of the third autonomous vehicle.
- the means can be configured to determine a first intermediate environmental representation of at least a first portion of the surrounding environment of the autonomous vehicle based, at least in part, on the sensor data.
- the means (e.g., relative pose unit(s), etc.) can be configured to determine a first relative pose between the autonomous vehicle and the second autonomous vehicle based, at least in part, on the first intermediate environmental representation and the second intermediate environmental representation; a second relative pose between the autonomous vehicle and the third autonomous vehicle based, at least in part, on the first intermediate environmental representation and the third intermediate environmental representation; and a third relative pose between the second autonomous vehicle and the third autonomous vehicle based, at least in part, on the second intermediate environmental representation and the third intermediate environmental representation.
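For intuition, the three pairwise relative poses can be written in closed form for planar (SE(2)) poses; note that the patent's learned regression model estimates these quantities from the intermediate environmental representations, not from known poses:

```python
import math

def relative_pose(pose_a, pose_b):
    """Planar relative pose of b expressed in a's frame, as (x, y, heading).

    A closed-form SE(2) computation for illustration only; all names
    here are hypothetical.
    """
    ax, ay, ah = pose_a
    bx, by, bh = pose_b
    dx, dy = bx - ax, by - ay
    # Rotate the world-frame offset into a's heading frame.
    cos_h, sin_h = math.cos(-ah), math.sin(-ah)
    return (cos_h * dx - sin_h * dy, sin_h * dx + cos_h * dy, bh - ah)

def pairwise_relative_poses(poses):
    """All ordered pairs (i, j) -> relative pose, e.g., among three vehicles."""
    return {(i, j): relative_pose(pi, pj)
            for i, pi in enumerate(poses) for j, pj in enumerate(poses) if i != j}
```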
- FIG. 1 depicts a block diagram of an example system 100 for controlling and communicating with a vehicle according to example aspects of the present disclosure.
- FIG. 1 shows a system 100 that can include a vehicle 105 and a vehicle computing system 110 associated with the vehicle 105 .
- the vehicle computing system 110 can be located onboard the vehicle 105 (e.g., it can be included on and/or within the vehicle 105 ).
- the vehicle 105 incorporating the vehicle computing system 110 can be various types of vehicles.
- the vehicle 105 can be an autonomous vehicle.
- the vehicle 105 can be a ground-based autonomous vehicle (e.g., car, truck, bus, etc.).
- the vehicle 105 can be an air-based autonomous vehicle (e.g., airplane, helicopter, vertical take-off and lift (VTOL) aircraft, etc.).
- the vehicle 105 can be a lightweight electric vehicle (e.g., bicycle, scooter, etc.).
- the vehicle 105 can be another type of vehicle (e.g., watercraft, etc.).
- the vehicle 105 can drive, navigate, operate, etc. with minimal and/or no interaction from a human operator (e.g., driver, pilot, etc.).
- a human operator can be omitted from the vehicle 105 (and/or also omitted from remote control of the vehicle 105 ).
- a human operator can be included in the vehicle 105 .
- the vehicle 105 can be configured to operate in a plurality of operating modes.
- the vehicle 105 can be configured to operate in a fully autonomous (e.g., self-driving) operating mode in which the vehicle 105 is controllable without user input (e.g., can drive and navigate with no input from a human operator present in the vehicle 105 and/or remote from the vehicle 105 ).
- the vehicle 105 can operate in a semi-autonomous operating mode in which the vehicle 105 can operate with some input from a human operator present in the vehicle 105 (and/or a human operator that is remote from the vehicle 105 ).
- the vehicle 105 can enter into a manual operating mode in which the vehicle 105 is fully controllable by a human operator (e.g., human driver, pilot, etc.) and can be prohibited and/or disabled (e.g., temporary, permanently, etc.) from performing autonomous navigation (e.g., autonomous driving, flying, etc.).
- the vehicle 105 can be configured to operate in other modes such as, for example, park and/or sleep modes (e.g., for use between tasks/actions such as waiting to provide a vehicle service, recharging, etc.).
- the vehicle 105 can implement vehicle operating assistance technology (e.g., collision mitigation system, power assist steering, etc.), for example, to help assist the human operator of the vehicle 105 (e.g., while in a manual mode, etc.).
- the vehicle computing system 110 can store data indicative of the operating modes of the vehicle 105 in a memory onboard the vehicle 105 .
- the operating modes can be defined by an operating mode data structure (e.g., rule, list, table, etc.) that indicates one or more operating parameters for the vehicle 105 , while in the particular operating mode.
- an operating mode data structure can indicate that the vehicle 105 is to autonomously plan its motion when in the fully autonomous operating mode.
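An operating mode data structure of the kind described above might look like the following sketch; the mode names and parameter fields are assumptions:

```python
OPERATING_MODES = {
    # Each entry is a simplified operating-mode data structure (a table
    # of operating parameters); field names are illustrative.
    "fully_autonomous": {"human_input_required": False, "autonomous_planning": True},
    "semi_autonomous":  {"human_input_required": True,  "autonomous_planning": True},
    "manual":           {"human_input_required": True,  "autonomous_planning": False},
}

def may_plan_autonomously(mode: str) -> bool:
    """Look up whether the vehicle is to autonomously plan its motion in
    the given operating mode."""
    return OPERATING_MODES[mode]["autonomous_planning"]
```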
- the vehicle computing system 110 can access the memory when implementing an operating mode.
- the operating mode of the vehicle 105 can be adjusted in a variety of manners.
- the operating mode of the vehicle 105 can be selected remotely, off-board the vehicle 105 .
- a remote computing system (e.g., of a vehicle provider and/or service entity associated with the vehicle 105 ) can communicate data to the vehicle 105 instructing the vehicle 105 to enter into the fully autonomous operating mode.
- the operating mode of the vehicle 105 can be set onboard and/or near the vehicle 105 .
- the vehicle computing system 110 can automatically determine when and where the vehicle 105 is to enter, change, maintain, etc. a particular operating mode (e.g., without user input).
- the operating mode of the vehicle 105 can be manually selected via one or more interfaces located onboard the vehicle 105 (e.g., key switch, button, etc.) and/or associated with a computing device proximate to the vehicle 105 (e.g., a tablet operated by authorized personnel located near the vehicle 105 ).
- the operating mode of the vehicle 105 can be adjusted by manipulating a series of interfaces in a particular order to cause the vehicle 105 to enter into a particular operating mode.
- the vehicle computing system 110 can include one or more computing devices located onboard the vehicle 105 .
- the computing device(s) can be located on and/or within the vehicle 105 .
- the computing device(s) can include various components for performing various operations and functions.
- the computing device(s) can include one or more processors and one or more tangible, non-transitory, computer readable media (e.g., memory devices, etc.).
- the one or more tangible, non-transitory, computer readable media can store instructions that when executed by the one or more processors cause the vehicle 105 (e.g., its computing system, one or more processors, etc.) to perform operations and functions, such as those described herein for controlling an autonomous vehicle, communicating with other computing systems, correcting pose errors with an aggregated feature map, etc.
- the vehicle 105 can include a communications system 115 configured to allow the vehicle computing system 110 (and its computing device(s)) to communicate with other computing devices.
- the communications system 115 can include any suitable components for interfacing with one or more network(s) 120 , including, for example, transmitters, receivers, ports, controllers, antennas, and/or other suitable components that can help facilitate communication.
- the communications system 115 can include a plurality of components (e.g., antennas, transmitters, and/or receivers) that allow it to implement and utilize multiple-input, multiple-output (MIMO) technology and communication techniques.
- the communications system 115 can also be configured to enable the vehicle 105 to communicate with and/or provide and/or receive data and/or signals from a remote computing device associated with a user 125 and/or an item (e.g., an item to be picked-up for a courier service).
- the communications system 115 can allow the vehicle 105 to locate and/or exchange communications with a user device 130 of a user 125 .
- the communications system 115 can allow communication among one or more of the system(s) on-board the vehicle 105 .
- the vehicle 105 can include one or more sensors 135 , an autonomy computing system 140 , a vehicle interface 145 , one or more vehicle control systems 150 , and other systems, as described herein.
- One or more of these systems can be configured to communicate with one another via one or more communication channels.
- the communication channel(s) can include one or more data buses (e.g., controller area network (CAN)), on-board diagnostics connector (e.g., OBD-II), and/or a combination of wired and/or wireless communication links.
- the onboard systems can send and/or receive data, messages, signals, etc. amongst one another via the communication channel(s).
- the sensor(s) 135 can be configured to acquire sensor data 155 .
- the sensor(s) 135 can be external sensors configured to acquire external sensor data. This can include sensor data associated with the surrounding environment of the vehicle 105 .
- the surrounding environment of the vehicle 105 can include/be represented in the field of view of the sensor(s) 135 .
- the sensor(s) 135 can acquire image and/or other data of the environment outside of the vehicle 105 and within a range and/or field of view of one or more of the sensor(s) 135 .
- the sensor(s) 135 can include one or more Light Detection and Ranging (LIDAR) systems, one or more Radio Detection and Ranging (RADAR) systems, one or more cameras (e.g., visible spectrum cameras, infrared cameras, etc.), one or more motion sensors, one or more audio sensors (e.g., microphones, etc.), and/or other types of imaging capture devices and/or sensors.
- the one or more sensors can be located on various parts of the vehicle 105 including a front side, rear side, left side, right side, top, and/or bottom of the vehicle 105 .
- the sensor data 155 can include image data (e.g., 2D camera data, video data, etc.), RADAR data, LIDAR data (e.g., 3D point cloud data, etc.), audio data, and/or other types of data.
- the vehicle 105 can also include other sensors configured to acquire data associated with the vehicle 105 .
- the vehicle 105 can include inertial measurement unit(s), wheel odometry devices, and/or other sensors.
- the sensor data 155 can be indicative of one or more objects within the surrounding environment of the vehicle 105 .
- the object(s) can include, for example, vehicles, pedestrians, bicycles, and/or other objects.
- the object(s) can be located in front of, to the rear of, to the side of, above, below the vehicle 105 , etc.
- the sensor data 155 can be indicative of locations associated with the object(s) within the surrounding environment of the vehicle 105 at one or more times.
- the object(s) can be static objects (e.g., not in motion) and/or dynamic objects/actors (e.g., in motion or likely to be in motion) in the vehicle's environment.
- the sensor(s) 135 can provide the sensor data 155 to the autonomy computing system 140 .
- the autonomy computing system 140 can obtain map data 160 .
- the map data 160 can provide detailed information about the surrounding environment of the vehicle 105 and/or the geographic area in which the vehicle was, is, and/or will be located.
- the map data 160 can provide information regarding: the identity and location of different roadways, road segments, buildings, or other items or objects (e.g., lampposts, crosswalks and/or curbs); the location and directions of traffic lanes (e.g., the location and direction of a parking lane, a turning lane, a bicycle lane, or other lanes within a particular roadway or other travel way and/or one or more boundary markings associated therewith); traffic control data (e.g., the location and instructions of signage, traffic lights, and/or other traffic control devices); obstruction information (e.g., temporary or permanent blockages, etc.); event data (e.g., road closures/traffic rule alterations due to parades, concerts, sporting events, etc.); and/or nominal vehicle path data.
- the map data 160 can include high definition map data. In some implementations, the map data 160 can include sparse map data indicative of a limited number of environmental features (e.g., lane boundaries, etc.). In some implementations, the map data can be limited to geographic area(s) and/or operating domains in which the vehicle 105 (or autonomous vehicles generally) may travel (e.g., due to legal/regulatory constraints, autonomy capabilities, and/or other factors).
- the vehicle 105 can include a positioning system 165 .
- the positioning system 165 can determine a current position of the vehicle 105 . This can help the vehicle 105 localize itself within its environment.
- the positioning system 165 can be any device or circuitry for analyzing the position of the vehicle 105 .
- the positioning system 165 can determine position by using one or more of inertial sensors (e.g., inertial measurement unit(s), etc.), a satellite positioning system, based on IP address, by using triangulation and/or proximity to network access points or other network components (e.g., cellular towers, WiFi access points, etc.) and/or other suitable techniques.
- the position of the vehicle 105 can be used by various systems of the vehicle computing system 110 and/or provided to a remote computing system.
- the autonomy computing system 140 can perform various functions for autonomously operating the vehicle 105 .
- the autonomy computing system 140 can perform the following functions: perception 170 A, prediction 170 B, and motion planning 170 C.
- the autonomy computing system 140 can obtain the sensor data 155 via the sensor(s) 135 , process the sensor data 155 (and/or other data) to perceive its surrounding environment, predict the motion of objects within the surrounding environment, and generate an appropriate motion plan through such surrounding environment.
- these autonomy functions can be performed by one or more sub-systems such as, for example, a perception system, a prediction system, a motion planning system, and/or other systems that cooperate to perceive the surrounding environment of the vehicle 105 and determine a motion plan for controlling the motion of the vehicle 105 accordingly.
- one or more of the perception, prediction, and/or motion planning functions 170 A, 170 B, 170 C can be performed by (and/or combined into) the same system and/or via shared computing resources. In some implementations, one or more of these functions can be performed via different sub-systems. As further described herein, the autonomy computing system 140 can communicate with the one or more vehicle control systems 150 to operate the vehicle 105 according to the motion plan (e.g., via the vehicle interface 145 , etc.).
- the vehicle computing system 110 can generate perception data 175 A that is indicative of one or more states (e.g., current and/or past state(s)) of one or more objects that are within a surrounding environment of the vehicle 105 .
- the perception data 175 A for each object can describe (e.g., for a given time, time period) an estimate of the object's: current and/or past location (also referred to as position); current and/or past speed/velocity; current and/or past acceleration; current and/or past heading; current and/or past orientation; size/footprint (e.g., as represented by a bounding shape, object highlighting, etc.); class (e.g., pedestrian class vs. vehicle class vs. bicycle class, etc.); and/or other state information.
- the vehicle computing system 110 can be configured to predict a motion of the object(s) within the surrounding environment of the vehicle 105 .
- the vehicle computing system 110 can generate prediction data 175 B associated with such object(s).
- the prediction data 175 B can be indicative of one or more predicted future locations of each respective object.
- the prediction system can determine a predicted motion trajectory along which a respective object is predicted to travel over time.
- a predicted motion trajectory can be indicative of a path that the object is predicted to traverse and an associated timing with which the object is predicted to travel along the path.
- the predicted path can include and/or be made up of a plurality of way points.
- the prediction data 175 B can be indicative of the speed and/or acceleration at which the respective object is predicted to travel along its associated predicted motion trajectory.
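A predicted motion trajectory of this form (a path of way points plus the associated timing) might be represented as in the following sketch; the class and field names are illustrative:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PredictedTrajectory:
    """Illustrative container for prediction data: (x, y) way points and
    the timing with which the object is predicted to reach each one."""
    waypoints: List[Tuple[float, float]]
    timestamps: List[float]

    def speed_between(self, i: int) -> float:
        """Average predicted speed between way points i and i + 1."""
        (x0, y0), (x1, y1) = self.waypoints[i], self.waypoints[i + 1]
        dt = self.timestamps[i + 1] - self.timestamps[i]
        return ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 / dt
```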
- the vehicle computing system 110 can utilize one or more algorithms and/or machine-learned model(s) that are configured to predict the future motion of object(s) based at least in part on the sensor data 155 , the perception data 175 A, map data 160 , and/or other data. This can include, for example, one or more neural networks trained to predict the motion of the object(s) within the surrounding environment of the vehicle 105 based at least in part on the past and/or current state(s) of those objects as well as the environment in which the objects are located (e.g., the lane boundary in which it is travelling, etc.).
- the prediction data 175 B can be utilized for the motion planning function 170 C of the autonomy computing system 140 .
- the motion planning system can implement an optimization algorithm, machine-learned model, etc. that considers cost data associated with a vehicle action as well as other objective functions (e.g., cost functions based on speed limits, traffic lights, etc.), if any, to determine optimized variables that make up the motion plan.
- the vehicle computing system 110 can determine that the vehicle 105 can perform a certain action (e.g., pass an object, etc.) without increasing the potential risk to the vehicle 105 and/or violating any traffic laws (e.g., speed limits, lane boundaries, signage, etc.). For instance, the vehicle computing system 110 can evaluate the predicted motion trajectories of one or more objects during its cost data analysis to help determine an optimized vehicle trajectory through the surrounding environment.
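As a toy illustration of that cost-data analysis, the lowest-cost candidate can be selected by summing the objective terms; a real planner optimizes continuously rather than enumerating candidates, and all names below are hypothetical:

```python
def select_motion_plan(candidate_plans, cost_fns):
    """Pick the candidate trajectory with the lowest total cost.

    ``cost_fns`` stand in for the objective terms mentioned above
    (e.g., speed-limit or traffic-light costs).
    """
    return min(candidate_plans, key=lambda plan: sum(f(plan) for f in cost_fns))
```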
- the vehicle computing system 110 can be configured to continuously update the vehicle's motion plan and corresponding planned vehicle motion trajectories. For example, in some implementations, the vehicle computing system 110 can generate new motion planning data 175 C/motion plan(s) for the vehicle 105 (e.g., multiple times per second, etc.). Each new motion plan can describe a motion of the vehicle 105 over the next planning period (e.g., next several seconds, etc.). Moreover, a new motion plan may include a new planned vehicle motion trajectory. Thus, in some implementations, the vehicle computing system 110 can continuously operate to revise or otherwise generate a short-term motion plan based on the currently available data. Once the optimization planner has identified the optimal motion plan (or some other iterative break occurs), the optimal motion plan (and the planned motion trajectory) can be selected and executed by the vehicle 105 .
- the vehicle computing system 110 can cause the vehicle 105 to initiate a motion control in accordance with at least a portion of the motion planning data 175 C.
- a motion control can be an operation, action, etc. that is associated with controlling the motion of the vehicle 105 .
- the motion planning data 175 C can be provided to the vehicle control system(s) 150 of the vehicle 105 .
- the vehicle control system(s) 150 can be associated with a vehicle interface 145 that is configured to implement a motion plan.
- the vehicle interface 145 can serve as an interface/conduit between the autonomy computing system 140 and the vehicle control systems 150 of the vehicle 105 and any electrical/mechanical controllers associated therewith.
- the vehicle interface 145 can, for example, translate a motion plan into instructions for the appropriate vehicle control component (e.g., acceleration control, brake control, steering control, etc.).
- the vehicle interface 145 can translate a determined motion plan into instructions to adjust the steering of the vehicle 105 “X” degrees, apply a certain magnitude of braking force, increase/decrease speed, etc.
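The translation step might be sketched as a mapping from one motion-plan step to per-controller instructions; the field names are assumptions:

```python
def translate_plan(steering_deg, brake_force, speed_delta):
    """Map a motion-plan step onto instructions for the appropriate
    vehicle control components (steering, brake, acceleration)."""
    return {
        "steering_control": {"adjust_degrees": steering_deg},
        "brake_control": {"force": brake_force},
        "acceleration_control": {"speed_delta": speed_delta},
    }
```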
- the vehicle interface 145 can help facilitate the responsible vehicle control (e.g., braking control system, steering control system, acceleration control system, etc.) to execute the instructions and implement a motion plan (e.g., by sending control signal(s), making the translated plan available, etc.). This can allow the vehicle 105 to autonomously travel within the vehicle's surrounding environment.
- the vehicle computing system 110 can store other types of data. For example, an indication, record, and/or other data indicative of the state of the vehicle (e.g., its location, motion trajectory, health information, etc.), the state of one or more users (e.g., passengers, operators, etc.) of the vehicle, and/or the state of an environment including one or more objects (e.g., the physical dimensions and/or appearance of the one or more objects, locations, predicted motion, etc.) can be stored locally in one or more memory devices of the vehicle 105 .
- the vehicle 105 can communicate data indicative of the state of the vehicle, the state of one or more passengers of the vehicle, and/or the state of an environment to a computing system that is remote from the vehicle 105 , which can store such information in one or more memories remote from the vehicle 105 . Moreover, the vehicle 105 can provide any of the data created and/or stored onboard the vehicle 105 to another vehicle.
- the vehicle computing system 110 can include the one or more vehicle user devices 180 .
- the vehicle computing system 110 can include one or more user devices with one or more display devices located onboard the vehicle 105 .
- a display device (e.g., screen of a tablet, laptop, and/or smartphone) can be viewable by a user of the vehicle 105 that is located in the front of the vehicle 105 (e.g., driver's seat, front passenger seat).
- a display device can be viewable by a user of the vehicle 105 that is located in the rear of the vehicle 105 (e.g., a back passenger seat).
- the user device(s) associated with the display devices can be any type of user device such as, for example, a tablet, mobile phone, laptop, etc.
- the vehicle user device(s) 180 can be configured to function as human-machine interfaces.
- the vehicle user device(s) 180 can be configured to obtain user input, which can then be utilized by the vehicle computing system 110 and/or another computing system (e.g., a remote computing system, etc.).
- a user (e.g., a passenger for transportation service, a vehicle operator, etc.) can provide user input indicating a change to the destination location of the vehicle 105 .
- the vehicle computing system 110 and/or another computing system can update the destination location of the vehicle 105 and the route associated therewith to reflect the change indicated by the user input.
- the vehicle 105 can be configured to perform vehicle services for one or a plurality of different service entities 185 .
- a vehicle 105 can perform a vehicle service by, for example and as further described herein, travelling (e.g., traveling autonomously) to a location associated with a requested vehicle service, allowing user(s) and/or item(s) to board or otherwise enter the vehicle 105 , transporting the user(s) and/or item(s), allowing the user(s) and/or item(s) to deboard or otherwise exit the vehicle 105 , etc.
- the vehicle 105 can provide the vehicle service(s) for a service entity to a user.
- a service entity 185 can be associated with the provision of one or more vehicle services.
- a service entity can be an individual, a group of individuals, a company (e.g., a business entity, organization, etc.), a group of entities (e.g., affiliated companies), and/or another type of entity that offers and/or coordinates the provision of one or more vehicle services to one or more users.
- a service entity can offer vehicle service(s) to users via one or more software applications (e.g., that are downloaded onto a user computing device), via a website, and/or via other types of interfaces that allow a user to request a vehicle service.
- the vehicle services can include transportation services (e.g., by which a vehicle transports user(s) from one location to another), delivery services (e.g., by which a vehicle transports/delivers item(s) to a requested destination location), courier services (e.g., by which a vehicle retrieves item(s) from a requested origin location and transports/delivers the item to a requested destination location), and/or other types of services.
- the vehicle services can be wholly performed by the vehicle 105 (e.g., travelling from the user/item origin to the ultimate destination, etc.) or performed by one or more vehicles and/or modes of transportation (e.g., transferring the user/item at intermediate transfer points, etc.).
- An operations computing system 190 A of the service entity 185 can help to coordinate the performance of vehicle services by autonomous vehicles.
- the operations computing system 190 A can include and/or implement one or more service platforms of the service entity.
- the operations computing system 190 A can include one or more computing devices.
- the computing device(s) can include various components for performing various operations and functions.
- the computing device(s) can include one or more processors and one or more tangible, non-transitory, computer readable media (e.g., memory devices, etc.).
- the one or more tangible, non-transitory, computer readable media can store instructions that when executed by the one or more processors cause the operations computing system 190 A (e.g., its one or more processors, etc.) to perform operations and functions, such as those described herein for providing an intermediate environmental representation, etc.
- a user 125 can request a vehicle service from a service entity 185 .
- the user 125 can provide user input to a user device 130 to request a vehicle service (e.g., via a user interface associated with a mobile software application of the service entity 185 running on the user device 130 ).
- the user device 130 can communicate data indicative of a vehicle service request 195 to the operations computing system 190 A associated with the service entity 185 (and/or another associated computing system that can then communicate data to the operations computing system 190 A).
- the vehicle service request 195 can be associated with a user.
- the associated user can be the one that submits the vehicle service request (e.g., via an application on the user device 130 ). In some implementations, the associated user may not be the one that submits the vehicle service request.
- the vehicle service request can be indicative of the user.
- the vehicle service request can include an identifier associated with the user and/or the user's profile/account with the service entity 185 .
- the vehicle service request 195 can be generated in a manner that avoids the use of personally identifiable information and/or allows the user to control the types of information included in the vehicle service request 195 .
- the vehicle service request 195 can also be generated, communicated, stored, etc. in a secure manner to protect information.
- the vehicle service request 195 can indicate various types of information.
- the vehicle service request 195 can indicate the type of vehicle service that is desired (e.g., a transportation service, a delivery service, a courier service, etc.), one or more locations (e.g., an origin location, a destination location, etc.), timing constraints (e.g., pick-up time, drop-off time, deadlines, etc.), and/or geographic constraints (e.g., to stay within a certain area, etc.).
- the service request 195 can indicate a type/size/class of vehicle such as, for example, a sedan, an SUV, luxury vehicle, standard vehicle, etc.
- the service request 195 can indicate a product of the service entity 185 .
- the service request 195 can indicate that the user is requesting a transportation pool product by which the user would potentially share the vehicle (and costs) with other users/items.
- the service request 195 can explicitly request for the vehicle service to be provided by an autonomous vehicle or a human-driven vehicle.
- the service request 195 can indicate a number of users that will be riding in the vehicle/utilizing the vehicle service.
- the service request 195 can indicate preferences/special accommodations of an associated user (e.g., music preferences, climate preferences, wheelchair accessibility, etc.) and/or other information.
- the operations computing system 190 A of the service entity 185 can process the data indicative of the vehicle service request 195 and generate a vehicle service assignment that is associated with the vehicle service request.
- the operations computing system can identify one or more vehicles that may be able to perform the requested vehicle service for the user 125 .
- the operations computing system 190 A can identify which modes of transportation are available to a user for the requested vehicle service (e.g., light electric vehicles, human-driven vehicles, autonomous vehicles, aerial vehicles, etc.) and/or the number of transportation modes/legs of a potential itinerary of the user for completing the vehicle service (e.g., single or plurality of modes, single or plurality of legs, etc.).
- the operations computing system 190 A can determine which autonomous vehicle(s) are online with the service entity 185 (e.g., available for a vehicle service assignment, addressing a vehicle service assignment, etc.) to help identify which autonomous vehicle(s) would be able to provide the vehicle service.
- the operations computing system 190 A and/or the vehicle computing system 110 can communicate with one or more other computing systems 190 B that are remote from the vehicle 105 .
- This can include, for example, computing systems associated with government functions (e.g., emergency services, regulatory bodies, etc.), computing systems associated with vehicle providers other than the service entity, and/or computing systems of other vehicles (e.g., other autonomous vehicles, aerial vehicles, etc.).
- Communication with the other computing systems 190 B can occur via the network(s) 120 .
- Autonomous vehicles can operate within geographic areas or have operating domains that can include other autonomous vehicles.
- FIG. 2 depicts an example geographic area 200 with a vehicle ecosystem according to example embodiments of the present disclosure.
- a plurality of autonomous vehicles 202 , 204 , 206 can be located within the geographic area 200 .
- the plurality of autonomous vehicles can include, for example, a first autonomous vehicle 202 , a second autonomous vehicle 204 , a third autonomous vehicle 206 , etc.
- the first autonomous vehicle 202 , second autonomous vehicle 204 , and/or third autonomous vehicle 206 can be associated with the same fleet operator/owner or a different fleet operator/owner.
- the first autonomous vehicle 202 , second autonomous vehicle 204 , and/or third autonomous vehicle 206 can be the same or different types and/or versions of vehicle.
- Each of the first autonomous vehicle 202 , second autonomous vehicle 204 , and/or third autonomous vehicle 206 can be configured similar to the vehicle 105 described with reference to FIG. 1 and can include the same or similar vehicle computing system(s), functions, operations, etc.
- the first autonomous vehicle 202 , second autonomous vehicle 204 , and/or third autonomous vehicle 206 can be utilized to provide a vehicle service.
- an autonomous vehicle 202 , 204 , 206 can perform vehicle services for one or more service entities.
- a service entity can be associated with the provision of one or more vehicle services.
- a service entity can be an individual, a group of individuals, a company (e.g., a business entity, organization, etc.), a group of entities (e.g., affiliated companies), and/or another type of entity that offers and/or coordinates the provision of vehicle service(s) to one or more users.
- a service entity can offer vehicle service(s) to users via a software application (e.g., on a user computing device), via a website, and/or via other types of interfaces that allow a user to request a vehicle service.
- vehicle services can include user transportation services (e.g., by which the vehicle transports user(s) from one location to another), delivery services (e.g., by which a vehicle delivers item(s) to a requested destination location), courier services (e.g., by which a vehicle retrieves item(s) from a requested origin location and delivers the item to a requested destination location), and/or other types of services.
- An operations computing system (e.g., operations computing system 190 A of FIG. 1 ) of the service entity can help to coordinate the performance of vehicle services by autonomous vehicles 202 , 204 , 206 .
- the operations computing system 190 A can include a service platform.
- the service platform can include a plurality of back-end services and front-end interfaces, which are accessible via one or more APIs.
- an autonomous vehicle 202 , 204 , 206 and/or another computing system that is remote from the autonomous vehicle (but associated therewith) can communicate/access the service platform (and its backend services) by calling the one or more APIs.
- Such components can facilitate secure, bidirectional communications between autonomous vehicles and/or the service entity's operations system (e.g., including a data center, etc.).
- the service platform can allow an autonomous vehicle 202 , 204 , 206 to obtain data from and/or communicate data to the operations computing system.
- a user can provide (e.g., via a user device) a request for a vehicle service to the operations computing system associated with the service entity.
- the request can indicate the type of vehicle service that the user desires (e.g., a user transportation service, a delivery service, etc.), one or more locations (e.g., an origin, destination, etc.), timing constraints (e.g., pick-up time, drop-off time, deadlines, etc.), a number of user(s) and/or items to be transported in the vehicle, other service parameters (e.g., a need for handicap access, handle with care instructions, etc.), and/or other information.
- the operations computing system of the service entity can process the request and identify one or more autonomous vehicles 202 , 204 , 206 that may be able to perform the requested vehicle services for the user.
- the operations computing system can identify which autonomous vehicle(s) are online with the service entity (e.g., available for a vehicle service assignment, addressing a vehicle service assignment, etc.).
- An autonomous vehicle 202 , 204 , 206 can go online with a service entity by, for example, connecting with the service entity's operations computing system (e.g., the service platform) so that the vehicle's computing system can communicate with the operations computing system via one or more networks.
- the operations computing system can communicate a vehicle service assignment indicative of the requested vehicle services and/or other data to the autonomous vehicle 202 , 204 , 206 .
- the geographic area 200 can include one or more travel ways (e.g., roadways, etc.) and one or more geographic features (e.g., cross walks, lane boundaries, etc.).
- the geographic area 200 can include one or more infrastructure elements 208 that include systems with communication technology capable of communicating with one or more of the autonomous vehicles 202 , 204 , 206 within the geographic area 200 .
- the infrastructure elements 208 can provide sensor data and/or intermediate environmental representations to the autonomous vehicles 202 , 204 , 206 .
- the infrastructure element(s) 208 can include, for example, a sensor suite/system attached and/or included in a building or other structure such as, for example, a lamppost, traffic light, etc.
- the geographic area 200 can also include one or more objects.
- the objects can include, for example, static object(s) (e.g., lampposts, parking meters, etc.) and/or dynamic actor objects (e.g., pedestrians, vehicles, bicycles/bicyclists, etc.) that are and/or may be in motion.
- the geographic area 200 can include an object 212 .
- the object 212 can be within the field of view of sensors associated with one or more of the autonomous vehicles within the geographic area 200 (e.g., the first autonomous vehicle 202 , the third autonomous vehicle 206 ).
- the object 212 can be occluded from a field of view of one or more sensors of the second autonomous vehicle 204 .
- the object 212 can include, for example, a user that is associated with a service request for a vehicle service.
- An autonomous vehicle 202 , 204 , 206 can be assigned to the service request and, thus, may attempt to detect the user so that the user can board, enter, etc. the assigned autonomous vehicle, the vehicle's doors can be unlocked, the vehicle can send a message to the user, etc.
- the geographic area 200 can include one or more non-autonomous vehicles such as, for example, non-autonomous vehicle 214 .
- Each of the plurality of autonomous vehicles 202 , 204 , 206 can include a communication system that allows the respective vehicle's computing system to communicate with system(s) that are remote from the autonomous vehicle 202 , 204 , 206 .
- an autonomous vehicle 202 , 204 , 206 can utilize its communication system to send and receive data (e.g., via an internet connection) from a cloud-based server system that helps support the autonomous vehicle 202 , 204 , 206 .
- This can include, for example, an offboard service assignment system that matches the autonomous vehicle to a request for a vehicle service (e.g., rideshare service), a routing system that provides the autonomous vehicle 202 , 204 , 206 with a route to follow for performing the vehicle service, a remote assistance system that can provide remote assistance to a vehicle, and/or other systems.
- Each autonomous vehicle 202 , 204 , 206 can also have a communication range that allows the autonomous vehicle 202 , 204 , 206 to communicate with systems nearby the autonomous vehicle 202 , 204 , 206 .
- a first autonomous vehicle 202 (e.g., a recipient vehicle) can have a communication range 210 .
- the communication range 210 can be based at least in part on the vehicle's communication hardware (e.g., antenna, etc.). Additionally, or alternatively, the communication range 210 can be based at least in part on the communication protocol utilized by the first autonomous vehicle 202 .
- the communication range 210 can be represented by a radial distance from the first autonomous vehicle 202 .
- the autonomous vehicle 202 can communicate with autonomous vehicle(s) 204 , 206 within the communication range 210 .
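The radial communication-range check described above can be sketched as a simple distance test. This is an illustrative assumption rather than the disclosed implementation; the function name `within_communication_range` and the use of planar coordinates are hypothetical:

```python
import math

def within_communication_range(recipient_xy, transmitter_xy, comm_range_m):
    """Return True if the transmitter lies within the recipient's radial
    communication range (a circle centered on the recipient's position)."""
    dx = transmitter_xy[0] - recipient_xy[0]
    dy = transmitter_xy[1] - recipient_xy[1]
    return math.hypot(dx, dy) <= comm_range_m

# Recipient at the origin with a 100 m range; transmitter roughly 80 m away.
print(within_communication_range((0.0, 0.0), (60.0, 53.0), 100.0))  # True
```

In practice the range would also depend on the communication hardware and protocol, as noted above, so a fixed radius is only a first approximation.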
- the second autonomous vehicle 204 (a “transmitter autonomous vehicle”) can communicate data to the first autonomous vehicle 202 (“a recipient autonomous vehicle” that is different from the second autonomous vehicle 204 ) that is within the first communication range 210 of the first autonomous vehicle 202 .
- the first autonomous vehicle 202 can receive data from any number of autonomous vehicles (e.g., a third autonomous vehicle 206 (e.g., “a second transmitter vehicle”)) within the communication range 210 of the autonomous vehicle 202 .
- the systems and methods of the present disclosure can allow the ecosystem of autonomous vehicles/systems within a geographic area 200 to provide inter-vehicle/system communications that improve the vehicles'/systems' autonomous operations while reducing the communication bandwidth and potential information loss associated with doing so.
- FIG. 3 depicts a diagram of an architecture 300 of example machine-learned model(s) that can allow for such improvements.
- the diagram illustrates the architecture 300 across two systems: a transmitter system 302 and a recipient system 304 .
- the recipient system 304 can include a first/recipient autonomous vehicle 202 (e.g., its onboard vehicle computing system) and the transmitter system 302 can include a second/transmitter autonomous vehicle 204 (e.g., its onboard vehicle computing system).
- the recipient system 304 can be a first computing system of a vehicle, robot, infrastructure element, etc. and the transmitter system 302 can be another computing system of a vehicle, robot, infrastructure element, etc.
- the following description is provided within the example context of the first and second autonomous vehicles 202 , 204 communicating with one another.
- the transmitter system 302 can obtain (e.g., via the one or more sensors of the second autonomous vehicle 204 ) sensor data 306 associated with an environment (e.g., a portion of a surrounding environment of another, recipient autonomous vehicle) of the transmitter system 302 (e.g., the second autonomous vehicle 204 ).
- the sensor data 306 can include one or more types of sensor data associated with one or more sensor modalities.
- the sensor data 306 can include three-dimensional point cloud data (e.g., three-dimensional LIDAR point cloud data).
- the sensor data 306 can include a fusion of different types of sensor data.
- the sensor data 306 can include a first type of sensor data (e.g., camera image data) and a second type of sensor data (e.g., LIDAR data).
- the first type of sensor data can be associated with a first sensor modality (e.g., stereo camera) and the second type of sensor data (e.g., LIDAR data) can be associated with a second sensor modality (e.g., LIDAR system).
- the sensor data 306 can include data acquired by multiple different systems (e.g., autonomous vehicles).
- the sensor data 306 can include a first set of sensor data (e.g., a first set of LIDAR data) acquired by the transmitter system 302 (e.g., the second autonomous vehicle 204 ) and a second set of sensor data (e.g., a second set of LIDAR data) that was acquired by another system (e.g., another autonomous vehicle 202 , 206 ) in its ecosystem and sent to the transmitter system 302 (e.g., the second autonomous vehicle 204 ).
- the transmitter system 302 (e.g., the second autonomous vehicle 204 , its onboard computing system, etc.) can generate an intermediate environmental representation 310 for use by another vehicle (e.g., a first/recipient autonomous vehicle 202 ).
- the intermediate representation 310 can be generated based, at least in part, on the sensor data 306 and can be descriptive of at least a portion of an environment of the other autonomous vehicle.
- the transmitter system 302 can determine the intermediate representation 310 of at least a portion of the surrounding environment of a second/transmitter autonomous vehicle based at least in part on the sensor data 306 .
- the intermediate representation 310 can include a feature map indicative of at least a portion of the environment.
- the transmitter system 302 can generate voxelized sensor data by voxelizing the three-dimensional point cloud data of the sensor data 306 .
- the transmitter system 302 can extract raw features from its LIDAR sensor data and transform them into a bird's eye view (BEV).
- the features can include, for example, a heading and/or shape of an object indicated in the intermediate environmental representation 310 .
- the transmitter system 302 can voxelize the LIDAR point clouds (e.g., into 15 cm³ voxels, etc.), apply several convolutional layers, and output feature maps of shape H×W×C, where H×W denotes the scene range in BEV, and C is the number of feature channels.
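The voxelization step can be illustrated with a minimal NumPy sketch that rasterizes a point cloud into a BEV occupancy grid, with height slices serving as channels. All names, ranges, and defaults here are illustrative assumptions, and the learned convolutional layers that would follow are omitted:

```python
import numpy as np

def voxelize_bev(points, voxel_size=0.15, x_range=(0.0, 9.6), y_range=(0.0, 9.6),
                 z_range=(0.0, 0.6), z_bins=4):
    """Rasterize an N x 3 LIDAR point cloud into a bird's-eye-view (BEV)
    occupancy grid of shape H x W x C, where each channel is a height slice."""
    h = round((x_range[1] - x_range[0]) / voxel_size)
    w = round((y_range[1] - y_range[0]) / voxel_size)
    grid = np.zeros((h, w, z_bins), dtype=np.float32)
    for x, y, z in points:
        if (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]
                and z_range[0] <= z < z_range[1]):
            i = int((x - x_range[0]) / voxel_size)
            j = int((y - y_range[0]) / voxel_size)
            k = int((z - z_range[0]) / (z_range[1] - z_range[0]) * z_bins)
            grid[i, j, k] = 1.0  # mark the voxel as occupied
    return grid

points = np.array([[1.0, 2.0, 0.3], [4.5, 4.5, 0.1]])
bev = voxelize_bev(points)
print(bev.shape)  # (64, 64, 4)
```

A real pipeline would feed this grid through the convolutional layers described above to produce the downsampled feature map; the occupancy grid here only conveys the rasterization idea.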
- FIG. 4 depicts a visualization of an example intermediate environmental representation 400 according to example embodiments of the present disclosure.
- the intermediate environmental representation 310 can be generated based, at least in part, on a machine-learned model 308 (e.g., a joint detection and prediction model) associated with the transmitting/recipient autonomous vehicles.
- the machine-learned model 308 associated with the vehicles can include a machine-learned model utilized by the transmitter/recipient systems 302 , 304 to perceive (e.g., identify) one or more objects within a surrounding environment and/or predict the motion of the one or more objects.
- the transmitter system 302 can generate the intermediate environmental representation 310 , via a first portion 308 of the machine-learned model (e.g., a convolutional neural network, etc.), based, at least in part, on the sensor data 306 .
- the first portion 308 can include one or more convolutional layers of the machine-learned model.
- the transmitter autonomous vehicle can input the voxelized sensor data into the machine-learned model 308 .
- the machine-learned model 308 can be configured to apply one or more convolutional layers to the voxelized sensor data.
- the machine-learned model 308 can utilize a plurality of layers (e.g., three layers, etc.) of 3×3 convolution filters (e.g., with strides of 2, 1, 2 respectively) to produce the intermediate environmental representation 310 .
- the intermediate environmental representation 310 can be, for example, a 4× downsampled spatial feature map.
- the feature map can be represented as a matrix generated from the array(s) of the sensor data.
- the transmitter system 302 can receive the intermediate environmental representation 310 (e.g., an activation of intermediate neural network layers, etc.) as an output of the first portion 308 (e.g., one or more first (e.g., intermediate) neural network layers) of the machine-learned model.
- the transmitter system 302 (e.g., the second autonomous vehicle 204 , its onboard vehicle computing system) can generate a compressed intermediate environmental representation 312 by compressing the intermediate environmental representation 310 of at least the portion of the environment of the transmitter system 302 (e.g., the second autonomous vehicle 204 ).
- the transmitter system 302 can include a machine-learned compressor model 314 .
- the machine-learned compressor model 314 can include an encoder, a quantizer, and entropy coding.
- a variational image compression algorithm can be used, where a convolutional neural network learns to compress an input (e.g., the first intermediate environmental representation 310 ) with the help of a learned hyperprior.
- the latent representation can then be quantized and further encoded losslessly.
- the feature map of the intermediate environmental representation 310 (e.g., a rectangular tensor) can be compressed into a bitstream.
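The compress-then-code pipeline can be sketched with uniform quantization followed by an off-the-shelf lossless coder. This is only a stand-in for the learned encoder/hyperprior and entropy model described above; `zlib` replaces the learned entropy coding, and all function names are assumptions:

```python
import zlib
import numpy as np

def compress_feature_map(feature_map, num_levels=64):
    """Uniformly quantize a float feature map to num_levels integer levels,
    then losslessly compress the quantized bytes into a bitstream."""
    lo, hi = float(feature_map.min()), float(feature_map.max())
    scale = (hi - lo) / (num_levels - 1) if hi > lo else 1.0
    quantized = np.round((feature_map - lo) / scale).astype(np.uint8)
    bitstream = zlib.compress(quantized.tobytes())
    return bitstream, (lo, scale, feature_map.shape)

def decompress_feature_map(bitstream, meta):
    """Invert the lossless coding and dequantize back to floats."""
    lo, scale, shape = meta
    quantized = np.frombuffer(zlib.decompress(bitstream), dtype=np.uint8)
    return quantized.reshape(shape).astype(np.float32) * scale + lo

fmap = np.random.default_rng(0).normal(size=(64, 64, 8)).astype(np.float32)
stream, meta = compress_feature_map(fmap)
recovered = decompress_feature_map(stream, meta)
print(len(stream) < fmap.nbytes)  # True: the bitstream is smaller than raw floats
```

The learned variational scheme in the disclosure would instead use a convolutional encoder and a learned prior over the quantized latents, but the quantize-then-entropy-code structure is the same.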
- the transmitter system 302 (e.g., the second autonomous vehicle 204 , its onboard vehicle computing system, etc.) can determine a recipient to which to communicate the compressed intermediate environmental representation 312 from among a plurality of potential recipients.
- the transmitter system 302 can select the recipient system 304 based at least in part on a communication range of the transmitter system 302 .
- the transmitter system 302 can determine that it will communicate the compressed intermediate environmental representation 312 to one or more of the other computing systems within its communication range.
- the second autonomous vehicle 204 can select the first autonomous vehicle 202 , from among a plurality of autonomous vehicles, based at least in part on a communication range 210 of the autonomous vehicle 202 .
- the second autonomous vehicle 204 can be within the communication range 210 of the first autonomous vehicle 202 and thus may provide the intermediate environmental representation 310 to the first autonomous vehicle 202 .
- the transmitter system 302 (e.g., the second autonomous vehicle 204 , its onboard vehicle computing system, etc.) can communicate the compressed intermediate environmental representation 312 to the recipient system 304 (e.g., the first autonomous vehicle 202 , its onboard vehicle computing system, etc.).
- the compressed intermediate environmental representation 312 can also be associated with a first time T 1 .
- the first time T 1 can be associated with a sensor timestamp of the transmitter system 302 (e.g., the second autonomous vehicle 204 ).
- the sensor timestamp can be indicative of when the sensor data 306 (e.g., utilized to generate the intermediate environmental representation 310 ) was acquired by the sensors of the transmitter system 302 (e.g., second autonomous vehicle 204 ).
- the recipient system 304 (e.g., the first autonomous vehicle 202 , its onboard computing system, etc.) can obtain the compressed intermediate environmental representation 312 , from the transmitter system 302 (e.g., the second autonomous vehicle 204 , its onboard vehicle computing system, etc.).
- the compressed intermediate environmental representation 312 can be indicative of at least a portion of an environment of the recipient system 304 (e.g., the first autonomous vehicle 202 ).
- the compressed intermediate environmental representation 312 can be a compressed feature map that was generated based on sensor data captured within a field of view of at least a portion of the environment of the recipient system 304 .
- the recipient system 304 (e.g., the first autonomous vehicle 202 ) can also obtain compressed intermediate environmental representations from one or more other transmitter systems (e.g., autonomous vehicle 206 , infrastructure element 208 , etc.).
- the recipient system 304 can generate a decompressed intermediate environmental representation 316 by decompressing the compressed intermediate environmental representation 312 .
- the recipient system 304 can include a decompressor model 318 that includes a decoder and entropy decoding techniques.
- the decompressed intermediate environmental representation 316 can be generated via the application of the decompressor model 318 .
- the recipient system 304 (e.g., the first autonomous vehicle 202 , its onboard computing system, etc.) can determine, using one or more machine-learned models 320 A-B, an updated intermediate environmental representation 322 based at least in part on the decompressed intermediate environmental representation 316 (e.g., originally generated by the second autonomous vehicle 204 ) and a second intermediate environmental representation 324 generated by the recipient system 304 (e.g., the first autonomous vehicle 202 , its onboard computing system, etc.).
- the recipient system 304 (e.g., the first autonomous vehicle 202 , its onboard computing system, etc.) can obtain the second intermediate environmental representation 324 generated by the recipient system 304 (e.g., the first autonomous vehicle 202 , its onboard computing system, etc.).
- the second intermediate environmental representation 324 can be generated by the recipient system 304 in a manner similar to that previously described with respect to the transmitter system 302 .
- the recipient system 304 can obtain sensor data 326 via one or more sensors of the recipient system 304 .
- the recipient system 304 can determine the second intermediate environmental representation 324 based at least in part on the sensor data 326 obtained via the one or more sensors of the recipient system 304 .
- This can be done utilizing a machine-learned model 328 (e.g., a first portion of the machine-learned model) that is configured/trained similar to or the same as the machine-learned model 308 (of the transmitter system 302 ).
- the machine-learned model 328 can be configured to extract features from the sensor data (e.g., from voxelized three-dimensional sensor data) to create a downsampled spatial feature map.
- the one or more models 320 A-B used to create the updated intermediate environmental representation 322 can include a machine-learned time correction model 320 A.
- the machine-learned time correction model 320 A can include, for example, a neural network.
- the machine-learned time correction model 320 A can be a time delay correction model configured to adjust the decompressed intermediate environmental representation 316 to account for a time delay.
- the compressed intermediate environmental representation 312 can be based at least in part on sensor data 306 acquired by the transmitter system 302 (e.g., the second autonomous vehicle 204 , its onboard computing system, etc.) at a first time T 1 .
- the first time can be, for example, associated with a sensor timestamp of the second autonomous vehicle 204 . This can be a sensor timestamp reflecting the time at which the sensor acquired the sensor data 306 .
- the machine-learned time correction model 320 A can be configured to compensate for time differences between a plurality of times.
- the recipient system 304 can determine, using the machine-learned time correction model 320 A, a time-corrected intermediate environmental representation 330 based at least in part on the first decompressed intermediate environmental representation 316 .
- the time-corrected intermediate environmental representation 330 can be adjusted based at least in part on a time difference between the first time T 1 and a second time T 2 associated with the recipient system 304 .
- the second time can be, for example, indicative of a time at which the recipient system 304 intends to perceive the environment, a time at which the recipient system 304 has acquired its own sensor data to be used for autonomous operations (as described herein), and/or another time.
- the first time T 1 and the second time T 2 can be based on similar time scales/references to help ensure accuracy.
- the first time T 1 and the second time T 2 can be based on global positioning system data.
- the time-corrected intermediate environmental representation 330 can account for a time delay associated with the compressed intermediate environmental representation 312 obtained from the transmitter system 302 via the use of one or more machine-learned models (e.g., a time delay correction model, etc.).
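A constant-velocity shift of the BEV grid can illustrate what the time-delay correction accomplishes: advancing features from the transmitter's sensor time T1 to the recipient's target time T2. The disclosed model is learned, so this NumPy sketch (with hypothetical names and a fixed velocity) only conveys the idea:

```python
import numpy as np

def time_correct_bev(feature_map, t1, t2, velocity_cells_per_s):
    """Shift a BEV feature map by the displacement accrued between the
    transmitter's sensor timestamp t1 and the recipient's target time t2,
    assuming a known constant velocity (a stand-in for the learned model)."""
    dt = t2 - t1
    di = int(round(velocity_cells_per_s[0] * dt))
    dj = int(round(velocity_cells_per_s[1] * dt))
    return np.roll(feature_map, shift=(di, dj), axis=(0, 1))

bev = np.zeros((8, 8), dtype=np.float32)
bev[2, 2] = 1.0  # feature observed at time t1
corrected = time_correct_bev(bev, t1=0.0, t2=0.2, velocity_cells_per_s=(10.0, 0.0))
print(np.argwhere(corrected == 1.0))  # [[4 2]] -- moved 2 cells along axis 0
```

A learned time-correction model could instead condition on the time difference (e.g., as an extra input channel) and warp features without an explicit velocity estimate.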
- the recipient system 304 (e.g., a first autonomous vehicle 202 , etc.) can generate an updated intermediate environmental representation 322 based at least in part on the time-corrected intermediate environmental representation 330 .
- the recipient system 304 (e.g., the first autonomous vehicle 202 , etc.) can generate an autonomy output 334 based at least in part on the time-corrected intermediate environmental representation 330 .
- the autonomy output 334 can be indicative of, for example, perception data and prediction data associated with the recipient system 304 (e.g., the first autonomous vehicle 202 , etc.).
- the one or more models used to create the updated intermediate environmental representation can include a machine-learned aggregation model 320 B.
- the machine-learned aggregation model 320 B can be configured to aggregate a plurality of intermediate environmental representations from a plurality of sources (e.g., autonomous vehicles).
- the decompressed intermediate environmental representation 316 can be adjusted to account for the time delay (using a time delay correction model).
- the machine-learned aggregation model 320 B can utilize the time-corrected intermediate environmental representation 330 version of the decompressed intermediate environmental representation 316 .
- the machine-learned aggregation model 320 B can be configured to aggregate the decompressed intermediate environmental representation 316 (e.g., the time-corrected intermediate environmental representation 330 version) and the second intermediate environmental representation 324 to generate the updated intermediate environmental representation 322 .
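A minimal sketch of this aggregation step, assuming the received representations have already been time-corrected and spatially aligned. The elementwise mean here is a stand-in for the learned aggregation model 320 B, and the function name is illustrative:

```python
import numpy as np

def aggregate_representations(own_rep, received_reps):
    # Stack the recipient's own intermediate representation with the
    # received (aligned) ones and fuse elementwise; the patented system
    # learns this combination rather than averaging.
    stack = np.stack([own_rep, *received_reps], axis=0)
    return stack.mean(axis=0)
```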
- the machine-learned aggregation model 320 B can ingest other intermediate environmental representations received from other systems (e.g., vehicles, infrastructure elements, etc.) as well.
- the recipient system 304 (e.g., the first autonomous vehicle 202 , etc.) can obtain one or more intermediate environmental representations from one or more different autonomous vehicles.
- FIG. 5 depicts an example diagram 500 of a multi-agent scenario with one or more estimated pose errors according to example embodiments of the present disclosure.
- the recipient system 304 (e.g., the first autonomous vehicle 202 , etc.) can generate a first intermediate environmental representation 324 (e.g., via a first portion 308 of the machine-learned model) based at least in part on sensor data 326 indicative of its surrounding environment.
- the recipient system 304 (e.g., the first autonomous vehicle 202 , etc.) can obtain a second intermediate environmental representation 310 (e.g., generated via a first portion 308 of the machine-learned model by the transmitter system 302 ) based at least in part on sensor data 306 indicative of another portion of its surrounding environment.
- the recipient system 304 (e.g., the first autonomous vehicle 202 , etc.) can determine an updated intermediate environmental representation 504 A, 504 B by aggregating the first intermediate environment representation 324 with each of the one or more intermediate environmental representations (e.g., intermediate environmental representation 310 ).
- Each intermediate environmental representation can be aggregated based, at least in part, on an estimated pose 502 of the transmitter autonomous vehicle (e.g., second autonomous vehicle 204 ) associated with the representation (e.g., second intermediate environmental representation 310 ). Inaccurate estimated poses (e.g., illustrated by alignment 506 ) of the transmitter autonomous vehicle(s) can interfere with the accuracy of the updated intermediate environment representation 504 A, 504 B.
- an incorrect alignment 506 of the intermediate environmental representations 324 , 310 can result in an updated intermediate environment representation 504 B with one or more inaccuracies 512 within the autonomous vehicle's 202 surrounding environment.
- the systems and methods disclosed herein can help prevent inaccuracies in the updated intermediate environmental representations by detecting, correcting, and compensating for inaccurate pose estimates.
- FIG. 6 depicts an architecture 600 of example machine-learned models for correcting pose errors according to example embodiments of the present disclosure.
- FIG. 6 depicts a first (e.g., recipient) autonomous vehicle 602 that can obtain (e.g., via one or more sensors of an onboard vehicle computing system) sensor data (e.g., the sensor data described above with reference to the transmitter autonomous vehicle(s) illustrated as vehicle 604 , 606 herein) associated with an environment (e.g., a surrounding environment) of the first autonomous vehicle 602 .
- the first autonomous vehicle 602 can obtain estimated location data indicative of a first estimated pose of the first autonomous vehicle 602 .
- the first autonomous vehicle 602 can generate a first intermediate representation 610 (e.g., via a first portion 608 of the machine-learned model) based on the sensor data (e.g., in the manner described above with reference to FIG. 4 ).
- the first autonomous vehicle 602 (and/or vehicle computing system thereof) can obtain one or more messages from one or more transmitter autonomous vehicle(s) 604 , 606 within the environment of the first autonomous vehicle 602 .
- Each message can include an intermediate environmental representation 612 , 614 of at least a portion of the environment of the first autonomous vehicle 602 (e.g., from the perspective of a respective transmitter autonomous vehicle 604 , 606 ) and/or estimated location data indicative of an estimated pose of a respective transmitter autonomous vehicle.
- the autonomous vehicles 602 , 604 , 606 can obtain estimated location data indicative of an estimated pose of the respective autonomous vehicles.
- the estimated pose for example, can be indicative of one or more estimated spatial coordinates and an estimated orientation of the autonomous vehicles 602 , 604 , 606 .
- the autonomous vehicles 602 , 604 , 606 can include one or more location measurement sensors such as, for example, one or more global positioning systems (e.g., GPSs), inertial measurement units (e.g., IMUs), and/or any other sensors configured to determine a current location and/or orientation of the autonomous vehicles 602 , 604 , 606 .
- the autonomous vehicles 602 , 604 , 606 can be configured to obtain the estimated location data from the one or more location measurement sensors.
- the first autonomous vehicle 602 can obtain a first message from a second (e.g., a first transmitter) autonomous vehicle 604 .
- the first message can include a second intermediate environmental representation 612 of at least a second portion of the environment of the first autonomous vehicle 602 and/or second estimated location data indicative of a second estimated pose of the second autonomous vehicle 604 .
- the first autonomous vehicle 602 can obtain a second message from a third autonomous vehicle 606 .
- the second message can include a third intermediate environmental representation 614 of at least a third portion of the environment of the first autonomous vehicle 602 and/or third estimated location data indicative of a third estimated pose of the third autonomous vehicle 606 .
- the first autonomous vehicle 602 (e.g., denoted as i) can have a noisy estimate of its own pose (e.g., first estimated pose), denoted ξ̂ i , and can receive the noisy poses (e.g., second estimated pose, third estimated pose, etc.) of neighboring autonomous vehicles (e.g., denoted as j) (e.g., second autonomous vehicle 604 , third autonomous vehicle 606 , etc.) as part of the messages (e.g., first message, second message, etc.).
- These noisy poses (e.g., first, second, third estimated poses, etc.) can be used to compute noisy relative transformations between autonomous vehicles i and j, denoted ξ̂ ji .
- the first autonomous vehicle 602 can determine one or more relative poses 622 , 624 , 626 between each of the associated autonomous vehicles (as represented by nodes 632 , 634 , 636 ). For example, the first autonomous vehicle 602 can determine a first relative pose 622 between the first autonomous vehicle 602 and the second autonomous vehicle 604 based, at least in part, on the intermediate environmental representation 610 and the second intermediate environmental representation 612 . As another example, the first autonomous vehicle 602 can determine a second relative pose 624 between the first autonomous vehicle 602 and the third autonomous vehicle 606 based, at least in part, on the first intermediate environmental representation 610 and the third intermediate environmental representation 614 . In some implementations, the first autonomous vehicle 602 can determine a third relative pose 626 between the second autonomous vehicle 604 and the third autonomous vehicle 606 based, at least in part, on the second intermediate environmental representation 612 and the third intermediate environmental representation 614 .
- Each relative pose 622 , 624 , 626 can identify a displacement between one or more autonomous vehicles.
- the first relative pose 622 can be indicative of a first displacement between the first autonomous vehicle 602 and the second autonomous vehicle 604 .
- the second relative pose 624 can be indicative of a second displacement between the first autonomous vehicle 602 and the third autonomous vehicle 606 .
- the third relative pose 626 can be indicative of a third displacement between the second autonomous vehicle 604 and the third autonomous vehicle 606 .
- Each relative pose 622 , 624 , 626 can be determined based, at least in part, on a pair of intermediate environmental representations and/or a machine-learned regression model 620 .
- the first intermediate environmental representation 610 can include a first feature map encoded with a first plurality of encoded features representative of the first portion of the environment.
- the second intermediate environmental representation 612 can include a second feature map encoded with a second plurality of encoded features representative of the second portion of the environment.
- the third intermediate environmental representation 614 can include a third feature map encoded with a third plurality of encoded features representative of the third portion of the environment.
- Each relative pose 622 , 624 , 626 can be determined by inputting data indicative of a pair of intermediate environmental representations into the machine-learned regression model 620 .
- the first autonomous vehicle 602 can determine, via the machine-learned regression model 620 , the first relative pose 622 between the first autonomous vehicle 602 and the second autonomous vehicle 604 .
- the first autonomous vehicle 602 can generate an intermediate environmental representation pair by concatenating the first plurality of encoded features (e.g., of the first intermediate environmental representation 610 ) and the second plurality of encoded features (e.g., of the second intermediate environmental representation 612 ) along a features dimension.
- the first autonomous vehicle 602 can input the intermediate environmental representation pair to the machine-learned regression model 620 to obtain the first relative pose 622 .
- the machine-learned regression model 620 can be configured to output the relative pose 622 based, at least in part, on the intermediate environmental representation pair.
- a convolutional neural network (e.g., machine-learned regression model 620 ) can output a correction that is applied to the noisy relative transformation, where ⊕ can denote the concatenation along the features dimension and ∘ can represent applying the predicted correction transformation to the noisy transformation ξ̂ ji . An independent prediction (e.g., relative pose 622 , 624 , 626 ) can be determined for each directed edge, such that ξ ji ≠ ξ ij −1 .
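The pairing and composition steps above can be sketched as follows, with the caller-supplied `regressor` standing in for the convolutional regression network and with poses simplified to 2D rigid transforms (x, y, θ); all names are illustrative:

```python
import numpy as np

def se2_compose(a, b):
    # Compose 2D rigid transforms a ∘ b, each given as (x, y, theta):
    # apply b first, then a.
    ax, ay, at = a
    bx, by, bt = b
    c, s = np.cos(at), np.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)

def predict_relative_pose(feat_i, feat_j, noisy_xi_ji, regressor):
    # Concatenate the two feature maps along the features dimension (⊕)
    # and let the supplied regressor predict a 3-DoF correction, which is
    # composed (∘) with the noisy relative transform.
    pair = np.concatenate([feat_i, feat_j], axis=0)
    correction = regressor(pair)  # stand-in for the regression CNN
    return se2_compose(correction, noisy_xi_ji)
```

With an identity regressor (zero correction), the output simply reproduces the noisy relative transform.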
- the first autonomous vehicle 602 can determine one or more absolute poses 642 , 644 for one or more of the received intermediate environmental representations 612 , 614 .
- Each absolute pose can include updated spatial coordinates and/or orientations for each of the transmitter autonomous vehicles 604 , 606 .
- the first autonomous vehicle 602 can determine a first absolute pose indicative of one or more first updated spatial coordinates and/or a first updated orientation for the first autonomous vehicle 602 .
- the first autonomous vehicle 602 can determine a second absolute pose 642 indicative of one or more second updated spatial coordinates and/or a second updated orientation for the second autonomous vehicle 604 .
- the first autonomous vehicle 602 can determine a third absolute pose 644 indicative of one or more third updated spatial coordinates and/or a third updated orientation for the third autonomous vehicle 606 .
- the first autonomous vehicle 602 can determine the first, second, and/or third absolute pose based, at least in part, on the first, second, and/or third relative poses 622 , 624 , 626 .
- the relative pose estimates can be refined from the machine-learned regression model 620 by finding a set of globally consistent absolute poses among the autonomous vehicles 602 , 604 , 606 (e.g., first, second, third, etc. autonomous vehicles).
- the first autonomous vehicle 602 can mitigate pose error.
- the absolute poses can be determined using a Markov random field, where each absolute pose is a node (e.g., nodes 632 , 634 , 636 ) conditioned on the estimated relative poses 622 , 624 , 626 .
- the distribution of absolute poses conditioned on these relative pose estimates can have a heavy tail due to outliers.
- Each absolute pose (e.g., denoted ξ i ) can follow a multivariate student t-distribution with mean μ i ∈ ℝ 3 and scale σ i ∈ ℝ 3×3 given the relative poses.
- the pairwise potential can include likelihoods, weights, and/or weight priors defined below. Unary potentials may not be used.
- ⁇ ⁇ ( i , j ) p ⁇ ( ⁇ ⁇ ji ⁇ ⁇ j ) w ji ⁇ p ⁇ ( ⁇ ⁇ ji - 1 ⁇ ⁇ j ) w ji ⁇ Weighted ⁇ ⁇
- the likelihood terms p( ξ̂ ji ∘ ξ j ) and p( ξ̂ ji −1 ∘ ξ j ), both t-distributed and centered at ξ i , can encourage the result of the relative transformation from transmitter vehicles (e.g., second 604 , third 606 , etc. autonomous vehicles) to stay close to a recipient vehicle (e.g., first autonomous vehicle 602 ). Both directions are included due to the symmetry of the rigid transformations. In some implementations, not all pairwise transformations can provide the same amount of information. Since the machine-learned regression model 620 can produce heavy-tailed errors, the edge potentials can down-weight erroneous pose regression outputs.
- a scalar weight w ji can be introduced for each term in the pairwise potential, e.g., p( ξ̂ ji ∘ ξ j ) ^ w ji , so that zero-weighted terms can be ignored.
- a prior distribution for each w ji can be used, where the mean of the distribution is o ji , the fraction of spatial overlap between the two messages.
- the absolute pose prediction can be trusted more if two messages have more spatial overlap.
- a Gamma prior can be placed on each weight w ji , with its parameters chosen so that the prior mean equals the overlap fraction o ji .
- the first autonomous vehicle 602 can estimate values of the absolute poses ξ i , the scale parameters σ i , and the weights w ji that maximize the product of the pairwise potentials. This can be done using iterated conditional modes, e.g., iteratively updating each variable to its conditional maximum while the remaining variables are held fixed.
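A toy sketch of such an iterated-conditional-modes refinement follows, heavily simplified: pose composition is approximated additively, the t-distributed potentials are replaced by a Gaussian approximation (so each conditional mode is a weighted mean), and the weights are held fixed. All names are illustrative.

```python
import numpy as np

def icm_refine(mu0, rel, w, n_iters=20):
    # mu0: (N, 3) array of noisy absolute poses (x, y, theta).
    # rel[(j, i)]: regressed relative transform from vehicle j to i.
    # w[(j, i)]: the corresponding weight w_ji.
    # Each pose i is iteratively replaced by the weighted mean of its
    # neighbours' proposals mu_j + rel[(j, i)], holding the rest fixed.
    mu = mu0.astype(float).copy()
    for _ in range(n_iters):
        for i in range(len(mu)):
            num, den = np.zeros(3), 0.0
            for (j, k), r in rel.items():
                if k == i:
                    num += w[(j, i)] * (mu[j] + r)
                    den += w[(j, i)]
            if den > 0:
                mu[i] = num / den  # conditional mode (Gaussian approx.)
    return mu
```

After convergence the poses are mutually consistent with the regressed relative transforms (up to a global gauge freedom, since nothing anchors the absolute frame in this toy version).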
- the first autonomous vehicle 602 (e.g., its onboard vehicle computing system) can generate an updated intermediate environmental representation 660 based, at least in part, on the first intermediate environmental representation 610 , the second intermediate environmental representation 612 , and/or the third intermediate environmental representation 614 .
- the updated intermediate environmental representation 660 can be generated based, at least in part, on the first, second, and/or third relative poses 622 , 624 , 626 .
- the updated intermediate environmental representation 660 can be generated based, at least in part, on the first, second, and/or third absolute poses 642 , 644 .
- the updated intermediate environmental representation 660 can be generated using a machine-learned aggregation model 320 B (e.g., one or more portions of the machine-learned detection and prediction model, machine-learned model 320 B, etc.).
- the machine-learned aggregation model 320 B used to create the updated intermediate environmental representation 660 can include a neural network (and/or one or more layers thereof).
- the machine-learned aggregation model 320 B can be configured to aggregate a plurality of intermediate environmental representations 610 , 612 , 614 from a plurality of autonomous vehicles 602 , 604 , 606 .
- the first autonomous vehicle 602 can determine an updated intermediate environmental representation 660 based at least in part on the first intermediate environmental representation 610 , the second intermediate environmental representation 612 generated by the first transmitter autonomous vehicle 604 , and/or the third intermediate environmental representation 614 generated by the second transmitter autonomous vehicle 606 .
- the machine-learned aggregation model 320 B can be configured to aggregate the intermediate environmental representation(s) 610 , 612 , 614 to create the updated intermediate environmental representation 660 .
- the first autonomous vehicle 602 can obtain the updated intermediate environmental representation 660 as an output of the machine-learned aggregation model 320 B.
- the machine-learned aggregation model 320 B can include a graph neural network that includes a plurality of nodes.
- Each node of the graph neural network can correspond to a respective autonomous vehicle (e.g., first 602 , second 604 , third 606 , etc. autonomous vehicle) of a plurality of autonomous vehicles within the environment of the first autonomous vehicle 602 . This can include, for example, vehicles within the communication range of the first autonomous vehicle 602 .
- Each node can represent a respective set of spatial coordinates (e.g., an estimated, relative 622 , 624 , 626 , or absolute pose 642 , 644 ) of a respective autonomous vehicle.
- the first autonomous vehicle 602 can be configured to transform inputted intermediate environmental representation(s) 610 , 612 , 614 based, at least in part, on a set of spatial coordinates associated with the transmitter autonomous vehicles.
- the machine-learned aggregation model 320 B can be configured to initialize a node state of at least one node of the graph neural network and update the node state of the at least one node based at least in part on a spatial transformation.
- each node can maintain a state representation.
- messages can be sent between nodes of the graph and the node states for each node can be updated based, at least in part, on the aggregated received information (e.g., of the messages) using a neural network.
- Graphs for different vehicles can be different, as each vehicle can receive communications from one or more different vehicles (the set of vehicles within one vehicle's communication range may be different than the vehicles within another vehicle's communication range).
- the first autonomous vehicle 602 can generate a second transformed intermediate environmental representation 652 by transforming the second intermediate environmental representation 612 based, at least in part, on the absolute pose 642 of the second autonomous vehicle 604 .
- the machine-learned aggregation model 320 B can generate a second transformed intermediate environmental representation 652 by transforming the second intermediate environmental representation 612 based, at least in part, on the one or more second updated spatial coordinates and/or the second updated orientation for the second autonomous vehicle 604 .
- the first autonomous vehicle 602 can generate a third transformed intermediate environmental representation 654 by transforming the third intermediate environmental representation 614 based, at least in part, on the absolute pose 644 of the third autonomous vehicle 606 .
- the machine-learned aggregation model 320 B can generate the third transformed intermediate environmental representation 654 by transforming the third intermediate environmental representation 614 based, at least in part, on the one or more third updated spatial coordinates and/or the third updated orientation for the third autonomous vehicle 606 .
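The spatial transformation applied to a received feature map can be sketched as a rigid warp, assuming a single-channel bird's-eye-view grid and nearest-neighbour sampling; in practice a learned spatial transformer with bilinear sampling would be used, and the names here are illustrative:

```python
import numpy as np

def warp_bev(features, dx, dy, theta):
    # Rigidly warp an (H, W) bird's-eye-view map by rotation theta
    # (radians) about the grid centre plus a translation (dx, dy) in
    # cells, using inverse mapping with nearest-neighbour sampling.
    h, w = features.shape
    out = np.zeros_like(features)
    c, s = np.cos(theta), np.sin(theta)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    for y in range(h):
        for x in range(w):
            # Find the source cell this output cell maps back to.
            xs = c * (x - cx - dx) + s * (y - cy - dy) + cx
            ys = -s * (x - cx - dx) + c * (y - cy - dy) + cy
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < w and 0 <= yi < h:
                out[y, x] = features[yi, xi]
    return out
```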
- each vehicle node of the neural network can represent the updated spatial coordinates and/or orientation of each respective autonomous vehicle.
- the first autonomous vehicle 602 (e.g., the machine-learned aggregation model 320 B) can perform message passing to share features between the vehicle nodes.
- the second portion of the machine-learned detection and prediction model can include a machine-learned attention model configured to weigh a plurality of intermediate environmental representations before generating the updated intermediate environmental representation 660 .
- the machine-learned attention model can determine a first weight for the second intermediate environmental representation 612 and/or a second weight for the third intermediate environmental representation 614 .
- Each weight can be indicative of a predicted accuracy of an estimated pose.
- the first weight can be indicative of a first predicted accuracy of the second estimated pose and/or the second weight can be indicative of a second predicted accuracy of the third estimated pose.
- the first autonomous vehicle 602 (e.g., machine-learned aggregation model 320 B) can assign the first weight to the second transformed intermediate environmental representation 652 and/or the second weight to the third transformed intermediate environmental representation 654 .
- the updated intermediate environmental representation 660 can be generated, via the machine-learned aggregation model 320 B (e.g., a second portion of the machine-learned detection and prediction model), based, at least in part, on the first and second weights.
- the machine-learned aggregation model 320 B (e.g., via the attention model) can prioritize messages (e.g., intermediate environmental representations) to focus on clean messages and ignore noisy ones. To do so, a weight can be assigned to each transformed intermediate environmental representation 652 , 654 before the representations are averaged (e.g., to receive the weighted average representation 656 ), to suppress the remaining noisy representations.
- the normalized weight a ji ∈ [0, 1] can be computed by normalizing the attention model's predicted per-message scores (e.g., via a softmax over the received representations).
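Assuming a softmax normalization (an assumption for illustration; the specific formula is not reproduced here), the weighted averaging of the transformed representations can be sketched as:

```python
import numpy as np

def attention_fuse(reps, scores):
    # Softmax the per-message scores into normalized weights a_ji that
    # sum to one, then take the weighted average of the stacked
    # transformed representations.
    scores = np.asarray(scores, dtype=float)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    fused = np.tensordot(weights, np.stack(reps, axis=0), axes=1)
    return fused, weights
```

A noisy message with a low predicted score thus contributes little to the weighted average representation.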
- the first autonomous vehicle 602 can generate an autonomy output 334 for the first autonomous vehicle 602 based at least in part on the updated intermediate environmental representation 660 .
- the autonomy output 334 can be indicative of a bounding shape associated with an object within the environment of the recipient autonomous vehicle 602 and one or more predicted future locations of the object.
- the autonomy output 334 can include one or more bounding boxes indicative of one or more objects within the surrounding environment of the first autonomous vehicle 602 .
- the first autonomous vehicle 602 can utilize the machine-learned detection and prediction model 332 (and/or a second portion of the model).
- the machine-learned detection and prediction model 332 can be configured to generate the autonomy output(s) 334 .
- the joint detection and prediction model 332 can include several memories and networks.
- the joint detection and prediction model 332 can include a plurality of network branches (e.g., a branch for detection, a branch for prediction, etc.), a plurality of memories (e.g., an appearance memory, object path memory, etc.) as well as an object path proposal network and a path refinement network.
- the first autonomous vehicle 602 can input the updated intermediate environmental representation 660 into the machine-learned detection and prediction model 332 and obtain the autonomy output 334 as an output of the machine-learned detection and prediction model 332 .
- the first autonomous vehicle 602 can apply a set of four convolutional blocks to extract a high-level representation suitable for a perception and prediction task.
- these convolutional blocks help to capture multi-scale context efficiently, which can be helpful for the prediction function.
- a feature map can be used with two network branches that output detection and motion forecasting estimates, respectively.
- the output of the detection branch can be parameterized as (x, y, w, h, ⁇ ), which denotes the position, size, and orientation of objects.
- the output of the prediction branch can be parameterized as (x t , y t ), which denotes the object's location at future time step t. This can indicate one or more future location(s) of the object.
- the autonomy output 334 can be indicative of the output of the detection branch and the prediction branch (e.g., a bounding shape and predicted future location(s)).
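The branch parameterizations above can be illustrated by a small decode step that turns a detection (x, y, w, h, θ) into bounding-box corners and the prediction outputs (x t , y t ) into a trajectory array; the function and helper names are illustrative only:

```python
import numpy as np

def decode_outputs(det, pred):
    # det = (x, y, w, h, theta): centre, size, and orientation of a box.
    # pred = sequence of (x_t, y_t) future positions, one per time step.
    x, y, w, h, theta = det
    c, s = np.cos(theta), np.sin(theta)
    half = np.array([[w / 2, h / 2], [w / 2, -h / 2],
                     [-w / 2, -h / 2], [-w / 2, h / 2]])
    rot = np.array([[c, -s], [s, c]])
    corners = half @ rot.T + np.array([x, y])   # bounding shape corners
    trajectory = np.asarray(pred, dtype=float)  # predicted waypoints
    return corners, trajectory
```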
- the recipient system 304 (e.g., vehicle computing system of autonomous vehicle 602 ) can generate (e.g., using a machine-learned detection and prediction model 332 ) an autonomy output 334 for the recipient system 304 (e.g., the first autonomous vehicle 202 , 602 , etc.) based at least in part on the updated intermediate environmental representation 322 .
- the autonomy output 334 can be indicative of an object 212 within the environment of the recipient system 304 (e.g., the first autonomous vehicle 202 ) and one or more predicted future locations 216 (e.g., shown in FIG. 2 ) of the object 212 .
- the autonomy output 334 can be indicative of a bounding shape 218 (e.g., shown in FIG. 2 ) associated with the object 212 .
- the one or more predicted future locations 216 of the object 212 can include one or more waypoints (at one or more future times) and can be expressed as a predicted motion trajectory of the object 212 .
- the object 212 can be occluded from a field of view of one or more sensors of the recipient system 304 (e.g., the first autonomous vehicle 202 ).
- the recipient system 304 can utilize a machine-learned detection and prediction model 332 .
- the machine-learned detection and prediction model 332 can be configured to generate the autonomy output(s) 334 .
- the recipient system 304 (e.g., first autonomous vehicle 202 , 602 ) can input the updated intermediate environmental representation 322 into the machine-learned detection and prediction model 332 .
- the recipient system 304 (e.g., first autonomous vehicle 202 , 602 , etc.) can apply a set of four convolutional blocks to extract a high-level representation suitable for a perception and prediction task.
- these convolutional blocks help to capture multi-scale context efficiently, which can be helpful for the prediction function.
- a feature map can be used with two network branches that output detection and motion forecasting estimates, respectively.
- the output of the detection branch can be parameterized as (x, y, w, h, θ), which denotes the position, size, and orientation of objects. This can be represented as a bounding shape 218 (e.g., shown in FIG. 2 ).
- the output of the prediction branch can be parameterized as (x t , y t ), which denotes the object's location at future time step t. This can indicate one or more future location(s) 216 of the object 212 .
- the autonomy output 334 can be indicative of the output of the detection branch and the prediction branch (e.g., a bounding shape and predicted future location(s)).
- the machine-learned detection and prediction model 332 can be used by or included in the joint perception/prediction system 123 of the vehicle computing system 112 .
- the first autonomous vehicle 602 (e.g., its onboard vehicle computing system) can generate a motion plan for the first autonomous vehicle 602 based at least in part on the autonomy output 334 .
- the first autonomous vehicle 602 can include a motion planning system (e.g., motion planning system 170 C of FIG. 1 ).
- the motion planning system can determine a motion plan and generate motion plan data for the first autonomous vehicle 602 based at least in part on the autonomy output 334 .
- the motion plan can be generated based at least in part on the autonomy output 334 in that it can consider an object (and/or its future location(s)) described in the autonomy output 334 when planning the motion of the first autonomous vehicle 602 , whether that consideration is overridden by other factors (e.g., other objects, unexpected occurrences, etc.) or eventually affects the actual motion of the vehicle.
- the motion plan data can include vehicle actions, trajectories, waypoints, etc. with respect to the objects proximate to the first autonomous vehicle 602 as well as the predicted movements.
- the motion planning system can include one or more machine-learned models/optimization algorithms that consider cost data associated with a vehicle action as well as other objective functions (e.g., cost functions based, at least in part, on speed limits, traffic lights, and/or other aspects of the environment), if any, to determine optimized variables that make up the motion plan data.
- the motion planning system can determine that the vehicle can perform a certain action (e.g., stop for an object) without increasing the potential risk to the vehicle and/or violating any traffic laws (e.g., speed limits, lane boundaries, signage, etc.).
- the motion plan data can include a planned trajectory, velocity, acceleration, and/or other actions.
- the first autonomous vehicle 602 (e.g., its onboard vehicle computing system) can initiate a motion control of the first autonomous vehicle 602 based at least in part on the motion plan.
- a motion control can include an action to implement, change, continue, and/or otherwise affect the motion of the first autonomous vehicle 602 .
- the motion planning system can provide the motion plan data indicative of the vehicle actions, a planned trajectory, and/or other operating parameters to the vehicle control systems to implement the motion plan data for the vehicle.
- the vehicle 602 can include a vehicle interface configured to translate the motion plan data into instructions.
- the vehicle interface can translate motion plan data into instructions for controlling the first vehicle including adjusting the steering of the vehicle “X” degrees and/or applying a certain magnitude of braking force to avoid interfering with an object indicated in the autonomy output.
- the vehicle interface can send one or more control signals to the responsible vehicle control component (e.g., braking control system, steering control system, and/or acceleration control system) to execute the instructions and implement the motion plan data.
- the first vehicle 602 can account for and control its motion with respect to object(s) outside of the normal field of view of the first vehicle 602 but within the field of view of the transmitter autonomous vehicle(s) 604 , 606 .
- the machine-learned model(s) 620 , 320 B, 332 , etc. of the described system can be trained in several stages.
- a sensor backbone (e.g., a LIDAR backbone for helping to create the intermediate environmental representations 610 , 612 , 614 ) and output headers can be pre-trained on a real-LIDAR single-vehicle dataset, bypassing the cross-vehicle aggregation stage.
- the loss function can be cross-entropy on a vehicle classification output and smooth ℓ1 on the bounding box parameters.
- Hard-negative mining can be applied as well to improve performance.
- the sensor backbone (e.g., LIDAR backbone), cross-vehicle aggregation, and output header models can be jointly fine-tuned on a simulated vehicle-to-vehicle dataset (described below) with synchronized inputs (e.g., no time delay) using the same loss function.
- the training can avoid the use of a temporal warping function at this stage.
- the number of connected vehicles can be randomly sampled uniformly on [0, min(c, 6)], where c is the number of candidate vehicles available. This can help ensure that the architecture of the machine-learned model(s) can handle arbitrary graph connectivity while also making sure the fraction of vehicles in the scene/ecosystem on the network remains reasonable.
- the machine-learned detection and prediction model 320 B and machine-learned attention model can be trained together. For example, identifying noisy examples can be treated as a supervised binary classification task, where clean examples can be assigned a high value and noisy examples can be assigned a low value.
- strong noise can be generated and applied to some vehicles and weak pose noise to others within one scene.
- the noise can be generated via n_i ∼ D_w or n_i ∼ D_s, where D_w is a distribution of weak pose noises and D_s is a distribution of strong pose noises.
- the noise can have two translational components and a rotational component, thus n_i ∈ ℝ³.
- a fixed proportion p of autonomous vehicles can receive noise from the strong distribution while the rest can receive noise from the weak distribution.
- the representation is considered clean when both autonomous vehicles have noise from the weak distribution and considered noisy when either vehicle has noise from the strong distribution.
- the labeling can include:
- the function can produce smooth labels to temper the attention model's predictions so that the attention weights are not just 0 or 1.
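A hedged sketch of this noise model and labeling: the Gaussian scales and the exponential labeling function below are illustrative stand-ins, since the text specifies only weak/strong distributions and that labels should be smooth rather than hard 0/1:

```python
import math
import random

def sample_pose_noise(strong: bool) -> tuple:
    """Draw a pose noise n_i = (dx, dy, dtheta) from the weak or strong
    distribution. Zero-mean Gaussians and their scales are assumptions."""
    sigma_t, sigma_r = (2.0, 0.2) if strong else (0.1, 0.01)
    return (random.gauss(0, sigma_t), random.gauss(0, sigma_t),
            random.gauss(0, sigma_r))

def smooth_label(noise_a, noise_b, alpha: float = 1.0) -> float:
    """Map the pose noises of a vehicle pair to a soft target in (0, 1]:
    near 1 when both noises are small (clean), decaying toward 0 as either
    grows (noisy). The exponential form is a stand-in for the unspecified
    labeling function; it yields smooth rather than hard 0/1 labels."""
    mag = max(math.hypot(n[0], n[1]) + abs(n[2]) for n in (noise_a, noise_b))
    return math.exp(-alpha * mag)
```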
- the loss for the joint training task can be defined as follows:
- L_sl1 can be the smooth l1 loss.
- the machine-learned detection and prediction model 320 B and the machine-learned regression model 620 can be trained end-to-end via backpropagation.
- the simulated vehicle-to-vehicle dataset (used for training) can be created using a sensor simulation system.
- the sensor simulation system can use a large catalog of 3D static scenes and dynamic objects that are built upon real-world data collections to provide a set of rich and diverse assets from which to simulate new scenarios.
- the sensor simulation system can apply ray casting and machine learning to generate a realistic sensor point cloud such as, for example, a LIDAR point cloud.
- the sensor simulation system can allow for the creation of vehicle-to-vehicle scenes where a percentage of the vehicles are autonomous vehicles and generate realistic sensor data (e.g., LIDAR data) at different vehicle locations.
- actor 3D-assets can be placed into the virtual scene according to real-world labels and used to generate the sensor data (e.g., LIDAR point cloud data, etc.) at different candidate vehicle locations.
- each autonomous vehicle within the ecosystem can perform the functions of transmitter autonomous vehicles 604 , 606 and the functions of recipient autonomous vehicle 602 as described above.
- an autonomous vehicle can not only communicate an intermediate environmental representation to other autonomous vehicles but can also receive intermediate environmental representations from other autonomous vehicles.
- the autonomous vehicle(s) can also, or alternatively, send and receive autonomy outputs (e.g., of the joint detection/prediction model) and/or sensor data among one another.
- FIG. 7 depicts elements performed in a particular order for purposes of illustration and discussion. Those of ordinary skill in the art, using the disclosures provided herein, will understand that the elements of any of the methods discussed herein can be adapted, rearranged, expanded, omitted, combined, and/or modified in various ways without deviating from the scope of the present disclosure. FIG. 7 is described with reference to elements/terms described with respect to other systems and figures for exemplary illustrated purposes and is not meant to be limiting. One or more portions of method 700 can be performed additionally, or alternatively, by other systems.
- the method 700 includes obtaining, via the one or more sensors, sensor data associated with a surrounding environment of an autonomous vehicle.
- a computing system (e.g., vehicle computing system 110 , operations computing system 190 A, remote computing system(s) 190 B, computing system 690 , recipient system 304 , etc.) can obtain, via the one or more sensors, the sensor data associated with the surrounding environment of the autonomous vehicle.
- the method 700 includes obtaining first estimated location data indicative of a first estimated pose of the autonomous vehicle.
- the computing system can obtain the first estimated location data indicative of the first estimated pose of the autonomous vehicle.
- the method 700 includes determining a first intermediate environmental representation of at least a first portion of the surrounding environment of the autonomous vehicle based, at least in part, on the sensor data.
- the computing system can determine the first intermediate environmental representation of at least a first portion of the surrounding environment of the autonomous vehicle based, at least in part, on the sensor data.
- the method 700 includes obtaining a first message from a second autonomous vehicle.
- the first message can include a second intermediate environmental representation of at least a second portion of the surrounding environment of the second autonomous vehicle and second estimated location data indicative of a second estimated pose of the second autonomous vehicle.
- the method 700 includes obtaining a second message from a third autonomous vehicle.
- the second message can include a third intermediate environmental representation of at least a third portion of the surrounding environment of the third autonomous vehicle and third estimated location data indicative of a third estimated pose of the third autonomous vehicle.
- the method 700 includes determining a first relative pose between the autonomous vehicle and the second autonomous vehicle based, at least in part, on the first intermediate environmental representation and the second intermediate environmental representation, a second relative pose between the autonomous vehicle and the third autonomous vehicle based, at least in part, on the first intermediate environmental representation and the third intermediate environmental representation, and a third relative pose between the second autonomous vehicle and the third autonomous vehicle based, at least in part, on the second intermediate environmental representation and the third intermediate environmental representation.
- the computing system can determine the first relative pose between the autonomous vehicle and the second autonomous vehicle based, at least in part, on the first intermediate environmental representation and the second intermediate environmental representation, the second relative pose between the autonomous vehicle and the third autonomous vehicle based, at least in part, on the first intermediate environmental representation and the third intermediate environmental representation, and the third relative pose between the second autonomous vehicle and the third autonomous vehicle based, at least in part, on the second intermediate environmental representation and the third intermediate environmental representation.
- the method 700 includes determining at least one of a first absolute pose for the autonomous vehicle, a second absolute pose for the second autonomous vehicle, or a third absolute pose for the third autonomous vehicle based, at least in part, on the first relative pose, the second relative pose, and the third relative pose.
- the method 700 includes generating an updated intermediate environmental representation based, at least in part, on the first intermediate environmental representation, the second intermediate environmental representation, the third intermediate environmental representation, and the at least one of the first absolute pose, the second absolute pose, or the third absolute pose.
- the computing system can generate the updated intermediate environmental representation based, at least in part, on the first intermediate environmental representation, the second intermediate environmental representation, the third intermediate environmental representation, and the at least one of the first absolute pose, the second absolute pose, or the third absolute pose.
- the method 700 includes generating an autonomy output for the autonomous vehicle based, at least in part, on the updated intermediate environmental representation.
- the computing system can generate the autonomy output for the autonomous vehicle based, at least in part, on the updated intermediate environmental representation.
- FIG. 8 depicts an example system 800 with various means for performing operations and functions according to example implementations of the present disclosure.
- a computing system can include data acquisition unit(s) 802 , intermediate representation generation unit(s) 804 , relative pose unit(s) 806 , absolute pose unit(s) 808 , aggregation unit(s) 810 , detection and prediction unit(s) 812 , and/or other means for performing the operations and functions described herein.
- one or more of the units may be implemented separately.
- one or more units may be a part of or included in one or more other units.
- the means can be programmed to perform one or more algorithm(s) for carrying out the operations and functions described herein.
- the means (e.g., data acquisition unit(s) 802 , etc.) can be configured to obtain, via one or more sensors, sensor data associated with a surrounding environment of the autonomous vehicle.
- the means (e.g., data acquisition unit(s) 802 , etc.) can be configured to obtain first estimated location data indicative of a first estimated pose of the autonomous vehicle.
- the means (e.g., data acquisition unit(s) 802 , etc.) can be configured to obtain a first message from a second autonomous vehicle. The first message can include a second intermediate environmental representation of at least a second portion of the surrounding environment of the second autonomous vehicle and second estimated location data indicative of a second estimated pose of the second autonomous vehicle.
- the means (e.g., data acquisition unit(s) 802 , etc.) can be configured to obtain a second message from a third autonomous vehicle. The second message can include a third intermediate environmental representation of at least a third portion of the surrounding environment of the third autonomous vehicle and third estimated location data indicative of a third estimated pose of the third autonomous vehicle.
- the means (e.g., intermediate representation generation unit(s) 804 , etc.) can be configured to determine a first intermediate environmental representation of at least a first portion of the surrounding environment of the autonomous vehicle based, at least in part, on the sensor data.
- the means (e.g., relative pose unit(s) 806 , etc.) can be configured to determine a first relative pose between the autonomous vehicle and the second autonomous vehicle based, at least in part, on the first intermediate environmental representation and the second intermediate environmental representation, a second relative pose between the autonomous vehicle and the third autonomous vehicle based, at least in part, on the first intermediate environmental representation and the third intermediate environmental representation, and a third relative pose between the second autonomous vehicle and the third autonomous vehicle based, at least in part, on the second intermediate environmental representation and the third intermediate environmental representation.
- the means (e.g., absolute pose unit(s) 808 , etc.) can be configured to determine at least one of a first absolute pose for the autonomous vehicle, a second absolute pose for the second autonomous vehicle, or a third absolute pose for the third autonomous vehicle based, at least in part, on the first relative pose, the second relative pose, and/or the third relative pose.
- the means (e.g., aggregation unit(s) 810 , etc.) can be configured to generate an updated intermediate environmental representation based, at least in part, on the first intermediate environmental representation, the second intermediate environmental representation, the third intermediate environmental representation, and the at least one of the first absolute pose, the second absolute pose, and/or the third absolute pose.
- the means (e.g., detection and prediction unit(s) 812 , etc.) can be configured to generate an autonomy output for the autonomous vehicle based, at least in part, on the updated intermediate environmental representation.
- FIG. 9 depicts an example computing system 900 according to example embodiments of the present disclosure.
- the example system 900 can include the computing system 925 and a machine learning computing system 950 that are communicatively coupled over one or more network(s) 975 .
- the computing system 925 can be implemented onboard a transmitter and/or recipient system such as that of a vehicle (e.g., as a portion of the vehicle computing system) and/or can be remote therefrom (e.g., as portion of an operations computing system).
- a vehicle computing system can utilize the operations and model(s) of the computing system 925 (e.g., locally, via wireless network communication, etc.).
- the memory 906 (e.g., one or more non-transitory computer-readable storage media, memory devices, etc.) can store information that can be obtained by the one or more processors 904 .
- the memory 906 can include computer-readable instructions 908 that can be executed by the one or more processors 904 .
- the instructions 908 can be software written in any suitable programming language or can be implemented in hardware. Additionally, or alternatively, the instructions 908 can be executed in logically and/or virtually separate threads on processor(s) 904 .
- the memory 906 can store instructions 908 that when executed by the one or more processors 904 cause the one or more processors 904 (the computing system 925 ) to perform operations such as any of the operations and functions of a vehicle computing system, transmitter computing system, recipient computing system, target computing system, adverse computing system, and/or for which any of these computing systems are configured such as, for example, to determine, apply, and/or correct pose errors with intermediate environmental representations, as described herein.
- the memory 906 can store data 910 that can be obtained (e.g., received, accessed, written, manipulated, generated, created, stored, etc.).
- the data 910 can include, for instance, sensor data, intermediate environmental representations, location data, relative poses, absolute poses, autonomy outputs, perception data, prediction data, motion planning data, control signals, models, and/or any other data/information described herein.
- the computing device(s) 902 can obtain data from one or more memories that are remote from the computing system 900 .
- the computing device(s) 902 can also include a communication interface 912 used to communicate with one or more other system(s) (e.g., other systems onboard and/or remote from a vehicle, the other systems of FIG. 9 , etc.).
- the communication interface 912 can include any circuits, components, software, etc. for communicating via one or more networks 975 .
- the communication interface 912 can include, for example, one or more of a communications controller, receiver, transceiver, transmitter, port, conductors, software and/or hardware for communicating data/information.
- the computing system 925 can store or include one or more machine-learned models 914 .
- the machine-learned models 914 can be or can otherwise include various machine-learned models such as, for example, neural networks (e.g., deep neural networks), support vector machines, decision trees, ensemble models, k-nearest neighbors models, Bayesian networks, or other types of models including linear models and/or non-linear models.
- Example neural networks include feed-forward neural networks (e.g., convolutional neural networks, etc.), recurrent neural networks (e.g., long short-term memory recurrent neural networks, etc.), and/or other forms of neural networks.
- the machine-learned models 914 can include the machine-learned models of FIGS. 3 , 6 , and 7 and/or other model(s), as described herein.
- the computing system 925 can receive the one or more machine-learned models 914 from the machine learning computing system 950 over the network(s) 975 and can store the one or more machine-learned models 914 in the memory 906 of the computing system 925 .
- the computing system 925 can use or otherwise implement the one or more machine-learned models 914 (e.g., by processor(s) 904 ).
- the computing system 925 can implement the machine learned model(s) 914 to generate modified intermediate environmental representation(s), autonomy output(s), etc.
- the machine learning computing system 950 can include one or more processors 952 and a memory 954 .
- the one or more processors 952 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected.
- the memory 954 can include one or more non-transitory computer-readable storage media, such as RAM, ROM, EEPROM, EPROM, one or more memory devices, flash memory devices, etc., and combinations thereof.
- the memory 954 can also store computer-readable instructions 958 that can be executed by the one or more processors 952 .
- the instructions 958 can be software written in any suitable programming language or can be implemented in hardware. Additionally, or alternatively, the instructions 958 can be executed in logically and/or virtually separate threads on processor(s) 952 .
- the memory 954 can store the instructions 958 that when executed by the one or more processors 952 cause the one or more processors 952 to perform operations.
- the machine learning computing system 950 can include a communication interface 960 , including devices and/or functions similar to that described with respect to the computing system 925 .
- the machine learning computing system 950 can include one or more machine-learned models 962 .
- the machine-learned models 962 can be or can otherwise include various machine-learned models such as, for example, neural networks (e.g., deep neural networks), support vector machines, decision trees, ensemble models, k-nearest neighbors models, Bayesian networks, or other types of models including linear models and/or non-linear models.
- Example neural networks include feed-forward neural networks (e.g., convolutional neural networks), recurrent neural networks (e.g., long short-term memory recurrent neural networks, etc.), and/or other forms of neural networks.
- the machine-learned models 962 can be similar to and/or the same as the machine-learned models 914 , and/or other model(s) described herein.
- the machine learning computing system 950 can communicate with the computing system 925 according to a client-server relationship.
- the machine learning computing system 950 can implement the machine-learned models 962 to provide a web service to the computing system 925 (e.g., including on a vehicle, implemented as a system remote from the vehicle, etc.).
- the web service can provide machine-learned models to an entity associated with a vehicle such that the entity can implement the machine-learned model.
- machine-learned models 962 can be located and used at the computing system 925 (e.g., on the vehicle, at the operations computing system, etc.) and/or the machine-learned models 962 can be located and used at the machine learning computing system 950 .
- the machine learning computing system 950 and/or the computing system 925 can train the machine-learned models 914 and/or 962 through use of a model trainer 964 .
- the model trainer 964 can train the machine-learned models 914 and/or 962 using one or more training or learning algorithms.
- One example training technique is backwards propagation of errors.
- the model trainer 964 can perform supervised training techniques using a set of labeled training data.
- the model trainer 964 can perform unsupervised training techniques using a set of unlabeled training data.
- the model trainer 964 can perform a number of generalization techniques to improve the generalization capability of the models being trained. Generalization techniques include weight decays, dropouts, or other techniques.
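For instance, weight decay can be folded directly into a gradient step; the following minimal sketch (plain SGD, illustrative hyperparameters) shows the mechanism:

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.01):
    """One SGD update with weight decay, a generalization technique named
    above: each weight is pulled toward zero in proportion to its own
    magnitude, in addition to following the loss gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]
```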
- the machine-learned model(s) 914 / 962 of the described system can be trained in several stages.
- a sensor backbone (e.g., LIDAR backbone for helping to create the intermediate environmental representation) and output headers can be pre-trained on a real-LiDAR single-vehicle dataset, bypassing the cross-vehicle aggregation stage.
- the loss function can be cross-entropy on a vehicle classification output and smooth l1 on the bounding box parameters.
- Hard-negative mining can be applied as well to improve performance.
- the sensor backbone (e.g., LIDAR backbone), cross-vehicle aggregation, and output header models can be jointly fine-tuned on a simulated vehicle-to-vehicle dataset with synchronized inputs (e.g., no time delay) using the same loss function.
- the training can avoid the use of a temporal warping function at this stage.
- the number of connected vehicles can be randomly sampled uniformly on [0,min(c, 6)], where c is the number of candidate systems/vehicles available. This can help ensure that the architecture of the machine-learned model(s) 914 / 962 can handle arbitrary graph connectivity while also making sure the fraction of systems/vehicles in the scene/ecosystem on the network remains reasonable.
- the model trainer 964 can train a machine-learned model 914 and/or 962 based on a set of training data 966 .
- the training data 966 can include a simulated vehicle-to-vehicle dataset that can be created using a sensor simulation system.
- the sensor simulation system can be included in or separate from machine-learning computing system 950 .
- the sensor simulation system can use a large catalog of 3D static scenes and dynamic objects that are built upon real-world data collections to provide a set of rich and diverse assets from which to simulate new scenarios.
- the sensor simulation system can apply raycasting and machine learning to generate a realistic sensor point cloud such as, for example, a LIDAR point cloud.
- the sensor simulation system can allow for the creation of vehicle-to-vehicle scenes where a percentage of the vehicles are autonomous vehicles and generate realistic sensor data (e.g., LIDAR data) at different vehicle locations.
- the simulations can be based on snippets (e.g., 25-second snippets, etc.) of labeled data recorded by a vehicle (e.g., autonomous vehicle) in the real world, which contains temporal tracks of the bounding boxes of all agents in the scene with respect to the recording vehicle.
- a real-world snippet can be recreated in a simulated virtual world using these ground-truth tracks.
- the simulation system can replicate realistic traffic and vehicle motion for vehicle-to-vehicle dataset generation.
- actor 3D-assets can be placed into the virtual scene according to real-world labels and used to generate the sensor data (e.g., LIDAR point cloud data, etc.) at different candidate vehicle locations.
- the training data 966 can be taken from the same vehicle as that which utilizes the model(s) 914 / 962 . In this way, the models 914 / 962 can be trained to determine outputs in a manner that is tailored to that particular system/vehicle. Additionally, or alternatively, the training data 966 can be taken from one or more different vehicles than that which is utilizing that model 914 / 962 .
- the model trainer 964 can be implemented in hardware, firmware, and/or software controlling one or more processors.
- the network(s) 975 can be any type of network or combination of networks that allows for communication between devices.
- the network(s) 975 can include one or more of a local area network, wide area network, the Internet, secure network, cellular network, mesh network, peer-to-peer communication link and/or some combination thereof and can include any number of wired or wireless links.
- Communication over the network(s) 975 can be accomplished, for instance, via a network interface using any type of protocol, protection scheme, encoding, format, packaging, etc.
- FIG. 9 illustrates one example system 900 that can be used to implement the present disclosure.
- the computing system 925 can include the model trainer 964 and the training dataset 966 .
- the machine-learned models 914 / 962 can be both trained and used locally at the computing system 925 (e.g., at a vehicle).
- Computing tasks discussed herein as being performed at computing device(s) remote from the vehicle can instead be performed at the vehicle (e.g., via the vehicle computing system), or vice versa. Such configurations can be implemented without deviating from the scope of the present disclosure.
- the use of computer-based systems allows for a great variety of possible configurations, combinations, and divisions of tasks and functionality between and among components.
- Computer-implemented operations can be performed on a single component or across multiple components.
- Computer-implemented tasks and/or operations can be performed sequentially or in parallel.
- Data and instructions can be stored in a single memory device or across multiple memory devices.
Abstract
Description
1: ξi ← ξ̂i,  i = 1...n
2: wji ← 1,  (i, j) ∈ E
3: for k = 1...num_iters do
4:   ξi, Σi ← argmax_{ξi,Σi} Π_{j∈adj(i)} p(ξ̂ji ∘ ξj)^{wji} · p(ξ̂ij⁻¹ ∘ ξj)^{wij},  i = 1...n
5:   wji ← argmax_{wji} p(wji | ξi, Σi),  (i, j) ∈ E
6: end for
7: return ξi,  i = 1...n
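The listing above alternates between re-estimating absolute poses from weighted relative measurements (line 4) and re-weighting edges by their consistency (line 5). A deliberately simplified sketch can make the alternation concrete: it uses scalar 1-D "poses" and a reciprocal-residual weight in place of the Student-t EM step, so the names and the weighting rule are illustrative, not the patented method:

```python
def refine_poses(init, rel, num_iters=10):
    """Alternating pose/weight refinement for 1-D poses with relative
    measurements rel[(i, j)] ≈ pose_j - pose_i. The pose update is a
    weighted average over incident edges (cf. line 4); the weight update
    down-weights inconsistent edges (cf. line 5)."""
    pose = dict(init)                  # line 1: initialize from noisy estimates
    w = {e: 1.0 for e in rel}          # line 2: all edge weights start at 1
    for _ in range(num_iters):         # line 3
        for i in pose:                 # line 4: re-estimate each pose
            num, den = 0.0, 0.0
            for (a, b), m in rel.items():
                if b == i:
                    num += w[(a, b)] * (pose[a] + m); den += w[(a, b)]
                elif a == i:
                    num += w[(a, b)] * (pose[b] - m); den += w[(a, b)]
            if den:
                pose[i] = num / den
        for (a, b), m in rel.items():  # line 5: down-weight inconsistent edges
            resid = abs((pose[b] - pose[a]) - m)
            w[(a, b)] = 1.0 / (1.0 + resid * resid)
    return pose                        # line 7
```

With consistent relative measurements, the noisy initial estimates converge to a mutually consistent set of poses within a few iterations.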
The absolute poses can be used to update the relative transformations needed to warp the messages.
m i→k (l) =T(h i (l),ξi→k)·M i→k
where T applies the spatial transformation and resampling of the feature state via bilinear-interpolation, and Mi→k masks out out-of-bound regions after warping. This can be helpful because portions of the transmitter autonomous vehicle features may be further away than the first autonomous vehicle's current range.
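As an assumption-laden illustration of T(·) followed by the mask Mi→k, the sketch below warps a feature grid by an integer shift only and zeroes the out-of-bound region; a faithful implementation would apply the full relative transform with bilinear resampling:

```python
def warp_and_mask(feat, dx, dy):
    """Shift a 2-D feature map by (dx, dy) grid cells and zero out cells
    whose source falls outside the map: the mask over out-of-bound regions
    after warping. Integer shifts stand in for the full SE(2) warp so the
    masking idea stays visible without an interpolation library."""
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:  # inside the sender's range: copy
                out[y][x] = feat[sy][sx]
            # outside: left at zero (masked out-of-bound region)
    return out
```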
h i (l+1)=ConvGRU(h i (l),φ([∀j∈N(i) m j→i (l)]))
where j∈N(i) are the neighboring nodes in the network for node i and φ is the mean operator. After the final iteration, a multilayer perceptron can output the updated intermediate environmental representation:
z j (L)=MLP(h j (L))
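A minimal stand-in for this aggregation step, with the mean operator φ made explicit and a fixed blend in place of the learned ConvGRU update (the gate value is an assumption, not a learned parameter):

```python
def aggregate_messages(h_i, messages):
    """Fuse node state h_i with neighbor messages m_{j→i}: φ is the mean
    over messages; the ConvGRU that merges it with h_i is stood in for by
    a plain convex blend. Vectors are flat lists for illustration."""
    if not messages:
        return list(h_i)  # no neighbors: state passes through unchanged
    mean = [sum(vals) / len(messages) for vals in zip(*messages)]
    gate = 0.5  # stand-in for the learned GRU update gate
    return [(1 - gate) * a + gate * b for a, b in zip(h_i, mean)]
```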
The learned parameter σ ∈ ℝ can allow the model to ignore all incoming messages if needed. In this manner, if all the incoming messages are noisy, the resulting weights can be large after the normalization. The updated intermediate environmental representation can be computed by:
h i =G({a ji m ji}j∈adj(i))
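One way to realize the role of σ, sketched under the assumption that each incoming message carries a scalar confidence score that is normalized by a softmax together with σ (the scores and σ value here are illustrative; in the model they come from the attention network):

```python
import math

def attention_weights(scores, sigma):
    """Normalize per-message confidence scores together with the learned
    scalar σ. Because σ enters the softmax alongside the messages, low
    scores on every incoming message let σ capture most of the mass, so
    the ego features dominate the fused representation."""
    exps = [math.exp(s) for s in scores] + [math.exp(sigma)]
    total = sum(exps)
    return [e / total for e in exps]  # last entry is the ego/σ weight
```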
The function can produce smooth labels to temper the attention model's predictions so that the attention weights are not just 0 or 1. The loss for the joint training task can be defined as follows:
where LCE can be a binary cross entropy loss.
with λ=[λpos,λpos,λrot]. Lsl1 can be the smooth l1 loss.
The maximization step on line 4 can happen simultaneously for all nodes via weighted expectation maximization (EM) for the t distribution. The maximization step on line 5 can be computed using the closed form:
The absolute poses 642, 644 can be used to update the relative transformations needed to warp the messages.
m i→k (l) =T(h i (l),ξi→k)·M i→k
where T applies the spatial transformation and resampling of the feature state via bilinear-interpolation, and Mi→k masks out out-of-bound regions after warping. This can be helpful because portions of the transmitter autonomous vehicle features may be further away than the first autonomous vehicle's 602 current range.
h i (l+1)=ConvGRU(h i (l),φ([∀j∈N(i) m j→i (l)]))
where j∈N(i) are the neighboring nodes in the network for node i and φ is the mean operator. After the final iteration, a multilayer perceptron can output the updated intermediate environmental representation:
z j (L)=MLP(h j (L))
The learned parameter σ ∈ ℝ can allow the model to ignore all incoming messages if needed. In this manner, if all the incoming messages are noisy, the resulting weights can be large after the normalization. The updated intermediate environmental representation can be computed by:
h i =G({a ji m ji}j∈adj(i))
The function can produce smooth labels to temper the attention model's predictions so that the attention weights are not just 0 or 1. The loss for the joint training task can be defined as follows:
where LCE can be a binary cross entropy loss.
with λ=[λpos, λpos, λrot]. Lsl1 can be the smooth l1 loss.
Claims (20)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/150,998 US12127085B2 (en) | 2020-07-29 | 2021-01-15 | Systems and methods for mitigating vehicle pose error across an aggregated feature map |
| US18/886,553 US20250016534A1 (en) | 2020-07-29 | 2024-09-16 | Systems and Methods for Mitigating Vehicle Pose Error Across an Aggregated Feature Map |
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202063058040P | 2020-07-29 | 2020-07-29 | |
| US202063132792P | 2020-12-31 | 2020-12-31 | |
| US17/150,998 US12127085B2 (en) | 2020-07-29 | 2021-01-15 | Systems and methods for mitigating vehicle pose error across an aggregated feature map |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/886,553 Continuation US20250016534A1 (en) | 2020-07-29 | 2024-09-16 | Systems and Methods for Mitigating Vehicle Pose Error Across an Aggregated Feature Map |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20220032970A1 (en) | 2022-02-03 |
| US12127085B2 (en) | 2024-10-22 |
Family
ID=80002570
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/150,998 Active 2042-04-02 US12127085B2 (en) | 2020-07-29 | 2021-01-15 | Systems and methods for mitigating vehicle pose error across an aggregated feature map |
| US18/886,553 Pending US20250016534A1 (en) | 2020-07-29 | 2024-09-16 | Systems and Methods for Mitigating Vehicle Pose Error Across an Aggregated Feature Map |
Family Applications After (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/886,553 Pending US20250016534A1 (en) | 2020-07-29 | 2024-09-16 | Systems and Methods for Mitigating Vehicle Pose Error Across an Aggregated Feature Map |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US12127085B2 (en) |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10795356B2 (en) * | 2017-08-31 | 2020-10-06 | Uatc, Llc | Systems and methods for determining when to release control of an autonomous vehicle |
| US11543534B2 (en) * | 2019-11-22 | 2023-01-03 | Samsung Electronics Co., Ltd. | System and method for three-dimensional object detection |
| US20230331225A1 (en) * | 2020-09-08 | 2023-10-19 | The Regents Of The University Of California | Autonomous vehicle navigation based on inter-vehicle communication |
| US11858514B2 (en) | 2021-03-30 | 2024-01-02 | Zoox, Inc. | Top-down scene discrimination |
| US11810225B2 (en) * | 2021-03-30 | 2023-11-07 | Zoox, Inc. | Top-down scene generation |
| EP4281345A1 (en) * | 2021-05-19 | 2023-11-29 | Huawei Technologies Co., Ltd. | Apparatus and method for assisting an autonomous vehicle and/or a driver of a vehicle |
| US12391255B2 (en) * | 2022-04-18 | 2025-08-19 | Qualcomm Incorporated | Systems and methods of collaborative enhanced sensing |
| US12292290B2 (en) * | 2022-08-09 | 2025-05-06 | Aptiv Technologies AG | Vehicle localization based on pose corrections from remote vehicles in parking garages |
| US12299997B1 (en) * | 2022-09-26 | 2025-05-13 | Zoox, Inc. | Multi-attention machine learning for object detection and classification |
| CN115626158B (en) * | 2022-12-07 | 2023-03-07 | 深圳曦华科技有限公司 | Vehicle steering anti-rubbing method and related device |
| US20240211797A1 (en) * | 2022-12-22 | 2024-06-27 | Zoox, Inc. | Training a variable autoencoder using a diffusion model |
| CN118967714B (en) * | 2024-07-29 | 2025-10-03 | 华中科技大学 | Medical image segmentation model establishment method based on harmonic attention and medical image segmentation method |
| CN120001667B (en) * | 2025-04-18 | 2025-08-22 | 浙江大学湖州研究院 | Intelligent household garbage sorting system and method based on machine vision |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150228077A1 (en) * | 2014-02-08 | 2015-08-13 | Honda Motor Co., Ltd. | System and method for mapping, localization and pose correction |
| DE102018105293A1 (en) * | 2017-03-08 | 2018-09-20 | GM Global Technology Operations LLC | METHOD AND DEVICE FOR A NETWORKED SCENE REPRESENTATION AND EXPANSION IN VEHICLE ENVIRONMENTS IN AUTONOMOUS DRIVING SYSTEMS |
| US10282861B2 (en) * | 2016-12-12 | 2019-05-07 | Here Global B.V. | Pose error estimation and localization using static features |
| US20190362157A1 (en) * | 2018-05-25 | 2019-11-28 | Vangogh Imaging, Inc. | Keyframe-based object scanning and tracking |
| US20200217972A1 (en) * | 2019-01-07 | 2020-07-09 | Qualcomm Incorporated | Vehicle pose estimation and pose error correction |
- 2021-01-15: US application 17/150,998, granted as US12127085B2, status Active
- 2024-09-16: US application 18/886,553, published as US20250016534A1, status Pending
Non-Patent Citations (41)
| Title |
|---|
| Agrawal et al., "Learning to See by Moving", International Conference on Computer Vision, Dec. 11, 2015-Dec. 18, 2015, Santiago, Chile, pp. 37-45. |
| Arrigoni et al., "Robust synchronization in SO(3) and SE(3) via low-rank and sparse matrix decomposition", Computer Vision and Image Understanding, vol. 174, 2018, pp. 95-113. |
| Arrigoni et al., "Spectral Synchronization of Multiple Views in SE(3)", SIAM Journal of Imaging Sciences, vol. 9, No. 4, Nov. 2016, pp. 1963-1990. |
| Balachandar et al., "Collaboration of AI Agents via Cooperative Multi-Agent Deep Reinforcement Learning", arXiv:1907.00327v1, Jun. 30, 2019, 9 pages. |
| Bernard et al., "A Solution for Multi-Alignment by Transformation Synchronisation", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 7-12, 2015, Boston, MA, pp. 2161-2169. |
| Besag, "On the Statistical Analysis of Dirty Pictures", Journal of the Royal Statistical Society Series B (Methodological), vol. 48, No. 3, 1986, pp. 259-279. |
| Birdal et al., "Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC", Conference on Neural Information Processing Systems, Dec. 3-8, 2018, Montreal, Canada, 12 pages. |
| Chen et al., "Cooper: Cooperative Perception for Connected Autonomous Vehicles based on 3D Point Clouds", International Conference on Distributed Computing Systems, Jul. 7-10, 2019, Dallas, TX, pp. 514-524. |
| Choi et al., "A Large Dataset of Object Scans", arXiv:1602.02481v3, May 5, 2016, 7 pages. |
| Dornhege et al., "Visual Odometry for Tracked Vehicles", 2008 IEEE International Workshop on Safety, Security & Rescue Robotics (SSRR), Oct. 21-24, 2008, Sendai, Japan, 6 pages. |
| Fitzgibbon, "Robust registration of 2D and 3D point sets", Image and Vision Computing, vol. 21, 2003, pp. 1145-1153. |
| Gojcic et al., "Learning Multiview 3D point cloud registration", Conference on Computer Vision and Pattern Recognition, Jun. 14-19, 2020, Virtual, pp. 1759-1769. |
| Huang et al., "Learning Transformation Synchronization", Conference on Computer Vision and Pattern Recognition, Jun. 16-20, 2019, Long Beach, CA, pp. 8082-8091. |
| Kaess et al., "Flow Separation for Fast and Robust Stereo Odometry", IEEE International Conference on Robotics and Automation, May 12-17, 2009, Kobe, Japan, pp. 3539-3544. |
| Kingma et al., "ADAM: A Method for Stochastic Optimization", arXiv:1412.6980v9, Jan. 30, 2017, 15 pages. |
| Li et al., "Gated Graph Sequence Neural Networks", arXiv:1511.05493v4, Sep. 22, 2017, 20 pages. |
| Liang et al., "Deep Continuous Fusion for Multi-Sensor 3D Object Detection", European Conference on Computer Vision, Sep. 8-14, 2018, Munich, Germany, 16 pages. |
| Liu et al., "ML Estimation of the t Distribution Using EM and its Extensions, ECM and ECME", Statistica Sinica, vol. 5, 1995, pp. 19-39. |
| Lu et al., "DeepICP: An End-to-End Deep Neural Network for 3D Point Cloud Registration", arXiv:1905.04153v2, Sep. 16, 2019, 10 pages. |
| Luo et al., "Efficient Deep Learning for Stereo Matching", Conference on Computer Vision and Pattern Recognition, Jun. 26-Jul. 1, 2016, Las Vegas, NV, pp. 5695-5703. |
| Manivasagam et al., "LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World", Conference on Computer Vision and Pattern Recognition, Jun. 14-19, 2020, Virtual, pp. 11167-11176. |
| Matthies et al., "Error Modeling in Stereo Navigation", Autonomous Robot Vehicles, Springer, 1990, 12 pages. |
| Mohanty et al., "DeepVO: A Deep Learning approach for Monocular Visual Odometry", arXiv:1611.06069v1, Nov. 18, 2016, 9 pages. |
| Obst et al., "Multi-Sensor Data Fusion for Checking Plausibility of V2V Communications by Vision-based Multiple-Object Tracking", IEEE Vehicular Networking Conference, Dec. 3-5, 2014, Paderborn, Germany, pp. 143-150. |
| Omidshafiei et al., "Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability", International Conference on Machine Learning, Aug. 6-11, 2017, Sydney, Australia, 10 pages. |
| Pomerleau et al., "A Review of Point Cloud Registration Algorithms for Mobile Robotics", Foundations and Trends in Robotics, vol. 4, No. 1, 2013, 107 pages. |
| Purkait et al., "NeuRoRA: Neural Robust Rotation Averaging", arXiv:1912.04485v1, Dec. 10, 2019, 10 pages. |
| Rauch et al., "Car2X-Based Perception in a High-Level Fusion Architecture for Cooperative Perception Systems", Intelligent Vehicles Symposium, Jun. 3-7, 2012, Alcala de Henares, Spain, pp. 270-275. |
| Rawashdeh et al., "Collaborative Automated Driving: A Machine Learning-Based Method to Enhance the Accuracy of Shared Information", International Conference on Intelligent Transportation Systems (ITSC), Nov. 4-7, 2018, Maui, Hawaii, pp. 3961-3966. |
| Rockl et al., "V2V Communications in Automotive Multi-sensor Multi-target Tracking", 68th Vehicular Technology Conference, Sep. 21-24, 2008, Calgary, Canada, 5 pages. |
| Rosen et al., "A Certifiably Correct Algorithm for Synchronization over the Special Euclidean Group", arXiv:1611.00128v3, Feb. 10, 2017, 16 pages. |
| Singer, "Angular synchronization by eigenvectors and semidefinite programming", Applied and Computational Harmonic Analysis, vol. 30, 2011, pp. 20-36. |
| Smith et al., "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates", arXiv:1708.07120v3, May 17, 2018, 18 pages. |
| Sukhbaatar et al., "Learning Multiagent Communication with Backpropagation", Conference on Neural Information Processing Systems, Dec. 5-10, 2016, Barcelona, Spain, 9 pages. |
| Talukder et al., "Real-time detection of moving objects in a dynamic scene from moving robotic vehicles", International Conference on Intelligent Robots and Systems, Oct. 27-Nov. 1, 2003, Las Vegas, NV, pp. 1308-1313. |
| Wang et al., "DeepVO: Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks", IEEE International Conference on Robotics and Automation (ICRA), May 29-Jun. 3, 2017, Singapore, pp. 2043-2050. |
| Wang et al., "Robust Probabilistic Modeling with Bayesian Data Reweighting", International Conference on Machine Learning, Aug. 6-11, 2017, Sydney, Australia, 10 pages. |
| Wang et al., "V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction", European Conference on Computer Vision, Aug. 23-28, 2020, Virtual, 17 pages. |
| Yang et al., "A Polynomial-time Solution for Robust Registration with Extreme Outlier Rates", Robotics: Science and Systems, Jun. 22-26, 2019, Freiburg im Breisgau, Germany, 10 pages. |
| Yew et al., "3DFeat-Net: Weakly Supervised Local 3D Features for Point Cloud Registration", European Conference on Computer Vision, Sep. 8-14, 2018, Munich, Germany, 17 pages. |
| Yousif et al., "An Overview to Visual Odometry and Visual SLAM: Applications to Mobile Robotics", Intelligent Industrial Systems, vol. 1, No. 4, 2015, pp. 289-311. |
Also Published As
| Publication number | Publication date |
|---|---|
| US20220032970A1 (en) | 2022-02-03 |
| US20250016534A1 (en) | 2025-01-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12127085B2 (en) | Systems and methods for mitigating vehicle pose error across an aggregated feature map | |
| US12387475B2 (en) | Systems and methods for latent distribution modeling for scene-consistent motion forecasting | |
| US12103554B2 (en) | Systems and methods for autonomous vehicle systems simulation | |
| US11685403B2 (en) | Systems and methods for vehicle-to-vehicle communications for improved autonomous vehicle operations | |
| US20250139980A1 (en) | Systems and Methods for Training Machine-Learned Models with Deviating Intermediate Representations | |
| US11960290B2 (en) | Systems and methods for end-to-end trajectory prediction using radar, LIDAR, and maps | |
| US12282328B2 (en) | Systems and methods for using attention masks to improve motion planning | |
| US11836585B2 (en) | Systems and methods for training probabilistic object motion prediction models using non-differentiable prior knowledge | |
| US11762094B2 (en) | Systems and methods for object detection and motion prediction by fusing multiple sensor sweeps into a range view representation | |
| US12013457B2 (en) | Systems and methods for integrating radar data for improved object detection in autonomous vehicles | |
| US11562490B2 (en) | Systems and methods for video object segmentation | |
| US12344279B2 (en) | Systems and methods for motion forecasting and planning for autonomous vehicles | |
| US11745759B2 (en) | Systems and methods for selective autonomous vehicle ridership and control | |
| US12110042B1 (en) | Systems and methods for generating physically realistic trajectories | |
| US12191888B2 (en) | Systems and methods for compressing and storing sensor data collected by an autonomous vehicle | |
| US12450476B2 (en) | Compression of machine-learned models by vector quantization |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| AS | Assignment |
Owner name: UBER TECHNOLOGIES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:VADIVELU, NICHOLAS BASKAR;REN, MENGYE;WANG, JINGKANG;REEL/FRAME:055248/0138 Effective date: 20210119 |
|
| AS | Assignment |
Owner name: UATC, LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:UBER TECHNOLOGIES, INC.;REEL/FRAME:055267/0625 Effective date: 20201119 |
|
| AS | Assignment |
Owner name: UBER TECHNOLOGIES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:TU, XUANYUAN;REEL/FRAME:055703/0732 Effective date: 20191127 |
|
| AS | Assignment |
Owner name: UBER TECHNOLOGIES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:URTASUN SOTIL, RAQUEL;REEL/FRAME:056969/0695 Effective date: 20170413 Owner name: UBER TECHNOLOGIES, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNOR:URTASUN SOTIL, RAQUEL;REEL/FRAME:056969/0695 Effective date: 20170413 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| ZAAA | Notice of allowance and fees due |
Free format text: ORIGINAL CODE: NOA |
|
| AS | Assignment |
Owner name: AURORA OPERATIONS, INC., PENNSYLVANIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:UATC, LLC;REEL/FRAME:067733/0001 Effective date: 20240321 Owner name: AURORA OPERATIONS, INC., PENNSYLVANIA Free format text: ASSIGNMENT OF ASSIGNOR'S INTEREST;ASSIGNOR:UATC, LLC;REEL/FRAME:067733/0001 Effective date: 20240321 |
|
| ZAAB | Notice of allowance mailed |
Free format text: ORIGINAL CODE: MN/=. |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |