US20190064841A1 - Cross traffic detection using cameras - Google Patents
- Publication number
- US20190064841A1 (application number US15/690,966)
- Authority
- US
- United States
- Prior art keywords
- vehicle
- region
- target object
- camera
- proposal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0231—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
- G05D1/0246—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
- G05D1/0253—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting relative motion information from a plurality of images taken successively, e.g. visual odometry, optical flow
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W40/00—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G06K9/00805—
-
- G06K9/4604—
-
- G06K9/6267—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0088—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/12—Bounding box
Definitions
- the subject disclosure relates to autonomous vehicles and, in particular, to a system and method for detecting cross-traffic conditions using a camera disposed in an autonomous vehicle.
- Autonomous vehicles rely primarily on radar systems and Lidar (light detection and ranging) systems embedded in the vehicle to locate target objects and to determine their parameters of motion.
- Urban intersections, however, are challenging environments for these detection systems. In such environments, target objects are far away, where they appear small and where Lidar does not provide enough resolution for reliable detection.
- In addition, urban environments generate noisy background clutter in which target objects can be occluded or obscured, frustrating radar systems. There is generally insufficient context information to distinguish target objects from background signals, making it difficult to classify target objects.
- Additionally, radar systems are poor at determining velocity for objects moving across a field of view of the vehicle, such as in cross-traffic situations. Accordingly, it is desirable to provide a system for an autonomous vehicle to classify target objects in an urban environment.
- a method of driving of an autonomous vehicle includes obtaining an image of a surrounding region of the vehicle using a camera, wherein the image includes at least a portion of a target object, selecting a context region within the image, wherein the context region includes a detection region, estimating a confidence level indicative of a presence of at least a portion of the target object in the detection region and a bounding box associated with the target object, determining a proposal region from the bounding box when the confidence level is greater than a selected threshold, determining a parameter of the target object within the proposal region, and altering a driving parameter of motion of the vehicle based on the parameter of the target object.
- Feature maps of the target object are extracted from the context region, and the confidence level and the bounding box are determined from the extracted feature map.
- a plurality of proposal regions for the target object are determined for a plurality of context regions, the plurality of proposal regions are merged to form a merged region, and the proposal region is determined for the merged region.
- the target object is classified within the merged region.
- the camera includes a camera at a front end of the vehicle, a camera at a rear end of the vehicle, or a camera at a front end of the vehicle and a camera at a rear end of the vehicle. Movement of a plurality of temporally spaced bounding boxes is tracked in order to determine a movement of the target object. A velocity of the target object across a line of sight of the vehicle is determined.
- a system for driving an autonomous vehicle includes a camera configured to obtain an image of a surrounding region of the vehicle, an actuation device for controlling a parameter of motion of the vehicle, and a processor.
- the processor is configured to select a context region within the image, the context region including a detection region therein, estimate a confidence level indicative of the presence of at least a portion of the target object in the detection region and a bounding box associated with the target object, determine a proposal region from the bounding box when the confidence level is greater than a selected threshold, determine a parameter of the target object within the proposal region, and control the actuation device to alter a parameter of motion of the vehicle based on the parameter of the target object.
- the processor is further configured to extract a feature map of the target object from the context region and determine the confidence level and the bounding box from the extracted feature map.
- the processor is further configured to determine a plurality of proposal regions for the target object from a plurality of context regions, merge the plurality of proposal regions to form a merged region, and determine the bounding box for the merged region.
- the processor is further configured to classify the target object in the merged region.
- the camera includes a camera at a front end of the vehicle, a camera at a rear end of the vehicle, or a camera at a front end of the vehicle and a camera at a rear end of the vehicle.
- the processor is further configured to track movement of a plurality of temporally spaced bounding boxes to determine a movement of the target object.
- the processor is further configured to determine a velocity of the target object across a line of sight of the vehicle.
- in yet another exemplary embodiment, a vehicle includes a camera configured to obtain an image of a surrounding region of the vehicle, an actuation device for controlling a parameter of motion of the vehicle, and a processor.
- the processor is configured to select a context region within the image, the context region including a detection region therein, estimate a confidence level indicative of the presence of at least a portion of the target object in the detection region and a bounding box associated with the target object, determine a proposal region from the bounding box when the confidence level is greater than a selected threshold, determine a parameter of the target object within the proposal region, and control the actuation device to alter the parameter of motion of the vehicle based on the parameter of the target object.
- the processor is further configured to extract a feature map of the target object from the context region and determine the confidence level and the bounding box from the extracted feature map.
- the processor is further configured to determine a plurality of proposal regions for the target object from a plurality of context regions, merge the plurality of proposal regions to form a merged region and determine the bounding box for the merged region.
- the processor is further configured to classify the target object in the merged region.
- the camera includes a camera at a front end of the vehicle, a camera at a rear end of the vehicle, or a camera at a front end of the vehicle and a camera at a rear end of the vehicle.
- the processor is further configured to track movement of a plurality of temporally spaced bounding boxes to determine a movement of the target object across a line of sight of the vehicle.
- FIG. 1 shows a plan view of an autonomous vehicle in an environment with a cross-traffic condition
- FIG. 2 shows an image taken from a forward camera of the autonomous vehicle of FIG. 1 ;
- FIG. 3 illustrates a schematic diagram of a system for determining a target object from digital camera images
- FIG. 4 shows the image of FIG. 2 with superimposed detection regions determined by a neural network
- FIG. 5 shows a plurality of proposal regions of interest corresponding to the plurality of detection regions in FIG. 4 ;
- FIG. 6 shows bounding boxes corresponding to the regions of interest of FIG. 5 ;
- FIG. 7 shows a diagram illustrating a process for selecting image patches for training a neural network
- FIG. 8 illustrates a method by which the neural network learns to determine a proposal region for a target vehicle
- FIG. 9 illustrates a method by which the neural network learns to create a bounding box for the target object within a proposal region of interest.
- FIG. 1 shows a plan view 100 of an autonomous vehicle 102 in an environment with a cross-traffic condition.
- the plan view 100 further shows target objects 120 and 122 moving across a line-of-sight of the vehicle 102.
- the vehicle 102 includes one or more cameras 104 a , 104 b and 104 c for obtaining images of a surrounding area or region of the vehicle 102 .
- the cameras 104 a , 104 b and 104 c can be digital cameras that record an image in an array of pixels.
- a forward looking camera 104 a disposed on the vehicle 102 obtains a view of a forward region 106 a in front of the vehicle.
- Left-side camera 104 b obtains a view of a side region 106 b to the left of the vehicle 102 .
- Right-side camera 104 c obtains a view of a side region 106 c to the right of the vehicle 102 .
- forward region 106 a is a sector extending over an angle of about 120 degrees
- side regions 106 b and 106 c are sectors extending over an angle of about 60 degrees.
- the cameras 104 a , 104 b and 104 c can be arranged so that regions 106 a , 106 b and 106 c are aligned to provide a 240-degree viewing angle for the vehicle 102 .
- the cameras 104 a , 104 b and 104 c are able to obtain images up to a selected distance d.
- a similar setup can be used to obtain a rear view.
- cameras similar to cameras 104 a , 104 b and 104 c are installed at the rear side of the vehicle 102 .
- Rear view cameras can be useful when the vehicle 102 is moving in reverse and there is a need to know if there is cross traffic behind the vehicle 102 .
- the vehicle 102 further includes one or more internal state sensors 110 for measuring parameters of the internal operations of the vehicle 102 .
- an internal state sensor 110 may include a brake sensor, acceleration sensor, a steering wheel sensor, or other sensor that measures a parameter of motion of the vehicle 102 , such as propulsion, braking, steering, etc.
- the vehicle 102 further includes various actuation devices 112 for operating the vehicle 102 , such as by controlling one or more parameters of motion of the vehicle 102 such as speed, orientation, steering angle, etc.
- exemplary actuation devices 112 can include, but are not limited to, braking actuators, propulsion or throttle actuators, steering actuators, etc. These actuation devices 112 cause the vehicle 102 to speed up, slow down, steer, stop, change gear, change direction, etc.
- the measurements from the internal state sensors 110 along with the operation of the actuation devices 112 can be used to enable autonomous driving of the vehicle 102 .
- the vehicle 102 further includes a communication module 114 that provides a communication link to a remote location or remote processor in order to transmit and receive data.
- the vehicle 102 further includes a processor 116 that performs methods of vehicle navigation and of detection of target objects.
- the processor 116 receives one or more images from the cameras 104 a , 104 b and 104 c , locates and classifies a target object within the one or more images and determines parameters of motion of the target object.
- the parameters may include a location, angular location, velocity of the target object, for example.
- the processor 116 can predict an outcome of driving the vehicle 102 based on the parameters of the target object and the internal state parameters of the vehicle 102 and can calculate and implement an updated internal state for providing a different outcome.
- the processor 116 can determine that based on the location and velocity of the target object and based on the direction and velocity of vehicle 102 an impact with the target object is imminent. The processor 116 can then send a control signal to the actuation devices 112 in order to change a parameter of motion such as a speed or direction of the vehicle 102 in order to avoid impact with the target object. The processor 116 further performs methods for detecting and tracking target objects as disclosed herein.
- FIG. 2 shows an image 200 taken from a forward camera 104 a of the vehicle 102 of FIG. 1 .
- the image 200 shows a view of a forward region of vehicle 102 .
- the vehicle 102 is at an intersection.
- target objects, also referred to herein as target vehicles 210 , 212 , 214 , and 216 , are proceeding into the intersection as part of cross-traffic.
- Another target vehicle 218 is in a lane next to vehicle 102 . From the image 200 , it is clear that some target vehicles (e.g., target vehicle 212 ) can blend into the environment, while another target vehicle (e.g., target vehicle 216 ) can be partially obscured by other features in the environment.
- the image 200 is provided as an example in order to illustrate the method for detecting target objects disclosed herein.
- the image 200 is a pixelated image obtained from a digital camera.
- the processor 116 can define a context region 202 over a section of the image 200.
- the context region 202 is a proper subset of the image 200 .
- the context region 202 is a 100 pixel by 100 pixel region of the image.
- the context region 202 can be any selected number of pixels.
- the context region 202 is sent to an image processor 116 which detects a target object or at least a part of a target object within a detection region 204 within the context region 202 .
- the detection region 204 is an area inside the context region 202 .
- a processor examines the detection region 204 to determine whether a target vehicle or part of a target vehicle can be located within the detection region 204 .
- the target vehicle 210 is inside the context region 202 but is not inside the detection region 204 . Therefore, the image processor does not generate a positive detection for the target vehicle 210 .
- the detection region 204 is located at the center of the context region and the size of the detection region is 10×10 pixels.
- the location of the detection region 204 can be anywhere inside the context region 202 and can have a size that is any selected number of pixels, as long as the detection region 204 remains within the context region 202 .
- the image processor can be a neural network, such as a convolutional neural network, and generates from the context region 202 a probability that the target object or at least a part of the target object lies within the detection region 204.
- the processor 116 also determines a bounding box for the target object from the proposal region of interest or from a plurality of regions of interest, as discussed herein.
- FIG. 3 illustrates a schematic diagram 300 of a system for determining a target object from digital camera images.
- the processor 116 receives the image 200 from the camera and scans a context region 202 through the image 200 starting from the top left corner of the image 200 .
- the context region 202 scans from left to right and from top to bottom of the image 200 in a raster pattern until the entire image is processed.
- a convolutional neural network 302 extracts a set of feature maps 304 from the context region 202 and feeds the feature maps 304 into a proposal network 306.
- the proposal network 306 generates two outputs.
- the first output is a confidence score which indicates how likely a target vehicle or a portion of a target vehicle is to be found inside the detection region 204 of the context region 202 .
- the second output is a location, width and height of a bounding box for the potential target vehicle.
- the proposal regions predicted by the proposal network 306 and the feature maps 304 generated by the convolutional network 302 are combined at a pooling module 308.
- Proposal regions provided by the proposal network 306 generally have feature vectors of different sizes or lengths.
- the pooling module 308 generates a feature vector of a fixed or same length for each proposal region, since the classifier 310 takes in standardized inputs.
- the pooling module 308 standardizes the dimension of the feature vectors from each proposal region. Since the feature maps 304 of the whole image have already been extracted in the first processing path, it is not necessary to extract the feature maps 304 again. Therefore, the processor only has to find the location of the features for a selected proposal region. Because extracting features is in general very time consuming, reusing the extracted feature maps 304 saves time and computing power.
- the classifier 310 classifies the object as a target object, i.e., as a target vehicle, pedestrian, cyclist, etc.
- a bounding box is also predicted for the target object.
- the bounding box has a width and height substantially equal to those of the target object within the image. Distance to the target object can be determined as well.
- FIG. 4 shows an image 400 that includes detection regions determined by the neural network 302 superimposed on the image 200 of FIG. 2 . Detection regions are shown in which the confidence level is greater than a selected threshold. These detection regions form clusters 410 , 412 , 414 , 416 and 418 at the location of the respective target vehicles 210 , 212 , 214 , 216 and 218 of FIG. 2 , respectively. For each cluster, a region of interest can be generated, wherein the region of interest is a proposed region in which to locate the target vehicle.
- FIG. 5 shows an image 500 that includes a plurality of proposal regions of interest 510 , 512 , 514 , 516 and 518 corresponding to the plurality of detection regions 410 , 412 , 414 , 416 and 418 in FIG. 4 , respectively.
- the proposal regions of interest 510 , 512 , 514 , 516 and 518 can be determined so as to encompass the target vehicle as determined by the neural network 302.
- FIG. 6 shows an image 600 that includes bounding boxes 610 , 612 , 614 , 616 and 618 corresponding to the regions of interest 510 , 512 , 514 , 516 and 518 of FIG. 5 , respectively.
- a bounding box is drawn around each target vehicle.
- the bounding box indicates a region taken up by the target object.
- the autonomous vehicle 102 tracks motion of the bounding box in order to know the location of the target object. Once the bounding box is determined, its motion can be tracked as the associated target object moves within the field of view of the vehicle 102 .
- the target object can be classified and various parameters of motion of the target object, such as distance, azimuthal location, velocity, etc., can be determined.
- the processor can track the bounding box across a plurality of temporally separated images in order to track a motion of the target object across a line-of-sight of the vehicle 102.
- FIGS. 7-9 illustrate a method of training a neural network for target vehicle detection.
- FIG. 7 shows a diagram 700 illustrating a process for selecting image patches for training a neural network.
- the diagram 700 includes the image 200 of FIG. 2 and several patches extracted from the image 200 for training purposes.
- Region 702 within image 200 is a region selected to include a target vehicle.
- the region 702 has a length and height that is substantially equal to the length and height of the selected target vehicle.
- the selected region 702 is referred to as a ground truth positive region.
- a randomly selected set of patches that include at least a portion of the ground truth positive region 702 are selected, as shown in the image patches of group 704 .
- a number of background or negative image patches are selected, which do not include any part of the target vehicle. These negative image patches are shown in group 706 .
- the positive image patches 704 and the ground truth positive region 702 are input into the neural network in order to teach the neural network how to recognize the target vehicle and to predict a bounding box for the target vehicle.
- the negative image patches can be used to teach the neural network to recognize when an image does not include a part of a target vehicle.
- FIG. 8 illustrates a method by which the neural network learns to determine a proposal region for a target vehicle.
- the ground truth image patch 702 is compared to an image bounded by the context region 802 .
- the context region 802 has a height H and a width W and includes only a portion of the target vehicle.
- a detection region 804 is located at a center 806 of the context region 802 .
- the neural network is able to predict a proposal region, i.e. a bounding box 810 that more closely approximates the dimensions of the target vehicle.
- the bounding box 810 has a height hBB, a width wBB and a center 816 that is located on the target vehicle.
- the area of the bounding box 810 can extend outside of the area of the image patch 802 .
- the height hBB, width wBB and center 816 of the bounding box 810 can be determined using a regression analysis.
- FIG. 9 illustrates a method by which the neural network learns to create a bounding box for the target object within a proposal region of interest 900 .
- the proposal region of interest 900 may have an area that is greater than a bounding box of the target vehicle.
- the neural network determines a center 902 of the proposal region of interest 900 and determines a bounding box 910 for the target object having a height hBB, a width wBB and a center 904.
- the height hBB, width wBB and center 904 of the bounding box 910 can be determined using a regression analysis.
- the neural network can be taught how to estimate the distance to the target vehicle directly while determining the dimensions of the bounding box.
- the neural network can determine the distance to the target object using the same regression analysis that is used to determine the width, height and center of the bounding box, wherein the regression analysis now also includes the known distance to the target object.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Automation & Control Theory (AREA)
- Remote Sensing (AREA)
- Radar, Positioning & Navigation (AREA)
- Aviation & Aerospace Engineering (AREA)
- Data Mining & Analysis (AREA)
- Electromagnetism (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Traffic Control Systems (AREA)
- Image Analysis (AREA)
Abstract
Description
- The subject disclosure relates to autonomous vehicles and, in particular, to a system and method for detecting cross-traffic conditions using a camera disposed in an autonomous vehicle.
- Autonomous vehicles rely primarily on radar systems and Lidar (light detection and ranging) systems embedded in the vehicle to locate target objects and to determine their parameters of motion. Urban intersections, however, are challenging environments for these detection systems. In such environments, target objects are far away, where they appear small and where Lidar does not provide enough resolution for reliable detection. In addition, urban environments generate noisy background clutter in which target objects can be occluded or obscured, frustrating radar systems. There is generally insufficient context information to distinguish target objects from background signals, making it difficult to classify target objects. Additionally, radar systems are poor at determining velocity for objects moving across a field of view of the vehicle, such as in cross-traffic situations. Accordingly, it is desirable to provide a system for an autonomous vehicle to classify target objects in an urban environment.
- In one exemplary embodiment, a method of driving of an autonomous vehicle is disclosed. The method includes obtaining an image of a surrounding region of the vehicle using a camera, wherein the image includes at least a portion of a target object, selecting a context region within the image, wherein the context region includes a detection region, estimating a confidence level indicative of a presence of at least a portion of the target object in the detection region and a bounding box associated with the target object, determining a proposal region from the bounding box when the confidence level is greater than a selected threshold, determining a parameter of the target object within the proposal region, and altering a driving parameter of motion of the vehicle based on the parameter of the target object.
- Feature maps of the target object are extracted from the context region, and the confidence level and the bounding box are determined from the extracted feature map. A plurality of proposal regions for the target object are determined for a plurality of context regions, the plurality of proposal regions are merged to form a merged region, and the proposal region is determined for the merged region. The target object is classified within the merged region.
- In various embodiments, the camera includes a camera at a front end of the vehicle, a camera at a rear end of the vehicle, or a camera at a front end of the vehicle and a camera at a rear end of the vehicle. Movement of a plurality of temporally spaced bounding boxes is tracked in order to determine a movement of the target object. A velocity of the target object across a line of sight of the vehicle is determined.
- In another exemplary embodiment, a system for driving an autonomous vehicle is disclosed. The system includes a camera configured to obtain an image of a surrounding region of the vehicle, an actuation device for controlling a parameter of motion of the vehicle, and a processor. The processor is configured to select a context region within the image, the context region including a detection region therein, estimate a confidence level indicative of the presence of at least a portion of the target object in the detection region and a bounding box associated with the target object, determine a proposal region from the bounding box when the confidence level is greater than a selected threshold, determine a parameter of the target object within the proposal region, and control the actuation device to alter a parameter of motion of the vehicle based on the parameter of the target object.
- The processor is further configured to extract a feature map of the target object from the context region and determine the confidence level and the bounding box from the extracted feature map. The processor is further configured to determine a plurality of proposal regions for the target object from a plurality of context regions, merge the plurality of proposal regions to form a merged region, and determine the bounding box for the merged region. The processor is further configured to classify the target object in the merged region.
- In various embodiments, the camera includes a camera at a front end of the vehicle, a camera at a rear end of the vehicle, or a camera at a front end of the vehicle and a camera at a rear end of the vehicle. The processor is further configured to track movement of a plurality of temporally spaced bounding boxes to determine a movement of the target object. The processor is further configured to determine a velocity of the target object across a line of sight of the vehicle.
- In yet another exemplary embodiment, a vehicle is disclosed. The vehicle includes a camera configured to obtain an image of a surrounding region of the vehicle, an actuation device for controlling a parameter of motion of the vehicle, and a processor. The processor is configured to select a context region within the image, the context region including a detection region therein, estimate a confidence level indicative of the presence of at least a portion of the target object in the detection region and a bounding box associated with the target object, determine a proposal region from the bounding box when the confidence level is greater than a selected threshold, determine a parameter of the target object within the proposal region, and control the actuation device to alter the parameter of motion of the vehicle based on the parameter of the target object.
- The processor is further configured to extract a feature map of the target object from the context region and determine the confidence level and the bounding box from the extracted feature map. The processor is further configured to determine a plurality of proposal regions for the target object from a plurality of context regions, merge the plurality of proposal regions to form a merged region and determine the bounding box for the merged region. The processor is further configured to classify the target object in the merged region.
- In various embodiments, the camera includes a camera at a front end of the vehicle, a camera at a rear end of the vehicle, or a camera at a front end of the vehicle and a camera at a rear end of the vehicle. The processor is further configured to track movement of a plurality of temporally spaced bounding boxes to determine a movement of the target object across a line of sight of the vehicle.
- The above features and advantages, and other features and advantages of the disclosure are readily apparent from the following detailed description when taken in connection with the accompanying drawings.
- Other features, advantages and details appear, by way of example only, in the following detailed description, the detailed description referring to the drawings in which:
- FIG. 1 shows a plan view of an autonomous vehicle in an environment with a cross-traffic condition;
- FIG. 2 shows an image taken from a forward camera of the autonomous vehicle of FIG. 1;
- FIG. 3 illustrates a schematic diagram of a system for determining a target object from digital camera images;
- FIG. 4 shows the image of FIG. 2 with superimposed detection regions determined by a neural network;
- FIG. 5 shows a plurality of proposal regions of interest corresponding to the plurality of detection regions in FIG. 4;
- FIG. 6 shows bounding boxes corresponding to the regions of interest of FIG. 5;
- FIG. 7 shows a diagram illustrating a process for selecting image patches for training a neural network;
- FIG. 8 illustrates a method by which the neural network learns to determine a proposal region for a target vehicle; and
- FIG. 9 illustrates a method by which the neural network learns to create a bounding box for the target object within a proposal region of interest.
- The following description is merely exemplary in nature and is not intended to limit the present disclosure, its application or uses. It should be understood that throughout the drawings, corresponding reference numerals indicate like or corresponding parts and features.
- In accordance with an exemplary embodiment of the disclosure, FIG. 1 shows a plan view 100 of an autonomous vehicle 102 in an environment with a cross-traffic condition. The plan view 100 further shows target objects 120 and 122 moving across a line-of-sight of the vehicle 102. The vehicle 102 includes one or more cameras 104 a, 104 b and 104 c for obtaining images of a surrounding area or region of the vehicle 102. The cameras 104 a, 104 b and 104 c can be digital cameras that record an image in an array of pixels. A forward looking camera 104 a disposed on the vehicle 102 obtains a view of a forward region 106 a in front of the vehicle. Left-side camera 104 b obtains a view of a side region 106 b to the left of the vehicle 102. Right-side camera 104 c obtains a view of a side region 106 c to the right of the vehicle 102. In an illustrative embodiment, forward region 106 a is a sector extending over an angle of about 120 degrees, and side regions 106 b and 106 c are sectors extending over an angle of about 60 degrees. The cameras 104 a, 104 b and 104 c can be arranged so that regions 106 a, 106 b and 106 c are aligned to provide a 240-degree viewing angle for the vehicle 102. The cameras 104 a, 104 b and 104 c are able to obtain images up to a selected distance d. A similar setup can be used to obtain a rear view. For rear view detection, cameras similar to cameras 104 a, 104 b and 104 c are installed at the rear side of the vehicle 102. Rear view cameras can be useful when the vehicle 102 is moving in reverse and there is a need to know if there is cross traffic behind the vehicle 102.
- The vehicle 102 further includes one or more internal state sensors 110 for measuring parameters of the internal operations of the vehicle 102. For example, an internal state sensor 110 may include a brake sensor, acceleration sensor, a steering wheel sensor, or other sensor that measures a parameter of motion of the vehicle 102, such as propulsion, braking, steering, etc.
- The vehicle 102 further includes various actuation devices 112 for operating the vehicle 102, such as by controlling one or more parameters of motion of the vehicle 102 such as speed, orientation, steering angle, etc. Exemplary actuation devices 112 can include, but are not limited to, braking actuators, propulsion or throttle actuators, steering actuators, etc. These actuation devices 112 cause the vehicle 102 to speed up, slow down, steer, stop, change gear, change direction, etc. The measurements from the internal state sensors 110 along with the operation of the actuation devices 112 can be used to enable autonomous driving of the vehicle 102. The vehicle 102 further includes a communication module 114 that provides a communication link to a remote location or remote processor in order to transmit and receive data.
- The vehicle 102 further includes a processor 116 that performs methods of vehicle navigation and of detection of target objects. The processor 116 receives one or more images from the cameras 104 a, 104 b and 104 c, locates and classifies a target object within the one or more images and determines parameters of motion of the target object. The parameters may include a location, angular location and velocity of the target object, for example. The processor 116 can predict an outcome of driving the vehicle 102 based on the parameters of the target object and the internal state parameters of the vehicle 102 and can calculate and implement an updated internal state for providing a different outcome. For example, the processor 116 can determine, based on the location and velocity of the target object and on the direction and velocity of the vehicle 102, that an impact with the target object is imminent. The processor 116 can then send a control signal to the actuation devices 112 in order to change a parameter of motion, such as a speed or direction of the vehicle 102, in order to avoid impact with the target object. The processor 116 further performs methods for detecting and tracking target objects as disclosed herein.
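- To make the preceding paragraph concrete, the following is a minimal sketch (not taken from the patent) of how a processor might test whether a detected cross-traffic target is on a conflicting course and decide to command the actuation devices. The coordinate convention, the `TrackedObject` fields, the lane half-width and the vehicle-length margin are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class TrackedObject:
    x: float   # lateral offset from the host vehicle's path, in meters
    y: float   # longitudinal distance ahead of the host vehicle, in meters
    vx: float  # lateral (cross-traffic) velocity, in m/s
    vy: float  # longitudinal velocity relative to the host, in m/s

def crossing_conflict(obj: TrackedObject, host_speed: float,
                      lane_half_width: float = 1.5,
                      horizon_s: float = 3.0) -> bool:
    """Return True if the target is predicted to cross the host's path close
    to where the host will be within the look-ahead horizon."""
    if obj.vx == 0.0:
        # Not crossing: only a conflict if it already sits in the host's lane ahead.
        return abs(obj.x) <= lane_half_width and obj.y / max(host_speed, 0.1) < horizon_s
    t_cross = -obj.x / obj.vx            # time at which the target reaches x == 0
    if t_cross < 0.0 or t_cross > horizon_s:
        return False
    host_travel = host_speed * t_cross   # where the host will be along its path
    target_y = obj.y + obj.vy * t_cross  # where the target will be at that time
    return abs(host_travel - target_y) <= 4.0  # rough vehicle-length margin (assumed)

# Example: a target 20 m ahead and 8 m to the left, crossing to the right at 6 m/s.
target = TrackedObject(x=-8.0, y=20.0, vx=6.0, vy=0.0)
if crossing_conflict(target, host_speed=15.0):
    print("conflict predicted: send brake/steer command to the actuation devices")
```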
- FIG. 2 shows an image 200 taken from a forward camera 104 a of the vehicle 102 of FIG. 1. The image 200 shows a view of a forward region of vehicle 102. The vehicle 102 is at an intersection. Four target objects, also referred to herein as target vehicles 210, 212, 214 and 216, are proceeding into the intersection as part of cross-traffic. Another target vehicle 218 is in a lane next to vehicle 102. From the image 200, it is clear that some target vehicles (e.g., target vehicle 212) can blend into the environment, while another target vehicle (e.g., target vehicle 216) can be partially obscured by other features in the environment. The image 200 is provided as an example in order to illustrate the method for detecting target objects disclosed herein.
- In one embodiment, the image 200 is a pixelated image obtained from a digital camera. The processor 116 can define a context region 202 over a section of the image 200. The context region 202 is a proper subset of the image 200. In an exemplary embodiment, the context region 202 is a 100 pixel by 100 pixel region of the image. However, in other embodiments, the context region 202 can be any selected number of pixels. The context region 202 is sent to an image processor 116 which detects a target object or at least a part of a target object within a detection region 204 within the context region 202. The detection region 204 is an area inside the context region 202. A processor examines the detection region 204 to determine whether a target vehicle or part of a target vehicle can be located within the detection region 204. For the context region 202 shown in FIG. 2, the target vehicle 210 is inside the context region 202 but is not inside the detection region 204. Therefore, the image processor does not generate a positive detection for the target vehicle 210. In an exemplary embodiment, the detection region 204 is located at the center of the context region and the size of the detection region is 10×10 pixels. However, in other embodiments, the location of the detection region 204 can be anywhere inside the context region 202 and can have a size that is any selected number of pixels, as long as the detection region 204 remains within the context region 202. The image processor can be a neural network, such as a convolutional neural network, and generates from the context region 202 a probability that the target object or at least a part of the target object lies within the detection region 204. The processor 116 also determines a bounding box for the target object from the proposal region of interest or from a plurality of regions of interest, as discussed herein.
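- The context-region/detection-region geometry described above can be pictured with a short sketch that slides a 100×100 pixel context region over the image in a raster pattern, each window carrying a 10×10 detection region centered inside it. The scan stride, window sizes and function name below are assumptions for illustration; the patent does not specify a particular stride.

```python
import numpy as np

def iter_context_regions(image: np.ndarray, ctx: int = 100, det: int = 10, stride: int = 10):
    """Yield (context_patch, detection_box) pairs in a left-to-right,
    top-to-bottom raster scan. detection_box is (x0, y0, x1, y1) in
    full-image coordinates, centered inside the context region."""
    h, w = image.shape[:2]
    off = (ctx - det) // 2  # offset of the detection region inside the context region
    for top in range(0, h - ctx + 1, stride):
        for left in range(0, w - ctx + 1, stride):
            patch = image[top:top + ctx, left:left + ctx]
            det_box = (left + off, top + off, left + off + det, top + off + det)
            yield patch, det_box

# Example: count how many context regions a 480x640 frame produces at this stride.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(sum(1 for _ in iter_context_regions(frame)))
```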
- FIG. 3 illustrates a schematic diagram 300 of a system for determining a target object from digital camera images. There are two major processing paths in the diagram 300. In the first path, the processor 116 receives the image 200 from the camera and scans a context region 202 through the image 200, starting from the top left corner of the image 200. The context region 202 scans from left to right and from top to bottom of the image 200 in a raster pattern until the entire image is processed. For each location of the context region 202, a convolutional neural network 302 extracts a set of feature maps 304 from the context region 202 and feeds the feature maps 304 into a proposal network 306. The proposal network 306 generates two outputs. The first output is a confidence score which indicates how likely a target vehicle or a portion of a target vehicle is to be found inside the detection region 204 of the context region 202. The second output is a location, width and height of a bounding box for the potential target vehicle. Once the image has been scanned, detection regions with confidence scores greater than a predefined threshold are selected and a non-maximum suppression (NMS) algorithm is applied to the bounding boxes associated with the selected detection regions. The NMS algorithm outputs a set of proposal regions in the image 200 which are likely to include the target vehicle.
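- The non-maximum suppression step named above is a standard detection technique; a minimal sketch of one common greedy variant is shown below, assuming boxes in (x0, y0, x1, y1) form with per-box confidence scores. The IoU threshold of 0.5 is an assumed value, not one disclosed in the patent.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Intersection-over-union between one box `a` and an array of boxes `b`."""
    x0 = np.maximum(a[0], b[:, 0]); y0 = np.maximum(a[1], b[:, 1])
    x1 = np.minimum(a[2], b[:, 2]); y1 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def non_max_suppression(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5):
    """Keep the highest-scoring box in each cluster of overlapping proposals."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return keep

boxes = np.array([[10, 10, 60, 60], [12, 12, 62, 62], [200, 80, 260, 140]], dtype=float)
print(non_max_suppression(boxes, np.array([0.9, 0.8, 0.7])))  # -> [0, 2]
```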
- In the second processing path, the proposal regions predicted by the proposal network 306 and the feature maps 304 generated by the convolutional network 302 are combined at a pooling module 308. Proposal regions provided by the proposal network 306 generally have feature vectors of different sizes or lengths. The pooling module 308 generates a feature vector of a fixed or same length for each proposal region, since the classifier 310 takes in standardized inputs. In other words, the pooling module 308 standardizes the dimension of the feature vectors from each proposal region. Since the feature maps 304 of the whole image have already been extracted in the first processing path, it is not necessary to extract the feature maps 304 again. Therefore, the processor only has to find the location of the features for a selected proposal region. Because extracting features is in general very time consuming, reusing the extracted feature maps 304 saves time and computing power.
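- The pooling module's role, producing a fixed-length feature vector for proposals of different sizes by reusing the already-computed feature maps, can be sketched as a simple max-pool onto a fixed grid. The channels-first layout and the 7×7 output grid below are assumptions for illustration, not details from the patent.

```python
import numpy as np

def roi_pool(feature_map: np.ndarray, box: tuple, out_size: int = 7) -> np.ndarray:
    """Max-pool the feature-map region under `box` (x0, y0, x1, y1, in
    feature-map coordinates) onto an out_size x out_size grid, then flatten.
    Every proposal therefore yields a vector of identical length."""
    c = feature_map.shape[0]  # channels-first layout: (C, H, W)
    x0, y0, x1, y1 = box
    pooled = np.zeros((c, out_size, out_size), dtype=feature_map.dtype)
    xs = np.linspace(x0, x1, out_size + 1).astype(int)
    ys = np.linspace(y0, y1, out_size + 1).astype(int)
    for i in range(out_size):
        for j in range(out_size):
            cell = feature_map[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                                  xs[j]:max(xs[j + 1], xs[j] + 1)]
            pooled[:, i, j] = cell.max(axis=(1, 2))
    return pooled.reshape(-1)

# Two proposals of different sizes map to feature vectors of the same length.
fmap = np.random.rand(64, 60, 80)
print(roi_pool(fmap, (5, 5, 20, 25)).shape, roi_pool(fmap, (30, 10, 70, 50)).shape)
```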
- Once the regions of interest have been pooled, the classifier 310 classifies the object as a target object, i.e., as a target vehicle, pedestrian, cyclist, etc. A bounding box is also predicted for the target object. The bounding box has a width and height substantially equal to those of the target object within the image. Distance to the target object can be determined as well.
- FIG. 4 shows an image 400 that includes detection regions determined by the neural network 302 superimposed on the image 200 of FIG. 2. Detection regions are shown in which the confidence level is greater than a selected threshold. These detection regions form clusters 410, 412, 414, 416 and 418 at the location of the respective target vehicles 210, 212, 214, 216 and 218 of FIG. 2, respectively. For each cluster, a region of interest can be generated, wherein the region of interest is a proposed region in which to locate the target vehicle.
- FIG. 5 shows an image 500 that includes a plurality of proposal regions of interest 510, 512, 514, 516 and 518 corresponding to the plurality of detection regions 410, 412, 414, 416 and 418 in FIG. 4, respectively. The proposal regions of interest 510, 512, 514, 516 and 518 can be determined so as to encompass the target vehicle as determined by the neural network 302.
- FIG. 6 shows an image 600 that includes bounding boxes 610, 612, 614, 616 and 618 corresponding to the regions of interest 510, 512, 514, 516 and 518 of FIG. 5, respectively. A bounding box is drawn around each target vehicle. The bounding box indicates a region taken up by the target object. The autonomous vehicle 102 tracks motion of the bounding box in order to know the location of the target object. Once the bounding box is determined, its motion can be tracked as the associated target object moves within the field of view of the vehicle 102. Additionally, the target object can be classified and various parameters of motion of the target object, such as distance, azimuthal location, velocity, etc., can be determined. In particular, the processor can track the bounding box across a plurality of temporally separated images in order to track a motion of the target object across a line-of-sight of the vehicle 102.
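- As an illustration of tracking temporally spaced bounding boxes to estimate motion across the line of sight, the sketch below fits the horizontal drift of the box center over time and converts pixels per second to meters per second with a pinhole-camera approximation. The focal length, the assumed range to the target and the least-squares fit are illustrative choices, not the patent's method.

```python
import numpy as np

def lateral_velocity(boxes: list, timestamps: list, distance_m: float,
                     focal_px: float) -> float:
    """Estimate velocity across the line of sight from bounding boxes tracked
    over time. boxes are (x0, y0, x1, y1) in pixels; a pinhole model converts
    the horizontal pixel displacement to meters at the target's range."""
    centers_px = np.array([(b[0] + b[2]) / 2.0 for b in boxes])
    t = np.array(timestamps)
    # Least-squares slope of horizontal center position vs. time, in pixels/second.
    slope_px_s = np.polyfit(t, centers_px, 1)[0]
    return slope_px_s * distance_m / focal_px  # meters per second

# Target at ~25 m, camera focal length ~1000 px, box drifting right each frame.
boxes = [(300, 200, 360, 240), (310, 200, 370, 240), (321, 201, 381, 241)]
print(round(lateral_velocity(boxes, [0.0, 0.1, 0.2], 25.0, 1000.0), 2), "m/s")
```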
- FIGS. 7-9 illustrate a method of training a neural network for target vehicle detection. FIG. 7 shows a diagram 700 illustrating a process for selecting image patches for training a neural network. The diagram 700 includes the image 200 of FIG. 2 and several patches extracted from the image 200 for training purposes. Region 702 within image 200 is a region selected to include a target vehicle. The region 702 has a length and height that is substantially equal to the length and height of the selected target vehicle. The selected region 702 is referred to as a ground truth positive region. A randomly selected set of patches that include at least a portion of the ground truth positive region 702 are selected, as shown in the image patches of group 704. In addition, a number of background or negative image patches are selected, which do not include any part of the target vehicle. These negative image patches are shown in group 706.
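- A hypothetical sketch of the patch-selection step follows: positive patches are random windows that overlap the ground-truth region, and negative patches are random windows that miss it entirely. The patch size, sample counts and rejection-sampling loop are assumptions for illustration.

```python
import random

def sample_training_patches(image_w: int, image_h: int, gt_box: tuple,
                            patch: int = 100, n_pos: int = 8, n_neg: int = 8,
                            seed: int = 0):
    """Randomly pick positive patches that overlap the ground-truth box and
    negative patches that do not. Boxes are (x0, y0, x1, y1) in pixels."""
    rng = random.Random(seed)
    gx0, gy0, gx1, gy1 = gt_box

    def overlaps(b):
        return not (b[2] <= gx0 or b[0] >= gx1 or b[3] <= gy0 or b[1] >= gy1)

    def random_patch():
        x0 = rng.randint(0, image_w - patch)
        y0 = rng.randint(0, image_h - patch)
        return (x0, y0, x0 + patch, y0 + patch)

    positives, negatives = [], []
    while len(positives) < n_pos or len(negatives) < n_neg:
        b = random_patch()
        if overlaps(b) and len(positives) < n_pos:
            positives.append(b)
        elif not overlaps(b) and len(negatives) < n_neg:
            negatives.append(b)
    return positives, negatives

pos, neg = sample_training_patches(640, 480, gt_box=(250, 180, 340, 250))
print(len(pos), "positive patches,", len(neg), "negative patches")
```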
- In one embodiment, the positive image patches 704 and the ground truth positive region 702 are input into the neural network in order to teach the neural network how to recognize the target vehicle and to predict a bounding box for the target vehicle. The negative image patches can be used to teach the neural network to recognize when an image does not include a part of a target vehicle.
- FIG. 8 illustrates a method by which the neural network learns to determine a proposal region for a target vehicle. The ground truth image patch 702 is compared to an image bounded by the context region 802. The context region 802 has a height H and a width W and includes only a portion of the target vehicle. A detection region 804 is located at a center 806 of the context region 802. Upon comparison, the neural network is able to predict a proposal region, i.e., a bounding box 810 that more closely approximates the dimensions of the target vehicle. The bounding box 810 has a height hBB, a width wBB and a center 816 that is located on the target vehicle. The area of the bounding box 810 can extend outside of the area of the image patch 802. The height hBB, width wBB and center 816 of the bounding box 810 can be determined using a regression analysis.
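- One common way to pose the regression described for FIG. 8 is to encode the ground-truth box relative to the context region as normalized center offsets plus log-scaled width and height ratios, which the network then learns to predict. This parameterization is borrowed from standard detection practice and is an assumption, not a detail disclosed in the patent.

```python
import math

def encode_regression_target(context_box: tuple, gt_box: tuple) -> tuple:
    """Encode a ground-truth box (x0, y0, x1, y1) relative to a context
    region of width W and height H: normalized center offsets plus
    log-scaled width and height. The network is trained to regress these."""
    cx0, cy0, cx1, cy1 = context_box
    W, H = cx1 - cx0, cy1 - cy0
    ctx_cx, ctx_cy = cx0 + W / 2.0, cy0 + H / 2.0
    gx0, gy0, gx1, gy1 = gt_box
    w_bb, h_bb = gx1 - gx0, gy1 - gy0
    bb_cx, bb_cy = gx0 + w_bb / 2.0, gy0 + h_bb / 2.0
    tx = (bb_cx - ctx_cx) / W   # horizontal center offset
    ty = (bb_cy - ctx_cy) / H   # vertical center offset
    tw = math.log(w_bb / W)     # log width ratio
    th = math.log(h_bb / H)     # log height ratio
    return tx, ty, tw, th

# Context region 100x100 at (200, 150); ground-truth box extending outside it.
print(encode_regression_target((200, 150, 300, 250), (180, 170, 320, 240)))
```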
- FIG. 9 illustrates a method by which the neural network learns to create a bounding box for the target object within a proposal region of interest 900. The proposal region of interest 900 may have an area that is greater than a bounding box of the target vehicle. Similarly to the learning process using the image patches, the neural network determines a center 902 of the proposal region of interest 900 and determines a bounding box 910 for the target object having a height hBB, a width wBB and a center 904. The height hBB, width wBB and center 904 of the bounding box 910 can be determined using a regression analysis.
- If a distance to the target vehicle is known during the training process, the neural network can be taught how to estimate the distance to the target vehicle directly while determining the dimensions of the bounding box. The neural network can determine the distance to the target object using the same regression analysis that is used to determine the width, height and center of the bounding box, wherein the regression analysis now also includes the known distance to the target object.
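- Building directly on `encode_regression_target` from the sketch above, adding distance to the same regression can be as simple as appending one more target component when the range to the target is known during training (for example from Lidar). The log scaling and the reference distance are assumptions for illustration.

```python
import math

def encode_target_with_distance(context_box: tuple, gt_box: tuple,
                                distance_m: float, ref_distance_m: float = 50.0):
    """Extend the box regression target with a log-scaled distance term so the
    same regression head can be trained to predict range as well.
    Reuses encode_regression_target from the previous sketch."""
    tx, ty, tw, th = encode_regression_target(context_box, gt_box)
    td = math.log(distance_m / ref_distance_m)  # log-scaled range, assumed normalization
    return tx, ty, tw, th, td
```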
- While the above disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from its scope. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the disclosure not be limited to the particular embodiments disclosed, but will include all embodiments falling within the scope of the application.
Claims (20)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/690,966 US10268204B2 (en) | 2017-08-30 | 2017-08-30 | Cross traffic detection using cameras |
| CN201810921246.0A CN109421730B (en) | 2017-08-30 | 2018-08-14 | Cross traffic detection using cameras |
| DE102018121008.6A DE102018121008B4 (en) | 2017-08-30 | 2018-08-28 | SYSTEM FOR DRIVING AN AUTONOMOUS VEHICLE AND VEHICLE EQUIPPED THEREFROM |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US15/690,966 US10268204B2 (en) | 2017-08-30 | 2017-08-30 | Cross traffic detection using cameras |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20190064841A1 true US20190064841A1 (en) | 2019-02-28 |
| US10268204B2 US10268204B2 (en) | 2019-04-23 |
Family
ID=65321411
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US15/690,966 Active 2037-11-03 US10268204B2 (en) | 2017-08-30 | 2017-08-30 | Cross traffic detection using cameras |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US10268204B2 (en) |
| CN (1) | CN109421730B (en) |
| DE (1) | DE102018121008B4 (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11043297B2 (en) | 2018-12-13 | 2021-06-22 | Rutgers, The State University Of New Jersey | Neural network-based object detection in visual input |
| CN113924462A (en) * | 2020-01-03 | 2022-01-11 | 移动眼视觉科技有限公司 | Navigation system and method for determining dimensions of an object |
| US11475660B2 (en) * | 2018-08-31 | 2022-10-18 | Advanced New Technologies Co., Ltd. | Method and system for facilitating recognition of vehicle parts based on a neural network |
| EP4380177A1 (en) * | 2022-11-29 | 2024-06-05 | MAN Truck & Bus SE | Vision system for a vehicle |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11778934B2 (en) * | 2019-07-02 | 2023-10-10 | Bear Flag Robotics, Inc. | Agricultural lane following |
| CN110728229B (en) * | 2019-10-09 | 2023-07-18 | 百度在线网络技术(北京)有限公司 | Image processing method, device, equipment and storage medium |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8232872B2 (en) | 2009-12-03 | 2012-07-31 | GM Global Technology Operations LLC | Cross traffic collision alert system |
| DE102014109062A1 (en) * | 2014-06-27 | 2015-12-31 | Connaught Electronics Ltd. | Method for tracking a target vehicle approaching a motor vehicle by means of a camera system of the motor vehicle, camera system and motor vehicle |
| IL236114A (en) * | 2014-12-07 | 2016-04-21 | Yoav Grauer | Improved object recognition in the reimbursement-based imaging unit |
| US11067996B2 (en) * | 2016-09-08 | 2021-07-20 | Siemens Industry Software Inc. | Event-driven region of interest management |
| US10318827B2 (en) * | 2016-12-19 | 2019-06-11 | Waymo Llc | Object detection neural networks |
- 2017-08-30: US US15/690,966, granted as US10268204B2, status Active
- 2018-08-14: CN CN201810921246.0A, granted as CN109421730B, status Active
- 2018-08-28: DE DE102018121008.6A, granted as DE102018121008B4, status Active
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11475660B2 (en) * | 2018-08-31 | 2022-10-18 | Advanced New Technologies Co., Ltd. | Method and system for facilitating recognition of vehicle parts based on a neural network |
| US11043297B2 (en) | 2018-12-13 | 2021-06-22 | Rutgers, The State University Of New Jersey | Neural network-based object detection in visual input |
| CN113924462A (en) * | 2020-01-03 | 2022-01-11 | 移动眼视觉科技有限公司 | Navigation system and method for determining dimensions of an object |
| US20230175852A1 (en) * | 2020-01-03 | 2023-06-08 | Mobileye Vision Technologies Ltd. | Navigation systems and methods for determining object dimensions |
| EP4380177A1 (en) * | 2022-11-29 | 2024-06-05 | MAN Truck & Bus SE | Vision system for a vehicle |
Also Published As
| Publication number | Publication date |
|---|---|
| DE102018121008A1 (en) | 2019-02-28 |
| DE102018121008B4 (en) | 2024-03-07 |
| CN109421730A (en) | 2019-03-05 |
| US10268204B2 (en) | 2019-04-23 |
| CN109421730B (en) | 2022-01-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11960293B2 (en) | Systems and methods for navigating lane merges and lane splits | |
| US11669102B2 (en) | Navigating a vehicle based on a detected barrier | |
| US10268204B2 (en) | Cross traffic detection using cameras | |
| US11915491B2 (en) | Controlling host vehicle based on detected door opening events | |
| US11948249B2 (en) | Bounding box estimation and lane vehicle association | |
| US8605947B2 (en) | Method for detecting a clear path of travel for a vehicle enhanced by object detection | |
| JP5363921B2 (en) | Vehicle white line recognition device | |
| US11308717B2 (en) | Object detection device and object detection method | |
| CN115704898A (en) | Correlation of camera images and radar data in autonomous vehicle applications | |
| US11335099B2 (en) | Proceedable direction detection apparatus and proceedable direction detection method | |
| JP5363920B2 (en) | Vehicle white line recognition device | |
| Magnier et al. | Automotive LIDAR objects detection and classification algorithm using the belief theory | |
| Yoneda et al. | Simultaneous state recognition for multiple traffic signals on urban road | |
| JP4901275B2 (en) | Travel guidance obstacle detection device and vehicle control device | |
| Choi et al. | Cut-in vehicle warning system exploiting multiple rotational images of SVM cameras | |
| US20220237921A1 (en) | Outside environment recognition device | |
| US20220237899A1 (en) | Outside environment recognition device | |
| US12409834B2 (en) | Vehicle controller, method, and computer program for vehicle control | |
| CN111766601A (en) | Identification device, vehicle control device, identification method, and storage medium | |
| US20220237922A1 (en) | Outside environment recognition device | |
| US20240123976A1 (en) | Vehicle controller, method, and computer program for vehicle control | |
| US20230260294A1 (en) | Apparatus, method, and computer program for estimating road edge | |
| EP3705906A2 (en) | Multiple vertical layer light detection and ranging system, auto-parking assistance, and computer vision lane detection and keeping | |
| JP2022146384A (en) | object detector |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| AS | Assignment |
Owner name: GM GLOBAL TECHNOLOGY OPERATIONS LLC, MICHIGAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:TONG, WEI;ZENG, SHUQING;MUDALIGE, UPALI P.;REEL/FRAME:043450/0325 Effective date: 20170828 |
|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
| MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |