WO2008048897A2 - Method and apparatus to facilitate use of conditional probabilistic analysis of multi-point-of-reference samples of an item to disambiguate state information as pertains to the item - Google Patents
Method and apparatus to facilitate use of conditional probabilistic analysis of multi-point-of-reference samples of an item to disambiguate state information as pertains to the item
- Publication number
- WO2008048897A2 (PCT/US2007/081248)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- item
- parsed data
- probabilistic analysis
- temporally parsed
- pertains
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/285—Analysis of motion using a sequence of stereo image pairs
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/80—Recognising image objects characterised by unique random patterns
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G06T2207/10021—Stereoscopic video; Stereoscopic image sequence
Definitions
- This invention relates generally to the tracking of multiple items.
- A widely adopted solution to address this need uses a centralized approach that introduces a joint state space representation, concatenating all of the objects' states together to form a large resultant meta state.
- This approach provides for inferring the joint data association by characterizing all possible associations between objects and observations using any of a variety of known techniques. Though successful for many purposes, such approaches are unfortunately neither comprehensive nor always desirable in and of themselves.
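To see why such joint-state approaches scale poorly, consider an illustrative aside (generic notation, not taken from the publication): with $M$ targets, each having a $d$-dimensional state, the concatenated meta state and its data-association hypotheses grow as

```latex
X_t = \left(x_t^{1}, x_t^{2}, \ldots, x_t^{M}\right) \in \mathbb{R}^{Md},
\qquad
\#\{\text{target--observation associations}\} \le M!
```

so both the sampling cost over the joint space and the number of association hypotheses grow combinatorially in the number of targets.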
- FIG. 1 comprises a flow diagram as configured in accordance with various embodiments of the invention.
- FIG. 2 comprises a block diagram as configured in accordance with various embodiments of the invention.
- FIG. 3 comprises a model as configured in accordance with various embodiments of the invention.
- FIG. 4 comprises a model as configured in accordance with various embodiments of the invention.
- FIG. 5 comprises a model as configured in accordance with various embodiments of the invention.
- FIG. 6 comprises a model as configured in accordance with various embodiments of the invention.
- FIG. 7 comprises a schematic depiction as configured in accordance with various embodiments of the invention.
- FIG. 8 comprises a model as configured in accordance with various embodiments of the invention.
- FIG. 9 comprises a schematic state diagram as configured in accordance with various embodiments of the invention.
- temporally parsed data regarding at least a first item is captured.
- This temporally parsed data comprises data that corresponds to substantially simultaneous samples of the first item with respect to at least first and second different points of view.
- Conditional probabilistic analysis of at least some of this temporally parsed data is then automatically used to disambiguate state information as pertains to this first item.
- This conditional probabilistic analysis comprises analysis of at least some of the temporally parsed data as corresponds in a given sample to both the first point of reference and the second point of reference.
- these teachings will further accommodate automatically using, at least in part, disjoint probabilistic analysis of the temporally parsed data as pertains to multiple such items to disambiguate state information as pertains to a given one of the points of reference for the first item from information as pertains to the given one of the points of reference for a second such item.
- these teachings facilitate the use of multiple data capture points of view when disambiguating state information for a given item. These teachings achieve such disambiguation in a manner that requires considerably less computational capacity and capability than might otherwise be expected. In particular, these teachings are suitable for use in substantially real-time monitoring settings where a relatively high number of items, such as pedestrians or the like, are likely at any given time to be visually interacting with one another in ways that would otherwise tend to lead to confused or ambiguous monitoring results when relying only upon relatively modest computational capabilities.
- these teachings provide a superior solution to multi-target occlusion problems by leveraging the availability of multiocular videos. These teachings permit avoidance of the computational complexity that is generally inherent in centralized methods that rely on joint-state representation and joint data association.
- an illustrative process 100 in these regards provides for capturing 101 temporally parsed data regarding at least a first item.
- This item could comprise any of a wide variety of objects including but not limited to discernable energy waves such as discrete sounds, continuous or discontinuous sound streams from multiple sources, radar images, and so forth. In many application settings, however, this item will comprise a physical object or, perhaps more precisely, images of a physical object.
- This activity of capturing temporally parsed data can therefore comprise, for example, providing a video stream as captured by data capture devices observing a particular scene (such as a scene of a sidewalk, an airport security line, and so forth) where various of the frames contain data (that is, images of objects) that represent samples captured at different times.
- While such data can comprise a wide variety of different kinds of objects, for the sake of simplicity and clarity the remainder of this description shall presume that the objects are images of physical objects unless stated otherwise.
- this convention is undertaken for the sake of illustration and is not intended as any suggestion of limitation with respect to the scope of these teachings.
- this activity of capturing temporally parsed data can comprise capturing temporally parsed data regarding at least a first item, wherein the temporally parsed data comprises data corresponding to substantially simultaneous samples of the at least first item with respect to at least first and second different points of reference.
- This can comprise, for example, providing data that has been captured using at least two cameras that are positioned to have differing views of the first item.
- cameras can comprise any combination of similar or dissimilar cameras: true color cameras, enhanced color cameras, monochrome cameras, still image cameras, video capture cameras, and so forth. It would also be possible to employ cameras that react to illumination sources other than visible light, such as infrared cameras or the like.
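As a minimal sketch of such multi-camera capture (the camera indexes, the OpenCV usage, and the loop length are illustrative assumptions, not part of the disclosure):

```python
# Pairing "substantially simultaneous" frames from two cameras with
# OpenCV; camera indexes 0 and 1 are assumptions for illustration.
import cv2

cap_a = cv2.VideoCapture(0)  # camera A (first point of reference)
cap_b = cv2.VideoCapture(1)  # camera B (second point of reference)

def capture_sample():
    """Grab both sensors first, then decode, to keep the pair of frames
    as close to simultaneous as the hardware allows."""
    if not (cap_a.grab() and cap_b.grab()):
        return None
    ok_a, frame_a = cap_a.retrieve()
    ok_b, frame_b = cap_b.retrieve()
    return (frame_a, frame_b) if ok_a and ok_b else None

samples = []  # temporally parsed data: one (frame_a, frame_b) per time step
for _ in range(100):
    pair = capture_sample()
    if pair is not None:
        samples.append(pair)
```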
- This process 100 then provides for automatically using 102, at least in part, conditional probabilistic analysis of at least some of the temporally parsed data as corresponds in a given sample to the first point of reference and the second point of reference to disambiguate state information as pertains to the first item.
- conditional probabilistic analysis can comprise using conditional probabilistic analysis with respect to state information as corresponds to the first item.
- This can also comprise, if desired, determining whether to use a joint conditional probabilistic analysis or a non-joint conditional probabilistic analysis as will be illustrated in more detail below.
- this can also comprise determining whether to use such conditional probabilistic analysis for only some of the temporally parsed data or for substantially all (or all) of the temporally parsed data as corresponds to the given sample.
- this process 100 will accommodate the use of data as corresponds to more than one item.
- temporally parsed data comprises data corresponding to substantially simultaneous samples regarding at least a first item and a second item with respect to at least first and second different points of reference
- the aforementioned step regarding disambiguation can further comprise automatically using conditional probabilistic analysis of at least some of the temporally parsed data to also disambiguate state information as pertains to the first item from information as pertains to the second item.
- these teachings will also accommodate, if desired, optionally automatically using 103, at least in part, disjoint probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to a given one of the points of reference for the first item from information as pertains to the given one of the points of reference for the second item.
- the elements $j_1, j_2, \ldots \in \{1, \ldots, M\}$ of the index set $J_t^i$ are indexes of targets whose observations interact with $z_t^i$, the observation of target $i$ at time $t$; $J_t^i = \emptyset$ when no such interaction exists. Since the interaction structure among observations is changing, $J_t^i$ may vary in time.
- $\tilde{z}_{1:t}^i$ represents the sequence of neighboring observation vectors up to time $t$.
- FIG. 3 illustrates a dynamic graphical model 300 of two consecutive frames for multiple targets in two collaborative cameras (i.e., camera A and camera B).
- Each camera view has two layers: a hidden layer has circle nodes representing the targets' states and an observable layer has square nodes representing the observations associated with the hidden states.
- the directed link between consecutive states of the same target in each camera represents the state dynamics.
- the directed link from a target's state to its observation characterizes the local observation likelihood.
- the undirected link in each camera between neighboring observation nodes represents the "interaction.”
- the directed curve link between the counterpart states of the same target in two cameras represents the "camera collaboration." This collaboration is activated between any possible collection of cameras only for targets which need help to improve their tracking robustness. For instance, such help may be needed when the targets are close to occlusion or are possibly completely occluded by other targets in a camera view.
- the direction of the link shows which target resorts to which other targets for help. This need-driven scheme avoids performing camera collaboration at all times and for all targets; thus, a tremendous amount of computation is saved.
- each target in camera B at time t does not need to activate the camera collaboration because their observations do not interact with the other targets' observations at all.
- each target can be robustly tracked using independent trackers.
- targets 1 and 2 in camera A at time t can serve to activate camera collaboration since their observations interact and may undergo multi-target occlusion. Therefore, external information from other cameras may be helpful to make the tracking of these two targets more stable.
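A minimal sketch of this need-driven activation (the bounding-box overlap test and the observation representation are illustrative assumptions, not the publication's method):

```python
# Activate camera collaboration only for targets whose observations
# interact (overlap) with another target's observation in the same view.
from itertools import combinations

def boxes_interact(box1, box2):
    """Axis-aligned overlap test; boxes are (x_min, y_min, x_max, y_max)."""
    return not (box1[2] < box2[0] or box2[2] < box1[0] or
                box1[3] < box2[1] or box2[3] < box1[1])

def collaboration_set(observations):
    """Return indexes of targets whose observations interact and hence
    should activate camera collaboration (e.g., targets 1 and 2 in
    camera A of FIG. 3); all others run independent trackers."""
    needs_help = set()
    for i, j in combinations(range(len(observations)), 2):
        if boxes_interact(observations[i], observations[j]):
            needs_help.update((i, j))
    return needs_help
```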
- a graphical model as shown in FIG. 3 is suitable for centralized analysis using joint-state representations. To minimize computational costs, however, one may choose a completely distributed process where multiple collaborative trackers, one tracker per target in each camera, are used for multi-target tracking purposes simultaneously.
- the decomposition observes four rules: (1) each submodel aims at one target in one camera; (2) for analysis of the observations of a specific camera, only neighboring observations which have direct links to the analyzed target's observation are kept; that is, the nodes of non-neighboring observations (and their links) are removed; (3) each undirected "interaction" link is decomposed into two different directed links for the different targets (the direction of the link is from the other target's observation to the analyzed target's observation); and (4) since the "camera collaboration" link from a target's state in the analyzed camera view to its counterpart state in another view and the link from this counterpart state to its associated observation have the same direction, this causality can be simplified by a direct link from the grandparent node 401 to its grandson 402, as illustrated in FIG. 4.
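A sketch of those four rules as a data-structure transformation (the dict representation and field names are assumptions for illustration only):

```python
# Decompose the full interaction graph into per-target submodels.
def decompose(interaction_edges, num_targets):
    """interaction_edges: set of frozensets {i, j} linking observations
    that interact in one camera view. Per target, keep only neighboring
    observations (rule 2) with links directed toward the analyzed
    target (rule 3); non-neighboring nodes are simply dropped."""
    submodels = {}
    for i in range(num_targets):
        neighbors = [j for j in range(num_targets)
                     if j != i and frozenset((i, j)) in interaction_edges]
        submodels[i] = {
            "target": i,                                    # rule 1
            "kept_observations": neighbors,                 # rule 2
            "directed_links": [(j, i) for j in neighbors],  # rule 3
            "collaboration_link": ("counterpart_obs", i),   # rule 4 shortcut
        }
    return submodels
```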
- FIG. 5 illustrates the decomposition result 501 of target 1 in camera A.
- One may now consider a Bayesian conditional density propagation structure for each decomposed graphical model as illustrated in FIGS. 4 and 5.
- One objective in this regard is to provide a generic statistical structure to model the interaction among cameras for multi-camera tracking. Since this process proposes using multiple collaborative trackers, one tracker per target in each camera view, for multi-camera multi-target tracking, one can dynamically estimate, for each tracker and for each camera view, the posterior $p(x_t^i \mid z_{1:t}^i, \tilde{z}_{1:t}^i, z_{1:t}^{i,*})$, conditioned on observations from the target itself ($z_{1:t}^i$), its neighbors in the current camera view ($\tilde{z}_{1:t}^i$), and the target's counterparts in other camera views ($z_{1:t}^{i,*}$).
- a novel likelihood density can be introduced to characterize the collaboration between the same target's counterparts in different camera views. This is referred to herein as a "camera collaboration function.”
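As an illustrative reconstruction (the notation follows the definitions above; the exact factorization is not verbatim from the publication), the per-tracker recursion implied by the densities just named can be written:

```latex
p\left(x_t^{i}\mid z_{1:t}^{i},\,\tilde z_{1:t}^{i},\, z_{1:t}^{i,*}\right)
\;\propto\;
\underbrace{p\left(z_t^{i}\mid x_t^{i}\right)}_{\text{local observation}}
\;
\underbrace{p\left(\tilde z_t^{i}\mid z_t^{i},\, x_t^{i}\right)}_{\text{interaction}}
\;
\underbrace{p\left(z_t^{i,*}\mid x_t^{i}\right)}_{\text{camera collaboration}}
\int p\left(x_t^{i}\mid x_{t-1}^{i}\right)\,
p\left(x_{t-1}^{i}\mid z_{1:t-1}^{i},\,\tilde z_{1:t-1}^{i},\, z_{1:t-1}^{i,*}\right)\,
dx_{t-1}^{i}
```

The two reductions noted next follow directly from setting individual factors of this product to uniform densities.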
- when the camera collaboration density $p(z_t^{i,*} \mid x_t^i)$ is uniformly distributed, the proposed Bayesian multiple-camera tracking framework becomes identical to the Interactively Distributed Multi-Object Tracking (IDMOT) approach, which is known in the art.
- when the interaction density $p(\tilde z_t^i \mid z_t^i, x_t^i)$ is also uniformly distributed, such a formulation further reduces to traditional Bayesian tracking.
- Modeling the densities in (4) is not necessarily trivial and can have great influence on the performance of practical implementations.
- a proper model can play a significant role in estimating the densities.
- Different target models such as a 2D ellipse model, a 3D object model, a snake or dynamic contour model, and so forth, are known in the art.
- in a 2D ellipse parameterization, for example, the state can be written $x_t = (cx, cy, a, b, \rho)$, where $t$ is the time index, $(cx, cy)$ is the center of the ellipse, $a$ is the major axis, $b$ is the minor axis, and $\rho$ is the orientation in radians.
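A minimal sketch of such an ellipse state and its dynamics (the Gaussian random-walk dynamics and the noise scales are illustrative assumptions, not taken from the publication):

```python
# 2D ellipse target state; field names mirror the parameterization above.
from dataclasses import dataclass
import random

@dataclass
class EllipseState:
    cx: float   # center, x
    cy: float   # center, y
    a: float    # major axis
    b: float    # minor axis
    rho: float  # orientation in radians

def propagate(state: EllipseState, sigma_pos=2.0, sigma_shape=0.5,
              sigma_rho=0.05) -> EllipseState:
    """Sample p(x_t | x_{t-1}) as assumed Gaussian random-walk dynamics."""
    return EllipseState(
        cx=state.cx + random.gauss(0.0, sigma_pos),
        cy=state.cy + random.gauss(0.0, sigma_pos),
        a=max(1.0, state.a + random.gauss(0.0, sigma_shape)),
        b=max(1.0, state.b + random.gauss(0.0, sigma_shape)),
        rho=state.rho + random.gauss(0.0, sigma_rho),
    )
```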
- the proposed Bayesian conditional density propagation framework has no specific requirements of the cameras (e.g., fixed or moving, calibrated or not, and so forth) or of the collaboration model (e.g., 3D/2D) as long as the model can provide a good estimation of the camera collaboration density $p(z_t^{i,*} \mid x_t^i)$.
- Epipolar geometry has been used to model the relation across multiple camera views in different ways. Somewhat contrary to prior uses of epipolar geometry, however, the present teachings will accommodate presenting a paradigm of camera collaboration likelihood modeling that uses a sequential Monte Carlo implementation that does not require feature matching and recovery of the target's 3D coordinates, but only assumes that the cameras' epipolar geometry is known.
- FIG. 7 illustrates a model setting in 3D space.
- Two targets i and j are projected onto two camera views 701 and 702.
- In view 701, the projections of targets i and j are very close (occluding), while in view 702 they are not.
- these teachings will accommodate only activating the camera collaboration for trackers of targets i and j in view 701, but not in view 702, in order to conserve computational resources.
- the observations $z_t^i$ and $z_t^j$ are initially found by tracking in view 702. Then, they are mapped to view 701, producing $h(z_t^i)$ and $h(z_t^j)$, where $h(\cdot)$ is a function characterizing the epipolar geometry transformation. After that, the collaboration likelihood can be calculated from these mapped observations. Sometimes a more complicated case occurs; for example, target i is occluding with others in both cameras. In this situation, the above scheme is initialized by randomly selecting one view, say view 702, and using IDMOT to find the observations. These initial estimates may not be very accurate; therefore, in this case, one can iterate several times (usually twice is enough) between the different views to get more stable estimates.
- FIG. 8 illustrates a procedure used to calculate the collaboration weight for each particle based on the mapped counterpart observation $h(z_t^i)$.
- the particles $\{x_t^{i,1}, x_t^{i,2}, \ldots, x_t^{i,N}\}$ are represented by the circles 801 instead of the ellipse models for simplicity.
- the collaboration weight for each particle $x_t^{i,n}$ can then be computed from that particle's consistency with the epipolar band induced by the mapped counterpart observation.
- as shown in FIG. 8, one can simplify this weight by using the point-line distance between the center of the particle and the middle line of the band; the camera collaboration likelihood can then be approximated from these per-particle weights.
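A hedged sketch of that point-line simplification (the fundamental matrix F, the Gaussian band width sigma, and the helper names are assumptions for illustration; the publication only assumes the cameras' epipolar geometry is known):

```python
# Weight each particle by its distance to the epipolar line obtained by
# mapping the counterpart observation from the other camera view.
import numpy as np

def epipolar_line(F, point_other_view):
    """l = F @ p maps a point in the other view to a line a*x + b*y + c = 0
    in this view (homogeneous coordinates); F is a known 3x3 fundamental
    matrix relating the two views."""
    p = np.array([point_other_view[0], point_other_view[1], 1.0])
    return F @ p

def collaboration_weight(particle_center, line, sigma=5.0):
    """Gaussian weight on the point-line distance from the particle's
    center to the band's middle line; sigma is an assumed band scale."""
    a, b, c = line
    d = abs(a * particle_center[0] + b * particle_center[1] + c) \
        / np.hypot(a, b)
    return np.exp(-0.5 * (d / sigma) ** 2)
```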
- a tracker can be configured to activate the camera collaboration, and thus implement the proposed Bayesian multiple-camera tracking, only when its associated target both needs the collaboration and can make use of it. In other situations, the tracker will degrade to implement IDMOT or a traditional Bayesian tracker such as multiple independent regular particle filters.
- FIG. 9 illustrates an approach in this regard.
- every target in each camera view is in one of the following three situations: it has a good counterpart in another camera view, it has a bad counterpart, or it has no counterpart.
- the targets having a bad counterpart or having no counterpart can implement a degraded Bayesian multiple-camera tracking approach, namely, IDMOT 901. These trackers can be upgraded back to Bayesian multiple-camera tracking 902 after reinitialization, when the status may change to having a good counterpart.
- targets whose observations do not interact with others can simply be tracked with multiple independent regular particle filters (MIPF).
- the tracker can be configured to have the capability to decide that the associated target has disappeared and should be deleted in either of two cases: (1) the target moves out of the image; or (2) the tracker loses the target and tracks clutter instead. In both situations, the epipolar consistency loop checking fails and the local observation weights of the tracker's particles become very small since there is no target information anymore. On the other hand, in the case where the tracker misses its associated target and follows a false target, these processes will not delete the tracker and instead leave it for further evaluation.
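An illustrative sketch of the mode switching that FIG. 9 describes (the enum names and predicate arguments are assumptions; the transitions follow the good/bad/no-counterpart situations above):

```python
# Per-target, per-view tracking mode selection.
from enum import Enum, auto

class TrackerMode(Enum):
    BMCT = auto()   # full Bayesian multiple-camera tracking
    IDMOT = auto()  # degraded: interactively distributed tracking
    MIPF = auto()   # multiple independent regular particle filters

def next_mode(interacting: bool, counterpart_good: bool) -> TrackerMode:
    """Collaborate only when the target both needs it (its observation
    interacts with others) and can use it (a good counterpart exists)."""
    if not interacting:
        return TrackerMode.MIPF
    return TrackerMode.BMCT if counterpart_good else TrackerMode.IDMOT
```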
- the apparatus 200 comprises a memory 201 that operably couples to a processor 202.
- the memory 201 serves to store and hold available the aforementioned captured temporally parsed data regarding at least a first item, wherein the data comprises data corresponding to substantially simultaneous samples of the first item (and other items when present) with respect to at least first and second differing points of reference.
- data can be provided by, for example, a first image capture device 203 through an Nth image capture device 204 (where N comprises an integer greater than one) that are each positioned to have differing views of the first item.
- the processor 202 is configured and arranged to effect selected teachings as have been set forth above. This includes, for example, automatically using, at least in part, conditional probabilistic analysis of at least some of the temporally parsed data as corresponds in a given sample to the first point of reference and the second point of reference to disambiguate state information as pertains to the first item.
- Such an apparatus 200 may be comprised of a plurality of physically distinct elements as is suggested by the illustration shown in FIG. 2. It is also possible, however, to view this illustration as comprising a logical view, in which case one or more of these elements can be enabled and realized via a shared platform. It will also be understood that such a shared platform may comprise a wholly or at least partially programmable platform as are known in the art.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
Temporally parsed data regarding at least a first item is captured (101). This temporally parsed data comprises data that corresponds to substantially simultaneous samples of the first item with respect to at least first and second different points of view. Conditional probabilistic analysis of at least some of this temporally parsed data is then automatically used (102) to disambiguate state information as pertains to this first item. This conditional probabilistic analysis comprises analysis of at least some of the temporally parsed data as corresponds, in a given sample, to both the first point of reference and the second point of reference.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US11/549,542 | 2006-10-13 | ||
| US11/549,542 US20080154555A1 (en) | 2006-10-13 | 2006-10-13 | Method and apparatus to disambiguate state information for multiple items tracking |
| US11/614,361 | 2006-12-21 | ||
| US11/614,361 US20080089578A1 (en) | 2006-10-13 | 2006-12-21 | Method and Apparatus to Facilitate Use Of Conditional Probabilistic Analysis Of Multi-Point-Of-Reference Samples of an Item To Disambiguate State Information as Pertains to the Item |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2008048897A2 true WO2008048897A2 (fr) | 2008-04-24 |
| WO2008048897A3 WO2008048897A3 (fr) | 2008-11-06 |
Family
ID=39314759
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2007/081248 Ceased WO2008048897A2 (fr) | 2007-10-12 | Method and apparatus to facilitate use of conditional probabilistic analysis of multi-point-of-reference samples of an item to disambiguate state information as pertains to the item |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2008048897A2 (fr) |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6502082B1 (en) * | 1999-06-01 | 2002-12-31 | Microsoft Corp | Modality fusion for object tracking with training system and method |
| GB0127553D0 (en) * | 2001-11-16 | 2002-01-09 | Abb Ab | Provision of data for analysis |
| US20040003391A1 (en) * | 2002-06-27 | 2004-01-01 | Koninklijke Philips Electronics N.V. | Method, system and program product for locally analyzing viewing behavior |
| US7280673B2 (en) * | 2003-10-10 | 2007-10-09 | Intellivid Corporation | System and method for searching for changes in surveillance video |
| US7363299B2 (en) * | 2004-11-18 | 2008-04-22 | University Of Washington | Computing probabilistic answers to queries |
-
2007
- 2007-10-12 WO PCT/US2007/081248 patent/WO2008048897A2/fr not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| WO2008048897A3 (fr) | 2008-11-06 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Berclaz et al. | Multiple object tracking using flow linear programming | |
| US10268900B2 (en) | Real-time detection, tracking and occlusion reasoning | |
| CN102812491B (zh) | Tracking method | |
| Gabriel et al. | The state of the art in multiple object tracking under occlusion in video sequences | |
| Boult et al. | Omni-directional visual surveillance | |
| JP2009015827A (ja) | Object tracking method, object tracking system, and object tracking program | |
| CN102436662A (zh) | Human target tracking method in a multi-camera network with non-overlapping fields of view | |
| Zappella et al. | Motion segmentation: A review | |
| Qu et al. | Distributed bayesian multiple-target tracking in crowded environments using multiple collaborative cameras | |
| JP2006331416A (ja) | Method for modeling a scene | |
| Pinto et al. | Unsupervised flow-based motion analysis for an autonomous moving system | |
| Lu et al. | Detecting unattended packages through human activity recognition and object association | |
| Pollard et al. | GM-PHD filters for multi-object tracking in uncalibrated aerial videos | |
| Ng et al. | New models for real-time tracking using particle filtering | |
| US20080089578A1 (en) | Method and Apparatus to Facilitate Use Of Conditional Probabilistic Analysis Of Multi-Point-Of-Reference Samples of an Item To Disambiguate State Information as Pertains to the Item | |
| Meingast et al. | Automatic camera network localization using object image tracks | |
| WO2008048897A2 (fr) | Method and apparatus to facilitate use of conditional probabilistic analysis of multi-point-of-reference samples of an item to disambiguate state information as pertains to the item | |
| EP1596334A1 (fr) | A hybrid graphical model for on-line multicamera tracking | |
| Sharma et al. | A survey on moving object detection methods in video surveillance | |
| Tsagkatakis et al. | A random projections model for object tracking under variable pose and multi-camera views | |
| Topçu et al. | Occlusion-aware 3D multiple object tracker with two cameras for visual surveillance | |
| Luvison et al. | Automatic detection of unexpected events in dense areas for videosurveillance applications | |
| Hoseinnezhad et al. | Visual tracking of multiple targets by multi-Bernoulli filtering of background subtracted image data | |
| Du et al. | Tracking by cluster analysis of feature points and multiple particle filters | |
| Kushwaha et al. | 3d target tracking in distributed smart camera networks with in-network aggregation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 07844228 Country of ref document: EP Kind code of ref document: A2 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 07844228 Country of ref document: EP Kind code of ref document: A2 |