
WO2008048897A2 - Method and apparatus to facilitate use of conditional probabilistic analysis of multi-point-of-reference samples of an item to disambiguate state information as pertains to the item - Google Patents

Method and apparatus to facilitate use of conditional probabilistic analysis of multi-point-of-reference samples of an item to disambiguate state information as pertains to the item

Info

Publication number
WO2008048897A2
WO2008048897A2 (PCT/US2007/081248, US2007081248W)
Authority
WO
WIPO (PCT)
Prior art keywords
item
parsed data
probabilistic analysis
temporally parsed
pertains
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2007/081248
Other languages
English (en)
Other versions
WO2008048897A3 (fr)
Inventor
Wei Qu
Dan Schonfeld
Magdi A. Mohamed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
University of Illinois at Chicago
Original Assignee
Motorola Inc
University of Illinois at Chicago
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/549,542 (published as US20080154555A1)
Application filed by Motorola Inc and University of Illinois at Chicago
Publication of WO2008048897A2
Publication of WO2008048897A3
Anticipated expiration
Status: Ceased

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/277: Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/285: Analysis of motion using a sequence of stereo image pairs
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/24: Aligning, centring, orientation detection or correction of the image
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/80: Recognising image objects characterised by unique random patterns
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G06T2207/10021: Stereoscopic video; Stereoscopic image sequence

Definitions

  • This invention relates generally to the tracking of multiple items.
  • a widely adopted approach to address this need uses a centralized solution that introduces a joint state space representation, concatenating all of the objects' states together to form a large resultant meta state.
  • This approach provides for inferring the joint data association by characterizing all possible associations between objects and observations using any of a variety of known techniques. Though successful for many purposes, such approaches are unfortunately neither a comprehensive solution nor always a desirable approach in and of themselves.
  • FIG. 1 comprises a flow diagram as configured in accordance with various embodiments of the invention
  • FIG. 2 comprises a block diagram as configured in accordance with various embodiments of the invention.
  • FIG. 3 comprises a model as configured in accordance with various embodiments of the invention.
  • FIG. 4 comprises a model as configured in accordance with various embodiments of the invention.
  • FIG. 5 comprises a model as configured in accordance with various embodiments of the invention.
  • FIG. 6 comprises a model as configured in accordance with various embodiments of the invention.
  • FIG. 7 comprises a schematic depiction as configured in accordance with various embodiments of the invention.
  • FIG. 8 comprises a model as configured in accordance with various embodiments of the invention.
  • FIG. 9 comprises a schematic state diagram as configured in accordance with various embodiments of the invention.
  • temporally parsed data regarding at least a first item is captured.
  • This temporally parsed data comprises data that corresponds to substantially simultaneous samples of the first item with respect to at least first and second different points of view.
  • Conditional probabilistic analysis of at least some of this temporally parsed data is then automatically used to disambiguate state information as pertains to this first item.
  • This conditional probabilistic analysis comprises analysis of at least some of the temporally parsed data as corresponds in a given sample to both the first point of reference and the second point of reference.
  • these teachings will further accommodate automatically using, at least in part, disjoint probabilistic analysis of the temporally parsed data as pertains to multiple such items to disambiguate state information as pertains to a given one of the points of reference for the first item from information as pertains to the given one of the points of reference for a second such item.
  • these teachings facilitate the use of multiple data capture points of view when disambiguating state information for a given item. These teachings achieve such disambiguation in a manner that requires considerably less computational capacity and capability than might otherwise be expected. In particular, these teachings are suitable for use in substantially real-time monitoring settings where a relatively high number of items, such as pedestrians or the like, are likely at any given time to be visually interacting with one another in ways that would otherwise tend to lead to confused or ambiguous monitoring results when relying only upon relatively modest computational capabilities.
  • these teachings provide a superior solution to multi-target occlusion problems by leveraging the availability of multiocular videos. These teachings permit avoidance of the computational complexity that is generally inherent in centralized methods that rely on joint-state representation and joint data association.
  • an illustrative process 100 in these regards provides for capturing 101 temporally parsed data regarding at least a first item.
  • This item could comprise any of a wide variety of objects including but not limited to discernable energy waves such as discrete sounds, continuous or discontinuous sound streams from multiple sources, radar images, and so forth. In many application settings, however, this item will comprise a physical object or, perhaps more precisely, images of a physical object.
  • This activity of capturing temporally parsed data can therefore comprise, for example, providing a video stream captured by data capture devices viewing a particular scene (such as a scene of a sidewalk, an airport security line, and so forth) where various of the frames contain data (that is, images of objects) that represent samples captured at different times.
  • While such data can represent a wide variety of different kinds of objects, for the sake of simplicity and clarity the remainder of this description shall presume that the objects are images of physical objects unless stated otherwise.
  • this convention is undertaken for the sake of illustration and is not intended as any suggestion of limitation with respect to the scope of these teachings.
  • this activity of capturing temporally parsed data can comprise capturing temporally parsed data regarding at least a first item, wherein the temporally parsed data comprises data corresponding to substantially simultaneous samples of the at least first item with respect to at least first and second different points of reference.
  • This can comprise, for example, providing data that has been captured using at least two cameras that are positioned to have differing views of the first item.
  • cameras can comprise any combination of similar or dissimilar cameras: true color cameras, enhanced color cameras, monochrome cameras, still image cameras, video capture cameras, and so forth. It would also be possible to employ cameras that react to illumination sources other than visible light, such as infrared cameras or the like.
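  • By way of a concrete sketch of this capture step, the following Python fragment grabs substantially simultaneous frame pairs from two cameras. This is a hedged illustration only: the OpenCV device indices and the back-to-back grab strategy are assumptions, not a mechanism prescribed by the patent.

```python
import time

import cv2

# Hypothetical device indices for two cameras positioned with differing
# views of the monitored scene.
CAM_A_INDEX, CAM_B_INDEX = 0, 1

cap_a = cv2.VideoCapture(CAM_A_INDEX)
cap_b = cv2.VideoCapture(CAM_B_INDEX)

def capture_sample_pair():
    """Return one substantially simultaneous sample pair, or None on failure."""
    # grab() latches the next frame on each device back-to-back, keeping the
    # two samples as close in time as a software-only approach allows.
    cap_a.grab()
    cap_b.grab()
    t = time.time()  # a single timestamp indexing the temporally parsed pair
    ok_a, frame_a = cap_a.retrieve()
    ok_b, frame_b = cap_b.retrieve()
    if not (ok_a and ok_b):
        return None
    return t, frame_a, frame_b
```

Each returned tuple is then one entry of the temporally parsed data stream referenced throughout this description.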
  • This process 100 then provides for automatically using 102, at least in part, conditional probabilistic analysis of at least some of the temporally parsed data as corresponds in a given sample to the first point of reference and the second point of reference to disambiguate state information as pertains to the first item.
  • conditional probabilistic analysis can comprise using conditional probabilistic analysis with respect to state information as corresponds to the first item.
  • This can also comprise, if desired, determining whether to use a joint conditional probabilistic analysis or a non-joint conditional probabilistic analysis as will be illustrated in more detail below.
  • this can also comprise determining whether to use such conditional probabilistic analysis for only some of the temporally parsed data or for substantially all (or all) of the temporally parsed data as corresponds to the given sample.
  • this process 100 will accommodate the use of data as corresponds to more than one item.
  • when the temporally parsed data comprises data corresponding to substantially simultaneous samples regarding at least a first item and a second item with respect to at least first and second different points of reference, the aforementioned step regarding disambiguation can further comprise automatically using conditional probabilistic analysis of at least some of the temporally parsed data to also disambiguate state information as pertains to the first item from information as pertains to the second item.
  • these teachings will also accommodate, if desired, optionally automatically using 103, at least in part, disjoint probabilistic analysis of the temporally parsed data to disambiguate state information as pertains to a given one of the points of reference for the first item from information as pertains to the given one of the points of reference for the second item.
  • the elements $j_1, j_2, \ldots$ of the set $J_t^i \subset \{1, \ldots, M\}$ are indexes of targets whose observations interact with $\mathbf{z}_t^i$; when no observations interact with $\mathbf{z}_t^i$, $J_t^i = \emptyset$. Since the interaction structure among observations is changing, $J_t^i$ may vary in time.
  • $\mathbf{z}_{1:t}^{J_t^i}$ represents the sequence of neighboring observation vectors up to time $t$.
  • FIG. 3 illustrates a dynamic graphical model 300 of two consecutive frames for multiple targets in two collaborative cameras (i.e., camera A and camera B).
  • Each camera view has two layers: a hidden layer with circle nodes representing the targets' states and an observable layer with square nodes representing the observations associated with the hidden states.
  • the directed link between consecutive states of the same target in each camera represents the state dynamics.
  • the directed link for a target's state to its observation characterizes the local observation likelihood.
  • the undirected link in each camera between neighboring observation nodes represents the "interaction.”
  • the directed curve link between the counterpart states of the same target in two cameras represents the "camera collaboration." This collaboration is activated between any possible collection of cameras only for targets which need help to improve their tracking robustness. For instance, such help may be needed when the targets are close to occlusion or are possibly completely occluded by other targets in a camera view.
  • the direction of the link shows which target resorts to which other targets for help. This need-driven scheme avoids performing camera collaboration at all times and for all targets; thus, a tremendous amount of computation is saved.
  • each target in camera B at time t does not need to activate the camera collaboration because its observations do not interact with the other targets' observations at all.
  • each target can be robustly tracked using independent trackers.
  • targets 1 and 2 in camera A at time t can serve to activate camera collaboration since their observations interact and may undergo multi-target occlusion. Therefore, external information from other cameras may be helpful to make the tracking of these two targets more stable.
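  • As a hedged sketch of this need-driven activation test, one plausible implementation checks for overlap between targets' observation regions in a view and activates collaboration only for targets whose regions interact. The axis-aligned regions, the margin, and the helper names below are illustrative assumptions, not the patent's specified test.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Axis-aligned bounding region of one target's observation in a view."""
    x: float
    y: float
    w: float
    h: float

def interacts(a: Observation, b: Observation, margin: float = 5.0) -> bool:
    """True when two observation regions overlap or nearly overlap."""
    return (abs((a.x + a.w / 2) - (b.x + b.w / 2)) <= (a.w + b.w) / 2 + margin
            and abs((a.y + a.h / 2) - (b.y + b.h / 2)) <= (a.h + b.h) / 2 + margin)

def neighbor_set(i: int, obs: list) -> set:
    """Indexes J_t^i of targets whose observations interact with target i's."""
    return {j for j in range(len(obs)) if j != i and interacts(obs[i], obs[j])}

def needs_collaboration(i: int, obs: list) -> bool:
    """Activate camera collaboration only when target i is near occlusion."""
    return bool(neighbor_set(i, obs))
```

Targets with an empty neighbor set (as for camera B above) can be tracked independently; targets with interacting observations (targets 1 and 2 in camera A) trigger the collaboration link.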
  • a graphical model as shown in FIG. 3 is suitable for centralized analysis using joint-state representations. To minimize computational costs, however, one may choose a completely distributed process where multiple collaborative trackers, one tracker per target in each camera, are used for multi-target tracking purposes simultaneously.
  • (1) each submodel aims at one target in one camera; (2) for analysis of the observations of a specific camera, only neighboring observations which have direct links to the analyzed target's observation are kept; i.e., all nodes of non-neighboring observations, which lack direct links to the analyzed target's observation, are removed; (3) each undirected "interaction" link is decomposed into two different directed links for the different targets (the direction of the link is from the other target's observation to the analyzed target's observation); and (4) since the "camera collaboration" link from a target's state in the analyzed camera view to its counterpart state in another view and the link from this counterpart state to its associated observation have the same direction, this causality can be simplified by a direct link from the grandparent node 401 to its grandchild node 402 as illustrated in FIG. 4.
  • FIG. 5 illustrates the decomposition result 501 of target 1 in camera A.
  • One may now consider a Bayesian conditional density propagation structure for each decomposed graphical model as illustrated in FIGS. 4 and 5.
  • One objective in this regard is to provide a generic statistical structure to model the interaction among cameras for multi-camera tracking. Since this process proposes using multiple collaborative trackers, one tracker per target in each camera view, for multi-camera multi-target tracking, one can dynamically estimate the posterior for each tracker and for each camera view based on observations from the target itself, from its neighbors in the current camera view, and from the target's counterparts in other camera views.
  • a novel likelihood density can be introduced to characterize the collaboration between the same target's counterparts in different camera views. This is referred to herein as a "camera collaboration function.”
  • when the camera collaboration density is uniformly distributed, the proposed Bayesian multiple-camera tracking framework becomes identical to the Interactively Distributed Multi-Object Tracking (IDMOT) approach, which is known in the art.
  • when the interaction density is also uniformly distributed, such a formulation further reduces to traditional Bayesian tracking.
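  • For concreteness, a factorization consistent with these reductions reads as follows. This is an illustrative reconstruction in the spirit of the distributed Bayesian tracking literature, not necessarily the patent's equation (4) verbatim:

```latex
p\left(\mathbf{x}_t^{i} \mid \mathbf{z}_{1:t}^{i},\, \mathbf{z}_{1:t}^{J_t^{i}},\, \tilde{\mathbf{z}}_{1:t}^{i}\right)
\;\propto\;
\underbrace{p\left(\mathbf{z}_t^{i} \mid \mathbf{x}_t^{i}\right)}_{\text{local likelihood}}
\,\underbrace{p\left(\mathbf{z}_t^{J_t^{i}} \mid \mathbf{z}_t^{i}, \mathbf{x}_t^{i}\right)}_{\text{interaction}}
\,\underbrace{p\left(\tilde{\mathbf{z}}_t^{i} \mid \mathbf{x}_t^{i}\right)}_{\text{camera collaboration}}
\int p\left(\mathbf{x}_t^{i} \mid \mathbf{x}_{t-1}^{i}\right)
p\left(\mathbf{x}_{t-1}^{i} \mid \mathbf{z}_{1:t-1}^{i}, \mathbf{z}_{1:t-1}^{J_{t-1}^{i}}, \tilde{\mathbf{z}}_{1:t-1}^{i}\right)
\mathrm{d}\mathbf{x}_{t-1}^{i}
```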
  • Modeling the densities in (4) is not necessarily trivial and can have great influence on the performance of practical implementations.
  • a proper model can play a significant role in estimating the densities.
  • Different target models such as a 2D ellipse model, a 3D object model, a snake or dynamic contour model, and so forth, are known in the art.
  • t is the time index
  • (cx, cy) is the center of the ellipse
  • a is the major axis
  • b is the minor axis
  • ρ is the orientation in radians.
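  • A minimal Python sketch of this 2D ellipse state, with a simple Gaussian random-walk dynamic for drawing particles, is shown below; the noise scales are illustrative assumptions, not values from the patent.

```python
import random
from dataclasses import dataclass

@dataclass
class EllipseState:
    """2D ellipse target model at time t: center, axes, orientation."""
    cx: float    # center x
    cy: float    # center y
    a: float     # major axis
    b: float     # minor axis
    rho: float   # orientation in radians

def propagate(s: EllipseState,
              pos_sigma: float = 2.0,
              axis_sigma: float = 0.5,
              rho_sigma: float = 0.02) -> EllipseState:
    """Sample the state dynamics p(x_t | x_{t-1}) as a Gaussian random walk."""
    g = random.gauss
    return EllipseState(
        cx=s.cx + g(0, pos_sigma),
        cy=s.cy + g(0, pos_sigma),
        a=max(1.0, s.a + g(0, axis_sigma)),   # keep the axes positive
        b=max(1.0, s.b + g(0, axis_sigma)),
        rho=s.rho + g(0, rho_sigma),
    )
```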
  • the proposed Bayesian conditional density propagation framework imposes no specific requirements on the cameras (e.g., fixed or moving, calibrated or not, and so forth) or on the collaboration model (e.g., 3D/2D) as long as the model can provide a good estimate of the camera collaboration density.
  • Epipolar geometry has been used to model the relation across multiple camera views in different ways. Somewhat contrary to prior uses of epipolar geometry, however, the present teachings will accommodate presenting a paradigm of camera collaboration likelihood modeling that uses sequential Monte Carlo implementation that does not require feature matching and recovery of the target's 3D coordinates, but only assumes that the cameras' epipolar geometry is known.
  • FIG. 7 illustrates a model setting in 3D space.
  • Two targets i and j are projected onto two camera views 701 and 702, respectively.
  • In view 701 the projections of targets i and j are very close (occluding), while in view 702 they are not.
  • these teachings will accommodate activating the camera collaboration only for trackers of targets i and j in view 701, but not in view 702, in order to conserve computational resources.
  • the observations $\mathbf{z}_t^i$ and $\mathbf{z}_t^j$ are initially found by tracking in view 702. Then they are mapped to view 701, producing $h(\mathbf{z}_t^i)$ and $h(\mathbf{z}_t^j)$, where $h(\cdot)$ is a function of the observation characterizing the epipolar geometry transformation. After that, the collaboration likelihood can be calculated based on these mapped observations. Sometimes a more complicated case occurs; for example, target i may be occluding with others in both cameras. In this situation, the above scheme is initialized by randomly selecting one view, say view 702, and using IDMOT to find the observations. These initial estimates may not be very accurate; therefore, in this case, one can iterate several times (usually twice is enough) between the different views to get more stable estimates.
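  • A sketch of one way the mapping $h(\cdot)$ can be realized when the cameras' epipolar geometry is known follows; the fundamental-matrix values are placeholders, and the NumPy formulation is an assumption for illustration.

```python
import numpy as np

# Placeholder fundamental matrix relating view 702 to view 701, assumed
# known in advance (the patent only assumes the epipolar geometry is known).
F = np.array([[0.0,   -1e-6,  1e-3],
              [1e-6,   0.0,  -2e-3],
              [-1e-3,  2e-3,  1.0]])

def epipolar_line(z):
    """h(z): map an observation center (u, v) in view 702 to its epipolar
    line in view 701, returned as (l1, l2, l3) with l1*u' + l2*v' + l3 = 0."""
    u, v = z
    line = F @ np.array([u, v, 1.0])
    # Scale so (l1, l2) is a unit normal; point-line distances measured
    # against the returned coefficients are then in pixels.
    return line / np.hypot(line[0], line[1])
```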
  • FIG. 8 illustrates a procedure used to calculate the collaboration weight for each particle based on the mapped counterpart observation $h(\mathbf{z}_t)$ and its epipolar band.
  • the particles $\{\mathbf{x}_t^{i,n}\}_{n=1}^{N}$ are represented by the circles 801 instead of the ellipse models for simplicity.
  • the collaboration weight for particle $\mathbf{x}_t^{i,n}$ can be computed from that particle's distance to the epipolar band. As shown in FIG. 8, one can simplify this computation by using a point-line distance between the center of the particle and the middle line of the band; the camera collaboration likelihood can then be approximated from these distances.
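  • Since the patent's exact expression is not legible in this extraction, one plausible approximation consistent with the description is a Gaussian of the point-line distance; the bandwidth sigma is an assumed parameter.

```python
import numpy as np

def collaboration_weight(particle_center, line, sigma: float = 10.0):
    """Camera collaboration weight of one particle: a Gaussian in the
    point-line distance to the middle line of the epipolar band.
    `line` is (l1, l2, l3) with (l1, l2) a unit normal, as returned by
    epipolar_line() in the previous sketch."""
    u, v = particle_center
    d = abs(line[0] * u + line[1] * v + line[2])  # distance in pixels
    return np.exp(-0.5 * (d / sigma) ** 2)
```

Evaluated per particle and renormalized over the particle set, these weights stand in for the camera collaboration likelihood during importance sampling.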
  • a tracker can be configured to activate the camera collaboration, and thus implement the proposed Bayesian multiple-camera tracking, only when its associated target both needs this and is able to do so. In other situations, the tracker will degrade to implement IDMOT or a traditional Bayesian tracker such as multiple independent regular particle filters.
  • FIG. 9 illustrates an approach in this regard.
  • every target in each camera view is in one of the following three situations:
  • the targets having a bad counterpart or having no counterpart can implement a degraded Bayesian multiple-camera tracking approach, namely, IDMOT 901. These trackers can be upgraded back to Bayesian multiple-camera tracking 902 after reinitialization, when the status may change to having a good counterpart.
  • multiple independent regular particle filters (MIPF)
  • the tracker can be configured to have the capability to decide that the associated target has disappeared and should be deleted in either of two cases: (1) the target moves out of the image; or (2) the tracker loses the target and tracks clutter instead. In both situations, the epipolar consistency loop checking fails and the local observation weights of the tracker's particles become very small since there is no target information anymore. On the other hand, in the case where the tracker misses its associated target and follows a false target, these processes will not delete the tracker and will instead leave it for further evaluation.
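  • The mode switching of FIG. 9 and the deletion rule can be summarized as a small state machine; in this sketch the predicate names are placeholders for the counterpart-quality and epipolar-consistency checks described above, not identifiers from the patent.

```python
from enum import Enum

class Mode(Enum):
    BMCT = "Bayesian multiple-camera tracking"    # good counterpart available
    IDMOT = "interactively distributed tracking"  # bad or missing counterpart
    MIPF = "independent regular particle filter"  # no interacting observations

def next_mode(has_interaction: bool, counterpart_ok: bool) -> Mode:
    """Need-driven choice of tracker for one target in one camera view."""
    if not has_interaction:
        return Mode.MIPF
    return Mode.BMCT if counterpart_ok else Mode.IDMOT

def should_delete(left_image: bool, epipolar_check_failed: bool,
                  max_local_weight: float, eps: float = 1e-6) -> bool:
    """Delete the tracker when the target left the image, or when only
    clutter remains (failed consistency loop plus vanishing weights)."""
    return left_image or (epipolar_check_failed and max_local_weight < eps)
```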
  • the apparatus 200 comprises a memory 201 that operably couples to a processor 202.
  • the memory 201 serves to store and hold available the aforementioned captured temporally parsed data regarding at least a first item, wherein the data comprises data corresponding to substantially simultaneous samples of the first item (and other items when present) with respect to at least first and second differing points of reference.
  • data can be provided by, for example, a first image capture device 203 through an Nth image capture device 204 (where N comprises an integer greater than one), each of which is positioned to have a differing view of the first item.
  • the processor 202 is configured and arranged to effect selected teachings as have been set forth above. This includes, for example, automatically using, at least in part, conditional probabilistic analysis of at least some of the temporally parsed data as corresponds in a given sample to the first point of reference and the second point of reference to disambiguate state information as pertains to the first item.
  • Such an apparatus 200 may be comprised of a plurality of physically distinct elements as is suggested by the illustration shown in FIG. 2. It is also possible, however, to view this illustration as comprising a logical view, in which case one or more of these elements can be enabled and realized via a shared platform. It will also be understood that such a shared platform may comprise a wholly or at least partially programmable platform as are known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

Temporally parsed data regarding at least a first item is captured (101). This temporally parsed data comprises data that corresponds to substantially simultaneous sequential samples of the first item with respect to at least first and second different points of view. Conditional probabilistic analysis of at least some of this temporally parsed data is then automatically used (102) to disambiguate state information as pertains to this first item. This conditional probabilistic analysis comprises analysis of at least some of the temporally parsed data corresponding, in a given sample, to the first point of reference and the second point of reference.
PCT/US2007/081248 2006-10-13 2007-10-12 Method and apparatus to facilitate use of conditional probabilistic analysis of multi-point-of-reference samples of an item to disambiguate state information as pertains to the item Ceased WO2008048897A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US11/549,542 2006-10-13
US11/549,542 US20080154555A1 (en) 2006-10-13 2006-10-13 Method and apparatus to disambiguate state information for multiple items tracking
US11/614,361 2006-12-21
US11/614,361 US20080089578A1 (en) 2006-10-13 2006-12-21 Method and Apparatus to Facilitate Use Of Conditional Probabilistic Analysis Of Multi-Point-Of-Reference Samples of an Item To Disambiguate State Information as Pertains to the Item

Publications (2)

Publication Number Publication Date
WO2008048897A2 true WO2008048897A2 (fr) 2008-04-24
WO2008048897A3 WO2008048897A3 (fr) 2008-11-06

Family

ID=39314759

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/081248 Ceased WO2008048897A2 (fr) Method and apparatus to facilitate use of conditional probabilistic analysis of multi-point-of-reference samples of an item to disambiguate state information as pertains to the item

Country Status (1)

Country Link
WO (1) WO2008048897A2 (fr)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6502082B1 (en) * 1999-06-01 2002-12-31 Microsoft Corp Modality fusion for object tracking with training system and method
GB0127553D0 (en) * 2001-11-16 2002-01-09 Abb Ab Provision of data for analysis
US20040003391A1 (en) * 2002-06-27 2004-01-01 Koninklijke Philips Electronics N.V. Method, system and program product for locally analyzing viewing behavior
US7280673B2 (en) * 2003-10-10 2007-10-09 Intellivid Corporation System and method for searching for changes in surveillance video
US7363299B2 (en) * 2004-11-18 2008-04-22 University Of Washington Computing probabilistic answers to queries

Also Published As

Publication number Publication date
WO2008048897A3 (fr) 2008-11-06

Similar Documents

Publication Publication Date Title
Berclaz et al. Multiple object tracking using flow linear programming
US10268900B2 (en) Real-time detection, tracking and occlusion reasoning
CN102812491B (zh) Tracking method
Gabriel et al. The state of the art in multiple object tracking under occlusion in video sequences
Boult et al. Omni-directional visual surveillance
JP2009015827A (ja) Object tracking method, object tracking system, and object tracking program
CN102436662A (zh) Human body target tracking method in a multi-camera network with non-overlapping fields of view
Zappella et al. Motion segmentation: A review
Qu et al. Distributed bayesian multiple-target tracking in crowded environments using multiple collaborative cameras
JP2006331416A (ja) Method for modeling a scene
Pinto et al. Unsupervised flow-based motion analysis for an autonomous moving system
Lu et al. Detecting unattended packages through human activity recognition and object association
Pollard et al. GM-PHD filters for multi-object tracking in uncalibrated aerial videos
Ng et al. New models for real-time tracking using particle filtering
US20080089578A1 (en) Method and Apparatus to Facilitate Use Of Conditional Probabilistic Analysis Of Multi-Point-Of-Reference Samples of an Item To Disambiguate State Information as Pertains to the Item
Meingast et al. Automatic camera network localization using object image tracks
WO2008048897A2 (fr) Method and apparatus to facilitate use of conditional probabilistic analysis of multi-point-of-reference samples of an item to disambiguate state information as pertains to the item
EP1596334A1 (fr) A hybrid graphical model for online multi-camera tracking
Sharma et al. A survey on moving object detection methods in video surveillance
Tsagkatakis et al. A random projections model for object tracking under variable pose and multi-camera views
Topçu et al. Occlusion-aware 3D multiple object tracker with two cameras for visual surveillance
Luvison et al. Automatic detection of unexpected events in dense areas for videosurveillance applications
Hoseinnezhad et al. Visual tracking of multiple targets by multi-Bernoulli filtering of background subtracted image data
Du et al. Tracking by cluster analysis of feature points and multiple particle filters
Kushwaha et al. 3d target tracking in distributed smart camera networks with in-network aggregation

Legal Events

Code Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 07844228; country of ref document: EP; kind code of ref document: A2)
NENP Non-entry into the national phase (ref country code: DE)
122 EP: PCT application non-entry in European phase (ref document number: 07844228; country of ref document: EP; kind code of ref document: A2)