
WO1996030873A1 - Method and apparatus for multi-frame based segmentation of data streams - Google Patents


Info

Publication number
WO1996030873A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
matrix
reference image
vectors
frames
Prior art date
Application number
PCT/EP1996/001273
Other languages
French (fr)
Inventor
Harald Aagaard Martens
Jan Otto Reberg
Original Assignee
Idt International Digital Technologies Deutschland Gmbh
Priority date
Filing date
Publication date
Application filed by Idt International Digital Technologies Deutschland Gmbh filed Critical Idt International Digital Technologies Deutschland Gmbh
Priority to JP8528905A priority Critical patent/JPH11502653A/en
Priority to AU52735/96A priority patent/AU5273596A/en
Priority to EP96909117A priority patent/EP0815537A1/en
Publication of WO1996030873A1 publication Critical patent/WO1996030873A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/537 Motion estimation other than block-based
    • H04N19/543 Motion estimation other than block-based using regions
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Definitions

  • the present invention concerns methods for attaining grouping or segmentation in large signal streams by development and analysis of a manifold or subspace of parameters.
  • it concerns methods for spatiotemporal segmentation of video signals.
  • each of the segment models may also be easier to control and interpret, e.g. for editing purposes.
  • subsequent computational treatments of the individual segments may be computationally simpler than having to treat the full signal streams, for instance in reducing the amount of high-speed memory required for effective computation.
  • the segmentation process itself must also be statistically and computationally effective.
  • the present invention concerns how to attain relevant and reliable segmentation.
  • image segmentation is important: groups of pixels (pels) that display consistently related spatial patterns of change over groups of frames should be modelled together, as this gives best compression, editability and interpretability.
  • a segment may correspond to one physical object, but it may also correspond to only a part of a physical object, or to a set of several such objects. It may also correspond to non-tangible objects or phenomena such as shadows.
  • the optimal definition of a segment differs with the purpose of the coding:
  • the segments ideally correspond to the groups of pels that are most effectively compressed, but if the purpose is coding for later video manipulation such as editing or video games, then the segments ideally correspond more to physical objects.
  • the segmentation process must be robust, i.e. it must give visibly acceptable, statistically useful segments with applicability over many related image frames. Yet, it must be computationally feasible with respect to cpu time and memory requirements.
  • the segmentation methods for video coding mainly fall into two main categories: Still image segmentation and motion based segmentation.
  • Still image segmentation is based on defining spatial intensity patterns in an individual image.
  • a drawback with this type of segmentation is that it is difficult to distinguish between spatial contours inside and along the edges of objects.
  • Motion based segmentation concerns how the image intensities vary between images.
  • the segmentation is often based on the latter, and is attained by analysis of estimates of motion fields.
  • One established approach to segmentation is to estimate the motion field between two frames, say from a reference frame R and another frame n (here termed 'Difference in Address', DA Rn), and to search for groups of pels in DA Rn with similar motions.
  • DA Rn may have one, two or more motion dimensions.
  • Motion based segmentation can be generalized to change-based segmentation, where the changes could also include 'Difference in Intensity', i.e. intensity changes between pairs of frames, DI Rn , e.g. motion compensated and at different color channels.
  • segment information facilitates subsequent motion estimation and intensity change estimation and the bilinear modelling of these.
  • segmentation is based on change information for a number of relevant frames, not only on changes between two frames. Thereby the segments obtained are statistically more stable and have a higher validity.
  • the way the many change data are represented in the segmentation computations is preferably via common manifolds or subspace models, primarily by bilinear modelling based on a common reference position. This further enhances the statistical precision and validity of the segmentation, since some noise-based and other non-significant change types can be ignored. It also reduces the computational complexity of the segmentation work by reducing the dimensionality of the change data needed to be analyzed in the segmentation.
  • the subspace segmentation can be recursively updated in cases when the subspace representation itself is recursively updated, this gives computational advantages.
  • the change information used in the segmentation can be of various kinds, e.g. motion information and intensity change information.
  • the invention can be applied to signal streams in general.
  • it can be applied to spatio-temporal segmentation of digital video signals and temporal segmentation of digital sound data.
  • Figure 1 illustrates how a frame-size (with nv x nh pixels) motion field in one motion direction (here DV Rn, the vertical motion for each pixel for moving (warping) image R to approximate image n) can be strung out as a one-dimensional vector (with nv * nh elements).
  • Figure 4 illustrates the parameters from Figure 3 pertaining to one single frame.
  • Figure 5 shows the third preferred embodiment, in which motion estimation and segmentation are performed separately.
  • Figure 6 shows the fourth preferred embodiment, in which motion estimation and segmentation are performed simultaneously.
  • the symbol ' * ' is used for multiplication when needed.
  • Boldface uppercase letters are used for representing data matrices, and boldface lowercase letters for data vectors.
  • a motion field describes how the pels in one image, say reference frame R, should be moved in order to approximate another image, say n.
  • Each motion field image, e.g. DV Rn, can be strung out as a one-dimensional vector d n with nPel elements, one element for each pel in the reference image in which the motion information is given, as illustrated in Figure 1.
  • the different motion dimensions may be strung after one another in one and the same vector, which then has a multiple of nPel elements, as illustrated in Figure 2.
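  • As a minimal numerical sketch of this stringing-out (Figures 1 and 2); the frame size and field values here are hypothetical, not taken from the patent:

```python
import numpy as np

nv, nh = 4, 6                        # hypothetical frame size
nPel = nv * nh

rng = np.random.default_rng(0)
DV_Rn = rng.normal(size=(nv, nh))    # vertical motion field, image R -> n
DH_Rn = rng.normal(size=(nv, nh))    # horizontal motion field

# string each frame-size field out as a one-dimensional vector (Figure 1)
dv = DV_Rn.ravel()                   # nPel elements
dh = DH_Rn.ravel()

# string the motion dimensions after one another in one vector (Figure 2)
d_n = np.concatenate([dv, dh])       # 2 * nPel elements
```

One such vector per frame then forms one row of the data matrix D used in the bilinear modelling.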
  • bilinear modelling is well established as a method for approximating sets of related vectors ( Figure 3).
  • the bilinear factor model can be written as a bilinear matrix product plus a residual (conf. Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK, which is hereby included by reference):

    D = T * P T + E    (1)
  • D is the data to be modelled; it has one row for each frame modelled and one column for each pixel (pel) variable to be modelled simultaneously (e.g. one horizontal and one vertical motion element for each pixel).
  • the superscript T means 'transposed'.
  • E represents the Error or unmodelled residual, with the same matrix dimensions as D.
  • the loading subspace P T, spanning the most significant row space of D, represents the motion information more or less common to several frames in the sequence.
  • the score vectors t n for frames n = 1, 2, ..., estimated for each frame separately or for many frames jointly (see below), serve to convey this common motion information P T back to each individual frame pair.
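  • A minimal numerical sketch of equation (1), using a plain truncated singular value decomposition as the bilinear estimator (the patent also covers weighted and incrementally updated variants; the data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
nFrames, nPel, nFactors = 8, 30, 2

# synthetic low-rank motion data plus noise: one row per frame,
# one column per pixel variable
T_true = rng.normal(size=(nFrames, nFactors))
P_true = rng.normal(size=(nPel, nFactors))
D = T_true @ P_true.T + 0.01 * rng.normal(size=(nFrames, nPel))

# truncated SVD gives scores T, loadings P and residual E, with D = T * P^T + E
U, s, Vt = np.linalg.svd(D, full_matrices=False)
T = U[:, :nFactors] * s[:nFactors]   # scores, one row per frame
P = Vt[:nFactors].T                  # loadings, one row per pixel variable
E = D - T @ P.T                      # residual, same dimensions as D
```

The residual E is at the noise level once the number of factors matches the rank of the underlying motion structure.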
  • bilinear modelling may be implemented e.g. as weighted singular value decomposition based on the QR algorithm, with or without adaptive up- and downdating.
  • BLM bilinear modelling
  • PCA principal component analysis
  • the bilinear model may be incrementally updated, as described in the application titled “Method and apparatus for coordination of motion determination” mentioned before
  • the bilinear modelling may be performed after preprocessing consisting of subtracting the mean of each column. One may also mean center each row. These mean data must be added back upon reconstructions based on the bilinear model. More advanced preprocessing may also be used, such as multiplicative scatter correction (MSC) and its extensions, as described by Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK, which is hereby included by reference.
  • MSC multiplicative scatter correction
  • Bilinear model parameter estimation methods that involve smoothing of scores and loadings, or modifications of the individual data elements in data matrix D may also be used.
  • the bilinear model is to attain a compact representation of a large amount of input data.
  • the number of 'significant' factors used in the model T * P T should be low, i.e. the model should be of low rank.
  • This number of significant factors may be estimated in various ways, e.g. by cross validation or from the residuals and leverages after varying numbers of factors, as described by Martens & Naes 1989, which was mentioned before.
  • Previously defined or estimated loadings may be used as part of the modelling of data matrix D.
  • the scores for these a priori factors are estimated by projecting D on these pseudo-loadings, and the bilinear modelling is performed on the residuals after this projection (weighted regression).
  • the estimation of the scores for one individual frame may be based on linear regression, projecting d Rn on P ⁇ using some weighted or robustly reweighted least squares minimization.
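  • The linear-regression case can be sketched as follows; the orthonormal loadings and the noise level are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(2)
nPel, nFactors = 30, 2

# hypothetical orthonormal loadings P and a noisy change field d_Rn for one frame
P = np.linalg.qr(rng.normal(size=(nPel, nFactors)))[0]
t_true = np.array([1.5, -0.5])
d_Rn = P @ t_true + 0.01 * rng.normal(size=nPel)

# scores for this frame by ordinary least squares projection of d_Rn on P
t_n, *_ = np.linalg.lstsq(P, d_Rn, rcond=None)
```

A weighted or robustly reweighted variant would solve the same problem with per-pixel weights, down-weighting unreliable pixels.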
  • it may be based on nonlinear iterative curve fitting by e.g. SIMPLEX optimization (J.A. Nelder and R. Mead, 'A simplex method for function minimization', Computer Journal, vol. 7, p 308-313.)
  • the criterion may also be based on the decoding intensity error arising when using the scores.
  • the change information d Rn is represented e.g. as motion field DA Rn in the reference position, so that it is compatible with the bilinear loadings P, also represented in the reference position.
  • the change information may be represented in the position of the pixels in frame n, e.g. the reverse motion field DA nR, and projected on a compatible version of the loadings P, i.e. P temporarily moved to that same position using motion field DA Rn.
  • a low-dimensional bilinear model is primarily effective when the motion fields (or other change fields) are stored in D in one given representational system, so that all information about an object is stored in the same pixel positions for all frames.
  • IDLE codec type (according to WO 95/08240 and WO 95/34172), where the motion, intensity changes and other modelled change information for a number of (consecutive) frames is directly or indirectly represented relative to a common 'extended reference image model' for a given class of pixels (spatial 'holon') in a given set of related frames.
  • the whole initial reference image, e.g. the first frame in the sequence, may serve as one initial spatial holon.
  • the main purpose of the spatial segmentation is then to split this initial spatial holon into data structures that each lends itself to simple, low-dimensional mathematical modelling.
  • the motion field estimates can be performed directly from this reference image I R to frame I n, and analyzed directly in D.
  • One advantage of the present invention is hence to use the resulting compact, low-rank summary of several frames' motion fields and other change fields to enhance and stabilize the segmentation in video coding.
  • segmentation may be performed in the temporal domain to find groups of frames where certain spatial patterns are found.
  • the temporal segmentation may then utilize subspace information derived from bilinear modelling of time-shifted versions of relevant time series (H. Martens & M. Martens (1992) NIR spectroscopy - applied philosophy. Proceedings, 5th Internatl Conf. NIR Spectroscopy (K.I. Hildrum, ed.) North Holland, pp 1-10), e.g. time-shifted versions of the static frame scores obtained by bilinear modelling of change fields, as described for equation (1).
  • the bilinear summaries of multiple frames' motion fields may be used in several ways in the segmentation.
  • the order in which the frames are modelled is in the preferred embodiments forward and sequential. However, the order may also be selected according to other criteria, e.g. according to which frame at any given time shows the most need and potential for model refinement.
  • the bilinear model based segmentation may be used pyramidally.
  • One example of this is to perform the segmentation on frames in reduced resolution, in order to identify the major holons in the sequence, and then to use these results as preliminary, tentative input to the same process at higher frame resolution.
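  • A toy sketch of this coarse-to-fine idea (the field, the resolutions and the threshold are all hypothetical illustrations):

```python
import numpy as np

# hypothetical full-resolution map of where motion is detected,
# with one moving region (holon) in the lower-right corner
field = np.zeros((16, 16))
field[8:, 8:] = 1.0

# reduce resolution by 4x4 block averaging ...
coarse = field.reshape(4, 4, 4, 4).mean(axis=(1, 3))
# ... identify the major holon at low resolution ...
coarse_seg = coarse > 0.5
# ... and expand the result as tentative input at full resolution
tentative = np.kron(coarse_seg.astype(int), np.ones((4, 4), dtype=int)) > 0
```

The expanded mask is only a preliminary, tentative segmentation; it would be refined by the same process at the higher resolution.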
  • the motion estimation, bilinear modelling and segmentation may be performed on individual, already identified holons ('input holons'), or on complete, unsegmented images I n. In either case, a multi-holon pre- or post-processing of the obtained motion fields, bilinear models and segments is desired in order to resolve overlap between input holons.
  • One such pre- or post-processing is based on having stored each holon with a 'halo' of neighbour pixels with uncertain holon membership, i.e. pixels that only tentatively can be ascribed to a holon (and thus are also temporarily stored in other holons or as separate lists of unclear pixels).
  • Such tentative halo pixels are treated specially, e.g. by being fitted for all relevant holons, and their memberships to the different holons updated according to the success of the motion estimates.
  • Such halo pixels are given very low weight or fitted passively in the bilinear modelling (by Principal Component Regression, Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK, which is hereby included by reference).
  • Additional columns in the raw data matrix G may be formed from 'external scores' from other blocks of data.
  • Sources of such 'external scores' are: scores from bilinear modelling of some other data domain,
  • scores from other holons, preferably in nonlinear representation (see e.g. A. Gifi: Nonlinear Multivariate Analysis. J. Wiley & Sons Ltd, Chichester 1990, which is hereby included by reference), with each quantitative score vector quantized and analyzed as an indicator matrix (p. 67) or at the ordinal level (p. 187); or scores from the same holon at a different spatial resolution; or scores from external data such as sound.
  • weights for such additional variables must be adapted so that their uncertainty level become similar to those of the weighted pixels in the final data matrix to be modelled, D (equations (1 ) and (2)).
  • An alternative way to incorporate uncertain pixels or external scores gently, without forcing their information into the bilinear model, is to replace the one-block bilinear modelling with a two-block modelling, such as PLS regression (Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK), or with multi-block or N-way modelling, such as Parafac (Sharaf, M.A., Illman, D.L. and Kowalski, B.R. Chemometrics, J. Wiley & Sons, New York 1986) or Consensus PCA/PLS (described by Martens, M. and Martens, H. 1986: Partial Least Squares regression).
  • uncertain pixels and external scores then contribute positively to the modelling if they fit well, but do not affect the modelling strongly in a negative way if they do not fit. In any case these uncertain pixels and external scores are fitted to the obtained bilinear model.
  • the scores from the present holon's modelling in the present resolution and in the present domain may in turn be used as 'external scores' for other holons or at other resolutions or in other domains.
  • the stabilization of the segmentation by the use of multi-frame summaries may be embodied in different ways.
  • the bilinear segmentation process employs a top-down approach, removing segments from the input holon: pixel areas in the motion subspace that do not fit to a general holon model are detected as outliers and segmented out from the rest of the input holon.
  • the segmentation employs a bottom-up approach, attempting to let stable seed points grow into homogeneous, consistent segments in the input holon.
  • the segmentation is separated from the motion estimation and -compensation (Figure 5), with bilinear modelling of the frames' motion fields and other estimated change data taking place in between.
  • motion estimation and actual segmentation are integrated (Figure 6), followed by the bilinear modelling.
  • the motion estimation, bilinear modelling and segmentation process is done for a whole sequence.
  • the motion estimation, bilinear modelling and segmentation are performed, and the model is updated, gradually for individual frames.
  • the bilinear modelling methods used for segmentation are expanded to include additional criteria beyond just explained covariance; in this case spatial and temporal smoothing is used as extra criteria. Reweighting of the rows and columns of the input data to the bilinear modelling is also included.
  • the bilinear modelling is combined with optimal scaling so that not only the weights, but also the input data themselves are changed during the model estimation process: as long as the prediction of a data element from a preliminary low-rank bilinear model does not give significantly worse decoding results than the element's input value, its input value is replaced by its bilinear prediction.
  • Figure 5 shows the main building blocks of the bilinear model based segmentation: a motion estimator unit EstMov 520, a bilinear modelling unit EstBLM 540 and a segmentation unit EstSegm 560, and the data streams between them. More details on the data streams will be given in conjunction with the Third Preferred Embodiment.
  • the two first embodiments represent two implementations of the segmentation unit EstSeg 560: top-down or bottom-up.
  • the holon input to the EstSeg unit 560 is retained as intact as possible, but if the holon contains parts which display significantly and consistently different behaviour than the rest of the holon, then these parts are split out as separate, new segments.
  • individual pixels, e.g. along holon edges, whose preliminary classification appears questionable, may be pruned away or otherwise identified as unreliable outliers.
  • the method is described for one single frame and for detection of a stiffly moving body.
  • pixels that do not fit well to the spatial model favoured by the majority of the pixels will display significantly large relative residuals R and hence be weighted down so as to reduce their influence on the estimation of coefficients B in the next iteration, in which their residuals will be even larger, etc., so that they have very little influence on the final estimate of the spatial model B upon convergence.
  • Pixels with low final weights are defined as outliers not belonging to the input holon and collected into one new segment.
  • This new outlier segment may be submitted to the same reweighted regression modelling, to check if it should be split further into smaller segments.
  • the resulting segmentation then represents the output result 565.
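  • A compact numerical sketch of this reweighted-regression idea; the motion field, weight function and outlier threshold below are illustrative choices, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(3)
nv, nh = 10, 10
v, h = np.mgrid[0:nv, 0:nh]
X = np.column_stack([np.ones(nv * nh), v.ravel(), h.ravel()])  # affine regressors

# hypothetical vertical motion field: most pixels follow one affine motion,
# while a 3x3 corner block moves differently (the part to be split out)
outlier = (v.ravel() >= 7) & (h.ravel() >= 7)
y = 0.5 + 0.1 * v.ravel() + outlier * 5.0 + 0.01 * rng.normal(size=nv * nh)

w = np.ones(nv * nh)
for _ in range(10):
    # weighted least squares for the spatial coefficients B ...
    B, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    r = y - X @ B
    s = np.median(np.abs(r)) / 0.6745 + 1e-12   # robust residual scale
    # ... then weight down pixels with large relative residuals
    w = 1.0 / (1.0 + (r / (3.0 * s)) ** 2)

# pixels with low final weights are collected into one new candidate segment
new_segment = w < 0.1
```

The surviving coefficients B describe the dominant affine motion, and the low-weight pixels form the new outlier segment that may itself be re-analyzed.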
  • in the weights of the pixels in (740), effects of neighbouring pixels may also be brought in, to enhance spatial continuity of the holons.
  • the a priori weights (705) may be modified, e.g. by using lower initial weights for pixels already known to be potentially invalid due to occlusions or particularly uncertain due to unsatisfactory bilinear modelling.
  • the uncertainty measure s(pelj) for each individual element (pelj) in Y may have been estimated, and may be used instead of the general uncertainty for each pixel, s(pel), in (745).
  • This individual uncertainty measure may be asymmetric, so that positive and negative residuals may be assessed differently. This is relevant for motion estimates of pixels in flat intensity areas near intensity edges (asymmetric slack): the pixel may move far away from the edge without affecting the resulting intensity lack-of-fit, but cannot move beyond the edge.
  • the full-rank regression method employed in (710) may be replaced by other estimators, e.g. reduced-rank regression methods like PLS regression or some extension of this, as described in Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK.
  • Multi-frame segmentation: this basic top-down segmentation method can also be used for multi-frame segmentation. Instead of basing the segmentation only on motion fields of the holon for one single frame, the regressands Y may be defined to collect the change fields from several frames.
  • the regressands Y may also be defined as including intensity information, e.g. motion compensated intensity difference images.
  • DI Rn is the motion compensated intensity difference between frame n and the common reference frame R, defined for individual color dimensions (e.g. RGB) or as some summary like luminosity. Alternatively, loadings from bilinear intensity factors based on motion compensated intensity differences may be used.
  • Such intensity information may be used together with or separately from the motion information.
  • the columns of Y should be scaled to reflect their relative desired impact on the segmentation, e.g. inversely to their relative average estimated uncertainty variances.
  • the spatial structure model around which the residuals F in (703) are computed may be of another type than the one employed in (702).
  • X may also contain square and cross product terms of the addresses v and h (Lancaster, P. and Salkauskas, K. (1986) Curve and surface fitting. Academic Press, p. 133, which is hereby incorporated by reference).
  • Splines or piecewise polynomials may also be used (Lancaster & Salkauskas 1986, p. 245, which is hereby incorporated by reference).
  • Such higher order models can help distinguish between outlier pixels and pixels taking part in major, smoothly structured motions that are not affine transformations.
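  • The effect of the extra second-order terms can be sketched with a hypothetical smooth, non-affine motion pattern (the grid size and coefficients are illustrative):

```python
import numpy as np

nv, nh = 8, 8
v, h = np.mgrid[0:nv, 0:nh]
v, h = v.ravel().astype(float), h.ravel().astype(float)

# affine regressors plus square and cross-product terms of the addresses
X = np.column_stack([np.ones_like(v), v, h, v * v, h * h, v * h])

# a smooth but non-affine (quadratic) motion pattern
y = 0.05 * (v - 3.5) ** 2

B_aff, *_ = np.linalg.lstsq(X[:, :3], y, rcond=None)   # affine model only
B_qud, *_ = np.linalg.lstsq(X, y, rcond=None)          # with 2nd-order terms
r_aff = np.abs(y - X[:, :3] @ B_aff).max()             # large residual
r_qud = np.abs(y - X @ B_qud).max()                    # near zero residual
```

Under the affine model alone, every pixel of this smooth motion would look like an outlier; the quadratic terms absorb it instead.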
  • X may also contain a spatially autoregressive part, with spatially shifted versions of Y included in X, and where a rank-reducing regression method such as PLS regression is used (H. Martens & M. Martens (1992) NIR spectroscopy - applied philosophy. Proceedings, 5th Internatl Conf. NIR Spectroscopy (K.I. Hildrum, ed.) North Holland, pp 1-10).
  • This spatial autoregressive model part allows the distinction between, on one hand, outlier pixels that should be weighted down, and on the other hand, pixels that take part in major, smooth motions which are neither of affine transformation structure nor well described by spatial polynomial structure within the holon.
  • Additional information may be introduced in order to optimize the precise positioning of the segment borders.
  • One such source of information is intensity edges in the reference image I R, as detected e.g. by a Sobel filter (J.C. Russ: The Image Processing Handbook, 2nd ed., IEEE Press 1995, which is incorporated herein by reference). If the relative weights W 740 after spatial modelling of Y indicate a segment border close to such an intensity edge, then the segment border is moved to this intensity edge.
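  • A self-contained sketch of Sobel edge detection on a hypothetical reference image with a single vertical step edge (the image and its size are illustrative):

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via the 3x3 Sobel operator, with edge padding."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode='edge')
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    for i in range(3):
        for j in range(3):
            block = pad[i:i + img.shape[0], j:j + img.shape[1]]
            gx += kx[i, j] * block
            gy += ky[i, j] * block
    return np.hypot(gx, gy)

# hypothetical reference image: step intensity edge between columns 4 and 5
I_R = np.zeros((8, 10))
I_R[:, 5:] = 1.0
mag = sobel_magnitude(I_R)
```

The gradient magnitude peaks at the columns flanking the step; a nearby tentative segment border would be snapped to this location.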
  • More advanced statistical methods may also be used for determining the segment edges.
  • One example of this was described by Kok F. Lai ('Deformable contours: Modelling, extraction, detection and classification', PhD thesis, University of Wisconsin-Madison 1995, which is incorporated herein by reference); for the present application the input information may be intensity I R, intensity residuals DI Rn (or BLM summaries of these), spatial residuals F 720, R 730 or model weights W 740, and/or the Y data themselves.
  • the Second Preferred Embodiment represents a bottom-up approach to segmentation of the input holon. It consists of performing a cluster analysis of the multi- frame motion data or their bilinear summaries.
  • clustering techniques may be used for finding groups of pels.
  • the choice of clustering criterion and clustering algorithm defines the statistical properties of the clusters. For instance, there is a choice between performing the cluster analysis for each motion direction (vertical, horizontal, depth) separately, or analysing the directions jointly. The latter is the preferred implementation (but the depth dimension may be lacking, at least initially in an encoding process).
  • Two main groups of clustering techniques can be used: Cluster analyses which do not make spatial assumptions with respect to parameter smoothness or neighbourship in the image plane, and cluster analyses which do.
  • the goal is now to find clusters of pixels which display motion patterns that are similar at least in some of the factor dimensions in P, i.e. pixels that display similar motion patterns in at least some significant dimensions.
  • Spatiotemporal distances can be computed on the basis of common or weighted Pythagorean distance measures, as well as normalized (Mahalanobis) distances.
  • One approach is standard non-hierarchical cluster analytic techniques (Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York; Gudersen, Bob (1983) An adaptive fcv cluster algorithm).
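  • As a minimal sketch of such non-spatial clustering, a plain k-means on the per-pixel rows of the loading matrix P (the two pixel groups and their motion patterns are synthetic assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
nFactors, k = 2, 2

# hypothetical per-pixel rows of the loading matrix P: two pixel groups
# with clearly different motion patterns over the modelled frames
P = np.vstack([rng.normal([2.0, 0.0], 0.1, size=(100, nFactors)),
               rng.normal([-2.0, 1.0], 0.1, size=(100, nFactors))])

# plain (non-spatial) k-means on the rows of P, farthest-point initialisation
c0 = P[0]
c1 = P[np.argmax(((P - c0) ** 2).sum(axis=1))]
centers = np.vstack([c0, c1])
for _ in range(20):
    # assign each pixel to its nearest cluster center ...
    labels = np.argmin(((P[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
    # ... and move each center to the mean of its assigned pixels
    centers = np.vstack([P[labels == j].mean(axis=0) for j in range(k)])
```

Each resulting label group is a candidate segment of pixels with similar motion patterns across the modelled frames.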
  • the cluster analysis searches for spatially related pixels with similar motion patterns.
  • Boyer et al. 1994 published a method of image segmentation which makes extensive - but not exclusive - use of spatial continuity (Boyer, K.L., Mirza, M.J. and Ganguly, G. (1994) The robust sequential estimator: A general approach and its application to surface organization in range data. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 10, Oct. 1994, pp. 987-1001, which is hereby included by reference).
  • An embodiment of the present invention is to extend their method from one frame (a single radar image Z), measured in one dimension (distance), to employ information from several frames and measurements in several dimensions (vertical motion, horizontal motion and other possible characteristics, see below).
  • Y is defined according to (701, 760 or 770) as motion information from several frames.
  • intensity information (775) may be included as well, as described for the third preferred embodiment.
  • top-down and bottom-up approach to the multiframe segmentation may be combined.
  • One example of this is to first perform a top-down segmentation analysis of the input holon in order to identify outlier regions that do not fit the majority or dominant structure within the holon.
  • the next two preferred embodiments distinguish between two ways of combining motion estimation and segmentation.
  • the resulting motion estimate DA Rn 525 is passed to the bilinear modelling unit EstBLM 540.
  • the resulting bilinear model parameters 545 are passed to the segmentation unit EstSegm 560, which yields segmentation results 565.
  • the EstMov operator 520 may contain facilities for detecting internal preliminary segmentation indicators such as edges in I R or I n, as well as analysis of depth and spatial innovation; these may be used in order to enhance the motion estimate DA Rn 525 (e.g. by not blurring the motion fields across apparent preliminary segment borders), and are passed along with the motion estimate to the other units.
  • the bilinear model parameters 545 primarily consist of parameters modelling motion data DA Rn and their uncertainties, but may optionally also include parameters concerning motion compensated intensity changes DI Rn etc.
  • the motion estimator EstMov 520 employs previously established, preliminary segment information 521 in order to optimize handling of edges, occlusions and depth: Motion fields are not to be smoothed across reliable preliminary segment borders.
  • EstMov 520 also uses previously established bilinear modelling results 522 in order to stabilize the motion estimation against underdeterminedness, ambiguity and noise sensitivity. This preliminary information 521 and 522 has been obtained in units EstSeg 560 and EstBLM 540, respectively, for previous frames, other pyramidal frame resolutions or previous iterations.
  • In the bilinear modelling unit EstBLM 540, bilinear models are developed separately for each preliminary segment (holon), based on preliminary segment information 521.
  • information from other related holons, and from pixels whose holon membership is unclear, may be treated so as not to affect the bilinear modelling detrimentally (as extra X-variables with low weight, in PCA-like bilinear modelling of a single block of variables X, or as Y-variables in PLS2- or PCR-like bilinear modelling of two blocks, X and Y).
  • Motion estimation, depth assessment and segmentation are interdependent processes that optimally should be treated in an integrated way.
  • the motion estimation and the segmentation operators were coordinated through feedback of preliminary bilinear results 521 ,522
  • the operators are fully integrated
  • the bilinear estimation may be done with less computer power in this case, since it operates on more fully isolated segments individually
  • input data 605 and preliminary segmentation and bilinear modelling results 623 are input to EstMov/EstSeg 620, which delivers motion fields DARn with their estimated uncertainties, occlusions etc. 625 and segment information 665 about the holons found.
  • in EstBLM 640, a bilinear model is then updated for each holon separately.
  • inter-holon relations and pixels with unclear holon classification may tentatively be weighted down or defined as Y-variables, as described in the third preferred embodiment.
  • motion estimation and segmentation may also be seen as integrated parts of the bilinear model estimation.
  • in EstBLM 540, 640, the estimation process does not have to be run until convergence or full stability, as in conventional singular value decomposition or eigenvalue decomposition. It is sufficient that the subspace obtained for each holon 545 improves the next round of motion estimation and segmentation.
  • the modification of the input data to EstBLM through improved motion estimation and segmentation may be treated as part of the bilinear estimation process.
  • the whole sequence is subjected to motion estimation; then these motion estimates for the whole sequence are submitted to bilinear modelling.
  • the bilinear model or models of the holons in the sequence are used for segmentation.
  • the bilinear model results 522 and segmentation 521 results from previous iterations (or pyramidal resolution levels) are fed back into the motion estimation 520 and bilinear modelling 540 in order to stabilize and facilitate these estimation processes
  • the modelling of the whole sequence may then be repeated, using the updated bilinear motion and intensity change models and the updated segmentation.
  • the segmentation 565 may likewise be updated after each frame. In the preferred embodiment major re-segmentation is only allowed when the motion data clearly shows the need for it, except for pruning processes along holon edges.
  • the order in which the individual frames are pulled into the modelling and segmentation may not be fixed. Once all the frames have been modelled and segmented, the whole process may be started again for the sequence, now with better start values for the bilinear model and segmentation.
  • estimated uncertainties in the estimated segment borders are stored with the segment border information itself and used in subsequent encoding steps. Pixels with unclear segment classification, e.g. in a region around the chosen segmentation border, are treated as having high uncertainty. In subsequent motion estimation and bilinear modelling the uncertain pixels are given low weights or passively fitted by principal component regression (Martens & Naes 1989), mentioned before. In subsequent segmentations, uncertain pixels are included in the new segmentation estimation, but are given low a priori input weights (705).
  • each new iteration is defined as follows:
  • the smoothing method may be a simple boxcar filter, or a method that seeks to avoid smoothing across apparent edges that should be left unsmoothed.
  • the smoothed loading pa is orthogonalized with respect to the loadings of previously estimated factors [p1, p2, ..., pa-1].
  • a further enhancement of the bilinear modelling in this preferred embodiment is to use an iterative reweighted least squares fit of the bilinear model to the data, in order to reduce the effect of outlier frames or outlier pixels:
  • the weights for rows, Vframes, and columns, Vpels, in eq. (3) may iteratively be updated according to the inverse of updated estimates of the average uncertainty standard deviations for the rows and columns, based on corrected residuals from the low-rank bilinear model in the previous iteration.
  • in the bilinear modelling 540, 640, not only the bilinear model parameters may be changed to attain better modelling, but also the values in the input data, e.g. DARn, may be changed in the bilinear model parameter estimation process.
  • the individual elements dan,pel in the motion data, for frames n and pixels pel, may iteratively be modified to adhere more closely to the model derived from other frames or pixels.
  • segmentation / clustering techniques can be used for determining groups of frames (sequences) which are sensible to analyze together - as well as detecting scene shifts.
  • one embodiment of this is to perform simple bilinear modelling of (possibly subsampled) frame intensities, and to perform non-hierarchical cluster analysis in the score space T to find clusters of frames that have much image material in common.
  • a robust single linkage cluster analysis is the preferred cluster method in this embodiment, in order to be able to follow motions within a scene within one single cluster.
  • Yet another embodiment is to apply the above principles to time series data, e.g. sound data, in order to define temporal segments.
  • the spatial motion field data correspond to time warp estimates
  • the spatial intensity change data correspond to temporal intensity change data.
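The loading smoothing and orthogonalization steps listed above can be illustrated with a short sketch; the sizes, the random data and the boxcar width are assumptions for illustration, not taken from the invention:

```python
import numpy as np

# Sketch: boxcar-smooth a new loading p_a, then orthogonalize it against
# the previously estimated loadings p_1..p_{a-1} (Gram-Schmidt step).
rng = np.random.default_rng(4)
n_pels = 30
# Previously estimated loadings, taken here as two orthonormal columns
P_prev = np.linalg.qr(rng.normal(size=(n_pels, 2)))[0]
p_a = rng.normal(size=n_pels)

# Simple boxcar filter (width 3); an edge-preserving filter could be
# substituted to avoid smoothing across apparent segment edges
kernel = np.ones(3) / 3.0
p_a = np.convolve(p_a, kernel, mode="same")

# Orthogonalize against the earlier loadings
p_a = p_a - P_prev @ (P_prev.T @ p_a)

print(np.abs(P_prev.T @ p_a).max())  # effectively zero
```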


Abstract

The present invention concerns methods for attaining grouping or segmentation in large signal streams by development and analysis of a manifold or subspace of parameters. A method for segmenting samples in frames of an input signal comprises the steps of: (1) forming a reference image, consisting of samples from a plurality of said frames, (2) estimating motion from said reference image to each of said frames, (3) reformatting the estimated motion into row vectors, (4) collecting said row vectors into a motion matrix, (5) performing a Principal Component Analysis on the motion matrix, thereby obtaining a score matrix consisting of a plurality of column vectors called score vectors and a loading matrix consisting of a plurality of row vectors called loading vectors, such that each score vector corresponds to one element for each frame, such that each element of each loading vector corresponds to one element of the reference image, such that one column of said score matrix and one loading vector together constitute a factor, and such that the number of factors is lower than or equal to the number of said frames, (6) reformatting each loading back into the same format as used for the motion, (7) performing segmentation based on the plurality of reformatted loadings.

Description

Method and apparatus for multi-frame based segmentation of data streams
Related Applications
This application is related to the following applications assigned to the same applicant as the present invention and filed on even date therewith, the disclosures of which are hereby incorporated by reference:
- Method and apparatus for coordination of motion determination over multiple frames (Attorney file: IDT 011 WO)
- Method and apparatus for depth modelling and providing depth information of moving objects (Attorney file IDT 015 WO)
Field of invention
The present invention concerns methods for attaining grouping or segmentation in large signal streams by development and analysis of a manifold or subspace of parameters. In particular, it concerns methods for spatiotemporal segmentation of video signals.
Background
Mathematical parameterization of massive signal streams such as video or sound signals presents both statistical estimation problems and computational capacity problems. Segmentation of the signal streams can reduce both of these problems:
Firstly, by splitting the signal streams into two or more sub-groups or segments of particularly interrelated signals, the subsequent mathematical description of the data may require fewer independent parameters. This simplifies the statistical modelling. Secondly, by being more compact, each of the segment models may also be easier to control and interpret, e.g. for editing purposes.
Thirdly, having segmented the signal streams, subsequent computational treatment of the individual segments may be computationally simpler than having to treat the full signal streams, for instance by reducing the amount of high-speed memory required for effective computation.
In order to attain these statistical and computational benefits from having segmented the data stream, the segmentation process itself must also be statistically and computationally effective. The present invention concerns how to attain relevant and reliable segmentation.
While the present invention can also be applied to other types of signals such as sound signals, multi-frame digital video data is a primary application and will be used as the example for illustration.
The use of segments in video coding
In model based video coding, image segmentation is important: groups of pixels (pels) that display consistently related spatial patterns of change over groups of frames should be modelled together, as this gives best compression, editability and interpretability.
A segment may correspond to one physical object, but it may also correspond to only a part of a physical object, or to a set of several such objects. It may also correspond to non-tangible objects or phenomena such as shadows.
In statistically oriented model based video coding the optimal definition of a segment (a 'holon') differs with the purpose of the coding: For pure compression purposes the segments ideally correspond to the groups of pels that are most effectively compressed, but if the purpose is coding for later video manipulation such as editing or video games, then the segments ideally correspond more to physical objects.
The segmentation process must be robust, i.e. it must give visibly acceptable, statistically useful segments with applicability over many related image frames. Yet, it must be computationally feasible with respect to cpu time and memory requirements.
Some existing segmentation methods are described in the following references:
Boyer, K.L., Mirza, M.J. and Ganguly, G. (1994) The robust sequential estimator: A general approach and its application to surface organization in range data. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 10, Oct. 1994, pp. 987-1001,
Günsel, B. and Panayirci, E. (1994) Segmentation of range and intensity images using multiscale Markov random field representation. Proceedings, IEEE Intl. Conf. on Image Proc., Austin, Texas, Nov. 13-16 1994, Vol. II, pp. 187-191, IEEE Computer Soc. Press, Los Alamitos, CA, U.S.A.,
Dellepiane, S., Fontana, F. and Vernazza, G. (1994) A robust non-iterative method for image labelling using context. Proceedings, IEEE Intl. Conf. on Image Proc., Austin, Texas, Nov. 13-16 1994, Vol. II, pp. 207-211, IEEE Computer Soc. Press, Los Alamitos, CA, U.S.A.,
and
Russ, J.C. (1995) The Image Processing Handbook, 2nd ed., IEEE Press/CRC Press, London, pp. 347-401, which are hereby included by reference.
The segmentation methods for video coding fall mainly into two categories: still image segmentation and motion based segmentation. Still image segmentation is based on defining spatial intensity patterns in an individual image. A drawback with this type of segmentation is that it is difficult to distinguish between spatial contours inside and along the edges of objects.
Motion based segmentation concerns how the image intensities vary between images. In automatic video coding the segmentation is often based on the latter, and is attained by analysis of estimates of motion fields. One established approach to segmentation is to estimate the motion field between two frames, say from a reference frame R and another frame n (here termed 'Difference in Address', DARn), and to search for groups of pels in DA with similar motions. In addition, a premium is put on pels that lie physically close to each other in at least one of the images. DARn may have one, two or more motion dimensions.
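As an illustration of this grouping idea, the following sketch clusters per-pel motion vectors with a minimal 2-means loop; the motion values and the choice of two clusters are assumptions for illustration, and a real implementation would also reward spatial closeness as noted above:

```python
import numpy as np

# Sketch: pels with similar (dv, dh) motion vectors in DARn are grouped
# together. A tiny hand-rolled 2-means pass, purely illustrative.
motion = np.array([[1.0, 0.0], [1.1, 0.1], [0.9, -0.1],    # pels moving right
                   [-2.0, 3.0], [-2.1, 2.9], [-1.9, 3.1]])  # pels moving up-left

centers = motion[[0, 3]].copy()          # seed one center per apparent group
for _ in range(10):
    # Assign each pel to the nearest center in motion space
    d = np.linalg.norm(motion[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    # Re-estimate each center as the mean motion of its group
    for k in range(2):
        centers[k] = motion[labels == k].mean(axis=0)

print(labels)
```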
Motion based segmentation can be generalized to change-based segmentation, where the changes could also include 'Difference in Intensity', i.e. intensity changes between pairs of frames, DIRn, e.g. motion compensated and at different color channels.
In situations where the segments are going to be used for many frames it is unsatisfactory to base the segmentation on information from only one single frame or one single frame pair, due to the phenomenon of statistical overfitting, and because the chosen frames may not be sufficiently representative for the rest of the frames in question. While the segmentation may appear quite good for the actual frame or frame pair used in the segmentation, the obtained segments may represent very bad grouping of other frames.
To find statistically valid segments that have bearing for many frames, it is necessary to search a good many of these frames, say 5-50, for statistically related clusters of pixels (pels). This has drawbacks: performing the segmentation on each of the frames separately requires the different frames' segmentation results to be coordinated afterwards, and each frame's segmentation is sensitive to noise in this frame's input data. Also, storing the motion fields for many individual frames may require much memory, and performing the segmentation analysis for all these motion fields simultaneously may be computationally expensive.
Objects of the invention
It is an object of the present invention to facilitate the finding of grouping of signals in a signal stream such that the groups or segments have high statistical robustness and high validity for a number of frames of signal.
It is a further object to perform the multi-frame segmentation in a computationally effective way that keeps down the amount of data needed in the segmentation.
It is a further object to ensure that the segmentation can be recursively up- and downdated.
It is also an objective to be able to use different types of information in the segmentation - both temporal motion and intensity change information as well as spatial continuity and discontinuity information.
It is further an object of the present invention to estimate segment information that facilitates subsequent motion estimation and intensity change estimation, and the bilinear modelling of these.
It is a further object of the invention to allow segments to be partially overlapping.
It is a further object of the invention to allow segments to be partially transparent.
It is also an object to define the segments so that they have an adequate balance between internal systematic similarity and rigidity on one hand (for statistical stabilization) and internal heterogeneity and flexibility on the other (for realistic description of input data).
Summary of the invention
In the present invention, segmentation is based on change information for a number of relevant frames, not only on changes between two frames. Thereby the segments obtained are statistically more stable and have a higher validity.
The way the many change data are represented in the segmentation computations is preferably via common manifolds or subspace models, primarily by bilinear modelling based on a common reference position. This further enhances the statistical precision and validity of the segmentation, since some noise-based and other non-significant change types can be ignored. It also reduces the computational complexity of the segmentation work by reducing the dimensionality of the change data needed to be analyzed in the segmentation.
The subspace segmentation can be recursively updated in cases when the subspace representation itself is recursively updated, this gives computational advantages.
The change information used in the segmentation can be of various kinds, e.g. motion information and intensity change information.
The invention can be applied to signal streams in general. In particular it can be applied to spatio-temporal segmentation of digital video signals and temporal segmentation of digital sound data.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates how a frame-size (with nv x nh pixels) motion field in one motion direction (here DVRn, the vertical motion for each pixel for moving (warping) image R to approximate image n) can be strung out as a one-dimensional vector (with nv*nh elements).
Figure 2 illustrates how two frame-size (nv x nh pixels each) motion fields DARn = [DVRn, DHRn] (for the vertical and horizontal directions) can be strung out together as a one-dimensional vector (with 2*nv*nh elements), for the case when both motion directions are modelled simultaneously. More dimensions, e.g. depth changes, may similarly be included in DARn.
Figure 3 is an illustration of how a matrix D (e.g. motion fields DARn for many frames n=1,2,...) can be modelled by the bilinear product of two lower-rank matrices T*Pᵀ plus a residual matrix E.
Figure 4 illustrates the parameters from Figure 3 pertaining to one single frame.
Figure 5 shows the third preferred embodiment, in which motion estimation and segmentation are performed separately.
Figure 6 shows the fourth preferred embodiment, in which motion estimation and segmentation are performed simultaneously.
DESCRIPTION
Notation and definitions
In the following, the symbol '*' is used for multiplication when needed. The symbol 'x' is used for representing dimensions of a matrix (e.g. Size = nRows x nColumns). Boldface uppercase letters are used for representing data matrices, and boldface lowercase letters for data vectors.
Extracting bi-linear summaries of many motion fields
Some background information for the present invention is given in patent applications WO 95/08240 and WO 95/34172. Additional information on coordination between multi-frame segmentation, motion estimation and bilinear modelling is given in the application "Method and apparatus for coordination of motion determination over multiple frames", mentioned before. Information about estimation of depth from segment occlusions over several frames is given in the aforementioned application "Method and apparatus for depth modelling and providing depth information of moving objects".
A motion field describes how the pels in one image, say reference frame R, should be moved in order to approximate another image, say n. Such a motion field can be seen as an 'image' in its own right, with values for each motion dimension,- e.g. one image DHRn for horizontal motion (zero values = no horizontal motion, negative values = left motion, positive values = right motion) and one image DVRn for vertical motion (zero values = no vertical motion, negative values = upward motion, positive values = downward motion).
Each motion field image, e.g. DVRn, can be strung out as a one-dimensional vector dn with nPel elements - one element for each pel in the reference image in which the motion information is given, as illustrated in Figure 1.
The different motion dimensions may be strung after one another in one and the same vector, which then has a multiple of nPel elements, as illustrated in Figure 2.
When a set of such motion field vectors has been estimated for several frames, dn, n=1,2,...,nFrames, they can be analyzed together as a matrix D.
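As an illustration, the stringing-out and stacking of Figures 1-3 can be sketched as follows; the frame sizes and random motion values are assumptions for illustration only:

```python
import numpy as np

# Sketch: string per-frame motion fields into row vectors and stack them
# into the matrix D (one row per frame, one column per pel variable).
nv, nh, n_frames = 4, 5, 3          # frame height, width, number of frames
rng = np.random.default_rng(0)

rows = []
for n in range(n_frames):
    dv = rng.normal(size=(nv, nh))  # vertical motion field DVRn
    dh = rng.normal(size=(nv, nh))  # horizontal motion field DHRn
    # Figure 2: both directions strung out into one vector of 2*nv*nh elements
    rows.append(np.concatenate([dv.ravel(), dh.ravel()]))

D = np.vstack(rows)
print(D.shape)                      # (3, 40) == (n_frames, 2*nv*nh)
```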
Bilinear modelling (BLM) is well established as a method for approximating sets of related vectors (Figure 3). The bilinear factor model can be written as a bilinear matrix product plus a residual (conf. Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester, UK, which is hereby included by reference):
D = T * Pᵀ + E (1)
where
D is the data to be modelled; it has one row for each frame modelled and one column for each pixel (pel) variable to be modelled simultaneously (e.g. one horizontal and one vertical motion element for each pixel).
T is the so-called temporal scores for the bilinear factors; it has one row for each frame modelled and one column for each bilinear factor modelled, f=1,2,...,nf.
Pᵀ is the so-called spatial loadings for the bilinear factors; it has one column for each pel variable to be modelled simultaneously and one row for each bilinear factor modelled, f=1,2,...,nf. The superscript ᵀ means 'transposed'.
E represents the error or unmodelled residual - with the same matrix dimensions as D.
For the motion field between frame R and a given frame n, the bilinear model (Figure 4) is written:
dRn = tn * Pᵀ + en (2)
When the motion fields DARn, n=1,2,..., or modifications of these, from a set or subset of frames are defined as data D, the loading subspace Pᵀ, spanning the most significant row space of D, represents the motion information more or less common to several frames in the sequence. The score vectors tn for frames n=1,2,..., estimated for each frame separately or for many frames jointly (see below), serve to convey this common motion information Pᵀ back to each individual frame pair.
A number of different methods may be used for bilinear model extraction of T*Pᵀ from D in the present context of multi-frame segmentation, e.g. weighted singular value decomposition based on the QR algorithm, with or without adaptive up- and downdating. Collectively, they are hereafter referred to as bilinear modelling (BLM) or principal component analysis (PCA).
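One minimal sketch of such an extraction, using a plain truncated singular value decomposition (one of the simplest of the methods referred to above), is given below; the data sizes, noise level and number of factors are assumptions for illustration:

```python
import numpy as np

# Sketch of bilinear model extraction D ≈ T * Pᵀ (eq. 1) by truncated SVD.
rng = np.random.default_rng(1)
n_frames, n_pels, nf = 10, 50, 2

# Synthetic low-rank "motion data" plus a little noise
T_true = rng.normal(size=(n_frames, nf))
P_true = rng.normal(size=(n_pels, nf))
D = T_true @ P_true.T + 0.01 * rng.normal(size=(n_frames, n_pels))

U, s, Vt = np.linalg.svd(D, full_matrices=False)
T = U[:, :nf] * s[:nf]     # temporal scores, one row per frame
P = Vt[:nf].T              # spatial loadings, one column of P per factor
E = D - T @ P.T            # residual, same dimensions as D

print(np.abs(E).max())     # small: only the added noise remains unmodelled
```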
Details on bilinear modelling methods are given by:
Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester, UK; by Martens, M. and Martens, H. (1986) Partial Least Squares regression. In: Statistical Procedures in Food Research (J.R. Piggott, ed.), Elsevier Applied Sciences, London, pp. 293-360; by Jackson, J.E. (1991) A User's Guide to Principal Components. J. Wiley & Sons, Inc., New York; by Jolliffe, I.T. (1986) Principal Component Analysis. Springer Series in Statistics, Springer-Verlag, New York; by Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York; by Sharaf, M.A., Illman, D.L. and Kowalski, B.R. (1986) Chemometrics. J. Wiley & Sons, New York; and by Kung, S.Y., Diamantaras, K.I. and Tauer, J.S. (1991) Neural Networks for extracting pure/constrained/oriented principal components. In: R. Vaccaro (ed.): SVD and Signal Processing II. Elsevier Science Publishers 1991, pp. 57-81. These references are hereby included by reference.
It is important to note that the bilinear modelling does not have to fully converge or to be optimized with respect to orthogonality, separation of eigenvalues etc. for the present purpose; the important thing is to find an adequate subspace basis for the approximation of the data D.
The bilinear model may be incrementally updated, as described in the application titled "Method and apparatus for coordination of motion determination over multiple frames", mentioned before.
The bilinear modelling may be performed after preprocessing consisting of subtracting the mean of each column. One may also mean-center each row. These mean data must be added back upon reconstruction based on the bilinear model. More advanced preprocessing may also be used, such as multiplicative scatter correction (MSC) and its extensions, as described by Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester, UK, which is hereby included by reference. Bilinear model parameter estimation methods that involve smoothing of scores and loadings, or modifications of the individual data elements in data matrix D, may also be used.
If information is available on the relative reliability or validity of the various pels in the various frames, this information may be used to weigh the relative importance of the different input data: The bilinear extraction of factors may then be performed on weighted data: Let G = the motion fields DARn, n=1,2,... (possibly after column centering), or modifications of these, from a set or subset of frame pairs, and let
D = Vframes * G * Vpels (3)
where Vframes = weight matrix of the frames, e.g. diag(1/sn, n=1,2,...), where sn = estimate of the uncertainty standard deviation of frame n, and Vpels = weight matrix of the pixels, e.g. diag(1/spel, pel=1,2,...), where spel = estimate of the uncertainty standard deviation of pixel pel.
In this way, pixels (columns in G) with high uncertainty may be weighted down, but still modelled together with the other, more certain pixels. Alternatively, such separation of certain and uncertain pixels may be attained by two-block instead of one-block BLM: Uncertain pixels are kept out of the bilinear modelling of the more certain pixels, and the uncertain pixels' loadings are estimated by regression on the certain pixels' scores, as described for principal component regression (PCR) and partial least squares regression (PLSR) by Martens & Naes 1989, mentioned before. This is applicable also in re-allocating pixels from one segment to another, where the re-allocated pixels' loadings need to be estimated relative to their new segment allocation.
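The weighting of equation (3) can be sketched as follows; the data and the uncertainty estimates are assumed values for illustration:

```python
import numpy as np

# Sketch of eq. (3): down-weighting uncertain frames and pixels before
# the bilinear extraction, so they contribute little to the factors.
rng = np.random.default_rng(2)
G = rng.normal(size=(6, 8))               # centred motion data, frames x pixels
s_frame = np.ones(6); s_frame[5] = 10.0   # frame 5 is assumed very uncertain
s_pel = np.ones(8);   s_pel[3] = 10.0     # pixel 3 is assumed very uncertain

V_frames = np.diag(1.0 / s_frame)
V_pels = np.diag(1.0 / s_pel)
D = V_frames @ G @ V_pels                 # weighted data, eq. (3)

# The uncertain frame's row now has much smaller magnitude than in G
print(np.abs(D[5]).max() < np.abs(G[5]).max())
```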
One main purpose of the bilinear model is to attain a compact representation of a large amount of input data. To attain this, the number of 'significant' factors used in the model T*Pᵀ should be low, i.e. the model should be of low row rank. This number of significant factors may be estimated in various ways, e.g. by cross validation or from the residuals and leverages after varying numbers of factors, as described by Martens & Naes 1989, which was mentioned before.
Previously defined or estimated loadings ('pseudo-loadings') may be used as part of the modelling of data matrix D. In that case, the scores for these a priori factors are estimated by projecting D on these pseudo-loadings, and the bilinear modelling is performed on the residuals after this projection (weighted regression).
The estimation of the scores for one individual frame may be based on linear regression, projecting dRn on Pᵀ using some weighted or robustly reweighted least squares minimization. Alternatively, it may be based on nonlinear iterative curve fitting by e.g. SIMPLEX optimization (J.A. Nelder and R. Mead, 'A simplex method for function minimization', Computer Journal, vol. 7, pp. 308-313). In that case the criterion may also be based on the decoding intensity error arising when using the scores.
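The linear regression variant of this score estimation can be sketched as follows, assuming (for simplicity) orthonormal loadings and synthetic data:

```python
import numpy as np

# Sketch: estimate the scores t_n for one frame by least-squares
# projection of its motion-field vector d_Rn onto the loadings P (eq. 2).
rng = np.random.default_rng(3)
n_pels, nf = 50, 2
P = np.linalg.qr(rng.normal(size=(n_pels, nf)))[0]   # orthonormal loadings

t_true = np.array([1.5, -0.7])
d_Rn = P @ t_true + 0.001 * rng.normal(size=n_pels)  # noisy motion field

# t_n = argmin || d_Rn - P t ||^2
t_n, *_ = np.linalg.lstsq(P, d_Rn, rcond=None)
print(t_n)   # close to t_true
```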
As described here, the change information dRn is represented, e.g., as the motion field DARn in the reference position, so that it is compatible with the bilinear loadings P, also represented in the reference position. Alternatively, the change information may be represented in the position of the pixels in frame n, e.g. the reverse motion field DAnR, and projected on a compatible version of the loadings P, i.e. P temporarily moved to that same position using the motion field DARn.
Representing the spatial information from many frames in one common image position
The effectiveness of the bilinear model of motion is dependent on how the motion fields are represented. When a rigid body moves (translation, rotation, scaling) in 3D space before a camera, the corresponding motion field can be described by a low-dimensional bilinear model. Systematic movements of a non-rigid body (e.g. a face breaking into a smile) can also be approximated by the bilinear model.
However, a low-dimensional bilinear model is primarily effective when the motion fields (or other change fields) are stored in D in one given representational system, so that all information about an object is stored in the same pixel positions for all frames. This can be attained by letting the motions of each of the frames in a set of related frames be related to one given 'reference image' R, and stored in this reference image's coordinate system. This reference image could e.g. be the first, middle or last image in the sequence n=1,2,...,N, or some synthetic image model with parts from several frames.
One example of this is the IDLE codec type (according to WO 95/08240 and WO 95/34172), where the motion, intensity changes and other modelled change information for a number of (consecutive) frames is directly or indirectly represented relative to a common 'extended reference image model' for a given class of pixels (spatial 'holon') in a given set of related frames. Before the segmentation has started, the whole initial reference image (e.g. the first frame in the sequence) is regarded as one holon. The main purpose of the spatial segmentation is then to split this initial spatial holon into data structures that each lends itself to simple, low-dimensional mathematical modelling.
The motion field estimates can be performed directly from this reference image lR to frame ln, and analyzed directly in D. Alternatively, the motion fields may be estimated from a moved (warped) version of this reference image: lm = Move(lR by DARm) to ln; the local motion field DAmn is estimated and then moved back to the reference position, e.g. by the inverse of the motion field that generated lm, yielding the motion estimate DARn = DARm + Move(DAmn by DAmR).
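The composition of motion fields described above can be sketched in one dimension with nearest-neighbour warping; the integer displacement fields, the simple Move operator and the assumed invertibility of DARm are illustrative assumptions, not the patent's own implementation:

```python
import numpy as np

# Sketch (1-D, nearest-neighbour) of composing motion fields:
#   DARn = DARm + Move(DAmn by DAmR)
def move(field, by):
    # Pull-back warp: out[x] = field[x + by[x]], clipped at the borders
    idx = np.clip(np.arange(field.size) + by, 0, field.size - 1)
    return field[idx.astype(int)]

DARm = np.array([1, 1, 0, 0, -1])   # motion from R to intermediate frame m
DAmn = np.array([0, 1, 1, 0, 0])    # local motion estimated from m to n
DAmR = -DARm                        # inverse field (assumed exact here)

DARn = DARm + move(DAmn, DAmR)      # total motion from R to n
print(DARn)
```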
One advantage of the present invention is hence to use the resulting compact, low-rank summary of several frames' motion fields and other change fields to enhance and stabilize the segmentation in video coding.
Similarly, segmentation may be performed in the temporal domain to find groups of frames where certain spatial patterns are found. The temporal segmentation may then utilize subspace information derived from bilinear modelling of time-shifted versions of relevant time series (H. Martens & M. Martens (1992) NIR spectroscopy - applied philosophy. Proceedings, 5th Internatl. Conf. NIR Spectroscopy (K.I. Hildrum, ed.), North Holland, pp. 1-10), e.g. time-shifted versions of the static frame scores obtained by bilinear modelling of change fields, as described for equation (1).
Applications of multi-frame motion-based segmentation
The bilinear summaries of multiple frames' motion fields may be used in several ways in the segmentation.
The order in which the frames are modelled, is in the preferred embodiments forward and sequential. However, the order may also be selected according to other criteria,- e.g. according to which frame at any given time shows the most need and potential for model refinement.
The bilinear model based segmentation may be used pyramidally. One example of this is to perform the segmentation on frames in reduced resolution, in order to identify the major holons in the sequence, and then to use these results as preliminary, tentative input to the same process at higher frame resolution.
In the preferred embodiments, the motion estimation, bilinear modelling and segmentation may be performed on individual, already identified holons ('input holons'), or on complete, unsegmented images ln. In either case, a multi-holon pre- or post-processing of the obtained motion fields, bilinear models and segments is desired in order to resolve overlap between input holons.
One such pre- or post-processing is based on having stored each holon with a 'halo' of neighbour pixels with uncertain holon membership, i.e. pixels that only tentatively can be ascribed to a holon (and thus are also temporarily stored in other holons or as separate lists of unclear pixels). In the motion estimation such tentative halo pixels are treated specially, e.g. by being fitted for all relevant holons, and their memberships to the different holons are updated according to the success of the motion estimates. Such halo pixels are given very low weight or fitted passively in the bilinear modelling (by Principal Component Regression; Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester, UK, which is hereby included by reference).
Extra variables
Additional columns in the raw data matrix G (equation (3)) may be formed from 'external scores' from other blocks of data. Sources of such 'external scores' are:
  • scores from bilinear modelling of some other data domain (e.g. motion compensated intensity residuals of the same holon),
  • scores from other holons, preferably in nonlinear representation (see e.g. A. Gifi: Nonlinear Multivariate Analysis. J. Wiley & Sons Ltd, Chichester 1990, which is hereby included by reference), with each quantitative score vector quantized and analyzed as an indicator matrix (p. 67) or at the ordinal level (p. 187),
  • scores from the same holon at a different spatial resolution,
  • scores from external data such as sound (e.g. after bilinear modelling of sound vibration energy spectra of these same frames).
The weights for such additional variables must be adapted so that their uncertainty level becomes similar to that of the weighted pixels in the final data matrix to be modelled, D (equations (1) and (2)).
An alternative way to incorporate uncertain pixels or external scores gently, without forcing their information into the bilinear model, is to replace the one-block bilinear modelling with a two-block modelling, such as PLS regression (Martens, H. and Naes, T. (1989) Multivariate Calibration. J.Wiley & Sons Ltd, Chichester UK), or with multi-block or N-way modelling, such as Parafac (Sharaf, M.A., Illman, D.L. and Kowalski, B.R. Chemometrics, J.Wiley & Sons, New York 1986) or Consensus PCA/PLS (described by Martens, M. and Martens, H. 1986: Partial Least Squares regression. In: Statistical Procedures in Food Research (J.R. Piggott, ed.) Elsevier Applied Sciences, London, pp. 293-360, and by Geladi, P., Martens, H., Martens, M., Kalvenes, S. and Esbensen, K. (1988) Multivariate Comparison of Laboratory Results. Proceedings, Symp. Applied Statistics, Copenhagen Jan 25-27 1988 (Per Thorbøll, ed.) Uni.C, Copenhagen, pp. 16-30. These references are hereby included by reference.)
In this way, the uncertain pixels and external scores contribute positively to the modelling if they fit well, but do not affect the modelling strongly in a negative way if they do not fit. In any case these uncertain pixels and external scores are fitted to the obtained bilinear model.
The scores from the present holon's modelling in the present resolution and in the present domain may in turn be used as 'external scores' for other holons or at other resolutions or in other domains.
Preferred Embodiments
The stabilization of the segmentation by the use of multi-frame summaries may be embodied in different ways.
In the first preferred embodiment, the bilinear segmentation process employs a top-down approach, removing segments from the input holon: Pixel areas in the motion subspace that do not fit to a general holon model are detected as outliers and segmented out from the rest of the input holon.
In the second preferred embodiment, the segmentation employs a bottom-up approach, attempting to let stable seed points grow into homogeneous, consistent segments in the input holon.
In the third preferred embodiment, the segmentation is separated from the motion estimation and compensation (Figure 5), with bilinear modelling of the frames' motion fields and other estimated change data taking place in between.
In the fourth preferred embodiment, motion estimation and actual segmentation are integrated (Figure 6), followed by the bilinear modelling. In the fifth preferred embodiment, the motion estimation, bilinear modelling and segmentation process is done for a whole sequence.
In the sixth preferred embodiment, the motion estimation, bilinear modelling and segmentation model is updated gradually for individual frames.
In the seventh preferred embodiment, the bilinear modelling methods used for segmentation are expanded to include additional criteria beyond just explained covariance; in this case spatial and temporal smoothing are used as extra criteria. Reweighting of the rows and columns of the input data to the bilinear modelling is also included.
In the eighth preferred embodiment, the bilinear modelling is combined with optimal scaling, so that not only the weights but also the input data themselves are changed during the model estimation process: As long as the prediction of a data element from a preliminary low-rank bilinear model does not give significantly worse decoding results than the element's input value, its input value is replaced by its bilinear prediction.
First Preferred Embodiment: Segmentation based on outlier analysis around spatial model
Figure 5 shows the main building blocks of the bilinear model based segmentation: a motion estimator unit EstMov 520, a bilinear modelling unit EstBLM 540 and a segmentation unit EstSegm 560, and the data streams between them. More details on the data streams will be given in conjunction with the Third Preferred Embodiment.
The two first embodiments represent two implementations of the segmentation unit EstSeg 560: top-down or bottom-up.
In the first preferred embodiment, the holon input to the EstSeg unit 560 is retained as intact as possible, but if the holon contains parts which display significantly and consistently different behaviour than the rest of the holon, then these parts are split out as separate, new segments. In addition, individual pixels, e.g. along holon edges, whose preliminary classification appears questionable, may be pruned away or otherwise identified as unreliable outliers.
A method to attain this top-down segmentation of a holon is described in the following pseudo-code with line-numbering:
Single-frame segmentation
First, the method is described for one single frame and for detection of a stiffly moving body.
Make a spatial model of each such potential segment, using reweighted linear least squares spatial modelling:
Let the vertical and the horizontal motion estimates for one given frame n relative to a reference frame R be treated as regressands Y in the spatial modelling:
Y=[DVRn DHRn] (701 ).
Let regressors X = [ 1 v h ] (702), where v is the column of vertical addresses for the pixels and h their horizontal addresses.
A model for motion by affine transformations is then:
Y = XB + F , (703 )
Estimate the 3 x 2 regression coefficient matrix B by reweighted least squares regression: Estimate the uncertainty standard deviation for every pixel (row): s = [s(pel), pel = 1, 2, ..., nPels] (704)
Define a matrix of initial pixel weights W, e.g. identical for all pixels: W = diag(1, 1, 1, ..., 1) (nPels ones) (705)
While reweighting process not converged (706)
B = (XᵀWX)⁻¹XᵀWY (Regression coefficient estimate) (710)
F= Y -XB (Residual) (720)
R = f(F, s) (Relative residual, relative to noise level s) (730), where each pixel's residual f(pel,j) for each column in Y, j = 1, 2, ..., is divided by the pixel's uncertainty standard deviation s(pel): r(pel,j) = f(pel,j)/s(pel) (735)
W = f(R) (updated weights for the pixels) (740), e.g. a function of the relative residuals accumulated over all Y-variables j = 1, 2, ...: w(pel,pel) = c/(c + r(pel,1)² + r(pel,2)² + ...) (745), where the sensitivity coefficient c = e.g. 1.0.
Check convergence: e.g. is B stable? (750)
end while (reweighting process converged) (750)
Other estimates than just the sum of the relative residuals r(pel,1)² + r(pel,2)² + ... may be used, e.g. the median or some other robust distance measure.
In this process, pixels that do not fit well to the spatial model favoured by the majority of the pixels will display significantly large relative residuals R and hence be weighted down so as to reduce their influence on the estimation of the coefficients B in the next iteration, in which their residuals will be even larger, etc., so that they have very little influence on the estimation of the final spatial model B upon convergence.
Pixels with low final weights (e.g. with w(pel,pel) < 0.1) are defined as outliers not belonging to the input holon and are collected into one new segment. This new outlier segment may be submitted to the same reweighted regression modelling, to check if it should be split further into smaller segments. The resulting segmentation then represents the output result 565.
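The reweighted least squares loop (705-750) can be sketched in NumPy as follows; the convergence test, the iteration cap and the variable names are assumptions, while the weight update (745) and the outlier cut-off w < 0.1 follow the text:

```python
import numpy as np

def irls_affine_segment(v, h, Y, s, c=1.0, w_cut=0.1, n_iter=20):
    """Fit Y = X B by iteratively reweighted least squares (705-750).

    v, h : pixel addresses (nPels,); Y : regressands (nPels, 2) holding
    [DVRn DHRn]; s : per-pixel uncertainty standard deviations (nPels,).
    Returns B, the final weights w, and a boolean outlier mask (w < w_cut)."""
    X = np.column_stack([np.ones_like(v, dtype=float), v, h])     # (702)
    w = np.ones(v.size)                                            # (705)
    B = np.zeros((3, Y.shape[1]))
    for _ in range(n_iter):                                        # (706)
        sw = np.sqrt(w)[:, None]                 # weighted LS via sqrt-weights
        B_new = np.linalg.lstsq(X * sw, Y * sw, rcond=None)[0]     # (710)
        F = Y - X @ B_new                                          # (720)
        R = F / s[:, None]                                         # (730/735)
        w = c / (c + (R ** 2).sum(axis=1))                         # (740/745)
        if np.allclose(B_new, B, atol=1e-8):                       # (750)
            break
        B = B_new
    return B, w, w < w_cut
```

In a usage pass, the outlier mask would be collected into a new segment and the loop re-run on that segment alone, as described above.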
In redefining the weights of the pixels in (740), effects of neighbouring pixels may also be brought in, to enhance spatial continuity of the holons. The a priori weights (705) may be modified, e.g. by using lower initial weights for pixels already known to be potentially invalid due to occlusions or particularly uncertain due to unsatisfactory bilinear modelling.
The uncertainty measure s(pel,j) for each individual element (pel,j) in Y may have been estimated, and may be used instead of the general uncertainty for each pixel, s(pel), in (745). This individual uncertainty measure may be asymmetric, so that positive and negative residuals may be assessed differently. This is relevant for motion estimates of pixels in flat intensity areas near intensity edges (asymmetric slack): the pixel may move far away from the edge without affecting the resulting intensity lack-of-fit, but cannot move beyond the edge.
The full-rank regression method employed in (710) may be replaced by other estimators, e.g. reduced-rank regression methods like PLS regression or some extension of this, as described in Martens, H. and Naes, T. (1989) Multivariate Calibration. J.Wiley & Sons Ltd, Chichester UK.

Multi-frame segmentation
This basic top-down segmentation method can be used for multi-frame segmentation: Instead of basing the segmentation only on motion fields of the holon for one single frame, using regressands
Y=[ DVRn DHRn],
it may be based on the motion fields of several frames:
Y = [ DVR1 DHR1, DVR2 DHR2, ..., DVRn DHRn ] (760)
In this First Preferred Embodiment it is based on the scaled loading space that spans these motion patterns of the holon in several modelled frames:
Y = [ PV PH ] = [ pVR1 pVR2 ... pVRJ, pHR1 pHR2 ... pHRK ] (770)
where the number of bilinear factors for the vertical and horizontal motions (here: J and K) is chosen so that only valid and reliable factors are used (as judged e.g. by cross validation over the frames). The factor loadings (columns in PV and PH) should be scaled, e.g. so that their uncertainty variance is the same.
The regressands Y may also be defined as including intensity information, e.g. motion compensated intensity difference images.
Y = [ DIR1 DIR2 DIR3 ... DIRn ] (775)
where DIRn = the motion compensated intensity difference between frame n and the common reference frame R, defined for individual color dimensions (e.g. RGB) or as some summary like luminosity. Alternatively, loadings from bilinear intensity factors based on motion compensated intensity differences may be used. The preferred way of making such motion compensated intensity differences between a frame n and the reference frame R is first to establish the motion field for the frame, DARn = [ DVRn DHRn ], and the corresponding depth estimate etc. in motion estimator EstMov 520, then use this DARn to move (warp) the reference image IR to yield InHat (i.e. the IR-based approximation to frame In), then compute the intensity difference between InHat and In, and finally to move this difference back to reference position, using the inverse of this motion operator, DAnR: DIRn = Move((InHat - In) by DAnR).
Such intensity information may be used together with or separately from the motion information. In any case, the columns of Y should be scaled to reflect their relative desired impact on the segmentation, e.g. inversely to their relative average estimated uncertainty variances.
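The forward-warp / difference / back-warp procedure for DIRn described above can be sketched as follows. This is a minimal nearest-neighbour stand-in for the actual warping operator in EstMov 520: the function names, the zero-fill for unmapped pixels, and the use of the negated field as the inverse motion are all simplifying assumptions:

```python
import numpy as np

def warp(img, dv, dh):
    """Move each pixel by its (dv, dh) displacement, nearest-neighbour.
    Pixels mapping outside the frame are dropped; holes stay zero."""
    H, W = img.shape
    out = np.zeros_like(img)
    for r in range(H):
        for c in range(W):
            rr = int(round(r + dv[r, c]))
            cc = int(round(c + dh[r, c]))
            if 0 <= rr < H and 0 <= cc < W:
                out[rr, cc] = img[r, c]
    return out

def intensity_diff_ref_position(I_R, I_n, dv, dh):
    """DIRn: warp I_R to InHat, difference against I_n, then move the
    difference back to reference position with the (approximate) inverse
    motion: DIRn = Move((InHat - In) by DAnR)."""
    I_nHat = warp(I_R, dv, dh)
    diff = I_nHat - I_n
    return warp(diff, -dv, -dh)
```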
Alternative spatial structure models
The spatial structure model around which the residuals F in (703) are computed may be of another type than the one employed in (702). For example, X may also contain square and cross product terms of the addresses v and h (Lancaster, P. and Salkauskas, K. (1986) Curve and surface fitting. Academic Press, p. 133, which is hereby incorporated by reference). Splines or piecewise polynomials may also be used (Lancaster & Salkauskas 1986, p. 245, which is hereby incorporated by reference). Such higher order models can help distinguish between outlier pixels and pixels taking part in major, smoothly structured motions that are not affine transformations.
X may also contain a spatially autoregressive part, with spatially shifted versions of Y included in X, and where a rank-reducing regression method such as PLS regression is used (H. Martens & M. Martens (1992) NIR spectroscopy - applied philosophy. Proceedings, 5th Internatl. Conf. NIR Spectroscopy (K.I. Hildrum, ed.) North Holland, pp. 1-10). This spatial autoregressive model part allows the distinction between, on one hand, outlier pixels that should be weighted down, and on the other hand, pixels that take part in major, smooth motions which have neither affine transformation structure nor are well described by spatial polynomial structure within the holon.
Alternative segment border detection mechanisms
Additional information may be introduced in order to optimize the precise positioning of the segment borders. One such source of information is intensity edges in the reference image IR, as detected e.g. by a Sobel filter (J.C. Russ: The Image Processing Handbook, 2nd ed., IEEE Press 1995, which is incorporated herein by reference). If the relative weights W 740 after spatial modelling of Y indicate a segment border close to such an intensity edge, then the segment border is moved to this intensity edge.
More advanced statistical methods may also be used for determining the segment edges. One example of this was described by Kok F. Lai ('Deformable contours: Modelling, extraction, detection and classification'. PhD thesis, University of Wisconsin-Madison 1995, which is incorporated herein by reference); for the present application the input information may be intensity IR, intensity residuals DIRn (or bilinear summaries of these), spatial residuals F 720, R 730 or model weights W 740, and/or the Y data themselves.
Second Preferred Embodiment: Segmentation based on cluster analysis
The Second Preferred Embodiment represents a bottom-up approach to segmentation of the input holon. It consists of performing a cluster analysis of the multi-frame motion data or their bilinear summaries.
A number of different clustering techniques may be used for finding groups of pels. The choice of clustering criterion and clustering algorithm defines the statistical properties of the clusters. For instance, there is a choice between performing the cluster analysis for each motion direction (vertical, horizontal, depth) separately, or analysing the directions jointly. The latter is the preferred implementation (but the depth dimension may be lacking, at least initially in an encoding process). Two main groups of clustering techniques can be used: cluster analyses which do not make spatial assumptions with respect to parameter smoothness or neighbourship in the image plane, and cluster analyses which do.
Conventional cluster analysis
The goal is now to find clusters of pixels which display motion patterns that are similar at least in some of the factor dimensions in P, i.e. pixels that display similar motion patterns, at least in some significant dimensions.
There are a number of different clustering principles that can be used on the basis of the bilinear motion subspace. Spatiotemporal distances can be computed on the basis of common or weighted Pythagorean distance measures, as well as normalized (Mahalanobis) distances. One approach is standard non-hierarchical cluster analytic techniques (Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York; Gudersen, Bob (1983) An adaptive fcv cluster algorithm. International J. of Man-Machine Studies 19, pp 97-104; Bezdek et al. Detection and characteristics of cluster sub-structures. SIAM J. of Applied Math 40(2), April 1981; Bezdek, J.C. and Pal, S.K. (1992) Fuzzy models for pattern recognition. IEEE, New York). An example of this type of cluster analysis is the partitioning technique described by Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York, pp 361-368. These references are hereby included by reference.
More detail on cluster analysis can be found in Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York, pp 360-390; in Bezdek et al. Detection and characteristics of cluster sub-structures. SIAM J. of Applied Math 40(2), April 1981; and in Bezdek, J.C. and Pal, S.K. (1992) Fuzzy models for pattern recognition. IEEE, New York. In particular, the technique of fuzzy clustering (see Gudersen, Bob (1983) An adaptive fcv cluster algorithm. International J. of Man-Machine Studies 19, pp 97-104) can be useful; in this technique bilinear modelling is used for parameterizing the internal structure of the clusters, and the clusters can be partially overlapping. Hierarchical cluster analyses are described in Sharaf, M.A., Illman, D.L. and Kowalski, B.R. Chemometrics, J.Wiley & Sons, New York 1986, p 219. These references are hereby included by reference.
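As an illustration of the non-hierarchical partitioning approach cited above, the following sketch clusters pixels by their rows in the scaled loading matrix (equation (770)): pixels with similar motion-subspace coordinates move similarly across the frames and end up in the same segment. A plain k-means loop is used here as one simple instance; the initialization, convergence test and choice of k are assumptions, and the cited references describe richer variants such as fuzzy clustering:

```python
import numpy as np

def kmeans_pixels(P, k=2, n_iter=50, seed=0):
    """Partition pixels by their rows in the loading matrix P = [PV PH]
    (nPels, nFactors). Returns an integer segment label per pixel."""
    rng = np.random.default_rng(seed)
    centres = P[rng.choice(len(P), k, replace=False)]   # k pixel rows as seeds
    labels = np.zeros(len(P), dtype=int)
    for _ in range(n_iter):
        # squared Euclidean distance from every pixel to every centre
        d = ((P[:, None, :] - centres[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        new = np.array([P[labels == j].mean(0) if (labels == j).any()
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels
```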
Cluster analysis with spatial continuity assumptions in the image plane.
In the present embodiment, the cluster analysis searches for spatially related pixels with similar motion patterns. Boyer et al 1994 published a method of image segmentation which makes extensive - but not exclusive - use of spatial continuity (Boyer, K.L., Mirza, M.J. and Ganguly, G. (1994) The robust sequential estimator: A general approach and its application to surface organization in range data. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(10), Oct 1994, pp 987-1001, which is hereby included by reference). An embodiment of the present invention is to extend their method from one frame (a single radar image Z), measured in one dimension (distance), to employ information from several frames and measurements in several dimensions (vertical motion, horizontal motion and other possible characteristics, see below).
The segmentation technique of Boyer et al mentioned above may be summarized into the following elements:
* Analyze the spatial data (in the case of Boyer et al 1994, which was mentioned above: range data Z) to find sufficiently large smooth spatial regions that can serve as starting points for potential segments.
* Make a spatial model of each such starting point, using reweighted least squares spatial modelling.
Around each starting point, fit the Y variables (Y=Z) to a spatial model and estimate residuals. In the present embodiment a linear model Y = XB + F is used, fitted by reweighted least squares minimization, using a motion model X for affine transformations (702), as described above. But polynomial and/or autoregressive extensions may also be used in X.

* Grow each such potential segment locally, by including adjacent pixels that seem to fit the preliminary segment model, gradually updating the segment model. This growing process is continued until no new pixels fit well to the holon's spatial segment model.
* Attempt to extend each spatial model to the rest of the picture in order to search for more remote pixels that may belong to the segment.
* Merge potential segments that have compatible spatial models.
* Prune and fill in outliers and resolve ambiguities along segment edges. The precise edges of the segments may be optimized with the methods described in the first preferred embodiment.
In the present invention the above technique is applied to other spatial parameter data than the radar range data used by Boyer et al mentioned above. Instead of defining Y as the depth Z of an image, Y is defined according to (701, 760 or 770) as motion information from several frames. Optionally, intensity information (775) may be included as well, as described for the third preferred embodiment.
Modelling of other parameter representations
Yet another embodiment, not to be detailed here, is to apply this segmentation technique, applicable for 1D, 2D and higher-dimensional data, after having transformed the data to a frequency domain. The transformed data may be represented as the result of a direct FFT, as a real and imaginary representation, or as a phase & amplitude representation. More complicated representations may also be used. One example of this is the use of a phase correlation representation.
It should be noted that the top-down and bottom-up approaches to the multiframe segmentation may be combined. One example of this is: First, perform top-down segmentation analysis of the input holon in order to identify outlier regions that do not fit the majority or dominant structure within the holon. Secondly, use the bottom-up segmentation analysis to search for homogeneous regions inside the outlier regions.
The next two preferred embodiments distinguish between two ways of combining motion estimation and segmentation.
Third Preferred Embodiment: Separated Motion Estimation and Segmentation
In the third preferred embodiment (Figure 5), intensity data for the individual frame(s) In, n=1,2,..., and for a reference image IR 505 are input to a motion estimator 520. The resulting motion estimate DARn 525 is passed to the bilinear modelling unit EstBLM 540. The resulting bilinear model parameters 545 are passed to the segmentation unit EstSegm 560, which yields segmentation results 565.
The EstMov operator 520 may contain facilities for detecting internal preliminary segmentation indicators such as edges in IR or In, as well as analysis of depth and spatial innovation; these may be used in order to enhance the motion estimate DARn 525 (e.g. by not blurring the motion fields across apparent preliminary segment borders), and are passed along with the motion estimate to the other units.
The bilinear model parameters 545 primarily consist of parameters modelling motion data DARn and their uncertainties, but may optionally also include parameters concerning motion compensated intensity changes DIRn etc.
Some relevant methods for coordinating motion estimation and bilinear modelling are given in the aforementioned application titled "Method and apparatus for coordination of motion determination over multiple frames".
Several feed-back loop levels are used in this embodiment.
First, the motion estimator EstMov 520 employs previously established, preliminary segment information 521 in order to optimize handling of edges, occlusions and depth: Motion fields are not to be smoothed across reliable preliminary segment borders.
EstMov 520 also uses previously established bilinear modelling results 522 in order to stabilize the motion estimation against underdeterminedness, ambiguity and noise sensitivity. This preliminary information 521 and 522 has been obtained, in units EstSeg 560 and EstBLM 540, respectively, for previous frames, other pyramidal frame resolutions or previous iterations.
In the bilinear modelling unit EstBLM 540, bilinear models are developed separately for each preliminary segment (holon), based on preliminary segment information 521. In the bilinear modelling block EstBLM 540, information from other related holons, and from pixels whose holon membership is unclear, may be treated so as not to affect the bilinear modelling detrimentally (as extra X-variables with low weight in PCA-like bilinear modelling of a single block of variables X, or as Y-variables in PLS2- or PCR-like bilinear modelling of two blocks, X and Y).
Likewise, preliminary bilinear model parameter estimates 522 may be modelled together with the new motion fields DARn, n=1,2,..., 525 to yield the updated bilinear sequence models 545, for the motions and optionally for the motion compensated intensity changes etc.
Fourth Preferred Embodiment: Joint Motion Estimation and Segmentation
Motion estimation, depth assessment and segmentation are interdependent processes that optimally should be treated in an integrated way. In the third preferred embodiment, the motion estimation and the segmentation operators were coordinated through feedback of preliminary bilinear results 521, 522. In the fourth preferred embodiment the operators are fully integrated. The bilinear estimation may be done with less computer power in this case, since it operates on more fully isolated segments individually. In this embodiment, according to Figure 6, input data 605 and preliminary segmentation and bilinear modelling results 623 are input to EstMov/EstSeg 620, which delivers motion fields DARn with their estimated uncertainties, occlusions etc. 625 and segment information 665 about the holons found. In EstBLM 640 a bilinear model is then updated for each holon separately. Optionally, inter-holon relations and pixels with unclear holon classification may tentatively be weighted down or defined as Y-variables, as described in the third preferred embodiment.
It should be noted that with the feedback mechanisms illustrated in Figures 5 and 6, motion estimation and segmentation may also be seen as integrated parts of the bilinear model estimation. In EstBLM 540, 640 the estimation process does not have to be run until convergence or full stability, as in conventional singular value decomposition or eigenvalue decomposition. It is sufficient that the subspace obtained for each holon 545 improves the next round of motion estimation and segmentation. Thus, the modification of the input data to EstBLM through improved motion estimation and segmentation may be treated as part of the bilinear estimation process.
The next two embodiments concern how to perform the coordination between the frames in the sequence:
Fifth preferred embodiment: Modelling the whole sequence of frames in one step
In the fifth preferred embodiment, the whole sequence is subjected to motion estimation; then these motion estimates for the whole sequence are submitted to bilinear modelling. Finally, the bilinear model or models of the holons in the sequence are used for segmentation. Using Figure 5 for illustration, the bilinear model results 522 and segmentation results 521 from previous iterations (or pyramidal resolution levels) are fed back into the motion estimation 520 and bilinear modelling 540 in order to stabilize and facilitate these estimation processes.
The modelling of the whole sequence may then be repeated, using the updated bilinear motion and intensity change models and the updated segmentation.
Sixth Preferred Embodiment: Gradual update of the sequence model
In the sixth preferred embodiment, the bilinear model 545 is updated after the motion estimation for each frame n=1,2,... has been finished. This is done for each already isolated holon segment, but may also be performed for all holons in the frame. The segmentation 565 may likewise be updated after each frame. In the preferred embodiment, major re-segmentation is only allowed when the motion data clearly show the need for it, except for pruning processes along holon edges.
More detail on updating of bilinear models is given in the patent application "Method and apparatus for coordination of motion determination over multiple frames" mentioned before.
Again, the order in which the individual frames are pulled into the modelling and segmentation need not be fixed. Once all the frames have been modelled and segmented, the whole process may be started again for the sequence, now with better start values for the bilinear model and segmentation.
Estimated uncertainties in the estimated segment borders are stored with the segment border information itself and used in subsequent encoding steps. Pixels with unclear segment classification, e.g. in a region around the chosen segmentation border, are treated as having high uncertainty. In subsequent motion estimation and bilinear modelling these uncertain pixels are given low weights or are passively fitted by principal component regression (Martens & Naes 1989), mentioned before. In subsequent segmentations, uncertain pixels are included in the new segmentation estimation, but are given low a priori input weights (705).
The next two embodiments concern special techniques to adapt the bilinear model parameter estimation to the segmentation application.
Seventh Preferred Embodiment: Modifying the bilinear modelling with additional smoothness criteria

The bilinear parameter estimation methods used for extracting the above mentioned bilinear models may be modified to require or favour additional statistical constraint types to be satisfied, such as requiring or biasing the temporal score vectors in T or the spatial loading vectors in P to be smooth, at least where no preliminary segment borders have been found.
This is done inside the NIPALS algorithm iteration for each factor a (Martens, H. and Naes, T. (1989) Multivariate Calibration. J.Wiley & Sons Ltd, Chichester UK.) as shown in this pseudo-code with line numbering:
For each factor a, each new iteration is defined as follows:
(810) Estimate the raw estimate of a spatial loading vector pa,raw by projection of the residual data in D after the previous factor, on the smoothed, scaled score vector ta from the previous iteration.
(820) Submit this raw spatial loading vector pa,raw to spatial smoothing: pa = f(pa,raw). The smoothing method may be a simple boxcar filter, or a method that seeks to avoid smoothing across apparent edges that should be left unsmoothed. The smoothed loading pa is orthogonalized with respect to the loadings of previously estimated factors [p1, p2, ..., pa-1].
(830) Estimate the raw scores ta,raw by projection of the residual data on the smoothed loading pa.
(840) Submit this raw score vector to temporal smoothing, e.g. a box car smoothing or a more advanced smoothing: ta = f(ta,raw).
(850) The smoothed score vector ta is scaled to length 1, and the process (810-850) is repeated iteratively until sufficient convergence.
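A minimal NumPy sketch of this smoothed NIPALS iteration (810-850) for one factor follows; a simple boxcar filter is assumed for both the spatial and the temporal smoothing, and the start vector, tolerance and edge handling are assumptions:

```python
import numpy as np

def boxcar(x, width=3):
    """Simple moving-average smoother (the 'boxcar' filter of 820/840)."""
    return np.convolve(x, np.ones(width) / width, mode="same")

def nipals_smoothed_factor(D, prev_P=None, n_iter=100, tol=1e-10):
    """One factor of the smoothed NIPALS iteration (810-850).

    D : residual data (nFrames, nPels) after the previous factors.
    prev_P : (nPels, a-1) loadings of earlier factors, for orthogonalization.
    Returns the smoothed score vector t and loading vector p."""
    t = D[:, 0].copy()                           # start score from one column
    t /= np.linalg.norm(t) + 1e-12
    p = None
    for _ in range(n_iter):
        p = D.T @ t                              # (810) raw loading
        p = boxcar(p)                            # (820) spatial smoothing
        if prev_P is not None:                   # orthogonalize vs earlier p's
            p -= prev_P @ (prev_P.T @ p)
        t_new = D @ p / (p @ p + 1e-12)          # (830) raw scores
        t_new = boxcar(t_new)                    # (840) temporal smoothing
        t_new /= np.linalg.norm(t_new) + 1e-12   # (850) scale to length 1
        if np.linalg.norm(t_new - t) < tol:
            break
        t = t_new
    return t, p
```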
A further enhancement of the bilinear modelling in this preferred embodiment is to use an iterative reweighted least squares fit of the bilinear model to the data, in order to reduce the effect of outlier frames or outlier pixels: the weights for the rows (frames) and columns (pels) in eq. (3) may iteratively be updated according to the inverse of updated estimates of the average uncertainty standard deviation for the rows and columns, based on corrected residuals from the low-rank bilinear model in the previous iteration.
This is described in more detail in the application "Method and apparatus for coordination of motion determination" mentioned before.
Eighth Preferred Embodiment: Rule-based optimal scaling as part of the bilinear modelling
In the bilinear modelling 540, 640, not only the bilinear model parameters may be changed to attain better modelling, but also the values in the input data, e.g. DARn, may be changed in the bilinear model parameter estimation process. The individual elements in the motion data dan,pel for frames and pixels may iteratively be modified to adhere more closely to the model derived from other frames or pixels:
dan,pel = f(dan,pel(input), dan,pelHat, Rules),
where for bilinear modelling dan,pelHat = tn*ppel.
One example of a rule is:
if (dan,pelHat gives as good or better motion fit din,pel as dan,pel(input) gives), and (dan,pelHat lies within the statistical uncertainty range of dan,pel(input)), then (dan,pel = dan,pelHat), else (dan,pel = dan,pel(input)).
Instead of such a discrete definition of the data element dan,pel, a more continuous weighted averaging function of dan,pelHat and dan,pel(input) may also be used.
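The discrete rule above can be sketched as a small element-wise function; the argument names, and the use of a single uncertainty bound s for the range test, are assumptions:

```python
def optimal_scale(da_input, da_hat, fit_err_input, fit_err_hat, s):
    """Rule-based optimal scaling of one motion element dan,pel:
    accept the bilinear prediction da_hat when it decodes at least as
    well as the input value AND lies within the input's uncertainty
    range s; otherwise keep the input value unchanged."""
    fits_as_well = fit_err_hat <= fit_err_input
    within_range = abs(da_hat - da_input) <= s
    return da_hat if (fits_as_well and within_range) else da_input
```

Applied over all frames and pixels, this gently pulls the motion data toward the low-rank bilinear model without degrading the decoding.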
This is described in more detail in the application "Method and apparatus for coordination of motion determination" mentioned before.
Any combination of the above is also an embodiment.
Other applications:
Segmentation of bilinear structures in the temporal domain
The above mentioned segmentation / clustering techniques can be used for determining groups of frames (sequences) which are sensible to analyze together, as well as for detecting scene shifts. One embodiment of this is to perform simple bilinear modelling of (possibly subsampled) frame intensities, and perform non-hierarchical cluster analysis in the score space T to find clusters of frames that have much image material in common. A robust single linkage cluster analysis is the preferred cluster method in this embodiment, in order to be able to follow motions within a scene within one single cluster.
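As a crude stand-in for such clustering of frames in score space, the following sketch flags a scene shift wherever the jump between consecutive frames' scores is much larger than the typical within-scene jump; the median-based threshold is an assumption, not the robust single-linkage method itself:

```python
import numpy as np

def scene_shifts(T, thresh=3.0):
    """Flag scene shifts from the frame score matrix T (nFrames, nFactors).
    A shift is declared before frame i when the score jump from frame i-1
    is more than `thresh` times the median jump; smooth within-scene motion
    produces small, regular jumps and stays inside one cluster."""
    jumps = np.linalg.norm(np.diff(T, axis=0), axis=1)
    med = np.median(jumps) + 1e-12
    return [i + 1 for i, j in enumerate(jumps) if j > thresh * med]
```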
Application to other types of data
Yet another embodiment is to apply the above principles to time series data, e.g. sound data, in order to define temporal segments. In that case the spatial motion field data correspond to time warp estimates, while the spatial intensity change data correspond to temporal intensity change data.
The usage of the output from the present invention for decoding (reconstruction of frames) is explained in WO 95/34172.
Modifications of the present invention within the scope of the appended claims are possible to an expert. Particularly, the term "plurality" can be interpreted in the sense of "one or more".

Claims

1. A method for segmenting an image sequence, said sequence consisting of frames, each frame consisting of samples of an input signal, the method comprising the steps of:
(1) forming a reference image, consisting of samples from a plurality of said frames,
(2) estimating motion from said reference image to each of said frames,
(3) reformatting the estimated motion into row vectors,
(4) collecting said row vectors into a motion matrix,
(5) performing a Principal Component Analysis on the motion matrix, thereby obtaining a score matrix consisting of a plurality of column vectors called score vectors and a loading matrix consisting of a plurality of row vectors called loading vectors, such that each score vector contains one element for each frame, such that each element of each loading vector corresponds to one element of the reference image, such that one column of said score matrix and one loading vector together constitute a factor, and such that the number of factors is lower than or equal to the number of said frames,
(6) reformatting each loading back into the same format as used for the motion,
(7) performing segmenting based on the first of the reformatted loadings.
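Steps (3) to (6) of claim 1 can be sketched as follows, using a numpy SVD as one possible way to compute the Principal Component Analysis (all names are illustrative):

```python
import numpy as np

def motion_pca(motion_fields, n_factors):
    """Stack per-frame motion fields as rows of a motion matrix, factor it,
    and reformat each loading back into motion-field shape.

    motion_fields : list of (h, w, 2) arrays, one per frame (dx, dy per element)
    Returns (scores, loadings): scores is (n_frames, n_factors); loadings is a
    list of n_factors arrays of shape (h, w, 2).
    """
    shape = motion_fields[0].shape
    D = np.stack([f.ravel() for f in motion_fields])   # motion matrix, one row per frame
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    scores = U[:, :n_factors] * s[:n_factors]          # score matrix (column score vectors)
    loadings = [Vt[k].reshape(shape) for k in range(n_factors)]  # reformatted loadings
    return scores, loadings
```

Each (score vector, loading vector) pair is one factor; the product of scores and loadings approximates the motion matrix.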
2. The method according to claim 1 , wherein the segmenting in step (7) comprises the steps of:
(7a) selecting a position in the first loading,
(7b) forming a local motion model based on elements in a neighbourhood of said selected position.
(7c) choosing candidate elements of said reference image among those elements that have not yet been segmented,
(7d) determining how well each of said candidate elements of said first loading fits to said local motion model,
(7e) including those candidate elements which satisfy a given fidelity criterion to said local motion model,
wherein the collection of elements that fit to said local motion model together represent a segment.
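The region growing of steps (7a) to (7e) can be sketched as follows, assuming an affine local motion model (as in claim 3) and a simple norm-based fidelity criterion; seeding with a small neighbourhood of the selected position corresponds to step (7b), and all names are illustrative:

```python
import numpy as np

def grow_segment(loading, seed, tol):
    """Region-grow one segment in the first loading.

    loading : (h, w, 2) first motion loading (dx, dy per reference element)
    seed : (row, col) selected position; tol : fidelity threshold
    Returns a boolean (h, w) mask of the grown segment.
    """
    h, w, _ = loading.shape
    seg = np.zeros((h, w), bool)
    sy, sx = seed
    seg[max(0, sy - 1):sy + 2, max(0, sx - 1):sx + 2] = True   # step 7b: seed neighbourhood
    changed = True
    while changed:
        changed = False
        ys, xs = np.nonzero(seg)
        # fit affine model  [dx dy] ~ [y x 1] @ A  to current segment members
        G = np.column_stack([ys, xs, np.ones(len(ys))])
        A, *_ = np.linalg.lstsq(G, loading[ys, xs], rcond=None)
        for y in range(h):
            for x in range(w):
                if seg[y, x]:
                    continue
                # step 7c: candidates must border the segment (4-neighbourhood)
                if not ((y > 0 and seg[y - 1, x]) or (y < h - 1 and seg[y + 1, x])
                        or (x > 0 and seg[y, x - 1]) or (x < w - 1 and seg[y, x + 1])):
                    continue
                pred = np.array([y, x, 1.0]) @ A
                # steps 7d/7e: include candidates that fit the local model
                if np.linalg.norm(pred - loading[y, x]) <= tol:
                    seg[y, x] = True
                    changed = True
    return seg
```

Refitting the model inside the loop corresponds to the model update of step (7f).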
3. The method according to claim 2, wherein said local motion model comprises modelling motion as an affine transform.
4. The method according to claim 2, wherein said local motion model comprises modelling motion as a plurality of piecewise polynomial transforms.
5. The method according to any one of claim 2 to 4, wherein step (7d) is replaced by:
(7d1 ) determining how well each of a plurality of further candidate elements of said first loading fits to a subset of the elements that have been included in said local model, said subset being selected for each candidate element based on the position of the candidate element.
6. The method according to any one of claim 2 to 5, wherein step (7e) concerning said fidelity criterion comprises for each candidate element, computing a difference between motion as extrapolated from the local motion model for the position of the candidate element, and motion as computed from multiplying score vectors by loading vectors.
7. The method according to any one of claims 2 to 6, wherein said fidelity criterion also takes into consideration for each element in said reference image an uncertainty corresponding to said motion fields.
8. The method according to claim 7, wherein said fidelity criterion is computed as, for each element in said reference image, a ratio between said difference and said uncertainty corresponding to said motion fields.
9. The method according to claim 8, wherein said uncertainty may have one value corresponding to each of a plurality of spatial directions, one of said values being chosen for computation of said fidelity criterion dependent on the direction of said difference.
10. The method according to any one of claim 2 to 9, wherein the method further comprises the step of:
(7f) updating the local motion model.
11. The method according to claim 10, wherein said updating includes a step for adjusting the weights corresponding to the elements according to the fidelity criterion.
12. The method according to claim 10 or 11 , wherein steps (7c) to (7f) are repeated several times.
13. The method according to any one of claim 2 to 12, further comprising the steps of:
(7g) marking the positions of elements that satisfied said fidelity criterion as segmented,
(7h) selecting a new position among those positions that have not been marked as segmented,
(7i) repeating steps (7c) to (7h) until a given convergence criterion is satisfied.
14. The method according to any one of claim 1 to 13, wherein instead of said first factor, a plurality of factors is used.
15. The method according to any one of claims 1 to 14, wherein instead of performing step (5) concerning Principal Component Analysis, the subsequent steps operate directly on said motion fields corresponding to each said frame instead of operating on said loading vectors.
16. A method for improving segmenting an image sequence, said sequence consisting of frames, each frame consisting of samples of an input signal, the segmenting being represented as a plurality of segments of said reference image, the method comprising the steps of:
(1) for each of the given input segments, performing steps (1) to (6) of claim 1, using the intensity corresponding to said given input segment plus a layer of adjacent elements as reference image, the collection of segment reference images together constituting a total reference image, where each element of the total reference image may be represented in more than one segment reference image,
(2) for each element in each said segment reference image, computing a fidelity criterion,
(3) for each element in the total reference image represented in more than one segment reference image, finding which segment gives the best fidelity, and removing the element from the other segments.
17. The method according to any one of claims 1 to 15, the method further comprising the steps of:
(8) forming an energy potential image based on the fidelity indicators,
(9) for each segment, finding its border to neighbouring segments,
(10) for each border, optimizing its position iteratively to minimize the sum of energy along its trace, wherein the reference image elements inside each border represent the respective segments.
18. The method according to claim 17, wherein the energy potential function is also based on reference image intensity gradients.
19. The method according to any one of claim 17 to 18, further comprising:
(11 ) for each frame in the sequence, moving the frame back to reference position using its motion field or a reconstruction of its motion field computed by multiplying score values by loading vectors,
(12) reformatting the moved frames into row vectors,
(13) collecting the row vectors into an intensity matrix,
(14) performing a Principal Component Analysis on the intensity matrix, resulting in row vectors called intensity loading vectors and column vectors called intensity score vectors, wherein said energy potential image is also dependent on the intensity loading vectors and intensity score vectors.
20. The method according to any one of claims 17 to 19, wherein said optimization of said border also includes optimizing spatial simplicity of the border.
21. A method for segmenting an image sequence, said image sequence consisting of frames, each frame consisting of samples of an input signal, the method comprising the steps of:
(1 ) forming a reference image,
(2) estimating motion to one frame, resulting in, for each element in said reference image, an estimated position in said frame,
(3) forming a regressor matrix, said regressor matrix consisting of column vectors so that there is one column vector corresponding to each spatial dimension of the reference image, each of said columns containing for each element in said reference image one component of the spatial position of said element, and also one column vector containing ones,
(3) forming a regressand matrix, said regressand matrix consisting of column vectors so that there is one column vector corresponding to each spatial dimension of the reference image, each column containing for each element one component of the estimated position in said frame,
(4) estimating a regression coefficient matrix so that said regressand matrix is approximated by the product of said regression coefficient matrix and said regressor matrix,
(5) computing for each element in said reference image a membership measurement based on the regression residual, said regression residual being computed as said regressand matrix minus said product of regression coefficient matrix and regressor matrix,
(6) forming a segment based on the membership measurement computed in step (5), wherein the segment formed in step (6) represents the segmenting of said image sequence.
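The regression of steps (3) to (6) can be sketched as follows for a two-dimensional reference image, using ordinary least squares as the estimator and a residual-norm threshold as the membership measurement (both illustrative choices; claim 22 would replace the plain least squares by a robust reweighted variant):

```python
import numpy as np

def regression_segment(positions, resid_thresh):
    """Regress per-element estimated positions on the reference-grid
    coordinates; elements with small regression residuals form the segment.

    positions : (h, w, 2) estimated (row, col) position in the target frame
                of each reference-image element (the motion estimate)
    Returns (member_mask, coefficient_matrix).
    """
    h, w, _ = positions.shape
    Y, X = np.mgrid[0:h, 0:w]
    G = np.column_stack([Y.ravel(), X.ravel(), np.ones(h * w)])  # regressor matrix
    B = positions.reshape(-1, 2)                                 # regressand matrix
    C, *_ = np.linalg.lstsq(G, B, rcond=None)                    # regression coefficients
    resid = np.linalg.norm(B - G @ C, axis=1).reshape(h, w)      # regression residual
    return resid <= resid_thresh, C
```

Elements moving according to one common affine motion get near-zero residuals; elements belonging to other objects stand out as large residuals and fall outside the segment.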
22. The method according to claim 21 , where said estimating in step (4) is performed using robust reweighted least squares regression.
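One common reading of such a robust reweighted regression is iteratively reweighted least squares; the Huber-style weight function and the cutoff `c` below are assumptions for illustration, not taken from the claim:

```python
import numpy as np

def irls(G, B, n_iter=10, c=1.0):
    """Iteratively reweighted least squares: rows with large residuals are
    down-weighted so outliers barely influence the coefficient estimate.

    G : (n, p) regressor matrix;  B : (n, m) regressand matrix
    Returns the (p, m) regression coefficient matrix.
    """
    w = np.ones(len(G))
    C = np.zeros((G.shape[1], B.shape[1]))
    for _ in range(n_iter):
        Gw = G * w[:, None]
        # solve the weighted normal equations  (G'WG) C = G'WB
        C, *_ = np.linalg.lstsq(Gw.T @ G, Gw.T @ B, rcond=None)
        r = np.linalg.norm(B - G @ C, axis=1)
        w = np.where(r <= c, 1.0, c / np.maximum(r, 1e-12))  # Huber-style weights
    return C
```

Compared with plain least squares, the recovered coefficients are nearly unaffected by a gross outlier element.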
23. The method according to any one of claim 21 or 22, wherein motion estimation is performed for a plurality of frames, and the regressand matrix contains one column vector for each spatial dimension of each resulting motion field.
24. The method according to any one of claim 21 or 22, wherein a motion estimation is performed for each frame in said sequence, a Principal Component Analysis being performed on the results from said motion estimation, the regressand matrix comprising one column vector for each loading vector.
25. The method according to any one of claims 21 to 24, wherein said regressand matrix also includes, for each frame, a column containing motion-compensated intensity residuals, said motion-compensated intensity residuals being formed by moving each frame back to reference position according to estimated motion and subtracting the reference image.
26. The method according to any one of claims 21 to 24, wherein a Principal Component Analysis is performed on motion-compensated intensity residuals, and said regressand matrix also includes the resulting loading vectors as columns.
27. The method according to any one of claims 21 to 26, wherein said regressor matrix also includes polynomials of said spatial positions as columns.
28. The method according to any one of claims 22 to 27, wherein spatially shifted versions of regressands are also included as columns in the regressor matrix.
29. A method for segmenting an image sequence, said image sequence consisting of frames, each frame consisting of samples of an input signal, the method comprising the steps of:
(1 ) performing segmenting according to the method of any one of claims 21 to 28,
(2) repeating step (1), disregarding those parts of said reference image that have been found to be outside the segment, wherein the segments found in each repetition of step (1) together represent said segmenting.
30. The method according to any one of claims 21 to 28, wherein said regression is performed using a region-growing method.
31. A method for segmenting an image sequence, said image sequence consisting of frames, each frame consisting of samples of an input signal, the method comprising the steps of:
(1 ) performing segmenting according to the method of claim 30,
(2) repeating step (1 ), disregarding those parts of said reference image that have been found to be outside the segment, wherein the segments found in each repetition of step (1 ) together represent said segmenting.
32. The method according to any one of claims 21 to 31, wherein uncertainties for said regressands are computed, and said uncertainties are exploited when estimating said regression coefficient matrix.
33. The method according to any one of claims 21 to 32, wherein uncertainties for said regressands are computed, and said uncertainties are exploited when computing said membership measurement.
34. The method according to any one of claims 21 or 23 to 33, wherein step (4) and (5) comprise:
(4) performing a cluster analysis of the samples in the space of the regressand columns,
(5) computing a membership measurement based on said cluster analysis.
35. An apparatus for segmenting an image sequence, said sequence consisting of frames, each frame consisting of samples of an input signal, the apparatus comprising:
(1) means for forming a reference image, consisting of samples from a plurality of said frames,
(2) means for estimating motion from said reference image to each of said frames,
(3) means for reformatting the estimated motion into row vectors,
(4) means for collecting said row vectors into a motion matrix,
(5) means for performing a Principal Component Analysis on the motion matrix, thereby obtaining a score matrix consisting of a plurality of column vectors called score vectors and a loading matrix consisting of a plurality of row vectors called loading vectors, such that each score vector contains one element for each frame, such that each element of each loading vector corresponds to one element of the reference image, such that one column of said score matrix and one loading vector together constitute a factor, and such that the number of factors is lower than or equal to the number of said frames,
(6) means for reformatting each loading back into the same format as used for the motion,
(7) means for performing segmenting based on the plurality of reformatted loadings.
36. The apparatus of claim 35, adapted to be used according to the method of any one of claim 2 to 34.
PCT/EP1996/001273 1995-03-22 1996-03-22 Method and apparatus for multi-frame based segmentation of data streams WO1996030873A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP8528905A JPH11502653A (en) 1995-03-22 1996-03-22 Multi-frame method and device based on data stream division
AU52735/96A AU5273596A (en) 1995-03-22 1996-03-22 Method and apparatus for multi-frame based segmentation of d ata streams
EP96909117A EP0815537A1 (en) 1995-03-22 1996-03-22 Method and apparatus for multi-frame based segmentation of data streams

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP95104229.0 1995-03-22
EP95104229 1995-03-22

Publications (1)

Publication Number Publication Date
WO1996030873A1 true WO1996030873A1 (en) 1996-10-03

Family

ID=8219106

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP1996/001273 WO1996030873A1 (en) 1995-03-22 1996-03-22 Method and apparatus for multi-frame based segmentation of data streams

Country Status (6)

Country Link
EP (1) EP0815537A1 (en)
JP (1) JPH11502653A (en)
KR (1) KR19980702923A (en)
AU (1) AU5273596A (en)
WO (1) WO1996030873A1 (en)
ZA (1) ZA962304B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010056904A3 (en) * 2008-11-12 2010-08-12 University Of Utah Research Foundation Image pattern recognition
US8285079B2 (en) 2010-03-19 2012-10-09 Sony Corporation Method for highly accurate estimation of motion using phase correlation
US8488007B2 (en) 2010-01-19 2013-07-16 Sony Corporation Method to estimate segmented motion
CN112766325A (en) * 2021-01-04 2021-05-07 清华大学 Traffic data multi-mode missing filling method based on space-time fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GREGORY W. DONOHOE: "Combining segmentation and tracking for the classification of moving objects in video scenes", Proceedings of the Asilomar Conference on Signals, Systems and Computers, October 31 - November 2, 1988, Pacific Grove (US); IEEE Computer Society Press, pages 533-537, XP000130310 *
S.Y. KUNG ET AL.: "Neural networks for extracting pure/constrained/oriented principal components", SVD and Signal Processing, II: Algorithms, Analysis and Applications; Elsevier, Amsterdam (NL), 1991, pages 57-81, XP000574672 *
TERRANCE E. BOULT ET AL.: "Factorization-based segmentation of motions", Proceedings of the IEEE Workshop on Visual Motion, October 7-9, 1991, Princeton (US); IEEE Computer Society Press, Los Alamitos (US), pages 179-186, XP000579543 *


Also Published As

Publication number Publication date
ZA962304B (en) 1996-09-27
EP0815537A1 (en) 1998-01-07
AU5273596A (en) 1996-10-16
KR19980702923A (en) 1998-09-05
JPH11502653A (en) 1999-03-02


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 96192717.8

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1019970706331

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 1996909117

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 1996 528905

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 1997 930341

Country of ref document: US

Date of ref document: 19971127

Kind code of ref document: A

WWP Wipo information: published in national office

Ref document number: 1996909117

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1019970706331

Country of ref document: KR

NENP Non-entry into the national phase

Ref country code: CA

WWW Wipo information: withdrawn in national office

Ref document number: 1019970706331

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 1996909117

Country of ref document: EP