WO1996030873A1 - Method and apparatus for multi-frame based segmentation of data streams - Google Patents
- Publication number
- WO1996030873A1 (application PCT/EP1996/001273)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- motion
- matrix
- reference image
- vectors
- frames
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/277—Analysis of motion involving stochastic approaches, e.g. using Kalman filters
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
- H04N19/51—Motion estimation or motion compensation
- H04N19/537—Motion estimation other than block-based
- H04N19/543—Motion estimation other than block-based using regions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/20—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
Definitions
- the present invention concerns methods for attaining grouping or segmentation in large signal streams by development and analysis of a manifold or subspace of parameters.
- it concerns methods for spatiotemporal segmentation of video signals.
- each of the segment models may also be easier to control and interpret, e.g. for editing purposes.
- subsequent computational treatments of the individual segments may be computationally simpler than having to treat the full signal streams, for instance in reducing the amount of high-speed memory required for effective computation.
- the segmentation process itself must also be statistically and computationally effective.
- the present invention concerns how to attain relevant and reliable segmentation.
- image segmentation is important: groups of pixels (pels) that display consistently related spatial patterns of change over groups of frames should be modelled together, as this gives best compression, editability and interpretability.
- a segment may correspond to one physical object, but it may also correspond to only a part of a physical object, or to a set of several such objects. It may also correspond to non-tangible objects or phenomena such as shadows.
- the optimal definition of a segment differs with the purpose of the coding:
- the segments ideally correspond to the groups of pels that are most effectively compressed, but if the purpose is coding for later video manipulation such as editing or video games, then the segments ideally correspond more to physical objects.
- the segmentation process must be robust, i.e. it must give visibly acceptable, statistically useful segments with applicability over many related image frames. Yet, it must be computationally feasible with respect to cpu time and memory requirements.
- the segmentation methods for video coding fall mainly into two categories: still image segmentation and motion based segmentation.
- Still image segmentation is based on defining spatial intensity patterns in an individual image.
- a drawback with this type of segmentation is that it is difficult to distinguish between spatial contours inside and along the edges of objects.
- Motion based segmentation concerns how the image intensities vary between images.
- the segmentation is often based on the latter, and is attained by analysis of estimates of motion fields.
- One established approach to segmentation is to estimate the motion field between two frames, say from a reference frame R and another frame n (here termed 'Difference in Address', DA Rn ), and to search for groups of pels in DA Rn with similar motions.
- DA Rn may have one, two or more motion dimensions.
- Motion based segmentation can be generalized to change-based segmentation, where the changes could also include 'Difference in Intensity', i.e. intensity changes between pairs of frames, DI Rn , e.g. motion compensated and at different color channels.
- segment information facilitates subsequent motion estimation and intensity change estimation and the bilinear modelling of these.
- segmentation is based on change information for a number of relevant frames, not only on changes between two frames. Thereby the segments obtained are statistically more stable and have a higher validity.
- the way the many change data are represented in the segmentation computations is preferably via common manifolds or subspace models, primarily by bilinear modelling based on a common reference position. This further enhances the statistical precision and validity of the segmentation, since some noise-based and other non-significant change types can be ignored. It also reduces the computational complexity of the segmentation work by reducing the dimensionality of the change data needed to be analyzed in the segmentation.
- the subspace segmentation can be recursively updated in cases when the subspace representation itself is recursively updated; this gives computational advantages.
- the change information used in the segmentation can be of various kinds, motion information and intensity change information.
- the invention can be applied to signal streams in general.
- it can be applied to spatio-temporal segmentation of digital video signals and temporal segmentation of digital sound data
- Figure 1 illustrates how a frame-size (with nv x nh pixels) motion field in one motion direction (here DV Rn , the vertical motion for each pixel for moving (warping) image R to approximate image n) can be strung out as a one-dimensional vector (with nv * nh elements).
- Figure 4 illustrates the parameters from Figure 3 pertaining to one single frame.
- Figure 5 shows the third preferred embodiment, in which motion estimation and segmentation are performed separately.
- Figure 6 shows the fourth preferred embodiment, in which motion estimation and segmentation are performed simultaneously.
- the symbol ' * ' is used for multiplication when needed.
- Boldface uppercase letters are used for representing data matrices, and boldface lowercase letters for data vectors.
- a motion field describes how the pels in one image, say reference frame R, should be moved in order to approximate another image, say n.
- Each motion field image e.g. DV Rn can be strung out as a one-dimensional vector d n , with nPel elements, - one element for each pel in the reference image in which the motion information is given, as illustrated in Figure 1.
- the different motion dimensions may be strung after one another in one and the same vector, which then has a multiple of nPel elements, as illustrated in Figure 2.
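As a minimal sketch of the stringing-out illustrated in Figures 1 and 2, the following NumPy fragment flattens a vertical and a horizontal motion field into one vector per frame and stacks the frames as rows of a data matrix. The function name `string_out`, the frame sizes and the random fields are illustrative assumptions, not part of the original text.

```python
import numpy as np

def string_out(dv, dh):
    """Flatten vertical and horizontal motion fields (each nv x nh)
    into one vector with 2 * nv * nh elements, vertical part first."""
    return np.concatenate([dv.ravel(), dh.ravel()])

# Assumed toy dimensions: 4 x 5 pels, 3 frames.
nv, nh, n_frames = 4, 5, 3
rng = np.random.default_rng(0)
fields = [(rng.normal(size=(nv, nh)), rng.normal(size=(nv, nh)))
          for _ in range(n_frames)]

# One row per frame, one column per pel and motion direction.
D = np.vstack([string_out(dv, dh) for dv, dh in fields])

# The vertical field of frame 0 can be recovered by reshaping.
dv0 = D[0, :nv * nh].reshape(nv, nh)
```

Keeping the same pel order for every frame is what lets each column of the matrix refer to one fixed pel in the reference position.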
- bilinear modelling is well established as a method for approximating sets of related vectors ( Figure 3).
- the bilinear factor model can be written as a bilinear matrix product plus a residual (cf. H. Martens & Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK, which is hereby included by reference):
- $D = T * P^T + E$ (1)
- D is the data to be modelled; it has one row for each frame modelled and one column for each pixel (pel) variable to be modelled simultaneously (e.g. one horizontal and one vertical motion element for each pixel).
- the superscript T means 'transposed'.
- E represents the error or unmodelled residual, with the same matrix dimensions as D.
- the loading subspace P^T, spanning the most significant row space of D, represents the motion information more or less common to several frames in the sequence.
- the score vectors t n for frames n = 1, 2, ..., estimated for each frame separately or for many frames jointly (see below), serve to convey this common motion information P^T back to each individual frame pair.
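The bilinear factor model described above (data D approximated by scores T times transposed loadings P, plus a residual E) can be illustrated with a truncated singular value decomposition. This is only a sketch under stated assumptions: plain unweighted SVD is used, whereas the text allows weighted and updated variants, and the function name `bilinear_model` and the synthetic data are hypothetical.

```python
import numpy as np

def bilinear_model(D, n_factors):
    """Low-rank bilinear approximation of D via truncated SVD.
    Returns scores T (frames x factors), loadings P (pels x factors)
    and the residual E, so that D = T @ P.T + E."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    T = U[:, :n_factors] * s[:n_factors]   # scores, one row per frame
    P = Vt[:n_factors].T                   # orthonormal loadings
    E = D - T @ P.T                        # unmodelled residual
    return T, P, E

# Synthetic motion data: 10 frames, 50 pel variables, 2 underlying factors.
rng = np.random.default_rng(1)
T_true = rng.normal(size=(10, 2))
P_true = rng.normal(size=(50, 2))
D = T_true @ P_true.T + 0.01 * rng.normal(size=(10, 50))

T, P, E = bilinear_model(D, n_factors=2)
rel_residual = np.linalg.norm(E) / np.linalg.norm(D)
```

With two factors the residual contains little more than the added noise, which is the sense in which the loadings summarize motion common to several frames.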
- bilinear modelling e.g. weighted singular value decomposition based on the QR algorithm, with or without adaptive up- and downdating.
- BLM bilinear modelling
- PCA principal component analysis
- the bilinear model may be incrementally updated, as described in the application titled “Method and apparatus for coordination of motion determination” mentioned before
- the bilinear modelling may be performed after preprocessing consisting of subtracting the mean of each column. One may also mean center each row. These mean data must be added back upon reconstructions based on the bilinear model. More advanced preprocessing may also be used, such as multiplicative scatter correction (MSC) and its extensions, as described by Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK, which is hereby included by reference.
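The column mean centering mentioned above, and the need to add the means back on reconstruction, can be sketched as follows. The helper names `center_columns` and `reconstruct` and the toy data are assumptions made for illustration; MSC is not shown.

```python
import numpy as np

def center_columns(D):
    """Subtract the mean of each column (pel variable) over the frames."""
    col_mean = D.mean(axis=0, keepdims=True)
    return D - col_mean, col_mean

def reconstruct(T, P, col_mean):
    """Reconstruction from the bilinear model, with the means added back."""
    return T @ P.T + col_mean

# Toy data: a constant offset per pel plus one factor of variation.
rng = np.random.default_rng(2)
offset = rng.normal(size=(1, 6))
D = offset + np.outer(rng.normal(size=8), rng.normal(size=6))

Dc, col_mean = center_columns(D)
U, s, Vt = np.linalg.svd(Dc, full_matrices=False)
T = U[:, :1] * s[:1]     # one-factor scores
P = Vt[:1].T             # one-factor loadings
D_hat = reconstruct(T, P, col_mean)
```

After centering, a single factor describes this toy data completely; without centering, the constant offset would consume a factor of its own.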
- MSC multiplicative scatter correction
- Bilinear model parameter estimation methods that involve smoothing of scores and loadings, or modifications of the individual data elements in data matrix D may also be used.
- the bilinear model is to attain a compact representation of a large amount of input data.
- the number of 'significant' factors used in the model T * P^T should be low, i.e. the model should be of low rank.
- This number of significant factors may be estimated in various ways, e.g. by cross validation or from the residuals and leverages after varying number of factors, as described by Martens & Naes 1989, which was mentioned before.
- Previously defined or estimated loadings may be used as part of the modelling of data matrix D.
- the scores for these a priori factors are estimated by projecting D on these pseudo-loadings, and the bilinear modelling is performed on the residuals after this projection (weighted regression).
- the estimation of the scores for one individual frame may be based on linear regression, projecting d Rn on P^T using some weighted or robustly reweighted least squares minimization.
- it may be based on nonlinear iterative curve fitting by e.g. SIMPLEX optimization (J.A. Nelder and R. Mead, 'A simplex method for function minimization', Computer Journal, vol. 7, p 308-313.)
- the criterion may be based also on the decoding intensity error arising when using the scores
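The linear regression variant of the score estimation described above can be sketched as a weighted least squares projection of one frame's change vector on fixed loadings. The function name `estimate_scores`, the choice of zero weights for corrupted pels and the synthetic data are assumptions; the nonlinear SIMPLEX variant and the intensity-error criterion are not shown.

```python
import numpy as np

def estimate_scores(d, P, w):
    """Weighted least squares scores t for one frame:
    minimizes sum_k w[k] * (d[k] - (P @ t)[k]) ** 2."""
    sw = np.sqrt(w)
    t, *_ = np.linalg.lstsq(P * sw[:, None], d * sw, rcond=None)
    return t

rng = np.random.default_rng(3)
P = np.linalg.qr(rng.normal(size=(30, 2)))[0]   # assumed known loadings
t_true = np.array([2.0, -1.0])
d = P @ t_true                                   # noise-free change vector

# Three pels carry grossly wrong motion estimates, e.g. due to occlusion.
d[:3] += 50.0
w = np.ones(30)
w[:3] = 0.0                                      # weighted fully down

t_hat = estimate_scores(d, P, w)
```

In a robustly reweighted scheme the weights would themselves be recomputed from the residuals between iterations instead of being given in advance.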
- the change information d Rn is represented e.g. as motion field DA Rn in the reference position, so that it is compatible with the bilinear loadings P, also represented in the reference position.
- the change information may be represented in the position of the pixels in frame n, e.g. the reverse motion field DA nR , and projected on a compatible version of the loadings P, i.e. P temporarily moved to that same position using motion field DA Rn .
- a low-dimensional bilinear model is primarily effective when the motion fields (or other change fields) are stored in D in one given representational system, so that all information about an object is stored in the same pixel positions for all frames.
- IDLE codec type (according to WO 95/08240 and WO 95/34172), where the motion, intensity changes and other modelled change information for a number of (consecutive) frames is directly or indirectly represented relative to a common 'extended reference image model' for a given class of pixels (spatial 'holon') in a given set of related frames.
- the whole initial reference image e.g. the first frame in the sequence
- the main purpose of the spatial segmentation is then to split this initial spatial holon into data structures that each lends itself to simple, low-dimensional mathematical modelling.
- the motion field estimates can be performed directly from this reference image I R to frame I n , and analyzed directly in D.
- One advantage of the present invention is hence to use the resulting compact, low-rank summary of several frames' motion fields and other change fields to enhance and stabilize the segmentation in video coding.
- segmentation may be performed in the temporal domain to find groups of frames where certain spatial patterns are found.
- the temporal segmentation may then utilize subspace information derived from bilinear modelling of time-shifted versions of relevant time series (H. Martens & M. Martens (1992) NIR spectroscopy - applied philosophy. Proceedings, 5th Internatl Conf. NIR Spectroscopy (K.I. Hildrum, ed.) North Holland; pp 1-10), e.g. time-shifted versions of the static frame scores obtained by bilinear modelling of change fields, as described for equation (1).
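One way to form the time-shifted versions of the frame scores mentioned above is to place lagged copies of the score series side by side, so that each row describes a short window of frames. The function name `time_shifted` and the choice of lags are illustrative assumptions.

```python
import numpy as np

def time_shifted(T, lags):
    """Stack lagged copies of the score series T (frames x factors)
    side by side. Row i of the result holds the scores of frame
    i + max(lags) at each of the requested lags."""
    n = T.shape[0]
    max_lag = max(lags)
    return np.hstack([T[max_lag - lag: n - lag] for lag in lags])

# Toy score series for 6 frames and one factor: 0, 1, ..., 5.
T = np.arange(6.0).reshape(6, 1)
Z = time_shifted(T, lags=(0, 1, 2))
# Z[0] holds the scores of frames 2, 1 and 0.
```

The matrix Z could then be submitted to bilinear modelling or clustering in order to group frames with similar temporal behaviour.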
- the bilinear summaries of multiple frames' motion fields may be used in several ways in the segmentation.
- the order in which the frames are modelled is in the preferred embodiments forward and sequential. However, the order may also be selected according to other criteria,- e.g. according to which frame at any given time shows the most need and potential for model refinement.
- the bilinear model based segmentation may be used pyramidally.
- One example of this is to perform the segmentation on frames in reduced resolution, in order to identify the major holons in the sequence, and then to use these results as preliminary, tentative input to the same process at higher frame resolution.
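The pyramidal use described above needs a way to reduce frame resolution and to carry a coarse segmentation up to the finer level as a tentative start. The sketch below uses block averaging and label repetition; both choices, and the names `reduce_resolution` and `expand_labels`, are assumptions rather than the method prescribed by the text.

```python
import numpy as np

def reduce_resolution(frame, factor):
    """Reduce resolution by averaging non-overlapping factor x factor blocks.
    Rows and columns that do not fill a whole block are cropped."""
    nv, nh = frame.shape
    nv2, nh2 = nv // factor, nh // factor
    cropped = frame[:nv2 * factor, :nh2 * factor]
    return cropped.reshape(nv2, factor, nh2, factor).mean(axis=(1, 3))

def expand_labels(labels, factor):
    """Repeat each coarse segment label over a factor x factor block,
    giving a preliminary segmentation at the finer resolution."""
    return np.repeat(np.repeat(labels, factor, axis=0), factor, axis=1)

frame = np.arange(16.0).reshape(4, 4)
small = reduce_resolution(frame, 2)

coarse_labels = np.array([[0, 1], [1, 0]])
fine_labels = expand_labels(coarse_labels, 2)
```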
- the motion estimation, bilinear modelling and segmentation may be performed on individual, already identified holons ('input holons'), or on complete, unsegmented images I n . In either case, a multi-holon pre- or post processing of the obtained motion fields, bilinear models and segments is desired in order to resolve overlap between input holons.
- One such pre- or post processing is based on having stored each holon with a 'halo' of neighbour pixels with uncertain holon membership, i.e. pixels that only tentatively can be ascribed to a holon (and thus are also temporarily stored in other holons or as separate lists of unclear pixels).
- Such tentative halo pixels are treated specially, e.g. by being fitted for all relevant holons, and their memberships to the different holons updated according to the success of the motion estimates.
- Such halo pixels are given very low weight or fitted passively in the bilinear modelling (by Principal Component Regression, Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK, which is hereby included by reference).
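The passive fitting of halo pixels described above can be sketched as follows: the scores are estimated from the reliable pixels only, and the loadings of the halo pixels are then obtained by regressing their columns on those scores, so that the halo pixels cannot influence the model. The function name `fit_with_passive`, the boolean mask and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def fit_with_passive(D, passive, n_factors):
    """Bilinear model in which columns flagged in `passive` (boolean mask
    over pel variables) are fitted to the scores without affecting them."""
    active = ~passive
    U, s, Vt = np.linalg.svd(D[:, active], full_matrices=False)
    T = U[:, :n_factors] * s[:n_factors]          # scores from active pels only
    P = np.zeros((D.shape[1], n_factors))
    P[active] = Vt[:n_factors].T
    # Passive pels: regress their columns on the scores (PCR-like step).
    P[passive] = np.linalg.lstsq(T, D[:, passive], rcond=None)[0].T
    return T, P

rng = np.random.default_rng(8)
D = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 20))
passive = np.zeros(20, dtype=bool)
passive[-4:] = True                                # last 4 pels form the halo

T, P = fit_with_passive(D, passive, n_factors=2)

# Corrupting the halo pels leaves the scores and active loadings unchanged.
D_bad = D.copy()
D_bad[:, passive] += 100.0
T_bad, P_bad = fit_with_passive(D_bad, passive, n_factors=2)
```

The lack of fit of each halo pixel to the resulting model could then serve as evidence for or against its membership in the holon.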
- Additional columns in the raw data matrix G may be formed from 'external scores' from other blocks of data.
- Sources of such 'external scores' are: scores from bilinear modelling of some other data domain,
- scores from other holons preferably in nonlinear representation (see e.g. A. Gifi: Nonlinear Multi-variate Analysis. J.Wiley & Sons Ltd Chichester 1990), with each quantitative score vector quantized and analyzed as an indicator matrix (p.67) or at the ordinal level (p 187), which is hereby included by reference, or scores from the same holon at a different spatial resolution, scores from external data such as sound
- weights for such additional variables must be adapted so that their uncertainty level becomes similar to that of the weighted pixels in the final data matrix to be modelled, D (equations (1) and (2)).
- An alternative way to incorporate uncertain pixels or external scores gently, without forcing their information into the bilinear model, is to replace the one-block bilinear modelling with a two-block modelling, such as PLS regression (Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK) or with a multi-block or N-way modelling, such as Parafac (Sharaf, M.A., Illman, D.L. and Kowalski, B.R. Chemometrics, J. Wiley & Sons, New York 1986) or Consensus PCA/PLS (described by Martens, M. and Martens, H. 1986: Partial Least Squares regression.
- uncertain pixels and external scores contribute positively to the modelling if they fit well, but do not affect the modelling strongly in a negative way if they do not fit. In any way these uncertain pixels and external scores are fitted to the obtained bilinear model.
- the scores from the present holon's modelling in the present resolution and in the present domain may in turn be used as 'external scores' for other holons or at other resolutions or in other domains.
- the stabilization of the segmentation by the use of multi-frame summaries may be embodied in different ways.
- the bilinear segmentation process employs a top-down approach, removing segments from the input holon: Pixel areas in the motion subspace that do not fit to a general holon model are detected as outliers and segmented out from the rest of the input holon
- the segmentation employs a bottom-up approach, attempting to let stable seed points grow into homogeneous, consistent segments in the input holon
- the segmentation is separated from the motion estimation and -compensation ( Figure 5), with bilinear modelling of the frames' motion fields and other estimated change data taking place in between
- motion estimation and actual segmentation are integrated ( Figure 6), followed by the bilinear modelling.
- the motion estimation, bilinear modelling and segmentation process is done for a whole sequence.
- the motion estimation, bilinear modelling, segmentation and model are updated gradually for individual frames.
- the bilinear modelling methods used for segmentation are expanded to include criteria in addition to explained covariance; in this case spatial and temporal smoothing are used as extra criteria. Reweighting of the rows and columns of the input data to the bilinear modelling is also included.
- the bilinear modelling is combined with optimal scaling so that not only the weights, but also the input data themselves are changed during the model estimation process: As long as the prediction of a data element from a preliminary low-rank bilinear model does not give significantly worse decoding results than the element's input value, its input value is replaced by its bilinear prediction.
- Figure 5 shows the main building blocks of the bilinear model based segmentation: a motion estimator unit EstMov 520, a bilinear modelling unit EstBLM 540 and a segmentation unit EstSegm 560, and the data streams between them. More details on the data streams will be given in conjunction with the Third Preferred Embodiment.
- the two first embodiments represent two implementations of the segmentation unit EstSeg 560: top-down or bottom-up.
- the holon input to the EstSeg unit 560 is retained as intact as possible, but if the holon contains parts which display significantly and consistently different behaviour than the rest of the holon, then these parts are split out as separate, new segments.
- individual pixels, e.g. along holon edges, whose preliminary classification appears questionable, may be pruned away or otherwise identified as unreliable outliers.
- the method is described for one single frame and for detection of a stiffly moving body.
- pixels that do not fit well to the spatial model favoured by the majority of the pixels will display significantly large relative residuals R and hence be weighted down so as to reduce their influence on the estimation of coefficients B in the next iteration, in which their residuals will be even larger, etc., so that they have very little influence on the estimation of B in the final spatial model upon convergence.
- Pixels with low final weights are defined as outliers not belonging to the input holon and collected into one new segment.
- This new outlier segment may be submitted to the same reweighted regression modelling, to check if it should be split further into smaller segments.
- the resulting segmentation then represents the output result 565.
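The reweighted regression behind the top-down segmentation described above can be sketched as an iteratively reweighted least squares loop. The names B, F, R and W follow the text (coefficients, residuals, relative residuals and weights); the affine design matrix X, the weight function `w0 / (1 + R**2)`, the fixed uncertainty `s`, the threshold `w_min` and the function name `top_down_split` are assumptions, since the exact formulas are not given in this text.

```python
import numpy as np

def top_down_split(X, Y, s, w0=None, n_iter=10, w_min=0.5):
    """Iteratively reweighted regression of motion data Y on spatial model X.
    Returns coefficients B, final pixel weights W and an outlier mask."""
    n_pel = X.shape[0]
    w0 = np.ones(n_pel) if w0 is None else w0      # a priori weights
    W = w0.copy()
    for _ in range(n_iter):
        sw = np.sqrt(W)[:, None]
        B, *_ = np.linalg.lstsq(X * sw, Y * sw, rcond=None)
        F = Y - X @ B                               # residuals
        R = np.sqrt(np.mean((F / s) ** 2, axis=1))  # relative residual per pel
        W = w0 / (1.0 + R ** 2)                     # assumed down-weighting rule
    return B, W, W < w_min

# 12 x 12 holon whose motion is affine in the pel addresses v and h.
nv = nh = 12
v, h = np.meshgrid(np.arange(nv), np.arange(nh), indexing="ij")
X = np.column_stack([np.ones(nv * nh), v.ravel(), h.ravel()])
B_true = np.array([[1.0, 0.5], [0.1, 0.0], [0.0, -0.2]])
rng = np.random.default_rng(4)
Y = X @ B_true + 0.02 * rng.normal(size=(nv * nh, 2))

# A 3 x 3 corner block moves differently from the rest of the holon.
block = ((v < 3) & (h < 3)).ravel()
Y[block] += 5.0

B, W, outlier = top_down_split(X, Y, s=0.2)
```

For multi-frame segmentation the columns of Y would hold motion data or bilinear loadings from several frames instead of the two columns used here, and the pixels flagged as outliers would be collected into a new segment.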
- in the weights of the pixels in (740), effects of neighbouring pixels may also be brought in, to enhance spatial continuity of the holons.
- the a priori weights (705) may be modified, e.g. by using lower initial weights for pixels already known to be potentially invalid due to occlusions or particularly uncertain due to unsatisfactory bilinear modelling.
- the uncertainty measure s(pelj) for each individual element (pelj) in Y may have been estimated, and may be used instead of the general uncertainty for each pixel, s(pel), in (745).
- This individual uncertainty measure may be asymmetric, so that positive and negative residuals may be assessed differently. This is relevant for motion estimates of pixels in flat intensity areas near intensity edges (asymmetric slack): the pixel may move far away from the edge without affecting the resulting intensity lack-of-fit, but cannot move beyond the edge.
- the full-rank regression method employed in (710) may be replaced by other estimators, e.g. reduced-rank regression methods like PLS regression or some extension of this, as described in Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK.
Multi-frame segmentation
- This basic top-down segmentation method can be used for multi-frame segmentation: Instead of basing the segmentation only on motion fields of the holon for one single frame, using regressands
- the regressands Y may also be defined as including intensity information, e.g. motion compensated intensity difference images.
- DI Rn : the motion compensated intensity difference between frame n and the common reference frame R, defined for individual color dimensions (e.g. RGB) or as some summary like luminosity. Alternatively, loadings from bilinear intensity factors based on motion compensated intensity differences may be used.
- Such intensity information may be used together with or separately from the motion information.
- the columns Y should be scaled to reflect their relative desired impact on the segmentation, e.g. inversely to their relative average estimated uncertainty variances.
- the spatial structure model around which the residuals F in (703) are computed may be of another type than the one employed in (702).
- X may also contain square and cross product terms of the addresses v and h (Lancaster, P. and Salkauskas, K. (1986) Curve and surface fitting. Academic Press, p 133, which is hereby incorporated by reference).
- Splines or piecewise polynomials may also be used (Lancaster & Salkauskas 1986, p 245, which is hereby incorporated by reference)
- Such higher order models can help distinguish between outlier pixels and pixels taking part in major, smoothly structured motions that are not affine transformations
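The extension of X with square and cross product terms of the addresses v and h can be sketched as below. The function name `design_matrix` and the toy motion field are assumptions; the point is only that a smooth but non-affine motion leaves large residuals under the affine model and none under the quadratic one.

```python
import numpy as np

def design_matrix(nv, nh, quadratic=False):
    """Spatial regressors for an nv x nh holon: constant, v and h,
    optionally extended with v*v, v*h and h*h."""
    v, h = np.meshgrid(np.arange(nv, dtype=float),
                       np.arange(nh, dtype=float), indexing="ij")
    v, h = v.ravel(), h.ravel()
    cols = [np.ones_like(v), v, h]
    if quadratic:
        cols += [v * v, v * h, h * h]
    return np.column_stack(cols), v, h

def residual_norm(X, y):
    """Norm of the least squares residual of y regressed on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.norm(y - X @ b)

X_aff, v, h = design_matrix(4, 5)
X_quad, _, _ = design_matrix(4, 5, quadratic=True)

y = v * h                      # smooth motion that is not affine in v and h
res_aff = residual_norm(X_aff, y)
res_quad = residual_norm(X_quad, y)
```

Under the affine model every pel of this field would look like a moderate outlier; the higher order model removes that ambiguity.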
- X may also contain a spatially autoregressive part, with spatially shifted versions of Y included in X and where a rank-reducing regression method such as PLS regression is used (H. Martens & M Martens (1992) NIR spectroscopy - applied philosophy Proceedings, 5th Internatl Conf NIR Spectroscopy (K I Hildrum.ed) North Holland, pp 1-10)
- This spatial autoregressive model part allows the distinction between, on one hand, outlier pixels that should be weighted down, and on the other hand, pixels that take part in major, smooth motions which neither have affine transformation structure nor are well described by spatial polynomial structure within the holon.
- Additional information may be introduced in order to optimize the precise positioning of the segment borders.
- One such source of information is intensity edges in the reference image I R , as detected e.g. by a Sobel filter (J.C. Russ: The Image Processing Handbook, 2nd ed., IEEE Press 1995, which is incorporated herein by reference). If the relative weights W 740 after spatial modelling of Y indicate a segment border close to such an intensity edge, then the segment border is moved to this intensity edge.
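The edge-based border adjustment described above can be sketched in two steps: a Sobel gradient magnitude of the reference image, and a rule that moves a tentative border position to the strongest nearby edge. The snapping rule, the search window `max_shift` and the function names are assumptions made for illustration; the text does not specify how closeness is judged.

```python
import numpy as np

def sobel_magnitude(img):
    """Sobel gradient magnitude, with edge replication at the image border."""
    p = np.pad(img.astype(float), 1, mode="edge")
    gx = ((p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]))
    gy = ((p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:])
          - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]))
    return np.hypot(gx, gy)

def snap_to_edge(border, edge_profile, max_shift=2):
    """Move a border index along one image row to the strongest edge within
    max_shift pels; ties go to the nearest one. Unchanged if no edge is near."""
    lo = max(border - max_shift, 0)
    hi = min(border + max_shift + 1, len(edge_profile))
    window = edge_profile[lo:hi]
    if window.max() <= 0:
        return border
    best = np.arange(lo, hi)[window == window.max()]
    return int(best[np.argmin(np.abs(best - border))])

# Reference image with a vertical intensity step between columns 2 and 3.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
mag = sobel_magnitude(img)

new_border = snap_to_edge(4, mag[0])
```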
- More advanced statistical methods may also be used for determining the segment edges.
- One example of this was described by (Kok, F. Lai, 'Deformable contours: Modelling, extraction, detection and classification'. PhD thesis, University of Wisconsin-Madison 1995, which is incorporated herein by reference); for the present application the input information may be intensity I R , intensity residuals DI Rn (or BLM summaries of these), spatial residuals F 720, R 730 or model weights W 740, and/or the Y data themselves.
- the Second Preferred Embodiment represents a bottom-up approach to segmentation of the input holon. It consists of performing a cluster analysis of the multi-frame motion data or their bilinear summaries.
- clustering techniques may be used for finding groups of pels.
- the choice of clustering criterion and clustering algorithm defines the statistical properties of the clusters. For instance, there is a choice between performing the cluster analysis for each motion direction (Vertical, Horizontal, Depth) separately, or analysing the directions jointly. The latter is the preferred implementation (but the depth dimension may be lacking, at least initially in an encoding process).
- Two main groups of clustering techniques can be used: Cluster analyses which do not make spatial assumptions with respect to parameter smoothness or neighbourship in the image plane, and cluster analyses which do.
- the goal is now to find clusters of pixels which display motion patterns that are similar at least in some of the factor dimensions in P,- i.e. pixels that display similar motion patterns, at least in some significant dimensions
- Spatiotemporal distances can be computed on the basis of common or weighted Pythagorean distance measures, as well as normalized (Mahalanobis) distances.
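The distance measures named above can be sketched for pixels described by their loadings in the factor dimensions. Here 'Pythagorean' is taken to mean ordinary Euclidean distance; the function name `pairwise_distances`, the per-factor weights and the use of the sample covariance for the Mahalanobis variant are assumptions.

```python
import numpy as np

def pairwise_distances(P, weights=None, mahalanobis=False):
    """Distances between all pairs of rows (pels) of P (pels x factors).
    Plain Euclidean by default, weighted Euclidean if `weights` is given,
    or normalized (Mahalanobis) using the sample covariance of P."""
    Z = P
    if mahalanobis:
        cov = np.cov(P, rowvar=False)
        L = np.linalg.cholesky(np.linalg.inv(cov))   # inv(cov) = L @ L.T
        Z = P @ L
    elif weights is not None:
        Z = P * np.sqrt(weights)
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

A = np.array([[0.0, 0.0], [3.0, 4.0]])
d_plain = pairwise_distances(A)[0, 1]
d_weighted = pairwise_distances(A, weights=np.array([1.0, 0.0]))[0, 1]

# The Mahalanobis variant does not depend on the scaling of each factor.
rng = np.random.default_rng(9)
P = rng.normal(size=(15, 2))
d1 = pairwise_distances(P, mahalanobis=True)
d2 = pairwise_distances(P * np.array([10.0, 0.1]), mahalanobis=True)
```

The weighted form allows factor dimensions judged unreliable to count less in the clustering.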
- One approach is standard non-hierarchical cluster analytic techniques (Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York., Gudersen, Bob (1983) An adaptive fcv cluster algorithm.
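As one familiar instance of a non-hierarchical clustering technique, the sketch below runs a basic k-means on pixel loadings. k-means is used here only as a stand-in and is not claimed to be the algorithm of the cited references; the deterministic farthest-point initialization, the function name `kmeans` and the synthetic loadings are assumptions.

```python
import numpy as np

def kmeans(Z, k, n_iter=20):
    """Basic k-means on the rows of Z (pels x factor dimensions).
    Initial centers are picked by a farthest-point rule for repeatability."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)              # nearest center per pel
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels, centers

# Two groups of pels with distinct motion patterns in two factor dimensions.
rng = np.random.default_rng(5)
Z = np.vstack([rng.normal([1.0, 0.0], 0.05, size=(20, 2)),
               rng.normal([-1.0, 1.0], 0.05, size=(30, 2))])

labels, centers = kmeans(Z, k=2)
```

This variant makes no use of pixel positions, so it belongs to the first group of techniques, those without spatial assumptions.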
- the cluster analysis searches for spatially related pixels with similar motion patterns.
- Boyer et al. 1994 published a method of image segmentation which makes extensive, but not exclusive, use of spatial continuity (Boyer, K.L., Mirza, M.J. and Ganguly, G. (1994) The robust sequential estimator: A general approach and its application to surface organization in range data. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 10, Oct 1994, pp 987-1001, which is hereby included by reference).
- An embodiment of the present invention is to extend their method from one frame (a single radar image Z ), measured in one dimension (distance) to employ information from several frames and measurements in several dimensions (Vertical motion, horizontal motion and other possible characteristics, see below).
- Y is defined according to (701 , 760 or 770) as motion information from several frames
- intensity information (775) may be included as well, as described for the third preferred embodiment
- top-down and bottom-up approach to the multiframe segmentation may be combined.
- One example of this is: first, perform top-down segmentation analysis of the input holon in order to identify outlier regions that do not fit the majority or dominant structure within the holon.
- the next two preferred embodiments distinguish between two ways of combining motion estimation and segmentation.
- the resulting motion estimate DA Rn 525 is passed to the bilinear modelling unit EstBLM 540.
- the resulting bilinear model parameters 545 are passed to the segmentation unit EstSegm 560, which yields segmentation results 565.
- the EstMov operator 520 may contain facilities for detecting internal preliminary segmentation indicators such as edges in I R or I n as well as analysis of depth and spatial innovation; these may be used in order to enhance the motion estimate DA Rn 525 (e.g. by not blurring the motion fields across apparent preliminary segment borders), and are passed along with the motion estimate to the other units.
- the bilinear model parameters 545 primarily consist of parameters modelling motion data DA Rn and their uncertainties, but may optionally also include parameters concerning motion compensated intensity changes DI Rn etc.
- the motion estimator EstMov 520 employs previously established, preliminary segment information 521 in order to optimize handling of edges, occlusions and depth: Motion fields are not to be smoothed across reliable preliminary segment borders.
- EstMov 520 also uses previously established bilinear modelling results 522 in order to stabilize the motion estimation against underdeterminedness, ambiguity and noise sensitivity. This preliminary information 521 and 522 has been obtained, in units EstSeg 560 and EstBLM 540, respectively, for previous frames, other pyramidal frame resolutions or previous iterations.
- in the bilinear modelling unit EstBLM 540, bilinear models are developed separately for each preliminary segment (holon), based on preliminary segment information 521.
- information from other related holons, and from pixels whose holon membership is unclear, may be treated so as not to affect the bilinear modelling detrimentally (as extra X-variables with low weight, in PCA-like bilinear modelling of a single block of variables X, or as Y-variables in PLS2- or PCR-like bilinear modelling of two blocks, X and Y).
- Motion estimation, depth assessment and segmentation are interdependent processes that optimally should be treated in an integrated way.
- the motion estimation and the segmentation operators were coordinated through feedback of preliminary bilinear results 521, 522
- the operators are fully integrated
- the bilinear estimation may be done with less computer power in this case, since it operates on more fully isolated segments individually
- input data 605 and preliminary segmentation and bilinear modelling results 623 are input to EstMov/EstSeg 620, which delivers motion fields DA Rn with their estimated uncertainties, occlusions etc. 625 and segment information 665 about the holons found.
- in EstBLM 640, a bilinear model is then updated for each holon separately.
- inter-holon relations and pixels with unclear holon classification may tentatively be weighted down or defined as Y-variables, as described in the third preferred embodiment.
- motion estimation and segmentation may also be seen as integrated parts of the bilinear model estimation.
- in EstBLM 540, 640, the estimation process does not have to be run until convergence or full stability, as in conventional singular value decomposition or eigenvalue decomposition. It is sufficient that the subspace obtained for each holon 545 improves the next round of motion estimation and segmentation.
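The idea of not running the bilinear estimation to convergence can be illustrated with a subspace (orthogonal) iteration that takes only a few steps from the loadings of a previous round. This particular iteration is a swapped-in standard technique chosen for the sketch, not one named in the text; the function names and the synthetic data are assumptions.

```python
import numpy as np

def refine_subspace(D, P, n_steps=1):
    """A few steps of orthogonal iteration on D.T @ D, warm-started from
    the loadings P of a previous round. Returns scores T and loadings P."""
    for _ in range(n_steps):
        P, _R = np.linalg.qr(D.T @ (D @ P))   # re-orthonormalized loadings
    return D @ P, P

def captured_fraction(D, P):
    """Share of the total sum of squares of D described by subspace P."""
    return np.sum((D @ P) ** 2) / np.sum(D ** 2)

# Motion data for one holon: 10 frames, 50 pel variables, 2 strong factors.
rng = np.random.default_rng(6)
D = (rng.normal(size=(10, 2)) @ rng.normal(size=(2, 50))
     + 0.01 * rng.normal(size=(10, 50)))

# Poor start values, e.g. from an earlier iteration or resolution level.
P0 = np.linalg.qr(rng.normal(size=(50, 2)))[0]
before = captured_fraction(D, P0)

T1, P1 = refine_subspace(D, P0, n_steps=2)
after = captured_fraction(D, P1)
```

Because each round starts from the previous loadings, the cost per round stays small while the subspace keeps following the updated motion data.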
- the modification of the input data to EstBLM through improved motion estimation and segmentation may be treated as part of the bilinear estimation process.
- the whole sequence is subjected to motion estimation; then these motion estimates for the whole sequence are submitted to bilinear modelling.
- the bilinear model or models of the holons in the sequence are used for segmentation.
- the bilinear model results 522 and segmentation 521 results from previous iterations (or pyramidal resolution levels) are fed back into the motion estimation 520 and bilinear modelling 540 in order to stabilize and facilitate these estimation processes
- the modelling of the whole sequence may then be repeated, using the updated bilinear motion and intensity change models and the updated segmentation.
- the segmentation 565 may likewise be updated after each frame. In the preferred embodiment major re-segmentation is only allowed when the motion data clearly shows the need for it, except for pruning processes along holon edges.
- the order in which the individual frames are pulled into the modelling and segmentation may not be fixed. Once all the frames have been modelled and segmented, the whole process may be started again for the sequence, now with better start values for the bilinear model and segmentation.
- Estimated uncertainties in the estimated segment borders are stored with the segment border information itself and used in subsequent encoding steps. Pixels with unclear segment classification, e.g. in a region around the chosen segmentation border, are treated as having high uncertainty. In subsequent motion estimation and bilinear modelling the uncertain pixels are given low weights or passively fitted by principal component regression (Martens & Naes 1989), mentioned before. In subsequent segmentations, uncertain pixels are included in the new segmentation estimation, but are given low a priori input weights (705).
- each new iteration is defined as follows:
- the smoothing method may be a simple boxcar filter, or a method that seeks to avoid smoothing across apparent edges that should be left unsmoothed.
- the smoothed loading p_a is orthogonalized with respect to the loadings of previously estimated factors [p_1, p_2, ..., p_{a-1}].
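The smoothing and orthogonalization steps above can be sketched as follows, assuming for brevity that a loading is a one-dimensional list of pixel values (the source works on two-dimensional images) and that the earlier loadings are mutually orthogonal. The function names are hypothetical, and the simple boxcar filter is only one of the smoothing options the text mentions.

```python
def boxcar(values, half_width=1):
    """Moving-average (boxcar) filter; the window is truncated at both ends."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def orthogonalize(p, previous):
    """Remove from p its projection on each earlier loading, then normalise.

    Assumes the loadings in `previous` are mutually orthogonal.
    """
    p = list(p)
    for q in previous:
        coef = sum(a * b for a, b in zip(p, q)) / sum(b * b for b in q)
        p = [a - coef * b for a, b in zip(p, q)]
    norm = sum(a * a for a in p) ** 0.5
    return [a / norm for a in p]
```

Smoothing generally destroys exact orthogonality, which is why the orthogonalization is applied after the filter rather than before it.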
- a further enhancement of the bilinear modelling in this preferred embodiment is to use iteratively reweighted least squares fit of the bilinear model to the data, in order to reduce the effect of outlier frames or outlier pixels:
- the weights for rows, V_frames, and columns, V_pels, in eq. (3) may iteratively be updated according to the inverse of updated estimates of the average uncertainty standard deviation for the rows and columns, based on corrected residuals from the low-rank bilinear model in the previous iteration.
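One reweighting step of this kind might look like the sketch below. It assumes the residuals from the previous low-rank fit are available as a frames-by-pixels list of lists and uses the plain root-mean-square residual as the uncertainty estimate; the degrees-of-freedom correction implied by "corrected residuals" is left out, and the `floor` parameter is an assumption added to avoid division by zero.

```python
def update_weights(residuals, floor=1e-6):
    """Row and column weights as the inverse RMS residual of the previous fit."""
    n_rows, n_cols = len(residuals), len(residuals[0])
    # RMS residual per row (frame) and per column (pixel).
    row_sd = [(sum(e * e for e in row) / n_cols) ** 0.5 for row in residuals]
    col_sd = [
        (sum(residuals[i][j] ** 2 for i in range(n_rows)) / n_rows) ** 0.5
        for j in range(n_cols)
    ]
    # The floor keeps exactly fitted rows or columns from getting infinite weight.
    row_w = [1.0 / max(sd, floor) for sd in row_sd]
    col_w = [1.0 / max(sd, floor) for sd in col_sd]
    return row_w, col_w
```

Frames or pixels that the model fits poorly receive small weights in the next fit, which is how outliers lose influence over successive iterations.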
- in the bilinear modelling 540, 640, not only may the bilinear model parameters be changed to attain better modelling; the values in the input data, e.g. DA_Rn, may also be changed in the bilinear model parameter estimation process.
- the individual elements in the motion data, da_{n,pel}, for frames and pixels may iteratively be modified to adhere more closely to the model derived from other frames or pixels:
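The source's own modification rule is not reproduced in this excerpt. Purely as an illustration of the general idea, the sketch below blends each motion element with its bilinear reconstruction, moving uncertain elements further toward the model. The blending formula, the `scale` parameter and the function name are all assumptions.

```python
def pull_toward_model(data, model, uncertainty, scale=1.0):
    """Blend each element with its model value; uncertain elements move most.

    Assumed rule for illustration, not the rule given in the source.
    """
    out = []
    for d_row, m_row, u_row in zip(data, model, uncertainty):
        row = []
        for d, m, u in zip(d_row, m_row, u_row):
            # trust = 1 keeps the data value, trust -> 0 takes the model value.
            trust = 1.0 / (1.0 + (u / scale) ** 2)
            row.append(trust * d + (1.0 - trust) * m)
        out.append(row)
    return out
```

An element with zero uncertainty is left unchanged, so well-determined motion estimates are not overwritten by the model.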
- segmentation / clustering techniques can be used for determining groups of frames (sequences) which are sensible to analyze together - as well as detecting scene shifts.
- One embodiment of this is to perform simple bilinear modelling of (possibly subsampled) frame intensities, and perform non-hierarchical cluster analysis in the score space T to find clusters of frames that have much image material in common.
- a robust single linkage cluster analysis is the preferred cluster method in this embodiment, in order to be able to follow motions within a scene within one single cluster.
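A plain single linkage grouping in score space can be sketched as below. It assumes each frame is represented by its score vector (one row of T) and that clusters are formed by cutting at a fixed distance `max_gap`; both the threshold and the function name are assumptions, and the robustness refinements mentioned in the text are not included.

```python
def single_linkage_clusters(scores, max_gap):
    """Group frames whose score vectors are chained by steps no longer than max_gap."""
    n = len(scores)
    label = list(range(n))  # every frame starts as its own cluster

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    for i in range(n):
        for j in range(i + 1, n):
            if dist(scores[i], scores[j]) <= max_gap and label[i] != label[j]:
                old, new = label[j], label[i]
                # Merge the two clusters by relabelling every member of the old one.
                label = [new if lab == old else lab for lab in label]
    return label
```

Because frames are linked through their nearest neighbours, a scene whose scores drift gradually stays in one cluster even when its first and last frames are far apart, while a change of label between consecutive frames suggests a scene shift.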
- Yet another embodiment is to apply the above principles to time series data, e.g. sound data, in order to define temporal segments.
- the spatial motion field data correspond to time warp estimates
- the spatial intensity change data correspond to temporal intensity change data.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP8528905A JPH11502653A (en) | 1995-03-22 | 1996-03-22 | Multi-frame method and device based on data stream division |
| AU52735/96A AU5273596A (en) | 1995-03-22 | 1996-03-22 | Method and apparatus for multi-frame based segmentation of data streams |
| EP96909117A EP0815537A1 (en) | 1995-03-22 | 1996-03-22 | Method and apparatus for multi-frame based segmentation of data streams |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP95104229.0 | 1995-03-22 | ||
| EP95104229 | 1995-03-22 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO1996030873A1 (en) | 1996-10-03 |
Family
ID=8219106
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP1996/001273 WO1996030873A1 (en) | 1995-03-22 | 1996-03-22 | Method and apparatus for multi-frame based segmentation of data streams |
Country Status (6)
| Country | Link |
|---|---|
| EP (1) | EP0815537A1 (en) |
| JP (1) | JPH11502653A (en) |
| KR (1) | KR19980702923A (en) |
| AU (1) | AU5273596A (en) |
| WO (1) | WO1996030873A1 (en) |
| ZA (1) | ZA962304B (en) |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2010056904A3 (en) * | 2008-11-12 | 2010-08-12 | University Of Utah Research Foundation | Image pattern recognition |
| US8285079B2 (en) | 2010-03-19 | 2012-10-09 | Sony Corporation | Method for highly accurate estimation of motion using phase correlation |
| US8488007B2 (en) | 2010-01-19 | 2013-07-16 | Sony Corporation | Method to estimate segmented motion |
| CN112766325A (en) * | 2021-01-04 | 2021-05-07 | 清华大学 | Traffic data multi-mode missing filling method based on space-time fusion |
-
1996
- 1996-03-22 WO PCT/EP1996/001273 patent/WO1996030873A1/en not_active Application Discontinuation
- 1996-03-22 JP JP8528905A patent/JPH11502653A/en active Pending
- 1996-03-22 ZA ZA962304A patent/ZA962304B/en unknown
- 1996-03-22 AU AU52735/96A patent/AU5273596A/en not_active Abandoned
- 1996-03-22 EP EP96909117A patent/EP0815537A1/en not_active Withdrawn
- 1996-03-22 KR KR1019970706331A patent/KR19980702923A/en not_active Withdrawn
Non-Patent Citations (3)
| Title |
|---|
| GREGORS W. DONOHOE: "Combining segmentation and tracking for the classification of moving objects in video scenes", PROCEEDINGS OF THE ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, OCTOBER 31-NOVEMBER 2, 1988; PACIFIC GROVE (US); IEEE COMPUTER SOCIETY PRESS, pages 533 - 537, XP000130310 * |
| S.Y. KUNG ET AL.: "Neural networks for extracting pure/constrained/Oriented Principal components", SVD AND SIGNAL PROCESSING, II ALGORITHMS, ANALYSIS AND APPLICATIONS; ELSEVIER, AMSTERDAM (NL), 1991, pages 57 - 81, XP000574672 * |
| TERRANCE E. BOULT ET AL.: "Factorization-based segmentation of motions", PROCEEDINGS OF THE IEEE WORKSHOP ON VISUAL MOTION, OCTOBER 7-9, 1991, PRINCETON (US): IEEE COMPUTER SOCIETY PRESS, LOS ALAMITOS (US), pages 179 - 186, XP000579543 * |
Also Published As
| Publication number | Publication date |
|---|---|
| ZA962304B (en) | 1996-09-27 |
| EP0815537A1 (en) | 1998-01-07 |
| AU5273596A (en) | 1996-10-16 |
| KR19980702923A (en) | 1998-09-05 |
| JPH11502653A (en) | 1999-03-02 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US6157677A (en) | Method and apparatus for coordination of motion determination over multiple frames | |
| Simon et al. | Rethinking the CSC model for natural images | |
| US6535632B1 (en) | Image processing in HSI color space using adaptive noise filtering | |
| Li et al. | Multiresolution image classification by hierarchical modeling with two-dimensional hidden Markov models | |
| US6480615B1 (en) | Motion estimation within a sequence of data frames using optical flow with adaptive gradients | |
| Moore et al. | Panoramic robust pca for foreground–background separation on noisy, free-motion camera video | |
| KR101216161B1 (en) | Apparatus and method for processing video data | |
| US5995668A (en) | Segmented picture coding method and system, and corresponding decoding method and system | |
| KR100256194B1 (en) | Method and system for estimating motion within video sequence | |
| Freeman et al. | Learning to estimate scenes from images | |
| US20040189863A1 (en) | Tracking semantic objects in vector image sequences | |
| KR20070107722A (en) | Apparatus and method for processing video data | |
| Zou et al. | A nonlocal low-rank regularization method for fractal image coding | |
| CN118710552B (en) | A method, system and storage medium for restoring thangka images | |
| EP0815537A1 (en) | Method and apparatus for multi-frame based segmentation of data streams | |
| Pan et al. | Nonlocal low rank regularization method for fractal image coding under salt-and-pepper noise | |
| Coelho et al. | Data-driven motion estimation with spatial adaptation | |
| Ghosh et al. | Robust simultaneous registration and segmentation with sparse error reconstruction | |
| Malézieux et al. | Dictionary and prior learning with unrolled algorithms for unsupervised inverse problems | |
| Decenciere et al. | Applications of kriging to image sequence coding | |
| Malassiotis et al. | Tracking textured deformable objects using a finite-element mesh | |
| US7477769B2 (en) | Data processing method usable for estimation of transformed signals | |
| Mester et al. | Segmentation of image pairs and sequences by contour relaxation | |
| LaValle | A Bayesian framework for considering probability distributions of image segments and segmentations | |
| CN120238709B (en) | Trajectory control video generation method and device based on depth information and time-frequency optimization |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 96192717.8; Country of ref document: CN |
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU TJ TM |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | WWE | Wipo information: entry into national phase | Ref document number: 1019970706331; Country of ref document: KR |
| | WWE | Wipo information: entry into national phase | Ref document number: 1996909117; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 1996 528905; Country of ref document: JP; Kind code of ref document: A |
| | ENP | Entry into the national phase | Ref document number: 1997 930341; Country of ref document: US; Date of ref document: 19971127; Kind code of ref document: A |
| | WWP | Wipo information: published in national office | Ref document number: 1996909117; Country of ref document: EP |
| | REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
| | WWP | Wipo information: published in national office | Ref document number: 1019970706331; Country of ref document: KR |
| | NENP | Non-entry into the national phase | Ref country code: CA |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 1019970706331; Country of ref document: KR |
| | WWW | Wipo information: withdrawn in national office | Ref document number: 1996909117; Country of ref document: EP |