
WO1996030873A1 - Method and apparatus for multi-frame based segmentation of data streams - Google Patents


Info

Publication number
WO1996030873A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
matrix
reference image
vectors
frames
Prior art date
Application number
PCT/EP1996/001273
Other languages
French (fr)
Inventor
Harald Aagaard Martens
Jan Otto Reberg
Original Assignee
Idt International Digital Technologies Deutschland Gmbh
Priority date
Filing date
Publication date
Application filed by Idt International Digital Technologies Deutschland Gmbh filed Critical Idt International Digital Technologies Deutschland Gmbh
Priority to JP8528905A priority Critical patent/JPH11502653A/en
Priority to AU52735/96A priority patent/AU5273596A/en
Priority to EP96909117A priority patent/EP0815537A1/en
Publication of WO1996030873A1 publication Critical patent/WO1996030873A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/537 Motion estimation other than block-based
    • H04N19/543 Motion estimation other than block-based using regions
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Definitions

  • the present invention concerns methods for attaining grouping or segmentation in large signal streams by development and analysis of a manifold or subspace of parameters.
  • it concerns methods for spatiotemporal segmentation of video signals.
  • each of the segment models may also be easier to control and interpret, e.g. for editing purposes.
  • subsequent computational treatments of the individual segments may be computationally simpler than having to treat the full signal streams, for instance in reducing the amount of high-speed memory required for effective computation.
  • the segmentation process itself must also be statistically and computationally effective.
  • the present invention concerns how to attain relevant and reliable segmentation.
  • image segmentation is important: groups of pixels (pels) that display consistently related spatial patterns of change over groups of frames should be modelled together, as this gives best compression, editability and interpretability.
  • a segment may correspond to one physical object, but it may also correspond to only a part of a physical object, or to a set of several such objects. It may also correspond to non-tangible objects or phenomena such as shadows.
  • the optimal definition of a segment differs with the purpose of the coding:
  • the segments ideally correspond to the groups of pels that are most effectively compressed, but if the purpose is coding for later video manipulation such as editing or video games, then the segments ideally correspond more to physical objects.
  • the segmentation process must be robust, i.e. it must give visibly acceptable, statistically useful segments with applicability over many related image frames. Yet, it must be computationally feasible with respect to cpu time and memory requirements.
  • the segmentation methods for video coding mainly fall into two main categories: Still image segmentation and motion based segmentation.
  • Still image segmentation is based on defining spatial intensity patterns in an individual image.
  • a drawback with this type of segmentation is that it is difficult to distinguish between spatial contours inside and along the edges of objects.
  • Motion based segmentation concerns how the image intensities vary between images.
  • the segmentation is often based on the latter, and is attained by analysis of estimates of motion fields.
  • One established approach to segmentation is to estimate the motion field between two frames, say from a reference frame R and another frame n (here termed 'Difference in Address', DA Rn), and to search for groups of pels in DA Rn with similar motions.
  • DA Rn may have one, two or more motion dimensions.
  • Motion based segmentation can be generalized to change-based segmentation, where the changes could also include 'Difference in Intensity', i.e. intensity changes between pairs of frames, DI Rn , e.g. motion compensated and at different color channels.
  • segment information facilitates subsequent motion estimation and intensity change estimation and the bilinear modelling of these.
  • segmentation is based on change information for a number of relevant frames, not only on changes between two frames. Thereby the segments obtained are statistically more stable and have a higher validity.
  • the way the many change data are represented in the segmentation computations is preferably via common manifolds or subspace models, primarily by bilinear modelling based on a common reference position. This further enhances the statistical precision and validity of the segmentation, since some noise-based and other non-significant change types can be ignored. It also reduces the computational complexity of the segmentation work by reducing the dimensionality of the change data needed to be analyzed in the segmentation.
  • the subspace segmentation can be recursively updated in cases when the subspace representation itself is recursively updated, this gives computational advantages.
  • the change information used in the segmentation can be of various kinds, e.g. motion information and intensity change information.
  • the invention can be applied to signal streams in general.
  • it can be applied to spatio-temporal segmentation of digital video signals and temporal segmentation of digital sound data.
  • Figure 1 illustrates how a frame-size (with nv x nh pixels) motion field in one motion direction (here DV Rn, the vertical motion for each pixel for moving (warping) image R to approximate image n) can be strung out as a one-dimensional vector (with nv * nh elements).
  • Figure 4 illustrates the parameters from Figure 3 pertaining to one single frame.
  • Figure 5 shows the third preferred embodiment, in which motion estimation and segmentation are performed separately.
  • Figure 6 shows the fourth preferred embodiment, in which motion estimation and segmentation are performed simultaneously.
  • the symbol ' * ' is used for multiplication when needed.
  • Boldface uppercase letters are used for representing data matrices, and boldface lowercase letters for data vectors.
  • a motion field describes how the pels in one image, say reference frame R, should be moved in order to approximate another image, say n.
  • Each motion field image, e.g. DV Rn, can be strung out as a one-dimensional vector d n with nPel elements, one element for each pel in the reference image in which the motion information is given, as illustrated in Figure 1.
  • the different motion dimensions may be strung after one another in one and the same vector, which then has a multiple of nPel elements, as illustrated in Figure 2.
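  • As a minimal numerical sketch of this stringing-out (Figures 1 and 2); the frame size and field values here are hypothetical, not taken from the patent:

```python
import numpy as np

nv, nh = 4, 6                        # hypothetical frame size
nPel = nv * nh

rng = np.random.default_rng(0)
DV_Rn = rng.normal(size=(nv, nh))    # vertical motion field, image R -> n
DH_Rn = rng.normal(size=(nv, nh))    # horizontal motion field

# string each frame-size field out as a one-dimensional vector (Figure 1)
dv = DV_Rn.ravel()                   # nPel elements
dh = DH_Rn.ravel()

# string the motion dimensions after one another in one vector (Figure 2)
d_n = np.concatenate([dv, dh])       # 2 * nPel elements
```

One such vector per frame then forms one row of the data matrix D used in the bilinear modelling.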
  • bilinear modelling is well established as a method for approximating sets of related vectors ( Figure 3).
  • the bilinear factor model can be written as a bilinear matrix product plus a residual (conf. Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK, which is hereby included by reference):

    D = T * P T + E    (1)
  • D is the data to be modelled; it has one row for each frame modelled and one column for each pixel (pel) variable to be modelled simultaneously (e.g. one horizontal and one vertical motion element for each pixel).
  • the superscript T means 'transposed'.
  • E represents the Error or unmodelled residual, with the same matrix dimensions as D.
  • the loading subspace P T, spanning the most significant row space of D, represents the motion information more or less common to several frames in the sequence.
  • the score vectors t n for frames n = 1, 2, ..., estimated for each frame separately or for many frames jointly (see below), serve to convey this common motion information P T back to each individual frame pair.
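  • A minimal numerical sketch of equation (1), using a plain truncated singular value decomposition as the bilinear estimator (the patent also covers weighted and incrementally updated variants; the data here are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
nFrames, nPel, nFactors = 8, 30, 2

# synthetic low-rank motion data plus noise: one row per frame,
# one column per pixel variable
T_true = rng.normal(size=(nFrames, nFactors))
P_true = rng.normal(size=(nPel, nFactors))
D = T_true @ P_true.T + 0.01 * rng.normal(size=(nFrames, nPel))

# truncated SVD gives scores T, loadings P and residual E, with D = T * P^T + E
U, s, Vt = np.linalg.svd(D, full_matrices=False)
T = U[:, :nFactors] * s[:nFactors]   # scores, one row per frame
P = Vt[:nFactors].T                  # loadings, one row per pixel variable
E = D - T @ P.T                      # residual, same dimensions as D
```

The residual E is at the noise level once the number of factors matches the rank of the underlying motion structure.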
  • bilinear modelling may be implemented e.g. as weighted singular value decomposition based on the QR algorithm, with or without adaptive up- and downdating.
  • BLM bilinear modelling
  • PCA principal component analysis
  • the bilinear model may be incrementally updated, as described in the application titled “Method and apparatus for coordination of motion determination” mentioned before
  • the bilinear modelling may be performed after preprocessing consisting of subtracting the mean of each column. One may also mean center each row. These mean data must be added back upon reconstructions based on the bilinear model. More advanced preprocessing may also be used, such as multiplicative scatter correction (MSC) and its extensions, as described by Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK, which is hereby included by reference.
  • MSC multiplicative scatter correction
  • Bilinear model parameter estimation methods that involve smoothing of scores and loadings, or modifications of the individual data elements in data matrix D may also be used.
  • the bilinear model is to attain a compact representation of a large amount of input data.
  • the number of 'significant' factors used in the model T * P T should be low, i.e. the model should be of low rank.
  • This number of significant factors may be estimated in various ways, e.g. by cross validation or from the residuals and leverages after varying numbers of factors, as described by Martens & Naes 1989, which was mentioned before.
  • Previously defined or estimated loadings may be used as part of the modelling of data matrix D.
  • the scores for these a priori factors are estimated by projecting D on these pseudo-loadings, and the bilinear modelling is performed on the residuals after this projection (weighted regression).
  • the estimation of the scores for one individual frame may be based on linear regression, projecting d Rn on P ⁇ using some weighted or robustly reweighted least squares minimization.
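  • The linear-regression case can be sketched as follows; the orthonormal loadings and the noise level are illustrative assumptions, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(2)
nPel, nFactors = 30, 2

# hypothetical orthonormal loadings P and a noisy change field d_Rn for one frame
P = np.linalg.qr(rng.normal(size=(nPel, nFactors)))[0]
t_true = np.array([1.5, -0.5])
d_Rn = P @ t_true + 0.01 * rng.normal(size=nPel)

# scores for this frame by ordinary least squares projection of d_Rn on P
t_n, *_ = np.linalg.lstsq(P, d_Rn, rcond=None)
```

A weighted or robustly reweighted variant would solve the same problem with per-pixel weights, down-weighting unreliable pixels.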
  • it may be based on nonlinear iterative curve fitting by e.g. SIMPLEX optimization (J.A. Nelder and R. Mead, 'A simplex method for function minimization', Computer Journal, vol. 7, p 308-313.)
  • the criterion may also be based on the decoding intensity error arising when using the scores.
  • the change information d Rn is represented e.g. as motion field DA Rn in the reference position, so that it is compatible with the bilinear loadings P, also represented in the reference position.
  • the change information may be represented in the position of the pixels in frame n, e.g. the reverse motion field DA nR, and projected on a compatible version of the loadings P, i.e. P temporarily moved to that same position using motion field DA Rn.
  • a low-dimensional bilinear model is primarily effective when the motion fields (or other change fields) are stored in D in one given representational system, so that all information about an object is stored in the same pixel positions for all frames.
  • IDLE codec type (according to WO 95/08240 and WO 95/34172), where the motion, intensity changes and other modelled change information for a number of (consecutive) frames is directly or indirectly represented relative to a common 'extended reference image model' for a given class of pixels (spatial 'holon') in a given set of related frames.
  • the whole initial reference image, e.g. the first frame in the sequence, may serve as one initial spatial holon.
  • the main purpose of the spatial segmentation is then to split this initial spatial holon into data structures that each lends itself to simple, low-dimensional mathematical modelling.
  • the motion field estimates can be performed directly from this reference image I R to frame I n, and analyzed directly in D.
  • One advantage of the present invention is hence to use the resulting compact, low-rank summary of several frames' motion fields and other change fields to enhance and stabilize the segmentation in video coding.
  • segmentation may be performed in the temporal domain to find groups of frames where certain spatial patterns are found.
  • the temporal segmentation may then utilize subspace information derived from bilinear modelling of time-shifted versions of relevant time series (H. Martens & M. Martens (1992) NIR spectroscopy - applied philosophy. Proceedings, 5th Internatl Conf. NIR Spectroscopy (K.I. Hildrum, ed.) North Holland, pp 1-10), e.g. time-shifted versions of the static frame scores obtained by bilinear modelling of change fields, as described for equation (1).
  • the bilinear summaries of multiple frames' motion fields may be used in several ways in the segmentation.
  • the order in which the frames are modelled is in the preferred embodiments forward and sequential. However, the order may also be selected according to other criteria, e.g. according to which frame at any given time shows the most need and potential for model refinement.
  • the bilinear model based segmentation may be used pyramidally.
  • One example of this is to perform the segmentation on frames in reduced resolution, in order to identify the major holons in the sequence, and then to use these results as preliminary, tentative input to the same process at higher frame resolution.
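  • A toy sketch of this coarse-to-fine idea (the field, the resolutions and the threshold are all hypothetical illustrations):

```python
import numpy as np

# hypothetical full-resolution map of where motion is detected,
# with one moving region (holon) in the lower-right corner
field = np.zeros((16, 16))
field[8:, 8:] = 1.0

# reduce resolution by 4x4 block averaging ...
coarse = field.reshape(4, 4, 4, 4).mean(axis=(1, 3))
# ... identify the major holon at low resolution ...
coarse_seg = coarse > 0.5
# ... and expand the result as tentative input at full resolution
tentative = np.kron(coarse_seg.astype(int), np.ones((4, 4), dtype=int)) > 0
```

The expanded mask is only a preliminary, tentative segmentation; it would be refined by the same process at the higher resolution.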
  • the motion estimation, bilinear modelling and segmentation may be performed on individual, already identified holons ('input holons'), or on complete, unsegmented images I n. In either case, a multi-holon pre- or post-processing of the obtained motion fields, bilinear models and segments is desired in order to resolve overlap between input holons.
  • One such pre- or post-processing is based on having stored each holon with a 'halo' of neighbour pixels with uncertain holon membership, i.e. pixels that only tentatively can be ascribed to a holon (and thus are also temporarily stored in other holons or as separate lists of unclear pixels).
  • Such tentative halo pixels are treated specially, e.g. by being fitted for all relevant holons, and their memberships to the different holons updated according to the success of the motion estimates.
  • Such halo pixels are given very low weight or fitted passively in the bilinear modelling (by Principal Component Regression, Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK, which is hereby included by reference).
  • Additional columns in the raw data matrix G may be formed from 'external scores' from other blocks of data.
  • Sources of such 'external scores' are: scores from bilinear modelling of some other data domain,
  • scores from other holons, preferably in nonlinear representation (see e.g. A. Gifi: Nonlinear Multivariate Analysis. J. Wiley & Sons Ltd, Chichester 1990, which is hereby included by reference), with each quantitative score vector quantized and analyzed as an indicator matrix (p. 67) or at the ordinal level (p. 187); or scores from the same holon at a different spatial resolution; or scores from external data such as sound.
  • weights for such additional variables must be adapted so that their uncertainty level become similar to those of the weighted pixels in the final data matrix to be modelled, D (equations (1 ) and (2)).
  • An alternative way to incorporate uncertain pixels or external scores gently, without forcing their information into the bilinear model, is to replace the one-block bilinear modelling with a two-block modelling, such as PLS regression (Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK), or with multi-block or N-way modelling, such as Parafac (Sharaf, M.A., Illman, D.L. and Kowalski, B.R. Chemometrics, J. Wiley & Sons, New York 1986) or Consensus PCA/PLS (described by Martens, M. and Martens, H. 1986: Partial Least Squares regression).
  • uncertain pixels and external scores then contribute positively to the modelling if they fit well, but do not affect the modelling strongly in a negative way if they do not fit. In any case these uncertain pixels and external scores are fitted to the obtained bilinear model.
  • the scores from the present holon's modelling in the present resolution and in the present domain may in turn be used as 'external scores' for other holons or at other resolutions or in other domains.
  • the stabilization of the segmentation by the use of multi-frame summaries may be embodied in different ways.
  • the bilinear segmentation process employs a top-down approach, removing segments from the input holon: pixel areas in the motion subspace that do not fit to a general holon model are detected as outliers and segmented out from the rest of the input holon.
  • the segmentation employs a bottom-up approach, attempting to let stable seed points grow into homogeneous, consistent segments in the input holon.
  • the segmentation is separated from the motion estimation and -compensation (Figure 5), with bilinear modelling of the frames' motion fields and other estimated change data taking place in between.
  • motion estimation and actual segmentation are integrated (Figure 6), followed by the bilinear modelling.
  • the motion estimation, bilinear modelling and segmentation process is done for a whole sequence.
  • the motion estimation, bilinear modelling and segmentation are performed, and the model is updated, gradually for individual frames.
  • the bilinear modelling methods used for segmentation are expanded to include additional criteria beyond just explained covariance; in this case spatial and temporal smoothing is used as extra criteria. Reweighting of the rows and columns of the input data to the bilinear modelling is also included.
  • the bilinear modelling is combined with optimal scaling so that not only the weights, but also the input data themselves are changed during the model estimation process: as long as the prediction of a data element from a preliminary low-rank bilinear model does not give significantly worse decoding results than the element's input value, its input value is replaced by its bilinear prediction.
  • Figure 5 shows the main building blocks of the bilinear model based segmentation: a motion estimator unit EstMov 520, a bilinear modelling unit EstBLM 540 and a segmentation unit EstSegm 560, and the data streams between them. More details on the data streams will be given in conjunction with the Third Preferred Embodiment.
  • the two first embodiments represent two implementations of the segmentation unit EstSeg 560: top-down or bottom-up.
  • the holon input to the EstSeg unit 560 is retained as intact as possible, but if the holon contains parts which display significantly and consistently different behaviour than the rest of the holon, then these parts are split out as separate, new segments.
  • individual pixels, e.g. along holon edges, whose preliminary classification appears questionable, may be pruned away or otherwise identified as unreliable outliers.
  • the method is described for one single frame and for detection of a stiffly moving body.
  • pixels that do not fit well to the spatial model favoured by the majority of the pixels will display significantly large relative residuals R and hence be weighted down so as to reduce their influence on the estimation of coefficients B in the next iteration, in which their residuals will be even larger, etc., so that they have very little influence on the final estimate of the spatial model B upon convergence.
  • Pixels with low final weights are defined as outliers not belonging to the input holon and collected into one new segment.
  • This new outlier segment may be submitted to the same reweighted regression modelling, to check if it should be split further into smaller segments.
  • the resulting segmentation then represents the output result 565.
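  • A compact numerical sketch of this reweighted-regression idea; the motion field, weight function and outlier threshold below are illustrative choices, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(3)
nv, nh = 10, 10
v, h = np.mgrid[0:nv, 0:nh]
X = np.column_stack([np.ones(nv * nh), v.ravel(), h.ravel()])  # affine regressors

# hypothetical vertical motion field: most pixels follow one affine motion,
# while a 3x3 corner block moves differently (the part to be split out)
outlier = (v.ravel() >= 7) & (h.ravel() >= 7)
y = 0.5 + 0.1 * v.ravel() + outlier * 5.0 + 0.01 * rng.normal(size=nv * nh)

w = np.ones(nv * nh)
for _ in range(10):
    # weighted least squares for the spatial coefficients B ...
    B, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    r = y - X @ B
    s = np.median(np.abs(r)) / 0.6745 + 1e-12   # robust residual scale
    # ... then weight down pixels with large relative residuals
    w = 1.0 / (1.0 + (r / (3.0 * s)) ** 2)

# pixels with low final weights are collected into one new candidate segment
new_segment = w < 0.1
```

The surviving coefficients B describe the dominant affine motion, and the low-weight pixels form the new outlier segment that may itself be re-analyzed.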
  • in the weights of the pixels in (740), effects of neighbouring pixels may also be brought in, to enhance spatial continuity of the holons.
  • the a priori weights (705) may be modified, e.g. by using lower initial weights for pixels already known to be potentially invalid due to occlusions or particularly uncertain due to unsatisfactory bilinear modelling.
  • the uncertainty measure s(pelj) for each individual element (pelj) in Y may have been estimated, and may be used instead of the general uncertainty for each pixel, s(pel), in (745).
  • This individual uncertainty measure may be asymmetric, so that positive and negative residuals may be assessed differently. This is relevant for motion estimates of pixels in flat intensity areas near intensity edges (asymmetric slack): the pixel may move far away from the edge without affecting the resulting intensity lack-of-fit, but cannot move beyond the edge.
  • the full-rank regression method employed in (710) may be replaced by other estimators, e.g. reduced-rank regression methods like PLS regression or some extension of this, as described in Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester UK.
  • Multi-frame segmentation: this basic top-down segmentation method can also be used for multi-frame segmentation. Instead of basing the segmentation only on motion fields of the holon for one single frame, the regressands Y may be defined to collect the change fields from several frames.
  • the regressands Y may also be defined as including intensity information, e.g. motion compensated intensity difference images.
  • DI Rn is the motion compensated intensity difference between frame n and the common reference frame R, defined for individual color dimensions (e.g. RGB) or as some summary like luminosity. Alternatively, loadings from bilinear intensity factors based on motion compensated intensity differences may be used.
  • Such intensity information may be used together with or separately from the motion information.
  • the columns of Y should be scaled to reflect their relative desired impact on the segmentation, e.g. inversely to their relative average estimated uncertainty variances.
  • the spatial structure model around which the residuals F in (703) are computed may be of another type than the one employed in (702).
  • X may also contain square and cross product terms of the addresses v and h (Lancaster, P. and Salkauskas, K. (1986) Curve and surface fitting. Academic Press, p. 133, which is hereby incorporated by reference).
  • Splines or piecewise polynomials may also be used (Lancaster & Salkauskas 1986, p. 245, which is hereby incorporated by reference).
  • Such higher order models can help distinguish between outlier pixels and pixels taking part in major, smoothly structured motions that are not affine transformations.
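  • The effect of the extra second-order terms can be sketched with a hypothetical smooth, non-affine motion pattern (the grid size and coefficients are illustrative):

```python
import numpy as np

nv, nh = 8, 8
v, h = np.mgrid[0:nv, 0:nh]
v, h = v.ravel().astype(float), h.ravel().astype(float)

# affine regressors plus square and cross-product terms of the addresses
X = np.column_stack([np.ones_like(v), v, h, v * v, h * h, v * h])

# a smooth but non-affine (quadratic) motion pattern
y = 0.05 * (v - 3.5) ** 2

B_aff, *_ = np.linalg.lstsq(X[:, :3], y, rcond=None)   # affine model only
B_qud, *_ = np.linalg.lstsq(X, y, rcond=None)          # with 2nd-order terms
r_aff = np.abs(y - X[:, :3] @ B_aff).max()             # large residual
r_qud = np.abs(y - X @ B_qud).max()                    # near zero residual
```

Under the affine model alone, every pixel of this smooth motion would look like an outlier; the quadratic terms absorb it instead.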
  • X may also contain a spatially autoregressive part, with spatially shifted versions of Y included in X, and where a rank-reducing regression method such as PLS regression is used (H. Martens & M. Martens (1992) NIR spectroscopy - applied philosophy. Proceedings, 5th Internatl Conf. NIR Spectroscopy (K.I. Hildrum, ed.) North Holland, pp 1-10).
  • This spatial autoregressive model part allows the distinction between, on one hand, outlier pixels that should be weighted down, and on the other hand, pixels that take part in major, smooth motions which are neither of affine transformation structure nor well described by spatial polynomial structure within the holon.
  • Additional information may be introduced in order to optimize the precise positioning of the segment borders.
  • One such source of information is intensity edges in the reference image I R, as detected e.g. by a Sobel filter (J.C. Russ: The Image Processing Handbook, 2nd ed., IEEE Press 1995, which is incorporated herein by reference). If the relative weights W 740 after spatial modelling of Y indicate a segment border close to such an intensity edge, then the segment border is moved to this intensity edge.
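  • A self-contained sketch of Sobel edge detection on a hypothetical reference image with a single vertical step edge (the image and its size are illustrative):

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via the 3x3 Sobel operator, with edge padding."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode='edge')
    gx = np.zeros_like(img, dtype=float)
    gy = np.zeros_like(img, dtype=float)
    for i in range(3):
        for j in range(3):
            block = pad[i:i + img.shape[0], j:j + img.shape[1]]
            gx += kx[i, j] * block
            gy += ky[i, j] * block
    return np.hypot(gx, gy)

# hypothetical reference image: step intensity edge between columns 4 and 5
I_R = np.zeros((8, 10))
I_R[:, 5:] = 1.0
mag = sobel_magnitude(I_R)
```

The gradient magnitude peaks at the columns flanking the step; a nearby tentative segment border would be snapped to this location.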
  • More advanced statistical methods may also be used for determining the segment edges.
  • One example of this was described by Kok F. Lai ('Deformable contours: Modelling, extraction, detection and classification', PhD thesis, University of Wisconsin-Madison 1995, which is incorporated herein by reference); for the present application the input information may be intensity I R, intensity residuals DI Rn (or BLM summaries of these), spatial residuals F 720, R 730 or model weights W 740, and/or the Y data themselves.
  • the Second Preferred Embodiment represents a bottom-up approach to segmentation of the input holon. It consists of performing a cluster analysis of the multi- frame motion data or their bilinear summaries.
  • clustering techniques may be used for finding groups of pels.
  • the choice of clustering criterion and clustering algorithm defines the statistical properties of the clusters. For instance, there is a choice between performing the cluster analysis for each motion direction (vertical, horizontal, depth) separately, or analysing the directions jointly. The latter is the preferred implementation (but the depth dimension may be lacking, at least initially in an encoding process).
  • Two main groups of clustering techniques can be used: Cluster analyses which do not make spatial assumptions with respect to parameter smoothness or neighbourship in the image plane, and cluster analyses which do.
  • the goal is now to find clusters of pixels which display motion patterns that are similar at least in some of the factor dimensions in P, i.e. pixels that display similar motion patterns in at least some significant dimensions.
  • Spatiotemporal distances can be computed on the basis of common or weighted Pythagorean distance measures, as well as normalized (Mahalanobis) distances.
  • One approach is standard non-hierarchical cluster analytic techniques (Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York; Gudersen, Bob (1983) An adaptive fcv cluster algorithm).
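  • As a minimal sketch of such non-spatial clustering, a plain k-means on the per-pixel rows of the loading matrix P (the two pixel groups and their motion patterns are synthetic assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
nFactors, k = 2, 2

# hypothetical per-pixel rows of the loading matrix P: two pixel groups
# with clearly different motion patterns over the modelled frames
P = np.vstack([rng.normal([2.0, 0.0], 0.1, size=(100, nFactors)),
               rng.normal([-2.0, 1.0], 0.1, size=(100, nFactors))])

# plain (non-spatial) k-means on the rows of P, farthest-point initialisation
c0 = P[0]
c1 = P[np.argmax(((P - c0) ** 2).sum(axis=1))]
centers = np.vstack([c0, c1])
for _ in range(20):
    # assign each pixel to its nearest cluster center ...
    labels = np.argmin(((P[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
    # ... and move each center to the mean of its assigned pixels
    centers = np.vstack([P[labels == j].mean(axis=0) for j in range(k)])
```

Each resulting label group is a candidate segment of pixels with similar motion patterns across the modelled frames.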
  • the cluster analysis searches for spatially related pixels with similar motion patterns.
  • Boyer et al. 1994 published a method of image segmentation which makes extensive - but not exclusive - use of spatial continuity (Boyer, K.L., Mirza, M.J. and Ganguly, G. (1994) The robust sequential estimator: A general approach and its application to surface organization in range data. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 10, Oct. 1994, pp. 987-1001, which is hereby included by reference).
  • An embodiment of the present invention is to extend their method from one frame (a single radar image Z), measured in one dimension (distance), to employ information from several frames and measurements in several dimensions (vertical motion, horizontal motion and other possible characteristics, see below).
  • Y is defined according to (701, 760 or 770) as motion information from several frames.
  • intensity information (775) may be included as well, as described for the third preferred embodiment.
  • top-down and bottom-up approach to the multiframe segmentation may be combined.
  • One example of this is to first perform a top-down segmentation analysis of the input holon in order to identify outlier regions that do not fit the majority or dominant structure within the holon.
  • the next two preferred embodiments distinguish between two ways of combining motion estimation and segmentation.
  • the resulting motion estimate DA Rn 525 is passed to the bilinear modelling unit EstBLM 540.
  • the resulting bilinear model parameters 545 are passed to the segmentation unit EstSegm 560, which yields segmentation results 565.
  • the EstMov operator 520 may contain facilities for detecting internal preliminary segmentation indicators such as edges in I R or I n, as well as analysis of depth and spatial innovation; these may be used in order to enhance the motion estimate DA Rn 525 (e.g. by not blurring the motion fields across apparent preliminary segment borders), and are passed along with the motion estimate to the other units.
  • the bilinear model parameters 545 primarily consist of parameters modelling motion data DA Rn and their uncertainties, but may optionally also include parameters concerning motion compensated intensity changes DI Rn etc.
  • the motion estimator EstMov 520 employs previously established, preliminary segment information 521 in order to optimize handling of edges, occlusions and depth: Motion fields are not to be smoothed across reliable preliminary segment borders.
  • EstMov 520 also uses previously established bilinear modelling results 522 in order to stabilize the motion estimation against underdeterminedness, ambiguity and noise sensitivity. This preliminary information 521 and 522 has been obtained in units EstSeg 560 and EstBLM 540, respectively, for previous frames, other pyramidal frame resolutions or previous iterations.
  • In the bilinear modelling unit EstBLM 540, bilinear models are developed separately for each preliminary segment (holon), based on preliminary segment information 521.
  • information from other related holons, and from pixels whose holon membership is unclear, may be treated so as not to affect the bilinear modelling detrimentally (as extra X-variables with low weight, in PCA-like bilinear modelling of a single block of variables X, or as Y-variables in PLS2- or PCR-like bilinear modelling of two blocks, X and Y).
  • Motion estimation, depth assessment and segmentation are interdependent processes that optimally should be treated in an integrated way.
  • the motion estimation and the segmentation operators were coordinated through feedback of preliminary bilinear results 521 ,522
  • the operators are fully integrated
  • the bilinear estimation may be done with less computer power in this case, since it operates on more fully isolated segments individually
  • input data 605 and preliminary segmentation and bilinear modelling results 623 are input to EstMov/EstSeg 620, which delivers motion fields DARn with their estimated uncertainties, occlusions etc. 625 and segment information 665 about the holons found.
  • in EstBLM 640, a bilinear model is then updated for each holon separately.
  • inter-holon relations and pixels with unclear holon classification may tentatively be weighted down or defined as Y-variables, as described in the third preferred embodiment.
  • motion estimation and segmentation may also be seen as integrated parts of the bilinear model estimation.
  • in EstBLM 540, 640, the estimation process does not have to be run until convergence or full stability, as in conventional singular value decomposition or eigenvalue decomposition. It is sufficient that the subspace obtained for each holon 545 improves the next round of motion estimation and segmentation.
  • the modification of the input data to EstBLM through improved motion estimation and segmentation may be treated as part of the bilinear estimation process.
  • the whole sequence is subjected to motion estimation; then these motion estimates for the whole sequence are submitted to bilinear modelling.
  • the bilinear model or models of the holons in the sequence are used for segmentation.
  • the bilinear model results 522 and segmentation 521 results from previous iterations (or pyramidal resolution levels) are fed back into the motion estimation 520 and bilinear modelling 540 in order to stabilize and facilitate these estimation processes
  • the modelling of the whole sequence may then be repeated, using the updated bilinear motion and intensity change models and the updated segmentation.
  • the segmentation 565 may likewise be updated after each frame. In the preferred embodiment major re-segmentation is only allowed when the motion data clearly shows the need for it, except for pruning processes along holon edges.
  • the order in which the individual frames are pulled into the modelling and segmentation may not be fixed. Once all the frames have been modelled and segmented, the whole process may be started again for the sequence, now with better start values for the bilinear model and segmentation.
  • estimated uncertainties in the estimated segment borders are stored with the segment border information itself and used in subsequent encoding steps. Pixels with unclear segment classification, e.g. in a region around the chosen segmentation border, are treated as having high uncertainty. In subsequent motion estimation and bilinear modelling the uncertain pixels are given low weights or passively fitted by principal component regression (Martens & Naes 1989), mentioned before. In subsequent segmentations, uncertain pixels are included in the new segmentation estimation, but are given low a priori input weights (705).
  • each new iteration is defined as follows:
  • the smoothing method may be a simple boxcar filter, or a method that seeks to avoid smoothing across apparent edges that should be left unsmoothed.
  • the smoothed loading pa is orthogonalized with respect to the loadings of previously estimated factors [p1, p2, ..., pa-1].
  • a further enhancement of the bilinear modelling in this preferred embodiment is to use an iterative reweighted least squares fit of the bilinear model to the data, in order to reduce the effect of outlier frames or outlier pixels:
  • the weights for rows, Vframes, and columns, Vpels, in eq. (3) may iteratively be updated according to the inverse of updated estimates of the average uncertainty standard deviations for the rows and columns, based on corrected residuals from the low-rank bilinear model in the previous iteration.
  • in the bilinear modelling 540, 640, not only the bilinear model parameters may be changed to attain better modelling, but also the values in the input data, e.g. DARn, may be changed in the bilinear model parameter estimation process.
  • the individual elements dan,pel in the motion data, for frames n and pixels pel, may iteratively be modified to adhere more closely to the model derived from other frames or pixels.
  • segmentation / clustering techniques can be used for determining groups of frames (sequences) which are sensible to analyze together - as well as detecting scene shifts.
  • one embodiment of this is to perform simple bilinear modelling of (possibly subsampled) frame intensities, and to perform non-hierarchical cluster analysis in the score space T to find clusters of frames that have much image material in common.
  • a robust single linkage cluster analysis is the preferred cluster method in this embodiment, in order to be able to follow motions within a scene within one single cluster.
  • Yet another embodiment is to apply the above principles to time series data, e.g. sound data, in order to define temporal segments.
  • the spatial motion field data correspond to time warp estimates
  • the spatial intensity change data correspond to temporal intensity change data.
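The loading smoothing and orthogonalization steps listed above can be illustrated with a short sketch; the sizes, the random data and the boxcar width are assumptions for illustration, not taken from the invention:

```python
import numpy as np

# Sketch: boxcar-smooth a new loading p_a, then orthogonalize it against
# the previously estimated loadings p_1..p_{a-1} (Gram-Schmidt step).
rng = np.random.default_rng(4)
n_pels = 30
# Previously estimated loadings, taken here as two orthonormal columns
P_prev = np.linalg.qr(rng.normal(size=(n_pels, 2)))[0]
p_a = rng.normal(size=n_pels)

# Simple boxcar filter (width 3); an edge-preserving filter could be
# substituted to avoid smoothing across apparent segment edges
kernel = np.ones(3) / 3.0
p_a = np.convolve(p_a, kernel, mode="same")

# Orthogonalize against the earlier loadings
p_a = p_a - P_prev @ (P_prev.T @ p_a)

print(np.abs(P_prev.T @ p_a).max())  # effectively zero
```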


Abstract

The present invention concerns methods for attaining grouping or segmentation in large signal streams by development and analysis of a manifold or subspace of parameters. A method for segmenting samples in frames of an input signal comprises the steps of: (1) forming a reference image, consisting of samples from a plurality of said frames, (2) estimating motion from said reference image to each of said frames, (3) reformatting the estimated motion into row vectors, (4) collecting said row vectors into a motion matrix, (5) performing a Principal Component Analysis on the motion matrix, thereby obtaining a score matrix consisting of a plurality of column vectors called score vectors and a loading matrix consisting of a plurality of row vectors called loading vectors, such that each score vector corresponds to one element for each frame, such that each element of each loading vector corresponds to one element of the reference image, such that one column of said score matrix and one loading vector together constitute a factor, and such that the number of factors is lower than or equal to the number of said frames, (6) reformatting each loading back into the same format as used for the motion, (7) performing segmentation based on the plurality of reformatted loadings.

Description

Method and apparatus for multi-frame based segmentation of data streams
Related Applications
This application is related to the following applications assigned to the same applicant as the present invention and filed on even date therewith, the disclosures of which are hereby incorporated by reference:
- Method and apparatus for coordination of motion determination over multiple frames (Attorney file: IDT 011 WO)
- Method and apparatus for depth modelling and providing depth information of moving objects (Attorney file IDT 015 WO)
Field of invention
The present invention concerns methods for attaining grouping or segmentation in large signal streams by development and analysis of a manifold or subspace of parameters. In particular, it concerns methods for spatiotemporal segmentation of video signals.
Background
Mathematical parameterization of massive signal streams such as video or sound signals presents both statistical estimation problems and computational capacity problems. Segmentation of the signal streams can reduce both of these problems:
Firstly, by splitting the signal streams into two or more sub-groups or segments of particularly interrelated signals, the subsequent mathematical description of the data may require fewer independent parameters. This simplifies the statistical modelling. Secondly, by being more compact, each of the segment models may also be easier to control and interpret, e.g. for editing purposes.
Thirdly, having segmented the signal streams, subsequent computational treatment of the individual segments may be computationally simpler than having to treat the full signal streams, for instance by reducing the amount of high-speed memory required for effective computation.
In order to attain these statistical and computational benefits from having segmented the data stream, the segmentation process itself must also be statistically and computationally effective. The present invention concerns how to attain relevant and reliable segmentation.
While the present invention can also be applied to other types of signals such as sound signals, multi-frame digital video data is a primary application and will be used as the example for illustration.
The use of segments in video coding
In model based video coding, image segmentation is important: groups of pixels (pels) that display consistently related spatial patterns of change over groups of frames should be modelled together, as this gives best compression, editability and interpretability.
A segment may correspond to one physical object, but it may also correspond to only a part of a physical object, or to a set of several such objects. It may also correspond to non-tangible objects or phenomena such as shadows.
In statistically oriented model based video coding the optimal definition of a segment (a 'holon') differs with the purpose of the coding: For pure compression purposes the segments ideally correspond to the groups of pels that are most effectively compressed, but if the purpose is coding for later video manipulation such as editing or video games, then the segments ideally correspond more to physical objects.
The segmentation process must be robust, i.e. it must give visibly acceptable, statistically useful segments with applicability over many related image frames. Yet, it must be computationally feasible with respect to cpu time and memory requirements.
Some existing segmentation methods are described in the following references:
Boyer, K.L., Mirza, M.J. and Ganguly, G. (1994) The robust sequential estimator: A general approach and its application to surface organization in range data. IEEE Transactions on Pattern Analysis and Machine Intelligence 16, 10, Oct. 1994, pp. 987-1001,
Günsel, B. and Panayirci, E. (1994) Segmentation of range and intensity images using multiscale Markov random field representation. Proceedings, IEEE Intl. Conf. on Image Proc., Austin, Texas, Nov. 13-16 1994, Vol. II, pp. 187-191, IEEE Computer Soc. Press, Los Alamitos, CA, U.S.A.,
Dellepiane, S., Fontana, F. and Vernazza, G. (1994) A robust non-iterative method for image labelling using context. Proceedings, IEEE Intl. Conf. on Image Proc., Austin, Texas, Nov. 13-16 1994, Vol. II, pp. 207-211, IEEE Computer Soc. Press, Los Alamitos, CA, U.S.A.,
and
Russ, J.C. (1995) The Image Processing Handbook, 2nd ed., IEEE Press/CRC Press, London, pp. 347-401, which are hereby included by reference.
The segmentation methods for video coding fall mainly into two categories: still image segmentation and motion based segmentation. Still image segmentation is based on defining spatial intensity patterns in an individual image. A drawback with this type of segmentation is that it is difficult to distinguish between spatial contours inside and along the edges of objects.
Motion based segmentation concerns how the image intensities vary between images. In automatic video coding the segmentation is often based on the latter, and is attained by analysis of estimates of motion fields. One established approach to segmentation is to estimate the motion field between two frames, say from a reference frame R and another frame n (here termed 'Difference in Address', DARn), and to search for groups of pels in DA with similar motions. In addition, a premium is put on pels that lie physically close to each other in at least one of the images. DARn may have one, two or more motion dimensions.
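As an illustration of this grouping idea, the following sketch clusters per-pel motion vectors with a minimal 2-means loop; the motion values and the choice of two clusters are assumptions for illustration, and a real implementation would also reward spatial closeness as noted above:

```python
import numpy as np

# Sketch: pels with similar (dv, dh) motion vectors in DARn are grouped
# together. A tiny hand-rolled 2-means pass, purely illustrative.
motion = np.array([[1.0, 0.0], [1.1, 0.1], [0.9, -0.1],    # pels moving right
                   [-2.0, 3.0], [-2.1, 2.9], [-1.9, 3.1]])  # pels moving up-left

centers = motion[[0, 3]].copy()          # seed one center per apparent group
for _ in range(10):
    # Assign each pel to the nearest center in motion space
    d = np.linalg.norm(motion[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    # Re-estimate each center as the mean motion of its group
    for k in range(2):
        centers[k] = motion[labels == k].mean(axis=0)

print(labels)
```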
Motion based segmentation can be generalized to change-based segmentation, where the changes could also include 'Difference in Intensity', i.e. intensity changes between pairs of frames, DIRn, e.g. motion compensated and at different color channels.
In situations where the segments are going to be used for many frames it is unsatisfactory to base the segmentation on information from only one single frame or one single frame pair, due to the phenomenon of statistical overfitting, and because the chosen frames may not be sufficiently representative for the rest of the frames in question. While the segmentation may appear quite good for the actual frame or frame pair used in the segmentation, the obtained segments may represent very bad grouping of other frames.
To find statistically valid segments that have bearing for many frames, it is necessary to search a good many of these frames, say 5-50, for statistically related clusters of pixels (pels). This has drawbacks: performing the segmentation on each of the frames separately requires the different frames' segmentation results to be coordinated afterwards, and each frame's segmentation is sensitive to noise in this frame's input data. Also, storing the motion fields for many individual frames may require much memory, and performing the segmentation analysis for all these motion fields simultaneously may be computationally expensive.
Objects of the invention
It is an object of the present invention to facilitate the finding of grouping of signals in a signal stream such that the groups or segments have high statistical robustness and high validity for a number of frames of signal.
It is a further object to perform the multi-frame segmentation in a computationally effective way that keeps down the amount of data needed in the segmentation.
It is a further object to ensure that the segmentation can be recursively up- and downdated.
It is also an objective to be able to use different types of information in the segmentation - both temporal motion and intensity change information as well as spatial continuity and discontinuity information.
It is further an object of the present invention to estimate segment information that facilitates subsequent motion estimation and intensity change estimation, and the bilinear modelling of these.
It is a further object of the invention to allow segments to be partially overlapping.
It is a further object of the invention to allow segments to be partially transparent.
It is also an object to define the segments so that they have an adequate balance between internal systematic similarity and rigidity on one hand (for statistical stabilization) and internal heterogeneity and flexibility on the other (for realistic description of input data).
Summary of the invention
In the present invention, segmentation is based on change information for a number of relevant frames, not only on changes between two frames. Thereby the segments obtained are statistically more stable and have a higher validity.
The way the many change data are represented in the segmentation computations is preferably via common manifolds or subspace models, primarily by bilinear modelling based on a common reference position. This further enhances the statistical precision and validity of the segmentation, since some noise-based and other non-significant change types can be ignored. It also reduces the computational complexity of the segmentation work by reducing the dimensionality of the change data needed to be analyzed in the segmentation.
The subspace segmentation can be recursively updated in cases when the subspace representation itself is recursively updated, this gives computational advantages.
The change information used in the segmentation can be of various kinds, e.g. motion information and intensity change information.
The invention can be applied to signal streams in general. In particular it can be applied to spatio-temporal segmentation of digital video signals and temporal segmentation of digital sound data.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 illustrates how a frame-size (with nv x nh pixels) motion field in one motion direction (here DVRn, the vertical motion for each pixel for moving (warping) image R to approximate image n) can be strung out as a one-dimensional vector (with nv*nh elements).
Figure 2 illustrates how two frame-size (nv x nh pixels each) motion fields DARn = [DVRn, DHRn] (for the vertical and horizontal directions) can be strung out together as a one-dimensional vector (with 2*nv*nh elements), for the case when both motion directions are modelled simultaneously. More dimensions, e.g. depth changes, may similarly be included in DARn.
Figure 3 is an illustration of how a matrix D (e.g. motion fields DARn for many frames n=1,2,...) can be modelled by the bilinear product of two lower-rank matrices T*Pᵀ plus a residual matrix E.
Figure 4 illustrates the parameters from Figure 3 pertaining to one single frame.
Figure 5 shows the third preferred embodiment, in which motion estimation and segmentation are performed separately.
Figure 6 shows the fourth preferred embodiment, in which motion estimation and segmentation are performed simultaneously.
DESCRIPTION
Notation and definitions
In the following, the symbol '*' is used for multiplication when needed. The symbol 'x' is used for representing dimensions of a matrix (e.g. Size = nRows x nColumns). Boldface uppercase letters are used for representing data matrices, and boldface lowercase letters for data vectors.
Extracting bi-linear summaries of many motion fields
Some background information for the present invention is given in patent applications WO 95/08240 and WO 95/34172. Additional information on coordination between multi-frame segmentation, motion estimation and bilinear modelling is given in the application "Method and apparatus for coordination of motion determination over multiple frames", mentioned before. Information about estimation of depth from segment occlusions over several frames is given in the aforementioned application "Method and apparatus for depth modelling and providing depth information of moving objects".
A motion field describes how the pels in one image, say reference frame R, should be moved in order to approximate another image, say n. Such a motion field can be seen as an 'image' in its own right, with values for each motion dimension,- e.g. one image DHRn for horizontal motion (zero values = no horizontal motion, negative values = left motion, positive values = right motion) and one image DVRn for vertical motion (zero values = no vertical motion, negative values = upward motion, positive values = downward motion).
Each motion field image, e.g. DVRn, can be strung out as a one-dimensional vector dn with nPel elements - one element for each pel in the reference image in which the motion information is given, as illustrated in Figure 1.
The different motion dimensions may be strung after one another in one and the same vector, which then has a multiple of nPel elements, as illustrated in Figure 2.
When a set of such motion field vectors has been estimated for several frames, dn, n=1,2,...,nFrames, they can be analyzed together as a matrix D.
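As an illustration, the stringing-out and stacking of Figures 1-3 can be sketched as follows; the frame sizes and random motion values are assumptions for illustration only:

```python
import numpy as np

# Sketch: string per-frame motion fields into row vectors and stack them
# into the matrix D (one row per frame, one column per pel variable).
nv, nh, n_frames = 4, 5, 3          # frame height, width, number of frames
rng = np.random.default_rng(0)

rows = []
for n in range(n_frames):
    dv = rng.normal(size=(nv, nh))  # vertical motion field DVRn
    dh = rng.normal(size=(nv, nh))  # horizontal motion field DHRn
    # Figure 2: both directions strung out into one vector of 2*nv*nh elements
    rows.append(np.concatenate([dv.ravel(), dh.ravel()]))

D = np.vstack(rows)
print(D.shape)                      # (3, 40) == (n_frames, 2*nv*nh)
```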
Bilinear modelling (BLM) is well established as a method for approximating sets of related vectors (Figure 3). The bilinear factor model can be written as a bilinear matrix product plus a residual (conf. Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester, UK, which is hereby included by reference):
D = T * Pᵀ + E (1)
where
D is the data to be modelled; it has one row for each frame modelled and one column for each pixel (pel) variable to be modelled simultaneously (e.g. one horizontal and one vertical motion element for each pixel).
T is the so-called temporal scores for the bilinear factors; it has one row for each frame modelled and one column for each bilinear factor modelled, f=1,2,...,nf.
Pᵀ is the so-called spatial loadings for the bilinear factors; it has one column for each pel variable to be modelled simultaneously and one row for each bilinear factor modelled, f=1,2,...,nf. The superscript ᵀ means 'transposed'.
E represents the error or unmodelled residual - with the same matrix dimensions as D.
For the motion field between frame R and a given frame n, the bilinear model (Figure 4) is written:
dRn = tn * Pᵀ + en (2)
When the motion fields DARn, n=1,2,..., or modifications of these, from a set or subset of frames are defined as data D, the loading subspace Pᵀ, spanning the most significant row space of D, represents the motion information more or less common to several frames in the sequence. The score vectors tn for frames n=1,2,..., estimated for each frame separately or for many frames jointly (see below), serve to convey this common motion information Pᵀ back to each individual frame pair.
A number of different methods may be used for bilinear model extraction of T*Pᵀ from D in the present context of multi-frame segmentation, e.g. weighted singular value decomposition based on the QR algorithm, with or without adaptive up- and downdating. Collectively, they are hereafter referred to as bilinear modelling (BLM) or principal component analysis (PCA).
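One minimal sketch of such an extraction, using a plain truncated singular value decomposition (one of the simplest of the methods referred to above), is given below; the data sizes, noise level and number of factors are assumptions for illustration:

```python
import numpy as np

# Sketch of bilinear model extraction D ≈ T * Pᵀ (eq. 1) by truncated SVD.
rng = np.random.default_rng(1)
n_frames, n_pels, nf = 10, 50, 2

# Synthetic low-rank "motion data" plus a little noise
T_true = rng.normal(size=(n_frames, nf))
P_true = rng.normal(size=(n_pels, nf))
D = T_true @ P_true.T + 0.01 * rng.normal(size=(n_frames, n_pels))

U, s, Vt = np.linalg.svd(D, full_matrices=False)
T = U[:, :nf] * s[:nf]     # temporal scores, one row per frame
P = Vt[:nf].T              # spatial loadings, one column of P per factor
E = D - T @ P.T            # residual, same dimensions as D

print(np.abs(E).max())     # small: only the added noise remains unmodelled
```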
Details on bilinear modelling methods are given by:
Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester, UK; by Martens, M. and Martens, H. (1986) Partial Least Squares regression. In: Statistical Procedures in Food Research (J.R. Piggott, ed.), Elsevier Applied Sciences, London, pp. 293-360; by Jackson, J.E. (1991) A User's Guide to Principal Components. J. Wiley & Sons, Inc., New York; by Jolliffe, I.T. (1986) Principal Component Analysis. Springer Series in Statistics, Springer-Verlag, New York; by Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York; by Sharaf, M.A., Illman, D.L. and Kowalski, B.R. (1986) Chemometrics. J. Wiley & Sons, New York; and by Kung, S.Y., Diamantaras, K.I. and Tauer, J.S. (1991) Neural Networks for extracting pure/constrained/oriented principal components. In: R. Vaccaro (ed.): SVD and Signal Processing II. Elsevier Science Publishers 1991, pp. 57-81. These references are hereby included by reference.
It is important to note that the bilinear modelling does not have to fully converge or to be optimized with respect to orthogonality, separation of eigenvalues etc. for the present purpose; the important thing is to find an adequate subspace basis for the approximation of the data D.
The bilinear model may be incrementally updated, as described in the application titled "Method and apparatus for coordination of motion determination over multiple frames", mentioned before.
The bilinear modelling may be performed after preprocessing consisting of subtracting the mean of each column. One may also mean-center each row. These mean data must be added back upon reconstruction based on the bilinear model. More advanced preprocessing may also be used, such as multiplicative scatter correction (MSC) and its extensions, as described by Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester, UK, which is hereby included by reference. Bilinear model parameter estimation methods that involve smoothing of scores and loadings, or modifications of the individual data elements in data matrix D, may also be used.
If information is available on the relative reliability or validity of the various pels in the various frames, this information may be used to weigh the relative importance of the different input data: The bilinear extraction of factors may then be performed on weighted data: Let G = the motion fields DARn, n=1,2,... (possibly after column centering), or modifications of these, from a set or subset of frame pairs, and let
D = Vframes * G * Vpels (3)
where Vframes = weight matrix of the frames, e.g. diag(1/sn, n=1,2,...), where sn = estimate of the uncertainty standard deviation of frame n, and Vpels = weight matrix of the pixels, e.g. diag(1/spel, pel=1,2,...), where spel = estimate of the uncertainty standard deviation of pixel pel.
In this way, pixels (columns in G) with high uncertainty may be weighted down, but still modelled together with the other, more certain pixels. Alternatively, such separation of certain and uncertain pixels may be attained by two-block instead of one-block BLM: Uncertain pixels are kept out of the bilinear modelling of the more certain pixels, and the uncertain pixels' loadings are estimated by regression on the certain pixels' scores, as described for principal component regression (PCR) and partial least squares regression (PLSR) by Martens & Naes 1989, mentioned before. This is applicable also in re-allocating pixels from one segment to another, where the re-allocated pixels' loadings need to be estimated relative to their new segment allocation.
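The weighting of equation (3) can be sketched as follows; the data and the uncertainty estimates are assumed values for illustration:

```python
import numpy as np

# Sketch of eq. (3): down-weighting uncertain frames and pixels before
# the bilinear extraction, so they contribute little to the factors.
rng = np.random.default_rng(2)
G = rng.normal(size=(6, 8))               # centred motion data, frames x pixels
s_frame = np.ones(6); s_frame[5] = 10.0   # frame 5 is assumed very uncertain
s_pel = np.ones(8);   s_pel[3] = 10.0     # pixel 3 is assumed very uncertain

V_frames = np.diag(1.0 / s_frame)
V_pels = np.diag(1.0 / s_pel)
D = V_frames @ G @ V_pels                 # weighted data, eq. (3)

# The uncertain frame's row now has much smaller magnitude than in G
print(np.abs(D[5]).max() < np.abs(G[5]).max())
```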
One main purpose of the bilinear model is to attain a compact representation of a large amount of input data. To attain this, the number of 'significant' factors used in the model T*Pᵀ should be low, i.e. the model should be of low row rank. This number of significant factors may be estimated in various ways, e.g. by cross validation or from the residuals and leverages after varying numbers of factors, as described by Martens & Naes 1989, which was mentioned before.
Previously defined or estimated loadings ('pseudo-loadings') may be used as part of the modelling of data matrix D. In that case, the scores for these a priori factors are estimated by projecting D on these pseudo-loadings, and the bilinear modelling is performed on the residuals after this projection (weighted regression).
The estimation of the scores for one individual frame may be based on linear regression, projecting dRn on Pᵀ using some weighted or robustly reweighted least squares minimization. Alternatively, it may be based on nonlinear iterative curve fitting by e.g. SIMPLEX optimization (J.A. Nelder and R. Mead, 'A simplex method for function minimization', Computer Journal, vol. 7, pp. 308-313). In that case the criterion may also be based on the decoding intensity error arising when using the scores.
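The linear regression variant of this score estimation can be sketched as follows, assuming (for simplicity) orthonormal loadings and synthetic data:

```python
import numpy as np

# Sketch: estimate the scores t_n for one frame by least-squares
# projection of its motion-field vector d_Rn onto the loadings P (eq. 2).
rng = np.random.default_rng(3)
n_pels, nf = 50, 2
P = np.linalg.qr(rng.normal(size=(n_pels, nf)))[0]   # orthonormal loadings

t_true = np.array([1.5, -0.7])
d_Rn = P @ t_true + 0.001 * rng.normal(size=n_pels)  # noisy motion field

# t_n = argmin || d_Rn - P t ||^2
t_n, *_ = np.linalg.lstsq(P, d_Rn, rcond=None)
print(t_n)   # close to t_true
```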
As described here, the change information dRn is represented, e.g., as the motion field DARn in the reference position, so that it is compatible with the bilinear loadings P, also represented in the reference position. Alternatively, the change information may be represented in the position of the pixels in frame n, e.g. the reverse motion field DAnR, and projected on a compatible version of the loadings P, i.e. P temporarily moved to that same position using the motion field DARn.
Representing the spatial information from many frames in one common image position
The effectiveness of the bilinear model of motion is dependent on how the motion fields are represented. When a rigid body moves (translation, rotation, scaling) in 3D space before a camera, the corresponding motion field can be described by a low-dimensional bilinear model. Systematic movements of a non-rigid body (e.g. a face breaking into a smile) can also be approximated by the bilinear model.
However, a low-dimensional bilinear model is primarily effective when the motion fields (or other change fields) are stored in D in one given representational system, so that all information about an object is stored in the same pixel positions for all frames. This can be attained by letting the motions of each of the frames in a set of related frames be related to one given 'reference image' R, and stored in this reference image's coordinate system. This reference image could e.g. be the first, middle or last image in the sequence n=1,2,...,N, or some synthetic image model with parts from several frames.
One example of this is the IDLE codec type (according to WO 95/08240 and WO 95/34172), where the motion, intensity changes and other modelled change information for a number of (consecutive) frames is directly or indirectly represented relative to a common 'extended reference image model' for a given class of pixels (spatial 'holon') in a given set of related frames. Before the segmentation has started, the whole initial reference image (e.g. the first frame in the sequence) is regarded as one holon. The main purpose of the spatial segmentation is then to split this initial spatial holon into data structures that each lends itself to simple, low-dimensional mathematical modelling.
The motion field estimates can be performed directly from this reference image lR to frame ln, and analyzed directly in D. Alternatively, the motion fields may be estimated from a moved (warped) version of this reference image: lm = Move(lR by DARm) to ln; the local motion field DAmn is estimated and then moved back to the reference position, e.g. by the inverse of the motion field that generated lm, yielding the motion estimate DARn = DARm + Move(DAmn by DAmR).
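The composition of motion fields described above can be sketched in one dimension with nearest-neighbour warping; the integer displacement fields, the simple Move operator and the assumed invertibility of DARm are illustrative assumptions, not the patent's own implementation:

```python
import numpy as np

# Sketch (1-D, nearest-neighbour) of composing motion fields:
#   DARn = DARm + Move(DAmn by DAmR)
def move(field, by):
    # Pull-back warp: out[x] = field[x + by[x]], clipped at the borders
    idx = np.clip(np.arange(field.size) + by, 0, field.size - 1)
    return field[idx.astype(int)]

DARm = np.array([1, 1, 0, 0, -1])   # motion from R to intermediate frame m
DAmn = np.array([0, 1, 1, 0, 0])    # local motion estimated from m to n
DAmR = -DARm                        # inverse field (assumed exact here)

DARn = DARm + move(DAmn, DAmR)      # total motion from R to n
print(DARn)
```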
One advantage of the present invention is hence to use the resulting compact, low-rank summary of several frames' motion fields and other change fields to enhance and stabilize the segmentation in video coding.
Similarly, segmentation may be performed in the temporal domain to find groups of frames where certain spatial patterns are found. The temporal segmentation may then utilize subspace information derived from bilinear modelling of time-shifted versions of relevant time series (H. Martens & M. Martens (1992) NIR spectroscopy - applied philosophy. Proceedings, 5th Internatl. Conf. NIR Spectroscopy (K.I. Hildrum, ed.), North Holland, pp. 1-10), e.g. time-shifted versions of the static frame scores obtained by bilinear modelling of change fields, as described for equation (1).
Applications of multi-frame motion-based segmentation
The bilinear summaries of multiple frames' motion fields may be used in several ways in the segmentation.
The order in which the frames are modelled, is in the preferred embodiments forward and sequential. However, the order may also be selected according to other criteria,- e.g. according to which frame at any given time shows the most need and potential for model refinement.
The bilinear model based segmentation may be used pyramidally. One example of this is to perform the segmentation on frames in reduced resolution, in order to identify the major holons in the sequence, and then to use these results as preliminary, tentative input to the same process at higher frame resolution.
In the preferred embodiments, the motion estimation, bilinear modelling and segmentation may be performed on individual, already identified holons ('input holons'), or on complete, unsegmented images ln. In either case, a multi-holon pre- or post-processing of the obtained motion fields, bilinear models and segments is desired in order to resolve overlap between input holons.
One such pre- or post-processing is based on having stored each holon with a 'halo' of neighbour pixels with uncertain holon membership, i.e. pixels that only tentatively can be ascribed to a holon (and thus are also temporarily stored in other holons or as separate lists of unclear pixels). In the motion estimation such tentative halo pixels are treated specially, e.g. by being fitted for all relevant holons, and their memberships to the different holons are updated according to the success of the motion estimates. Such halo pixels are given very low weight or fitted passively in the bilinear modelling (by Principal Component Regression; Martens, H. and Naes, T. (1989) Multivariate Calibration. J. Wiley & Sons Ltd, Chichester, UK, which is hereby included by reference).
Extra variables
Additional columns in the raw data matrix G (equation (3)) may be formed from 'external scores' from other blocks of data. Sources of such 'external scores' are:
  • scores from bilinear modelling of some other data domain (e.g. motion compensated intensity residuals of the same holon),
  • scores from other holons, preferably in nonlinear representation (see e.g. A. Gifi: Nonlinear Multivariate Analysis. J. Wiley & Sons Ltd, Chichester 1990, which is hereby included by reference), with each quantitative score vector quantized and analyzed as an indicator matrix (p. 67) or at the ordinal level (p. 187),
  • scores from the same holon at a different spatial resolution,
  • scores from external data such as sound (e.g. after bilinear modelling of sound vibration energy spectra of these same frames).
The weights for such additional variables must be adapted so that their uncertainty level becomes similar to that of the weighted pixels in the final data matrix to be modelled, D (equations (1) and (2)).
An alternative way to incorporate uncertain pixels or external scores gently, without forcing their information into the bilinear model, is to replace the one-block bilinear modelling with a two-block modelling, such as PLS regression (Martens, H. and Naes, T. (1989) Multivariate Calibration. J.Wiley & Sons Ltd, Chichester UK), or with multi-block or N-way modelling, such as Parafac (Sharaf, M.A., Illman, D.L. and Kowalski, B.R. Chemometrics, J.Wiley & Sons, New York 1986) or Consensus PCA/PLS (described by Martens, M. and Martens, H. 1986: Partial Least Squares regression. In: Statistical Procedures in Food Research (J.R. Piggott, ed.) Elsevier Applied Sciences, London, pp. 293-360, and by Geladi, P., Martens, H., Martens, M., Kalvenes, S. and Esbensen, K. (1988) Multivariate Comparison of Laboratory Results. Proceedings, Symp. Applied Statistics, Copenhagen Jan 25-27 1988 (Per Thorbøll, ed.) Uni.C, Copenhagen, pp. 16-30. These references are hereby included by reference.)
In this way, the uncertain pixels and external scores contribute positively to the modelling if they fit well, but do not affect the modelling strongly in a negative way if they do not fit. In any case these uncertain pixels and external scores are fitted to the obtained bilinear model.
The scores from the present holon's modelling in the present resolution and in the present domain may in turn be used as 'external scores' for other holons or at other resolutions or in other domains.
Preferred Embodiments
The stabilization of the segmentation by the use of multi-frame summaries may be embodied in different ways.
In the first preferred embodiment, the bilinear segmentation process employs a top-down approach, removing segments from the input holon: Pixel areas in the motion subspace that do not fit to a general holon model are detected as outliers and segmented out from the rest of the input holon.
In the second preferred embodiment, the segmentation employs a bottom-up approach, attempting to let stable seed points grow into homogeneous, consistent segments in the input holon.
In the third preferred embodiment, the segmentation is separated from the motion estimation and compensation (Figure 5), with bilinear modelling of the frames' motion fields and other estimated change data taking place in between.
In the fourth preferred embodiment, motion estimation and actual segmentation are integrated (Figure 6), followed by the bilinear modelling. In the fifth preferred embodiment, the motion estimation, bilinear modelling and segmentation process is done for a whole sequence.
In the sixth preferred embodiment, the motion estimation, bilinear modelling and segmentation model is updated gradually for individual frames.
In the seventh preferred embodiment, the bilinear modelling methods used for segmentation are expanded to include additional criteria beyond just explained covariance; in this case spatial and temporal smoothing are used as extra criteria. Reweighting of the rows and columns of the input data to the bilinear modelling is also included.
In the eighth preferred embodiment, the bilinear modelling is combined with optimal scaling, so that not only the weights but also the input data themselves are changed during the model estimation process: As long as the prediction of a data element from a preliminary low-rank bilinear model does not give significantly worse decoding results than the element's input value, its input value is replaced by its bilinear prediction.
First Preferred Embodiment: Segmentation based on outlier analysis around spatial model
Figure 5 shows the main building blocks of the bilinear model based segmentation: a motion estimator unit EstMov 520, a bilinear modelling unit EstBLM 540 and a segmentation unit EstSegm 560, and the data streams between them. More details on the data streams will be given in conjunction with the Third Preferred Embodiment.
The two first embodiments represent two implementations of the segmentation unit EstSeg 560: top-down or bottom-up.
In the first preferred embodiment, the holon input to the EstSeg unit 560 is retained as intact as possible, but if the holon contains parts which display significantly and consistently different behaviour than the rest of the holon, then these parts are split out as separate, new segments. In addition, individual pixels, e.g. along holon edges, whose preliminary classification appears questionable, may be pruned away or otherwise identified as unreliable outliers.
A method to attain this top-down segmentation of a holon is described in the following pseudo-code with line-numbering:
Single-frame segmentation
First, the method is described for one single frame and for detection of a stiffly moving body.
Make a spatial model of each such potential segment, using reweighted linear least squares spatial modelling:
Let the vertical and the horizontal motion estimates for one given frame n relative to a reference frame R be treated as regressands Y in the spatial modelling:
Y=[DVRn DHRn] (701 ).
Let regressors X = [ 1 v h ] (702), where v is the column of vertical addresses for the pixels and h their horizontal addresses.
A model for motion by affine transformations is then:
Y = XB + F , (703 )
Estimate the 3 x 2 regression coefficient matrix B by reweighted least squares regression: Estimate the uncertainty standard deviation for every pixel (row): s = [s(pel), pel = 1, 2, ..., nPels] (704)
Define a matrix of initial pixel weights W, e.g. identical for all pixels: W = diag(1, 1, 1, ..., 1) (nPels ones) (705)
While reweighting process not converged (706)
B = (XᵀWX)⁻¹XᵀWY (Regression coefficient estimate) (710)
F= Y -XB (Residual) (720)
R = f(F, s) (Relative residual, relative to noise level s) (730), where each pixel's residual f(pel,j) for each column in Y, j = 1, 2, ..., is divided by the pixel's uncertainty standard deviation s(pel): r(pel,j) = f(pel,j)/s(pel) (735)
W = f(R) (updated weights for the pixels) (740), e.g. a function of the relative residuals accumulated over all Y-variables j = 1, 2, ...: w(pel,pel) = c/(c + r(pel,1)² + r(pel,2)² + ...) (745), where the sensitivity coefficient c = e.g. 1.0.
Check convergence: e.g. is B stable? (750)
end while (reweighting process converged) (750)
Other estimates than just the sum of the relative residuals r(pel,1)² + r(pel,2)² + ... may be used, e.g. the median or some other robust distance measure.
In this process, pixels that do not fit well to the spatial model favoured by the majority of the pixels will display significantly large relative residuals R and hence be weighted down so as to reduce their influence on the estimation of the coefficients B in the next iteration, in which their residuals will be even larger, etc., so that they have very little influence on the estimation of the final spatial model B upon convergence.
Pixels with low final weights (e.g. with w(pel,pel) < 0.1) are defined as outliers not belonging to the input holon and are collected into one new segment. This new outlier segment may be submitted to the same reweighted regression modelling, to check if it should be split further into smaller segments. The resulting segmentation then represents the output result 565.
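The reweighted least squares loop (705-750) can be sketched in NumPy as follows; the convergence test, the iteration cap and the variable names are assumptions, while the weight update (745) and the outlier cut-off w < 0.1 follow the text:

```python
import numpy as np

def irls_affine_segment(v, h, Y, s, c=1.0, w_cut=0.1, n_iter=20):
    """Fit Y = X B by iteratively reweighted least squares (705-750).

    v, h : pixel addresses (nPels,); Y : regressands (nPels, 2) holding
    [DVRn DHRn]; s : per-pixel uncertainty standard deviations (nPels,).
    Returns B, the final weights w, and a boolean outlier mask (w < w_cut)."""
    X = np.column_stack([np.ones_like(v, dtype=float), v, h])     # (702)
    w = np.ones(v.size)                                            # (705)
    B = np.zeros((3, Y.shape[1]))
    for _ in range(n_iter):                                        # (706)
        sw = np.sqrt(w)[:, None]                 # weighted LS via sqrt-weights
        B_new = np.linalg.lstsq(X * sw, Y * sw, rcond=None)[0]     # (710)
        F = Y - X @ B_new                                          # (720)
        R = F / s[:, None]                                         # (730/735)
        w = c / (c + (R ** 2).sum(axis=1))                         # (740/745)
        if np.allclose(B_new, B, atol=1e-8):                       # (750)
            break
        B = B_new
    return B, w, w < w_cut
```

In a usage pass, the outlier mask would be collected into a new segment and the loop re-run on that segment alone, as described above.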
In redefining the weights of the pixels in (740), effects of neighbouring pixels may also be brought in, to enhance spatial continuity of the holons. The a priori weights (705) may be modified, e.g. by using lower initial weights for pixels already known to be potentially invalid due to occlusions or particularly uncertain due to unsatisfactory bilinear modelling.
The uncertainty measure s(pel,j) for each individual element (pel,j) in Y may have been estimated, and may be used instead of the general uncertainty for each pixel, s(pel), in (745). This individual uncertainty measure may be asymmetric, so that positive and negative residuals may be assessed differently. This is relevant for motion estimates of pixels in flat intensity areas near intensity edges (asymmetric slack): the pixel may move far away from the edge without affecting the resulting intensity lack-of-fit, but cannot move beyond the edge.
The full-rank regression method employed in (710) may be replaced by other estimators, e.g. reduced-rank regression methods like PLS regression or some extension of this, as described in Martens, H. and Naes, T. (1989) Multivariate Calibration. J.Wiley & Sons Ltd, Chichester UK.

Multi-frame segmentation
This basic top-down segmentation method can be used for multi-frame segmentation: Instead of basing the segmentation only on motion fields of the holon for one single frame, using regressands
Y=[ DVRn DHRn],
it may be based on the motion fields of several frames:
Y = [ DVR1 DHR1, DVR2 DHR2, ..., DVRn DHRn ] (760)
In this First Preferred Embodiment it is based on the scaled loading space that spans these motion patterns of the holon in several modelled frames:
Y = [ PV PH ] = [ pVR1 pVR2 ... pVRJ, pHR1 pHR2 ... pHRK ] (770)
where the number of bilinear factors for the vertical and horizontal motions (here: J and K) is chosen so that only valid and reliable factors are used (as judged e.g. by cross validation over the frames). The factor loadings (columns in PV and PH) should be scaled, e.g. so that their uncertainty variance is the same.
The regressands Y may also be defined as including intensity information, e.g. motion compensated intensity difference images.
Y = [ DIR1 DIR2 DIR3 ... DIRn ] (775)
where DIRn = the motion compensated intensity difference between frame n and the common reference frame R, defined for individual color dimensions (e.g. RGB) or as some summary like luminosity. Alternatively, loadings from bilinear intensity factors based on motion compensated intensity differences may be used. The preferred way of making such motion compensated intensity differences between a frame n and the reference frame R is first to establish the motion field for the frame, DARn = [ DVRn DHRn ], and the corresponding depth estimate etc. in motion estimator EstMov 520, then use this DARn to move (warp) the reference image IR to yield InHat (i.e. the IR-based approximation to frame In), then compute the intensity difference between InHat and In, and finally to move this difference back to reference position, using the inverse of this motion operator, DAnR: DIRn = Move((InHat - In) by DAnR).
Such intensity information may be used together with or separately from the motion information. In any case, the columns of Y should be scaled to reflect their relative desired impact on the segmentation, e.g. inversely to their relative average estimated uncertainty variances.
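The forward-warp / difference / back-warp procedure for DIRn described above can be sketched as follows. This is a minimal nearest-neighbour stand-in for the actual warping operator in EstMov 520: the function names, the zero-fill for unmapped pixels, and the use of the negated field as the inverse motion are all simplifying assumptions:

```python
import numpy as np

def warp(img, dv, dh):
    """Move each pixel by its (dv, dh) displacement, nearest-neighbour.
    Pixels mapping outside the frame are dropped; holes stay zero."""
    H, W = img.shape
    out = np.zeros_like(img)
    for r in range(H):
        for c in range(W):
            rr = int(round(r + dv[r, c]))
            cc = int(round(c + dh[r, c]))
            if 0 <= rr < H and 0 <= cc < W:
                out[rr, cc] = img[r, c]
    return out

def intensity_diff_ref_position(I_R, I_n, dv, dh):
    """DIRn: warp I_R to InHat, difference against I_n, then move the
    difference back to reference position with the (approximate) inverse
    motion: DIRn = Move((InHat - In) by DAnR)."""
    I_nHat = warp(I_R, dv, dh)
    diff = I_nHat - I_n
    return warp(diff, -dv, -dh)
```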
Alternative spatial structure models
The spatial structure model around which the residuals F in (703) are computed may be of another type than the one employed in (702). For example, X may also contain square and cross product terms of the addresses v and h (Lancaster, P. and Salkauskas, K. (1986) Curve and surface fitting. Academic Press, p. 133, which is hereby incorporated by reference). Splines or piecewise polynomials may also be used (Lancaster & Salkauskas 1986, p. 245, which is hereby incorporated by reference). Such higher order models can help distinguish between outlier pixels and pixels taking part in major, smoothly structured motions that are not affine transformations.
X may also contain a spatially autoregressive part, with spatially shifted versions of Y included in X, and where a rank-reducing regression method such as PLS regression is used (H. Martens & M. Martens (1992) NIR spectroscopy - applied philosophy. Proceedings, 5th Internatl. Conf. NIR Spectroscopy (K.I. Hildrum, ed.) North Holland, pp. 1-10). This spatial autoregressive model part allows the distinction between, on one hand, outlier pixels that should be weighted down, and on the other hand, pixels that take part in major, smooth motions which have neither affine transformation structure nor are well described by spatial polynomial structure within the holon.
Alternative segment border detection mechanisms
Additional information may be introduced in order to optimize the precise positioning of the segment borders. One such source of information is intensity edges in the reference image IR, as detected e.g. by a Sobel filter (J.C. Russ: The Image Processing Handbook, 2nd ed., IEEE Press 1995, which is incorporated herein by reference). If the relative weights W 740 after spatial modelling of Y indicate a segment border close to such an intensity edge, then the segment border is moved to this intensity edge.
More advanced statistical methods may also be used for determining the segment edges. One example of this was described by Kok F. Lai ('Deformable contours: Modelling, extraction, detection and classification'. PhD thesis, University of Wisconsin-Madison 1995, which is incorporated herein by reference); for the present application the input information may be intensity IR, intensity residuals DIRn (or bilinear summaries of these), spatial residuals F 720, R 730 or model weights W 740, and/or the Y data themselves.
Second Preferred Embodiment: Segmentation based on cluster analysis
The Second Preferred Embodiment represents a bottom-up approach to segmentation of the input holon. It consists of performing a cluster analysis of the multi-frame motion data or their bilinear summaries.
A number of different clustering techniques may be used for finding groups of pels. The choice of clustering criterion and clustering algorithm defines the statistical properties of the clusters. For instance, there is a choice between performing the cluster analysis for each motion direction (vertical, horizontal, depth) separately, or analysing the directions jointly. The latter is the preferred implementation (but the depth dimension may be lacking, at least initially in an encoding process). Two main groups of clustering techniques can be used: cluster analyses which do not make spatial assumptions with respect to parameter smoothness or neighbourship in the image plane, and cluster analyses which do.
Conventional cluster analysis
The goal is now to find clusters of pixels which display motion patterns that are similar at least in some of the factor dimensions in P, i.e. pixels that display similar motion patterns, at least in some significant dimensions.
There are a number of different clustering principles that can be used on the basis of the bilinear motion subspace. Spatiotemporal distances can be computed on the basis of common or weighted Pythagorean distance measures, as well as normalized (Mahalanobis) distances. One approach is standard non-hierarchical cluster analytic techniques (Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York; Gudersen, Bob (1983) An adaptive fcv cluster algorithm. International J. of Man-Machine Studies 19, pp 97-104; Bezdek et al. Detection and characteristics of cluster sub-structures. SIAM J. of Applied Math 40(2), April 1981; Bezdek, J.C. and Pal, S.K. (1992) Fuzzy models for pattern recognition. IEEE, New York). An example of this type of cluster analysis is the partitioning technique described by Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York, pp 361-368. These references are hereby included by reference.
More detail on cluster analysis can be found in Mardia, K.V., Kent, J.T. and Bibby, J.M. (1979) Multivariate Analysis. Academic Press, Inc., New York, pp 360-390; in Bezdek et al. Detection and characteristics of cluster sub-structures. SIAM J. of Applied Math 40(2), April 1981; and in Bezdek, J.C. and Pal, S.K. (1992) Fuzzy models for pattern recognition. IEEE, New York. In particular, the technique of fuzzy clustering (see Gudersen, Bob (1983) An adaptive fcv cluster algorithm. International J. of Man-Machine Studies 19, pp 97-104) can be useful; in this technique bilinear modelling is used for parameterizing the internal structure of the clusters, and the clusters can be partially overlapping. Hierarchical cluster analyses are described in Sharaf, M.A., Illman, D.L. and Kowalski, B.R. Chemometrics, J.Wiley & Sons, New York 1986, p 219. These references are hereby included by reference.
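As an illustration of the non-hierarchical partitioning approach cited above, the following sketch clusters pixels by their rows in the scaled loading matrix (equation (770)): pixels with similar motion-subspace coordinates move similarly across the frames and end up in the same segment. A plain k-means loop is used here as one simple instance; the initialization, convergence test and choice of k are assumptions, and the cited references describe richer variants such as fuzzy clustering:

```python
import numpy as np

def kmeans_pixels(P, k=2, n_iter=50, seed=0):
    """Partition pixels by their rows in the loading matrix P = [PV PH]
    (nPels, nFactors). Returns an integer segment label per pixel."""
    rng = np.random.default_rng(seed)
    centres = P[rng.choice(len(P), k, replace=False)]   # k pixel rows as seeds
    labels = np.zeros(len(P), dtype=int)
    for _ in range(n_iter):
        # squared Euclidean distance from every pixel to every centre
        d = ((P[:, None, :] - centres[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        new = np.array([P[labels == j].mean(0) if (labels == j).any()
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels
```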
Cluster analysis with spatial continuity assumptions in the image plane.
In the present embodiment, the cluster analysis searches for spatially related pixels with similar motion patterns. Boyer et al 1994 published a method of image segmentation which makes extensive - but not exclusive - use of spatial continuity (Boyer, K.L., Mirza, M.J. and Ganguly, G. (1994) The robust sequential estimator: A general approach and its application to surface organization in range data. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(10), Oct 1994, pp 987-1001, which is hereby included by reference). An embodiment of the present invention is to extend their method from one frame (a single radar image Z), measured in one dimension (distance), to employ information from several frames and measurements in several dimensions (vertical motion, horizontal motion and other possible characteristics, see below).
The segmentation technique of Boyer et al mentioned above may be summarized into the following elements:
* Analyze the spatial data (in the case of Boyer et al 1994, which was mentioned above: range data Z) to find sufficiently large smooth spatial regions that can serve as starting points for potential segments.
* Make a spatial model of each such starting point, using reweighted least squares spatial modelling.
Around each starting point, fit the Y variables (Y=Z) to a spatial model and estimate residuals. In the present embodiment a linear model Y = XB + F is used, fitted by reweighted least squares minimization, using a motion model X for affine transformations (702), as described above. But polynomial and/or autoregressive extensions may also be used in X.

* Grow each such potential segment locally, by including adjacent pixels that seem to fit the preliminary segment model, gradually updating the segment model. This growing process is continued until no new pixels fit well to the holon's spatial segment model.
* Attempt to extend each spatial model to the rest of the picture in order to search for more remote pixels that may belong to the segment.
* Merge potential segments that have compatible spatial models.
* Prune and fill in outliers and resolve ambiguities along segment edges. The precise edges of the segments may be optimized with the methods described in the first preferred embodiment.
In the present invention the above technique is applied to other spatial parameter data than the radar range data used by Boyer et al mentioned above. Instead of defining Y as the depth Z of an image, Y is defined according to (701, 760 or 770) as motion information from several frames. Optionally, intensity information (775) may be included as well, as described for the third preferred embodiment.
Modelling of other parameter representations
Yet another embodiment, not to be detailed here, is to apply this segmentation technique, applicable for 1D, 2D and higher-dimensional data, after having transformed the data to a frequency domain. The transformed data may be represented as the result of a direct FFT, as a real and imaginary representation, or as a phase & amplitude representation. More complicated representations may also be used. One example of this is the use of a phase correlation representation.
It should be noted that the top-down and bottom-up approaches to the multiframe segmentation may be combined. One example of this is: First, perform top-down segmentation analysis of the input holon in order to identify outlier regions that do not fit the majority or dominant structure within the holon. Secondly, use the bottom-up segmentation analysis to search for homogeneous regions inside the outlier regions.
The next two preferred embodiments distinguish between two ways of combining motion estimation and segmentation.
Third Preferred Embodiment: Separated Motion Estimation and Segmentation
In the third preferred embodiment (Figure 5), intensity data for the individual frame(s) In, n=1,2,..., and for a reference image IR 505 are input to a motion estimator 520. The resulting motion estimate DARn 525 is passed to the bilinear modelling unit EstBLM 540. The resulting bilinear model parameters 545 are passed to the segmentation unit EstSegm 560, which yields segmentation results 565.
The EstMov operator 520 may contain facilities for detecting internal preliminary segmentation indicators such as edges in IR or In, as well as analysis of depth and spatial innovation; these may be used in order to enhance the motion estimate DARn 525 (e.g. by not blurring the motion fields across apparent preliminary segment borders), and are passed along with the motion estimate to the other units.
The bilinear model parameters 545 primarily consist of parameters modelling motion data DARn and their uncertainties, but may optionally also include parameters concerning motion compensated intensity changes DIRn etc.
Some relevant methods for coordinating motion estimation and bilinear modelling are given in the aforementioned application titled "Method and apparatus for coordination of motion determination over multiple frames".
Several feed-back loop levels are used in this embodiment.
First, the motion estimator EstMov 520 employs previously established, preliminary segment information 521 in order to optimize handling of edges, occlusions and depth: Motion fields are not to be smoothed across reliable preliminary segment borders.
EstMov 520 also uses previously established bilinear modelling results 522 in order to stabilize the motion estimation against underdeterminedness, ambiguity and noise sensitivity. This preliminary information 521 and 522 has been obtained, in units EstSeg 560 and EstBLM 540, respectively, for previous frames, other pyramidal frame resolutions or previous iterations.
In the bilinear modelling unit EstBLM 540, bilinear models are developed separately for each preliminary segment (holon), based on preliminary segment information 521. In the bilinear modelling block EstBLM 540, information from other related holons, and from pixels whose holon membership is unclear, may be treated so as not to affect the bilinear modelling detrimentally (as extra X-variables with low weight in PCA-like bilinear modelling of a single block of variables X, or as Y-variables in PLS2- or PCR-like bilinear modelling of two blocks, X and Y).
Likewise, preliminary bilinear model parameter estimates 522 may be modelled together with the new motion fields DARn, n=1,2,..., 525 to yield the updated bilinear sequence models 545, for the motions and optionally for the motion compensated intensity changes etc.
Fourth Preferred Embodiment: Joint Motion Estimation and Segmentation
Motion estimation, depth assessment and segmentation are interdependent processes that optimally should be treated in an integrated way. In the third preferred embodiment, the motion estimation and the segmentation operators were coordinated through feedback of preliminary bilinear results 521, 522. In the fourth preferred embodiment the operators are fully integrated. The bilinear estimation may be done with less computer power in this case, since it operates on more fully isolated segments individually. In this embodiment, according to Figure 6, input data 605 and preliminary segmentation and bilinear modelling results 623 are input to EstMov/EstSeg 620, which delivers motion fields DARn with their estimated uncertainties, occlusions etc. 625 and segment information 665 about the holons found. In EstBLM 640 a bilinear model is then updated for each holon separately. Optionally, inter-holon relations and pixels with unclear holon classification may tentatively be weighted down or defined as Y-variables, as described in the third preferred embodiment.
It should be noted that with the feedback mechanisms illustrated in Figures 5 and 6, motion estimation and segmentation may also be seen as integrated parts of the bilinear model estimation. In EstBLM 540, 640 the estimation process does not have to be run until convergence or full stability, as in conventional singular value decomposition or eigenvalue decomposition. It is sufficient that the subspace obtained for each holon 545 improves the next round of motion estimation and segmentation. Thus, the modification of the input data to EstBLM through improved motion estimation and segmentation may be treated as part of the bilinear estimation process.
The next two embodiments concern how to perform the coordination between the frames in the sequence:
Fifth preferred embodiment: Modelling the whole sequence of frames in one step
In the fifth preferred embodiment, the whole sequence is subjected to motion estimation; then these motion estimates for the whole sequence are submitted to bilinear modelling. Finally, the bilinear model or models of the holons in the sequence are used for segmentation. Using Figure 5 for illustration, the bilinear model results 522 and segmentation results 521 from previous iterations (or pyramidal resolution levels) are fed back into the motion estimation 520 and bilinear modelling 540 in order to stabilize and facilitate these estimation processes.
The modelling of the whole sequence may then be repeated, using the updated bilinear motion and intensity change models and the updated segmentation.
Sixth Preferred Embodiment: Gradual update of the sequence model
In the sixth preferred embodiment, the bilinear model 545 is updated after the motion estimation for each frame n=1,2,... has been finished. This is done for each already isolated holon segment, but may also be performed for all holons in the frame. The segmentation 565 may likewise be updated after each frame. In the preferred embodiment, major re-segmentation is only allowed when the motion data clearly show the need for it, except for pruning processes along holon edges.
More detail on updating of bilinear models is given in the patent application "Method and apparatus for coordination of motion determination over multiple frames" mentioned before.
Again, the order in which the individual frames are pulled into the modelling and segmentation need not be fixed. Once all the frames have been modelled and segmented, the whole process may be started again for the sequence, now with better start values for the bilinear model and segmentation.
Estimated uncertainties in the estimated segment borders are stored with the segment border information itself and used in subsequent encoding steps. Pixels with unclear segment classification, e.g. in a region around the chosen segmentation border, are treated as having high uncertainty. In subsequent motion estimation and bilinear modelling these uncertain pixels are given low weights or are passively fitted by principal component regression (Martens & Naes 1989), mentioned before. In subsequent segmentations, uncertain pixels are included in the new segmentation estimation, but are given low a priori input weights (705).
The next two embodiments concern special techniques to adapt the bilinear model parameter estimation to the segmentation application.
Seventh Preferred Embodiment: Modifying the bilinear modelling with additional smoothness criteria

The bilinear parameter estimation methods used for extracting the above mentioned bilinear models may be modified to require or favour additional statistical constraint types to be satisfied, such as requiring or biasing the temporal score vectors in T or the spatial loading vectors in P to be smooth, at least where no preliminary segment borders have been found.
This is done inside the NIPALS algorithm iteration for each factor a (Martens, H. and Naes, T. (1989) Multivariate Calibration. J.Wiley & Sons Ltd, Chichester UK.) as shown in this pseudo-code with line numbering:
For each factor a, each new iteration is defined as follows:
(810) Estimate the raw estimate of a spatial loading vector pa,raw by projection of the residual data in D after the previous factor, on the smoothed, scaled score vector ta from the previous iteration.
(820) Submit this raw spatial loading vector pa,raw to spatial smoothing: pa = f(pa,raw). The smoothing method may be a simple boxcar filter, or a method that seeks to avoid smoothing across apparent edges that should be left unsmoothed. The smoothed loading pa is orthogonalized with respect to the loadings of previously estimated factors [p1, p2, ..., pa-1].
(830) Estimate the raw scores ta,raw by projection of the residual data on the smoothed loading pa.
(840) Submit this raw score vector to temporal smoothing, e.g. a box car smoothing or a more advanced smoothing: ta = f(ta,raw).
(850) The smoothed score vector ta is scaled to length 1, and the process (810-850) is repeated iteratively until sufficient convergence.
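A minimal NumPy sketch of this smoothed NIPALS iteration (810-850) for one factor follows; a simple boxcar filter is assumed for both the spatial and the temporal smoothing, and the start vector, tolerance and edge handling are assumptions:

```python
import numpy as np

def boxcar(x, width=3):
    """Simple moving-average smoother (the 'boxcar' filter of 820/840)."""
    return np.convolve(x, np.ones(width) / width, mode="same")

def nipals_smoothed_factor(D, prev_P=None, n_iter=100, tol=1e-10):
    """One factor of the smoothed NIPALS iteration (810-850).

    D : residual data (nFrames, nPels) after the previous factors.
    prev_P : (nPels, a-1) loadings of earlier factors, for orthogonalization.
    Returns the smoothed score vector t and loading vector p."""
    t = D[:, 0].copy()                           # start score from one column
    t /= np.linalg.norm(t) + 1e-12
    p = None
    for _ in range(n_iter):
        p = D.T @ t                              # (810) raw loading
        p = boxcar(p)                            # (820) spatial smoothing
        if prev_P is not None:                   # orthogonalize vs earlier p's
            p -= prev_P @ (prev_P.T @ p)
        t_new = D @ p / (p @ p + 1e-12)          # (830) raw scores
        t_new = boxcar(t_new)                    # (840) temporal smoothing
        t_new /= np.linalg.norm(t_new) + 1e-12   # (850) scale to length 1
        if np.linalg.norm(t_new - t) < tol:
            break
        t = t_new
    return t, p
```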
A further enhancement of the bilinear modelling in this preferred embodiment is to use an iterative reweighted least squares fit of the bilinear model to the data, in order to reduce the effect of outlier frames or outlier pixels: the weights for the rows (frames) and columns (pels) in eq. (3) may iteratively be updated according to the inverse of updated estimates of the average uncertainty standard deviation for the rows and columns, based on corrected residuals from the low-rank bilinear model in the previous iteration.
This is described in more detail in the application "Method and apparatus for coordination of motion determination" mentioned before.
Eighth Preferred Embodiment: Rule-based optimal scaling as part of the bilinear modelling
In the bilinear modelling 540, 640, not only the bilinear model parameters may be changed to attain better modelling, but also the values in the input data, e.g. DARn, may be changed in the bilinear model parameter estimation process. The individual elements in the motion data dan,pel for frames and pixels may iteratively be modified to adhere more closely to the model derived from other frames or pixels:
dan,pel = f(dan,pel(input), dan,pelHat, Rules),
where for bilinear modelling dan,pelHat = tn*ppel.
One example of a rule is:
if (dan,pelHat gives as good or better motion fit din,pel as dan,pel(input) gives), and (dan,pelHat lies within the statistical uncertainty range of dan,pel(input)), then (dan,pel = dan,pelHat), else (dan,pel = dan,pel(input)).
Instead of such a discrete definition of the data element dan,pel, a more continuous weighted averaging function of dan,pelHat and dan,pel(input) may also be used.
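The discrete rule above can be sketched as a small element-wise function; the argument names, and the use of a single uncertainty bound s for the range test, are assumptions:

```python
def optimal_scale(da_input, da_hat, fit_err_input, fit_err_hat, s):
    """Rule-based optimal scaling of one motion element dan,pel:
    accept the bilinear prediction da_hat when it decodes at least as
    well as the input value AND lies within the input's uncertainty
    range s; otherwise keep the input value unchanged."""
    fits_as_well = fit_err_hat <= fit_err_input
    within_range = abs(da_hat - da_input) <= s
    return da_hat if (fits_as_well and within_range) else da_input
```

Applied over all frames and pixels, this gently pulls the motion data toward the low-rank bilinear model without degrading the decoding.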
This is described in more detail in the application "Method and apparatus for coordination of motion determination" mentioned before.
Any combination of the above is also an embodiment.
Other applications:
Segmentation of bilinear structures in the temporal domain
The above mentioned segmentation / clustering techniques can be used for determining groups of frames (sequences) which are sensible to analyze together, as well as for detecting scene shifts. One embodiment of this is to perform simple bilinear modelling of (possibly subsampled) frame intensities, and perform non-hierarchical cluster analysis in the score space T to find clusters of frames that have much image material in common. A robust single linkage cluster analysis is the preferred cluster method in this embodiment, in order to be able to follow motions within a scene within one single cluster.
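As a crude stand-in for such clustering of frames in score space, the following sketch flags a scene shift wherever the jump between consecutive frames' scores is much larger than the typical within-scene jump; the median-based threshold is an assumption, not the robust single-linkage method itself:

```python
import numpy as np

def scene_shifts(T, thresh=3.0):
    """Flag scene shifts from the frame score matrix T (nFrames, nFactors).
    A shift is declared before frame i when the score jump from frame i-1
    is more than `thresh` times the median jump; smooth within-scene motion
    produces small, regular jumps and stays inside one cluster."""
    jumps = np.linalg.norm(np.diff(T, axis=0), axis=1)
    med = np.median(jumps) + 1e-12
    return [i + 1 for i, j in enumerate(jumps) if j > thresh * med]
```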
Application to other types of data
Yet another embodiment is to apply the above principles to time series data, e.g. sound data, in order to define temporal segments. In that case the spatial motion field data correspond to time warp estimates, while the spatial intensity change data correspond to temporal intensity change data.
The usage of the output from the present invention for decoding (reconstruction of frames) is explained in WO 95/34172.
Modifications of the present invention within the scope of the appended claims are possible to an expert. Particularly, the term "plurality" can be interpreted in the sense of "one or more".

Claims

1. A method for segmenting an image sequence, said sequence consisting of frames, each frame consisting of samples of an input signal, the method comprising the steps of:
(1) forming a reference image, consisting of samples from a plurality of said frames,
(2) estimating motion from said reference image to each of said frames,
(3) reformatting the estimated motion into row vectors,
(4) collecting said row vectors into a motion matrix,
(5) performing a Principal Component Analysis on the motion matrix, thereby obtaining a score matrix consisting of a plurality of column vectors called score vectors and a loading matrix consisting of a plurality of row vectors called loading vectors, such that each score vector contains one element for each frame, such that each element of each loading vector corresponds to one element of the reference image, such that one column of said score matrix and one loading vector together constitute a factor, and such that the number of factors is lower than or equal to the number of said frames,
(6) reformatting each loading back into the same format as used for the motion,
(7) performing segmenting based on the first of the reformatted loadings.
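Steps (3) to (6) of claim 1 can be sketched as follows, using a numpy SVD as one possible way to compute the Principal Component Analysis (all names are illustrative):

```python
import numpy as np

def motion_pca(motion_fields, n_factors):
    """Stack per-frame motion fields as rows of a motion matrix, factor it,
    and reformat each loading back into motion-field shape.

    motion_fields : list of (h, w, 2) arrays, one per frame (dx, dy per element)
    Returns (scores, loadings): scores is (n_frames, n_factors); loadings is a
    list of n_factors arrays of shape (h, w, 2).
    """
    shape = motion_fields[0].shape
    D = np.stack([f.ravel() for f in motion_fields])   # motion matrix, one row per frame
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    scores = U[:, :n_factors] * s[:n_factors]          # score matrix (column score vectors)
    loadings = [Vt[k].reshape(shape) for k in range(n_factors)]  # reformatted loadings
    return scores, loadings
```

Each (score vector, loading vector) pair is one factor; the product of scores and loadings approximates the motion matrix.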
2. The method according to claim 1 , wherein the segmenting in step (7) comprises the steps of:
(7a) selecting a position in the first loading,
(7b) forming a local motion model based on elements in a neighbourhood of said selected position.
(7c) choosing candidate elements of said reference image among those elements that have not yet been segmented,
(7d) determining how well each of said candidate elements of said first loading fits to said local motion model,
(7e) including those candidate elements which satisfy a given fidelity criterion to said local motion model,
wherein the collection of elements that fit to said local motion model together represent a segment.
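The region growing of steps (7a) to (7e) can be sketched as follows, assuming an affine local motion model (as in claim 3) and a simple norm-based fidelity criterion; seeding with a small neighbourhood of the selected position corresponds to step (7b), and all names are illustrative:

```python
import numpy as np

def grow_segment(loading, seed, tol):
    """Region-grow one segment in the first loading.

    loading : (h, w, 2) first motion loading (dx, dy per reference element)
    seed : (row, col) selected position; tol : fidelity threshold
    Returns a boolean (h, w) mask of the grown segment.
    """
    h, w, _ = loading.shape
    seg = np.zeros((h, w), bool)
    sy, sx = seed
    seg[max(0, sy - 1):sy + 2, max(0, sx - 1):sx + 2] = True   # step 7b: seed neighbourhood
    changed = True
    while changed:
        changed = False
        ys, xs = np.nonzero(seg)
        # fit affine model  [dx dy] ~ [y x 1] @ A  to current segment members
        G = np.column_stack([ys, xs, np.ones(len(ys))])
        A, *_ = np.linalg.lstsq(G, loading[ys, xs], rcond=None)
        for y in range(h):
            for x in range(w):
                if seg[y, x]:
                    continue
                # step 7c: candidates must border the segment (4-neighbourhood)
                if not ((y > 0 and seg[y - 1, x]) or (y < h - 1 and seg[y + 1, x])
                        or (x > 0 and seg[y, x - 1]) or (x < w - 1 and seg[y, x + 1])):
                    continue
                pred = np.array([y, x, 1.0]) @ A
                # steps 7d/7e: include candidates that fit the local model
                if np.linalg.norm(pred - loading[y, x]) <= tol:
                    seg[y, x] = True
                    changed = True
    return seg
```

Refitting the model inside the loop corresponds to the model update of step (7f).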
3. The method according to claim 2, wherein said local motion model comprises modelling motion as an affine transform.
4. The method according to claim 2, wherein said local motion model comprises modelling motion as a plurality of piecewise polynomial transforms.
5. The method according to any one of claim 2 to 4, wherein step (7d) is replaced by:
(7d1 ) determining how well each of a plurality of further candidate elements of said first loading fits to a subset of the elements that have been included in said local model, said subset being selected for each candidate element based on the position of the candidate element.
6. The method according to any one of claim 2 to 5, wherein step (7e) concerning said fidelity criterion comprises for each candidate element, computing a difference between motion as extrapolated from the local motion model for the position of the candidate element, and motion as computed from multiplying score vectors by loading vectors.
7. The method according to any one of claims 2 to 6, wherein said fidelity criterion also takes into consideration for each element in said reference image an uncertainty corresponding to said motion fields.
8. The method according to claim 7, wherein said fidelity criterion is computed as, for each element in said reference image, a ratio between said difference and said uncertainty corresponding to said motion fields.
9. The method according to claim 8, wherein said uncertainty may have one value corresponding to each of a plurality of spatial directions, one of said values being chosen for computation of said fidelity criterion dependent on the direction of said difference.
10. The method according to any one of claim 2 to 9, wherein the method further comprises the step of:
(7f) updating the local motion model.
11. The method according to claim 10, wherein said updating includes a step for adjusting the weights corresponding to the elements according to the fidelity criterion.
12. The method according to claim 10 or 11 , wherein steps (7c) to (7f) are repeated several times.
13. The method according to any one of claim 2 to 12, further comprising the steps of:
(7g) marking the positions of elements that satisfied said fidelity criterion as segmented,
(7h) selecting a new position among those positions that have not been marked as segmented,
(7i) repeating steps (7c) to (7h) until a given convergence criterion is satisfied.
14. The method according to any one of claim 1 to 13, wherein instead of said first factor, a plurality of factors is used.
15. The method according to any one of claims 1 to 14, wherein instead of performing step (5) concerning Principal Component Analysis, the subsequent steps operate directly on said motion fields corresponding to each said frame instead of operating on said loading vectors.
16. A method for improving segmenting an image sequence, said sequence consisting of frames, each frame consisting of samples of an input signal, the segmenting being represented as a plurality of segments of said reference image, the method comprising the steps of:
(1) for each of the given input segments, performing steps (1) to (6) of claim 1, using the intensity corresponding to said given input segment plus a layer of adjacent elements as reference image, the collection of segment reference images together constituting a total reference image, where each element of the total reference image may be represented in more than one segment reference image,
(2) for each element in each said segment reference image, computing a fidelity criterion,
(3) for each element in the total reference image represented in more than one segment reference image, finding which segment gives the best fidelity, and removing the element from the other segments.
17. The method according to any one of claims 1 to 15, the method further comprising the steps of:
(8) forming an energy potential image based on the fidelity indicators,
(9) for each segment, finding its border to neighbouring segments,
(10) for each border, optimizing its position iteratively to minimize the sum of energy along its trace, wherein the reference image elements inside each border represent the respective segments.
18. The method according to claim 17, wherein the energy potential function is also based on reference image intensity gradients.
19. The method according to any one of claim 17 to 18, further comprising:
(11 ) for each frame in the sequence, moving the frame back to reference position using its motion field or a reconstruction of its motion field computed by multiplying score values by loading vectors,
(12) reformatting the moved frames into row vectors,
(13) collecting the row vectors into an intensity matrix,
(14) performing a Principal Component Analysis on the intensity matrix, resulting in row vectors called intensity loading vectors and column vectors called intensity score vectors, wherein said energy potential image is also dependent on the intensity loading vectors and intensity score vectors.
20. The method according to any one of claims 17 to 19, wherein said optimization of said border also includes optimizing spatial simplicity of the border.
21. A method for segmenting an image sequence, said image sequence consisting of frames, each frame consisting of samples of an input signal, the method comprising the steps of:
(1 ) forming a reference image,
(2) estimating motion to one frame, resulting in, for each element in said reference image, an estimated position in said frame,
(3) forming a regressor matrix, said regressor matrix consisting of column vectors so that there is one column vector corresponding to each spatial dimension of the reference image, each of said columns containing for each element in said reference image one component of the spatial position of said element, and also one column vector containing ones,
(3) forming a regressand matrix, said regressand matrix consisting of column vectors so that there is one column vector corresponding to each spatial dimension of the reference image, each column containing for each element one component of the estimated position in said frame,
(4) estimating a regression coefficient matrix so that said regressand matrix is approximated by the product of said regression coefficient matrix and said regressor matrix,
(5) computing for each element in said reference image a membership measurement based on the regression residual, said regression residual being computed as said regressand matrix minus said product of regression coefficient matrix and regressor matrix,
(6) forming a segment based on the membership measurement computed in step (5), wherein the segment formed in step (6) represents the segmenting of said image sequence.
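The regression of steps (3) to (6) can be sketched as follows for a two-dimensional reference image, using ordinary least squares as the estimator and a residual-norm threshold as the membership measurement (both illustrative choices; claim 22 would replace the plain least squares by a robust reweighted variant):

```python
import numpy as np

def regression_segment(positions, resid_thresh):
    """Regress per-element estimated positions on the reference-grid
    coordinates; elements with small regression residuals form the segment.

    positions : (h, w, 2) estimated (row, col) position in the target frame
                of each reference-image element (the motion estimate)
    Returns (member_mask, coefficient_matrix).
    """
    h, w, _ = positions.shape
    Y, X = np.mgrid[0:h, 0:w]
    G = np.column_stack([Y.ravel(), X.ravel(), np.ones(h * w)])  # regressor matrix
    B = positions.reshape(-1, 2)                                 # regressand matrix
    C, *_ = np.linalg.lstsq(G, B, rcond=None)                    # regression coefficients
    resid = np.linalg.norm(B - G @ C, axis=1).reshape(h, w)      # regression residual
    return resid <= resid_thresh, C
```

Elements moving according to one common affine motion get near-zero residuals; elements belonging to other objects stand out as large residuals and fall outside the segment.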
22. The method according to claim 21 , where said estimating in step (4) is performed using robust reweighted least squares regression.
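One common reading of such a robust reweighted regression is iteratively reweighted least squares; the Huber-style weight function and the cutoff `c` below are assumptions for illustration, not taken from the claim:

```python
import numpy as np

def irls(G, B, n_iter=10, c=1.0):
    """Iteratively reweighted least squares: rows with large residuals are
    down-weighted so outliers barely influence the coefficient estimate.

    G : (n, p) regressor matrix;  B : (n, m) regressand matrix
    Returns the (p, m) regression coefficient matrix.
    """
    w = np.ones(len(G))
    C = np.zeros((G.shape[1], B.shape[1]))
    for _ in range(n_iter):
        Gw = G * w[:, None]
        # solve the weighted normal equations  (G'WG) C = G'WB
        C, *_ = np.linalg.lstsq(Gw.T @ G, Gw.T @ B, rcond=None)
        r = np.linalg.norm(B - G @ C, axis=1)
        w = np.where(r <= c, 1.0, c / np.maximum(r, 1e-12))  # Huber-style weights
    return C
```

Compared with plain least squares, the recovered coefficients are nearly unaffected by a gross outlier element.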
23. The method according to any one of claim 21 or 22, wherein motion estimation is performed for a plurality of frames, and the regressand matrix contains one column vector for each spatial dimension of each resulting motion field.
24. The method according to any one of claim 21 or 22, wherein a motion estimation is performed for each frame in said sequence, a Principal Component Analysis being performed on the results from said motion estimation, the regressand matrix comprising one column vector for each loading vector.
25. The method according to any one of claims 21 to 24, wherein said regressand matrix also includes, for each frame, a column containing motion-compensated intensity residuals, said motion-compensated intensity residuals being formed by moving each frame back to reference position according to estimated motion and subtracting the reference image.
26. The method according to any one of claims 21 to 24, wherein a Principal Component Analysis is performed on motion-compensated intensity residuals, and said regressand matrix also includes the resulting loading vectors as columns.
27. The method according to any one of claims 21 to 26, wherein said regressor matrix also includes polynomials of said spatial positions as columns.
28. The method according to any one of claims 22 to 27, wherein spatially shifted versions of regressands are also included as columns in the regressor matrix.
29. A method for segmenting an image sequence, said image sequence consisting of frames, each frame consisting of samples of an input signal, the method comprising the steps of:
(1 ) performing segmenting according to the method of any one of claims 21 to 28,
(2) repeating step (1), disregarding those parts of said reference image that have been found to be outside the segment, wherein the segments found in each repetition of step (1) together represent said segmenting.
30. The method according to any one of claims 21 to 28, wherein said regression is performed using a region-growing method.
31. A method for segmenting an image sequence, said image sequence consisting of frames, each frame consisting of samples of an input signal, the method comprising the steps of:
(1 ) performing segmenting according to the method of claim 30,
(2) repeating step (1 ), disregarding those parts of said reference image that have been found to be outside the segment, wherein the segments found in each repetition of step (1 ) together represent said segmenting.
32. The method according to any one of claims 21 to 31, wherein uncertainties for said regressands are computed, and said uncertainties are exploited when estimating said regression coefficient matrix.
33. The method according to any one of claims 21 to 32, wherein uncertainties for said regressands are computed, and said uncertainties are exploited when computing said membership measurement.
34. The method according to any one of claims 21 or 23 to 33, wherein step (4) and (5) comprise:
(4) performing a cluster analysis of the samples in the space of the regressand columns,
(5) computing a membership measurement based on said cluster analysis.
35. An apparatus for segmenting an image sequence, said sequence consisting of frames, each frame consisting of samples of an input signal, the apparatus comprising:
(1) means for forming a reference image, consisting of samples from a plurality of said frames,
(2) means for estimating motion from said reference image to each of said frames,
(3) means for reformatting the estimated motion into row vectors,
(4) means for collecting said row vectors into a motion matrix,
(5) means for performing a Principal Component Analysis on the motion matrix, thereby obtaining a score matrix consisting of a plurality of column vectors called score vectors and a loading matrix consisting of a plurality of row vectors called loading vectors, such that each score vector contains one element for each frame, such that each element of each loading vector corresponds to one element of the reference image, such that one column of said score matrix and one loading vector together constitute a factor, and such that the number of factors is lower than or equal to the number of said frames,
(6) means for reformatting each loading back into the same format as used for the motion,
(7) means for performing segmenting based on the plurality of reformatted loadings.
36. The apparatus of claim 35, adapted to be used according to the method of any one of claim 2 to 34.
PCT/EP1996/001273 1995-03-22 1996-03-22 Method and apparatus for multi-frame based segmentation of data streams WO1996030873A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP8528905A JPH11502653A (en) 1995-03-22 1996-03-22 Multi-frame method and device based on data stream division
AU52735/96A AU5273596A (en) 1995-03-22 1996-03-22 Method and apparatus for multi-frame based segmentation of d ata streams
EP96909117A EP0815537A1 (en) 1995-03-22 1996-03-22 Method and apparatus for multi-frame based segmentation of data streams

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP95104229.0 1995-03-22
EP95104229 1995-03-22

Publications (1)

Publication Number Publication Date
WO1996030873A1 true WO1996030873A1 (en) 1996-10-03

Family

ID=8219106

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP1996/001273 WO1996030873A1 (en) 1995-03-22 1996-03-22 Method and apparatus for multi-frame based segmentation of data streams

Country Status (6)

Country Link
EP (1) EP0815537A1 (en)
JP (1) JPH11502653A (en)
KR (1) KR19980702923A (en)
AU (1) AU5273596A (en)
WO (1) WO1996030873A1 (en)
ZA (1) ZA962304B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010056904A3 (en) * 2008-11-12 2010-08-12 University Of Utah Research Foundation Image pattern recognition
US8285079B2 (en) 2010-03-19 2012-10-09 Sony Corporation Method for highly accurate estimation of motion using phase correlation
US8488007B2 (en) 2010-01-19 2013-07-16 Sony Corporation Method to estimate segmented motion
CN112766325A (en) * 2021-01-04 2021-05-07 清华大学 Traffic data multi-mode missing filling method based on space-time fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GREGORY W. DONOHOE: "Combining segmentation and tracking for the classification of moving objects in video scenes", Proceedings of the Asilomar Conference on Signals, Systems and Computers, October 31 - November 2, 1988, Pacific Grove (US); IEEE Computer Society Press, pages 533-537, XP000130310 *
S.Y. KUNG ET AL.: "Neural networks for extracting pure/constrained/oriented principal components", SVD and Signal Processing, II: Algorithms, Analysis and Applications; Elsevier, Amsterdam (NL), 1991, pages 57-81, XP000574672 *
TERRANCE E. BOULT ET AL.: "Factorization-based segmentation of motions", Proceedings of the IEEE Workshop on Visual Motion, October 7-9, 1991, Princeton (US); IEEE Computer Society Press, Los Alamitos (US), pages 179-186, XP000579543 *


Also Published As

Publication number Publication date
ZA962304B (en) 1996-09-27
EP0815537A1 (en) 1998-01-07
AU5273596A (en) 1996-10-16
KR19980702923A (en) 1998-09-05
JPH11502653A (en) 1999-03-02


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 96192717.8

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU TJ TM

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1019970706331

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 1996909117

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 1996 528905

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 1997 930341

Country of ref document: US

Date of ref document: 19971127

Kind code of ref document: A

WWP Wipo information: published in national office

Ref document number: 1996909117

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWP Wipo information: published in national office

Ref document number: 1019970706331

Country of ref document: KR

NENP Non-entry into the national phase

Ref country code: CA

WWW Wipo information: withdrawn in national office

Ref document number: 1019970706331

Country of ref document: KR

WWW Wipo information: withdrawn in national office

Ref document number: 1996909117

Country of ref document: EP