IMPROVEMENTS IN IMAGE PROCESSING
The present invention relates to image processing and, in particular, to improvements in techniques for "registering" or aligning as accurately as possible two or more images with each other.
The need to align two (or more) images arises in a variety of imaging fields. For example, in the field of medical image analysis it is often useful to align images of the same patient taken with different modalities (for example MRI, CT, PET) and/or images taken at different times to monitor the development of some condition or disease. A similar requirement occurs when an image of a patient is to be compared with a standardized image-like model known as an atlas. The need for image alignment also occurs in other fields. For example, it can be useful to align aerial images of scenes taken with different sensors or frequency bands or under different imaging conditions (the sensor platform may have a different height and pose relative to the scene, and there may be cloud cover, snow cover or modifications to the scene). Image alignment may also be required in analysis of image motion: in the computation of the optic flow or the estimation of three-dimensional structure from motion. Also, in the case of multi-view image analysis such as stereo vision, the simultaneously-taken images from two or more cameras need to be aligned and combined to form an estimate of the three-dimensional locations of points in the scene.
In general, image alignment techniques have proceeded by taking the two image data sets, applying some form of transformation to one of them (such as a rigid transformation involving displacement and rotation, or a non-rigid transformation including, for example, warping of the structures), calculating the similarity between one of the data sets and the transformed version of the other data set, and adjusting the transformation, usually iteratively, to attempt to minimize the difference.
Figure 1 of the accompanying drawings illustrates the process. In step 10 the two image data sets I1 and I2 are acquired and in step 11 one of them is transformed using a candidate transformation. In step 12 the similarity between the transformed image and the other image is measured using a similarity measure such as cross-correlation or mutual information. In practice the similarity measure is not calculated directly from the intensities, but from the individual and joint probability distributions of the intensities, and the value of the similarity measure can be calculated from these distributions (step 12a).
For example, the mutual information of the intensities of two images, I_u and I_v, can be defined in terms of the probability distributions of the intensities of the individual pixels as follows:

MI(I_u, I_v) = sum over u,v of P_uv log( P_uv / (P_u P_v) )

where u is an intensity value in I_u, v is an intensity value in I_v, P_u and P_v are the probabilities of those intensities occurring (i.e. the proportion of the pixels that have that intensity) and P_uv is the probability of the two intensities occurring as a pair (i.e. at the same pixel position in both images). Other similarity measures such as normalised mutual information and correlation ratio can be similarly defined in terms of the probability distributions.
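The definition above can be made concrete with a short sketch (illustrative only, not part of the original disclosure; the function name, bin count and test images are arbitrary assumptions). The joint distribution P_uv is estimated with a 2-D histogram and the marginals P_u and P_v are obtained from it:

```python
import numpy as np

def mutual_information(img1, img2, bins=32):
    """Mutual information of two equally-sized images, computed from
    the individual and joint probability distributions of intensities."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    p_uv = joint / joint.sum()          # joint distribution P_uv
    p_u = p_uv.sum(axis=1)              # marginal distribution of img1
    p_v = p_uv.sum(axis=0)              # marginal distribution of img2
    nz = p_uv > 0                       # skip empty bins (0 log 0 = 0)
    return float(np.sum(p_uv[nz] * np.log(p_uv[nz] / np.outer(p_u, p_v)[nz])))

rng = np.random.default_rng(0)
a = rng.random((64, 64))
b = a + 0.05 * rng.random((64, 64))     # closely related to a
c = rng.random((64, 64))                # statistically independent of a
mi_related = mutual_information(a, b)
mi_independent = mutual_information(a, c)
```

Because P_uv factorises as P_u P_v for unrelated images, the measure is near zero for the independent pair and large for the closely related pair.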
In step 13 it is checked whether the images are sufficiently aligned and, if not, the transformation is changed slightly in step 14 and steps 11 and 12 are repeated. This continues iteratively until the similarity measure has reached the required value, or, for example, has not changed significantly with the change in the transformation, at which point the process ends at step 15 with the value of the transformation which gives the best alignment having been calculated.
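The transform / measure / adjust loop of steps 11 to 14 can be sketched for the simple case of integer translations scored by normalised cross-correlation. This is an illustrative sketch only: an exhaustive search over candidate shifts stands in for the incremental optimisation, and all names here are assumptions of this description:

```python
import numpy as np

def best_shift(fixed, moving, max_shift=5):
    """Try candidate integer translations of `moving`, score each with
    normalised cross-correlation against `fixed`, and keep the best,
    mirroring the transform / measure / adjust loop of Figure 1."""
    a = fixed - fixed.mean()
    best, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(moving, dy, axis=0), dx, axis=1)
            b = shifted - shifted.mean()
            score = (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())
            if score > best_score:
                best, best_score = (dy, dx), score
    return best

rng = np.random.default_rng(1)
img = rng.random((32, 32))
moved = np.roll(np.roll(img, 3, axis=0), -2, axis=1)  # displaced copy
dy, dx = best_shift(img, moved)  # the shift that undoes the displacement
```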
However, image alignment is a difficult task for several reasons. Firstly, the signal-to-noise ratio of many images is low; this is particularly true of medical images. Further, the two or more images to be aligned may have different spatial resolutions (particularly if they have been obtained by different imaging modalities), and the requirement for non-rigid transformations of one image to another makes it difficult to find the correct alignment. For these reasons there have been many different proposals for image alignment techniques involving different similarity criteria and different ways of coping with noise in the image.
For example, one way of allowing for noise in the image is to use a windowing technique in the calculation of the probability distribution mentioned above. Thus, whereas preparing an individual probability distribution of the intensities in an image simply involves preparing a histogram of those intensities (with each intensity value contributing only to one bin of the histogram), where a windowing technique is used each intensity value contributes not only to its "primary" bin, but also partially to neighbouring bins. The amount of contribution to neighbouring bins depends on the noise model chosen. For example, if a Gaussian noise model is chosen, each observed intensity value is regarded as forming the center of a Gaussian distribution, with the contribution to neighbouring histogram bins corresponding to the values of the Gaussian distribution. Different noise models can be chosen depending on the imaging modality. Thus the individual intensity values can be thought of as being "smeared out" over several bins of the histogram. The same technique is used in the preparation of the joint probability distribution. This technique is known as Parzen windowing or kernel density estimation.
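A minimal sketch of such a Parzen-windowed histogram under a Gaussian noise model follows (illustrative only; the function name, bin layout and parameter values are assumptions of this description). Each observation casts a total vote of one, spread over the bins with Gaussian weights:

```python
import numpy as np

def parzen_histogram(values, bin_centres, sigma):
    """Histogram in which each observation is smeared over neighbouring
    bins with Gaussian weights instead of voting for a single bin."""
    hist = np.zeros(len(bin_centres))
    for h in values:
        w = np.exp(-(bin_centres - h) ** 2 / (2.0 * sigma ** 2))
        hist += w / w.sum()          # each observation casts one vote in total
    return hist / hist.sum()         # normalise to a probability distribution

centres = np.linspace(0.0, 1.0, 50)
pdf = parzen_histogram(np.array([0.2, 0.5, 0.5, 0.8]), centres, sigma=0.05)
```

With two of the four observations at 0.5, the resulting distribution peaks near that value while remaining smooth around 0.2 and 0.8.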
The choice of the optimal image alignment technique, though, in any given set of circumstances, for example the choice of the correct similarity measure and the correct handling of noise, remains a difficult technical problem.
It is important to note that the term image applies not only to values of intensities calculated in two spatial dimensions, but also to image sets which are in three dimensions (in which case the pixels are usually known as voxels) and also to two or three dimensions plus time. In the present specification, though, the term "image" will be regarded as covering all such data sets, and the term "pixels" will be used to include "voxels".
A first aspect of the present invention relates to the use of a new image descriptor for image alignment. This image descriptor consists of one or more of the local energy, local phase and local orientation in the image. The local energy, local phase and local orientation are values which can be calculated from the intensity values in an image. These are values which describe the structure in the image. One definition of energy, phase and orientation is known as the monogenic signal, as discussed, for example, in "Low-Level Image Processing with the Structure Multivector", M. Felsberg, PhD thesis, Christian-Albrechts-Universitat Kiel, 2002, http://www.isy.liu.se/~mfe/Diss.ps.gz.
In accordance with the present invention, rather than conducting the alignment analysis on intensity values directly, the analysis is instead conducted on an image descriptor comprising one or more of these values. Preferably the analysis is conducted at a plurality of different scales. Each of local energy, local phase and local orientation presupposes analysis of the image at a certain scale. Images may be aligned more quickly by starting the analysis at a relatively large scale (this may correspond, for example, to aligning the overall outline of two image structures), followed by analysis at successively smaller scales. It should be noted, as mentioned above, that the scales may be purely spatial or temporal.
A second aspect of the invention relates to an adaptation of the way in which the probability distributions of the image descriptor are calculated in order better to account for the noise in the image. In particular, a windowing function is used when calculating the probability distributions and the size and shape of the window are varied in accordance with the local structure in the image. In particular, the size and shape of the window can be varied in accordance with the local energy as calculated at each position in the image. Preferably, the greater the local energy the smaller the window. This reduces the susceptibility of the joint probability distribution estimate to the effects of noise. Furthermore, because local energy is concentrated around feature points in the image, the method becomes feature driven (i.e. features in the image are used preferentially in the alignment process as the contribution to the histograms in these areas is less "smeared out").
Preferably the aspects are combined together so that the image descriptor consisting of one or more of the local energy, local phase and local orientation is used as the image descriptor in the calculation of the probability distributions using an energy-dependent window as mentioned above.
The invention will be further described by way of example with reference to the accompanying drawings in which:
Figure 1 is a flow diagram illustrating the general technique of image alignment used in the prior art;
Figure 2 is a flow diagram illustrating the general image alignment method in accordance with an embodiment of the present invention;
Figure 3 illustrates schematically the calculation of the image descriptor in accordance with one embodiment of the present invention;
Figure 4 illustrates schematically the calculation of the probability distribution of image descriptors in accordance with an embodiment of the present invention;
Figures 5A to 5I are flow diagrams and pseudo-code schematically illustrating an example of the present invention; and
Figures 6 (a) to (d) illustrate an example of image alignment using an embodiment of the present invention.
Referring first to Figure 2 the general processing flow of image alignment using an embodiment of the present invention will be described. Firstly, in step 20, the two sets of image data to be aligned or "registered" are received. As discussed above, these may be two-dimensional images or three-dimensional images, and may be supplemented by time. In step 21 one or more scales are chosen for the image analysis. Preferably a multi-scale approach is used, with the analysis first being conducted at a relatively large scale, and then successively smaller scales (corresponding to finer resolutions). In the example which is illustrated in Figure 6, four different scales were chosen. The scales are chosen based on the size of the structures in the image and the accuracy of the registration required.
In step 22 the monogenic signal is calculated for each image point at the chosen scale. The monogenic signal consists of the local energy, the local phase and the local orientation. Methods of calculating the monogenic signal are known, and any of the available techniques may be used with the present invention, for example steerable quadrature filters such as described in "The Design and Use of Steerable Filters" by Freeman and Adelson (PAMI 13(9), Sept 1991, pp 891-906) or the techniques described in "Low-Level Image Processing with the Structure Multivector", M. Felsberg, PhD thesis, Christian-Albrechts-Universitat Kiel, 2002, http://www.isy.liu.se/~mfe/Diss.ps.gz. This is discussed in more detail below.
In step 23 the candidate transformation is applied to one of the data sets and in step 24 the probability distributions needed for calculation of the similarity measure in step 25 are calculated. As will be explained below, in accordance with an aspect of the present invention, the probability distributions are calculated taking into account the noise in the image in an adaptive way in accordance with the local structure in the image. Further, they are calculated on the components of the monogenic signal (the local energy, local phase and local orientation), not on the intensities as in the prior art. Steps 26 and 27 correspond to the normal iterative processing to optimize the transformation and in step 28 it is checked whether all scales have been completed. If not then steps 22 to 26 are repeated for each of the chosen scales until they are all completed, at which point the process ends at step 30 with the best estimate for the transformation having been calculated.
The local energy, local phase and local orientation are indicative of the local properties of a region within an image. For example, they describe such features as step changes, ridges, sudden spikes and so on. Image features can be classified according to their local symmetry. Just as any real signal can be locally decomposed into a symmetric part and an anti-symmetric part (corresponding, for example, to the sine and cosine Fourier components), the same is true of image features. For example, an edge between a dark area and a light area is locally anti-symmetric while a peak in intensity is locally symmetric. The ratio of the amplitude of the symmetric and anti-symmetric components in any given feature amounts to a simple summary of the local shape of the intensity profile of the feature. The combined amplitude gives information about the amount of local signal activity, and reaches a local maximum near the center of the feature. These two quantities are what is measured by the phase and energy in the monogenic signal respectively. If considering, therefore, the variation of intensity across an image feature such as an edge from white to black, this feature is locally anti-symmetric and so could be described as having mainly a sine component and thus a phase close to, say, zero. On the other hand, a feature which is symmetric (and thus over the same distance goes from white to black and white again) would have predominantly a cosine component and thus a phase nearer ninety degrees. The phase and energy can be obtained from an image data set by convolving it with a pair of quadrature filters, but with a two (or more) dimensional image there is the problem that the image features are different in different directions across the image. Thus the monogenic signal also includes the feature orientation, which can intuitively be understood as the direction associated with a local linear structure (such as the direction of an edge).
The three quantities of local phase, local energy and local orientation can be obtained by using three filters which are convolved with the image, using the known techniques described in the references above.
Figure 3 illustrates this schematically, in which two odd filters, one for the vertical direction in the image and one for the horizontal direction, and one isotropic even filter are used and their outputs are combined to calculate the three components of the monogenic signal: the local energy A, the local phase φ and the local orientation θ.
Thus as a result of convolving each of the image data sets to be aligned with the three
filters, the intensity values are converted into values at each image point (pixel) for the components of the monogenic signal - the local energy, the local phase and the local orientation.
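One route to these three components, sketched here purely for illustration, applies the odd filter pair as Riesz transfer functions in the frequency domain. This assumes the input has already been band-pass filtered (so that the image itself serves as the even response); the function name and the frequency-domain formulation are assumptions of this description rather than the patent's own code:

```python
import numpy as np

def monogenic_signal(image):
    """Local energy, phase and orientation of a (band-passed) image,
    with the two odd responses obtained via the Riesz transform."""
    rows, cols = image.shape
    u = np.fft.fftfreq(rows)[:, None]
    v = np.fft.fftfreq(cols)[None, :]
    q = np.hypot(u, v)
    q[0, 0] = 1.0                    # avoid division by zero at DC
    F = np.fft.fft2(image)
    odd_x = np.real(np.fft.ifft2(F * (-1j * u / q)))   # odd response, axis 0
    odd_y = np.real(np.fft.ifft2(F * (-1j * v / q)))   # odd response, axis 1
    even = image                     # band-passed image = even response
    odd = np.hypot(odd_x, odd_y)
    energy = np.hypot(even, odd)            # local energy A
    phase = np.arctan2(even, odd)           # local phase, arctan(even/odd)
    orientation = np.arctan2(odd_y, odd_x)  # local orientation
    return energy, phase, orientation

img = np.tile(np.sin(2 * np.pi * np.arange(64) / 8), (64, 1))
energy, phase, orientation = monogenic_signal(img)
# For a pure unit-amplitude sinusoid the local energy is constant
```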
The next step involves calculating the individual probability distributions in the two data sets, and also the joint probability distribution corresponding to steps 23 and 24 of Figure 2.
As explained above, a conventional way of estimating a probability distribution taking account of noise is to use a windowing function, such as a Parzen window, in the construction of the histogram or joint histogram. In this approach each value being added to the histogram contributes not to a single bin, but contributes over several bins in a distribution which is based on the noise model for the data set. Thus, considering a signal h corrupted by additive white Gaussian noise, the distribution would be:

P(i) = (1/(σ√(2π))) exp( -(i - h)² / (2σ²) )     Equation 1

where σ is the standard deviation of the noise.
Thus this equation tells you how much of the vote of the observed signal h goes into each bin i of the histogram. Conventionally this equation is used to calculate the contribution at each image point in the same way. However, with the present invention, this windowing function is adjusted at each point in accordance with the local image structure. In this embodiment it is adjusted by the local energy A_k at that point so that, for example, taking the phase component of the monogenic signal, the Parzen window is defined as:

W(φ) = exp( -A_k² (φ - φ_k)² / (2σ²) )     Equation 2

Thus if the observed phase is φ_k, the histogram bin for phase value φ receives the value calculated by Equation 2, where A_k is the local energy (also calculated from the monogenic signal) and σ is the standard deviation of the noise model, which is estimated for each image data set, for example by modelling the imaging process or by calculating the variance of the intensity in the image. So the amount of "smearing" of the observed values represented by the windowing function varies from point to point in the image with the local energy. In essence, σ, which governs the size and shape of the window, is modified by the local energy, which is an aspect of image structure.
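The effect of the energy-dependent window of Equation 2 can be illustrated with a short sketch (names and parameter values here are assumptions of this description). The same observed phase produces a broad window at a low-energy point and a narrow, feature-like window at a high-energy point:

```python
import numpy as np

def phase_window(phi_bins, phi_k, A_k, sigma):
    """Energy-adaptive Parzen window: the vote of an observed phase
    phi_k is spread over the phase bins, and the spread narrows as
    the local energy A_k grows."""
    return np.exp(-A_k ** 2 * (phi_bins - phi_k) ** 2 / (2.0 * sigma ** 2))

bins = np.linspace(-np.pi, np.pi, 101)
w_low = phase_window(bins, 0.0, A_k=0.5, sigma=0.3)   # low local energy
w_high = phase_window(bins, 0.0, A_k=4.0, sigma=0.3)  # high local energy
# The high-energy (feature) window is much narrower, i.e. less smeared
```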
The equivalent window expressions for the local orientation and local energy respectively in each of the two data sets take the same form, where A_o is the odd component of the energy, calculated as the square root of the sum of the squares of the two odd parts of the monogenic signal at each scale.
Figure 4 illustrates schematically how the probability distributions for the components of the monogenic signal are calculated. A linear noise model 40 (for example the Gaussian noise model discussed above or any other noise model suitable for the image data set) is convolved with the same filters 41, 42 and 43 used to generate the monogenic signal and is then utilized in the estimation of the probability distribution functions 44, 45 and 46 for the local energy, local orientation and local phase.
These three PDFs together form the PDF of the new variable which will be used in place of intensity in the calculation of the similarity between the image data sets.
These probability distributions can then be used to calculate the similarity between the two images, using any desired similarity measure as illustrated in step 25 of Figure 2.
Figure 6 illustrates an example of two test images to which the present invention was applied. Figure 6a and Figure 6b illustrate two significantly different images of the same scene. Figure 6c illustrates the most likely horizontal alignment of the two images as calculated by the use of traditional mutual information of the intensities in the two image data sets. It can be seen that there is a peak at zero, implying that the
images should be aligned at this point. Figure 6d illustrates the correct alignment point as calculated using an embodiment of the present invention. In particular the method of Figure 2 was run at four different scales and assuming a simple white, additive Gaussian noise model. It can be seen that again there is a peak at zero, illustrating that the correct alignment was calculated, but that this peak is much clearer than the peak in Figure 6c.
A simple example of a complete PDF estimation and registration procedure in accordance with an embodiment of the invention will now be described with reference to Figures 5A to 5I. Figure 5A illustrates the overall process. First, the images are received in step 50 and approximately aligned, e.g. by a rigid registration algorithm as known in the art (not illustrated in Figure 5A). One image is designated the source image and the other the target image. The source image will be deformed until it is aligned with the target image. Local energy, local phase and local orientation are then calculated at step 51 (in process A, which will be described in more detail below) for each image, at a particular scale, which may be chosen by the designer to suit the problem at hand.
Next, in step 52 the initial joint probability distribution function (PDF) of the descriptors must be estimated, for example using process B as shown by the pseudo-code in Figure 5F. Then in step 53 the incremental improvement to the transformation is calculated, for example by process C shown in Figures 5G to 5I, and the new transformation is applied to the source image. This process is continued iteratively until the improvement DF in alignment caused by the change to the transformation is less than a threshold lim, as tested at step 55, finishing at step 56 with the best alignment achieved.
In process A local energy, local phase and local orientation are calculated from the response of a set of three filters at each scale, as shown in Figures 5C, 5D and 5E. The choice of the filter family is application specific; in this example a family of non-blurring filters is used. The overall flow is shown in Figure 5B. For each of N scales, where N is five in this example, a set of three filters is constructed in step A1. First the rotationally symmetric filter is defined. As shown in Figure 5C, in this example, the rotationally symmetric filter is a weighted sum of two other rotationally symmetric filters, A and B, defined by A = r^(1.5 + 0.5*scale) and B = r^(2 + 0.5*scale), where r is the distance from the filter centre, calculated in the step r = sqrt(x² + y²), wherein the function sqrt() performs a square root operation. The weight applied to each filter is defined as one over the summation over all x and y of the filter values, as shown in the steps A = A/sum(A) and B = B/sum(B), where the function sum() performs the summation over all x and y. Next, each image is convolved with the rotationally symmetric filter formed by subtracting B from A, as in the steps Bandpass_1_scale = convolve(Image_1, A-B) and Bandpass_2_scale = convolve(Image_2, A-B). This is repeated for each of the N = 5 scales, which results in a set of band-pass filtered images.
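This filter construction can be sketched as follows (illustrative only; it assumes the power-law exponents 1.5 + 0.5·scale and 2 + 0.5·scale read from Figure 5C, an odd filter size, and a function name invented for this description):

```python
import numpy as np

def radial_bandpass(size, scale):
    """Rotationally symmetric band-pass filter built as the difference
    of two sum-normalised radial power-law filters."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.sqrt(x ** 2 + y ** 2)          # distance from the filter centre
    A = r ** (1.5 + 0.5 * scale)
    B = r ** (2.0 + 0.5 * scale)
    A = A / A.sum()                        # weight = 1 / sum over all x, y
    B = B / B.sum()
    return A - B                           # the band-pass filter

f = radial_bandpass(31, scale=1)
# A - B has zero total weight, so convolving with it removes the image mean
```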
Next, as shown in Figure 5D, the odd filter responses are calculated from the band-pass filtered images. This is achieved by applying the anti-symmetric monogenic filters H1 and H2 described in "Low-Level Image Processing with the Structure Multivector", M. Felsberg, PhD thesis, Christian-Albrechts-Universitat Kiel, 2002, http://www.isy.liu.se/~mfe/Diss.ps.gz. These are constructed in the steps H1 = x/(2π(x² + y²)^(3/2)) and H2 = y/(2π(x² + y²)^(3/2)). Next, for each of the N = 5 scales, each of the band-pass filtered images is convolved with each of H1 and H2 to form the odd filtered images, as in the step Odd_1_x_scale = convolve(Bandpass_1_scale, H1) and the three subsequent steps. Finally, the local energy and local phase may be calculated, as in Figure 5E.
If desired, local orientation could also be calculated at this point, according to the method described in "Low-Level Image Processing with the Structure Multivector", M. Felsberg, PhD thesis, Christian-Albrechts-Universitat Kiel, 2002. For each of the N = 5 scales, the odd and even components are calculated. In the present example, the even component, at each scale and for each image, is simply the corresponding band-pass filtered image, as in the step Even_1 = Bandpass_1_scale and the subsequent step. The odd component is calculated, at each scale and for each image, by taking the square root of the sum of the two corresponding odd filtered images squared, as in the step Odd_1 = (Odd_1_x_scale² + Odd_1_y_scale²)^(1/2) and the subsequent step. The local phase at each scale and for each image is then calculated according to the step φ_1_scale = arctan(Even_1/Odd_1) and the subsequent step, where the arctan() function calculates the arctangent. The local energy is calculated by taking the square root of the sum of the odd and even components squared, as in the step A_1_scale = (Even_1² + Odd_1²)^(1/2) and the subsequent step. This completes the estimation of local phase and local energy at every point in each image, at each scale.
The next step in the registration procedure is to estimate the joint PDF, as shown in figure 5F. In this simple example, the descriptor will consist of only the local phase.
The first step in estimating the joint PDF is to define an image noise model; typically this might be Additive White Gaussian distributed Noise (AWGN). This only needs to be defined once, at the beginning of the process, and may be obtained, for example, from a prior knowledge of the noise characteristics of the imaging apparatus. Once the noise model is defined, the PDF is initialised. In the present example, the PDF is represented by a two-dimensional histogram of ten thousand bins (one hundred bins square), initialised to zero. PDF initialisation is performed in the steps Bin_no = 100 and Joint_hist_scale = zeros([Bin_no, Bin_no]), where the function zeros(u, v) creates a matrix of size u by v, populated with zeros.
The next step is to construct the PDF for the observed phase values, defined by Equation 2, using the observed value of the local energy as A_k and the standard deviation of the AWGN model as σ. This is done once for each of the five scales, within the loop For scale = 1:5. To construct the PDF at the present scale, each pixel is visited in turn, within the loop For x = 1:x_size, y = 1:y_size, where x_size and y_size are the sizes of the images in the x and y directions respectively. For each pixel, the PDF indices P1 and P2, corresponding to the observed phases at the present pixel in each image, and their associated local energies A1 and A2 are retrieved, as in the step P1 = φ_1_scale(x, y) and the three subsequent steps. These values are then used to increment each bin of the PDF in turn, in the loop For i = 1:Bin_no, j = 1:Bin_no. Within this loop, each bin of the PDF is incremented with a number proportional to the joint probability of the true phase lying within that bin, according to the noise model in Equation 2. This is achieved by the line Joint_hist_scale(i, j) += exp(-(A1²(P1 - i)² + A2²(P2 - j)²)/(2σ²)), where i and j are the PDF indices and σ is the standard deviation of the image noise. Once each bin of the PDF has been incremented once for each pixel pair, the PDF is normalised so that it sums to one, as in the step Joint_hist_scale = Joint_hist_scale / sum(Joint_hist_scale).
The final step is to calculate the first and second marginal distributions by summing the PDF over all values of the second and first phase respectively, which is accomplished by the final two lines in the outer loop of Figure 5F.
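The PDF estimation described above can be sketched as follows (illustrative only; the phase-to-index mapping, parameter values and function name are assumptions of this description, and the energy-dependent window is applied as a separable product of two 1-D Gaussian windows, which is equivalent to the double loop over bins):

```python
import numpy as np

def joint_phase_pdf(phi1, phi2, A1, A2, sigma, bin_no=100):
    """Joint PDF of two phase images in which every pixel pair votes
    into the 2-D histogram through an energy-dependent Gaussian
    window, then the histogram is normalised to sum to one."""
    hist = np.zeros((bin_no, bin_no))
    bins = np.arange(bin_no, dtype=float)
    # Map phases from [-pi, pi] onto fractional bin indices
    P1 = (bin_no - 1) * (phi1.ravel() + np.pi) / (2.0 * np.pi)
    P2 = (bin_no - 1) * (phi2.ravel() + np.pi) / (2.0 * np.pi)
    for p1, p2, e1, e2 in zip(P1, P2, A1.ravel(), A2.ravel()):
        w1 = np.exp(-e1 ** 2 * (bins - p1) ** 2 / (2.0 * sigma ** 2))
        w2 = np.exp(-e2 ** 2 * (bins - p2) ** 2 / (2.0 * sigma ** 2))
        hist += np.outer(w1, w2)     # separable 2-D window contribution
    hist /= hist.sum()               # normalise the joint PDF
    marg1 = hist.sum(axis=1)         # first marginal distribution
    marg2 = hist.sum(axis=0)         # second marginal distribution
    return hist, marg1, marg2

rng = np.random.default_rng(3)
phi1 = rng.uniform(-np.pi, np.pi, (8, 8))
phi2 = rng.uniform(-np.pi, np.pi, (8, 8))
A1 = rng.uniform(0.5, 2.0, (8, 8))
A2 = rng.uniform(0.5, 2.0, (8, 8))
hist, marg1, marg2 = joint_phase_pdf(phi1, phi2, A1, A2, sigma=5.0, bin_no=50)
```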
From the PDF it is possible to refine the registration using a similarity measure, such as mutual information, and one of the many prior art methods for optimising it. This example employs a simplified variant of the method described in "Information Theoretic Similarity Measures in Non-rigid Registration", William R. Crum, Derek L. G. Hill, David J. Hawkes, in Information Processing in Medical Imaging, 2003, pp 378-387. This algorithm is illustrated in Figure 5G. Each scale is treated separately.
The first step for each scale is to look up the joint and marginal probabilities of the observed phases at each pixel to create three probability maps. This is performed by the code in Figure 5H. For each pixel, the two observed phases are translated into PDF indices P1 and P2 by the step P1 = 1 + round(198*φ_1_scale(x, y)/π) and the subsequent step. In these two steps, the function round() rounds its argument to the nearest integer value and the factors 198 and π are chosen to guarantee that the resulting index value lies somewhere in the interval 1 to Bin_no. These indices are then used to look up the joint and marginal probabilities at the present point, as in the last three steps in the loop of Figure 5H.
When this procedure has been done for every pixel, at the present scale, increments to the x and y components of the registration, Flow_x_Increment and Flow_y_Increment, are calculated according to the method in "Information Theoretic Similarity Measures in Non-rigid Registration", William R. Crum, Derek L. G. Hill, David J. Hawkes, in Information Processing in Medical Imaging, 2003, pp 378-387. This is performed by the code in Figure 5I. Once this whole procedure has been completed for every pixel at every scale, the resulting registration update must be regularised. This could be achieved by any of the extensive set of methods known in the literature. The optimal choice will depend on the application; convolution with a Gaussian window function is an example of a simple and effective regularisation method, which works well in the present implementation. The incremental improvement is then added to the previous estimate of the registration, and the source image is deformed accordingly, using one of the prior art interpolation schemes. The widely used bilinear interpolation scheme has been found to work well in the present implementation.
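The regularisation and deformation steps can be sketched as follows (illustrative only; bilinear interpolation as named in the text, with a simple separable Gaussian smoothing of the flow field standing in for the regularisation, and all names here invented for this description):

```python
import numpy as np

def bilinear_warp(image, flow_x, flow_y):
    """Deform an image with a dense displacement field, sampling the
    source at non-integer positions by bilinear interpolation."""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Sample positions, clamped so the 2x2 neighbourhood stays in bounds
    xs = np.clip(xx + flow_x, 0.0, w - 1.001)
    ys = np.clip(yy + flow_y, 0.0, h - 1.001)
    x0 = xs.astype(int)
    y0 = ys.astype(int)
    fx = xs - x0
    fy = ys - y0
    return (image[y0, x0] * (1 - fx) * (1 - fy)
            + image[y0, x0 + 1] * fx * (1 - fy)
            + image[y0 + 1, x0] * (1 - fx) * fy
            + image[y0 + 1, x0 + 1] * fx * fy)

def regularise(flow, width=1.0):
    """Simple regularisation: separable Gaussian smoothing of the flow."""
    n = int(3 * width) * 2 + 1
    t = np.arange(n) - n // 2
    g = np.exp(-t ** 2 / (2.0 * width ** 2))
    g /= g.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 0, flow)
    return np.apply_along_axis(lambda r: np.convolve(r, g, mode='same'), 1, out)

rng = np.random.default_rng(2)
img = rng.random((8, 8))
# For an integer flow the warp reduces to a pure shift away from the borders
shift_right = bilinear_warp(img, np.full((8, 8), 1.0), np.zeros((8, 8)))
```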
One iteration of the registration algorithm, as shown in Figure 5A, is now complete. The mean distance moved by the incremental improvement is measured and, if this is above a certain (user-defined) limit, the whole process is repeated, starting with the re-computation of local energies and phases. When an iteration occurs in which the incremental change is smaller than the user-defined limit (step 55), the procedure terminates and alignment is achieved (step 56).