HK1118411B - Method for sub-pixel value interpolation - Google Patents
Description
The application is a divisional application. The national application number of the parent application is 02815085.6, the filing date of the parent application is 11 September 2002, and the international application number of the parent application is PCT/FI02/0079, the parent application being entitled "Method for sub-pixel value interpolation".
Technical Field
The present invention relates to a method for sub-pixel value interpolation in the encoding and decoding of data. More particularly, but not exclusively, it relates to the encoding and decoding of digital video signals.
Background
Like ordinary motion pictures recorded on film, digital video sequences comprise a sequence of still images, and an illusion of motion is created by displaying the images one after the other at a relatively fast frame rate, typically 15 to 30 frames per second. Because of the fast frame rate, images in successive frames tend to be rather similar and therefore contain a considerable amount of redundant information. For example, a typical scene may include certain static components, such as the background, and certain moving areas, which may take many different forms, such as a newscaster's face, a moving vehicle, and so on. In addition, the camera recording the scene may itself be moving, in which case all image components have the same type of motion. In many cases this means that the overall change between one video frame and the next is rather small. Of course, this depends on the nature of the movement: the faster the movement, the greater the change from one frame to the next. Similarly, if a scene contains several moving components, the change from one frame to the next will be greater than in a scene where only one component is moving.
It will be appreciated that each frame of an unprocessed, i.e. uncompressed, digital video sequence comprises a very large amount of image information. Each frame of an uncompressed digital video sequence is formed from an array of image pixels. For example, in a commonly used digital video format known as the Quarter Common Intermediate Format (QCIF), a frame comprises an array of 176 × 144 pixels, in which case each frame has 25344 pixels. Each pixel is in turn represented by a certain number of bits, which carry information about the luminance and/or color content of the region of the image corresponding to the pixel. Typically, the luminance and color content of the image is represented using the so-called YUV color model. The luminance, or Y, component represents the intensity (brightness) of the image, while the color content of the image is represented by two chrominance components, labelled U and V.
A color model based on a luminance/chrominance representation of image content provides certain advantages compared with a color model based on primary colors, i.e. red, green and blue (RGB). The human visual system is more sensitive to intensity variations than to color variations; the YUV color model exploits this property by using a lower spatial resolution for the chrominance components (U, V) than for the luminance component (Y). In this way, the amount of information needed to code the color information in an image can be reduced with an acceptable reduction in image quality.
The lower spatial resolution of the chrominance components is usually achieved by sub-sampling. Typically, a block of 16 × 16 image pixels is represented by one block of 16 × 16 values comprising luminance information, and the corresponding chrominance components are each represented by one block of 8 × 8 values, representing an area of the image equal to that of the 16 × 16 block of the luminance component. The chrominance components are thus spatially sub-sampled by a factor of 2 in the x and y directions. The resulting assembly of one 16 × 16 luminance block and two 8 × 8 chrominance blocks is commonly referred to as a YUV macroblock, or macroblock for short.
A QCIF picture comprises 11 × 9 macroblocks. If the luminance blocks and chrominance blocks are represented with 8-bit resolution (i.e. by numbers in the range 0 to 255), the total number of bits required per macroblock is (16 × 16 × 8) + 2 × (8 × 8 × 8) = 3072 bits. The number of bits required to represent one video frame in the QCIF format is therefore 99 × 3072 = 304,128 bits.
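The arithmetic above can be checked directly (a quick sketch; the variable names are chosen for the example):

```python
# Sketch: reproduce the bit-count arithmetic for a QCIF frame (8-bit samples).
BITS_PER_SAMPLE = 8

# One macroblock: a 16 x 16 luminance block plus two 8 x 8 chrominance blocks.
macroblock_bits = 16 * 16 * BITS_PER_SAMPLE + 2 * (8 * 8 * BITS_PER_SAMPLE)

# A QCIF frame (176 x 144 pixels) contains 11 x 9 = 99 macroblocks.
frame_bits = 11 * 9 * macroblock_bits

# At 30 frames/second the raw bit rate exceeds 9 Mbit/s.
bitrate_bps = frame_bits * 30

print(macroblock_bits)  # 3072
print(frame_bits)       # 304128
print(bitrate_bps)      # 9123840
```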
This means that the amount of data required to transmit/record/display one video sequence of QCIF format at a rate of 30 frames/second will be greater than 9Mbps (megabits/second). This is a very high data rate and is not practical for video recording, transmission and display applications, since it would require a large amount of storage capacity, transmission channel capacity and hardware performance.
If the video data is transmitted in real time via a fixed subscriber line connection network, such as ISDN (integrated services digital network) or typically PSTN (public switched telephone network), the available data transmission bandwidth is typically about 64 kbit/s. In mobile videotelephony, where the transmission is at least partly over a radio communication link, the available frequency band can be as low as 20 kbit/s. This means that representing video data with a significantly reduced amount of information must be achieved in order to enable the transmission of digital video sequences over low bandwidth communication networks. For this reason, video compression techniques have been developed that reduce the amount of information transmitted while maintaining an acceptable image quality.
Video compression methods are based on reducing redundancy and perceptually irrelevant parts of the video sequence. Redundancy in video sequences can be classified as spatial, temporal and spectral redundancy. Spatial redundancy is used to describe the correlation between adjacent pixels in a frame. Temporal redundancy represents the fact that an object appearing in one frame of a sequence may appear in a subsequent frame, while spectral redundancy represents the correlation between different color components of the same image.
Sufficiently efficient compression usually cannot be achieved simply by reducing the various forms of redundancy in a given image sequence. Therefore, most current video encoders also degrade those parts of the video sequence that are subjectively least important. In addition, the redundancy of the compressed video bit stream itself is reduced by means of efficient lossless coding. Typically, this is achieved using a technique known as Variable Length Coding (VLC).
Modern video compression standards, such as ITU-T recommendations H.261, H.263(+), H.26L, and the Motion Picture Experts Group standard MPEG-4, make use of motion compensated temporal prediction. This is a form of temporal redundancy reduction in which the content of some (often many) frames in a video sequence is predicted from other frames in the sequence by tracing the motion of objects or regions of an image between frames.
Compressed pictures that do not exploit temporal redundancy reduction are generally referred to as intra-coded, i.e. I-frames, while temporally predicted pictures are referred to as inter-coded, i.e. P-frames. In the inter case the predicted (motion compensated) image is rarely accurate enough to represent the image content with sufficient quality, so a spatially compressed Prediction Error (PE) frame is also associated with each inter frame. Many video compression schemes may also make use of bi-directionally predicted frames, commonly referred to as B-pictures or B-frames. B-pictures are inserted between pairs of reference pictures, so-called 'anchor' pictures (I- or P-frames), and are predicted from one or both of the anchor pictures. B-pictures are not themselves used as anchor pictures, i.e. no other frames are predicted from them, so they can be discarded from the video sequence without degrading the quality of future pictures.
The different types of frames that occur in a typical compressed video sequence are shown in fig. 3. As can be seen from the figure, the sequence starts with an intra frame, i.e. I-frame 30. Arrows 33 of fig. 3 represent the 'forward' prediction process by which P-frames (numeral 34) are formed. The bi-directional prediction process by which B-frames (36) are formed is represented by arrows 31a and 31b.
Fig. 1 and 2 show schematic diagrams of an example video coding system using motion compensated prediction. Fig. 1 shows an encoder 10 employing motion compensation, while fig. 2 shows a corresponding decoder 20. The encoder 10 shown in fig. 1 comprises a motion field estimation block 11, a motion field coding block 12, a motion compensated prediction block 13, a prediction error coding block 14, a prediction error decoding block 15, a multiplexer block 16, a frame memory 17 and an adder 19. The decoder 20 comprises a motion compensated prediction block 21, a prediction error decoding block 22, a demultiplexer block 23 and a frame memory 24.
The operating principle of video coders using motion compensation is to minimize the amount of information in the prediction error frame E_n(x, y), which is the difference between the current frame I_n(x, y) being coded and the predicted frame P_n(x, y). The prediction error frame is thus:

E_n(x, y) = I_n(x, y) − P_n(x, y).    (1)
The predicted frame P_n(x, y) is constructed using the pixel values of a reference frame R_n(x, y). The reference frame is typically one of the previously coded and transmitted frames, e.g. the frame immediately preceding the current frame, and is available from the frame memory 17 of the encoder 10. More specifically, the predicted frame P_n(x, y) is formed by finding so-called 'prediction pixels' in the reference frame R_n(x, y) which correspond substantially to pixels in the current frame. Motion information describing the relationship (e.g. relative position, rotation, scaling, etc.) between a pixel in the current frame and its corresponding prediction pixel in the reference frame is derived, and the predicted frame is constructed by moving the prediction pixels according to the motion information. In this way, the predicted frame is built from pixel values in the reference frame as an approximate representation of the current frame. The prediction error frame mentioned above therefore represents the difference between this approximate representation of the current frame and the current frame itself. The basic advantage provided by video encoders that use motion compensated prediction derives from the fact that a rather compact description of the current frame is obtained: the motion information required to form its prediction, together with the associated prediction error information in the prediction error frame.
However, since the number of pixels in a frame is large, it is generally not efficient to transmit separate motion information for each pixel to the decoder. Instead, in most video coding schemes the current frame is divided into larger image segments S_k, and the motion information relating to these segments is sent to the decoder. For example, motion information is typically provided for each macroblock of a frame, and the same motion information is then used for all pixels within the macroblock. In some video coding standards, such as H.26L, a macroblock can be divided into smaller blocks, each of which has its own motion information.
Motion information typically takes the form of motion vectors [Δx(x, y), Δy(x, y)]. The pair of numbers Δx(x, y) and Δy(x, y) represents the horizontal and vertical displacements of a pixel at position (x, y) in the current frame I_n(x, y) with respect to a pixel in the reference frame R_n(x, y). The motion vectors [Δx(x, y), Δy(x, y)] are calculated in the motion field estimation block 11, and the set of motion vectors of the current frame, [Δx(·), Δy(·)], is referred to as the motion vector field.
Typically, the position of a macroblock in the current image frame is specified by the (x, y) coordinates of its upper left corner. Thus, in a video coding scheme in which motion information is associated with each macroblock of a frame, each motion vector describes the horizontal and vertical displacements, Δx(x, y) and Δy(x, y), of a pixel representing the upper left corner of a macroblock in the current frame I_n(x, y) with respect to a pixel in the reference frame R_n(x, y) that represents the upper left corner of the corresponding block of prediction pixels (as shown in fig. 4b).
Motion estimation is a computationally intensive task. Given a reference frame R_n(x, y) and, for example, a rectangular macroblock comprising N × N pixels in the current frame (as shown in fig. 4a), the goal of motion estimation is to find an N × N block of pixels in the reference frame that matches the characteristics of the macroblock in the current image according to some criterion. This criterion may be, for example, the Sum of Absolute Differences (SAD) between the pixels of the macroblock in the current frame and the block of pixels in the reference frame with which it is compared. This process is commonly referred to as 'block matching'. It should be noted that, in general, the geometry of the block to be matched and that of the matching block in the reference frame need not be the same, since real-world objects may undergo changes of scale, as well as rotation and warping. However, in current international video coding standards only a translational motion model is used (see below), and therefore a fixed rectangular geometry is sufficient.
Ideally, to achieve the best chance of finding a match, the entire reference frame should be searched. However, this is impractical as it would cause too high a computational burden for the video encoder. Instead, the search area is limited to an area [ -p, p ] surrounding the original position of the macroblock in the current frame, as shown in fig. 4 c.
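As an illustration of full-pixel block matching within a restricted [-p, p] search area, here is a minimal sketch. The function names and the NumPy array representation of frames are choices made for this example, not part of any standard:

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of Absolute Differences between two equal-sized pixel blocks."""
    return int(np.abs(block_a.astype(int) - block_b.astype(int)).sum())

def block_match(current, reference, top, left, n=16, p=7):
    """Full-search block matching with full pixel resolution.

    Finds the displacement (dy, dx) within [-p, p] that minimises the SAD
    between the n x n macroblock at (top, left) in the current frame and a
    candidate block in the reference frame.
    """
    target = current[top:top + n, left:left + n]
    best, best_sad = (0, 0), None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            y, x = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference frame.
            if y < 0 or x < 0 or y + n > reference.shape[0] or x + n > reference.shape[1]:
                continue
            cand_sad = sad(target, reference[y:y + n, x:x + n])
            if best_sad is None or cand_sad < best_sad:
                best_sad, best = cand_sad, (dy, dx)
    return best, best_sad
```

Applied to a current frame that is an integer-shifted copy of the reference, the search recovers the shift exactly, with a SAD of zero.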
In order to reduce the amount of motion information to be transmitted from the encoder 10 to the decoder 20, the motion vector field is coded in the motion field coding block 12 of the encoder 10 by representing it with a motion model. In this process the motion vectors of the image segments are re-expressed using certain predetermined functions; in other words, the motion vector field is represented by a model. Almost all currently used motion vector field models are additive motion models, complying with the following general formulas:

Δx(x, y) = Σ_i a_i f_i(x, y)    (2)

Δy(x, y) = Σ_i b_i g_i(x, y)    (3)
where the coefficients a_i and b_i are called motion coefficients. The motion coefficients are transmitted to the decoder 20 (information stream 2 in figs. 1 and 2). The functions f_i and g_i are called motion field basis functions, and they are known to both the encoder and the decoder. An approximate motion vector field can be constructed using the coefficients and the basis functions. Since the basis functions are known to (i.e. stored in) both the encoder 10 and the decoder 20, only the motion coefficients need to be transmitted, thus reducing the amount of information required to represent the motion information of the frame.
The simplest motion model is the translational motion model, which requires only two coefficients to describe the motion vectors of each segment. The values of the motion vectors are given by:

Δx(x, y) = a_0

Δy(x, y) = b_0    (4)
This model is widely used in various international standards (ISO MPEG-1, MPEG-2, MPEG-4, ITU-T recommendations H.261 and H.263) to describe the motion of 16 × 16 and 8 × 8 pixel blocks. Systems using a translational motion model typically perform motion estimation at full pixel resolution or at some integer fraction of full pixel resolution, e.g. at half or quarter pixel resolution.
The predicted frame P_n(x, y) is constructed in the motion compensated prediction block 13 of the encoder 10, and is given by:

P_n(x, y) = R_n[x + Δx(x, y), y + Δy(x, y)].    (5)
In the prediction error coding block 14, the prediction error frame E_n(x, y) is typically compressed by representing it as a finite series (transform) of some 2-dimensional functions. For example, a 2-dimensional Discrete Cosine Transform (DCT) can be used. The transform coefficients are quantized and entropy (e.g. Huffman) coded before being sent to the decoder (information stream 1 in figs. 1 and 2). Because of the quantization error introduced, this operation typically produces some degradation (loss of information) in the prediction error frame E_n(x, y). To compensate for this degradation, the encoder 10 further comprises a prediction error decoding block 15, in which a decoded prediction error frame Ẽ_n(x, y) is constructed using the transform coefficients. This locally decoded prediction error frame is added to the predicted frame P_n(x, y) in adder 19, and the resulting decoded current frame Ĩ_n(x, y) is stored in the frame memory 17 for further use as the next reference frame R_{n+1}(x, y).
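To make the transform coding step concrete, the following sketch applies an orthonormal 2-D DCT to an 8 × 8 prediction error block, quantizes the coefficients with a simple uniform quantizer, and reconstructs the block as the prediction error decoding block would. The function names and the uniform quantizer are illustrative assumptions; the standards specify their own quantizers and entropy coding:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix C, so that C @ block @ C.T is the 2-D DCT."""
    c = np.array([[np.cos(np.pi * (2 * j + 1) * i / (2 * n)) for j in range(n)]
                  for i in range(n)])
    c[0] *= np.sqrt(1.0 / n)
    c[1:] *= np.sqrt(2.0 / n)
    return c

def code_prediction_error(block, q_step=16.0):
    """Transform-code a square prediction error block: forward DCT, uniform
    quantization (the lossy step), then reconstruction as the decoder (and
    the encoder's local prediction error decoding block) would see it."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T
    levels = np.round(coeffs / q_step)      # quantized levels to be entropy coded
    recon = c.T @ (levels * q_step) @ c     # inverse DCT of dequantized coefficients
    return levels, recon
```

With a coarse quantizer the reconstruction differs from the original block, which is exactly the degradation the local prediction error decoding loop must track.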
The information data stream 2 carrying information about the motion vectors is combined with information about the prediction error in a multiplexer 16 and an information data stream 3, which usually contains at least these two types of information, is fed to a decoder 20.
The operation of a corresponding video decoder 20 will now be described.
The frame memory 24 of the decoder 20 stores a previously reconstructed reference frame R_n(x, y). The predicted frame P_n(x, y) is formed in the motion compensated prediction block 21 of the decoder 20 according to equation 5, using the received motion coefficient information and the pixel values of the previously reconstructed reference frame R_n(x, y). The transmitted transform coefficients of the prediction error frame E_n(x, y) are used in the prediction error decoding block 22 to form the decoded prediction error frame Ẽ_n(x, y). The pixels of the decoded current frame Ĩ_n(x, y) are then reconstructed by adding the predicted frame P_n(x, y) and the decoded prediction error frame Ẽ_n(x, y):

Ĩ_n(x, y) = P_n(x, y) + Ẽ_n(x, y).    (6)
the decoded current frame may be stored in the frame memory 24 as the next reference frame Rn+1(x,y)。
In the description of the encoding and decoding of motion compensated digital video provided above, the motion vector [Δx(x, y), Δy(x, y)] describing the motion of a macroblock in the current frame with respect to the reference frame R_n(x, y) can point to any of the pixels in the reference frame. This means that motion between frames of a digital video sequence can only be represented at a resolution determined by the image pixels in the frame (so-called full pixel resolution). Real motion, however, has arbitrary precision, and thus the system described above can only provide an approximate model of the motion between successive frames of a digital video sequence. Typically, modelling motion between video frames with full pixel resolution is not sufficiently accurate to allow efficient minimization of the Prediction Error (PE) information associated with each macroblock/frame. Therefore, to enable a more accurate model of real motion and to help reduce the amount of PE information that must be transmitted from the encoder to the decoder, many video coding standards, such as H.263(+)(++) and H.26L, allow motion vectors to point 'in between' image pixels. In other words, motion vectors may have 'sub-pixel' resolution. Allowing motion vectors to have sub-pixel resolution adds to the complexity of the encoding and decoding operations that must be performed, so it is still advantageous to limit the degree of spatial resolution a motion vector may have. Thus, video coding standards, such as those previously mentioned, typically only allow motion vectors to have full, half, or quarter pixel resolution.
As shown in fig. 5, in a video coding scheme that produces motion vectors having full or half pixel resolution, motion estimation with sub-pixel resolution is typically performed as a two-stage process. In the first stage, a motion vector having full pixel resolution is determined using any suitable motion estimation scheme, such as the block matching process described above. The resulting motion vector, having full pixel resolution, is shown in fig. 5.
In the second stage, the motion vector determined in the first stage is refined to obtain the desired half-pixel resolution. In the example shown in fig. 5, this is done by forming eight new 16 × 16 pixel search blocks, the upper left corner of each being marked with an X in fig. 5. These positions are denoted [Δx + m/2, Δy + n/2], where m and n can take the values −1, 0 and +1, but cannot both be zero. Since only the pixel values of the original image are known, the values (e.g. luminance and/or chrominance values) of the sub-pixels residing at half-pixel locations must be estimated for each of the eight new search blocks, using some form of interpolation scheme.
Having interpolated the sub-pixel values at half-pixel resolution, each of the eight search blocks is compared with the macroblock whose motion vector is being sought. As in the block matching process performed to determine the motion vector with full pixel resolution, the macroblock is compared with each of the eight search blocks according to some criterion, such as SAD. As a result of the comparisons, a minimum SAD value will generally be obtained. Depending on the nature of the motion in the video sequence, this minimum may correspond to the location specified by the original motion vector (with full pixel resolution), or it may correspond to a location with half-pixel resolution. Thus, it is possible to determine whether a motion vector should point to a full-pixel or a sub-pixel location, and, if sub-pixel accuracy is appropriate, to determine the correct sub-pixel resolution motion vector. It should also be appreciated that the scheme just described can be extended to other sub-pixel resolutions (e.g. quarter-pixel resolution) in an entirely analogous manner.
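The second-stage refinement can be sketched as follows. This is a simplified illustration, not the procedure of any standard: it uses plain bilinear averaging as a stand-in for a real interpolation filter, and the function names are invented for the example:

```python
import numpy as np

def bilinear_half_pel(frame, y2, x2):
    """Value at half-pixel coordinates (y2/2, x2/2), interpolated by simple
    bilinear averaging (a stand-in for a standard's interpolation filter)."""
    y0, x0 = y2 // 2, x2 // 2
    ys = (y0, y0 + y2 % 2)
    xs = (x0, x0 + x2 % 2)
    return sum(float(frame[y, x]) for y in ys for x in xs) / 4.0

def refine_half_pel(current, reference, top, left, mv, n=16):
    """Stage 2: test the eight half-pixel positions around a full-pixel motion
    vector mv = (dy, dx), plus the original position itself."""
    target = current[top:top + n, left:left + n].astype(float)
    base = (2 * mv[0], 2 * mv[1])          # motion vector in half-pixel units
    best, best_sad = base, None
    for m in (-1, 0, 1):
        for k in (-1, 0, 1):
            cy, cx = base[0] + m, base[1] + k
            pred = np.array([[bilinear_half_pel(reference,
                                                2 * (top + i) + cy,
                                                2 * (left + j) + cx)
                              for j in range(n)] for i in range(n)])
            cand_sad = float(np.abs(target - pred).sum())
            if best_sad is None or cand_sad < best_sad:
                best_sad, best = cand_sad, (cy, cx)
    return best, best_sad                   # best is in half-pixel units
```

When the true motion is an integer displacement, the refinement keeps the full-pixel position; when it is not, one of the eight half-pixel candidates wins.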
In practice, estimation of the sub-pixel values in the reference frame is performed by interpolating the values of the sub-pixels from the surrounding pixel values. In general, interpolation of a sub-pixel value F(x, y) situated at a non-integer position (x, y) = (n + Δx, m + Δy) can be formulated as a two-dimensional operation, represented mathematically as:

F(x, y) = Σ_k Σ_l f(k, l) · R(n + k, m + l)

where f(k, l) are filter coefficients, the sums run over the support of the filter, and n and m are obtained by taking the integer parts of x and y respectively. Typically, the filter coefficients depend on the values of Δx and Δy, and the interpolation filter is usually a so-called 'separable filter', in which case the sub-pixel value F(x, y) can be calculated by filtering horizontally and then vertically:

F(x, y) = Σ_l f_v(l) · [ Σ_k f_h(k) · R(n + k, m + l) ]
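A sketch of the separable form follows. The helper names are hypothetical; the 6-tap coefficients shown are those of the test models described below, normalised so that the taps sum to 1 (rounding and clipping are omitted here to keep the separability visible):

```python
import numpy as np

# 6-tap half-pixel filter taps (1, -5, 20, 20, -5, 1)/32, spanning offsets -2..+3.
HALF_TAPS = np.array([1, -5, 20, 20, -5, 1]) / 32.0
# 'Identity' taps select the integer-position sample itself (offset 0).
UNIT_TAPS = np.array([0, 0, 1, 0, 0, 0], dtype=float)

def interp_separable(frame, y, x, fy, fx):
    """Sub-pixel value near integer position (y, x): filter horizontally along
    each of the 6 surrounding rows, then filter the resulting intermediate
    column vertically. fy and fx are the vertical and horizontal 1-D taps."""
    offsets = range(-2, 4)
    rows = [sum(f * float(frame[y + dy, x + dx]) for f, dx in zip(fx, offsets))
            for dy in offsets]
    return sum(f * r for f, r in zip(fy, rows))
```

On a horizontal ramp (pixel value equal to its x coordinate) the 6-tap filter reproduces the exact half-pixel value x + 0.5, since its taps sum to 1 and have first moment 1/2.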
in which a motion vector is calculated. Once the corresponding motion coefficients are sent to the decoder, it is a clear matter to interpolate the required sub-pixels using interpolation methods as used in the encoder. In this way, a frame following a reference frame in the frame memory 24 can be reconstructed from the reference frame and the motion vector.
The simplest way to apply sub-pixel value interpolation in a video encoder is to interpolate each sub-pixel value every time it is needed. However, this is not an efficient solution in a video encoder, since it is likely that the same sub-pixel value will be needed several times, and the interpolation calculation for that sub-pixel value would therefore be performed multiple times. This would result in an unnecessary increase in computational complexity/burden in the encoder.
One option for limiting encoder complexity is to pre-calculate all sub-pixel values and store them in a memory associated with the encoder. This solution is referred to hereinafter as 'prior' interpolation. Although it limits complexity, prior interpolation has the disadvantage of a large increase in memory usage. For example, if the motion vector accuracy is a quarter pixel both horizontally and vertically, storing the pre-computed sub-pixel values for a full image results in a memory usage 16 times that required for the original non-interpolated image. Furthermore, it involves the calculation of some sub-pixels that may never actually be needed in the motion vector calculations in the encoder. Prior interpolation is especially inefficient in a video decoder, since most of the pre-computed sub-pixel values will never be needed by the decoder. Therefore, it is beneficial not to use pre-calculation in the decoder.
So-called 'on-demand' interpolation can be used to reduce memory requirements in the encoder. For example, if the desired motion vector precision is quarter-pixel resolution, only the half unit resolution sub-pixel values are interpolated in advance for the entire frame and stored in memory. The quarter-pixel resolution sub-pixel values are calculated only when needed, as the motion estimation/compensation process progresses. In this case, the memory usage is only 4 times that required for the original non-interpolated image.
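The memory factors quoted above can be checked with a small calculation (the helper name is illustrative; QCIF dimensions are assumed):

```python
# Sketch: relative memory usage of the interpolation strategies discussed above.
def stored_values(width, height, positions_per_pixel):
    """Stored sample count when each full pixel expands into a grid of
    pre-computed positions (e.g. 4 x 4 = 16 for quarter-pixel accuracy)."""
    return width * height * positions_per_pixel

original = stored_values(176, 144, 1)        # no pre-computation (QCIF frame)
prior = stored_values(176, 144, 4 * 4)       # all 1/4-pixel positions pre-computed
on_demand = stored_values(176, 144, 2 * 2)   # only 1/2-pixel positions pre-computed

print(prior // original)      # 16: full 'prior' interpolation at quarter-pixel
print(on_demand // original)  # 4: half-pixel values stored, quarter-pixel on demand
```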
It should be noted that when prior interpolation is used, the interpolation process constitutes only a small part of the overall computational complexity/burden of the encoder, since each sub-pixel is interpolated only once. Therefore, when prior sub-pixel value interpolation is used, the complexity of the interpolation process itself is not a critical issue in the encoder. On the other hand, on-demand interpolation places a much higher computational burden on the encoder, since sub-pixels may be interpolated many times. The complexity of the interpolation process, which can be considered a function of the number of computational operations or machine cycles that must be performed to interpolate a sub-pixel value, then becomes an important consideration.
In the decoder, the same sub-pixel value is used at most a few times, and some sub-pixel values are not needed at all. Therefore, it is beneficial in the decoder not to use prior interpolation at all, i.e. not to calculate any sub-pixel values in advance.
Two interpolation schemes have been developed as part of the ongoing work of Questions 6 and 15 of the Video Coding Experts Group (VCEG) of Study Group 16 of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). These schemes have been proposed for incorporation into ITU-T recommendation H.26L and have been implemented in Test Models (TML) for the purposes of evaluation and further development. The test model corresponding to Question 15 is referred to as Test Model 5 (TML5), and the test model resulting from Question 6 is referred to as Test Model 6 (TML6). The interpolation schemes proposed in TML5 and TML6 will both be described.
The description of the sub-pixel value interpolation scheme used in test model TML5 will be made with reference to fig. 10a, which defines the notation used to describe the pixel and sub-pixel locations specific to TML5. Another notation, defined in fig. 11a, will be used to discuss the sub-pixel value interpolation scheme used in TML6. A still further notation, shown in fig. 12a, will be used later in the description in connection with the interpolation of sub-pixel values according to the invention. It should be appreciated that using three different notations in this specification aids the understanding of each interpolation scheme and helps distinguish the differences between them. However, in all three figures the letter A is used to represent an original image pixel (at full pixel resolution). More specifically, the letter A represents the position of a pixel in the image data of a frame of the video sequence, the pixel value of pixel A being either received from a video source as part of the current frame I_n(x, y), or part of a reference frame R_n(x, y) reconstructed and stored in the frame memory 17, 24 of the encoder 10 or the decoder 20. All other letters represent sub-pixel positions, the sub-pixel values at those positions being obtained by interpolation.
Certain other terms will be used throughout this description in a consistent manner to identify specific pixel and sub-pixel locations. These terms are as follows:
the term 'unit horizontal position' is used to describe the position of any sub-pixel that is comprised in a column of the original image data. The sub-pixels c and e in fig. 10a and 11a and the sub-pixels b and e in fig. 12a have a unit horizontal position.
The term 'unit vertical position' is used to describe the position of any sub-pixel that is comprised in a row of the original image data. The sub-pixels b and d in fig. 10a and 11a, and the sub-pixels b and d in fig. 12a have a unit vertical position.
By definition, pixel a has a unit horizontal and a unit vertical position.
The term 'horizontal position' is used to describe the position of any sub-pixel that is formed in a column that exists at half pixel definition. Sub-pixels b, c and e shown in fig. 10a and 11a belong to this class, as well as sub-pixels b, c and f in fig. 12 a. In a similar manner, the term 'semi-vertical position' is used to describe the position of any sub-pixel that is made up in a row that exists at half pixel definition, such as sub-pixels c and d in fig. 10a and 11a and sub-pixels b, c and g in fig. 12 a.
Also, the term 'quarter-horizontal position' refers to any one of the sub-pixels constituting a column existing in quarter-pixel definition, for example, the sub-pixels d and e in fig. 10a, the sub-pixels d and g in fig. 11a, and the sub-pixels d, g, and h in fig. 12 a. Similarly, the term 'quarter vertical position' refers to the sub-pixels that make up a row that exists at quarter pixel definition. The sub-pixels e and f shown in fig. 10a belong to this class, as well as the sub-pixels e, f and g in fig. 11a, and the sub-pixels e, f and h in fig. 12 a.
The definition of each of the above terms is shown by the 'envelope' drawn on the corresponding figure.
It should further be noted that it is often convenient to refer to a particular sub-pixel using a two-dimensional description. In this case, a suitable two-dimensional reference can be obtained by examining the intersections of the envelopes in figs. 10a, 11a and 12a. Applying this principle, for example, sub-pixel d in fig. 10a has a quarter unit horizontal and a half unit vertical position, while sub-pixel e has a unit horizontal and a quarter unit vertical position. Furthermore, for ease of reference, sub-pixels at half unit horizontal and unit vertical positions, at unit horizontal and half unit vertical positions, and at half unit horizontal and half unit vertical positions will be referred to as 1/2 resolution sub-pixels. Sub-pixels at any quarter unit horizontal and/or quarter unit vertical position will be referred to as 1/4 resolution sub-pixels.
It should also be noted that in the description of the two test models and in the detailed description of the invention, it is assumed that pixel values have a minimum value of 0 and a maximum value of 2^n − 1, where n is the number of bits reserved for a pixel value. The number of bits is typically 8. After a sub-pixel value has been interpolated, if it lies outside the range [0, 2^n − 1], it is limited to that range: a value lower than the allowed minimum becomes the minimum value (0), and a value greater than the allowed maximum becomes the maximum value (2^n − 1). This operation is called clipping.
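The clipping operation just defined can be sketched as follows (the function name is hypothetical):

```python
def clip(value, n_bits=8):
    """Clipping: limit an interpolated value to the range [0, 2**n_bits - 1]."""
    max_value = (1 << n_bits) - 1
    if value < 0:
        return 0
    if value > max_value:
        return max_value
    return value
```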
The sub-pixel value interpolation scheme according to TML5 will now be described in detail with reference to fig. 10a, 10b and 10 c.
1. The sub-pixel value at the half unit horizontal and unit vertical position, i.e. the 1/2 resolution sub-pixel b in fig. 10a, is calculated using a 6-tap filter. As shown in fig. 10b, the value of 1/2 resolution sub-pixel b is interpolated according to the formula b = (A1 − 5A2 + 20A3 + 20A4 − 5A5 + A6 + 16) / 32, based on the 6 pixels (A1 to A6) at unit horizontal and unit vertical positions in the same row, situated symmetrically about b. The division is performed with truncation. The result is clipped to the range [0, 2^n − 1].
2. The values of the 1/2-resolution sub-pixels labelled c are calculated using the same 6-tap filter as in step 1 and the six closest pixels or sub-pixels (A or b) in the vertical direction. Referring now to FIG. 10c, the filter interpolates a value for the 1/2-resolution sub-pixel c at a unit horizontal and half-unit vertical position from the 6 pixels (A1 to A6) at unit horizontal and unit vertical positions in the same column, situated symmetrically about c, according to the formula c = (A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6 + 16)/32. Similarly, the value of the 1/2-resolution sub-pixel c at a half-unit horizontal and half-unit vertical position is calculated according to c = (b1 - 5b2 + 20b3 + 20b4 - 5b5 + b6 + 16)/32.
Division is again performed with truncation. The calculated values of the c sub-pixels are further clipped to the range [0, 2^n - 1].
At this point in the interpolation process all 1/2-resolution sub-pixel values have been calculated, and the process proceeds with the calculation of the 1/4-resolution sub-pixel values.
3. The values of the 1/4-resolution sub-pixels labelled d are calculated using linear interpolation and the values of the closest pixels and/or 1/2-resolution sub-pixels in the horizontal direction. More specifically, the value of a 1/4-resolution sub-pixel d at a quarter-unit horizontal and unit vertical position is calculated by taking the average of the immediately neighbouring pixel at a unit horizontal and unit vertical position (pixel A) and the immediately neighbouring 1/2-resolution sub-pixel at a half-unit horizontal and unit vertical position (sub-pixel b), i.e. according to d = (A + b)/2. The value of a 1/4-resolution sub-pixel d at a quarter-unit horizontal and half-unit vertical position is calculated by taking the average of the immediately neighbouring 1/2-resolution sub-pixels c at the unit horizontal and half-unit vertical position and the half-unit horizontal and half-unit vertical position respectively, i.e. according to d = (c1 + c2)/2. Division is again performed with truncation.
4. The values of the 1/4-resolution sub-pixels labelled e are calculated using linear interpolation and the values of the closest pixels and/or 1/2-resolution sub-pixels in the vertical direction. Specifically, the value of a 1/4-resolution sub-pixel e at a unit horizontal and quarter-unit vertical position is calculated by taking the average of the immediately neighbouring pixel at the unit horizontal and unit vertical position (pixel A) and the immediately neighbouring sub-pixel at the unit horizontal and half-unit vertical position (sub-pixel c), i.e. according to e = (A + c)/2. The value of a 1/4-resolution sub-pixel e at a half-unit horizontal and quarter-unit vertical position is calculated by taking the average of the immediately neighbouring sub-pixel at the half-unit horizontal and unit vertical position (sub-pixel b) and the immediately neighbouring sub-pixel at the half-unit horizontal and half-unit vertical position (sub-pixel c), i.e. according to e = (b + c)/2. Likewise, the value of a 1/4-resolution sub-pixel e at a quarter-unit horizontal and quarter-unit vertical position is calculated by taking the average of the immediately neighbouring sub-pixels d at the quarter-unit horizontal and unit vertical position and the quarter-unit horizontal and half-unit vertical position respectively, i.e. according to e = (d1 + d2)/2. Division is again performed with truncation.
5. The value of the 1/4-resolution sub-pixel f is interpolated by taking the average of the 4 closest pixel values at unit horizontal and unit vertical positions, i.e. according to f = (A1 + A2 + A3 + A4 + 2)/4, where pixels A1, A2, A3 and A4 are the four closest original pixels.
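In outline, the TML5 steps above can be sketched as follows (a minimal illustration assuming 8-bit pixels; the function names are ours, and division with truncation is modelled with Python's int(), which truncates toward zero):

```python
def clip8(v):
    # limit to the 8-bit pixel range [0, 255]
    return max(0, min(v, 255))

def tml5_half(a):
    # steps 1/2: 6-tap filter over six in-line pixels A1..A6,
    # b = (A1 - 5*A2 + 20*A3 + 20*A4 - 5*A5 + A6 + 16)/32, truncated and clipped
    s = a[0] - 5 * a[1] + 20 * a[2] + 20 * a[3] - 5 * a[4] + a[5] + 16
    return clip8(int(s / 32))

def tml5_quarter(p1, p2):
    # steps 3/4: truncating average of two neighbouring (sub-)pixel values,
    # e.g. d = (A + b)/2 or e = (b + c)/2, both operands already clipped
    return int((p1 + p2) / 2)

def tml5_f(a1, a2, a3, a4):
    # step 5: central 1/4-resolution sub-pixel from the four closest pixels
    return (a1 + a2 + a3 + a4 + 2) // 4
```

Note that each 1/4-resolution value is derived from already truncated and clipped 1/2-resolution values, which is the source of the accuracy loss and storage cost discussed next.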
A disadvantage of TML5 is that the decoder is computationally complex. This is because in the scheme used by TML5 the interpolation of the 1/4-resolution sub-pixel values depends on the interpolation of the 1/2-resolution sub-pixel values. This means that in order to interpolate the value of a 1/4-resolution sub-pixel, the values of the 1/2-resolution sub-pixels on which it depends must first be calculated. Moreover, since certain 1/4-resolution sub-pixel values depend on interpolated values obtained for other sub-pixels, the truncation and clipping of the 1/2-resolution sub-pixel values has a deleterious effect on certain of the 1/4-resolution sub-pixel values. Specifically, because the 1/4-resolution sub-pixel values are calculated from values that have already been truncated and clipped, they are less accurate than they would otherwise be. Another disadvantage of TML5 is that the 1/2-resolution sub-pixel values must be stored in order to interpolate the 1/4-resolution sub-pixel values. Therefore, additional memory is required to store results that are not ultimately needed.
A sub-pixel value interpolation scheme according to TML6, referred to herein as a direct interpolation scheme, will now be described. The operation of the encoder when interpolating according to TML6 is similar to the TML5 interpolation described previously, except that maximum accuracy is maintained throughout. This is achieved by using intermediate values that are neither rounded nor clipped. The interpolation according to TML6 performed in the encoder is described step by step below with reference to FIGS. 11a, 11b and 11c.
1. The sub-pixel values at half-unit horizontal and unit vertical positions, i.e. the 1/2-resolution sub-pixels b in FIG. 11a, are obtained by first calculating an intermediate value b using a 6-tap filter. As shown in FIG. 11b, the filter calculates the intermediate value from the 6 pixels (A1 to A6) in the same row, situated symmetrically about b at unit horizontal and unit vertical positions, according to the formula b = A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6. The final value of b is then calculated as b = (b + 16)/32 and clipped to the range [0, 2^n - 1]. As before, division is performed with truncation.
2. The values of the 1/2-resolution sub-pixels labelled c are obtained by first calculating an intermediate value c. Referring to FIG. 11c, the intermediate value of the 1/2-resolution sub-pixel c at a unit horizontal and half-unit vertical position is calculated from the 6 pixels (A1 to A6) in the same column, situated symmetrically about c at unit horizontal and unit vertical positions, according to the formula c = A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6. The final value of the 1/2-resolution sub-pixel c at the unit horizontal and half-unit vertical position is then calculated as c = (c + 16)/32. Similarly, the intermediate value of the 1/2-resolution sub-pixel c at the half-unit horizontal and half-unit vertical position is calculated from the intermediate values b according to c = b1 - 5b2 + 20b3 + 20b4 - 5b5 + b6. The final value of this 1/2-resolution sub-pixel is then calculated as c = (c + 512)/1024. Division is again performed with truncation, and the calculated values of the 1/2-resolution sub-pixels c are further clipped to the range [0, 2^n - 1].
3. The values of the 1/4-resolution sub-pixels labelled d are calculated as follows. The value of a 1/4-resolution sub-pixel d at a quarter-unit horizontal and unit vertical position is calculated from the value of the immediately neighbouring pixel at a unit horizontal and unit vertical position (pixel A) and the intermediate value b calculated in step (1) for the immediately neighbouring 1/2-resolution sub-pixel b at the half-unit horizontal and unit vertical position, according to d = (32A + b + 32)/64. The value of a 1/4-resolution sub-pixel d at a quarter-unit horizontal and half-unit vertical position is interpolated according to d = (32c1 + c2 + 1024)/2048, using the intermediate values c calculated for the immediately neighbouring 1/2-resolution sub-pixels c at the unit horizontal and half-unit vertical position and the half-unit horizontal and half-unit vertical position respectively. Division is again performed with truncation, and the finally obtained 1/4-resolution sub-pixel values d are clipped to the range [0, 2^n - 1].
4. The values of the 1/4-resolution sub-pixels labelled e are calculated as follows. The value of a 1/4-resolution sub-pixel e at a unit horizontal and quarter-unit vertical position is calculated from the value of the immediately neighbouring pixel at the unit horizontal and unit vertical position (pixel A) and the intermediate value c calculated in step (2) for the immediately neighbouring 1/2-resolution sub-pixel at the unit horizontal and half-unit vertical position, according to e = (32A + c + 32)/64. The value of a 1/4-resolution sub-pixel e at a half-unit horizontal and quarter-unit vertical position is calculated according to e = (32b + c + 1024)/2048, from the intermediate value b calculated in step (1) for the immediately neighbouring 1/2-resolution sub-pixel at the half-unit horizontal and unit vertical position and the intermediate value c calculated in step (2) for the immediately neighbouring 1/2-resolution sub-pixel at the half-unit horizontal and half-unit vertical position. Division is again performed with truncation, and the finally obtained 1/4-resolution sub-pixel values e are clipped to the range [0, 2^n - 1].
5. The value of the 1/4-resolution sub-pixel labelled g is calculated using the value of the nearest original pixel A and the intermediate values of the three nearest 1/2-resolution sub-pixels, according to g = (1024A + 32b + 32c1 + c2 + 2048)/4096. As before, division is performed with truncation, and the finally obtained value of the 1/4-resolution sub-pixel g is clipped to the range [0, 2^n - 1].
6. The value of the 1/4-resolution sub-pixel f is interpolated by taking the average of the 4 closest pixel values at unit horizontal and unit vertical positions, i.e. according to f = (A1 + A2 + A3 + A4 + 2)/4, where pixels A1, A2, A3 and A4 are the four closest original pixels.
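The key difference in TML6 — quarter-pel values formed from full-precision intermediates rather than from rounded half-pel results — can be sketched as follows (an illustrative outline under the same 8-bit assumption; the function names are ours):

```python
def clip8(v):
    # limit to the 8-bit pixel range [0, 255]
    return max(0, min(v, 255))

def intermediate_b(a):
    # step 1: intermediate value at 32x the pixel scale,
    # b = A1 - 5*A2 + 20*A3 + 20*A4 - 5*A5 + A6 (neither rounded nor clipped)
    return a[0] - 5 * a[1] + 20 * a[2] + 20 * a[3] - 5 * a[4] + a[5]

def final_half_b(b_int):
    # final 1/2-resolution value: b = (b + 16)/32, clipped
    return clip8((b_int + 16) // 32)

def quarter_d(A, b_int):
    # step 3: d = (32*A + b + 32)/64, formed directly from the intermediate,
    # so no half-pel rounding error propagates into the quarter-pel value
    return clip8((32 * A + b_int + 32) // 64)
```

Floor division is used here for brevity; for the negative sums that can arise it differs from truncation only in values that clip to 0 anyway.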
In the decoder, the sub-pixel values can be obtained by applying 6-tap filters directly in the horizontal and vertical directions. Referring to FIG. 11a, in the case of 1/4 sub-pixel resolution, the filter coefficients for the pixels and sub-pixels at unit vertical positions are: [0, 0, 64, 0, 0, 0] for pixel A, [1, -5, 52, 20, -5, 1] for the sub-pixel d to the left of b, [2, -10, 40, 40, -10, 2] for sub-pixel b, and [1, -5, 20, 52, -5, 1] for the sub-pixel d to the right of b. Each set of filter coefficients is applied to the six pixels or sub-pixels in the same row as the sub-pixel value being interpolated.
After the filter has been applied in both the horizontal and vertical directions, the interpolated value c is normalised according to c = (c + 2048)/4096 and clipped to the range [0, 2^n - 1]. When a motion vector points to an integer pixel position in the horizontal or vertical direction, many of the coefficients are zero. In a practical implementation of TML6, the software uses different optimised branches for the different sub-pixel positions, so that no multiplications by zero coefficients are performed.
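The decoder's direct row filtering can be illustrated with the coefficient sets listed above (the dictionary keys and the one-dimensional normalisation helper are assumptions of this sketch):

```python
# 6-tap filters for the four horizontal positions in a row; each sums to 64
ROW_FILTERS = {
    "A":       [0, 0, 64, 0, 0, 0],       # full-pixel position
    "d_left":  [1, -5, 52, 20, -5, 1],    # 1/4 position nearer A3
    "b":       [2, -10, 40, 40, -10, 2],  # 1/2 position
    "d_right": [1, -5, 20, 52, -5, 1],    # 3/4 position nearer A4
}

def filter_row(pixels, position):
    # weighted sum over six consecutive row pixels (result at 64x scale)
    return sum(t * p for t, p in zip(ROW_FILTERS[position], pixels))

def normalise_1d(v, n_bits=8):
    # after filtering in one direction only: (v + 32)/64, clipped;
    # after filtering in both directions the text normalises by
    # (v + 2048)/4096 instead, since the scale is then 64*64
    return max(0, min((v + 32) // 64, (1 << n_bits) - 1))
```

Because each coefficient set sums to 64, a flat region of the image passes through the filters unchanged.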
It should be noted that the 1/4-resolution sub-pixel values in TML6 are obtained directly using the intermediate values mentioned above, rather than from the truncated and clipped 1/2-resolution sub-pixel values. Thus, in obtaining the 1/4-resolution sub-pixel values, there is no need to calculate a final value for any 1/2-resolution sub-pixel. In particular, there is no need to perform the truncation and clipping operations associated with the calculation of final 1/2-resolution sub-pixel values, nor to store final 1/2-resolution sub-pixel values for use in the calculation of the 1/4-resolution sub-pixel values. Therefore, the computational complexity of TML6 is lower than that of TML5, because fewer truncation and clipping operations are required. However, a disadvantage of TML6 is that high-precision arithmetic is required in both the encoder and the decoder. High-precision interpolation requires more silicon area in ASICs and more computation on some CPUs. Moreover, direct interpolation implemented in the on-demand fashion specified in TML6 has a high memory requirement. This is an important factor, especially in embedded devices.
In view of the foregoing discussion, it should be appreciated that, due to the different requirements of sub-pixel interpolation involved in video encoders and decoders, a significant problem exists in developing a method of sub-pixel value interpolation that can provide satisfactory performance in both the encoder and decoder. Moreover, none of the above-mentioned current test models (TML5, TML6) provides a solution that can be optimally applied in both encoder and decoder.
Summary of the Invention
According to a first aspect of the present invention there is provided a method of interpolation in video coding, wherein an image comprises pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the image being interpolated to produce values for sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value N, the method comprising the steps of:
a) when a value for a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position, or at a unit horizontal and 1/2^(N-1) unit vertical position, is required, interpolating that value directly using a weighted sum of the pixels at unit horizontal and unit vertical positions;

b) when a value for a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position is required, interpolating that value directly using one of a first weighted sum of values of sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values of sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the sub-pixel values being calculated according to step (a); and

c) when a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position is required, interpolating that value by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
Preferably, first and second weights are used in the weighted average of step (c), the relative magnitudes of the weights being inversely proportional to the respective distances, along the diagonal, of the first and second sub-pixels or pixels from the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
In the case where the first and second sub-pixels or pixels are placed symmetrically (equidistantly) with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position, the first and second weights may have equal values.
When a value for a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^N unit vertical position is required, the first weighted sum of step b), formed from values of sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, may be used.
When a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^(N-1) unit vertical position is required, the second weighted sum of step b), formed from values of sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, may be used.
In one embodiment, when values are required for sub-pixels at 1/2^N unit horizontal and unit vertical positions, or at 1/2^N unit horizontal and 1/2^(N-1) unit vertical positions, such values are interpolated by taking the average of the value of a first pixel or sub-pixel at a unit horizontal position and the vertical position corresponding to the sub-pixel being calculated, and the value of a second pixel or sub-pixel at a 1/2^(N-1) unit horizontal position and the vertical position corresponding to the sub-pixel being calculated.
When values are required for sub-pixels at unit horizontal and 1/2^N unit vertical positions, or at 1/2^(N-1) unit horizontal and 1/2^N unit vertical positions, the desired value can be interpolated by taking the average of the value of a first pixel or sub-pixel at the horizontal position corresponding to the sub-pixel being calculated and a unit vertical position, and the value of a second pixel or sub-pixel at the horizontal position corresponding to the sub-pixel being calculated and a 1/2^(N-1) unit vertical position.
The value of a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position may be interpolated by taking the average of the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
The value of a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position may also be interpolated by taking the average of the value of a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position.
Half of the sub-pixel values at 1/2^N unit horizontal and 1/2^N unit vertical positions may be interpolated by taking the average of a first pair of values, and the other half by taking the average of a second pair of values, the first pair being the value of a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position, and the second pair being the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
The value of one sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position may be interpolated by taking the average of the values of a first pair of sub-pixels, and the value of a neighbouring sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking the average of the values of a second pair, the first pair being a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position, and the second pair being a pixel at a unit horizontal and unit vertical position and a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
Sub-pixels at 1/2^N unit horizontal and 1/2^N unit vertical positions may be interpolated alternately in this manner in the horizontal direction.
When values are required for certain sub-pixels at 1/2^N unit horizontal and 1/2^N unit vertical positions, such values can be interpolated by taking the average of a number of nearest neighbouring pixels.
At least one of steps a) and b), in which sub-pixel values are interpolated directly using weighted sums, may comprise the calculation of an intermediate value for the sub-pixel, the intermediate value having a dynamic range greater than the specified dynamic range.
The intermediate value of a sub-pixel having 1/2^(N-1) sub-pixel resolution may be used in the calculation of a sub-pixel value having 1/2^N sub-pixel resolution.
According to a second aspect of the present invention there is provided a method of interpolation in video coding, wherein an image comprises pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the image being interpolated to produce values for sub-pixels at fractional horizontal and vertical positions, the method comprising the steps of:
a) when values of sub-pixels at half unit horizontal and unit vertical positions, and unit horizontal and half unit vertical positions are required, directly interpolating such values using weighted sums of pixels at the unit horizontal and unit vertical positions;
b) when sub-pixels at half unit horizontal and half unit vertical positions are required, interpolating such values directly using a weighted sum of values for sub-pixels at half unit horizontal and unit vertical positions calculated according to step (a); and
c) when values of sub-pixels at quarter-unit horizontal and quarter-unit vertical positions are required, such values are interpolated by taking the average of at least one of a first pair of values and a second pair of values, the first pair being the value of a sub-pixel at a half-unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and half-unit vertical position, and the second pair being the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a half-unit horizontal and half-unit vertical position.
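A toy sketch of the three steps of this aspect (hypothetical 6-tap weights stand in for the claimed weighted sums, and a truncating average models step (c); all names here are our own):

```python
TAPS = [1, -5, 20, 20, -5, 1]  # example weights summing to 32

def weighted_sum(values):
    # steps (a)/(b): weighted sum of six pixels or sub-pixels,
    # rounded and scaled back to the pixel range
    s = sum(t * v for t, v in zip(TAPS, values))
    return (s + 16) // 32

def quarter_by_average(v1, v2):
    # step (c): average of one of the two diagonal pairs, e.g. the pair
    # (half/unit sub-pixel, unit/half sub-pixel) or (pixel, half/half sub-pixel)
    return (v1 + v2) // 2
```

Since the example weights sum to 32, a constant input passes through the weighted sum unchanged, which is the expected behaviour of an interpolation filter.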
According to a third aspect of the present invention there is provided a method of interpolation in video coding, wherein an image comprises pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the image being interpolated to produce values for sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value N, the method comprising the steps of:
a) when a value for a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position, or at a unit horizontal and 1/2^(N-1) unit vertical position, is required, interpolating that value directly using a weighted sum of the pixels at unit horizontal and unit vertical positions;
b) when a value for a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position is required, interpolating that value directly using a chosen one of a first weighted sum of the values of sub-pixels situated at the horizontal position corresponding to the sub-pixel being calculated and a second weighted sum of the values of sub-pixels situated at the vertical position corresponding to the sub-pixel being calculated.
The sub-pixels used in the first weighted sum may be sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and the first weighted sum may be used to interpolate the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^N unit vertical position.
The sub-pixels used in the second weighted sum may be sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, and the second weighted sum may be used to interpolate the value of a sub-pixel at a 1/2^N unit horizontal and 1/2^(N-1) unit vertical position.
When a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position is required, the value can be interpolated by taking the average of at least one of a first pair of values and a second pair of values, the first pair being the value of a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position, and the second pair being the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
In the above solutions, N may be equal to an integer selected from the list comprising the values 2, 3 and 4.
A sub-pixel at a quarter-unit horizontal position is to be understood as a sub-pixel having one pixel at its left most adjacent unit horizontal position and one sub-pixel at its right most adjacent half-unit horizontal position, and a sub-pixel having one sub-pixel at its left most adjacent half-unit horizontal position and one pixel at its right most adjacent unit horizontal position. Accordingly, the sub-pixel at the quarter-unit vertical position will be understood as a sub-pixel having one pixel at its upper most adjacent unit vertical position and one sub-pixel at its lower most adjacent half-unit vertical position, and a sub-pixel having one sub-pixel at its upper most adjacent half-unit vertical position and one pixel at its lower most adjacent unit vertical position.
The term "dynamic range" refers to the range of values that the subpixel values and the weighted sums can take.
Preferably, changing the dynamic range, whether expanding or reducing it, means changing the number of bits used to represent values within that range.
In one embodiment of the invention, the method is applied to an image subdivided into image data blocks, each block comprising four corners, each corner defined by a pixel at a unit horizontal and unit vertical position. In this case, the method is applied to each block of image data as it becomes available for sub-pixel value interpolation. Alternatively, sub-pixel value interpolation according to the method of the invention is performed once all image data blocks of an image have become available.
The method is preferably used in video encoding. It may equally be used in video decoding.
In one embodiment of the invention, when used in encoding, the method is performed as before-hand interpolation, wherein the values of all sub-pixels at half-unit positions and all sub-pixels at quarter-unit positions are calculated and stored before being used in the determination of a predicted frame in the motion-prediction encoding process. In a further embodiment, the method is performed as a combination of before-hand and on-demand interpolation. In this case, a certain proportion or class of sub-pixel values is calculated and stored before being used in the determination of a predicted frame, and certain other sub-pixel values are calculated only when needed in the motion-prediction encoding process.
When the method is used in decoding, sub-pixels are interpolated only when they are pointed to by a motion vector.
According to a fourth aspect of the present invention there is provided a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values of sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value N, the interpolator being configured to:
a) interpolate a value for a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position, or at a unit horizontal and 1/2^(N-1) unit vertical position, directly using a weighted sum of the pixels at unit horizontal and unit vertical positions;

b) interpolate a value for a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position directly using one of a first weighted sum of values of sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values of sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the sub-pixel values being calculated according to step (a); and

c) interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
The video encoder may comprise an image encoder. It may comprise a video decoder. It may be a codec comprising both a video encoder and a video decoder.
According to a fifth aspect of the present invention there is provided a communication terminal comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values of sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value N, the interpolator being configured to:
a) interpolate a value for a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position, or at a unit horizontal and 1/2^(N-1) unit vertical position, directly using a weighted sum of the pixels at unit horizontal and unit vertical positions;

b) interpolate a value for a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position directly using one of a first weighted sum of values of sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values of sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the sub-pixel values being calculated according to step (a); and

c) interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
The communication terminal may comprise an image encoder. It may comprise a video decoder. Preferably it comprises a video codec consisting of a video encoder and a video decoder.
The communication terminal preferably comprises a user interface, a processor and at least one of a transmitting part and a receiving part, and a video encoder according to at least one of the third and fourth aspects of the invention. The processor preferably controls the operation of the transmitting and/or receiving components and the video encoder.
According to a sixth aspect of the present invention there is provided a telecommunications system comprising a communication terminal and a network, the network and the communication terminal being connected by a communication line over which coded video signals can be transmitted, the communication terminal comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value N, the interpolator being configured to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical positions directly using a chosen one of a first weighted sum of values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values for sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the first and second weighted sums of sub-pixel values being calculated according to step (a); and
c) interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N, such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
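For concreteness, steps a) and b) can be sketched for the smallest case, N = 2 (half-unit values computed directly; quarter-unit values then follow from step c)). The 6-tap filter (1, -5, 20, 20, -5, 1)/32 used below is an illustrative assumption of the kind of weighted sum intended; the claim itself does not fix the coefficients.

```python
# Sketch of steps a) and b) for N = 2, assuming a 6-tap filter
# (1, -5, 20, 20, -5, 1)/32 as the weighted sum -- the coefficients
# are an illustrative assumption, not taken from the text.

TAPS = (1, -5, 20, 20, -5, 1)  # assumed weights, sum = 32

def clip(v, lo=0, hi=255):
    """Clip an interpolated value to the specified dynamic range."""
    return max(lo, min(hi, v))

def half_value(samples):
    """Weighted sum of six unit-spaced samples around a half-unit
    position, rounded to nearest and clipped."""
    s = sum(t * x for t, x in zip(TAPS, samples))
    return clip((s + 16) // 32)

# Step a): a half-horizontal sub-pixel from a row of unit-position pixels.
row = [10, 12, 80, 90, 14, 11]
b = half_value(row)                     # lies between row[2] and row[3]

# Step b): the (half, half) sub-pixel from a column of step-a results.
column_of_halves = [half_value(row) for _ in range(6)]  # toy constant column
j = half_value(column_of_halves)
print(b, j)                             # prints: 103 103
```

With N = 2, step c) then forms each quarter-unit value as a weighted (in the simplest case equal-weight) average of a diagonally placed pixel/sub-pixel pair computed above.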
The telecommunications system is preferably a mobile telecommunications system comprising mobile communication terminals and a wireless network, the connection between the mobile communication terminals and the wireless network being formed by a radio link. The network preferably enables the communication terminal to communicate with other communication terminals connected to the network via a communication line between the other communication terminal and the network.
According to a seventh aspect of the present invention there is provided a telecommunications system comprising a communications terminal and a network, the telecommunications network and the communications terminal being connected by a communications line over which coded video signals can be transmitted, the network comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value of N, the interpolator being configured to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical positions directly using a chosen one of a first weighted sum of values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values for sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the first and second weighted sums of sub-pixel values being calculated according to step (a); and
c) interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N, such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
According to an eighth aspect of the present invention there is provided a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the sharpness of the sub-pixels being determined by a positive integer N, the interpolator being arranged to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) the value of a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position is interpolated directly using a selection of a first weighted sum for the value of the sub-pixel at a vertical position corresponding to the sub-pixel being calculated and a second weighted sum for the value of the sub-pixel at a horizontal position corresponding to the sub-pixel being calculated.
The interpolator may further be used to form the first weighted sum from the values of sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and to use the first weighted sum to interpolate the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^N unit vertical position.
The interpolator may further be used to form the second weighted sum from the values of sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, and to use the second weighted sum to interpolate the value of a sub-pixel at a 1/2^N unit horizontal and 1/2^(N-1) unit vertical position.
The interpolator may further be used to interpolate the value of a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking the average of at least one of a first pair of values and a second pair of values, the first pair being the value of a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position, and the second pair being the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
According to a ninth aspect of the present invention there is provided a communication terminal comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the sharpness of the sub-pixels being determined by a positive integer N, the interpolator being arranged to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) the value of a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position is interpolated directly using a selection of a first weighted sum for the value of the sub-pixel at a vertical position corresponding to the sub-pixel being calculated and a second weighted sum for the value of the sub-pixel at a horizontal position corresponding to the sub-pixel being calculated.
According to a tenth aspect of the present invention there is provided a telecommunications system comprising a communications terminal and a network, the telecommunications network and the communications terminal being connected by a communications line over which coded video signals can be transmitted, the communications terminal comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the sharpness of the sub-pixels being determined by a positive integer N, the interpolator being arranged to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) the value of a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position is interpolated directly using a selection of a first weighted sum for the value of the sub-pixel at a vertical position corresponding to the sub-pixel being calculated and a second weighted sum for the value of the sub-pixel at a horizontal position corresponding to the sub-pixel being calculated.
According to an eleventh aspect of the present invention there is provided a telecommunications system comprising a communications terminal and a network, the telecommunications network and the communications terminal being connected by a communications line over which coded video signals can be transmitted, the network comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the sharpness of the sub-pixels being determined by a positive integer N, the interpolator being arranged to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) the value of a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position is interpolated directly using a selection of a first weighted sum for the value of the sub-pixel at a vertical position corresponding to the sub-pixel being calculated and a second weighted sum for the value of the sub-pixel at a horizontal position corresponding to the sub-pixel being calculated.
The invention provides a method of interpolation in video coding in which an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows lying at unit horizontal positions and the pixels in the columns lying at unit vertical positions, is interpolated to produce values for sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, wherein x is a positive integer having a maximum value of N, the method comprising the steps of:
a) when values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, are required, interpolating the values directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) when values for sub-pixels at 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical positions are required, interpolating the values directly using a chosen one of a first weighted sum of values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values for sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the first and second weighted sums of sub-pixel values being calculated according to step (a); and
c) when a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position is required, interpolating the value by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N, such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
Preferably, first and second weights are used in the weighted average of step (c), the relative values of the weights being inversely proportional to the respective (straight-line diagonal) distances of the first and second sub-pixels or pixels from the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
Preferably, in the case where the first and second sub-pixels or pixels are placed symmetrically (equidistant) with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position, the first and second weights have equal values.
Preferably, when the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^N unit vertical position is required, the first weighted sum used in step b) is formed from the values of sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions.
Preferably, when the value of a sub-pixel at a 1/2^N unit horizontal and 1/2^(N-1) unit vertical position is required, the second weighted sum used in step b) is formed from the values of sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions.
Preferably, when values for sub-pixels at 1/2^N unit horizontal and unit vertical positions, and at 1/2^N unit horizontal and 1/2^(N-1) unit vertical positions, are required, the values are interpolated by taking the average of the value of a first pixel or sub-pixel at a unit horizontal position and the vertical position corresponding to the sub-pixel being calculated, and the value of a second pixel or sub-pixel at a 1/2^(N-1) unit horizontal position and the vertical position corresponding to the sub-pixel being calculated.
Preferably, when values for sub-pixels at unit horizontal and 1/2^N unit vertical positions, and at 1/2^(N-1) unit horizontal and 1/2^N unit vertical positions, are required, the values are interpolated by taking the average of the value of a first pixel or sub-pixel at a unit vertical position and the horizontal position corresponding to the sub-pixel being calculated, and the value of a second pixel or sub-pixel at a 1/2^(N-1) unit vertical position and the horizontal position corresponding to the sub-pixel being calculated.
Preferably, the value of a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position is interpolated by taking the average of the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
Preferably, the average of the value of a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position is taken to interpolate the value of a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position.
Preferably, half of the values of sub-pixels at 1/2^N unit horizontal and 1/2^N unit vertical positions are interpolated by taking the average of a first pair of values, and the other half by taking the average of a second pair of values, the first pair being the value of a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position, and the second pair being the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
Preferably, the values of sub-pixels at 1/2^N unit horizontal and 1/2^N unit vertical positions are interpolated alternately, the average of a first pair of values being taken for one such sub-pixel and the average of a second pair of values being taken for a sub-pixel adjacent to it, the first pair being the value of a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position, and the second pair being the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
Preferably, the sub-pixels at 1/2^N unit horizontal and 1/2^N unit vertical positions are interpolated alternately in a horizontal direction.
Preferably, when values for certain sub-pixels at 1/2^N unit horizontal and 1/2^N unit vertical positions are required, the values are interpolated by taking the average of a number of nearest neighbouring pixels.
Preferably, where N is equal to an integer selected from a list comprising values 2, 3 and 4.
Preferably, at least one of step a) and step b), in which sub-pixel values are interpolated directly using a weighted sum, comprises the calculation of intermediate values having a dynamic range greater than the specified dynamic range.
Preferably, an intermediate value for a sub-pixel having 1/2^(N-1) sub-pixel resolution is used in the calculation of a value for a sub-pixel having 1/2^N sub-pixel resolution.
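The point of retaining intermediate values with a greater dynamic range can be illustrated as follows: if the first-stage weighted sum is kept unnormalised and unclipped, the second stage can filter those intermediates and normalise only once, avoiding a double rounding. The 6-tap filter and the 8-bit pixel range below are illustrative assumptions, not taken from the text.

```python
# Sketch: a first-stage weighted sum kept at full precision (dynamic range
# greater than the 8-bit pixel range), reused by the second stage with a
# single normalisation. Taps and bit depth are illustrative assumptions.

TAPS = (1, -5, 20, 20, -5, 1)  # assumed weights, sum = 32

def clip(v, lo=0, hi=255):
    return max(lo, min(hi, v))

def half_intermediate(samples):
    """First-stage weighted sum: no division, no clipping."""
    return sum(t * x for t, x in zip(TAPS, samples))

def half_final(samples):
    """Final half-unit value: normalise and clip the intermediate."""
    return clip((half_intermediate(samples) + 16) // 32)

def centre_final(intermediates):
    """Second-stage value filtered from six first-stage intermediates;
    one division by 32*32 = 1024 replaces two successive roundings."""
    s = sum(t * c for t, c in zip(TAPS, intermediates))
    return clip((s + 512) // 1024)

row = [100] * 6                          # constant toy input
print(half_final(row), centre_final([half_intermediate(row)] * 6))  # prints: 100 100
```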
The present invention also provides a method of interpolation in video coding, wherein an image comprises pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the image being interpolated to produce values for sub-pixels at fractional horizontal and vertical positions, the method comprising the steps of:
a) when values for sub-pixels at half unit horizontal and unit vertical positions, and unit horizontal and half unit vertical positions are required, weighting and interpolating the values directly using pixels at the unit horizontal and unit vertical positions;
b) when sub-pixels at half unit horizontal and half unit vertical positions are required, interpolating the values directly using a weighted sum of the values for sub-pixels at half unit horizontal and unit vertical positions calculated according to step (a); and
c) when the values of the sub-pixels at the quarter-unit horizontal and quarter-unit vertical positions are required, the values are interpolated by taking the average of at least one pair of a first pair of sub-pixel values of a sub-pixel at the half-unit horizontal and unit vertical positions and a sub-pixel at the unit horizontal and half-unit vertical positions and a second pair of sub-pixel values of a pixel at the unit horizontal and unit vertical positions and a sub-pixel at the half-unit horizontal and half-unit vertical positions.
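In this half/quarter-unit form, step c) amounts to a rounded average over one of two diagonal pairs. A minimal sketch with illustrative values and hypothetical names (A for the unit-position pixel, b and h for the half-horizontal and half-vertical sub-pixels, c for the (half, half) sub-pixel):

```python
# Sketch of step c) for the half/quarter case: a quarter-unit value is the
# rounded average of a diagonally placed pair. Variable names are
# illustrative, not taken from the text.

def clip(v, lo=0, hi=255):
    return max(lo, min(hi, v))

def quarter_from_pair(p, q):
    """Equal-weight average of two diagonal neighbours, with rounding."""
    return clip((p + q + 1) // 2)

A, b, h, c = 100, 110, 120, 115  # pixel, (half, unit), (unit, half), (half, half)

q1 = quarter_from_pair(b, h)     # first pair: the two half-unit sub-pixels
q2 = quarter_from_pair(A, c)     # second pair: pixel and centre sub-pixel
print(q1, q2)                    # prints: 115 108
```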
The invention also provides a method of interpolation in video coding, wherein an image comprises pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the image being interpolated to generate values for sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, wherein x is a positive integer having a maximum value of N, the method comprising the steps of:
a) when values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, are required, interpolating the values directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) when a value for a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position is required, such a value is interpolated directly using a selection of a first weighted sum for the value of the sub-pixel at a vertical position corresponding to the sub-pixel being calculated and a second weighted sum for the value of the sub-pixel at a horizontal position corresponding to the sub-pixel being calculated.
Preferably, the sub-pixels used in the first weighted sum are sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and the first weighted sum is used to interpolate the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^N unit vertical position.
Preferably, the sub-pixels used in the chosen second weighted sum are sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, and the second weighted sum is used to interpolate the value of a sub-pixel at a 1/2^N unit horizontal and 1/2^(N-1) unit vertical position.
Preferably, when the value of a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position is required, the value is interpolated by taking the average of at least one of a first pair of values and a second pair of values, the first pair being the value of a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position, and the second pair being the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
The invention also provides a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows lying at unit horizontal positions and the pixels in the columns lying at unit vertical positions, the video encoder comprising an interpolator for generating sub-pixel values at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value of N, the interpolator being configured to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical positions directly using a chosen one of a first weighted sum of values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values for sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the first and second weighted sums of sub-pixel values being calculated according to step (a); and
c) interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N, such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
Preferably, a video encoder is included.
Preferably, a video decoder is included.
The present invention also provides a codec for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows lying at unit horizontal positions and the pixels in the columns lying at unit vertical positions, the codec comprising a video encoder and a video decoder, each of the video encoder and the video decoder comprising an interpolator for generating sub-pixel values at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value of N, the interpolator of the video encoder and the interpolator of the video decoder being used to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical positions directly using a chosen one of a first weighted sum of values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values for sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the first and second weighted sums of sub-pixel values being calculated according to step (a); and
c) interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N, such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
The invention also provides a communication terminal comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows lying at unit horizontal positions and the pixels in the columns lying at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value of N, the interpolator being configured to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical positions directly using a chosen one of a first weighted sum of values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values for sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the first and second weighted sums of sub-pixel values being calculated according to step (a); and
c) interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N, such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
Preferably, a video encoder is included.
Preferably, a video decoder is included.
The invention also provides a telecommunications system comprising a telecommunications terminal and a network, the telecommunications network and the telecommunications terminal being connected by a communications line over which coded video signals can be transmitted, the telecommunications terminal comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value of N, the interpolator being configured to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical positions directly using a chosen one of a first weighted sum of values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values for sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the first and second weighted sums of sub-pixel values being calculated according to step (a); and
c) interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N, such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
Preferably, the telecommunications system is a mobile telecommunications system comprising a mobile communications terminal and a wireless network, the connection between the mobile communications terminal and the wireless network being formed by a radio link.
Preferably, wherein the network enables the communication terminal to communicate with other communication terminals connected to the network via a communication link between the other communication terminal and the network.
The invention also provides a telecommunications system comprising a communications terminal and a network, the telecommunications network and the communications terminal being connected by a communications line over which encoded video signals can be transmitted, the network comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the fractional horizontal and vertical positions being defined according to 1/2^x, where x is a positive integer having a maximum value of N, the interpolator being configured to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using weighted sums of pixels at unit horizontal and unit vertical positions;
b) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical positions directly using a chosen one of a first weighted sum of values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions and a second weighted sum of values for sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, the first and second weighted sums of sub-pixel values being calculated according to step (a); and
c) interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking a weighted average of the value of a first sub-pixel or pixel at a 1/2^(N-m) unit horizontal and 1/2^(N-n) unit vertical position and the value of a second sub-pixel or pixel at a 1/2^(N-p) unit horizontal and 1/2^(N-q) unit vertical position, the variables m, n, p and q taking integer values in the range 1 to N, such that the first and second sub-pixels or pixels are located diagonally with respect to the sub-pixel at the 1/2^N unit horizontal and 1/2^N unit vertical position.
The invention also provides a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows lying at unit horizontal positions and the pixels in the columns lying at unit vertical positions, the encoder comprising an interpolator for generating values of sub-pixels at fractional horizontal and vertical positions, the sharpness of the sub-pixels being determined by a positive integer N, the interpolator being arranged to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using a weighted sum of the values of pixels at unit horizontal and unit vertical positions;
b) interpolate the value of a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position directly using a chosen one of a first weighted sum of the values of sub-pixels at vertical positions corresponding to that of the sub-pixel being calculated and a second weighted sum of the values of sub-pixels at horizontal positions corresponding to that of the sub-pixel being calculated.
Preferably, the interpolator is further arranged to form the first weighted sum using the values of sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and to use the first weighted sum to interpolate the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^N unit vertical position.
Preferably, the interpolator is further arranged to form the second weighted sum using the values of sub-pixels at unit horizontal and 1/2^(N-1) unit vertical positions, and to use the second weighted sum to interpolate the value of a sub-pixel at a 1/2^N unit horizontal and 1/2^(N-1) unit vertical position.
Preferably, the interpolator is further arranged to interpolate a value for a sub-pixel at a 1/2^N unit horizontal and 1/2^N unit vertical position by taking the average of at least one of a first pair of values and a second pair of values, the first pair comprising the value of a sub-pixel at a 1/2^(N-1) unit horizontal and unit vertical position and the value of a sub-pixel at a unit horizontal and 1/2^(N-1) unit vertical position, and the second pair comprising the value of a pixel at a unit horizontal and unit vertical position and the value of a sub-pixel at a 1/2^(N-1) unit horizontal and 1/2^(N-1) unit vertical position.
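The two alternative diagonal averaging pairs described above can be sketched as follows; the surrounding values and names are purely illustrative, and the rounded two-tap average stands in for the claimed weighted average.

```python
def avg(v1, v2):
    """Rounded average, standing in for the claimed weighted average."""
    return (v1 + v2 + 1) // 2

# Illustrative values around the 1/2^N horizontal, 1/2^N vertical sub-pixel:
half_h   = 110   # sub-pixel at 1/2^(N-1) unit horizontal, unit vertical
half_v   = 120   # sub-pixel at unit horizontal, 1/2^(N-1) unit vertical
full_pix = 100   # pixel at unit horizontal and unit vertical
half_hv  = 140   # sub-pixel at 1/2^(N-1) unit horizontal and vertical

first_pair  = avg(half_h, half_v)     # first diagonal option
second_pair = avg(full_pix, half_hv)  # second diagonal option
print(first_pair, second_pair)
```

Either pair lies on a diagonal through the target sub-pixel, so either average is a valid interpolation; which pair is used can be chosen per position.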
Preferably, a video encoder is included.
Preferably, a video decoder is included.
The invention also provides a codec for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows lying at unit horizontal positions and the pixels in the columns lying at unit vertical positions, the codec comprising a video encoder and a video decoder, each of the video encoder and the video decoder comprising an interpolator for generating sub-pixel values at fractional horizontal and vertical positions, the resolution of the sub-pixels being determined by a positive integer N, the interpolator of the video encoder and the interpolator of the video decoder being arranged to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using a weighted sum of the values of pixels at unit horizontal and unit vertical positions;
b) interpolate the value of a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position directly using a chosen one of a first weighted sum of the values of sub-pixels at vertical positions corresponding to that of the sub-pixel being calculated and a second weighted sum of the values of sub-pixels at horizontal positions corresponding to that of the sub-pixel being calculated.
The invention also provides a communication terminal comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the resolution of the sub-pixels being determined by a positive integer N, the interpolator being arranged to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using a weighted sum of the values of pixels at unit horizontal and unit vertical positions;
b) interpolate the value of a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position directly using a chosen one of a first weighted sum of the values of sub-pixels at vertical positions corresponding to that of the sub-pixel being calculated and a second weighted sum of the values of sub-pixels at horizontal positions corresponding to that of the sub-pixel being calculated.
Preferably, a video encoder is included.
Preferably, a video decoder is included.
The invention also provides a telecommunications system comprising a telecommunications terminal and a network, the network and the terminal being connected by a communications line over which encoded video signals can be transmitted, the telecommunications terminal comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the resolution of the sub-pixels being determined by a positive integer N, the interpolator being arranged to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using a weighted sum of the values of pixels at unit horizontal and unit vertical positions;
b) interpolate the value of a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position directly using a chosen one of a first weighted sum of the values of sub-pixels at vertical positions corresponding to that of the sub-pixel being calculated and a second weighted sum of the values of sub-pixels at horizontal positions corresponding to that of the sub-pixel being calculated.
Preferably, the telecommunications system is a mobile telecommunications system comprising a mobile communications terminal and a wireless network, the connection between the mobile communications terminal and the wireless network being formed by a radio link.
Preferably, the network enables the communication terminal to communicate with other communication terminals connected to the network via communication links between those other terminals and the network.
The invention also provides a telecommunications system comprising a communications terminal and a network, the network and the communications terminal being connected by a communications line over which encoded video signals can be transmitted, the network comprising a video encoder for encoding an image comprising pixels arranged in rows and columns and represented by values having a specified dynamic range, the pixels in the rows being at unit horizontal positions and the pixels in the columns being at unit vertical positions, the video encoder comprising an interpolator for generating values for sub-pixels at fractional horizontal and vertical positions, the resolution of the sub-pixels being determined by a positive integer N, the interpolator being arranged to:
a) interpolate values for sub-pixels at 1/2^(N-1) unit horizontal and unit vertical positions, and at unit horizontal and 1/2^(N-1) unit vertical positions, directly using a weighted sum of the values of pixels at unit horizontal and unit vertical positions;
b) interpolate the value of a sub-pixel at a sub-pixel horizontal and sub-pixel vertical position directly using a chosen one of a first weighted sum of the values of sub-pixels at vertical positions corresponding to that of the sub-pixel being calculated and a second weighted sum of the values of sub-pixels at horizontal positions corresponding to that of the sub-pixel being calculated.
Description of the drawings
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
FIG. 1 shows a video encoder according to the prior art;
FIG. 2 shows a video decoder according to the prior art;
FIG. 3 illustrates the types of frames used in video encoding;
FIGS. 4a, 4b and 4c illustrate the steps involved in performing block matching;
FIG. 5 illustrates the process of motion estimation to sub-pixel resolution;
FIG. 6 shows a terminal device comprising encoding and decoding apparatus in which the method of the invention can be implemented;
FIG. 7 illustrates a video encoder according to one embodiment of the invention;
FIG. 8 shows a video decoder according to an embodiment of the present invention;
FIG. 9 shows a schematic diagram of a mobile telecommunications network according to one embodiment of the present invention;
FIG. 10a shows a representation of pixel and sub-pixel positions used in TML5;
FIG. 10b illustrates the interpolation of a half-resolution sub-pixel;
FIG. 10c illustrates the interpolation of a half-resolution sub-pixel;
FIG. 11a shows a representation of pixel and sub-pixel positions used in TML6;
FIG. 11b illustrates the interpolation of a half-resolution sub-pixel;
FIG. 11c illustrates the interpolation of a half-resolution sub-pixel;
FIG. 12a shows a representation of pixel and sub-pixel positions used in the present invention;
FIG. 12b illustrates the interpolation of a half-resolution sub-pixel in accordance with the present invention;
FIG. 12c illustrates the interpolation of a half-resolution sub-pixel in accordance with the present invention;
FIG. 13 shows possible options for the diagonal interpolation of sub-pixels;
FIG. 14 shows the half-resolution sub-pixel values required to calculate another half-resolution sub-pixel value;
FIG. 15a shows the half-resolution sub-pixel values that must be calculated in order to interpolate the values of quarter-resolution sub-pixels in a block of image data using the interpolation scheme of TML5;
FIG. 15b shows the half-resolution sub-pixel values that must be calculated in order to interpolate the values of quarter-resolution sub-pixels in a block of image data using interpolation in accordance with the present invention;
FIG. 16a shows the number of half-resolution sub-pixels that must be calculated in order to obtain the value of a quarter-resolution sub-pixel in a block of image data using the interpolation scheme of TML5;
FIG. 16b shows the number of half-resolution sub-pixels that must be calculated in order to obtain the value of a quarter-resolution sub-pixel in a block of image data using interpolation in accordance with the present invention;
FIG. 17 shows a numbering scheme for the 15 sub-pixel positions;
FIG. 18 illustrates the terminology used to describe pixels, half-resolution sub-pixels, quarter-resolution sub-pixels and eighth-resolution sub-pixels;
FIG. 19a shows the diagonal directions used in the interpolation of eighth-resolution sub-pixels in one embodiment of the invention;
FIG. 19b shows the diagonal directions used in the interpolation of eighth-resolution sub-pixels in another embodiment of the present invention; and
FIG. 20 shows the terminology used to describe eighth-resolution sub-pixels in an image.
Detailed Description
FIGS. 1 to 5, 10a, 10b, 10c, 11a, 11b and 11c have been described previously.
FIG. 6 shows a terminal device comprising video encoding and decoding apparatus that can operate in accordance with the present invention. More precisely, the figure shows a multimedia terminal 60 implemented according to ITU-T recommendation H.324. The terminal can be considered a multimedia transceiver device. It comprises units for capturing, encoding and multiplexing multimedia data streams for transmission over a communication network, and units for receiving, demultiplexing, decoding and displaying received multimedia content. ITU-T recommendation H.324 defines the overall operation of the terminal and refers to other recommendations that govern the operation of its various components. Such a multimedia terminal may be used in real-time applications such as conversational video telephony, or in non-real-time applications such as retrieving or streaming video from multimedia content servers on the Internet.
In the present description, it should be understood that the H.324 terminal shown in FIG. 6 is only one of several alternative multimedia terminal implementations suited to application of the method of the present invention. It should also be noted that several alternatives exist concerning the location and implementation of the terminal device. As shown in FIG. 6, the multimedia terminal may be located in a communication device connected to a fixed-line telephone network, such as an analog PSTN (Public Switched Telephone Network). In this case, the multimedia terminal is equipped with a modem 71 compliant with ITU-T recommendations V.8, V.34 and, optionally, V.8bis. Alternatively, the multimedia terminal may be connected to an external modem. The modem converts the multiplexed digital data and control signals produced by the multimedia terminal into an analog form suitable for transmission over the PSTN. It also enables the multimedia terminal to receive data and control signals in analog form from the PSTN and to convert them into a digital data stream that can be demultiplexed and processed appropriately by the terminal.
The H.324 multimedia terminal can also be implemented in such a way that it can be connected directly to a digital fixed-line network, such as an ISDN (Integrated Services Digital Network). In this case, the modem 71 is replaced by an ISDN user-network interface. In FIG. 6, this ISDN user-network interface is represented by optional block 72.
The h.324 multimedia terminal may also be suitable for use in mobile communication applications. If a wireless communication link is used, the modem 71 can be replaced by any suitable wireless interface, as represented by optional block 73 in FIG. 6. For example, an h.324/M multimedia terminal can comprise a radio transceiver enabling connection to a current second generation GSM mobile phone network, or the proposed third generation UMTS (universal mobile telephone system).
It should be noted that in multimedia terminals designed for bi-directional communication, i.e. for the transmission and reception of video data, both a video encoder and a video decoder implemented according to the present invention are advantageously provided. Such a pair of encoder and decoder is often implemented as a single combined functional component, called a 'codec'.
Since the video encoder according to the invention performs motion compensated video encoding for sub-pixel definition using a specific interpolation scheme and a specific combination of prior and on-demand sub-pixel value interpolation, it is generally desirable for the video decoder of a receiving terminal to be implemented in a manner compatible with the encoder forming the transmitting end of the compressed video data stream. The lack of such a guarantee of compatibility may have a detrimental effect on the quality of the motion compensation and the accuracy of the reconstructed image frame.
A typical h.324 multimedia terminal will now be described in further detail with reference to fig. 6.
The multimedia terminal 60 includes various components referred to as 'terminal equipment'. These include video, audio and telematic devices, designated by reference numerals 61, 62 and 63, respectively. The video equipment 61 may comprise, for example, a video camera for capturing video images, a monitor for displaying received video content and optional video processing equipment. The audio equipment 62 typically includes, for example, a microphone for capturing spoken messages and a loudspeaker for reproducing received audio content. The audio equipment may further comprise additional audio processing units. The telematic equipment 63 may include a data terminal, a keyboard, an electronic whiteboard or a still-image transceiver such as a facsimile device.
The video equipment 61 is coupled to a video signal codec 65. The video signal codec 65 comprises a video encoder and a corresponding video decoder, each implemented according to the invention. The encoder and decoder are described below. The video signal codec 65 encodes captured video data into a form suitable for onward transmission over a communication link and decodes compressed video content received from the communication network. In the example shown in FIG. 6, the video signal codec is implemented according to ITU-T recommendation H.263, with appropriate modifications to implement the sub-pixel value interpolation method according to the present invention in both the encoder and the decoder of the video signal codec.
Similarly, the audio equipment of the terminal is coupled to an audio codec, indicated by reference numeral 66 in FIG. 6. As in the case of the video signal codec, the audio codec comprises an encoder/decoder pair. It converts audio data captured by the terminal's audio equipment into a form suitable for transmission over the communication link, and converts encoded audio data received from the network back into a form suitable for reproduction on, for example, the terminal's loudspeaker. The output of the audio codec is passed to a delay element 67, which compensates for the delay introduced by the video encoding process and thus ensures synchronization of the audio and video content.
The system control block 64 of the multimedia terminal controls end-to-end signaling using an appropriate control protocol (signaling component 68) to establish a common mode of operation between the sending and receiving terminals. The signaling component 68 exchanges information about the encoding and decoding capabilities of the sending and receiving terminals and may be used to enable the various encoding modes of the video encoder. The system control block 64 also controls the use of data encryption. Information about the type of encryption to be used in data transmission is passed from the encryption component 69 to the multiplexer/demultiplexer (MUX/DMUX unit) 70.
During data transmission from the multimedia terminal, the MUX/DMUX unit 70 combines the encoded and synchronized video and audio streams with data input from the telematic equipment 63 and possible control data to form a single bitstream. Information concerning the type of data encryption (if any) to be applied to the bitstream, provided by the encryption component 69, is used to select an encryption mode. Correspondingly, when a multiplexed and possibly encrypted multimedia bitstream is received, the MUX/DMUX unit 70 decrypts the bitstream, divides it into its constituent multimedia components and passes those components to the appropriate codec(s) and/or terminal equipment for decoding and reproduction.
It should be noted that the functional elements of the multimedia terminal, the video encoder, the decoder and the video signal codec according to the invention can be implemented as software or dedicated hardware, or as a combination of software and hardware. The video encoding and decoding methods according to the invention are particularly suitable for implementation in the form of a computer program comprising machine-readable instructions for carrying out the functional steps of the invention. For example, an encoder and decoder according to the present invention may be implemented as software code stored on a storage medium and executed in a computer, such as a personal desktop computer, to provide video encoding and/or decoding functionality for the computer.
If the multimedia terminal 60 is a mobile terminal, i.e. if it is equipped with a radio transceiver 73, those skilled in the art will understand that it may also comprise additional units. In one embodiment, it comprises a user interface with a display and a keyboard, which enables operation of the multimedia terminal 60 by a user, together with the necessary functional blocks, including a central processing unit such as a microprocessor, which controls the components responsible for the different functions of the multimedia terminal, a random access memory (RAM), a read-only memory (ROM) and a digital camera. The microprocessor's operating instructions, i.e. program code corresponding to the basic functions of the multimedia terminal 60, are stored in the read-only memory and can be executed by the microprocessor as required, for example under the control of the user. In accordance with the program code, the microprocessor uses the radio transceiver 73 to form a connection with a mobile communication network, enabling the multimedia terminal 60 to transmit information to, and receive information from, the mobile communication network over a radio path.
The microprocessor monitors the state of the user interface and controls the digital camera. In response to a user command, the microprocessor instructs the camera to record a digital image into the RAM. Once an image is captured, or alternatively during the capture process, the microprocessor segments the image into image segments (for example macroblocks) and uses the encoder to perform motion-compensated encoding of the segments in order to produce a compressed image sequence, as explained in the foregoing description. The user can instruct the multimedia terminal 60 to display the captured image on its display, or to transmit the compressed image sequence, using the radio transceiver 73, to another multimedia terminal, a videophone connected to a fixed-line network (PSTN) or some other communication device. In a preferred embodiment, transmission of the image data starts as soon as the first segment has been encoded, so that the recipient can start a corresponding decoding process with minimal delay.
FIG. 9 is a schematic diagram of a mobile telecommunications network according to one embodiment of the present invention. The multimedia terminal MS communicates with a base station BTS over a radio link. The base station BTS is further connected, via a so-called Abis interface, to a base station controller BSC, which controls and manages several base stations.
The entity formed by a number of base stations BTS (typically a few dozen) and a single base station controller BSC that controls them is called a base station subsystem BSS. In particular, the base station controller BSC manages radio communication channels and handovers. The base station controller BSC is also connected, via a so-called A-interface, to a mobile services switching centre MSC, which coordinates the establishment of connections to and from mobile stations. A further connection to the outside of the mobile communication network is made via the mobile services switching centre MSC. Outside the mobile communication network there may further reside other networks, such as the Internet or the Public Switched Telephone Network (PSTN), connected to the mobile communication network through a gateway GTW. In such an external network, or within the telecommunications network, there may be located video decoding or encoding stations, such as computers (PCs). In one embodiment of the invention, the mobile telecommunications network includes a video server VSRVR that provides video data to a mobile station MS subscribing to such a service. The video data is compressed using the motion-compensated video compression method described above. The video server may function as a gateway to an online video source, or it may contain previously recorded video programs. Typical videotelephony applications may involve, for example, two mobile stations, or a mobile station MS and a videophone connected to the PSTN, a PC connected to the Internet, or an H.261-compliant terminal connected to the Internet or the PSTN.
FIG. 7 shows a video encoder 700 according to an embodiment of the invention. FIG. 8 shows a video decoder 800 according to an embodiment of the invention.
The encoder 700 comprises an input 701 for receiving a video signal from a camera or other video source (not shown). It further comprises a DCT transformer 705, a quantizer 706, an inverse quantizer 709, an inverse DCT transformer 710, combiners 712 and 716, a prior sub-pixel interpolation component 730, a frame store 740, and an on-demand sub-pixel interpolation component 750 implemented in conjunction with the motion estimation component 760. The encoder further comprises a motion field encoding component 770 and a motion-compensated prediction component 780. Switches 702 and 714 are operated cooperatively by a control manager 720 to switch between an intra mode and an inter mode of video encoding. The encoder 700 also comprises a multiplexing unit (MUX/DMUX) 790, which forms a single bitstream from the various types of information produced by the encoder 700 for onward transmission to a remote receiving terminal or, for example, for storage on a mass storage medium such as a computer hard drive (not shown).
It should be noted that the presence and implementation of the prior sub-pixel interpolation component 730 and the on-demand sub-pixel interpolation component 750 in this encoder architecture depend on the manner in which the sub-pixel interpolation method according to the present invention is applied. In embodiments of the present invention in which no prior sub-pixel value interpolation is performed, the encoder 700 does not include the prior sub-pixel interpolation component 730. In other embodiments, only prior sub-pixel interpolation is performed, and the encoder therefore does not include the on-demand sub-pixel interpolation component 750. In embodiments in which both prior and on-demand sub-pixel value interpolation are performed, both components 730 and 750 are present in the encoder 700.
The operation of the encoder 700 according to the present invention will now be described in detail. In this description, it is assumed that each frame of the uncompressed video signal received from the video source at input 701 is received and processed macroblock by macroblock, preferably in raster-scan order. It is further assumed that when the encoding of a new video sequence starts, the first frame of the sequence is encoded in intra mode. The encoder is then programmed to encode each subsequent frame in inter format, unless one of the following conditions is met: 1) it is determined that the current frame being encoded is so dissimilar to the reference frame used in its prediction that excessive prediction error information would be produced; 2) a predefined intra-frame repetition interval has been exceeded; or 3) feedback is received from a receiving terminal indicating that a frame must be encoded in intra format.
The occurrence of condition 1) is detected by monitoring the output of the combiner 716. The combiner 716 forms the difference between the current macroblock being encoded and its prediction, generated in the motion-compensated prediction component 780. If a measure of this difference (for example, the sum of absolute differences of pixel values) exceeds a predetermined threshold, the combiner 716 informs the control manager 720 via control line 717, and the control manager 720 operates switches 702 and 714 to switch the encoder 700 into intra-coding mode. The occurrence of condition 2) is monitored by means of a timer or frame counter implemented in the control manager 720: if the timer expires, or the frame counter reaches a predetermined number of frames, the control manager 720 operates switches 702 and 714 to switch the encoder into intra-coding mode. Condition 3) is triggered if the control manager 720 receives a feedback signal, for example from a receiving terminal, via control line 718, indicating that an intra-frame refresh is required by the receiving terminal. Such a condition may arise, for example, if a previously transmitted frame is severely corrupted by interference during its transmission, making it impossible to decode at the receiver. In this situation, the receiver issues a request for the next frame to be encoded in intra format, thus re-initializing the coding sequence.
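The three intra-refresh conditions can be sketched as a simple mode-decision function. The threshold and interval values below are illustrative assumptions, not values taken from the patent.

```python
def choose_coding_mode(sad, frames_since_intra, intra_request,
                       sad_threshold=5000, intra_interval=132):
    """Return 'INTRA' if any of conditions 1)-3) holds, else 'INTER'.

    sad                -- sum of absolute differences between the current
                          macroblock and its motion-compensated prediction
    frames_since_intra -- frame counter since the last intra frame
    intra_request      -- feedback flag from the receiving terminal
    """
    if sad > sad_threshold:                    # 1) prediction too poor
        return "INTRA"
    if frames_since_intra >= intra_interval:   # 2) repetition interval expired
        return "INTRA"
    if intra_request:                          # 3) receiver asked for refresh
        return "INTRA"
    return "INTER"

print(choose_coding_mode(800, 10, False))   # normal operation: inter coding
print(choose_coding_mode(9000, 10, False))  # large prediction error: intra
```

In the encoder described here, an 'INTRA' decision corresponds to the control manager 720 operating switches 702 and 714.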
It will further be assumed that the encoder and decoder are implemented in such a way that motion vectors can be determined to a spatial resolution of up to one quarter pixel. As will be seen below, finer resolutions are also possible.
The operation of the encoder 700 in intra-coding mode will now be described. In intra mode, the control manager 720 operates the switch 702 to accept video input from input line 719. The video signal is received macroblock by macroblock from input 701 via input line 719, and each macroblock of original image pixels is transformed into DCT coefficients by the DCT transformer 705. The DCT coefficients are then passed to the quantizer 706, where they are quantized using a quantization parameter QP. The control manager 720 controls the selection of the quantization parameter QP via control line 722. Each DCT-transformed and quantized macroblock of intra-coded image information 723 that makes up the frame is passed from the quantizer 706 to the MUX/DMUX unit 790. The MUX/DMUX unit 790 combines the intra-coded image information with possible control information (for example header data, quantization parameter information and error correction data) to form a single bitstream of coded image information 725. As is well known to those skilled in the art, variable-length coding (VLC) is used to reduce the redundancy of the compressed video bitstream.
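The transform-and-quantize path described above can be sketched with a one-dimensional DCT-II and a uniform quantizer. The transform length, rounding rule and QP-to-step-size mapping below are illustrative assumptions; real codecs define these precisely.

```python
import math

def dct_1d(block):
    """Naive 1-D DCT-II with orthonormal scaling."""
    n = len(block)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(block))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def quantize(coeffs, qp):
    """Uniform quantization: divide by a step derived from QP and round."""
    step = 2 * qp  # assumed QP-to-step mapping, for illustration only
    return [round(c / step) for c in coeffs]

pixels = [100, 110, 120, 130]       # one row of a macroblock (illustrative)
coeffs = dct_1d(pixels)             # role of DCT transformer 705
levels = quantize(coeffs, qp=4)     # role of quantizer 706
print(levels)
```

Note how the smooth input concentrates energy in the first coefficient, while the high-frequency coefficients quantize to zero; it is this sparsity that the subsequent variable-length coding exploits.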
A locally decoded image is formed in the encoder 700 by passing the data output by the quantizer 706 through the inverse quantizer 709 and applying the inverse DCT transform 710 to the inverse-quantized data. The resulting data is then input to the combiner 712. In intra mode, switch 714 is set so that the input to the combiner 712 via switch 714 is zero. In this way, the operation performed by the combiner 712 is equivalent to passing through, unchanged, the decoded image data formed by the inverse quantizer 709 and the inverse DCT transform 710.
In embodiments of the present invention in which prior sub-pixel value interpolation is performed, the output from the combiner 712 is applied to the prior sub-pixel interpolation component 730. The input to the prior sub-pixel interpolation component 730 takes the form of decoded image blocks. In the prior sub-pixel interpolation component 730, each decoded macroblock is subjected to sub-pixel interpolation: a predetermined subset of sub-pixel values is calculated according to the interpolation method of the present invention and stored in the frame store 740 together with the decoded pixel values.
In embodiments in which prior sub-pixel interpolation is not performed, no prior sub-pixel interpolation component is present in the encoder architecture, and the output from the combiner 712, comprising the decoded blocks of image data, is applied directly to the frame store 740.
As subsequent macroblocks of the current frame are received and undergo the previously described encoding and local decoding steps in components 705, 706, 709, 710 and 712, a decoded version of the intra frame is built up in the frame store 740. When the last macroblock of the current frame has been intra-coded and subsequently decoded, the frame store 740 contains a complete decoded frame available for use as a prediction reference frame when encoding a subsequently received video frame in inter format. In embodiments of the present invention in which prior sub-pixel value interpolation is performed, the reference frame held in the frame store 740 is at least partially interpolated to sub-pixel resolution.
The operation of the encoder 700 in inter-coding mode will now be described. In inter mode, the control manager 720 operates the switch 702 to receive its input from line 721, which carries the output of the combiner 716. The combiner 716 forms prediction error information representing the difference between the macroblock of the current frame being encoded and its prediction, generated in the motion-compensated prediction component 780. The prediction error information is DCT-transformed in component 705 and quantized in component 706 to form a macroblock of DCT-transformed and quantized prediction error information. Each macroblock of DCT-transformed and quantized prediction error information is passed from the quantizer 706 to the MUX/DMUX unit 790. The MUX/DMUX unit 790 combines the prediction error information 723 with motion coefficients 724 (described below) and control information (for example header data, quantization parameter information and error correction data) to form a single bitstream of coded image information 725.
Locally decoded prediction error information for each macroblock of the inter-coded frame is then formed in the encoder 700 by passing the encoded prediction error information 723 output by the quantizer 706 through the inverse quantizer 709 and applying the inverse DCT transform in component 710. The resulting locally decoded macroblock of prediction error information is then input to combiner 712. In the inter mode, the switch 714 is set so that the combiner 712 also receives the motion-predicted macroblock for the current inter frame generated in the motion compensated prediction unit 780. Combiner 712 combines these two pieces of information to produce a block of reconstructed image data for the current frame.
As described above in connection with intra-coded frames, in embodiments of the present invention in which prior sub-pixel value interpolation is performed, the output of the combiner 712 is applied to the prior sub-pixel interpolation unit 730. Accordingly, the input to the prior sub-pixel value interpolation unit 730 in the inter coding mode is also in the form of decoded image data blocks. In the prior sub-pixel value interpolation block 730, each decoded macroblock is subjected to sub-pixel interpolation by calculating a predetermined subset of sub-pixel values according to the interpolation method of the present invention, the interpolated values being stored in the frame memory 740 together with the decoded pixel values. In embodiments where prior sub-pixel interpolation is not performed, no prior sub-pixel interpolation component is present in the encoder structure, and the output from combiner 712, comprising the decoded block of image data, is applied directly to frame memory 740.
As subsequent macroblocks of the video signal are received from the video source and undergo the previously described encoding and decoding steps in elements 705, 706, 709, 710, 712, a decoded version of the inter frame is established in frame store 740. When the last macroblock of the current frame has been encoded and subsequently decoded, the frame store 740 contains a complete decoded frame available for use as a prediction reference frame in encoding a subsequently received image frame in inter-frame format. In embodiments of the invention in which prior sub-pixel value interpolation is performed, the reference frame held in frame store 740 is at least partially interpolated to sub-pixel resolution.
The formation of a prediction for a macroblock of the current frame will now be described.
Any frame encoded in inter-frame format requires a reference frame for motion compensated prediction. This means that when encoding a video sequence, the first frame to be encoded, whether it is the first frame in the video sequence or some other frame, must be encoded in intra-frame format. This in turn means that by the time the video encoder 700 is switched to inter coding mode by the control manager 720, a complete reference frame, formed by locally decoding a previously encoded frame, is already available in the encoder's frame memory 740. In general, the reference frame is formed by locally decoding either an intra-coded frame or an inter-coded frame.
The first step in forming a prediction for a macroblock of the current frame is performed by the motion estimation block 760. The motion estimation block 760 receives the current macroblock of the frame being coded via line 727 and performs a block matching operation to identify a region in the reference frame that substantially corresponds to the current macroblock. The manner in which the block matching process is performed to sub-pixel resolution depends on the implementation of the encoder 700 and the degree to which prior sub-pixel interpolation is performed in accordance with the present invention. However, the basic principles underlying the block matching process are similar in all cases. Specifically, the motion estimation block 760 performs block matching by calculating difference values (e.g., sums of absolute differences) representing the difference in pixel values between the current macroblock under examination and candidate best-matching regions of pixels/sub-pixels in the reference frame. A difference value is calculated for all possible offsets (e.g., x, y displacements of one-quarter or one-eighth sub-pixel precision) between the macroblock of the current frame and candidate test regions within a predetermined search area of the reference frame, and the motion estimation block 760 determines the smallest calculated difference value. The offset between the macroblock in the current frame and the candidate test region of pixel/sub-pixel values in the reference frame that yields the smallest difference value defines the motion vector for the macroblock in question. In some embodiments of the invention, an initial estimate of the motion vector is first determined with unit pixel precision and then refined, as described above, to the desired level of sub-pixel precision.
In encoder embodiments where no prior sub-pixel value interpolation is performed, all sub-pixel values needed in the block matching process are calculated by the on-demand sub-pixel value interpolation component 750. The motion estimation component 760 controls the on-demand sub-pixel value interpolation component 750 to calculate each sub-pixel value required in the block matching process as and when it is needed. In this case, the motion estimation component 760 may be implemented to perform block matching as a one-step process, in which a motion vector having the desired sub-pixel resolution is found directly, or as a two-step process. If a two-step process is employed, the first step may comprise a search for, for example, a full or half pixel resolution motion vector, and a second step is performed to refine the motion vector to the desired sub-pixel resolution. Since block matching is an exhaustive process in which blocks of n x m pixels in the current frame are compared one by one with blocks of n x m pixels or sub-pixels in the interpolated reference frame, it should be appreciated that a given sub-pixel value calculated on demand by the on-demand sub-pixel value interpolation component 750 may need to be recalculated multiple times as successive difference values are determined. In a video encoder, this scheme is therefore not the most efficient option in terms of computational complexity/burden of computation.
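The exhaustive block matching process described above can be sketched as follows. This is an illustrative sketch only: the function name, the fixed search range and the pure-Python data layout are assumptions, and a real encoder would operate on a reference frame interpolated to sub-pixel resolution rather than the integer grid used here.

```python
def best_match_sad(cur, ref, search_range=4):
    """Exhaustive block matching: compare the current macroblock `cur`
    (a list of rows) against every candidate region of the reference
    search area `ref`, and return the offset (dy, dx) with the smallest
    sum of absolute differences (SAD).

    `cur`'s nominal (zero-offset) position inside `ref` is assumed to be
    displaced by `search_range` rows/columns from ref's top-left corner.
    """
    n, m = len(cur), len(cur[0])
    best = (0, 0, None)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y0, x0 = search_range + dy, search_range + dx
            sad = sum(abs(cur[i][j] - ref[y0 + i][x0 + j])
                      for i in range(n) for j in range(m))
            if best[2] is None or sad < best[2]:
                best = (dy, dx, sad)
    return best
```

A two-step search would first run this over unit-pixel offsets and then repeat it over the sub-pixel offsets surrounding the winning offset.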
In encoder embodiments that use only prior sub-pixel value interpolation, block matching may be performed as a one-step process, since all sub-pixel values of the reference frame needed to determine a motion vector were previously calculated to the desired sub-pixel resolution in block 730 and stored in frame memory 740. They are therefore directly available for use in the block matching process and can be retrieved from the frame memory 740 by the motion estimation component 760 as required. However, even in the case where all sub-pixel values are available from the frame memory 740, performing block matching as a two-step process may still be a more computationally efficient solution, since it requires fewer difference values to be computed. It should be appreciated that while full prior sub-pixel value interpolation reduces computational complexity in the encoder, it is not the most efficient scheme in terms of memory consumption.
In embodiments of the encoder where both prior and on-demand sub-pixel value interpolation are used, the motion estimation component 760 is implemented in such a manner that it can retrieve sub-pixel values previously calculated in the prior sub-pixel value interpolation component 730 and stored in the frame memory 740, and can further control the on-demand sub-pixel value interpolation component 750 to calculate any additional sub-pixel values that may be needed. The block matching process may be performed as a one-step or a two-step process. If a two-step implementation is used, the previously calculated sub-pixel values retrieved from the frame memory 740 may be used in the first step of the process, and the second step may use the sub-pixel values calculated by the on-demand sub-pixel value interpolation component 750. In this case, some of the sub-pixel values used in the second step of the block matching process may need to be calculated multiple times as successive comparisons are made, but the number of such repeated calculations is significantly smaller than if no prior sub-pixel value calculation were used. Also, memory consumption is reduced relative to embodiments that use only prior sub-pixel value interpolation.
Once the motion estimation block 760 has generated a motion vector for the macroblock of the current frame under examination, it outputs the motion vector to the motion field coding block 770. The motion field encoding unit 770 then approximates the motion vector received from the motion estimation unit 760 using a motion model. The motion model generally comprises a set of basis functions. More specifically, the motion field encoding unit 770 represents the motion vector as a set of coefficient values (referred to as motion coefficients) that, when multiplied by the basis functions, form an approximation of the motion vector. The motion coefficients 724 are passed from the motion field encoding block 770 to the motion compensated prediction block 780. The motion compensated prediction block 780 also receives the pixel/sub-pixel values of the best-matching candidate test region of the reference frame identified by motion estimation block 760. In fig. 7, these values are shown as being provided from the on-demand sub-pixel interpolation component 750 via line 729. In further embodiments of the present invention, the pixel values in question are provided by the motion estimation component 760 itself.
Using the approximate representation of the motion vector generated by motion field encoding unit 770 and the pixel/sub-pixel values of the best-matching candidate test region, the motion compensated prediction unit 780 generates a macroblock of predicted pixel values. The macroblock of predicted pixel values represents a prediction of the pixel values of the current macroblock, generated from the interpolated reference frame. The macroblock of predicted pixel values is passed to combiner 716, where it is subtracted from the corresponding macroblock of the new frame to generate the prediction error information 723 for the macroblock, as described above.
The motion coefficients 724 formed by the motion field coding block 770 are also passed to the MUX/DMUX unit 790, where they are combined with the prediction error information 723 for the macroblock in question and possible control information from the control manager 720 to form the encoded video data stream 725 for transmission to a receiving terminal.
The operation of the video decoder 800 according to the present invention will now be described in detail. Referring to fig. 8, the decoder 800 includes a demultiplexer unit (MUX/DMUX) 810 that receives and demultiplexes the encoded video data stream 725 from the encoder 700, an inverse quantizer 820, an inverse DCT transformer 830, a motion compensated prediction component 840, a frame memory 850, a combiner 860, a control manager 870, an output 880, a prior sub-pixel value interpolation component 845, and an on-demand sub-pixel interpolation component 890 associated with the motion compensated prediction component 840. In practice, the control manager 870 of the decoder 800 and the control manager 720 of the encoder 700 may be the same processor. This may be the case, for example, if the encoder 700 and decoder 800 are part of the same video signal codec.
Fig. 8 illustrates an embodiment that uses a combination of prior and on-demand sub-pixel value interpolation in the decoder. In other embodiments, only prior sub-pixel value interpolation is used, in which case the decoder 800 does not include the on-demand sub-pixel value interpolation component 890. In a preferred embodiment of the present invention, no prior sub-pixel value interpolation is used in the decoder, and the prior sub-pixel value interpolation block 845 is therefore omitted from the decoder structure. If both prior and on-demand sub-pixel value interpolation are performed, the decoder includes both components 845 and 890.
The control manager 870 controls the operation of the decoder 800 according to whether an intra- or inter-coded frame is being decoded. An intra/inter trigger control signal, which causes the decoder to switch between decoding modes, is derived, for example, from picture type information provided in the header portion of each compressed video frame received from the encoder. Along with other video codec control signals demultiplexed from encoded video data stream 725 by MUX/DMUX unit 810, the intra/inter trigger control signal is passed to control manager 870 via control line 815.
When decoding an intra frame, the encoded video data stream 725 is demultiplexed into intra-coded macroblocks and control information. For an intra-coded image frame, the encoded video data stream 725 does not include motion vectors. The decoding process is performed macroblock by macroblock. When the encoded information 723 for a macroblock is extracted from the video data stream by the MUX/DMUX unit 810, it is passed to the inverse quantizer 820. Based on the control information provided in video data stream 725, the control manager 870 controls the inverse quantizer 820 to apply the appropriate level of inverse quantization to the macroblock of encoded information. The inverse quantized macroblock is then inverse transformed in inverse DCT transformer 830 to form a decoded block of image information. The control manager 870 controls the combiner 860 to prevent any reference information from being used in the decoding of intra-coded macroblocks. The decoded block of image information is passed to the video output 880 of the decoder.
In embodiments of the decoder that employ prior sub-pixel value interpolation, the decoded block of image information (i.e., pixel values) produced by the inverse quantization and inverse transform operations performed in components 820 and 830 is passed to the prior sub-pixel value interpolation component 845, where sub-pixel value interpolation is performed according to the method of the invention. The details of the decoder implementation determine the extent of the prior sub-pixel value interpolation performed. In embodiments in which no on-demand sub-pixel value interpolation is performed, the prior sub-pixel value interpolation component 845 interpolates all sub-pixel values. In embodiments using a combination of prior and on-demand sub-pixel value interpolation, the prior sub-pixel value interpolation component 845 interpolates a certain subset of the sub-pixel values. This may comprise, for example, all sub-pixels at half-pixel locations, or a combination of sub-pixels at half-pixel and quarter-pixel locations. In any case, after prior sub-pixel value interpolation, the interpolated sub-pixel values are stored in the frame memory 850 along with the originally decoded pixel values. As subsequent macroblocks are decoded, sub-pixel interpolated and stored, a decoded frame at least partially interpolated to sub-pixel resolution is progressively assembled in the frame memory 850 and becomes available as a reference frame for motion compensated prediction.
In embodiments of the decoder that do not employ prior sub-pixel value interpolation, the decoded blocks of image information (i.e., pixel values) produced by the inverse quantization and inverse transform operations performed on the macroblock in components 820 and 830 are passed directly to frame memory 850. As subsequent macroblocks are decoded and stored, a decoded frame having unit pixel resolution is progressively assembled in the frame memory 850 and becomes available as a reference frame for motion compensated prediction.
When decoding an inter frame, the encoded video data stream 725 is demultiplexed into encoded prediction error information, associated motion coefficients 724 and control information for each macroblock of the frame. Again, the decoding process is performed macroblock by macroblock. When the encoded prediction error information 723 for a macroblock is extracted from the video data stream by the MUX/DMUX unit 810, it is passed to the inverse quantizer 820. Based on the control information received in video data stream 725, the control manager 870 controls the inverse quantizer 820 to apply the appropriate level of inverse quantization to the encoded prediction error information. The inverse quantized macroblock of prediction error information is then inverse transformed in the inverse DCT transformer 830, producing decoded prediction error information for the macroblock.
Motion coefficients 724 associated with the macroblock in question are extracted from the video data stream 725 by the MUX/DMUX unit 810 and passed to the motion compensated prediction unit 840, which reconstructs a motion vector for the macroblock using the same motion model that was used to encode the inter-coded macroblock in the encoder 700. The reconstructed motion vector approximates the motion vector originally determined by the motion estimation component 760 of the encoder. The decoder's motion compensated prediction component 840 uses the reconstructed motion vector to identify the location of a block of pixel/sub-pixel values in a prediction reference frame stored in the frame memory 850. The reference frame may be, for example, a previously decoded intra frame or a previously decoded inter frame. In either case, the block of pixel/sub-pixel values indicated by the reconstructed motion vector represents the prediction for the macroblock in question.
The reconstructed motion vector may point to any pixel or sub-pixel position. If the motion vector indicates that the prediction for the current macroblock is formed from pixel values (i.e., values of pixels at unit pixel locations), these can simply be retrieved from the frame memory 850, since the pixel values in question were obtained directly during decoding of the frame. If the motion vector indicates that the prediction for the current macroblock is formed from sub-pixel values, these values are either retrieved from the frame memory 850 or calculated in the on-demand sub-pixel interpolation component 890. Whether the sub-pixel values have to be calculated or can simply be retrieved from the frame memory depends on the degree of prior sub-pixel value interpolation used in the decoder.
In embodiments of the decoder that do not employ prior sub-pixel value interpolation, the required sub-pixel values are all calculated in the on-demand sub-pixel value interpolation component 890. On the other hand, in embodiments where all sub-pixel values are interpolated in advance, the motion compensated prediction unit 840 can retrieve the desired sub-pixel values directly from the frame memory 850. In embodiments using a combination of prior and on-demand sub-pixel value interpolation, the operations required to obtain the desired sub-pixel values depend on which sub-pixel values were previously interpolated. Taking as an example an embodiment in which all sub-pixel values at half-pixel positions are calculated in advance, it is clear that if the reconstructed motion vector for a macroblock points to a pixel at a unit pixel position or a sub-pixel at a half-pixel position, all pixel or sub-pixel values needed to form the prediction for the macroblock are present in the frame memory 850 and can be retrieved from it by the motion compensated prediction component 840. However, if the motion vector points to a sub-pixel at a quarter-pixel location, the sub-pixel values needed to form the prediction for the macroblock are not present in the frame memory 850 and are therefore calculated in the on-demand sub-pixel value interpolation component 890. In this case, the on-demand sub-pixel value interpolation component 890 retrieves from the frame memory 850 any pixels or sub-pixels needed to perform the interpolation and applies the interpolation method described below. The sub-pixel values calculated in the on-demand sub-pixel value interpolation component 890 are passed to the motion compensated prediction component 840.
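The retrieve-or-compute decision just described can be sketched as follows for the embodiment in which half-pixel positions are interpolated in advance. The function and data-structure names are illustrative assumptions: motion vector components are taken to be in quarter-pixel units, `frame_memory` stands in for frame memory 850 (holding unit- and half-pixel positions), and `interpolate_on_demand` stands in for component 890.

```python
def prediction_value(mv_y, mv_x, frame_memory, interpolate_on_demand):
    """Return the reference value a motion vector points to.
    mv_y, mv_x are in 1/4-pixel units: if both components are even, the
    vector points to a unit- or half-pixel position, pre-stored in
    frame_memory; an odd component means a quarter-pixel position that
    must be interpolated on demand."""
    if mv_y % 2 == 0 and mv_x % 2 == 0:
        return frame_memory[(mv_y, mv_x)]
    return interpolate_on_demand(mv_y, mv_x)
```

In a full decoder this decision would be taken once per pixel of the predicted macroblock, with the on-demand path itself reading pixels and half-pixel values back from the frame memory.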
Once a prediction for the macroblock has been obtained, the prediction (i.e., a macroblock of predicted pixel values) is passed from the motion compensated prediction component 840 to the combiner 860, where it is combined with the decoded prediction error information for the macroblock to form a reconstructed block of image data, which is in turn passed to the video output 880 of the decoder.
It should be appreciated that in practical implementations of the encoder 700 and decoder 800, the degree to which a frame is sub-pixel interpolated in advance, and accordingly the amount of on-demand sub-pixel value interpolation performed, can be chosen depending on the hardware implementation of the video encoder 700 or the environment in which it is to be used. For example, if the memory available to the video encoder is limited, or memory must be reserved for other functions, it is appropriate to limit the amount of prior sub-pixel value interpolation performed. In other cases, where the microprocessor performing the video encoding operations has limited processing power, for example where the number of operations that can be performed per second is comparatively low, it may be more appropriate to limit the amount of on-demand sub-pixel value interpolation performed. In a mobile communications environment, for example, when video encoding and decoding functions are incorporated in a mobile telephone or similar radio terminal connected to a mobile telephone network, both storage and processing capabilities may be limited. In this case, a combination of prior and on-demand sub-pixel value interpolation may be the best choice for achieving an efficient implementation of the video encoder. In the video decoder 800, prior sub-pixel value interpolation is generally not optimal, as it typically results in the computation of many sub-pixel values that are never actually used in the decoding process. However, it should be understood that the encoder and decoder can be implemented to use the same division between prior and on-demand sub-pixel value interpolation, although different amounts of prior and on-demand interpolation can also be used in the encoder and decoder in order to optimize the operation of each.
Although the foregoing description does not describe the construction of bi-directionally predicted frames (B-frames) in the encoder 700 and the decoder 800, it should be understood that this capability may be provided in embodiments of the present invention. The provision of such capability is considered to be within the ability of those skilled in the art.
The encoder 700 or decoder 800 according to the invention can be implemented in hardware or software, or using a suitable combination of hardware and software. An encoder or decoder implemented in software may be, for example, a stand-alone program, or a software building block that can be used by various programs. In the above description and in the drawings the functional components are shown as separate units, but these components can be implemented, for example, in a single software program unit.
The encoder 700 and the decoder 800 can be further combined to form a video signal codec having encoding and decoding functions. In addition to being implemented in a multimedia terminal, such a codec may also be implemented in a network. The codec according to the invention may be a computer program or a computer program element or may be implemented at least partly in hardware.
The sub-pixel interpolation method used in the encoder 700 and the decoder 800 will now be described in detail. First the general concept of the method is presented and then two preferred embodiments are presented. In the first preferred embodiment, sub-pixel value interpolation is performed for 1/4 pixel resolution, while in the second preferred embodiment the method is extended to 1/8 pixel resolution.
It should be noted that the interpolation must produce exactly the same values in the encoder and decoder, but its implementation may be optimized separately for each. For example, in an encoder according to the first embodiment of the invention, in which sub-pixel value interpolation is performed to 1/4 pixel resolution, it is most efficient to calculate the 1/2 resolution sub-pixel values in advance and the 1/4 resolution sub-pixel values on demand, only when they are needed in the motion estimation process. This has the effect of limiting memory usage while the computational complexity/burden of computation remains at an acceptable level. On the other hand, in the decoder it is beneficial not to pre-compute any sub-pixel values at all. Thus, it should be understood that the preferred embodiment of the decoder does not include the prior sub-pixel value interpolation component 845, and all sub-pixel value interpolation is performed in the on-demand sub-pixel value interpolation component 890.
In the description of the interpolation method provided below, reference will be made to the pixel positions depicted in fig. 12a. The pixel label A in the figure indicates an original pixel (i.e., a pixel at unit horizontal and unit vertical position). The other alphabetically labeled positions represent the sub-pixels to be interpolated. The description follows the conventions introduced previously for denoting pixel and sub-pixel locations.
The steps required to interpolate all of the sub-pixel positions are described in the following. The value of the 1/2 resolution sub-pixel labeled b is obtained by first computing an intermediate value b using a K-th order filter according to:

b = ∑ xiAi, the sum being taken over i = 1, ..., K (9)
where xi is a vector of filter coefficients, Ai is a corresponding vector of original pixel values A at unit horizontal and unit vertical positions, and K is an integer defining the order of the filter. Therefore, equation (9) can be re-expressed as:
b = x1A1 + x2A2 + x3A3 + ... + xK-1AK-1 + xKAK (10)
Depending on the embodiment, the values of the filter coefficients xi and the order K of the filter may vary. Likewise, different coefficient values may be used in the calculation of different sub-pixels in one embodiment. In other embodiments, the values of the filter coefficients xi and the order of the filter may depend on which 1/2 resolution sub-pixel b is being interpolated. The pixels Ai are located symmetrically with respect to the 1/2 resolution sub-pixel b being interpolated and are the nearest neighbors of that sub-pixel. If a 1/2 resolution sub-pixel b at a half horizontal and unit vertical position is being interpolated, the pixels Ai lie horizontally with respect to b (as shown in fig. 12b). If a 1/2 resolution sub-pixel b at a unit horizontal and half vertical position is being interpolated, the pixels Ai lie vertically with respect to b (as shown in fig. 12c).
The final value of the 1/2 resolution sub-pixel b is calculated by dividing the intermediate value b by a constant scale1, truncating the result to obtain an integer and clipping it to lie in the range [0, 2^n - 1], where n is the number of bits used to represent a pixel value. In alternative embodiments of the present invention, rounding may be performed instead of truncation. The constant scale1 is preferably chosen to be equal to the sum of the filter coefficients xi.
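As a concrete sketch of the computation just described, the following assumes a six-tap filter with coefficients (1, -5, 20, 20, -5, 1), so that the constant divisor is 32, the sum of the coefficients. Both the coefficient values and the order K = 6 are illustrative choices, not fixed by the method.

```python
def half_pixel_b(A, n_bits=8):
    """Final value of a 1/2 resolution sub-pixel b from K = 6 original
    pixels A placed symmetrically about b (horizontally or vertically).
    Illustrative taps (1, -5, 20, 20, -5, 1); divisor 32 = sum of taps."""
    taps = (1, -5, 20, 20, -5, 1)
    b_intermediate = sum(t * a for t, a in zip(taps, A))  # equation (10)
    b = b_intermediate // 32             # divide by the constant, truncate
    return max(0, min(b, (1 << n_bits) - 1))  # clip to [0, 2^n - 1]
```

For a flat area the filter is transparent: `half_pixel_b([128] * 6)` returns 128, since the taps sum to the divisor.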
The value of the 1/2 resolution sub-pixel labeled c is obtained by first computing an intermediate value c using an M-th order filter according to:

c = ∑ yibi, the sum being taken over i = 1, ..., M (11)
where yi is a vector of filter coefficients and bi is a corresponding vector of intermediate values b in either the horizontal or the vertical direction, i.e.:
c = y1b1 + y2b2 + y3b3 + ... + yM-1bM-1 + yMbM (12)
Depending on the embodiment, the values of the filter coefficients yi and the order M of the filter may vary. Likewise, different coefficient values may be used in the calculation of different sub-pixels in one embodiment. The values bi are preferably the intermediate values of the 1/2 resolution sub-pixels b that are located symmetrically with respect to the 1/2 resolution sub-pixel c being interpolated and are its nearest neighboring sub-pixels. In one embodiment of the present invention, the 1/2 resolution sub-pixels b are positioned horizontally with respect to sub-pixel c, and in an alternative embodiment they are positioned vertically with respect to sub-pixel c.
The final value of the 1/2 resolution sub-pixel c is calculated by dividing the intermediate value c by a constant scale2, truncating the result to obtain an integer and clipping it to lie in the range [0, 2^n - 1]. In alternative embodiments of the present invention, rounding may be performed instead of truncation. The constant scale2 is preferably chosen to be equal to scale1 x scale1.
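The computation of sub-pixel c can be sketched in the same illustrative terms: the six-tap filter (1, -5, 20, 20, -5, 1) is an assumed choice, the inputs are intermediate values b (i.e. values not yet divided by the first constant), and the divisor is accordingly the squared sum of the coefficients, 32 x 32 = 1024.

```python
def half_pixel_c(b_intermediates, n_bits=8):
    """Final value of a 1/2 resolution sub-pixel c from M = 6 intermediate
    values b taken along a row or column, as in equation (12).
    Illustrative taps (1, -5, 20, 20, -5, 1); divisor 32 * 32 = 1024."""
    taps = (1, -5, 20, 20, -5, 1)
    c_intermediate = sum(t * b for t, b in zip(taps, b_intermediates))
    c = c_intermediate // 1024           # divide by the constant, truncate
    return max(0, min(c, (1 << n_bits) - 1))  # clip to [0, 2^n - 1]
```

Working from the unscaled intermediate values avoids the intermediate truncation and clipping that a two-stage division would introduce.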
It should be noted that the results obtained using the intermediate values b in the horizontal direction are the same as those obtained using the intermediate values b in the vertical direction.
There are two alternative ways of interpolating the value of the 1/4 resolution sub-pixel labeled h. Both involve linear interpolation along a diagonal linking 1/2 resolution sub-pixels adjacent to the 1/4 resolution sub-pixel h being interpolated. In the first embodiment, the value of sub-pixel h is calculated by averaging the values of the two 1/2 resolution sub-pixels b closest to sub-pixel h. In the second embodiment, the value of sub-pixel h is calculated by averaging the values of the closest pixel a and the closest 1/2 resolution sub-pixel c. It will be appreciated that this provides the possibility of using different combinations of diagonal interpolation to determine the value of sub-pixel h within different groups of four image pixels a. It will also be appreciated, however, that the same combination should be used in both the encoder and the decoder in order to produce exactly the same interpolation result. Fig. 13 depicts the four possible options for diagonal interpolation of sub-pixel h within a four-pixel neighborhood of an image. Simulations in a TML environment have demonstrated that both embodiments produce similar compression efficiency. The second embodiment has higher complexity, because the calculation of sub-pixel c requires the calculation of several intermediate values; the first embodiment is therefore preferred.
The values of the 1/4 resolution sub-pixels labeled d and g are calculated from their nearest horizontal neighbors using linear interpolation. In other words, the value of 1/4 resolution sub-pixel d is obtained by averaging its nearest horizontal neighbors, namely original image pixel a and 1/2 resolution sub-pixel b. Similarly, the value of 1/4 resolution sub-pixel g is obtained by averaging its two nearest horizontally adjacent 1/2 resolution sub-pixels, b and c.
The values of the 1/4 resolution sub-pixels labeled e, f, and i are calculated from their nearest vertical neighbors using linear interpolation. More specifically, the value of 1/4 resolution sub-pixel e is obtained by averaging its two nearest vertical neighbors, original image pixel a and 1/2 resolution sub-pixel b. Similarly, the value of 1/4 resolution sub-pixel f is obtained by averaging its two nearest vertically adjacent 1/2 resolution sub-pixels, b and c. In one embodiment of the invention, the value of the 1/4 resolution sub-pixel i is obtained in the same manner as just described for 1/4 resolution sub-pixel f. However, in an alternative embodiment of the present invention, in common with the previously described H.26L test models TML5 and TML6, the value of 1/4 resolution sub-pixel i is determined from the values of the four closest original image pixels according to (A1 + A2 + A3 + A4 + 2) / 4.
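The linear interpolation of the 1/4 resolution sub-pixels described above reduces to simple integer averaging. A minimal sketch follows; the function names are illustrative, and the truncating variant of the average is shown.

```python
def quarter_pixel(v1, v2):
    """1/4 resolution sub-pixel as the truncated average of its two
    nearest neighbors, e.g. d = (a + b) / 2 from pixel a and sub-pixel b."""
    return (v1 + v2) // 2

def quarter_pixel_i_corners(a1, a2, a3, a4):
    """Alternative value for sub-pixel i from the four closest original
    image pixels, per (A1 + A2 + A3 + A4 + 2) / 4."""
    return (a1 + a2 + a3 + a4 + 2) // 4
```

The same `quarter_pixel` average serves d, e, f, g and (in the diagonal embodiments) h; only the choice of the two neighbor values differs.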
It should also be noted that in all cases where an average including pixel and/or sub-pixel values is determined, the average may be formed in any suitable manner. For example, the value of the 1/4 resolution sub-pixel d can be defined as d = (a + b)/2, or as d = (a + b + 1)/2. The addition of 1 to the sum of the values of pixel a and 1/2 resolution sub-pixel b has the effect that any rounding or truncation operation subsequently applied rounds the value of d up to the next highest integer value. This holds for any integer value of the sum, and the technique can be used in any averaging operation performed according to the method of the invention in order to control the effect of rounding or truncation.
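The effect of the optional addition of 1 can be sketched with integer arithmetic. This is an illustrative sketch, not part of the patent text; the function names are arbitrary.

```python
def average_trunc(a, b):
    """Average with truncating division: a half-valued result rounds down."""
    return (a + b) // 2

def average_round_up(a, b):
    """Adding 1 to the sum before truncating division rounds halves upward."""
    return (a + b + 1) // 2

# With a = 100 and b = 103 the exact average is 101.5:
# truncation alone gives 101, while the +1 variant yields 102.
# When the sum is even, both variants agree.
```
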
It should be noted that sub-pixel value interpolation according to the present invention provides advantages over both TML5 and TML6.
In contrast to TML5, in which the values of some 1/4 resolution sub-pixels depend on previously interpolated values obtained for other 1/4 resolution sub-pixels, in the method according to the present invention all 1/4 resolution sub-pixels are calculated from original image pixels or 1/2 resolution sub-pixel positions using linear interpolation. Therefore, the loss of accuracy that arises in TML5 when 1/4 resolution sub-pixel values are calculated from other 1/4 resolution sub-pixels, due to the intermediate truncation and clipping of those other sub-pixel values, does not arise in the method according to the invention. Specifically, referring to Fig. 12a, 1/4 resolution sub-pixel h (and sub-pixel i in one embodiment of the invention) is interpolated diagonally in order to remove any dependence on other 1/4 resolution sub-pixels. Furthermore, compared with TML5, the method according to the invention reduces the number of computations (and hence the number of processor cycles) required to obtain those 1/4 resolution sub-pixel values in the decoder. In addition, the calculation of any 1/4 resolution sub-pixel value requires a substantially similar number of operations to that required for any other 1/4 resolution sub-pixel value. More specifically, where the required 1/2 resolution sub-pixel values are already available, for example because they have been calculated previously, the number of operations required to interpolate one 1/4 resolution sub-pixel value from the pre-calculated 1/2 resolution sub-pixel values is the same as that required to calculate any other 1/4 resolution sub-pixel value from the available 1/2 resolution sub-pixel values.
In contrast to TML6, the method according to the invention does not require the use of high-precision arithmetic for the calculation of all sub-pixel values. Specifically, since all of the 1/4 resolution sub-pixel values are calculated from original image pixels or 1/2 resolution sub-pixel values using linear interpolation, lower-precision arithmetic can be used for that interpolation. As a result, in a hardware implementation of the inventive method, for example in an ASIC (application specific integrated circuit), the lower-precision arithmetic reduces the number of elements (e.g. gates) that must be dedicated to the calculation of the 1/4 resolution sub-pixel values. This in turn reduces the total area of silicon that must be dedicated to the interpolation function. The advantage provided by the present invention is particularly significant in this regard, since the majority of the sub-pixels are 1/4 resolution sub-pixels (12 of the 15 sub-pixels shown in Fig. 12a). In software implementations, where sub-pixel interpolation is performed using the standard instruction set of a general-purpose CPU (central processing unit) or using a DSP (digital signal processor), a reduction in the precision of the required arithmetic generally results in an increase in the speed at which the calculations can be performed. This is particularly beneficial in 'low-cost' implementations where it is desirable to use a general-purpose CPU rather than any form of ASIC.
The method according to the present invention provides a further advantage compared with TML5. As stated previously, at any given time the decoder only needs the value of 1 of the 15 sub-pixel positions, namely the sub-pixel position indicated by the received motion vector information. Therefore, it is beneficial if the value of a sub-pixel at any sub-pixel position can be calculated with the minimum number of steps that yields a correct interpolated value. The method according to the invention provides this capability. As set forth in the detailed description provided above, the 1/2 resolution sub-pixel c may be interpolated by filtering in either the vertical or the horizontal direction, the same value for c being obtained regardless of whether horizontal or vertical filtering is used. The decoder can exploit this property to obtain the required values in such a way that the number of operations is reduced when calculating the values of the 1/4 resolution sub-pixels f and g. For example, if the decoder needs the value of 1/4 resolution sub-pixel f, it is beneficial to interpolate the value of c in the vertical direction; if the value of 1/4 resolution sub-pixel g is required, it is beneficial to interpolate the value of c in the horizontal direction. Overall, therefore, the method according to the invention provides flexibility in the way in which the values of certain 1/4 resolution sub-pixels are derived. Such flexibility is not provided in TML5.
Two specific embodiments will now be described in detail. The first represents the preferred embodiment for calculating sub-pixel values with a resolution of 1/4 pixel, while in the second embodiment the method according to the invention is extended to the calculation of sub-pixel values with a resolution of 1/8 pixel. For both embodiments, a comparison is provided between the computational complexity/burden resulting from use of the method according to the invention and that resulting from interpolation according to TML5 and TML6 in an equivalent environment.
A preferred embodiment for interpolating sub-pixels at 1/4 pixel resolution will be described with reference to Figs. 12a, 12b and 12c. In the following description, it is assumed that all image pixels and the final interpolated values of the sub-pixels are represented with 8 bits.
Calculation of 1/2 resolution sub-pixels at i) half-unit horizontal and unit vertical positions and ii) unit horizontal and half-unit vertical positions.
1. The value of the sub-pixel at a half-unit horizontal and unit vertical position, i.e. 1/2 resolution sub-pixel b in Fig. 12a, is obtained by first calculating the intermediate value b = (A1 - 5A2 + 20A3 + 20A4 - 5A5 + A6) using the six pixels (A1 to A6) shown in Figs. 12b and 12c, which are located at unit horizontal and unit vertical positions in the row or column of pixels containing b and are placed symmetrically with respect to b. The final value of 1/2 resolution sub-pixel b is calculated as (b + 16)/32, where division is performed with truncation. The result is clipped to the range [0, 255].
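Step 1 can be expressed compactly as follows. This is an illustrative sketch; the function name and the list-based interface are not from the patent.

```python
def half_pel_b(A):
    """1/2 resolution sub-pixel b from the six closest full pixels A1..A6
    in the row or column containing b (tap set 1, -5, 20, 20, -5, 1)."""
    A1, A2, A3, A4, A5, A6 = A
    b_mid = A1 - 5 * A2 + 20 * A3 + 20 * A4 - 5 * A5 + A6  # intermediate value
    b = (b_mid + 16) // 32        # scale back with truncating division
    return min(255, max(0, b))    # clip the result to [0, 255]
```

The taps sum to 32, so after the (b + 16)/32 scaling a flat region is reproduced with unit gain; clipping handles the over- and undershoot the negative taps can produce at sharp edges.
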
Calculation of 1/2 resolution sub-pixels at half-unit horizontal and half-unit vertical positions.
2. The value of the sub-pixel at the half-unit horizontal and half-unit vertical position, i.e. 1/2 resolution sub-pixel c in Fig. 12a, is calculated as c = (b1 - 5b2 + 20b3 + 20b4 - 5b5 + b6 + 512)/1024 using the intermediate values b of the six closest 1/2 resolution sub-pixels, which lie in the row or column of sub-pixels containing c and are placed symmetrically about c. Again, division is performed with truncation, and the result is clipped to the range [0, 255]. As previously explained, using the intermediate values b of the 1/2 resolution sub-pixels in the horizontal direction yields the same result as using the intermediate values b of the 1/2 resolution sub-pixels in the vertical direction. Thus, in an encoder according to the invention, the direction used for the interpolation of sub-pixel c can be selected according to whichever suits the implementation best. In a decoder according to the invention, the direction is selected according to which 1/4 resolution sub-pixel is to be interpolated using the result obtained for 1/2 resolution sub-pixel c.
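The equivalence of the two filtering directions follows from the separability of the two-dimensional filter: because the intermediate values b are combined without any intermediate truncation, row-first and column-first filtering produce the same weighted sum. A sketch (the names and the 6 × 6 block interface are illustrative assumptions):

```python
TAPS = (1, -5, 20, 20, -5, 1)

def intermediate_b(vals):
    """Unscaled 6-tap intermediate value b (no truncation applied yet)."""
    return sum(t * v for t, v in zip(TAPS, vals))

def half_pel_c(P, horizontal_first=True):
    """1/2 resolution sub-pixel c from a 6 x 6 block P of full pixels.
    Intermediate values b are formed along rows or along columns first;
    the final scaling (.. + 512) // 1024 is applied only once, so both
    orders give identical results."""
    if horizontal_first:
        mids = [intermediate_b(row) for row in P]
    else:
        mids = [intermediate_b(col) for col in zip(*P)]
    c = (intermediate_b(mids) + 512) // 1024
    return min(255, max(0, c))
```

An encoder or decoder is therefore free to pick whichever direction needs fewer intermediate values for the sub-pixels actually requested.
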
Calculation of 1/4 resolution sub-pixels at i) quarter-unit horizontal and unit vertical positions; ii) quarter-unit horizontal and half-unit vertical positions; iii) unit horizontal and quarter-unit vertical positions; and iv) half-unit horizontal and quarter-unit vertical positions.
3. Using the closest original image pixel a and the closest 1/2 resolution sub-pixel b in the horizontal direction, the value of the 1/4 resolution sub-pixel d at a quarter-unit horizontal and unit vertical position is calculated as d = (a + b)/2. Similarly, using the two closest 1/2 resolution sub-pixels in the horizontal direction, the value of the 1/4 resolution sub-pixel g at a quarter-unit horizontal and half-unit vertical position is calculated as g = (b + c)/2. In a similar manner, using the closest original image pixel a and the closest 1/2 resolution sub-pixel b in the vertical direction, the value of the 1/4 resolution sub-pixel e at a unit horizontal and quarter-unit vertical position is calculated as e = (a + b)/2. Using the two closest 1/2 resolution sub-pixels in the vertical direction, the value of the 1/4 resolution sub-pixel f at a half-unit horizontal and quarter-unit vertical position is determined as f = (b + c)/2. In all cases, division is performed with truncation.
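Step 3 can be sketched as below; b_row and b_col are illustrative names (not used in the patent) for the 1/2 resolution sub-pixels lying in the same row and the same column as pixel a, respectively.

```python
def quarter_pels(a, b_row, b_col, c):
    """1/4 resolution sub-pixels d, g, e, f (labels as in Fig. 12a),
    each a truncating average of its two nearest neighbors."""
    return {
        "d": (a + b_row) // 2,   # quarter-unit horizontal, unit vertical
        "g": (b_col + c) // 2,   # quarter-unit horizontal, half-unit vertical
        "e": (a + b_col) // 2,   # unit horizontal, quarter-unit vertical
        "f": (b_row + c) // 2,   # half-unit horizontal, quarter-unit vertical
    }
```
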
Calculation of 1/4 resolution sub-pixels at quarter-unit horizontal and quarter-unit vertical positions.
4. Using the two nearest 1/2 resolution sub-pixels in the diagonal direction, the value of the 1/4 resolution sub-pixel h at a quarter-unit horizontal and quarter-unit vertical position is calculated as h = (b1 + b2)/2. Again, division is performed with truncation.
5. Using the four closest original image pixels A, the value of the 1/4 resolution sub-pixel labeled i is calculated as i = (A1 + A2 + A3 + A4 + 2)/4. Again, division is performed with truncation.
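Steps 4 and 5 in a minimal sketch (function names are illustrative):

```python
def quarter_pel_h(b1, b2):
    """1/4 resolution sub-pixel h: truncating average of the two
    nearest 1/2 resolution sub-pixels along the diagonal."""
    return (b1 + b2) // 2

def quarter_pel_i(A1, A2, A3, A4):
    """1/4 resolution sub-pixel i from the four closest original
    pixels; the +2 makes the truncating division round to nearest."""
    return (A1 + A2 + A3 + A4 + 2) // 4
```
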
An analysis of the computational complexity of the first preferred embodiment of the present invention will now be provided.
In an encoder, the same sub-pixel value is likely to be calculated multiple times. Thus, as explained previously, the complexity of the encoder can be reduced by pre-calculating all sub-pixel values and storing them in memory. However, this solution increases memory usage considerably. In the preferred embodiment of the present invention, with motion vector accuracy of 1/4 pixel in both the horizontal and vertical directions, storing the pre-computed sub-pixel values for an entire image requires 16 times the memory needed to store the original non-interpolated image pixels. To reduce memory usage, all 1/2 resolution sub-pixels can be interpolated in advance, while 1/4 resolution sub-pixels are calculated on demand, i.e. only when they are needed. According to the method of the present invention, on-demand interpolation of 1/4 resolution sub-pixel values requires only linear interpolation from 1/2 resolution sub-pixels. Since only 8 bits are needed to represent each pre-computed 1/2 resolution sub-pixel, the memory required to store the pre-computed 1/2 resolution sub-pixels is four times the original image memory.
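The memory figures quoted above follow directly from the sampling grid sizes. A sketch of the arithmetic (illustrative; 8 bits per stored sample assumed throughout):

```python
def memory_factor(grid_step_divisor):
    """Memory multiplier relative to the original image when every
    sample of a 1/divisor-pixel grid is stored: divisor ** 2,
    since both dimensions grow by the divisor."""
    return grid_step_divisor ** 2

# Storing the full 1/4 resolution grid costs 16x the original image,
# while storing only the 1/2 resolution grid costs 4x.
```
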
However, if the same strategy of pre-computing all 1/2 sharpness sub-pixels for prior interpolation is used in conjunction with the direct interpolation scheme of TML6, the memory requirement is increased to 9 times the memory required to store the original non-interpolated image. This result comes from the fact that a larger number of bits is required to store the high precision intermediate values associated with each 1/2 definition subpixel in TML 6. Furthermore, the complexity of sub-pixel interpolation during motion estimation of TML6 is higher since scaling and limiting must be performed for each 1/2 and 1/4 sub-pixel position.
The complexity of the sub-pixel value interpolation method according to the present invention when applied in a video decoder is compared below with the complexity of the interpolation schemes used in TML5 and TML6. Throughout the subsequent analysis, it is assumed that the interpolation of any sub-pixel value is performed in each method using only the minimum number of steps required to obtain a correct interpolated value. It is further assumed that each method is implemented on a block basis, i.e. that a common intermediate value needed within a particular N × M block to be interpolated is calculated only once. An illustrative example is provided in Fig. 14. Referring to Fig. 14, in order to compute a 4 × 4 block of 1/2 resolution sub-pixels c, a 9 × 4 block of 1/2 resolution sub-pixels b is first computed.
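The 9 × 4 figure in Fig. 14 follows from the filter support: producing N output values with a T-tap filter requires N + T − 1 input samples along the filtering direction. A sketch of that arithmetic (the function name is illustrative):

```python
def inputs_needed(outputs, taps=6):
    """Samples needed along the filtering direction to produce
    `outputs` filtered values with a `taps`-tap filter."""
    return outputs + taps - 1

# A 4 x 4 block of c, filtered vertically from the intermediate
# values b, needs a (4 + 6 - 1) x 4 = 9 x 4 block of b values.
```
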
Compared to the sub-pixel value interpolation method used in TML5, the method according to the invention has a lower computational complexity for the following reasons:
1. Unlike the sub-pixel value interpolation scheme used in TML5, according to the method of the present invention the value of 1/2 resolution sub-pixel c can be obtained by filtering in either the vertical or the horizontal direction. Thus, to reduce the number of operations, 1/2 resolution sub-pixel c can be interpolated in the vertical direction if a value of 1/4 resolution sub-pixel f is required, and in the horizontal direction if a value of 1/4 resolution sub-pixel g is required. As an example, Fig. 15 shows all of the 1/2 resolution sub-pixel values that have to be calculated in order to interpolate the value of 1/4 resolution sub-pixel g in an image data block defined by 4 × 4 original image pixels, using the interpolation method of TML5 (Fig. 15a) and using the method according to the invention (Fig. 15b). In this example, sub-pixel value interpolation according to TML5 requires a total of 88 1/2 resolution sub-pixels to be interpolated, whereas the method according to the present invention requires the computation of only 72 1/2 resolution sub-pixels. As can be seen from Fig. 15b, according to the invention the 1/2 resolution sub-pixel c is interpolated in the horizontal direction in order to reduce the number of required calculations.
2. According to the method of the present invention, the 1/4 resolution sub-pixel h is calculated by linear interpolation from its two nearest 1/2 resolution sub-pixel neighbors in the diagonal direction. The number of different 1/2 resolution sub-pixels that must be calculated in order to obtain the value of 1/4 resolution sub-pixel h within a 4 × 4 block of original image pixels is shown for sub-pixel value interpolation according to TML5 and for the method according to the present invention in Figs. 16(a) and 16(b), respectively. Using the method according to TML5 necessitates the interpolation of a total of 56 1/2 resolution sub-pixels, whereas the method according to the invention requires the interpolation of only 40 1/2 resolution sub-pixels.
Table 1 summarizes the decoder complexity of the three sub-pixel value interpolation methods considered herein: that according to TML5, the direct interpolation used in TML6, and the method of the present invention. The complexity is measured in terms of the number of 6-tap filter and linear interpolation operations performed. The interpolation of the 1/4 resolution sub-pixel i is based on i = (A1 + A2 + A3 + A4 + 2)/4, which is a bilinear interpolation and effectively comprises two linear interpolation operations. The operations required to interpolate sub-pixel values within one 4 × 4 block of original image pixels are listed for each of the 15 sub-pixel positions, which, for ease of reference, are numbered according to the scheme shown in Fig. 17. Referring to Fig. 17, position 1 is the position of the original image pixel a, and positions 2 to 16 are sub-pixel positions. Position 16 is the position of 1/4 resolution sub-pixel i. In calculating the average number of operations, it has been assumed that a motion vector is equally likely to point to each sub-pixel position. Thus, the average complexity is the average over the 15 sub-pixel positions and the single full-pixel position.
Table 1: complexity of the interpolation of the 1/4 resolution sub-pixels in TML5, TML6 and the method according to the invention.
As can be seen from Table 1, the method according to the invention requires fewer 6-tap filter operations than the sub-pixel value interpolation method according to TML6, at the cost of only a few additional linear interpolation operations. Since a 6-tap filter operation is much more complex than a linear interpolation, the overall complexity of the two methods is similar. The sub-pixel value interpolation method according to TML5 has a considerably higher complexity.
A preferred embodiment for interpolating sub-pixels to 1/8 pixel resolution will now be described with reference to Figs. 18, 19 and 20.
Fig. 18 shows the notation used to describe pixels and 1/2, 1/4 and 1/8 resolution sub-pixels in an extended application of the method according to the present invention.
1. The values of the 1/2 and 1/4 resolution sub-pixels labeled b1, b2 and b3 in Fig. 18 are obtained by first calculating the intermediate values b1 = (-3A1 + 12A2 - 37A3 + 229A4 + 71A5 - 21A6 + 6A7 - A8); b2 = (-3A1 + 12A2 - 39A3 + 158A4 + 158A5 - 39A6 + 12A7 - 3A8); and b3 = (-A1 + 6A2 - 21A3 + 71A4 + 229A5 - 37A6 + 12A7 - 3A8), using the eight closest image pixels (A1 to A8) located at unit horizontal and unit vertical positions and placed symmetrically about the 1/2 resolution sub-pixel b2. Considering the fact that the pixels A1 to A8 are not placed symmetrically with respect to the 1/4 resolution sub-pixels b1 and b3, asymmetric filter coefficients are used to obtain the intermediate values b1 and b3. The final value of each sub-pixel bi is calculated according to bi = (bi + 128)/256, where division is performed with truncation. The result is clipped to the range [0, 255].
2. The values of the 1/2 and 1/4 resolution sub-pixels labeled cij, where i, j = 1, 2, 3, are calculated by applying the corresponding filter relationships to the intermediate values b1, b2 and b3 computed, in the vertical direction, for the eight closest sub-pixels (b1j to b8j) lying in the column of 1/2 and 1/4 resolution sub-pixels that contains the sub-pixel cij being interpolated, placed symmetrically about the 1/2 resolution sub-pixel c2j. Considering the fact that the sub-pixels b1j to b8j are not placed symmetrically with respect to the 1/4 resolution sub-pixels c1j and c3j, asymmetric filter coefficients are used to obtain the sub-pixels c1j and c3j. Again, division is performed with truncation. The sub-pixel values cij are clipped to the range [0, 255] before being stored in the frame memory. In an alternative embodiment of the invention, the 1/2 and 1/4 resolution sub-pixels cij are calculated in a similar manner using the intermediate values b1, b2 and b3 in the horizontal direction.
3. The values of the 1/8 resolution sub-pixels labeled d are calculated from the nearest neighboring image pixels and 1/2 or 1/4 resolution sub-pixels in the horizontal or vertical direction using linear interpolation. For example, the top-left-most 1/8 resolution sub-pixel d is calculated according to d = (a + b1 + 1)/2. As before, division is performed with truncation.
4. The values of the 1/8 resolution sub-pixels labeled e and f are calculated using linear interpolation from image pixels and 1/2 or 1/4 resolution sub-pixels in a diagonal direction. For example, referring to Fig. 18, the top-left-most 1/8 resolution sub-pixel e is calculated according to e = (b1 + b1 + 1)/2, the two b1 values being its nearest diagonally neighboring sub-pixels. The diagonal direction to be used in the interpolation of each 1/8 resolution sub-pixel in the first preferred embodiment of the present invention, hereinafter referred to as 'best method 1', is shown in Fig. 19(a). The value of the 1/8 resolution sub-pixel labeled g is calculated according to g = (a + 3c22 + 3)/4. As before, division is performed with truncation. In an alternative embodiment of the invention, hereinafter referred to as 'best method 2', computational complexity is further reduced by interpolating the 1/8 resolution sub-pixel f from the 1/2 and 1/4 resolution sub-pixels b2 using linear interpolation according to the relationship f = (3b2 + b2 + 2)/4, in which the value of the sub-pixel b2 closer to f is multiplied by 3. The diagonal interpolation scheme used in this alternative embodiment is depicted in Fig. 19(b). In other alternative embodiments, different diagonal interpolation schemes can be considered.
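The 3:1 weighting used for sub-pixels such as g and f can be sketched generically; the rounding constant added before the truncating division (3 for g, 2 for f in the formulas above) is about half the total weight of 4, biasing the result toward the nearest integer. The function name and parameters are illustrative:

```python
def weighted_eighth(near, far, rounding=2):
    """1/8 resolution sub-pixel lying closer to `near` (weight 3) than
    to `far` (weight 1): (3*near + far + rounding) // 4, truncating."""
    return (3 * near + far + rounding) // 4
```
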
It should also be noted that in all cases where an average comprising pixel and/or sub-pixel values is used in the determination of a sub-pixel value, the average may be formed in any suitable manner. The addition of 1 to the sum of values in the calculation of such an average has the effect that any subsequently applied rounding or truncation operation rounds the average in question up to the next highest integer value. In another embodiment of the present invention, the addition of 1 is not used.
As described previously for sub-pixel value interpolation at 1/4 pixel resolution, the memory requirements in the encoder can be reduced by pre-computing only a portion of the sub-pixel values to be interpolated. In the case of sub-pixel value interpolation to 1/8 pixel resolution, it is beneficial to calculate all 1/2 and 1/4 resolution sub-pixels in advance and to calculate the values of 1/8 resolution sub-pixels on demand, only when they are needed. When this scheme is employed, 16 times the original image memory is required to store the 1/2 and 1/4 resolution sub-pixel values, both according to TML5 and according to the two interpolation methods of the present invention. However, if the direct interpolation method according to TML6 is used in the same manner, the intermediate values of the 1/2 and 1/4 pixel resolution sub-pixels must be stored. These intermediate values are represented with 32-bit accuracy, resulting in a memory requirement of 64 times that of the original non-interpolated image.
The complexity of applying the sub-pixel value interpolation method according to the invention to computation at up to 1/8 pixel resolution in a video decoder is compared below with the complexity of the interpolation schemes used in TML5 and TML6. As in the equivalent analysis of 1/4 pixel resolution sub-pixel value interpolation presented above, it is assumed that any sub-pixel value interpolation method performs only the minimum number of steps required to obtain a correct interpolated value. It is further assumed that each method is implemented on a block basis, such that a common intermediate value is computed only once for a particular N × M block to be interpolated.
Table 2 summarizes the complexity of the three interpolation methods. The complexity of each method is measured in terms of the number of 8-tap filter and linear interpolation operations performed. The table gives the number of operations required to interpolate each of the 63 1/8 resolution sub-pixels in a 4 × 4 block of original image pixels, each sub-pixel position being identified by a corresponding number, as shown in Fig. 20. In Fig. 20, position 1 is the position of the original image pixel, and positions 2 to 64 are sub-pixel positions. In calculating the average number of operations, it has been assumed that a motion vector is equally likely to point to each sub-pixel position. Thus, the average complexity is the average over the 63 sub-pixel positions and the single full-pixel position.
Table 2: complexity of the interpolation of the 1/8 resolution sub-pixels in TML5, TML6 and the method according to the invention. (results are shown separately for best method 1 and best method 2).
As can be seen from Table 2, the numbers of 8-tap filtering operations performed according to best method 1 and best method 2 are 26% and 34% lower, respectively, than the number performed in the sub-pixel value interpolation of TML5. The number of linear interpolation operations in best method 1 and best method 2 is 25% lower than in TML5, but this is a less significant improvement than the reduction in 8-tap filtering operations. It can further be seen that the direct interpolation method used in TML6 has a complexity similar to that of best method 1 and best method 2 for the interpolation of 1/8 resolution sub-pixel values.
From the foregoing description, it will be apparent to those skilled in the art that various modifications may be made within the scope of the invention. Although several preferred embodiments of the present invention have been described in detail, many additional modifications and variations are possible, all of which fall within the true spirit of the invention.
Claims (51)
1. A method for sub-pixel value interpolation to determine sub-pixel values located within a rectangular boundary region defined by four corner pixels, with no intermediate pixels between the corner pixels, the pixels and sub-pixels being arranged in rows and columns, the pixel and sub-pixel locations being mathematically represented within said rectangular boundary region by using the coordinate notation K/2^N, L/2^N, where K and L are integers having respective values between 0 and 2^N, and N is a positive integer greater than 1 representing a particular degree of sub-pixel value interpolation, the method comprising:
- interpolating sub-pixel values where K and L are both odd in the sub-pixel coordinates according to a predetermined selection of either a weighted average of the nearest neighboring pixel value and the sub-pixel value located at coordinates 1/2, 1/2, or a weighted average of a pair of diagonal sub-pixel values in coordinates where K and L are both even, including 0, located within the 1/4 quadrant of the rectangular boundary region defined by the nearest neighboring pixel and the sub-pixel having coordinates 1/2, 1/2;
-interpolating sub-pixel values in sub-pixel coordinates where K is even and L is 0 and in sub-pixel coordinates where K is 0 and L is even using a weighted sum of pixel values in rows and columns, respectively, for use in an interpolation where K and L are both odd in sub-pixel coordinates; and
-interpolating sub-pixel values with K and L being even in sub-pixel coordinates for the interpolation of sub-pixel values with K and L being odd in sub-pixel coordinates using a weighted sum of sub-pixel values with K being even and L being 0 in coordinates and sub-pixel values of the respective coordinates in the immediately adjacent rectangular boundary region or a weighted sum of sub-pixel values with K being zero and L being even in coordinates and sub-pixel values of the respective coordinates in the immediately adjacent rectangular boundary region.
2. The method of claim 1, comprising using first and second weights in a weighted average selected by interpolating sub-pixel values in which K and L are odd in the sub-pixel coordinates, wherein when the selected weighted average relates to a nearest neighbor pixel value and a sub-pixel value at coordinates 1/2, 1/2, the relative magnitudes of the first and second weights are selected to be inversely proportional to the respective diagonal straight-line distances of the nearest neighbor pixel and the sub-pixel at coordinates 1/2, 1/2 to the sub-pixel in which K and L are odd in the coordinates, wherein the values of the sub-pixels in which K and L are odd in the coordinates are interpolated.
3. A method according to claim 1, comprising using first and second weights in a selected weighted average of values of K and L that are odd in interpolated sub-pixel coordinates, wherein when the selected weighted average relates to a pair of diagonal sub-pixel values in coordinates where K and L are even, the relative magnitudes of the first and second weights are selected to be inversely proportional to the respective rectilinear diagonal distances of the sub-pixels in coordinates where K and L are even to the sub-pixels in coordinates where K and L are odd, wherein sub-pixel values in coordinates are interpolated.
4. A method according to claim 2, comprising using first and second weights having the same value when the nearest neighbouring pixel and the sub-pixels located at coordinates 1/2, 1/2 are equidistant from sub-pixels in which K and L are odd in coordinates in which the sub-pixel values in which K and L are odd are interpolated.
5. A method according to claim 3, comprising using first and second weights having the same value when the sub-pixels in which K and L are even in coordinates are equidistant from the sub-pixels in which K and L are odd in coordinates, wherein the sub-pixel values in which K and L are odd in coordinates are interpolated.
6. A method according to claim 1, comprising interpolating sub-pixel values in sub-pixel coordinates where K and L are even, using a weighted sum of sub-pixel values in coordinates where K is even and L is 0 and sub-pixel values in corresponding coordinates in an immediately adjacent rectangular boundary region, for interpolation of sub-pixel values in sub-pixel coordinates where K and L are odd, when sub-pixel values in coordinates where K is even and L is odd are also required.
7. A method according to claim 1, comprising interpolating sub-pixel values in sub-pixel coordinates where K and L are even, using a weighted sum of sub-pixel values in coordinates where K is 0 and L is even and sub-pixel values in corresponding coordinates in an immediately adjacent rectangular boundary region, for interpolation of sub-pixel values in sub-pixel coordinates where K and L are odd, when sub-pixel values in coordinates where K is odd and L is even are also required.
8. The method of claim 1, comprising interpolating sub-pixel values in sub-pixel coordinates where K is odd O and L is even E by averaging a first sub-pixel in coordinates where K is even O-1 and L is E and a second sub-pixel in coordinates where K is even O+1 and L is E, where E does not include 0 or 2^N.
9. The method of claim 1, comprising interpolating the sub-pixel value in sub-pixel coordinates where K is 1 and L is 0 by averaging the pixel in coordinates where K and L are 0 and the sub-pixel in coordinates where K is 2 and L is 0.
10. The method of claim 1, comprising interpolating the sub-pixel value in sub-pixel coordinates where K is 2^N-1 and L is 0 by averaging the pixel in coordinates where K is 2^N and L is 0 and the sub-pixel in coordinates where K is 2^N-2 and L is 0.
11. The method of claim 1, comprising interpolating the sub-pixel value in sub-pixel coordinates where K is 1 and L is 2^N by averaging the pixel in coordinates where K is 0 and L is 2^N and the sub-pixel in coordinates where K is 2 and L is 2^N.
12. The method of claim 1, comprising interpolating the sub-pixel value in sub-pixel coordinates where K is 2^N-1 and L is 2^N by averaging the pixel in coordinates where K is 2^N and L is 2^N and the sub-pixel in coordinates where K is 2^N-2 and L is 2^N.
13. The method of claim 1, comprising interpolating sub-pixel values in sub-pixel coordinates where K is even E and L is odd O by averaging a first sub-pixel in coordinates where K is E and L is even O-1 and a second sub-pixel in coordinates where K is E and L is even O+1, where E does not include 0 or 2^N.
14. The method of claim 1, comprising interpolating the sub-pixel value in sub-pixel coordinates where K is 0 and L is 1 by averaging the pixel in coordinates where K and L are 0 and the sub-pixel in coordinates where K is 0 and L is 2.
15. The method of claim 1, comprising determining by aligning K to 2 in coordinatesNAnd L is 0 and K is 2 in the coordinatesNAnd the sub-pixels with L of 2 are averaged, and K of 2 in the interpolated sub-pixel coordinatesNAnd L is the sub-pixel value of 1.
16. The method of claim 1, comprising determining by comparing coordinates where K is 0 and L is 2NIn pixels and coordinates where K is 0 and L is 2N-2 sub-pixel averaging, with K0 and L2 in interpolated sub-pixel coordinatesN-a sub-pixel value of 1.
17. The method of claim 1, comprising determining by aligning K to 2 in coordinatesNAnd L is 2NHas a pixel and coordinates in which K is 2NAnd L is 2N-2 sub-pixel averaging, with K of 2 in interpolated sub-pixel coordinatesNAnd L is 2N-a sub-pixel value of 1.
18. A method according to claim 1, comprising interpolating K and L in sub-pixel coordinates by taking a weighted average of the four corner pixel values defining the rectangular bounding area, both K and L being 2N-a sub-pixel value of 1.
19. A method according to claim 1, comprising selecting a value for N from a list consisting of values 2, 3 and 4.
20. A method for 1/4 resolution sub-pixel value interpolation to determine sub-pixel values located within a rectangular boundary region defined by four corner pixels, with no intermediate pixels between the corner pixels, the pixels and sub-pixels being arranged in rows and columns, the pixel and sub-pixel locations being mathematically represented within the rectangular boundary region by using the coordinate notation K/4, L/4, K and L being positive integers having respective values between 0 and 4, the method comprising:
-interpolating sub-pixel values having both K and L of 1 or 3 in sub-pixel coordinates according to a predetermined selection of a weighted average of the nearest neighbor pixel value and the sub-pixel value located at coordinates 2/4, 2/4 or a weighted average of a pair of diagonal sub-pixel values having both K and L of 0, 2 or 4 in coordinates located within the 1/4 quadrant of the rectangular boundary region, the 1/4 quadrant being defined by the sub-pixel having coordinates 2/4, 2/4 and the nearest neighbor pixel;
-interpolating sub-pixel values in sub-pixel coordinates where K is 2 and L is 0 and K is 0 and L is 2 in sub-pixel coordinates using a weighted sum of pixel values in rows and columns, respectively, for use in an interpolation where K and L are both 1 or 3 in sub-pixel coordinates; and
-interpolating a sub-pixel value of K and L both being 2 in the sub-pixel coordinates using a weighted sum of a sub-pixel value of K being 2 and L being 0 in the coordinates and a sub-pixel value of the corresponding coordinates in the immediately adjacent rectangular boundary region or a weighted sum of a sub-pixel value of K being zero and L being 2 in the coordinates and a sub-pixel value of the corresponding coordinates in the immediately adjacent rectangular boundary region, for interpolation of sub-pixel values of K and L both being 1 or 3 in the sub-pixel coordinates.
21. A method for 1/8 resolution sub-pixel value interpolation to determine sub-pixel values located within a rectangular boundary region defined by four corner pixels, with no intermediate pixels between the corner pixels, the pixels and sub-pixels being arranged in rows and columns, the pixel and sub-pixel locations being mathematically represented within the rectangular boundary region by using the coordinate notation K/8, L/8, K and L being positive integers having respective values between 0 and 8, the method comprising:
-interpolating sub-pixel values having K and L of 1, 3, 5 or 7 in sub-pixel coordinates according to a predetermined selection of a weighted average of the nearest neighbor pixel value and the sub-pixel value located at coordinates 4/8, 4/8 or a weighted average of a pair of diagonal sub-pixel values having both K and L of 0, 2, 4, 6 or 8 in coordinates located within the 1/4 quadrant of the rectangular boundary region, the 1/4 quadrant being defined by the sub-pixel having coordinates 4/8, 4/8 and the nearest neighbor pixel;
-interpolating sub-pixel values in sub-pixel coordinates where K is 2, 4 or 6 and L is 0 and in sub-pixel coordinates where K is 0 and L is 2, 4 or 6 using a weighted sum of pixel values in rows and columns, respectively, for use in the interpolation where K and L are both 1, 3, 5 or 7 in sub-pixel coordinates; and
-interpolating sub-pixel values of K and L of 2, 4 or 6 in sub-pixel coordinates using a weighted sum of sub-pixel values of K of 2, 4 or 6 and L of 0 in coordinates and sub-pixel values of the corresponding coordinates in the immediately adjacent rectangular boundary region or a weighted sum of sub-pixel values of K of zero and L of 2, 4 or 6 in coordinates and sub-pixel values of the corresponding coordinates in the immediately adjacent rectangular boundary region for interpolation of sub-pixel values of K and L of 1, 3, 5 or 7 in sub-pixel coordinates.
22. Apparatus for sub-pixel value interpolation, the apparatus being operable to determine sub-pixel values located within a rectangular boundary region defined by four corner pixels, with no intermediate pixels between the corner pixels, the pixels and sub-pixels being arranged in rows and columns, the pixel and sub-pixel locations being mathematically represented within the rectangular boundary region by using the coordinate notation K/2^N, L/2^N, K and L being integers having respective values between 0 and 2^N, N being a positive integer greater than 1 representing a particular degree of sub-pixel value interpolation, the apparatus comprising:
-circuitry operable to interpolate sub-pixel values for which K and L are odd in sub-pixel coordinates according to a predetermined selection of a weighted average of the nearest neighbor pixel value and the sub-pixel value located at coordinates 1/2, 1/2 or a weighted average of a pair of diagonal sub-pixel values for which K and L are even, including 0, located within the 1/4 quadrant of the rectangular boundary region, the 1/4 quadrant being defined by the sub-pixel located at coordinates 1/2, 1/2 and the nearest neighbor pixel;
-circuitry operable to interpolate sub-pixel values for which K is even and L is 0 in sub-pixel coordinates and for which K is 0 and L is even in sub-pixel coordinates using weighted sums of pixel values located in rows and columns respectively, for use in interpolation for which K and L are both odd in sub-pixel coordinates; and
-circuitry operable to interpolate sub-pixel values with K and L being even in sub-pixel coordinates using a weighted sum of sub-pixel values with K being even and L being 0 in coordinates and sub-pixel values with corresponding coordinates in an immediately adjacent rectangular boundary region, or a weighted sum of sub-pixel values with K being zero and L being even in coordinates and sub-pixel values with corresponding coordinates in an immediately adjacent rectangular boundary region, for interpolation of sub-pixel values with K and L being odd in sub-pixel coordinates.
23. The apparatus of claim 22, wherein the apparatus is operable to use first and second weights in the weighted average selected for interpolating sub-pixel values where K and L are odd in sub-pixel coordinates, and wherein, when the selected weighted average relates to a nearest neighbor pixel value and the sub-pixel value at coordinates 1/2, 1/2, the apparatus is operable to select the relative magnitudes of the first and second weights to be inversely proportional to the respective diagonal straight-line distances from the nearest neighbor pixel and the sub-pixel at coordinates 1/2, 1/2 to the sub-pixel, with K and L odd in coordinates, whose value is being interpolated.
24. Apparatus according to claim 22, wherein the apparatus is operable to use first and second weights in the weighted average selected for interpolating sub-pixel values where K and L are odd in sub-pixel coordinates, and wherein, when the selected weighted average relates to a pair of diagonal sub-pixel values where K and L are even in coordinates, the apparatus is operable to select the relative magnitudes of the first and second weights to be inversely proportional to the respective straight-line diagonal distances from the sub-pixels where K and L are even in coordinates to the sub-pixel, with K and L odd in coordinates, whose value is being interpolated.
25. The apparatus of claim 23, wherein the apparatus is operable to use first and second weights having the same value when the nearest neighbor pixel and the sub-pixel located at coordinates 1/2, 1/2 are equidistant from the sub-pixel, with K and L odd in coordinates, whose value is being interpolated.
26. The apparatus of claim 24, wherein the apparatus is operable to use first and second weights having the same value when the sub-pixels where K and L are even in coordinates are equidistant from the sub-pixel, with K and L odd in coordinates, whose value is being interpolated.
27. Apparatus according to claim 22, wherein the apparatus is operable to interpolate sub-pixel values of even K and L in sub-pixel coordinates using a weighted sum of sub-pixel values of even K and 0 in coordinates and sub-pixel values of corresponding coordinates in an immediately adjacent rectangular bounding region, when sub-pixel values of even K and odd L in coordinates are also required, for interpolation of sub-pixel values of odd K and L in sub-pixel coordinates.
28. Apparatus according to claim 22, wherein the apparatus is operable to interpolate sub-pixel values of K and L being even in sub-pixel coordinates using a weighted sum of sub-pixel values of K being 0 and L being even in coordinates and sub-pixel values of corresponding coordinates in an immediately adjacent rectangular boundary region, when sub-pixel values of K being odd and L being even in coordinates are also required, for interpolation of sub-pixel values of K and L being odd in sub-pixel coordinates.
29. The apparatus of claim 22, wherein the apparatus is operable to interpolate a sub-pixel value in sub-pixel coordinates where K is odd O and L is even E by averaging a first sub-pixel in coordinates where K is O - 1 and L is E and a second sub-pixel in coordinates where K is O + 1 and L is E, E not including 0 or 2^N.
30. The apparatus of claim 22, wherein the apparatus is operable to interpolate the sub-pixel value in sub-pixel coordinates where K is 1 and L is 0 by averaging the pixel in coordinates where K is 0 and L is 0 and the sub-pixel in coordinates where K is 2 and L is 0.
31. The apparatus of claim 22, wherein the apparatus is operable to determine the sub-pixel value in sub-pixel coordinates where K is 2^N - 1 and L is 0 by averaging the pixel in coordinates where K is 2^N and L is 0 and the sub-pixel in coordinates where K is 2^N - 2 and L is 0.
32. The apparatus of claim 22, wherein the apparatus is operable to determine the sub-pixel value in sub-pixel coordinates where K is 1 and L is 2^N by averaging the pixel in coordinates where K is 0 and L is 2^N and the sub-pixel in coordinates where K is 2 and L is 2^N.
33. The apparatus of claim 22, wherein the apparatus is operable to determine the sub-pixel value in sub-pixel coordinates where K is 2^N - 1 and L is 2^N by averaging the pixel in coordinates where K is 2^N and L is 2^N and the sub-pixel in coordinates where K is 2^N - 2 and L is 2^N.
34. The apparatus of claim 22, wherein the apparatus is operable to interpolate a sub-pixel value in sub-pixel coordinates where K is even E and L is odd O by averaging a first sub-pixel in coordinates where K is E and L is O - 1 and a second sub-pixel in coordinates where K is E and L is O + 1, E not including 0 or 2^N.
35. The apparatus of claim 22, wherein the apparatus is operable to interpolate the sub-pixel value in sub-pixel coordinates where K is 0 and L is 1 by averaging the pixel in coordinates where K is 0 and L is 0 and the sub-pixel in coordinates where K is 0 and L is 2.
36. The apparatus of claim 22, wherein the apparatus is operable to determine the sub-pixel value in sub-pixel coordinates where K is 2^N and L is 1 by averaging the pixel in coordinates where K is 2^N and L is 0 and the sub-pixel in coordinates where K is 2^N and L is 2.
37. The apparatus of claim 22, wherein the apparatus is operable to determine the sub-pixel value in sub-pixel coordinates where K is 0 and L is 2^N - 1 by averaging the pixel in coordinates where K is 0 and L is 2^N and the sub-pixel in coordinates where K is 0 and L is 2^N - 2.
38. The apparatus of claim 22, wherein the apparatus is operable to determine the sub-pixel value in sub-pixel coordinates where K is 2^N and L is 2^N - 1 by averaging the pixel in coordinates where K is 2^N and L is 2^N and the sub-pixel in coordinates where K is 2^N and L is 2^N - 2.
39. The apparatus of claim 22, wherein the apparatus is operable to interpolate the sub-pixel value in sub-pixel coordinates where K and L are both 2^N - 1 by taking a weighted average of the values of the four corner pixels defining the rectangular boundary region.
40. The apparatus of claim 22, wherein N is set to 2.
41. The apparatus of claim 22, wherein N is set to 3.
42. A video encoder comprising an apparatus for sub-pixel value interpolation according to claim 22.
43. A still image encoder comprising the apparatus for sub-pixel value interpolation according to claim 22.
44. A video decoder comprising an apparatus for sub-pixel value interpolation according to claim 22.
45. A still image decoder comprising the apparatus for sub-pixel value interpolation according to claim 22.
46. A codec comprising a video encoder according to claim 42 and a video decoder according to claim 44.
47. A codec comprising a still image encoder according to claim 43 and a still image decoder according to claim 45.
48. A communication terminal comprising a video encoder according to claim 42.
49. A communication terminal comprising the still image encoder of claim 43.
50. A communication terminal comprising a video decoder according to claim 44.
51. A communication terminal comprising the still image decoder of claim 45.
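To make the averaging pattern recited in the claims concrete, the following sketch works through the quarter-pel case (N = 2, coordinates K/4, L/4 as in claim 20) for a single rectangular boundary region. It is an illustration only, not the claimed filters: the half-pel positions here are produced by simple two-point rounded averages (a degenerate weighted sum; the patent contemplates weighted sums over more pixels in the row or column), and the function name and 5x5 grid layout are invented for the example.

```python
def interpolate_quarter_pel(a, b, c, d):
    """Return a 5x5 grid v[L][K] of values at coordinates (K/4, L/4) for one
    rectangular boundary region whose corner pixels are a=(0,0), b=(1,0),
    c=(0,1), d=(1,1). Illustrative sketch; not the claimed filter weights."""
    avg = lambda x, y: (x + y + 1) // 2   # rounded integer two-point average
    v = [[0] * 5 for _ in range(5)]
    # Corner pixels: K and L in {0, 4}.
    v[0][0], v[0][4], v[4][0], v[4][4] = a, b, c, d
    # Half-pel positions (K or L equal to 2): two-point averages stand in
    # for the weighted sums over rows and columns recited in the claims.
    v[0][2] = avg(a, b)               # (2/4, 0)
    v[4][2] = avg(c, d)               # (2/4, 1)
    v[2][0] = avg(a, c)               # (0, 2/4)
    v[2][4] = avg(b, d)               # (1, 2/4)
    v[2][2] = avg(v[2][0], v[2][4])   # central half-pel (2/4, 2/4)
    # Quarter-pel positions with one odd coordinate: average the two
    # even-coordinate neighbors on the same row or column (claims 8-17 pattern).
    for L in (0, 2, 4):
        v[L][1] = avg(v[L][0], v[L][2])
        v[L][3] = avg(v[L][2], v[L][4])
    for K in (0, 2, 4):
        v[1][K] = avg(v[0][K], v[2][K])
        v[3][K] = avg(v[2][K], v[4][K])
    # Positions where K and L are both odd: average a diagonal pair of
    # already-computed even-coordinate values, as in the claims.
    for L in (1, 3):
        for K in (1, 3):
            v[L][K] = avg(v[L - 1][K - 1], v[L + 1][K + 1])
    return v
```

Note how the diagonal step computes the odd-odd positions last, from values that are themselves interpolated, which is the dependency order the claims describe: row/column weighted sums first, then the central half-pel, then the diagonal averages.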
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US09/954,608 US6950469B2 (en) | 2001-09-17 | 2001-09-17 | Method for sub-pixel value interpolation |
| US09/954608 | 2001-09-17 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1118411A1 HK1118411A1 (en) | 2009-02-06 |
| HK1118411B true HK1118411B (en) | 2011-12-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN101232622B (en) | Method for sub-pixel value interpolation | |
| CN100553321C (en) | Coded Dynamic Filter | |
| AU2002324085A1 (en) | Method for sub-pixel value interpolation | |
| GB2379820A (en) | Interpolating values for sub-pixels | |
| HK1118411B (en) | Method for sub-pixel value interpolation | |
| AU2007237319B2 (en) | Method for sub-pixel value interpolation | |
| BRPI0211263B1 (en) | VIDEO ENCODER INTERPOLATION METHOD, VIDEO ENCODER FOR ENCODING AN IMAGE AND COMMUNICATIONS TERMINAL |