HK1147373B - Coding and decoding for interlaced video


Info

Publication number
HK1147373B
HK1147373B (application HK11101301.9A)
Authority
HK
Hong Kong
Prior art keywords
motion vector
macroblock
field
interlaced
motion
Prior art date
Application number
HK11101301.9A
Other languages
Chinese (zh)
Other versions
HK1147373A1 (en)
Inventor
T. W. Holcomb
P. Hsu
S. Srinivasan
C.-L. Lin
Original Assignee
Microsoft Technology Licensing, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/857,473 external-priority patent/US7567617B2/en
Priority claimed from US10/933,958 external-priority patent/US7599438B2/en
Application filed by Microsoft Technology Licensing, Llc filed Critical Microsoft Technology Licensing, Llc
Publication of HK1147373A1 publication Critical patent/HK1147373A1/en
Publication of HK1147373B publication Critical patent/HK1147373B/en

Description

Encoding and decoding of interlaced video
This application is a divisional of the application entitled "Encoding and decoding of interlaced video", filed September 3, 2004, application No. 200480025575.3 (international application No. PCT/US2004/029034).
Cross-referencing
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent disclosure, as it appears in the patent and trademark office patent files or records, but otherwise reserves all copyright rights whatsoever.
Technical Field
Techniques and tools for interlaced video encoding and decoding are described.
Background
Digital video consumes a lot of storage and transmission capacity. A typical raw digital video sequence comprises 15 or 30 frames per second. Each frame may include thousands of pixels (also known as pels), where each pixel represents a tiny element of the picture. In raw form, a computer typically represents a pixel as a set of three samples totaling 24 bits. For example, a pixel may comprise an eight-bit luminance sample (also referred to as a luma sample, as the terms "luminance" and "luma" are used interchangeably herein) that defines the gray-scale component of the pixel, and two eight-bit chrominance samples (also referred to as chroma samples, as the terms "chrominance" and "chroma" are used interchangeably herein) that define the color components of the pixel. Thus, the number of bits per second, i.e., the bit rate, of a typical raw digital video sequence may be 5 million bits per second or more.
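The bit-rate figure above follows from simple arithmetic; as an informal sketch (not part of any described codec), the raw rate for an assumed 352x240 example frame size at 24 bits per pixel and 30 frames per second can be computed as:

```python
def raw_bitrate_bps(width, height, bits_per_pixel=24, fps=30):
    # Uncompressed rate = pixels per frame x bits per pixel x frames per second.
    return width * height * bits_per_pixel * fps

# The 352x240 resolution is an assumed example; even this modest frame size
# yields roughly 60 Mbit/s uncompressed, well above 5 Mbit/s.
rate = raw_bitrate_bps(352, 240)
```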
Many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also known as coding or encoding) to reduce the bit rate of digital video. Compression reduces the cost of storing and transmitting video by converting it to a lower bit rate form. Decompression (also known as decoding) reconstructs a version of the original video from the compressed form. A "codec" is an encoder/decoder system. Compression may be lossless, in which case the video quality is not compromised, but the reduction in bit rate is limited by the inherent amount of variability (sometimes called entropy) of the video data. Alternatively, compression may be lossy, in which case the video quality suffers, but the achievable reduction in bit rate is much more significant. Lossy compression is often used in conjunction with lossless compression: the lossy compression establishes an approximation of the information, and lossless compression is applied to represent the approximation.
In general, video compression techniques include "intra-picture" compression and "inter-picture" compression, where a picture is, for example, a progressive video frame, an interlaced video frame (having alternating lines of the video fields), or an interlaced video field. For progressive frames, intra-picture compression techniques compress individual frames (typically called I-frames or key frames), while inter-picture compression techniques compress frames (typically called predicted frames, P-frames, or B-frames) with reference to a preceding and/or following frame (typically called a reference or anchor frame) or frames (for B-frames).
Inter-picture compression techniques often use motion estimation and motion compensation. For motion estimation, the encoder divides the current predicted frame into units of, for example, 8x8 or 16x16 pixels. For a unit of the current frame, a similar unit in a reference frame is found for use as a predictor. A motion vector indicates the location of the predictor in the reference frame. In other words, the motion vector for a unit of the current frame indicates the displacement between the spatial location of the unit in the current frame and the spatial location of the predictor in the reference frame. The encoder computes the sample-by-sample difference between the current unit and the predictor to determine a residual (also called an error signal). If the current unit is 16x16 in size, the residual is divided into four 8x8 blocks. To each 8x8 residual block, the encoder applies a reversible frequency transform operation, which generates a set of frequency domain (i.e., spectral) coefficients. The discrete cosine transform ["DCT"] is one type of frequency transform. The resulting blocks of spectral coefficients are quantized and entropy encoded. If the predicted frame is used as a reference for subsequent motion compensation, the encoder reconstructs the predicted frame. When reconstructing residuals, the encoder reconstructs the quantized transform coefficients (e.g., DCT coefficients) and performs an inverse frequency transform, such as an inverse DCT ["IDCT"]. The encoder performs motion compensation to compute the predictors, and combines the predictors with the residuals. During decoding, the decoder typically entropy decodes the information and performs analogous operations to reconstruct the residuals, perform motion compensation, and combine the predictors with the residuals.
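The motion estimation step described above can be illustrated with a brute-force search; the following sketch (function names and the sum-of-absolute-differences cost metric are illustrative assumptions, not any particular encoder's implementation) finds the displacement of the best-matching block in a reference frame:

```python
def sad(block_a, block_b):
    # Sum of absolute differences: a common block-matching cost metric.
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
                          for a, b in zip(row_a, row_b))

def full_search(cur_block, ref, bx, by, size=8, radius=2):
    # Exhaustively test every displacement within +/-radius of (bx, by)
    # and return the motion vector (dx, dy) with the lowest cost.
    height, width = len(ref), len(ref[0])
    best = (0, 0, float("inf"))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= width - size and 0 <= y <= height - size:
                candidate = [row[x:x + size] for row in ref[y:y + size]]
                cost = sad(cur_block, candidate)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best
```

Real encoders narrow this search with fast strategies, but the output, a motion vector plus a residual, is the same in principle.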
I. Interframe compression in Windows Media Video versions 8 and 9
Microsoft Windows Media Video version 8 ["WMV8"] includes a video encoder and a video decoder. The WMV8 encoder uses intra and inter compression, and the WMV8 decoder uses intra and inter decompression. Windows Media Video version 9 ["WMV9"] uses a similar architecture for many operations.
Inter-frame compression in WMV8 encoders uses block-based motion compensated predictive coding followed by transform coding of the residual error. Fig. 1 and 2 illustrate block-based inter-frame compression of a predicted frame for use in a WMV8 encoder. In particular, fig. 1 illustrates motion estimation for a predicted frame (110), and fig. 2 illustrates compression of prediction residuals for motion compensated blocks of the predicted frame.
For example, in fig. 1, the WMV8 encoder calculates a motion vector for a macroblock (115) in the predicted frame (110). To calculate the motion vector, the encoder searches within a search area (135) of a reference frame (130). Within the search area (135), the encoder compares the macroblock (115) from the predicted frame (110) with various candidate macroblocks in order to find a candidate macroblock that is a good match. The encoder outputs information specifying the motion vector (entropy coded) of the matching macroblock.
Since motion vector values are often related to the values of the spatially surrounding motion vectors, compression of the data used to transmit the motion vector information may be accomplished by determining or selecting a motion vector predictor from among neighboring macroblocks and using the motion vector predictor to predict the motion vector of the current macroblock. The encoder may encode a difference between the motion vector and the motion vector predictor. For example, the encoder calculates the difference between the horizontal component of the motion vector and the horizontal component of the motion vector predictor, calculates the difference between the vertical component of the motion vector and the vertical component of the motion vector predictor, and encodes the differences.
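The differential coding just described amounts to transmitting only the prediction error of each motion vector component; a minimal sketch (function names are illustrative):

```python
def encode_mv_difference(mv, predictor):
    # Encoder side: signal only the component-wise difference.
    return (mv[0] - predictor[0], mv[1] - predictor[1])

def reconstruct_mv(dmv, predictor):
    # Decoder side: add the decoded difference back to the predictor.
    return (predictor[0] + dmv[0], predictor[1] + dmv[1])
```

Because neighboring motion vectors are correlated, the differences cluster near zero and entropy code more compactly than the raw components would.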
After reconstructing the motion vector by adding the difference to the predictor, the decoder uses the motion vector to compute the predicted macroblock for the macroblock (115) using information from the reference frame (130), which is a previously reconstructed frame available at both the encoder and the decoder. The prediction is rarely perfect, so the encoder typically encodes blocks of pixel differences (also referred to as error or residual blocks) between the predicted macroblock and the macroblock (115) itself.
Fig. 2 shows an example of the computation and encoding of an error block (235) in the WMV8 encoder. The error block (235) is the difference between the predicted block (215) and the original current block (225). The encoder applies a discrete cosine transform ["DCT"] (240) to the error block (235), resulting in an 8 x 8 block of coefficients (245). The encoder then quantizes (250) the DCT coefficients, resulting in an 8 x 8 block of quantized DCT coefficients (255). The encoder scans (260) the 8 x 8 block (255) into a one-dimensional array (265) such that the coefficients are generally ordered from lowest frequency to highest frequency. The encoder entropy encodes the scanned coefficients using a variation of run-length coding (270). The encoder selects entropy codes from one or more run/level/last tables (275) and outputs the entropy codes.
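The low-to-high frequency scan and the run-length stage can be sketched as follows; this uses the familiar JPEG-style zigzag pattern purely as an illustration (the WMV8 encoder's actual scan tables and run/level/last codes differ):

```python
def zigzag_scan(block):
    # Traverse an NxN block along anti-diagonals, alternating direction,
    # so coefficients come out roughly ordered from lowest to highest frequency.
    n = len(block)
    out = []
    for d in range(2 * n - 1):
        cells = [(y, d - y) for y in range(n) if 0 <= d - y < n]
        if d % 2 == 0:
            cells.reverse()  # even diagonals run bottom-left to top-right
        out.extend(block[y][x] for y, x in cells)
    return out

def run_level_pairs(scanned):
    # Collapse the scanned coefficients into (zero_run, level) pairs,
    # the form typically fed to the entropy coder.
    pairs, run = [], 0
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, coeff))
            run = 0
    return pairs
```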
Fig. 3 shows an example of a corresponding decoding process (300) for an inter-coded block. In the overview of fig. 3, the decoder decodes (310, 320) entropy-encoded information representing the prediction residual, using variable length decoding (310) with one or more run/level/last tables (315) and run-length decoding (320). The decoder inverse scans (330) a one-dimensional array (325) storing the entropy-decoded information into a two-dimensional block (335). The decoder inverse quantizes and inverse discrete cosine transforms the data (together, 340), resulting in a reconstructed error block (345). In a separate motion compensation path, the decoder computes a predicted block (365) using motion vector information (355) for displacement relative to a reference frame. The decoder combines (370) the predicted block (365) with the reconstructed error block (345) to form the reconstructed block (375).
II. Interlaced video and progressive video
A video frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing in raster order through successive lines to the bottom of the frame. A progressive I-frame is an intra-coded progressive video frame. A progressive P-frame is a progressive video frame coded using forward prediction, and a progressive B-frame is a progressive video frame coded using bi-directional prediction.
The main aspect of interlaced video is that a raster scan of an entire video frame is performed in two passes, each pass scanning alternating lines. For example, the first scan is made up of the even lines of the frame, and the second scan is made up of the odd lines of the frame. As a result, each frame comprises two fields, which represent two different time instants. Fig. 4 shows an interlaced video frame (400) comprising a top field (410) and a bottom field (420). In the frame (400), the even-numbered lines (top field) are scanned starting at one time instant (e.g., time t), while the odd-numbered lines (bottom field) are scanned starting at a different (typically later) time instant (e.g., time t + 1). This timing can create jagged tooth-like features in regions of an interlaced video frame where motion is present, because the two fields begin scanning at different times. For this reason, an interlaced video frame may be rearranged according to a field structure, with the odd lines grouped together into one field and the even lines grouped together into another field. This arrangement, known as field coding, is useful in high-motion pictures for reducing such jagged-edge artifacts. On the other hand, in stationary regions, image detail in an interlaced video frame can be preserved more efficiently without such rearrangement. Accordingly, frame coding, in which the original alternating field line arrangement is preserved, is typically used for stationary or low-motion interlaced video frames.
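The field/frame rearrangement described above is simply a de-interleaving of lines; a small sketch (function names are illustrative), treating a frame as a list of lines:

```python
def split_fields(frame):
    # Top field = even-numbered lines, bottom field = odd-numbered lines.
    return frame[0::2], frame[1::2]

def interleave_fields(top, bottom):
    # Reassemble the frame by alternating lines from the two fields.
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.extend([top_line, bottom_line])
    return frame
```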
A typical progressive video frame consists of one frame of content with non-interlaced lines. In contrast to interlaced video, progressive video does not divide a video frame into separate fields, and the entire frame is scanned from left to right, top to bottom, starting at a single time instant.
III. Previous encoding and decoding in WMV encoders and decoders
Previous versions of WMV encoder and decoder software, released in executable form, have used encoding and decoding of progressive and interlaced P-frames. Although the encoders and decoders are effective for many different encoding/decoding scenarios and content types, there is room for improvement in several areas.
A. Motion compensated reference picture
The encoder and decoder use motion compensation for progressive and interlaced forward-predicted frames. For progressive P-frames, motion compensation is relative to a single reference frame, which is the previously reconstructed I- or P-frame immediately preceding the current P-frame. Since the reference frame for the current P-frame is known and only one reference frame is possible, no information need be signaled to select among multiple reference frames.
Macroblocks of interlaced P-frames may be field-coded or frame-coded. In a field-coded macroblock, at most two motion vectors are associated with the macroblock, one for the top field and one for the bottom field. In a frame-coded macroblock, at most one motion vector is associated with the macroblock. For frame-coded macroblocks in interlaced P-frames, motion compensation is relative to a single reference frame, which is the previously reconstructed I- or P-frame immediately preceding the current P-frame. For field-coded macroblocks in interlaced P-frames, motion compensation is still relative to a single reference frame, but only the top field lines of the reference frame are considered for the motion vector of the top field of the field-coded macroblock, and only the bottom field lines of the reference frame are considered for the motion vector of the bottom field of the field-coded macroblock. Again, since the reference frame is known and only one reference frame is possible, no information need be signaled to select among multiple reference frames.
In some encoding/decoding scenarios (e.g., high bit rate interlaced video with much motion), limiting motion compensation for forward prediction to a single reference frame may compromise overall compression efficiency.
B. Signaling macroblock information
The encoder and decoder use signaling of macroblock information for progressive and interlaced P-frames.
1. Signaling macroblock information for progressive P-frames
A progressive P-frame may be a 1MV or a mixed MV frame. A 1MV progressive P-frame includes only 1MV macroblocks. A 1MV macroblock has one motion vector that indicates the predicted block displacement for all six blocks of the macroblock. A mixed MV progressive P-frame includes 1MV and/or 4MV macroblocks. A 4MV macroblock has from 0 to 4 motion vectors, one for each of up to four luma blocks of the macroblock. Macroblocks in progressive P-frames can be one of three possible types: 1MV, 4MV and skipped. In addition, 1MV and 4MV macroblocks may be intra-coded. The macroblock type is indicated by a combination of picture-layer and macroblock-layer elements.
Thus, 1MV macroblocks can occur in both 1MV and mixed MV progressive P-frames. A single motion vector data element, MVDATA, is associated with all the blocks of a 1MV macroblock. MVDATA signals whether the blocks are coded as intra or inter type. If they are coded as inter type, MVDATA also indicates the motion vector difference.
If a progressive P-frame is 1MV, then all macroblocks in the frame are 1MV macroblocks, so the macroblock type need not be signaled separately. If a progressive P-frame is mixed MV, the macroblocks in the frame can be 1MV or 4MV. In this case, the macroblock type (1MV or 4MV) is signaled for each macroblock in the frame by a bitplane at the picture layer in the bitstream. The decoded bitplane represents the 1MV/4MV status of each macroblock as a plane of 1-bit values in raster scan order from upper left to lower right. A value of 0 indicates that the corresponding macroblock is coded in 1MV mode. A value of 1 indicates that the corresponding macroblock is coded in 4MV mode. In one coding mode, the 1MV/4MV status information is signaled per macroblock at the macroblock layer of the bitstream (rather than as a bitplane at the picture layer).
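The bitplane signaling above maps one bit to each macroblock in raster order; an illustrative sketch (this shows only the interpretation of an already-decoded bitplane, not the bitplane entropy decoding itself, which has its own coding modes):

```python
def macroblock_modes(bitplane_bits, mb_width):
    # Each decoded bit selects 1MV (0) or 4MV (1) for one macroblock,
    # in raster order from upper left to lower right.
    modes = ["4MV" if bit else "1MV" for bit in bitplane_bits]
    # Arrange into rows of mb_width macroblocks each.
    return [modes[i:i + mb_width] for i in range(0, len(modes), mb_width)]
```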
4MV macroblocks occur in mixed MV progressive P-frames. Individual blocks within a 4MV macroblock may be coded as intra blocks. For each of the four luma blocks of a 4MV macroblock, the intra/inter coding state is signaled by the block motion vector data element BLKMVDATA associated with that block. For a 4MV macroblock, the coded block pattern element CBPCY indicates which blocks have BLKMVDATA elements present in the bitstream. The inter/intra coding state of the chroma blocks is derived from the luma inter/intra coding states: if two or more luma blocks are coded as intra, the chroma blocks are also coded as intra.
In addition, the skip/non-skip status of each macroblock in the frame is signaled by a bitplane for the progressive P-frame. Skipped macroblocks can still have information for hybrid motion vector prediction.
CBPCY is a variable length code ["VLC"] that is decoded into a 6-bit field. CBPCY occurs at different locations in the bitstream for 1MV and 4MV macroblocks, and has different semantics for 1MV and 4MV macroblocks.
If: (1) MVDATA indicates that the macroblock is inter coded, and (2) MVDATA indicates that at least one block of the 1MV macroblock contains coefficient information (indicated by the "last" value decoded from MVDATA), then CBPCY is present in the 1MV macroblock layer. If CBPCY is present, it is decoded into a 6-bit field that indicates which of the corresponding six blocks contain at least one non-zero coefficient.
CBPCY is always present at the 4MV macroblock layer. The CBPCY bit positions for the luma blocks (bits 0-3) have a slightly different meaning than the bit positions for the chroma blocks (bits 4 and 5). For a bit position of a luma block, 0 indicates that the corresponding block contains no motion vector information and no non-zero coefficients. For such a block, BLKMVDATA is not present, the predicted motion vector is used as the motion vector, and there is no residual data. If the motion vector predictor indicates that hybrid motion vector prediction is used, a single bit is present indicating the candidate motion vector predictor to use. A 1 in a bit position of a luma block indicates that BLKMVDATA is present for the block. BLKMVDATA indicates whether the block is inter- or intra-coded and, if inter-coded, indicates the motion vector difference. BLKMVDATA also indicates whether coefficient data is present for the block (with the "last" value decoded from BLKMVDATA). For a bit position of a chroma block, a 0 or 1 indicates whether the corresponding block contains non-zero coefficient information.
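The 4MV CBPCY semantics above can be sketched as a bit-field interpretation; the exact mapping used here (bit 0 for luma block 0 through bit 3 for luma block 3, bits 4 and 5 for the two chroma blocks) is an assumption for illustration:

```python
def parse_cbpcy_4mv(cbpcy):
    # For luma bits, 1 means BLKMVDATA is present for that block;
    # for chroma bits, 1 means the block has non-zero coefficients.
    # Bit ordering is an illustrative assumption, not the normative layout.
    luma_has_blkmvdata = [(cbpcy >> i) & 1 == 1 for i in range(4)]
    chroma_has_nonzero = [(cbpcy >> i) & 1 == 1 for i in (4, 5)]
    return luma_has_blkmvdata, chroma_has_nonzero
```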
The encoder and decoder use code table selection among multiple VLC tables for MVDATA, BLKMVDATA and CBPCY, respectively.
2. Signaling macroblock information for interlaced P-frames
Interlaced P-frames may have a mix of frame-coded and field-coded macroblocks. In a field-coded macroblock, at most two motion vectors are associated with the macroblock. In a frame-coded macroblock, at most one motion vector is associated with the macroblock. If the sequence-layer element INTERLACE is 1, then the picture-layer element INTRLCF is present in the bitstream. INTRLCF is a 1-bit element that indicates the mode used to code the macroblocks of the frame. If INTRLCF is 0, all macroblocks in the frame are coded in frame mode. If INTRLCF is 1, macroblocks may be coded in field or frame mode, and the bitplane INTRLCMB, present at the picture layer, indicates the field/frame coding status of each macroblock in the interlaced P-frame.
Macroblocks in interlaced P frames can be one of three possible types: frame coding, field coding and skipping. The macroblock type is indicated by the combination of picture and macroblock layer elements.
A single MVDATA element is associated with all the blocks of a frame-coded macroblock. MVDATA signals whether the blocks are coded as intra or inter type. If they are coded as inter type, MVDATA also indicates the motion vector difference.
In a field-coded macroblock, the top field motion vector data element TOPMVDATA is associated with the top field blocks, and the bottom field motion vector data element BOTMVDATA is associated with the bottom field blocks. These elements are signaled at the first block of each field. More specifically, TOPMVDATA is sent together with the left top field block, while BOTMVDATA is sent together with the left bottom field block. TOPMVDATA indicates whether the top field blocks are intra- or inter-coded. If they are inter-coded, TOPMVDATA also indicates the motion vector difference for the top field blocks. Likewise, BOTMVDATA signals the inter/intra coding state of the bottom field blocks, and possibly motion vector difference information for the bottom field blocks. CBPCY indicates which fields have motion vector data elements present in the bitstream.
Skipped macroblocks are signaled by the SKIPMB bitplane at the picture layer. CBPCY and the motion vector data elements are used to specify whether a block has AC coefficients. CBPCY is present for a frame-coded macroblock of an interlaced frame if the "last" value decoded from MVDATA indicates that there is data to decode after the motion vector. If CBPCY is present, it is decoded into a 6-bit field: one bit for each of the four Y blocks, one bit for the two U blocks (top and bottom fields), and one bit for the two V blocks (top and bottom fields).
CBPCY is always present for field-coded macroblocks. CBPCY and the two field motion vector data elements are used to determine the presence of AC coefficients in the blocks of the macroblock. Bits 1, 3, 4 and 5 of CBPCY have the same meaning as for frame-coded macroblocks. That is, they indicate the presence or absence of AC coefficients in the right top field Y block, the right bottom field Y block, the top/bottom U blocks, and the top/bottom V blocks, respectively. The meaning is slightly different for bit positions 0 and 2. A 0 in bit position 0 indicates that TOPMVDATA is not present and that the motion vector predictor is used as the motion vector for the top field blocks. It also indicates that the left top field block does not contain any non-zero coefficients. A 1 in bit position 0 indicates that TOPMVDATA is present. TOPMVDATA indicates whether the top field blocks are inter- or intra-coded and, if they are inter-coded, also indicates the motion vector differences. If the "last" value decoded from TOPMVDATA is 1, there are no AC coefficients for the left top field block; otherwise, there are non-zero AC coefficients for the left top field block. The same rules apply to bit position 2 and BOTMVDATA for the left bottom field block.
The encoder and decoder use code table selection among multiple VLC tables for MVDATA, TOPMVDATA, BOTMVDATA and CBPCY, respectively.
3. Problems associated with previous macroblock information signal representation
In summary, the various kinds of macroblock information for progressive and interlaced P-frames are signaled using independent codes (or combinations of codes) at the frame and macroblock layers. This independently signaled information includes the number of motion vectors, the macroblock intra/inter coding status, CBPCY presence or absence (e.g., by the "last" value for 1MV and frame-coded macroblocks), and motion vector data presence or absence (e.g., by CBPCY for 4MV and field-coded macroblocks). Although this signaling provides good overall performance in many cases, it does not take full advantage of the statistical correlations among the various signaled kinds of information in common cases. Moreover, it does not allow or address various useful configurations, such as the presence/absence of CBPCY for 4MV macroblocks, or the presence/absence of motion vector data for 1MV macroblocks.
Moreover, it requires a confusing redefinition of the conventional role of the CBPCY element for signaling the presence/absence of motion vector data (e.g., by CBPCY for 4MV and field-coded macroblocks). This in turn requires the conventional CBPCY information to be signaled by different elements (e.g., BLKMVDATA, TOPMVDATA, BOTMVDATA) not conventionally used for that purpose. Again, the signaling does not allow or address various useful configurations, such as the presence of coefficient information when motion vector data is absent.
C. Motion vector prediction
For a motion vector of a macroblock (or a block, or a field of the macroblock, etc.) in an interlaced or progressive P frame, an encoder encodes the motion vector by calculating a motion vector predictor based on adjacent motion vectors, calculating a difference between the motion vector and the motion vector predictor, and encoding the difference. The decoder reconstructs the motion vector by calculating the motion vector predictor (again based on neighboring motion vectors), decoding the motion vector difference and adding the motion vector difference to the motion vector predictor.
Figs. 5A and 5B show the locations of macroblocks considered for candidate motion vector predictors for a 1MV macroblock in a 1MV progressive P-frame. The candidate predictors are taken from the macroblocks to the left, above, and above-right of the current macroblock, except in the case where the macroblock is the last macroblock in its row. In that case, predictor B is taken from the above-left macroblock instead of the above-right macroblock. For the special case where the frame is one macroblock wide, the predictor is always predictor A (the top predictor). When predictor A is out of bounds because the macroblock is in the top row, the predictor is predictor C. Various other rules address other special cases, such as intra-coded predictors.
Figs. 6A-10 show the locations of the blocks or macroblocks considered for the up to three candidate motion vector predictors for the motion vector of a 1MV or 4MV macroblock in a mixed MV progressive P-frame. In the figures, the larger squares are macroblock boundaries, and the smaller squares are block boundaries. For the special case where the frame is one macroblock wide, the predictor is always predictor A (the top predictor). Various other rules address other special cases, such as top-row blocks of top-row 4MV macroblocks, top-row 1MV macroblocks, and intra-coded predictors.
Specifically, figs. 6A and 6B show the block positions considered for the candidate motion vector predictors of a 1MV current macroblock in a mixed MV progressive P-frame. The neighboring macroblocks may be 1MV or 4MV macroblocks. Figs. 6A and 6B show the locations of the motion vectors selected assuming the neighbors are 4MV (i.e., predictor A is the motion vector for block 2 in the macroblock above the current macroblock, and predictor C is the motion vector for block 1 in the macroblock immediately to the left of the current macroblock). If any of the neighbors is a 1MV macroblock, the motion vector predictor shown in figs. 5A and 5B is taken to be the motion vector for the entire macroblock. As shown in fig. 6B, if the macroblock is the last macroblock in its row, predictor B is from block 3 of the above-left macroblock rather than from block 2 of the above-right macroblock as it otherwise would be.
Figs. 7A-10 show the block positions considered for the candidate motion vector predictors for each of the four luma blocks of a 4MV macroblock in a mixed MV progressive P-frame. Figs. 7A and 7B show the block positions considered for the candidate motion vector predictors of the block at position 0; figs. 8A and 8B show the block positions considered for the candidate motion vector predictors of the block at position 1; fig. 9 shows the block positions considered for the candidate motion vector predictors of the block at position 2; and fig. 10 shows the block positions considered for the candidate motion vector predictors of the block at position 3. Again, if a neighbor is a 1MV macroblock, the motion vector predictor for that macroblock is used for each of its blocks.
For the case where the macroblock is the first macroblock in its row, predictor B for block 0 is handled differently than for the remaining macroblocks in the row (see figs. 7A and 7B). In this case, predictor B is taken from block 3 of the macroblock immediately above the current macroblock, rather than from block 3 of the macroblock above and to the left of the current macroblock as it otherwise would be. Similarly, for the case where the macroblock is the last macroblock in its row, predictor B for block 1 is handled differently (figs. 8A and 8B). In this case, the predictor is taken from block 2 of the macroblock immediately above the current macroblock, rather than from block 2 of the macroblock above and to the right of the current macroblock as it otherwise would be. Finally, if the macroblock is in the first macroblock column, predictor C for blocks 0 and 2 is set equal to 0.
If a macroblock of a progressive P-frame is coded as skipped, the motion vector predictor is used as the motion vector for the macroblock (or, for its blocks, the respective block predictors, etc.). A single bit may still be present to indicate which predictor to use for hybrid motion vector prediction.
Figs. 11 and 12A-B show examples of candidate predictors for motion vector prediction for frame-coded macroblocks and field-coded macroblocks, respectively, in interlaced P-frames. Fig. 11 shows candidate predictors A, B and C for a current frame-coded macroblock at an interior position in an interlaced P-frame (not the first or last macroblock in a macroblock row, and not in the top row). Predictors may be obtained from candidate locations other than those labeled A, B and C (e.g., in special cases such as when the current macroblock is the first or last macroblock in a row, or is in the top row, since certain predictors are unavailable in those cases). For a current frame-coded macroblock, candidate predictors are calculated differently depending on whether the neighboring macroblocks are field-coded or frame-coded. For a neighboring frame-coded macroblock, its motion vector is simply taken as the candidate predictor. For a neighboring field-coded macroblock, the candidate motion vector is determined by averaging its top and bottom field motion vectors.
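The averaging rule for field-coded neighbors can be sketched as follows; the rounding behavior (floor division here) is an assumption for illustration, since the text does not specify it:

```python
def frame_candidate_from_field_mb(top_field_mv, bottom_field_mv):
    # Average the two field motion vectors component-wise to form a single
    # frame-level candidate predictor for a field-coded neighboring macroblock.
    return ((top_field_mv[0] + bottom_field_mv[0]) // 2,
            (top_field_mv[1] + bottom_field_mv[1]) // 2)
```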
Figs. 12A-B show candidate predictors A, B and C for the current field of a field-coded macroblock at an interior position in the frame. In fig. 12A, the current field is a bottom field, and the bottom field motion vectors of the neighboring macroblocks are used as candidate predictors. In fig. 12B, the current field is a top field, and the top field motion vectors of the neighboring macroblocks are used as candidate predictors. For each field of a current field-coded macroblock, there are at most three candidate motion vector predictors, and each candidate is taken from the same field type (e.g., top field or bottom field) as the current field. If a neighboring macroblock is frame-coded, its motion vector is used as both the top field predictor and the bottom field predictor. Again, various special cases (not shown) apply when the current macroblock is the first or last macroblock in a row, or is in the top row, since certain predictors are unavailable in those cases. If the frame is one macroblock wide, the motion vector predictor is always predictor A. If a neighboring macroblock is intra-coded, its motion vector predictor is 0.
Figs. 13A and 13B show pseudo code for calculating a motion vector predictor given the set of predictors A, B and C. To select a predictor from the set of candidate predictors, the encoder and decoder use a selection algorithm such as the median-of-three algorithm shown in fig. 13C.
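The median-of-three selection of fig. 13C can be sketched as follows (an illustrative reconstruction in Python; the figure itself is not reproduced here, and the component-wise application to candidates A, B, and C is assumed):

```python
def median3(a, b, c):
    # Median of three scalars: the value that is neither the max nor the min.
    return a + b + c - max(a, b, c) - min(a, b, c)

def select_predictor(a, b, c):
    # Apply the median component-wise to the candidate motion vectors
    # A, B, and C to form the motion vector predictor.
    return (median3(a[0], b[0], c[0]), median3(a[1], b[1], c[1]))
```

A strongly outlying candidate (e.g., from a discontinuity in the motion field) is thus never chosen as the predictor.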
D. Hybrid motion vector prediction for progressive P-frames
Hybrid motion vector prediction is allowed for motion vectors of progressive P-frames. For a motion vector of a macroblock or block, whether the progressive P-frame is 1MV or mixed-MV, the motion vector predictor calculated in the previous section is tested against the A and C predictors to determine whether predictor selection is explicitly coded in the bitstream. If so, a bit is decoded that indicates whether predictor A or predictor C is used as the motion vector predictor for the motion vector (rather than the motion vector predictor calculated in section C above). Hybrid motion vector prediction is not used in motion vector prediction for interlaced P-frames or any other representation of interlaced video.
The pseudo code in figs. 14A and 14B shows hybrid motion vector prediction for motion vectors of progressive P-frames. In this pseudo code, the variables predictor_pre_x and predictor_pre_y are the horizontal and vertical motion vector predictors, respectively, as calculated in the previous section. The variables predictor_post_x and predictor_post_y are the horizontal and vertical motion vector predictors, respectively, after the hybrid motion vector prediction check.
E. Decoding motion vector differences
For a macroblock or block of a progressive P-frame, an MVDATA or BLKMVDATA element signals motion vector difference information. A 1MV macroblock has a single MVDATA element. A 4MV macroblock has between zero and four BLKMVDATA elements (whose presence is indicated by CBPCY).
An MVDATA or BLKMVDATA element jointly encodes: (1) the horizontal motion vector difference component; (2) the vertical motion vector difference component; and (3) a binary "last" flag, which generally indicates whether transform coefficients are present. Whether a macroblock (or block, for 4MV) is intra-coded or inter-coded is signaled as one of the motion vector difference possibilities. The pseudo code in figs. 15A and 15B shows how motion vector difference information, inter/intra coding type, and last flag information are decoded from MVDATA or BLKMVDATA. In this pseudo code, the variable last_flag is a binary flag whose purpose is described in the section on signaling macroblock information. The variable intra_flag is a binary flag indicating whether the block or macroblock is intra-coded. The variables dmv_x and dmv_y are the differential horizontal and vertical motion vector components, respectively. The variables k_x and k_y are fixed lengths for extended-range motion vectors, and their values vary as shown in the table of fig. 15C. The variable halfpel_flag is a binary value indicating whether half-pel or quarter-pel precision is used for the motion vector, and its value is set based on picture-layer syntax elements. Finally, the tables size_table and offset_table are arrays defined as follows:
size_table[6] = {0, 2, 3, 4, 5, 8}, and
offset_table[6] = {0, 1, 3, 7, 15, 31}.
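As a rough illustration of how such tables are typically used (a hypothetical sketch, not the full joint decoding logic of figs. 15A-B), a VLC-decoded index selects a bucket; size_table gives the number of extra bits to read, and offset_table gives the base magnitude for that bucket:

```python
size_table = [0, 2, 3, 4, 5, 8]
offset_table = [0, 1, 3, 7, 15, 31]

def decode_magnitude(index, read_bits):
    # read_bits(n) is an assumed bitstream callback returning an n-bit value.
    # The magnitude is the bucket's base offset plus the extra bits read.
    n = size_table[index]
    return offset_table[index] + (read_bits(n) if n > 0 else 0)
```

Small differentials cost few bits while large ones remain representable, which is the usual rationale for this bucketed layout.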
MVDATA, TOPMVDATA, and BOTMVDATA elements are decoded in the same way for frame-coded and field-coded macroblocks of interlaced P-frames.
F. Reconstructing and deriving motion vectors
The luma motion vector is reconstructed from the encoded motion vector difference information and the motion vector predictor, and the chroma motion vector is derived from the reconstructed luma motion vector.
For 1MV and 4MV macroblocks of a progressive P-frame, the luma motion vector is reconstructed by adding the difference to the motion vector predictor as follows:
mv_x = (dmv_x + predictor_x) smod range_x,
mv_y = (dmv_y + predictor_y) smod range_y,
where smod is a signed modulo operation defined as:
A smod b = ((A + b) % (2 * b)) - b,
which ensures that the reconstructed vector is valid.
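In Python (where % already returns a non-negative remainder for a positive modulus), the signed modulo and the reconstruction above can be written as:

```python
def smod(a, b):
    # Signed modulo: wraps a into the valid range [-b, b).
    return ((a + b) % (2 * b)) - b

def reconstruct(dmv, predictor, mv_range):
    # mv = (dmv + predictor) smod range, applied per component.
    return smod(dmv + predictor, mv_range)
```

The wrap guarantees that even an out-of-range sum maps back to a valid motion vector component.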
In a 1MV macroblock, there is a single motion vector for the four blocks that make up the luminance component of the macroblock. If the macroblock is intra-coded, no motion vector is associated with it. If the macroblock is skipped, then dmv_x = 0 and dmv_y = 0, so mv_x = predictor_x and mv_y = predictor_y.
Each inter-coded luminance block in a 4MV macroblock has its own motion vector. Thus, there are 0 to 4 luminance motion vectors in a 4MV macroblock. Non-coded blocks in a 4MV macroblock may occur if the macroblock is skipped or if its CBPCY indicates that a block is non-coded. If a block is non-coded, then dmv_x = 0 and dmv_y = 0, so mv_x = predictor_x and mv_y = predictor_y.
For progressive P-frames, the chrominance motion vectors are derived from the luminance motion vectors. Also, for a 4MV macroblock, the decision whether to encode the chrominance blocks as inter or intra is made based on the status of the luminance blocks. The chrominance vectors are reconstructed in two steps.
In a first step, a nominal chrominance motion vector is obtained by appropriately combining and scaling the luminance motion vector(s). The scaling is performed in such a way that half-pixel offsets are favored over quarter-pixel offsets. Fig. 16A shows pseudo code for the scaling when deriving the chrominance motion vector from the luminance motion vector of a 1MV macroblock. Fig. 16B shows pseudo code for combining up to four luminance motion vectors and scaling when deriving the chrominance motion vector for a 4MV macroblock. Fig. 13C shows pseudo code for the median3() function, and fig. 16C shows pseudo code for the median4() function.
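Fig. 16C is not reproduced here, but a common four-input median (assumed for illustration) drops the largest and smallest values and averages the two in the middle, using integer arithmetic:

```python
def median4(a, b, c, d):
    # Drop the largest and smallest of the four values and average
    # the remaining two (integer division assumed).
    return (a + b + c + d - max(a, b, c, d) - min(a, b, c, d)) // 2
```

Applied component-wise to the four luminance motion vectors, this yields a single representative vector from which the chrominance vector can then be scaled.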
In a second step, a 1-bit sequence-level element is used to determine whether further rounding of the chrominance motion vectors is required. If so, chrominance motion vectors at quarter-pixel offsets are rounded to the nearest full-pixel positions.
For frame-coded and field-coded macroblocks of interlaced P-frames, the luminance motion vectors are reconstructed as for progressive P-frames. In a frame-coded macroblock, there is a single motion vector for the four blocks that make up the luminance component of the macroblock. If the macroblock is intra-coded, no motion vector is associated with it. If the macroblock is skipped, then dmv_x = 0 and dmv_y = 0, so mv_x = predictor_x and mv_y = predictor_y. In a field-coded macroblock, each field may have its own motion vector. Thus, there are 0 to 2 luminance motion vectors in a field-coded macroblock. A non-coded field in a field-coded macroblock may occur if the macroblock is skipped or if its CBPCY indicates that the field is non-coded. If a field is non-coded, then dmv_x = 0 and dmv_y = 0, so mv_x = predictor_x and mv_y = predictor_y.
For interlaced P-frames, the chrominance motion vectors are derived from the luminance motion vectors. For a frame-coded macroblock, there is one chrominance motion vector corresponding to the single luminance motion vector. For a field-coded macroblock, there are two chrominance motion vectors: one for the top field and one for the bottom field, corresponding to the top and bottom field luminance motion vectors, respectively. The rules for deriving the chrominance motion vectors are the same for field-coded and frame-coded macroblocks; they depend on the luminance motion vectors, not on the macroblock type. Fig. 17 shows pseudo code for deriving the chrominance motion vectors from the luminance motion vectors of frame-coded or field-coded macroblocks of interlaced P-frames. Basically, the x-component of the chrominance motion vector is scaled by a factor of 4, while the y-component remains unchanged (due to the 4:1:1 macroblock chroma subsampling). The scaled x-component of the chrominance motion vector is also rounded to the adjacent quarter-pixel position. If cmv_x or cmv_y falls outside the valid range, it is pulled back into the valid range.
G. Intensity compensation
For progressive P-frames, the picture layer contains syntax elements that control the motion compensation mode and intensity compensation for the frame. If intensity compensation is signaled, LUMSCALE and LUMSHIFT elements follow in the picture layer. LUMSCALE and LUMSHIFT are 6-bit values that specify the parameters used in the intensity compensation process.
When intensity compensation is used for a progressive P-frame, pixels in the reference frame are remapped before they are used for motion-compensated prediction of the P-frame. The pseudo code in fig. 18 shows how the LUMSCALE and LUMSHIFT elements are used to build the look-up tables used to remap the reference frame pixels. The Y component of the reference frame is remapped using the LUTY[] table, and the U and V components are remapped using the LUTUV[] table, as follows:
p_Y' = LUTY[p_Y], and
p_UV' = LUTUV[p_UV],
where p_Y is an original luminance pixel value in the reference frame, p_Y' is the remapped luminance pixel value, p_UV is an original U or V pixel value in the reference frame, and p_UV' is the remapped U or V pixel value.
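The shape of such a remapping can be sketched as a clamped linear look-up table. The mapping below from the LUMSCALE/LUMSHIFT values to scale and shift terms is a hypothetical simplification; the exact derivation is given by the pseudo code of fig. 18, which is not reproduced here:

```python
def build_luts(lum_scale, lum_shift):
    # Hypothetical linear remapping: out = (scale * in + shift + 32) >> 6,
    # clamped to [0, 255]. The U/V table pivots around the neutral chroma
    # value 128 so that scaling does not shift neutral chroma.
    i_scale = lum_scale + 32
    i_shift = lum_shift * 64
    clip = lambda v: max(0, min(255, v))
    lut_y = [clip((i_scale * i + i_shift + 32) >> 6) for i in range(256)]
    lut_uv = [clip((((i - 128) * i_scale + 32) >> 6) + 128) for i in range(256)]
    return lut_y, lut_uv
```

Under this sketch, lum_scale = 32 with lum_shift = 0 degenerates to the identity mapping, and remapping each reference pixel is then a single table lookup per sample.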
For interlaced P-frames, a 1-bit picture-layer INTCOMP value signals whether intensity compensation is used for the frame. If intensity compensation is used, LUMSCALE and LUMSHIFT elements follow in the picture layer, where LUMSCALE and LUMSHIFT are 6-bit values that specify the parameters used during intensity compensation for the entire interlaced P-frame. The intensity compensation itself works the same as for progressive P-frames.
Standards for video compression and decompression
In addition to the earlier WMV encoders and decoders, several international standards relate to video compression and decompression. These standards include the Motion Picture Experts Group ["MPEG"] 1, 2, and 4 standards and the H.261, H.262 (another name for MPEG-2), H.263, and H.264 standards from the International Telecommunication Union ["ITU"]. Encoders and decoders complying with one of these standards typically use motion estimation and compensation to reduce temporal redundancy between pictures.
A. Motion compensated reference picture
For several of the standards, motion compensation for a forward-predicted frame is relative to a single reference frame, which is the previously reconstructed I- or P-frame immediately preceding the current forward-predicted frame. Since the reference frame for the current forward-predicted frame is known, and only one reference frame is possible, no information is needed to select among multiple reference frames. See, for example, the H.261 and MPEG-1 standards. In some encoding/decoding scenarios (e.g., high bit rate interlaced video with lots of motion), limiting motion compensation for forward prediction to a single reference hurts overall compression efficiency.
The H.262 standard allows an interlaced video frame to be encoded as a single frame or as two fields, where frame coding or field coding can be adaptively selected on a frame-by-frame basis. For field-based prediction of a current field, motion compensation uses a previously reconstructed top or bottom field. [H.262 standard, sections 7.6.1 and 7.6.2.1.] The H.262 standard describes selecting between two reference fields for motion compensation for a motion vector of a current field. [H.262 standard, sections 6.2.5.2, 6.3.17.2, and 7.6.4.] For a given motion vector for a 16x16 macroblock (or the upper 16x8 half of the macroblock, or the lower 16x8 half of the macroblock), a single bit signals whether the motion vector applies to the top or bottom reference field. See the H.262 standard for further details.
While such reference field selection provides some flexibility and improves prediction in some cases, it has several bit rate-related disadvantages. The reference field selection signals for motion vectors can consume many bits. For example, for a single 720x288 field with 810 macroblocks having 0, 1, or 2 motion vectors each, the reference field selection bits for the motion vectors consume up to 1620 bits. No attempt is made to reduce the bit rate of the reference field selection information by predicting which reference fields will be selected for the respective motion vectors. The signaling of the reference field selection information is also inefficient in terms of pure coding efficiency. Moreover, in some cases the reference field selection information may consume so many bits, even when coded, that the cost outweighs the benefit of the prediction improvement from having multiple available references for motion compensation. No option is given to disable reference field selection to address such cases.
The H.262 standard also describes dual-prime prediction, a prediction mode in which two forward field-based predictions are averaged for a 16x16 block in an interlaced P-picture. [H.262 standard, section 7.6.3.6.]
The MPEG-4 standard allows macroblocks of an interlaced video frame to be frame-coded or field-coded. [MPEG-4 standard, section 6.1.3.8.] For field-based prediction of the top or bottom field of a field-coded macroblock, motion compensation uses a previously reconstructed top or bottom field. [MPEG-4 standard, sections 6.3.7.3 and 7.6.2.] The MPEG-4 standard describes selecting between two reference fields for motion compensation. [MPEG-4 standard, section 6.3.7.3.] For a given motion vector for the top or bottom field of a macroblock, a single bit signals whether the motion vector applies to the top or bottom reference field. See the MPEG-4 standard for further details. Such signaling of reference field selection information has problems similar to those described above for H.262.
The H.263 standard describes motion compensation for progressive frames, including an optional reference picture selection mode. [H.263 standard, section 3.4.12, annex N.] Normally, the temporally closest previous anchor picture is used for motion compensation. When the reference picture selection mode is used, however, temporal prediction from pictures other than the most recent reference picture is allowed. This can improve the performance of real-time video communication over error-prone channels by letting the encoder optimize its video encoding for the channel conditions (e.g., stopping error propagation due to loss of information needed for reference in inter-frame coding). When used, for a given block or slice within a picture, a 10-bit value indicates the reference for prediction of that block or slice. The reference picture selection mechanism described in H.263 is for progressive video and is adapted to address error propagation problems in error-prone channels; it does essentially nothing to improve compression efficiency.
In the draft JVT-D157 of the H.264 standard, the inter prediction process for motion-compensated prediction of a block may include selecting a reference picture from a number of stored, previously decoded pictures. [JVT-D157, section 0.4.3.] At the picture level, one or more parameters specify the number of reference pictures used in decoding the picture. [JVT-D157, sections 7.3.2.2 and 7.4.2.2.] At the slice level, the number of available reference pictures may change, and additional parameters may be received to reorder and manage which reference pictures are in the list. [JVT-D157, sections 7.3.3 and 7.4.3.] For a given motion vector (for a macroblock or sub-macroblock partition), a reference index (when present) indicates the reference picture to be used in prediction. [JVT-D157, sections 7.3.5.1 and 7.4.5.1.] The reference index indicates the first, second, third, etc. frame or field in the list. If there is only one active reference picture in the list, no reference index is present. If there are only two active reference pictures in the list, a single coded bit represents the reference index. See draft JVT-D157 of the H.264 standard for additional details.
Reference picture selection per JVT-D157 provides flexibility and thus may improve motion-compensated prediction. The process of managing and signaling reference picture lists is complex, however, and in some cases it inefficiently consumes many bits.
B. Signalling macroblock modes
Various standards use different mechanisms to signal macroblock information. For example, in the H.261 standard, the macroblock header of a macroblock includes a macroblock type MTYPE element, which is signaled as a VLC. [H.261 standard, section 4.2.3.] The MTYPE element indicates the prediction mode (intra, inter, inter+MC, inter+MC+loop filter), whether a quantizer MQUANT element is present for the macroblock, whether a motion vector data MVD element is present for the macroblock, whether a coded block pattern CBP element is present for the macroblock, and whether TCOEFF transform coefficient elements are present for the blocks of the macroblock. [Id.] For each motion-compensated macroblock, an MVD element is present. [Id.]
In the MPEG-1 standard, a macroblock has a macroblock_type element, which is signaled as a VLC. [MPEG-1 standard, section 2.4.3.6, tables B.2a through B.2d, and D.6.4.2.] For a macroblock in a forward-predicted picture, the macroblock_type element indicates whether a quantizer scale element is present for the macroblock, whether forward motion vector data is present for the macroblock, whether a coded block pattern element is present for the macroblock, and whether the macroblock is intra-coded. [Id.] If the macroblock uses forward motion compensation, forward motion vector data is always present. [Id.]
In the H.262 standard, a macroblock has a macroblock_type element, which is signaled as a VLC. [H.262 standard, sections 6.2.5.1 and 6.3.17.1, and tables B.2 through B.8.] For a macroblock in a forward-predicted picture, the macroblock_type element indicates whether a quantiser_scale_code element is present for the macroblock, whether forward motion vector data is present for the macroblock, whether a coded block pattern element is present for the macroblock, whether the macroblock is intra-coded, and scalability options for the macroblock. [Id.] If the macroblock uses forward motion compensation, forward motion vector data is always present. [Id.] A separate code (frame_motion_type or field_motion_type) may further indicate the macroblock prediction type, including the motion vector count and motion vector format for the macroblock. [Id.]
In the H.263 standard, a macroblock has a macroblock type and coded block pattern for chrominance MCBPC element, which is signaled as a VLC. [H.263 standard, sections 5.3.2 and F.2, tables 8 and 9.] The macroblock type gives information about the macroblock (e.g., inter-coded, inter 4V, intra-coded). [Id.] For a coded macroblock in an inter-coded picture, MCBPC and a coded block pattern for luminance are always present, and the macroblock type indicates whether a quantizer information element is present for the macroblock. [Id.] A forward motion-compensated macroblock always has motion vector data present for the macroblock (or for its blocks, for the inter 4V type). [Id.] The MPEG-4 standard similarly specifies an MCBPC element, which is signaled as a VLC. [MPEG-4 standard, sections 6.2.7, 6.3.7, and 11.1.1.]
In JVT-D157, an mb_type (macroblock type) element is part of the macroblock layer. [JVT-D157, sections 7.3.5 and 7.4.5.] The mb_type indicates the macroblock type along with various associated information. [Id.] For example, for a P slice, the mb_type element indicates the type of prediction (intra or forward), various intra coding mode parameters when the macroblock is intra-coded, the macroblock partitions (e.g., 16x16, 16x8, 8x16, or 8x8) and hence the number of motion vectors when the macroblock is forward-predicted, and whether reference picture selection information is present (if the partition is 8x8). [Id.] The prediction type and mb_type together also indicate whether the macroblock has a coded block pattern element. [Id.] Motion vector data is signaled for each 16x16, 16x8, or 8x16 partition in a forward motion-compensated macroblock. [Id.] For a forward-predicted macroblock with 8x8 partitions, a sub_mb_type element per 8x8 partition indicates its prediction type (intra or forward). [Id.] If an 8x8 partition is forward-predicted, sub_mb_type indicates the sub-partitions (e.g., 8x8, 8x4, 4x8, or 4x4) and hence the number of motion vectors for the 8x8 partition. [Id.] Motion vector data is signaled for each sub-partition in a forward motion-compensated 8x8 partition. [Id.]
The various standards thus use a variety of signaling mechanisms for macroblock information. Whatever the advantages of those signaling mechanisms, they also have disadvantages. First, they sometimes fail to signal efficiently the macroblock type, the presence/absence of coded block pattern information, and the presence/absence of motion vector differential information for motion-compensated macroblocks. In fact, the standards generally do not signal the presence/absence of motion vector differential information for a motion-compensated macroblock (or block or field thereof) at all, instead assuming that motion vector differential information is signaled whenever motion compensation is used. Second, the standards are inflexible in how they determine which code tables to use for macroblock mode information.
C. Motion vector prediction
Each of H.261, H.262, H.263, MPEG-1, MPEG-4, and JVT-D157 specifies some form of motion vector prediction, although the details of the motion vector prediction vary widely among the standards. For example, motion vector prediction is simplest in the H.261 standard, in which the motion vector predictor for the motion vector of a current macroblock is the motion vector of the previously coded/decoded macroblock. [H.261 standard, section 4.2.3.4.] For various special cases (e.g., the current macroblock is the first in a row), the motion vector predictor is 0. [Id.] Motion vector prediction is similar in the MPEG-1 standard. [MPEG-1 standard, sections 2.4.4.2 and D.6.2.3.]
Other standards, such as H.262, specify more complex motion vector prediction, but still generally determine a motion vector predictor from a single neighbor. [H.262 standard, section 7.6.3.] Determining a motion vector predictor from a single neighbor is adequate when motion is uniform, but is ineffective in many other cases.
Still other standards (such as H.263, MPEG-4, and JVT-D157) determine a motion vector predictor from multiple different neighbors with different candidate motion vector predictors. [H.263 standard, section 6.1.1; MPEG-4 standard, sections 7.5.5, 7.6.2, and F.2; JVT-D157, section 8.4.1.] These are effective for a wider variety of motion, but are still inadequate for handling cases in which there is a large discrepancy among the different candidate motion vector predictors, indicating a discontinuity in the motion pattern.
For further details, see the corresponding standards.
D. Decoding motion vector differences
Each of H.261, H.262, H.263, MPEG-1, MPEG-4, and JVT-D157 specifies some form of differential motion vector coding and decoding, although the details of the coding and decoding vary widely among the standards. For example, motion vector coding and decoding are simplest in the H.261 standard, where one VLC represents the horizontal differential component and another VLC represents the vertical differential component. [H.261 standard, section 4.2.3.4.] Other standards specify more complex coding and decoding of motion vector differential information. For further details, see the respective standards.
E. Reconstructing and deriving motion vectors
In general, motion vectors in H.261, H.262, H.263, MPEG-1, MPEG-4, or JVT-D157 are reconstructed by combining motion vector predictors with motion vector differences. Again, the details of the reconstruction vary between standards.
Chrominance motion vectors (which are not signaled) are typically derived from luminance motion vectors (which are signaled). For example, in the H.261 standard, the luminance motion vector is halved and truncated towards zero to derive the chrominance motion vector. [H.261 standard, section 3.2.2.] Similarly, in the MPEG-1 standard and JVT-D157, luminance motion vectors are halved to derive chrominance motion vectors. [MPEG-1 standard, section 2.4.4.2; JVT-D157, section 8.4.1.4.]
In the H.262 standard, a luminance motion vector is scaled down to a chrominance motion vector by a factor that depends on the chrominance subsampling mode (e.g., 4:2:0, 4:2:2, or 4:4:4). [H.262 standard, section 7.6.3.7.]
In the H.263 standard, for a macroblock with a single luminance motion vector for all four luminance blocks, the chrominance motion vector is derived by dividing the luminance motion vector by two and rounding to a half-pixel position. [H.263 standard, section 6.1.1.] For a macroblock with four luminance motion vectors (one per block), the chrominance motion vector is derived by summing the four luminance motion vectors, dividing by eight, and rounding to a half-pixel position. [H.263 standard, section F.2.] Chrominance motion vectors are similarly derived in the MPEG-4 standard. [MPEG-4 standard, sections 7.5.5 and 7.6.2.]
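In half-pel units, the H.263-style derivations above can be sketched as follows (the standard defines the exact rounding via a table; rounding to the nearest value with ties away from zero is a simplifying assumption here):

```python
def div_round(num, den):
    # Divide and round to nearest, ties away from zero.
    if num >= 0:
        return (num + den // 2) // den
    return -((-num + den // 2) // den)

def chroma_from_one_luma(mv):
    # Single luma motion vector: divide each component by two
    # (vectors in half-pel units assumed).
    return (div_round(mv[0], 2), div_round(mv[1], 2))

def chroma_from_four_luma(mvs):
    # Four luma motion vectors: sum each component and divide by eight.
    sx = sum(mv[0] for mv in mvs)
    sy = sum(mv[1] for mv in mvs)
    return (div_round(sx, 8), div_round(sy, 8))
```

Summing before dividing lets the four-vector case average out per-block motion without signaling any chrominance vector.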
F. Weighted prediction
Draft JVT-D157 of the H.264 standard describes weighted prediction. A weighted prediction flag for a picture indicates whether predicted slices in the picture use weighted prediction. [JVT-D157, sections 7.3.2.2 and 7.4.2.2.] If a picture uses weighted prediction, each predicted slice in the picture has a prediction weight table. [JVT-D157, sections 7.3.3, 7.3.3.2, and 10.4.1.] For the table, the denominator for luminance weighting factors and the denominator for chrominance weighting factors are signaled. Then, for each reference picture available to the slice, a luminance weight flag indicates whether luminance weight and luminance offset parameters are signaled for the picture (when signaled, they follow), and a chrominance weight flag indicates whether chrominance weight and chrominance offset parameters are signaled for the picture (when signaled, they follow). Weight parameters that are not signaled are given default values associated with the signaled denominator values. While JVT-D157 provides some flexibility in signaling weighted prediction parameters, this signaling mechanism is inefficient in various circumstances.
Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.
Disclosure of Invention
In general, the detailed description is directed to various techniques and tools for encoding and decoding interlaced video. The various techniques and tools described may be used in combination or independently.
Some portions of the detailed description are directed to various techniques and tools for hybrid motion vector prediction for interlaced forward predicted fields. Such techniques and tools include, but are not limited to, the following:
a tool, such as a video encoder or decoder, checks a hybrid motion vector prediction condition based at least in part on a predictor polarity applicable to a motion vector predictor. For example, the predictor polarity signal is used to select a dominant polarity or a non-dominant polarity of the motion vector predictor. The tool then determines a motion vector predictor.
Alternatively, a tool such as a video encoder or decoder determines an initial, derived motion vector predictor for the motion vectors of interlaced forward predicted fields. The tool then checks for a change condition based at least in part on the initial, derived motion vector predictor and one or more neighboring motion vectors. If the change condition is satisfied, the tool uses one of the one or more neighboring motion vectors as a final motion vector predictor for the motion vector. Otherwise, the tool uses the initial, derived motion vector predictor as the final motion vector predictor.
Some portions of the detailed description are directed to various techniques and tools for using a motion vector block mode that signals the presence or absence of motion vector data for a macroblock having multiple motion vectors. Such techniques and tools include, but are not limited to, the following:
a first variable length code is processed by a tool, such as a video encoder or decoder, that represents first information for a macroblock having a plurality of luma motion vectors. The first information includes one motion vector data presence indicator per luminance motion vector of the macroblock. The tool also processes a second variable length code that represents second information for the macroblock. The second information includes a plurality of transform coefficient data presence indicators for a plurality of blocks of the macroblock.
Alternatively, for a macroblock having a first number of luminance motion vectors (where the first number is greater than 1), a tool such as a video encoder or decoder processes a motion vector block pattern consisting of a second number of bits (where the second number equals the first number). Each of the bits indicates whether associated motion vector data is signaled in the bitstream for a respective one of the luminance motion vectors. The tool also processes the associated motion vector data for each luminance motion vector for which associated motion vector data is indicated as signaled in the bitstream.
Some portions of the detailed description are directed to various techniques and tools for selecting between a dominant polarity and a non-dominant polarity of a motion vector predictor. Such techniques and tools include, but are not limited to, the following:
a tool such as a video encoder or decoder determines the dominant polarity of the motion vector predictor. The tool processes the motion vector predictor based at least in part on the dominant polarity and processes the motion vector based at least in part on the motion vector predictor. For example, the motion vector is of a current block or macroblock of an interlaced forward predicted field, and the primary polarity is based at least in part on a polarity of each of a plurality of previous motion vectors of neighboring blocks or macroblocks.
Alternatively, a tool, such as a video encoder or decoder, processes information indicating a selection between a dominant polarity and a non-dominant polarity of the motion vector predictor, and processes the motion vector based at least in part on the motion vector predictor. For example, the decoder determines a dominant polarity and a non-dominant polarity, and then determines a motion vector predictor based at least in part on the dominant polarity and the non-dominant polarity and information indicative of a selection therebetween.
Some portions of the detailed description are directed to various techniques and tools for jointly encoding and decoding reference field selection information and differential motion vector information. Such techniques and tools include, but are not limited to, the following:
A tool such as a video decoder decodes a variable length code that jointly represents differential motion vector information for a motion vector together with a motion vector predictor selection. The decoder then reconstructs the motion vector based at least in part on the differential motion vector information and the motion vector predictor selection.
Alternatively, a tool such as a video encoder determines the dominant/non-dominant predictor selection for the motion vector. The encoder determines differential motion vector information for the motion vector and jointly encodes the dominant/non-dominant predictor selection along with the differential motion vector information.
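The joint coding described above can be sketched by folding the predictor-selection flag into the symbol that is subsequently variable-length coded, rather than spending a separate bit on it. The mapping below is an illustrative assumption; the actual joint code construction in the codec differs.

```python
# Sketch of jointly representing differential motion vector information and
# the dominant/non-dominant predictor selection as one symbol for VLC coding.
# The VLC table itself is omitted; the joint mapping is the point illustrated.

def make_joint_symbol(diff_index, use_non_dominant):
    """Fold the predictor-selection flag into the differential MV symbol."""
    return diff_index * 2 + (1 if use_non_dominant else 0)

def split_joint_symbol(symbol):
    """Recover (diff_index, use_non_dominant) from the joint symbol."""
    return symbol // 2, bool(symbol % 2)

sym = make_joint_symbol(diff_index=7, use_non_dominant=True)
assert sym == 15
assert split_joint_symbol(sym) == (7, True)
```

Because the two pieces of information are correlated in practice, a single code table over the joint symbols can be shorter on average than coding them independently.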
Some portions of the detailed description are directed to various techniques and tools for code table selection and joint encoding/decoding of macroblock mode information for macroblocks of interlaced forward predicted fields. Such techniques and tools include, but are not limited to, the following:
a tool such as a video encoder or decoder processes a variable length code that jointly signals macroblock mode information for a macroblock. The macroblock is motion compensated, and the jointly signaled macroblock mode information includes: (1) macroblock type, (2) presence or absence of a coded block pattern, and (3) presence or absence of motion vector data for the motion compensated macroblock.
Alternatively, a tool such as a video encoder or decoder selects one code table from a plurality of available code tables of macroblock mode information for interlaced forward predicted fields. The tool uses the selected code table to process variable length codes that indicate macroblock mode information for the macroblock. The macroblock mode information includes (1) macroblock type, (2) presence or absence of a coded block pattern, and (3) presence or absence of motion vector data when applicable to the macroblock type.
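The combination of joint signaling and table selection can be sketched as below. The table contents are invented for illustration; the real tables assign shorter codes to the modes most common for the content being coded, and the encoder signals which table is in use.

```python
# Sketch of jointly signaling macroblock mode information (macroblock type,
# coded-block-pattern presence, motion-vector-data presence) with one code
# from a selectable table. Table entries here are hypothetical.

MBMODE_TABLES = [
    # (mb_type, cbp_present, mv_data_present) -> code string
    {('1MV', True, True): '1', ('1MV', True, False): '01',
     ('1MV', False, False): '001', ('intra', True, None): '0001'},
    # A second table favoring a different mode distribution
    {('1MV', True, False): '1', ('1MV', True, True): '01',
     ('1MV', False, False): '001', ('intra', True, None): '0001'},
]

def encode_mbmode(table_index, mb_type, cbp_present, mv_data_present):
    """Return the VLC for one joint macroblock-mode symbol."""
    return MBMODE_TABLES[table_index][(mb_type, cbp_present, mv_data_present)]

# The same mode costs 1 bit under table 0 but 2 bits under table 1.
assert encode_mbmode(0, '1MV', True, True) == '1'
assert encode_mbmode(1, '1MV', True, True) == '01'
```

Selecting among tables per field lets the code lengths track the mode statistics of the content without changing the decoding procedure.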
Some portions of the detailed description are directed to various techniques and tools for using signals for the number of reference fields that can be used for interlaced forward predicted fields. Such techniques and tools include, but are not limited to, the following:
a tool such as a video encoder or decoder processes a first signal indicating whether an interlaced forward predicted field has one reference field or two possible reference fields for motion compensation. If the first signal indicates that the interlaced forward predicted field has one reference field, the tool processes a second signal that identifies one reference field from two possible reference fields. On the other hand, if the first signal indicates that the interlaced forward predicted field has two possible reference fields, then for each of the plurality of motion vectors for the blocks and/or macroblocks of the interlaced forward predicted field, the tool may process a third signal for selecting between the two possible reference fields. The tool then performs motion compensation on the interlaced forward predicted field.
Alternatively, a tool such as a video encoder or decoder processes a signal indicating whether an interlaced forward predicted field has one reference field or two possible reference fields for motion compensation. The tool performs motion compensation of the interlaced forward predicted field. The tool also updates the reference field buffer for subsequent motion compensation without processing additional signals for managing the reference field buffer.
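The two-level signaling described above can be sketched as a small decoding routine: a first flag distinguishes one versus two reference fields, and only in the one-reference case is a second flag read to identify which field it is. The flag names and bit conventions here are assumptions for illustration.

```python
# Sketch of reference-field signaling for an interlaced forward predicted
# field: first signal = one vs. two reference fields; second signal (only when
# one reference field) picks it from the two possible fields; with two
# reference fields, a per-motion-vector selection is parsed later instead.

def read_reference_field_setup(read_bit):
    """read_bit: callable returning the next bit (0 or 1) from the bitstream."""
    two_references = read_bit() == 1           # first signal
    if two_references:
        return {'num_refs': 2, 'ref_field': None}  # per-MV selection follows later
    ref_field = read_bit()                     # second signal: which field
    return {'num_refs': 1, 'ref_field': ref_field}

bits = iter([0, 1])                            # one reference field, field 1
setup = read_reference_field_setup(lambda: next(bits))
assert setup == {'num_refs': 1, 'ref_field': 1}
```

Note how the second flag is conditional: when two reference fields are allowed, no field identifier is read at this level at all.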
Some portions of the detailed description are directed to various techniques and tools for deriving chroma motion vectors for macroblocks of interlaced forward predicted fields. Such techniques and tools include, but are not limited to, the following:
a tool such as a video encoder or decoder derives a chroma motion vector for a macroblock having one or more luma motion vectors based at least in part on a polarity estimate of the one or more luma motion vectors. For example, each of the one or more luma motion vectors has odd or even polarity, and the polarity estimation includes determining which polarity is most common among the one or more luma motion vectors.
Alternatively, a tool such as a video encoder or decoder determines the dominant polarity among the plurality of luma motion vectors of a macroblock. The tool then derives a chroma motion vector for the macroblock based at least in part on one or more of the plurality of luma motion vectors having the dominant polarity.
Other features and advantages will become apparent from the following detailed description of various embodiments, which proceeds with reference to the accompanying drawings.
Drawings
Fig. 1 is a diagram illustrating motion estimation in a video encoder according to the prior art.
Fig. 2 is a diagram illustrating block-based compression of an 8 × 8 prediction residual block in a video encoder according to the prior art.
Fig. 3 is a diagram illustrating block-based decompression of an 8 × 8 prediction residual block in a video decoder according to the prior art.
Fig. 4 is a diagram illustrating an interlaced frame according to the prior art.
Fig. 5A and 5B are diagrams illustrating macroblock positions of candidate motion vector predictors for 1MV macroblocks in a progressive P frame, according to the related art.
Fig. 6A and 6B are diagrams illustrating block positions of candidate motion vector predictors for 1MV macroblocks in a mixed 1MV/4MV progressive P frame, according to the related art.
Fig. 7A, 7B, 8A, 8B, 9 and 10 are diagrams illustrating block positions of candidate motion vector predictors for blocks at various positions in 4MV macroblocks in a mixed 1MV/4MV progressive P frame, in accordance with the prior art.
Fig. 11 is a diagram illustrating candidate motion vector predictors for a current frame coded macroblock in an interlaced P frame, in accordance with the prior art.
Fig. 12A-12B are diagrams illustrating candidate motion vector predictors for a current field coded macroblock in an interlaced P frame, in accordance with the prior art.
Fig. 13A-13C are pseudo codes for calculating a motion vector predictor according to the related art.
Fig. 14A and 14B are pseudo codes illustrating hybrid motion vector prediction for progressive P frames according to the related art.
Fig. 15A-15C are pseudo codes and tables showing decoding of motion vector difference information according to the related art.
Fig. 16A-16C and 13C are pseudo codes illustrating the derivation of chroma motion vectors for progressive P frames according to the prior art.
Fig. 17 is a pseudo code illustrating the derivation of chroma motion vectors for interlaced P frames according to the prior art.
Fig. 18 is a pseudo code illustrating intensity compensation for progressive P frames according to the prior art.
FIG. 19 is a block diagram of a suitable computing environment in connection with which several described embodiments may be implemented.
FIG. 20 is a block diagram of a generalized video encoder system in conjunction with which several described embodiments may be implemented.
FIG. 21 is a block diagram of a generalized video decoder system in conjunction with which several described embodiments may be implemented.
Fig. 22 is a diagram of a macroblock format used in several described embodiments.
Fig. 23A is a partial view of an interlaced video frame showing alternating lines of the top field and bottom field. Fig. 23B is a diagram of an interlaced video frame organized for encoding/decoding as a frame, and fig. 23C is a diagram of an interlaced video frame organized for encoding/decoding as a field.
Fig. 24A to 24F are diagrams showing examples of reference fields of interlaced P fields.
Fig. 25A and 25B are flowcharts showing techniques for encoding and decoding the reference field number and the selection information, respectively.
Fig. 26 and 27 are tables showing MBMODE values.
Fig. 28A and 28B are flowcharts respectively showing techniques of encoding and decoding macroblock mode information of macroblocks of an interlaced P field.
FIG. 29 is pseudo code for determining main and non-main reference fields.
FIG. 30 is pseudo code for signaling whether to use a dominant or non-dominant reference field for motion vectors.
Fig. 31A and 31B are flow diagrams illustrating techniques for determining the dominant and non-dominant polarity, respectively, of motion vector prediction for motion vectors of two reference field interlaced P fields in encoding and decoding.
Fig. 32 is a pseudo code of hybrid motion vector prediction during decoding.
Fig. 33A and 33B are flowcharts illustrating techniques of hybrid motion vector prediction during encoding and decoding, respectively.
Fig. 34 is a diagram showing an association between a luminance block and a 4MVBP element.
Fig. 35A and 35B are flowcharts respectively showing a technique of encoding and decoding using a motion vector block mode.
FIG. 36 is pseudo code for encoding motion vector difference information and prime/non-prime predictor selection for two reference fields of an interlaced P field.
FIGS. 37A and 37B are flow diagrams illustrating techniques for encoding and decoding motion vector difference information and dominant/non-dominant predictor selection, respectively, for two reference fields of an interlaced P field.
Fig. 38 is a diagram of chroma subsampling patterns for 4:2:0 macroblocks.
Fig. 39 is a diagram showing the relationship between the current and reference fields of the vertical motion vector component.
Fig. 40 is pseudo code for selecting a luma motion vector that contributes to chroma motion vectors of motion compensated macroblocks of an interlaced P field.
Fig. 41 is a flow diagram illustrating a technique for deriving chroma motion vectors from luma motion vectors of macroblocks of interlaced P fields.
Fig. 42 and 43 are diagrams of an encoder framework and a decoder framework, respectively, in which intensity compensation is performed for interlaced P fields.
Fig. 44 is a table showing syntax elements for signaling the intensity compensated reference field mode for interlaced P fields.
Fig. 45A and 45B are flowcharts respectively showing a technique of performing fading estimation in encoding and fading compensation in decoding for interlaced P fields.
Fig. 46A-46E are syntax diagrams of layers of a bitstream implemented in accordance with the first combination.
FIGS. 47A-47K are code tables in a first combined implementation.
Fig. 48 is a diagram showing the relationship between the current and reference fields of the vertical motion vector component in the first combined implementation.
FIGS. 49A and 49B are pseudo code and tables, respectively, for motion vector differential decoding of 1 reference field interlaced P fields in a first combined implementation.
FIG. 50 is pseudo code for decoding motion vector difference information and dominant/non-dominant predictor selection for 2 reference field interlaced P fields in a first combined implementation.
Fig. 51A and 51B are pseudo codes for motion vector prediction of 1 reference field interlaced P fields in the first combined implementation.
FIGS. 52A-52J are pseudo codes and tables for motion vector prediction for 2 reference field interlaced P fields in a first combined implementation. Fig. 52K to 52N are pseudo code and tables for scaling operations, which are alternative operations to the operations shown in fig. 52H to 52J.
FIG. 53 is pseudo code for hybrid motion vector prediction for interlaced P fields in a first combined implementation.
FIG. 54 is pseudo code for motion vector reconstruction for 2 reference field interlaced P fields in a first combined implementation.
Fig. 55A and 55B are pseudo codes for chroma motion vector derivation for interlaced P fields in a first combined implementation.
FIG. 56 is pseudo code for intensity compensation of interlaced P fields in a first combined implementation.
Fig. 57A-57C are syntax diagrams of layers for a bitstream implemented in a second combination.
FIGS. 58A and 58B are pseudo code and tables, respectively, for motion vector differential decoding of 1 reference field interlaced P fields in a second combined implementation.
FIG. 59 is pseudo code for decoding motion vector difference information and dominant/non-dominant predictor selection for 2 reference field interlaced P fields in a second combined implementation.
Fig. 60A and 60B are pseudo codes for motion vector prediction of interlaced P fields for the 1 reference field in the second combined implementation.
FIGS. 61A-61F are pseudo codes for motion vector prediction for 2 reference field interlaced P fields in a second combined implementation.
Detailed Description
The present application relates to techniques and tools for efficiently compressing and decompressing interlaced video. Compression and decompression of interlaced video content improves with various techniques and tools that are explicitly designed to handle specific attributes of interlaced video representations. In various described embodiments, video encoders and decoders incorporate techniques for encoding and decoding interlaced forward predicted fields, as well as corresponding signal representation techniques for bitstream formats or syntax including different layers or levels (e.g., sequence level, frame level, field level, slice level, macroblock level, and/or block level).
Interlaced video content is commonly used in digital video broadcast systems over cable, satellite or DSL. Efficient techniques and tools for compressing and decompressing interlaced video content are an important part of video codecs.
Various alternatives to the implementations described herein are possible. For example, the techniques described with reference to the flowcharts may be changed by changing the order of the steps shown in the flowcharts, by repeating or omitting certain steps, and the like. As another example, although some implementations are described with reference to a particular macroblock format, other formats may be used. Moreover, the techniques and tools described with reference to interlaced forward predicted fields are also applicable to other types of images.
In various embodiments, flags and/or signals are used in the encoder and decoder bitstreams. Although specific flags and signals are described, it should be understood that this manner of description encompasses different conventions for flags and signals (e.g., 0 instead of 1).
The various techniques and tools may be used in combination or independently. Various embodiments implement one or more of the described techniques and tools. Some of the techniques and tools described herein may be used in a video encoder or decoder, or in some other system that is not explicitly limited to video encoding or decoding.
I. Computing environment
FIG. 19 illustrates one generalized example of a suitable computing environment (1900) in which several described embodiments may be implemented. The computing environment (1900) is not intended to suggest any limitation as to scope of use or functionality, as the techniques and tools may be implemented in diverse general-purpose or special-purpose computing environments.
Referring to FIG. 19, the computing environment (1900) includes at least one processing unit (1910) and memory (1920). In fig. 19, this most basic configuration (1930) is included within the dashed line. The processing unit (1910) executes computer-executable instructions and may be a real or virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory (1920) may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. The memory (1920) stores software (1980) that implements a video encoder or decoder.
The computing environment may have additional features. For example, computing environment (1900) includes storage (1940), one or more input devices (1950), one or more output devices (1960), and one or more communication connections (1970). An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment (1900). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment (1900) and coordinates activities of the components of the computing environment (1900).
Storage (1940) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment (1900). The storage (1940) stores instructions for the software (1980) implementing a video encoder or decoder.
The input device (1950) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that may provide input to the computing environment (1900). For audio or video encoding, the input device (1950) may be a sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital format, or a CD-ROM or CD-RW that reads audio or video samples into the computing environment (1900). The output device (1960) may be a display, printer, speaker, CD writer, or another device that provides output from the computing environment (1900).
The communication connection (1970) allows communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
Various techniques and tools may be described in the general context of computer-readable media. Computer readable media can be any available media that can be accessed within a computing environment. By way of example, and not limitation, with computing environment (1900), computer-readable media include memory (1920), storage (1940), communication media, and combinations of any of the above.
The various techniques and tools may be described in the general context of computer-executable instructions, such as those included in program modules, being executed on a target real or virtual processor in a computing environment. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or separated between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed in local or distributed computing environments.
For purposes of illustration, the detailed description uses terms such as "estimate," "compensate," "predict," and "apply" to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on the implementation.
Generalized video encoder and decoder
Fig. 20 is a block diagram of a generalized video encoder system (2000) in conjunction with which the various embodiments described may be implemented. Fig. 21 is a block diagram of a generalized video decoder (2100) in conjunction with which various embodiments described may be implemented.
The relationship shown between the modules within the encoder (2000) and decoder (2100) indicates the main information flow in the encoder and decoder; other relationships are not shown for simplicity. In particular, fig. 20 and 21 generally do not show auxiliary information indicating encoder settings, modes, tables, etc. for video sequences, frames, macroblocks, blocks, etc. This side information is typically sent in the output bitstream after entropy coding of the side information. The format of the output bitstream may be Windows Media Video version 9 or other format.
The encoder (2000) and decoder (2100) process video images, which may be video frames, video fields, or a combination of frames and fields. The bitstream syntax and semantics at the picture and macroblock level may depend on whether a frame or a field is used. There may also be changes to the macroblock organization and overall timing. The encoder (2000) and decoder (2100) are block-based and use a 4:2:0 macroblock format for frames, where each macroblock includes four 8 x 8 luma blocks (sometimes treated as one 16 x 16 macroblock) and two 8 x 8 chroma blocks. For fields, the same or different macroblock organization and format may be used. The 8 x 8 blocks may also be subdivided at different levels, for example at the frequency transform and entropy coding levels. Exemplary video frame organization is described in more detail in the next section.
Depending on the desired implementation and type of compression, modules of the encoder or decoder may be added, omitted, divided into multiple modules, combined with other modules, and/or replaced with similar modules. In alternative embodiments, encoders or decoders having different module and/or other module configurations perform one or more of the described techniques.
A. Video frame organization
In some implementations, the encoder (2000) and decoder (2100) process video frames organized as follows. A frame contains a line of spatial information of the video signal. For progressive video, the lines contain samples starting at a time and continuing through successive lines to the bottom of the frame. The progressive video frame is divided into macroblocks such as the macroblock (2200) shown in fig. 22. The macroblock (2200) comprises four 8 x 8 luminance blocks (Y1 to Y4) and two 8 x 8 chrominance blocks, co-located with the four luminance blocks, but half the horizontal and vertical resolution, following the conventional 4:2:0 macroblock format. The 8 x 8 block may also be subdivided at different levels, for example at the frequency transform level (e.g., 8 x 4, 4 x 8 or 4 x 4DCT) and the entropy coding level. A progressive I frame is an intra-coded progressive video frame. A progressive P frame is a progressive video frame encoded using forward prediction, and a progressive B frame is a progressive video frame encoded using bi-directional prediction. Progressive P-frames and B-frames may include intra-coded macroblocks as well as different types of predicted macroblocks.
An interlaced video frame consists of two scans of a frame: one comprising the even lines of the frame (the top field) and the other comprising the odd lines of the frame (the bottom field). The two fields may represent two different time periods, or they may be from the same time period. Fig. 23A shows a portion of an interlaced video frame (2300) including alternating lines of a top field and a bottom field located in the top left portion of the interlaced video frame (2300).
Fig. 23B shows the interlaced video frame (2300) of fig. 23A organized as a frame (2330) for encoding/decoding. Interlaced video frame (2300) is divided into macroblocks such as macroblocks (2331) and (2332) using a 4:2:0 format as shown in fig. 22. In the luma plane, each macroblock (2331, 2332) includes 8 lines from the top field alternating with 8 lines from the bottom field for a total of 16 lines, and each line is 16 pixels long. (the actual organization and arrangement of luma and chroma blocks within a macroblock (2331, 2332) is not shown and may in fact differ for different coding decisions.) within a given macroblock, the top field information and the bottom field information may be coded jointly or separately at any of various stages. An interlaced I-frame is two intra-coded fields of an interlaced video frame, where a macroblock includes information about the two fields. An interlaced P frame is two fields of an interlaced video frame encoded using forward prediction, and an interlaced B frame is two fields of an interlaced video frame encoded using bi-directional prediction, where a macroblock includes information about the two fields. Interlaced P and B frames may include intra-coded macroblocks as well as different types of predicted macroblocks.
Fig. 23C shows the interlaced video frame (2300) of fig. 23A organized into fields (2360) for encoding/decoding. Each of the two fields of the interlaced video frame (2300) is divided into macroblocks. The top field is divided into macroblocks such as macroblock (2361) and the bottom field is divided into macroblocks such as macroblock (2362). (These macroblocks also use the 4:2:0 format as shown in fig. 22, and the organization and arrangement of luma and chroma blocks within the macroblocks are not shown.) In the luma plane, the macroblock (2361) includes 16 lines from the top field and the macroblock (2362) includes 16 lines from the bottom field, and each line is 16 pixels long. An interlaced I field is a single, separately represented field of an interlaced video frame. An interlaced P field is a single, separately represented field of an interlaced video frame encoded using forward prediction, and an interlaced B field is a single, separately represented field of an interlaced video frame encoded using bi-directional prediction. Interlaced P and B fields may include intra-coded macroblocks as well as different types of predicted macroblocks.
The term image generally refers to source, encoded or reconstructed image data. For progressive video, the image is a progressive video frame. For interlaced video, an image may refer to an interlaced video frame, the top field of a frame, or the bottom field of a frame, depending on the context.
Alternatively, the encoder (2000) and decoder (2100) are object based, using different macroblock or block formats, or performing operations on sets of pixels of different sizes or configurations than 8 x 8 blocks and 16 x 16 macroblocks.
B. Video encoder
Fig. 20 is a block diagram of a generalized video encoder system (2000). The encoder system (2000) receives a sequence of video images including a current image (2005) (e.g., a progressive video frame, an interlaced video frame, or a field of an interlaced video frame) and generates compressed video information (2095) as an output. Particular embodiments of the video encoder typically use a variant or complementary version of the generalized encoder (2000).
An encoder system (2000) compresses the predictive image and the key image. For the sake of illustration, fig. 20 shows the path of the key picture through the encoder system (2000) and the path for the forward predicted picture. Many components of the encoder system (2000) are used to compress both key pictures and predicted pictures simultaneously. The exact operations performed by these components may vary depending on the type of information being compressed.
A predicted picture (also referred to as a P-picture or B-picture for bi-directional prediction, or an inter-coded picture) is represented as a prediction (or difference) from one or more other pictures. The prediction residual is the difference between the predicted and original pictures. In contrast, key pictures (also referred to as I-pictures or intra-coded pictures) are compressed without reference to other pictures.
If the current picture (2005) is a forward predicted picture, a motion estimator (2010) estimates motion of the macroblock or other set of pixels of the current picture (2005) relative to a reference picture, the reference picture being a reconstructed previous picture (2025) buffered in a picture memory (2020). In an alternative embodiment, the reference picture is a later picture or the current picture is bi-directionally predicted. The motion estimator (2010) may estimate on a pixel-by-pixel, 1/2 pixel, 1/4 pixel, or other increment basis, and may switch the accuracy of the motion estimation on an image-by-image basis or on other basis. The accuracy of the motion estimation may be the same or different in the horizontal and vertical directions. The motion estimator (2010) outputs motion information (2015), such as motion vectors, as side information. The motion compensator (2030) applies the motion information (2015) to the reconstructed previous image (2025) to form a motion compensated current image (2035). However, the prediction is rarely perfect and the difference between the motion compensated current image (2035) and the original current image (2005) is the prediction residual (2045). Alternatively, the motion estimator and motion compensator apply another type of motion estimation/compensation.
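The block-matching idea behind the motion estimator can be sketched as an exhaustive integer-pixel search minimizing the sum of absolute differences (SAD). This is an illustrative baseline only: real encoders in this context also search 1/2- and 1/4-pixel positions and use faster search strategies.

```python
# Minimal sketch of block-based motion estimation: for a block of the current
# picture, search a window in the reference picture for the integer-pixel
# displacement (dx, dy) into the reference that minimizes SAD.

def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block and a shifted
    reference block."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))

def estimate_motion(cur, ref, bx, by, size, search):
    """Full search over [-search, search] in both directions; returns best (dx, dy)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, float('inf')
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if 0 <= bx + dx and bx + dx + size <= w and \
               0 <= by + dy and by + dy + size <= h:
                cost = sad(cur, ref, bx, by, dx, dy, size)
                if cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    return best_mv

# Reference with a bright 2x2 patch; in the current picture it moved right by 1,
# so the vector pointing back into the reference is (-1, 0).
ref = [[0] * 8 for _ in range(8)]
ref[2][2] = ref[2][3] = ref[3][2] = ref[3][3] = 255
cur = [[0] * 8 for _ in range(8)]
cur[2][3] = cur[2][4] = cur[3][3] = cur[3][4] = 255
assert estimate_motion(cur, ref, bx=2, by=2, size=4, search=2) == (-1, 0)
```

The resulting vector is the motion information (2015) sent as side information; the motion compensator applies it to the reference to form the prediction (2035).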
The frequency transformer (2060) converts the spatial domain video information into frequency domain (i.e., spectral) data. For block-based video images, the frequency transformer (2060) applies a DCT or a variant of DCT to blocks of pixel data or prediction residual data, producing blocks of DCT coefficients. Alternatively, the frequency transformer (2060) applies another conventional frequency transform such as a Fourier transform or uses wavelet or subband analysis. The frequency transformer (2060) applies an 8 × 8, 8 × 4, 4 × 8, 4 × 4, or other size frequency transform (e.g., DCT) to the prediction residual of the predicted image.
A quantizer (2070) then quantizes the block of spectral data coefficients. The quantizer applies a uniform scalar quantization to the spectral data, the step size of which varies on an image-by-image basis or other basis. Alternatively, the quantizer applies another type of quantization to the spectral data coefficients, such as non-uniform, vector or non-adaptive quantization, or quantizes the spatial domain data directly in an encoder system that does not use a frequency transform. In addition to adaptive quantization, the encoder (2000) may use frame dropping, adaptive filtering, or other techniques for rate control.
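The uniform scalar quantization described above can be sketched as follows. The step size and truncation-toward-zero rounding are illustrative assumptions; the codec's actual quantizers differ in rounding and dead-zone details.

```python
# Sketch of uniform scalar quantization of transform coefficients: the encoder
# divides each coefficient by a step size (here truncating toward zero, which
# widens the zero bin), and the decoder multiplies back. The difference
# between input and reconstruction is the irreversible quantization error.

def quantize(coeffs, step):
    return [c // step if c >= 0 else -((-c) // step) for c in coeffs]

def dequantize(levels, step):
    return [level * step for level in levels]

coeffs = [113, -47, 8, -3, 0]
levels = quantize(coeffs, step=16)
assert levels == [7, -2, 0, 0, 0]
assert dequantize(levels, step=16) == [112, -32, 0, 0, 0]
```

Varying the step size per image (or per macroblock) is what gives the encoder its main rate-control lever: larger steps yield more zero levels and cheaper entropy coding at the cost of larger error.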
If a given macroblock in a predicted image does not have certain types of information (e.g., no motion information for the macroblock and no residual information), the encoder (2000) may encode the macroblock as a skipped macroblock. If so, the encoder signals the skipped macroblock in the output bitstream of the compressed video information (2095).
When the reconstructed current image is required for subsequent motion estimation/compensation, an inverse quantizer (2076) performs inverse quantization on the quantized spectral data coefficients. The inverse frequency transformer (2066) then performs the inverse operation of the frequency transformer (2060), thereby generating reconstructed prediction residuals (for predicted images) or reconstructed samples (for intra-coded images). If the encoded image (2005) is a predicted image, the reconstructed prediction residual is added to the motion compensated prediction (2035) to form the reconstructed current image. The picture store (2020) buffers the reconstructed current picture for use in predicting the next picture. In some embodiments, the encoder applies a deblocking filter to the reconstructed frame to adaptively smooth discontinuities between blocks of the frame.
The entropy encoder (2080) compresses the output of the quantizer (2070) as well as some side information (e.g., motion information (2015), quantization step size). Typical entropy encoding techniques include arithmetic coding, differential coding, huffman coding, run-length coding, LZ coding, lexicographic coding, and combinations thereof. The entropy encoder (2080) typically uses different encoding techniques for different kinds of information (e.g., DC coefficients, AC coefficients, different kinds of side information) and may select from multiple code tables within a particular encoding technique.
The entropy encoder (2080) places the compressed video information (2095) in a buffer (2090). A buffer level indicator is fed back to the bit rate adaptation module. The compressed video information (2095) is consumed from the buffer (2090) at a constant or relatively constant bit rate and stored for subsequent streaming at that bit rate. Thus, the level of the buffer (2090) is primarily a function of the entropy of the filtered, quantized video information, which affects the efficiency of entropy coding. Alternatively, the encoder system (2000) streams compressed video information immediately following compression, and in that case the level of the buffer (2090) also depends on the rate at which information is consumed from the buffer (2090) for transmission.
The compressed video information (2095) may be channel encoded for transmission over a network, either before or after the buffer (2090). Channel coding may apply error detection and correction data to the compressed video information (2095).
C. Video decoder
Fig. 21 is a block diagram of a generic video decoder system (2100). A decoder system (2100) receives information (2195) about a compressed sequence of video images and produces an output including a reconstructed image (2105) (e.g., a progressive video frame, an interlaced video frame, or a field of interlaced video frames). Particular embodiments of video decoders typically use a variant or complementary version of the generalized decoder (2100).
The decoder system (2100) decompresses the predicted images and the key images. For the sake of illustration, fig. 21 shows the path of the key picture through the decoder system (2100) and the path for the forward predicted picture. Many components of the decoder system (2100) are used to decompress key pictures and predicted pictures. The exact operations performed by these components may vary depending on the type of information being decompressed.
The buffer (2190) receives information (2195) about the compressed video sequence and makes the received information available to the entropy decoder (2180). The buffer (2190) typically receives this information at a relatively constant rate over time and includes a jitter buffer to smooth out short term variations in bandwidth or transmission. The buffer (2190) may also include a playback buffer. Alternatively, the buffer (2190) receives information at a varying rate. Before or after the buffer (2190), the compressed video information may be channel decoded and processed for error detection and correction.
The entropy decoder (2180) decodes entropy encoded quantized data and entropy encoded side information (e.g., motion information (2115), quantization step size), typically applying the inverse of the entropy encoding performed in the encoder. Entropy decoding techniques include arithmetic decoding, differential decoding, Huffman decoding, run-length decoding, LZ decoding, dictionary decoding, and combinations of the above. The entropy decoder (2180) typically uses different decoding techniques for different kinds of information (e.g., DC coefficients, AC coefficients, different kinds of side information) and may select from among multiple code tables in a particular decoding technique.
If the image to be reconstructed (2105) is a forward predicted image, the motion compensator (2130) applies the motion information (2115) to the reference image (2125) to form a prediction (2135) of the reconstructed image (2105). For example, the motion compensator (2130) uses the macroblock motion vectors to find the macroblock in the reference picture (2125). The picture buffer (2120) stores previously reconstructed pictures for use as reference pictures. The motion compensator (2130) may compensate for motion in pixels, 1/2 pixels, 1/4 pixels, or other increments, and may switch the accuracy of the motion compensation on an image-by-image basis or other basis. The accuracy of the motion compensation may be the same or different horizontally and vertically. Alternatively, the motion compensator applies another type of motion compensation. The prediction of the motion compensator is rarely perfect, so the decoder (2100) also reconstructs the prediction residual.
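The sub-pixel motion compensation described above can be illustrated with a small sketch. The function below forms a motion-compensated prediction at half-pixel precision using simple bilinear interpolation; it is illustrative only (real codecs typically use longer interpolation filters, and quarter-pel operation works the same way with four units per pixel). All names are assumptions, not taken from any actual implementation.

```python
import numpy as np

def motion_compensate_block(ref, top, left, mvx, mvy, h=16, w=16):
    """Predict an h x w block at (top, left) from a reference picture.

    mvx/mvy are motion vector components in half-pixel units
    (two units per integer pixel). Illustrative sketch only.
    """
    # Split each half-pel coordinate into an integer part and a
    # fractional flag (0 = integer position, 1 = half-pel position).
    iy, fy = divmod(top * 2 + mvy, 2)
    ix, fx = divmod(left * 2 + mvx, 2)
    # Take one extra row/column so bilinear interpolation has support.
    patch = ref[iy:iy + h + 1, ix:ix + w + 1].astype(np.int64)
    # Bilinear blend of the four surrounding integer-pel samples;
    # the four weights always sum to 4, and +2 rounds to nearest.
    return ((2 - fy) * (2 - fx) * patch[:h, :w]
            + (2 - fy) * fx * patch[:h, 1:w + 1]
            + fy * (2 - fx) * patch[1:h + 1, :w]
            + fy * fx * patch[1:h + 1, 1:w + 1]
            + 2) // 4
```

With both fractional flags zero the prediction is an exact block copy, which matches the integer-pel mode; a half-pel component averages two neighboring samples with rounding.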
When the decoder requires a reconstructed picture for subsequent motion compensation, the picture store (2120) buffers the reconstructed picture for use in predicting the next picture. In some embodiments, the decoder applies a deblocking filter to the reconstructed frame to adaptively smooth discontinuities between blocks of the frame.
An inverse quantizer (2170) inverse quantizes the entropy-decoded data. In general, the inverse quantizer applies a uniform scalar inverse quantization to the entropy decoded data, where the step size varies on a picture-by-picture basis or other basis. Alternatively, the inverse quantizer applies another type of inverse quantization to the data, such as non-uniform vector quantization or non-adaptive inverse quantization, or inverse quantization of the spatial domain data directly in a decoder system that does not use inverse frequency transform.
An inverse frequency transformer (2160) converts the quantized frequency domain data into spatial domain video information. For block-based video images, an inverse frequency transformer (2160) applies IDCT or a variant of IDCT to the DCT coefficient block, thereby generating pixel data or prediction residual data for the key or predicted image, respectively. Alternatively, the inverse frequency transformer (2160) applies another conventional inverse frequency transform, such as an inverse fourier transform or using wavelet or subband synthesis. The inverse frequency transformer (2160) applies an inverse frequency transform (e.g., IDCT) of 8 × 8, 8 × 4, 4 × 8, 4 × 4, or other size to the prediction residual of the predicted image.
Interlaced P frame
A typical interlaced video frame consists of two fields (e.g., a top field and a bottom field) that are scanned at different times. In general, it is more efficient to encode a still region of an interlaced video frame by encoding two fields together ("frame mode" encoding). On the other hand, it is generally more efficient to encode the motion regions of an interlaced video frame by encoding each field separately (the "field mode" encoding), since the two fields tend to have different motions. A forward predicted interlaced video frame can be encoded as two separate forward predicted fields-interlaced P fields. Encoding fields separately for a forward predicted interlaced video frame may be effective when there is high motion, for example, throughout the interlaced video frame, and thus there are more differences between fields.
Alternatively, forward predicted interlaced video frames may be encoded using a mix of field coding and frame coding as interlaced P frames. For a macroblock of an interlaced P frame, the macroblock includes rows of pixels of the top and bottom fields, and the rows may be coded together in a frame coding mode or separately in a field coding mode.
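The field-mode/frame-mode distinction for a macroblock amounts to a row permutation: field-mode coding groups the even (top-field) rows and the odd (bottom-field) rows separately so each field can be predicted and transformed on its own. A hypothetical sketch of that permutation and its inverse, not taken from any particular codec implementation:

```python
import numpy as np

def to_field_mode(mb):
    """Reorder a macroblock for field-mode coding: even (top-field)
    rows first, then odd (bottom-field) rows, giving two field
    halves that can be handled separately."""
    return np.vstack([mb[0::2], mb[1::2]])

def to_frame_mode(mb_field):
    """Invert the permutation: re-interleave the two field halves."""
    out = np.empty_like(mb_field)
    half = mb_field.shape[0] // 2
    out[0::2] = mb_field[:half]
    out[1::2] = mb_field[half:]
    return out
```

For a 16x16 macroblock this yields two 16x8 field halves; the round trip is lossless, so the choice between modes affects only how prediction and transforms see the samples.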
Interlaced P fields reference one or more previously decoded fields. For example, in some implementations, interlaced P fields reference one or two previously decoded fields, whereas interlaced B fields reference up to two previous and two future reference fields (i.e., a maximum of four reference fields in total). (Techniques for encoding and decoding interlaced P fields are described in detail below.) For more information about interlaced P fields, in particular two-reference interlaced P fields, see U.S. patent application No. 10/857,473, filed May 27, 2004, entitled "Predicting Motion Vectors for Fields of Forward-predicted Interlaced Video Frames".
Number of reference fields in interlaced P field
In some embodiments, two previously encoded/decoded fields may be used as reference fields when performing motion compensated prediction of a single current interlaced P field. In general, the ability to use two reference fields results in better compression efficiency than when motion compensated prediction is limited to one reference field. However, when two reference fields are available, the signaling overhead is high because additional information is sent to indicate which of the two fields provides a reference for each macroblock or block having a motion vector.
In some cases, the benefit of having more possible motion compensation predictors per motion vector (two reference fields compared to one reference field) does not outweigh the overhead required to signal the reference field selection. For example, when the best references are all from one of the two possible reference fields, it may be advantageous to choose to use a single reference field instead of two. This is usually due to a scene change that causes only one of the two reference fields to come from the same scene as the current field. Alternatively, only one reference field is available, such as at the beginning of the sequence. In these cases, it is more efficient to signal at the field level of the current P field that only one reference field is used and what this one reference field is, and to have this decision applied to the macroblocks and blocks within the current P field. The reference field selection information then no longer needs to be sent with each macroblock or block having a motion vector.
A. Number of reference fields in different schemes
One scheme allows two previously encoded/decoded fields to be used as reference fields for the current P field. The reference field used by the motion vector (for a macroblock or block) is signaled for the motion vector, as is other information for the motion vector. For example, for motion vectors, the signaled information indicates: (1) a reference field; and (2) a position in a reference field of a block or macroblock predictor of a current block or macroblock associated with the motion vector. Alternatively, the reference field information and motion vector information are signaled as described in one of the combined implementations of section XII.
In another scheme, only one previously encoded/decoded field is used as a reference field for the current P field. For a motion vector, no reference field indicating the reference of the motion vector is needed. For example, for a motion vector, the signaled information only indicates the position in the reference field of the predictor of the current block or macroblock associated with the motion vector. Alternatively, the motion vector information is signaled as described in one of the combined implementations of section XII. Motion vectors in one reference field scheme are typically encoded with fewer bits than the same motion vectors in two reference field schemes.
For either scheme, updating the buffering and image store for subsequent motion compensated reference fields is simple. The one or more reference fields for the current P field are one or both of the most recent and second most recent I or P fields before the current P field. Since the locations of the candidate reference fields are known, the encoder and decoder can update the motion compensated reference picture buffers for the next P field automatically and without buffer management signals.
Alternatively, the encoder and decoder use one or more other schemes for interlacing multiple reference fields of a P-field.
B. Signal representation example
The particular examples of signaling described in this section and in the combined implementation of section XII signal how many reference fields are used for the current P field and, when one reference field is used, which candidate reference field it is. For example, a 1-bit field in the P-field header (called NUMREF) indicates whether the P field uses one or two previous fields as references. If NUMREF is 0, only one reference field is used. If NUMREF is 1, then two reference fields are used. If NUMREF is 0, another 1-bit field (called REFFIELD) is present and indicates which of the two candidate reference fields is used as the reference. If REFFIELD is 0, the temporally closer field is used as the reference field. If REFFIELD is 1, the temporally farther of the two candidate reference fields is used as the reference field for the current P field. Alternatively, the encoder and decoder use other and/or additional signals for reference field selection.
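A minimal sketch of parsing this field-level signaling, assuming a hypothetical read_bit callable that returns successive bits of the P-field header:

```python
def read_reference_field_setup(read_bit):
    """Parse the 1-bit NUMREF / 1-bit REFFIELD layout described above.

    Returns (num_ref, ref_field): ref_field is 0 (temporally closest)
    or 1 (temporally second closest) when a single reference field is
    used, and None when two reference fields are used.
    """
    numref = read_bit()
    if numref == 0:               # one reference field
        reffield = read_bit()     # which of the two candidates
        return 1, reffield
    return 2, None                # both candidate fields are references
```

In the two-reference case no field-level selection follows; instead the selection is carried per motion vector, as described below.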
C. Position of reference field
FIGS. 24A-24F illustrate the locations of reference fields that may be used in motion compensated prediction of interlaced P fields. A P field may use one or two previously encoded/decoded fields as references. Specifically, figs. 24A to 24F show examples of reference fields for NUMREF = 0 and NUMREF = 1.
FIGS. 24A and 24B show examples in which two reference fields are used for the current P field (NUMREF = 1). In fig. 24A, the current field references the temporally previous top and bottom fields. The middle interlaced B fields are not used as reference fields. In fig. 24B, the current field references the top and bottom fields of the interlaced video frame that immediately precedes the interlaced video frame containing the current field.
Fig. 24C and 24D show examples in which one reference field (NUMREF = 0) is used for the current P field, and this reference field is the temporally closest reference field (REFFIELD = 0). The polarity of the reference field is opposite to that of the current P field, which means that, for example, if the current P field is from the even rows, the reference field is from the odd rows. In fig. 24C, the current field references the bottom field in the temporally previous interlaced video frame and does not reference the less recent top field in that interlaced video frame. Again, the middle interlaced B fields are not allowed to be reference fields. In fig. 24D, the current field references the bottom field of the interlaced video frame that immediately precedes the interlaced video frame containing the current field, rather than the less recent top field.
Fig. 24E and 24F show examples in which one reference field is used for the current P field (NUMREF = 0), and the one reference field is the temporally second closest reference field (REFFIELD = 1). The polarity of the reference field is the same as the polarity of the current field, meaning, for example, if the current field is from the even rows, the reference field is also from the even rows. In fig. 24E, the current field references the top field of a temporally previous interlaced video frame, but does not reference the more recent bottom field. Again, the middle interlaced B fields are not allowed to be reference fields. In fig. 24F, the current field references the top field instead of the more recent bottom field.
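The polarity relationships in FIGS. 24A-24F can be summarized in a short helper. This is an illustrative sketch only; 'top' and 'bottom' stand in for the two field polarities (e.g., even and odd rows), and the function names are assumptions.

```python
def reference_field_polarities(current_is_top, num_ref, ref_field=None):
    """Return the polarities of the reference field(s) for a P field.

    The two candidate references always have opposite polarities.
    """
    same = 'top' if current_is_top else 'bottom'
    opposite = 'bottom' if current_is_top else 'top'
    if num_ref == 2:
        # NUMREF = 1: both candidate fields, one of each polarity.
        return [opposite, same]
    # NUMREF = 0: REFFIELD = 0 picks the temporally closest candidate
    # (opposite polarity to the current field); REFFIELD = 1 picks the
    # second closest (same polarity as the current field).
    return [opposite] if ref_field == 0 else [same]
```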
Alternatively, the encoder and decoder use reference fields at other and/or additional locations or timings for motion compensated prediction of interlaced P fields. For example, a field within the same frame as the current P field is allowed to be a reference field. Also, either the top field or the bottom field of a frame may be encoded/decoded first.
D. Coding technique
An encoder, such as encoder (2000) of fig. 20, signals which of a plurality of reference field schemes is used to encode an interlaced P field. For example, the encoder performs the technique (2500) shown in fig. 25A.
For a given interlaced P field, the encoder signals (2510) the number of reference fields used in motion compensated prediction of the interlaced P field. For example, the encoder uses a single bit to indicate whether one or two reference fields are used. Alternatively, the encoder uses another signaling/coding scheme for the number of reference fields.
The encoder determines (2520) whether to use one or two reference fields. If one reference field is used, the encoder signals 2530 the reference field selection for the interlaced P field. For example, the encoder uses a single bit to indicate whether the temporally closest or temporally second closest reference field (previous I or P field) is used. Alternatively, the encoder uses another signaling/coding scheme for reference field selection for P fields.
If two reference fields are used, the encoder signals (2540) the reference fields of the motion vectors for blocks, macroblocks or other portions of the interlaced P field. For example, the encoder jointly encodes the reference field selection of a motion vector along with differential motion vector information. Alternatively, the encoder uses another signaling/coding scheme for reference field selection of motion vectors. The encoder repeats (2545, 2540) the signaling for the next motion vector until the P field has no more motion vectors to signal. (For simplicity, FIG. 25A does not show the stages of macroblock and block coding and corresponding signaling, which may occur before, after, or around the signaling (2540) of the reference field selections. Instead, FIG. 25A focuses on the repeated signaling of the reference field selections for multiple motion vectors in the P field.)
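The encoder-side flow of FIG. 25A can be sketched as follows. The put_bit and encode_mv helpers are hypothetical stand-ins for the actual bit writer and the joint motion-vector coder; the joint coding of the per-vector reference field selection with the differential is elided behind the include_ref flag.

```python
def signal_reference_fields(put_bit, num_ref, ref_field, motion_vectors, encode_mv):
    """Emit field-level reference signaling, then per-MV signaling.

    put_bit(b) writes one bit; encode_mv(mv, include_ref) codes one
    motion vector, folding in a reference field selector only in the
    two-reference case. Hypothetical helper names.
    """
    put_bit(0 if num_ref == 1 else 1)      # NUMREF
    if num_ref == 1:
        put_bit(ref_field)                 # REFFIELD, once per field
    for mv in motion_vectors:
        # With two references, each MV carries its own selection.
        encode_mv(mv, include_ref=(num_ref == 2))
```

Note how the one-reference case pays two header bits once and then omits the per-vector selection entirely, which is the overhead saving the section describes.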
Alternatively, the encoder performs another technique to indicate which of a plurality of reference field schemes to use to encode an interlaced P field. For example, the encoder has more and/or different options for the number of reference fields.
For simplicity, fig. 25A does not show the various ways in which other aspects of the encoding and decoding of this technique (2500) may be integrated. Various combined implementations are described in detail in section XII.
E. Decoding technique
A decoder, such as decoder (2100) of fig. 21, receives and decodes a signal indicating which of a plurality of schemes is to be used to decode an interlaced P field. For example, the decoder performs the technique shown in fig. 25B (2550).
For a given interlaced P field, the decoder receives and decodes (2560) a signal regarding the number of reference fields used in motion compensated prediction of the interlaced P field. For example, the decoder receives and decodes a single bit to indicate whether one or two reference fields are used. Alternatively, the decoder uses another decoding mechanism for the number of reference fields.
The decoder determines (2570) whether to use one or two reference fields. If one reference field is used, the decoder receives and decodes (2580) a signal selected for the reference field of the interlaced P field. For example, the decoder receives and decodes a single bit to indicate whether the temporally closest or temporally second closest reference field (previous I or P field) is used. Alternatively, the decoder uses another decoding mechanism for reference field selection for P fields.
If two reference fields are used, the decoder receives and decodes (2590) the signals for reference field selection for motion vectors of blocks, macroblocks, or other portions of the interlaced P field. For example, the decoder decodes a reference field selection that is jointly coded with the differential motion vector information for a motion vector. Alternatively, the decoder uses another decoding mechanism for reference field selection of motion vectors. The decoder repeats (2595, 2590) the reception and decoding for the next motion vector until there are no more motion vectors signaled for the P field. (For simplicity, FIG. 25B does not show the stages of macroblock and block decoding that may occur before, after, or around the reception and decoding (2590) of the reference field selections. Instead, FIG. 25B focuses on the repeated reception and decoding of the reference field selections for multiple motion vectors in the P field.)
Alternatively, the decoder performs another technique to determine which of a plurality of reference field schemes to use to decode an interlaced P field. For example, the decoder has more and/or different options for the number of reference fields.
For simplicity, fig. 25B does not show various methods by which the techniques (2550) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
V. signaling macroblock mode information for interlaced P fields
In some embodiments, various macroblock mode information for macroblocks of interlaced P fields are jointly grouped for signaling. A macroblock of an interlaced P field can be coded with any of several different syntax elements, either present or absent, in many different modes. Specifically, the type of motion compensation (e.g., 1MV, 4MV, or intra coding), whether or not there is an encoded block pattern in the bitstream for the macroblock, and (for the 1MV case) whether or not there is motion vector data in the bitstream for the macroblock are jointly coded. Different code tables may be used for different cases of macroblock mode information, which results in a more efficient overall compression of the information.
The particular example of signaling described in this section and in the combined implementation of section XII signals macroblock mode information in terms of variable length coded MBMODE syntax elements. The table selection for MBMODE is signaled by the fixed-length coded field-level element MBMODETAB. Alternatively, the encoder and decoder use other and/or additional signals to signal the macroblock mode information.
A. Macroblock mode for interlaced P-fields of different types
In general, a macroblock mode indicates a macroblock type (1MV, 4MV or intra coding), the presence/absence of a coded block mode of a macroblock, and the presence/absence of motion vector data of the macroblock. The information indicated by the macroblock mode syntax element depends on whether the interlaced P field is coded as a 1MV field (with intra coding and/or 1MV macroblocks) or a mixed MV field (with intra coding, 1MV and/or 4MV macroblocks).
In a 1MV interlaced P field, the macroblock mode elements of a macroblock jointly represent the macroblock type (intra or 1MV), the presence/absence of coded block mode elements of the macroblock, and the presence/absence of motion vector data (when the macroblock type is 1MV, but not when it is intra coded). The table in fig. 26 shows the complete event space for macroblock information signaled by MBMODE in a 1MV interlaced P-field.
In a hybrid MV interlaced P field, the macroblock mode elements of a macroblock jointly represent the macroblock type (intra coded or 1MV or 4MV), the presence/absence of coded block mode of the macroblock, and the presence/absence of motion vector data (when the macroblock type is 1MV, but not when it is intra coded or 4 MV). The complete event space for macroblock information signaled by MBMODE for a hybrid MV interlaced P-field is shown in fig. 27.
If the macroblock mode indicates that motion vector data is present, the motion vector data is present in the macroblock layer and signals a motion vector differential, which is combined with a motion vector predictor to reconstruct the motion vector. If the macroblock mode element indicates that motion vector data is not present, the motion vector differential is assumed to be zero, and thus the motion vector is equal to the motion vector predictor. The macroblock mode element thus effectively signals when motion compensation using just the motion vector predictor (not modified by any motion vector differential) is to be used.
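This reconstruction rule can be expressed compactly. A minimal sketch, assuming (x, y) vector components; the predictor derivation and the decoding of the differential itself are elided:

```python
def reconstruct_motion_vector(predictor, mv_data_present, differential=(0, 0)):
    """Reconstruct a 1MV macroblock's motion vector as described above.

    When the macroblock mode signals that no motion vector data is
    present, the differential is implicitly zero and the motion
    vector equals its predictor. Illustrative sketch only.
    """
    if not mv_data_present:
        differential = (0, 0)
    return (predictor[0] + differential[0],
            predictor[1] + differential[1])
```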
One of a plurality of different VLC tables is used to signal the macroblock mode elements of an interlaced P field. For example, eight code tables for MBMODE for macroblocks of hybrid MV interlaced P fields are shown in fig. 47H, and eight different code tables for MBMODE for 1MV interlaced P fields are shown in fig. 47I. The table selection is indicated by MBMODETAB, signaled at the field layer. Alternatively, the encoder and decoder use other and/or additional codes to signal macroblock mode information and table selection.
B. Coding technique
An encoder, such as the encoder (2000) of fig. 20, encodes macroblock mode information for interlaced P fields. For example, the encoder performs the technique shown in fig. 28A (2800).
For a given interlaced P field, the encoder selects (2810) a code table for encoding macroblock mode information for macroblocks of the interlaced P field. For example, the encoder selects one of the VLC tables shown in fig. 47H or 47I. Alternatively, the encoder selects from other and/or additional tables.
The encoder signals (2820) the selected code table in the bitstream. For example, the encoder signals the FLC indicating the selected code table given the type of interlaced P field. Alternatively, the encoder uses a different signaling mechanism for code table selection, e.g., uses VLC for code table selection.
The encoder selects (2830) a macroblock mode for the macroblock from a plurality of available macroblock modes. For example, the encoder selects a macroblock mode that indicates the macroblock type, whether an encoded block mode exists, and (if applicable to the macroblock type) whether motion vector data exists. Various combinations of options for MBMODE are listed in fig. 26 and 27. Alternatively, the encoder selects from other and/or additional macroblock modes for other and/or additional macroblock combination options.
The encoder signals (2840) the selected macroblock mode using the selected code table. Generally, the encoder signals the macroblock mode as a VLC using the selected VLC table. The encoder repeats (2845, 2830, 2840) the selection and signaling of macroblock modes until the P field has no more macroblock modes to signal. (For simplicity, FIG. 28A does not show the stages of macroblock and block coding and corresponding signaling that may occur before, after, or around the signaling (2840) of the selected macroblock mode. Instead, FIG. 28A focuses on repeatedly signaling macroblock modes for the macroblocks of the P field using the code table selected for that field.)
Alternatively, the encoder performs another technique to encode macroblock mode information for macroblocks of interlaced P fields. For example, although fig. 28A shows code table selection prior to mode selection, in many common coding scenarios the encoder first selects macroblock modes for macroblocks, then selects code tables that efficiently signal those selected macroblock modes, and then signals the code table selection and the modes. Also, while fig. 28A shows code table selection occurring per interlaced P field, alternatively the code table is selected on a more frequent, less frequent, or non-periodic basis, or the encoder skips code table selection altogether (always using the same code table). Alternatively, the encoder may select a code table from context information (making signaling of the code table selection unnecessary).
For simplicity, fig. 28A does not show various methods by which the techniques (2800) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
C. Decoding technique
A decoder, such as decoder (2100) of fig. 21, receives and decodes macroblock mode information for macroblocks of an interlaced P field. For example, the decoder performs the technique shown in fig. 28B (2850).
For a given interlaced P field, the decoder receives and decodes (2860) a code table selection of a code table to be used for decoding macroblock mode information for macroblocks of the interlaced P field. For example, the decoder receives and decodes the FLC indicating the selected code table given the type of interlaced P field. Alternatively, the decoder works with a different signaling mechanism for code table selection, e.g., a signaling mechanism using VLC for code table selection.
The decoder selects (2870) a code table based on the decoded code table selection (and possibly other information). For example, the decoder selects one of the VLC tables for MBMODE shown in fig. 47H or 47I. Alternatively, the decoder selects from other and/or additional tables.
The decoder receives and decodes (2880) a macroblock mode selection for a macroblock. For example, the macroblock mode selection indicates the macroblock type, whether the coded block mode is present, and (if applicable for the macroblock type) whether motion vector data is present. Various combinations of these options for MBMODE are listed in figs. 26 and 27. Alternatively, the macroblock mode is one of other and/or additional macroblock modes for other and/or additional combinations of macroblock options. The decoder repeats (2885, 2880) the receiving and decoding of the macroblock mode for the next macroblock until the P field has no more macroblock modes to receive and decode. (For simplicity, FIG. 28B does not show the stages of macroblock and block decoding that may occur before, after, or around the reception and decoding (2880) of the macroblock mode selection. Instead, FIG. 28B focuses on repeatedly receiving/decoding macroblock mode selections for macroblocks in a P field using the code table selected for the P field.)
Alternatively, the decoder performs another technique to decode macroblock mode information for macroblocks of interlaced P fields. For example, although fig. 28B shows code table selection occurring per interlaced P field, alternatively the code table is selected on a more frequent, less frequent, or non-periodic basis, or the decoder skips code table selection altogether (always using the same code table). Alternatively, the decoder may select a code table from context information (making reception and decoding of the code table selection unnecessary).
For simplicity, fig. 28B does not show the various methods by which the technique (2850) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
Reference field selection in two reference field interlaced P-fields
In some embodiments, when performing motion compensated prediction for a single current interlaced P field, two previously encoded/decoded fields are used as reference fields. (see, e.g., section IV.) the signaled information indicates which of the two fields provides a reference for each macroblock (or block) with a motion vector.
In this section, various techniques and tools are described for efficiently signaling which of a plurality of previously encoded/decoded reference fields provides the motion compensated prediction information when encoding or decoding a current macroblock or block. For example, the encoder and decoder implicitly derive the dominant and non-dominant reference fields for the current macroblock or block based on previously coded motion vectors in the interlaced P field. (Or, correspondingly, the encoder and decoder derive the dominant and non-dominant motion vector predictor polarities.) The signaled information then indicates whether the dominant or non-dominant reference field is used for motion compensation of the current macroblock or block.
A. Dominant and non-dominant reference fields and predictors
Interlaced fields may be encoded without motion compensation (I-field), forward motion compensation (P-field), or both forward and backward motion compensation (B-field). An interlaced P field may refer to two reference fields, which are previously encoded/decoded I or P fields. FIGS. 24A and 24B illustrate an example in which two reference fields are used for the current P field. The two reference fields are of opposite polarity. One reference field represents the odd lines of a video frame and the other reference field represents the even lines of a video frame (not necessarily the same frame that includes the odd line reference field). The P field currently being encoded or decoded may use one or both of the two previously encoded/decoded fields as a reference in motion compensation. Thus, the motion vector data of a macroblock or block of the P field indicates in some way: (1) which field is to be used as a reference field in motion compensation; and (2) the displacement/position at which the reference field of sample values to be used in motion compensation is located.
Explicitly signaling the reference field selection for each motion vector consumes an inefficient number of bits. However, for a given motion vector, the number of bits can be reduced by predicting which reference field will be used for the motion vector and then signaling whether the predicted reference field is actually used as the reference field for the motion vector.
For example, for each macroblock or block that uses motion compensation in an interlaced P field, the encoder or decoder analyzes up to three previously encoded/decoded motion vectors from neighboring macroblocks or blocks. From them, the encoder or decoder derives the dominant and non-dominant reference fields. In practice, the encoder or decoder determines which of the two possible reference fields is used by the majority of the motion vectors of neighboring macroblocks or blocks. The field that is referenced by more motion vectors of neighboring macroblocks or blocks is the dominant reference field, and the other reference field is the non-dominant reference field. Likewise, the polarity of the dominant reference field is the dominant motion vector predictor polarity, and the polarity of the non-dominant reference field is the non-dominant motion vector predictor polarity.
The pseudo code in fig. 29 illustrates one technique by which an encoder or decoder determines the dominant and non-dominant reference fields. In this pseudo code, the terms "same field" and "opposite field" are relative to the current interlaced P field. For example, if the current P field is an even field, then the "same field" is the even-line reference field and the "opposite field" is the odd-line reference field. FIGS. 5A through 10 illustrate the locations of the neighboring macroblocks and blocks from which the predictors A, B, and C are taken. In the pseudo code of fig. 29, the dominant field is the field referenced by the majority of the candidate motion vector predictors. In the case of a tie, the motion vector predictor derived from the opposite field is considered the dominant motion vector predictor. Intra-coded macroblocks are not considered in the determination of the dominant/non-dominant predictor. If all candidate predictor macroblocks are intra-coded, then the dominant and non-dominant motion vector predictors are set to zero, and the dominant predictor is taken from the opposite field.
Alternatively, the encoder and decoder analyze other and/or additional motion vectors from neighboring macroblocks or blocks, and/or apply different decision logic to determine the dominant and non-dominant reference fields. Or, the encoder and decoder use a different mechanism to predict which reference field will be selected for a given motion vector in an interlaced P field.
In some cases, the 1-bit information indicating whether the dominant or non-dominant field is used is jointly encoded with the differential motion vector information. Thus, the bits/symbols for this 1-bit information can more closely match its true entropy. For example, the dominant/non-dominant selector is signaled as part of the vertical component of the motion vector differential, as shown in the pseudo code of fig. 30, where MVY is the vertical component of the motion vector and PMVY is the vertical component of the motion vector predictor. In effect, the vertical motion vector differential jointly encodes the reference field selector and the vertical offset differential as follows:
DMVY = (MVY - PMVY) * 2 + p,
where p is 0 if the dominant reference field is used and 1 if the non-dominant reference field is used. As a numerical example, suppose the current P field is an even field, the actual reference field of the motion vector is the even-polarity field, and the dominant predictor is from the opposite field (in other words, the dominant reference field is the odd-polarity reference field). Also suppose that the vertical displacement of the motion vector is 7 units (MVY = 7) and the vertical component of the motion vector predictor is 4 units (PMVY = 4). Since the actual reference field and the dominant reference field have opposite polarities, p = 1, and DMVY = (7 - 4) * 2 + 1 = 7.
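The joint coding and its inverse can be sketched as follows. The function names are illustrative, and the sketch ignores the range/wrapping handling a real codec applies to differentials.

```python
def encode_dmvy(mvy, pmvy, uses_dominant_field):
    """DMVY = (MVY - PMVY) * 2 + p, where the low bit p carries the
    dominant (0) / non-dominant (1) reference field selector."""
    p = 0 if uses_dominant_field else 1
    return (mvy - pmvy) * 2 + p

def decode_dmvy(dmvy, pmvy):
    """Split DMVY back into the vertical motion vector component
    and the selector bit."""
    p = dmvy & 1              # low bit; works for negative dmvy in Python
    mvy = pmvy + (dmvy - p) // 2
    return mvy, p
```

Running the numeric example from the text: encode_dmvy(7, 4, uses_dominant_field=False) gives 7, and decode_dmvy(7, 4) recovers (7, 1), i.e., MVY = 7 with the non-dominant field selected.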
Alternatively, the dominant/non-dominant selector is jointly encoded with the motion vector differential information in some other way. Or, the dominant/non-dominant selector is signaled with another mechanism.
B. Coding technique
An encoder, such as the encoder (2000) of fig. 20, determines the dominant and non-dominant reference fields for candidate motion vector predictors during encoding of motion vectors for a two-reference-field interlaced P field. For example, the encoder performs the technique (3100) shown in fig. 31A for the motion vector of a current macroblock or block. Typically, the encoder performs some form of motion estimation over both reference fields to obtain the motion vector and its reference field. The motion vector is then encoded according to the technique (3100).
The encoder determines (3110) a motion vector predictor with the same reference field polarity as the motion vector. For example, the encoder determines the motion vector predictor as described in section VII for the reference field associated with the motion vector. Alternatively, the encoder uses another mechanism to determine the motion vector predictor.
The encoder determines (3120) the dominant and non-dominant reference field polarities for the motion vector. For example, the encoder follows the pseudo code shown in fig. 29. Alternatively, the encoder uses another technique to determine the dominant and non-dominant polarities.
The encoder signals (3125) a dominant/non-dominant polarity selector in the bitstream, which indicates whether the dominant or the non-dominant polarity applies for the motion vector predictor and reference field associated with the motion vector. For example, the encoder jointly encodes the dominant/non-dominant polarity selector with other information using a joint VLC. Alternatively, the encoder uses another mechanism to signal the selector, e.g., arithmetic coding of a single bit indicating the selector. Prediction of the reference field polarity of the motion vector predictor reduces the entropy of the selector information, which allows more efficient encoding of the selector information.
The encoder computes (3130) the motion vector differential from the motion vector predictor and the motion vector, and signals (3140) the motion vector differential information.
Alternatively, the encoder uses another mechanism to determine the dominant and non-dominant polarities for motion vector prediction during encoding of motion vectors for a two-reference-field interlaced P field. Also, while fig. 31A shows separate signaling of the dominant/non-dominant selector and the motion vector differential information, in various embodiments this information is signaled jointly. Various other orderings are possible, including determining the motion vector after determining the dominant/non-dominant polarity (to account for the overhead cost of selector signaling in the motion vector selection process).
For simplicity, fig. 31A does not show the various methods by which the technique (3100) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
C. Decoding technique
A decoder, such as the decoder (2100) of fig. 21, determines the dominant and non-dominant reference field polarities for candidate motion vector predictors during decoding of motion vectors for a two-reference-field interlaced P field. For example, the decoder performs the technique (3150) shown in fig. 31B.
The decoder determines (3160) the dominant and non-dominant reference field polarities for the motion vector of the current macroblock or block. For example, the decoder follows the pseudo code shown in fig. 29. Alternatively, the decoder uses another technique to determine the dominant and non-dominant polarities.
The decoder receives and decodes (3165) a dominant/non-dominant polarity selector from the bitstream, which indicates whether the dominant or the non-dominant polarity applies for the motion vector predictor and reference field associated with the motion vector. For example, the decoder receives and decodes a dominant/non-dominant polarity selector that has been jointly encoded with other information using a joint VLC. Alternatively, the decoder receives and decodes a selector signaled using another mechanism, e.g., arithmetic decoding of a single bit indicating the selector.
The decoder determines (3170) a motion vector predictor for the reference field to be used with the motion vector. For example, the decoder determines the motion vector predictor described in section VII for the signaled polarity. Alternatively, the decoder determines the motion vector predictor with another mechanism.
The decoder receives and decodes (3180) information of the motion vector difference and reconstructs (3190) the motion vector from the motion vector difference and the motion vector predictor.
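For the vertical component, steps (3160) through (3190) can be sketched end to end. This is an illustrative reconstruction (names assumed), with the decoded selector bit mapped onto an actual reference field polarity.

```python
def decode_vertical(dmvy, pmvy, dominant_polarity):
    """Reconstruct the vertical motion vector component and resolve
    which reference field polarity the motion vector uses.
    dominant_polarity is 'same' or 'opposite', relative to the
    current P field."""
    p = dmvy & 1                          # selector: 0 = dominant field
    mvy = pmvy + (dmvy - p) // 2
    non_dominant = "opposite" if dominant_polarity == "same" else "same"
    reference = dominant_polarity if p == 0 else non_dominant
    return mvy, reference
```

With DMVY = 7, PMVY = 4, and the opposite field dominant (the earlier numeric example), this returns (7, "same"): the non-dominant, same-polarity reference field is used.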
Alternatively, the decoder performs another technique to determine the dominant and non-dominant polarities for the motion vector predictor during decoding of motion vectors for a two-reference-field interlaced P field. For example, although fig. 31B shows separate signaling of the dominant/non-dominant selector and the motion vector differential information, this information may alternatively be signaled jointly. Various other orderings are possible.
For simplicity, fig. 31B does not illustrate various methods by which the techniques (3150) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
VII. Hybrid motion vector prediction for interlaced P-fields
In some embodiments, motion vectors are signaled as differentials relative to motion vector predictors in order to reduce the bit rate associated with signaling the motion vectors. The performance of the motion vector differential signaling depends in part on the quality of the motion vector prediction, which typically improves when multiple candidate motion vector predictors from the area surrounding the current macroblock, block, etc. are considered. In some cases, however, the use of multiple candidate predictors compromises the quality of the motion vector prediction. This occurs, for example, when the motion vector predictor is computed as the median of a diverse set of candidate predictors (e.g., a set with large variations among the candidate predictors).
Thus, in some embodiments, the encoder and decoder perform hybrid motion vector prediction for motion vectors of interlaced P fields. The hybrid motion vector prediction mode is used when the motion vectors of the causal neighbors of the current macroblock or block differ according to some criterion. In this mode, rather than using the median of the set of candidate predictors as the motion vector predictor, a particular motion vector from the set (e.g., the upper predictor or the left predictor) is signaled with a selector bit or codeword. This helps improve motion vector prediction at motion discontinuities in interlaced P fields. For two-reference-field interlaced P fields, the dominant polarity is also taken into account when checking the hybrid motion vector prediction condition.
A. Motion vector prediction for interlaced P-fields
Hybrid motion vector prediction is a special case of normal motion vector prediction for interlaced P fields. As explained earlier, a motion vector is reconstructed by adding the motion vector differential (which is signaled in the bitstream) to the motion vector predictor. The predictor is computed from up to three neighboring motion vectors. FIGS. 5A through 10 show the locations of the neighboring macroblocks and blocks from which the candidate predictors A, B, and C are taken. (These figures show macroblocks and blocks of progressive P frames, but they also apply to macroblocks and blocks of interlaced P fields, as described in section VI.)
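When all three candidates are available, the normal predictor is their component-wise median, which can be sketched as follows. The names are illustrative, and the fallback rules a real codec applies when fewer candidates exist are omitted.

```python
def median3(a, b, c):
    """Median of three values, computed without sorting."""
    return max(min(a, b), min(max(a, b), c))

def normal_predictor(pred_a, pred_b, pred_c):
    """Component-wise median of candidate predictors A, B, and C,
    each given as an (x, y) motion vector."""
    return (median3(pred_a[0], pred_b[0], pred_c[0]),
            median3(pred_a[1], pred_b[1], pred_c[1]))
```

For example, candidates (0, 0), (4, 2), and (2, 8) yield the predictor (2, 2): the median is taken independently for the horizontal and vertical components.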
If an interlaced P field references only one previous field, a single motion vector predictor is calculated for each motion vector of the P field. For example, fig. 51A and 51B (or alternatively fig. 60A and 60B) illustrate how the motion vector predictor is calculated for the motion vector of the 1 reference field interlaced P field, as discussed in section XII.
If two reference fields are used for an interlaced P field, two motion vector predictors are possible for each motion vector of the P field. The two motion vector predictors may both be computed and one then selected, or only one motion vector predictor may be computed after first determining the predictor selection. For example, one possible motion vector predictor is from the dominant reference field and the other possible motion vector predictor is from the non-dominant reference field, where the terms dominant and non-dominant are as described in section VI. The dominant and non-dominant reference fields have opposite polarities, so one motion vector predictor is from the reference field of the same polarity as the current P field, and the other motion vector predictor is from the reference field of opposite polarity. For example, the pseudo code in FIGS. 52A through 52N illustrates the process of computing the motion vector predictors for a motion vector of a two-reference-field P field, as discussed in detail in section XII. The variables samefieldpred_x and samefieldpred_y represent the horizontal and vertical components, respectively, of the motion vector predictor from the same field, and the variables oppositefieldpred_x and oppositefieldpred_y represent the horizontal and vertical components, respectively, of the motion vector predictor from the opposite field. The variable dominantpredictor indicates which field contains the dominant predictor. predictor_flag indicates whether the dominant or the non-dominant predictor is used for the motion vector. Alternatively, the pseudo code in FIGS. 61A through 61F is used.
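The interplay of dominantpredictor and predictor_flag can be sketched as follows. This is an illustrative reconstruction of the selection step only, not the fig. 52 pseudo code itself, and the string polarity labels are assumptions.

```python
def select_predictor(samefieldpred, oppositefieldpred,
                     dominantpredictor, predictor_flag):
    """Choose the same-field or opposite-field motion vector
    predictor.  dominantpredictor is 'same' or 'opposite';
    predictor_flag is 0 for the dominant predictor and 1 for the
    non-dominant predictor."""
    want_dominant = (predictor_flag == 0)
    dominant_is_same = (dominantpredictor == "same")
    # Use the same-field predictor exactly when the wanted predictor
    # (dominant or non-dominant) has same-field polarity.
    use_same = (want_dominant == dominant_is_same)
    return samefieldpred if use_same else oppositefieldpred
```

For instance, when the dominant predictor is the opposite field and predictor_flag is 1, the same-field (non-dominant) predictor is returned.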
B. Hybrid motion vector prediction for interlaced P fields
For hybrid motion vector prediction of a motion vector, the encoder and decoder check a hybrid motion vector prediction condition for the motion vector. In general, the condition relates to the degree of variation among motion vector predictors. The predictors evaluated may be the candidate motion vector predictors and/or the motion vector predictor computed using normal motion vector prediction. If the condition is satisfied (e.g., the degree of variation is high), one of the original candidate motion vector predictors is typically used in place of the normal motion vector predictor. The encoder signals which hybrid motion vector predictor to use, and the decoder receives and decodes the signal. When the variation among the predictors is small (the usual case), hybrid motion vector prediction is not used.
The encoder and decoder check the hybrid motion vector prediction condition for each motion vector of the interlaced P field, whether the motion vector is for a macroblock, block, etc. In other words, the encoder and decoder determine for each motion vector whether the condition is triggered and, thus, whether a predictor selection signal is expected. Alternatively, the encoder and decoder check the hybrid motion vector prediction condition for only some of the motion vectors of the interlaced P field.
An advantage of hybrid motion vector prediction for interlaced P fields is that it uses the computed predictor and the dominant polarity to select a good motion vector predictor. Experimental results indicate that hybrid motion vector prediction as described below provides significant compression/quality improvement, both over motion vector prediction without it and over earlier implementations of hybrid motion vector prediction. Moreover, the additional computational cost of the hybrid motion vector prediction check is small.
In some embodiments, the encoder or decoder tests the normal motion vector predictor (as determined by the techniques described in section VII.A) against the original set of candidate motion vector predictors. The normal motion vector predictor is the component-wise median of predictors A, B, and/or C, and the encoder and decoder test it against predictors A and C. The test checks whether the difference between the normal motion vector predictor and a candidate is large. If it is, the true motion vector is likely to be closer to one of the candidate predictors (A or C) than to the predictor derived from the median operation. When the candidate predictors are far apart, their component-wise median does not provide a good prediction, and it is more efficient to send an additional signal indicating whether the true motion vector is closer to A or to C. If predictor A is the closer predictor, it is used as the motion vector predictor for the current motion vector; if predictor C is the closer predictor, it is used as the motion vector predictor for the current motion vector.
The pseudo code in fig. 32 shows such hybrid motion vector prediction during decoding. The variables predictor_pre_x and predictor_pre_y are the horizontal and vertical motion vector predictors, respectively, as computed using normal motion vector prediction. The variables predictor_post_x and predictor_post_y are the horizontal and vertical motion vector predictors, respectively, after hybrid motion vector prediction. In the pseudo code, the normal motion vector predictor is tested against predictors A and C to see whether the motion vector predictor selection is explicitly coded in the bitstream. If so, a single bit in the bitstream indicates whether predictor A or predictor C is used as the motion vector predictor. Otherwise, the normal motion vector predictor is used. Various other conditions are also checked (e.g., if A or C is intra-coded, the magnitude of the normal motion vector predictor is checked). When A or C is intra-coded, the motion corresponding to A or C, respectively, is considered to be zero.
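The shape of the check in fig. 32 can be sketched like this. It is a simplification, not the exact spec logic: the threshold of 32 comes from the text below, and the additional conditions of the real pseudo code (such as checks on the normal predictor's magnitude when a candidate is intra-coded) are omitted.

```python
THRESHOLD = 32  # variation threshold mentioned in the text

def hybrid_condition(pred_pre, pred_a, pred_c):
    """Return True if an explicit predictor selection bit is expected,
    i.e., if the normal (median) predictor is far from candidate A or
    candidate C.  An intra-coded candidate, passed as None, is treated
    as zero motion."""
    def dist(p, q):
        # Sum of absolute component differences.
        return abs(p[0] - q[0]) + abs(p[1] - q[1])
    a = pred_a if pred_a is not None else (0, 0)
    c = pred_c if pred_c is not None else (0, 0)
    return dist(pred_pre, a) > THRESHOLD or dist(pred_pre, c) > THRESHOLD
```

When the condition returns True, the decoder reads a selector bit and replaces the median predictor with A or C; otherwise the median predictor is used as-is.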
For a motion vector of a two-reference-field interlaced P field, all the predictors used in the check are of the same polarity. In some embodiments, the reference field polarity is determined from the dominant/non-dominant predictor polarity and the selector obtained during differential motion vector decoding. For example, if the opposite-field polarity is used, then predictor_pre_x, predictor_pre_y, predictorA_x, predictorA_y, predictorC_x, and predictorC_y are all taken from the opposite-field predictors; if the same-field polarity is used, they are all taken from the same-field predictors. The values of oppositefieldpred and samefieldpred are computed, for example, as in the pseudo code of FIGS. 52A to 52J or 61A to 61F. Fig. 53 illustrates alternative pseudo code for hybrid motion vector prediction in a combined implementation (see section XII).
Alternatively, the encoder and decoder test different hybrid motion vector prediction conditions, e.g., conditions that take into account other and/or additional predictors, conditions that use different decision logic to detect motion discontinuities, and/or conditions that use different thresholds (other than 32) for changes.
A simple signal to choose between two candidate predictors (e.g., a and C) is a single bit per motion vector. Alternatively, the encoder and decoder use different signaling mechanisms, e.g., jointly signaling the selector bits along with other information such as motion vector data.
C. Coding technique
An encoder, such as the encoder (2000) of fig. 20, performs hybrid motion vector prediction during encoding of motion vectors for interlaced P fields. For example, the encoder performs the technique shown in fig. 33A on the motion vector of the current macroblock or block (3300).
The encoder determines (3310) a motion vector predictor for the motion vector. For example, the encoder determines the motion vector predictor using the techniques described in section vii.a. Alternatively, the encoder uses another technique to determine the motion vector predictor.
The encoder then checks (3320) the hybrid motion vector prediction condition for the motion vector predictor. For example, the encoder uses a technique that mirrors the decoder-side pseudo code shown in fig. 32. Alternatively, the encoder checks a different hybrid motion vector prediction condition. (Whatever the condition is, the corresponding decoder checks the same hybrid motion vector prediction condition as the encoder, since the presence/absence of predictor selection information is derived implicitly by both the encoder and the decoder.)
If the hybrid motion vector condition is not triggered ("no" path out of decision 3325), the encoder uses the initially determined motion vector predictor.
On the other hand, if the hybrid motion vector condition is triggered ("yes" path out of decision 3325), the encoder selects (3330) the hybrid motion vector predictor to use. For example, the encoder selects between the upper candidate predictor and the left candidate predictor, which are neighboring motion vectors. Alternatively, the encoder selects between other and/or additional predictors.
The encoder then signals (3340) the selected hybrid motion vector predictor. For example, the encoder transmits a single bit indicating whether the upper candidate predictor or the left candidate predictor is to be used as the motion vector predictor. Alternatively, the encoder uses another signal representation mechanism.
The encoder performs the technique on each motion vector of the interlaced P field, or only on certain motion vectors of the interlaced P field (e.g., depending on the macroblock type) (3300). For simplicity, fig. 33A does not show the various methods by which the techniques (3300) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
D. Decoding technique
A decoder, such as decoder (2100) of fig. 21, performs hybrid motion vector prediction during decoding of motion vectors for interlaced P fields. For example, the decoder performs the technique (3350) shown in fig. 33B on the motion vector of the current macroblock or block.
The decoder determines (3360) a motion vector predictor for the motion vector. For example, the decoder determines the motion vector predictor using the techniques described in section vii.a. Alternatively, the decoder determines the motion vector predictor using another technique.
The decoder then checks (3370) the hybrid motion vector prediction condition for the motion vector predictor. For example, the decoder follows the pseudo code shown in fig. 32. Alternatively, the decoder checks for different hybrid motion vector prediction conditions. (the decoder checks the same motion vector prediction conditions as the corresponding encoder, regardless of the conditions.)
If the hybrid motion vector condition is not triggered ("no" path out of decision 3375), the decoder uses the initially determined motion vector predictor.
On the other hand, if the hybrid motion vector condition is triggered ("yes" path out of decision 3375), the decoder receives and decodes (3380) a signal indicating the selected hybrid motion vector predictor. For example, the decoder gets a single bit indicating whether the upper candidate predictor or the left candidate predictor is to be used as the motion vector predictor. Alternatively, the decoder receives the selection signaled with another mechanism.
The decoder then selects the hybrid motion vector predictor to use. For example, the decoder selects between the upper candidate predictor and the left candidate predictor, which are neighboring motion vectors. Alternatively, the decoder selects between other and/or additional predictors.
The decoder performs the technique (3350) on each motion vector of the interlaced P field, or on only some motion vectors of the interlaced P field (e.g., depending on the macroblock type). For simplicity, fig. 33B does not show the various ways in which the technique (3350) may be integrated with other aspects of the encoding and decoding. Various combined implementations are described in detail in section XII.
VIII. Motion vector block mode
In some embodiments, a macroblock may have multiple motion vectors. For example, a macroblock of a mixed-MV interlaced P field may have one motion vector, four motion vectors (one for each luminance block of the macroblock), or be intra-coded (no motion vectors). Similarly, a field-coded macroblock of an interlaced P frame may have two motion vectors (one per field) or four motion vectors (two per field), and a frame-coded macroblock of an interlaced P frame may have one motion vector or four motion vectors (one per luminance block).
A 2MV or 4MV macroblock may be signaled as "skipped" if the macroblock has no associated motion vector data (e.g., differentials) to signal, in which case the motion vector predictors are generally used as the motion vectors for the macroblock. Alternatively, a macroblock may have non-zero motion vector data to signal for one motion vector but not for another motion vector (which has a (0, 0) motion vector differential). For a 2MV or 4MV macroblock in which at least one but not all motion vectors have a (0, 0) differential, signaling the presence or absence of motion vector data separately per motion vector consumes an inefficient number of bits.
Thus, in some embodiments, the encoder and decoder use a signaling mechanism that efficiently signals the presence or absence of motion vector data for a macroblock having multiple motion vectors. The motion vector coded block mode (or simply "motion vector block mode") of a macroblock indicates, on a motion-vector-by-motion-vector basis, which blocks, fields, half-fields, etc. have motion vector data signaled in the bitstream and which do not. The motion vector block mode jointly signals the pattern of motion vector data for a macroblock, which allows the encoder and decoder to exploit the spatial correlation that exists between blocks. Furthermore, signaling the presence/absence of motion vector data with the motion vector block mode provides a simple way to signal this information separately from signaling the presence/absence of related transform coefficient data (such as with CBPCY elements).
A particular example of signaling described in this section and in the combined implementation of section XII signals the motion vector block mode with variable-length-coded 2MVBP and 4MVBP syntax elements. Table selections for 2MVBP and 4MVBP are signaled by the fixed-length-coded 2MVBPTAB and 4MVBPTAB elements, respectively. Alternatively, the encoder and decoder signal the motion vector block mode using other and/or additional signals.
A. Motion vector block mode
The motion vector block mode indicates which motion vectors are "coded" and which are not "coded" for a macroblock having multiple motion vectors. A motion vector is encoded if its differential motion vector is non-zero (i.e., the motion vector to be signaled is different from its motion vector predictor). Otherwise, the motion vector is not encoded.
If the macroblock has four motion vectors, the motion vector block mode has 4 bits, one for each of the four motion vectors. The order of bits in the motion vector block mode follows the block order shown in fig. 34 for a 4MV macroblock of an interlaced P field or a 4MV frame coded macroblock of an interlaced frame. For a 4MV field coded macroblock of an interlaced frame, the bit order of the motion vector block pattern is the top left field motion vector, the top right field motion vector, the bottom left field motion vector, and the bottom right field motion vector.
If a macroblock has two motion vectors, the motion vector block mode has 2 bits, one for each of the two motion vectors. For 2MV field coded macroblocks of interlaced P frames, the bit order of the motion vector block mode is simply the top field motion vector and then the bottom field motion vector.
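Packing the per-motion-vector "coded" flags into a block mode value can be sketched as follows, under the assumption (not stated explicitly above) that the first motion vector in the orders just described maps to the most significant bit.

```python
def pack_mv_block_mode(coded_flags):
    """Pack per-motion-vector 'coded' flags (block 0 / top field
    first) into an integer motion vector block mode, with the first
    flag in the most significant bit."""
    mode = 0
    for coded in coded_flags:
        mode = (mode << 1) | int(bool(coded))
    return mode

def unpack_mv_block_mode(mode, count):
    """Inverse operation: expand the mode back into `count` flags."""
    return [bool((mode >> (count - 1 - i)) & 1) for i in range(count)]
```

For example, a 4MV macroblock whose first and last motion vectors are coded produces the mode 0b1001, and a 2MV macroblock with only the bottom field motion vector coded produces 0b01.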
One of a plurality of different VLC tables may be used to signal a motion vector block mode element. For example, four different code tables for 4MVBP are shown in fig. 47J, and four different code tables for 2MVBP are shown in fig. 47K. The table selection is indicated by 4MVBPTAB or 2MVBPTAB, respectively, signaled at the picture layer. Alternatively, the encoder and decoder use other and/or additional codes for signaling motion vector block mode information and table selections.
An additional rule applies in determining which motion vectors are coded for macroblocks of a two-reference-field interlaced P field. A "not coded" motion vector has a (0, 0) differential and uses the dominant predictor, as described in section VI. A "coded" motion vector may have a zero-value motion vector differential but signal the non-dominant predictor, or it may have a non-zero differential and signal either the dominant or the non-dominant predictor.
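The rule can be sketched as a simple predicate; the helper name and argument forms are illustrative.

```python
def motion_vector_is_coded(differential, uses_dominant_predictor):
    """In a two-reference-field interlaced P field, a motion vector is
    'not coded' only when its differential is (0, 0) AND it uses the
    dominant predictor; a zero differential with the non-dominant
    predictor still has to be signaled."""
    return differential != (0, 0) or not uses_dominant_predictor
```

Thus a (0, 0) differential that selects the non-dominant reference field still counts as "coded" and contributes a 1 bit to the motion vector block mode.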
Alternatively, the encoder and decoder use the motion vector block mode for other and/or additional kinds of pictures, for other and/or additional kinds of macroblocks, for other and/or additional numbers of motion vectors, and/or with different bit positions.
B. Coding technique
An encoder, such as the encoder (2000) of fig. 20, encodes motion vector data of a macroblock using a motion vector block mode. For example, the encoder performs the technique (3500) shown in fig. 35A.
For a given macroblock having multiple motion vectors, the encoder determines (3510) the motion vector block mode for the macroblock. For example, the encoder determines a 4-bit motion vector block mode for a 4MV macroblock in an interlaced P field or for a 4MV field-coded or frame-coded macroblock in an interlaced P frame. Or, the encoder determines a 2-bit motion vector block mode for a 2MV field-coded macroblock in an interlaced P frame. Alternatively, the encoder determines motion vector block modes for other kinds of macroblocks and/or other numbers of motion vectors.
The encoder then signals (3520) the motion vector block mode. In general, the encoder signals the VLC of the motion vector block pattern using a code table such as shown in fig. 47J and 47K. Alternatively, the encoder uses another mechanism to signal the motion vector block mode.
If there is at least one motion vector for which motion vector data is to be signaled ("yes" path out of decision 3525), the encoder signals (3530) the motion vector data for that motion vector. For example, the encoder encodes the motion vector data as BLKMVDATA, TOPMVDATA, or BOTMVDATA elements using the techniques described in section IX. Alternatively, the encoder uses a different signaling technique.
The encoder repeats (3525, 3530) the encoding of the motion vector data until there are no more motion vectors for which to signal the motion vector data (decision 3525 goes to the "no" path).
The encoder may select between multiple code tables for encoding the motion vector block mode (not shown in fig. 35A). For example, the encoder selects a code table for an interlaced P field or P frame, and then uses that table to encode the motion vector block modes for the macroblocks in the picture. Alternatively, the encoder selects the code table on a more frequent, less frequent, or aperiodic basis, or the encoder skips code table selection altogether (always using the same code table). Alternatively, the encoder may select the code table from context information (so that signaling of the code table selection is unnecessary). The code tables may be the tables shown in FIGS. 47J and 47K, other tables, and/or additional tables. For example, the encoder signals the selected code table in the bitstream with an FLC indicating the selected code table, with a VLC indicating the selected code table, or with a different signaling mechanism.
Alternatively, the encoder performs another technique to encode the motion vector data of the macroblock using motion vector block mode. For simplicity, fig. 35A does not show the various ways in which the techniques (3500) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
C. Decoding technique
A decoder, such as the decoder (2100) of fig. 21, receives and decodes motion vector data for an interlaced P field or interlaced P frame using motion vector block modes. For example, the decoder performs the technique (3550) shown in fig. 35B.
For a given macroblock having multiple motion vectors, the decoder receives and decodes (3560) the motion vector block mode for the macroblock. For example, the decoder receives and decodes a 4-bit motion vector block mode, or another motion vector block mode described in the previous section. In general, the decoder receives a VLC for the motion vector block mode and decodes it using code tables such as those shown in FIGS. 47J and 47K. Alternatively, the decoder receives and decodes the motion vector block mode using another signaling mechanism.
If there is at least one motion vector for which motion vector data is signaled ("yes" path out of decision 3565), the decoder receives and decodes (3570) the motion vector data for that motion vector. For example, the decoder receives and decodes motion vector data encoded as BLKMVDATA, TOPMVDATA, or BOTMVDATA elements using the techniques described in section IX. Alternatively, the decoder uses a different decoding technique.
The decoder repeats (3565, 3570) the receiving and decoding of motion vector data until there are no more motion vectors for which motion vector data is signaled (the "no" path out of decision 3565).
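As an illustrative sketch (not part of the described bitstream syntax), the receive-and-decode loop above might look as follows in Python; the function names, the MSB-first bit order, and the zero-differential default for unsignaled motion vectors are assumptions:

```python
def decode_motion_vectors(pattern_bits, decode_mv_data, num_blocks=4):
    """Decode per-block motion vector data gated by a motion vector
    block pattern. Bit i of pattern_bits (MSB first) is 1 when
    differential motion vector data is signaled for luma block i."""
    diffs = []
    for i in range(num_blocks):
        bit = (pattern_bits >> (num_blocks - 1 - i)) & 1
        if bit:
            # data present: decode a BLKMVDATA-style element
            diffs.append(decode_mv_data())
        else:
            # no data signaled for this motion vector: zero differential
            diffs.append((0, 0))
    return diffs
```

With a four-bit pattern such as 0b1010, for example, only the first and third motion vectors carry differential data.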
The decoder may select between multiple code tables to decode the motion vector block pattern (not shown in fig. 35B). For example, the table selection and table selection signaling options mirror those described for the encoder in the previous section.
Alternatively, the decoder performs another technique to decode the motion vector data of the macroblock using the motion vector block mode. For simplicity, fig. 35B does not illustrate various methods by which the techniques (3550) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
IX. motion vector differentiation in interlaced P-fields
In some embodiments, when performing motion compensated prediction for a single current interlaced P field, two previously encoded/decoded fields are used as reference fields. (See, e.g., sections IV, VI, and VII.) The information signaled for a motion vector in the P field indicates: (1) which of the two fields provides the reference for the motion vector; and (2) the motion vector value. The motion vector value is typically signaled as a differential relative to a motion vector predictor. The choice between the two possible reference fields could be signaled for each motion vector with a single additional bit, but in many cases such signaling is inefficient. In general, the two reference fields are not equally likely for a given motion vector, and the selection for one motion vector is not independent of the selections for other (e.g., neighboring) motion vectors. Thus, in practice, signaling the reference field selection with a single bit per selection is generally inefficient.
Thus, in some embodiments, the encoder jointly encodes the motion vector difference information and the reference field selection information. The decoder performs a corresponding decoding of the jointly encoded information.
A. Results of theory and experiment
For a 2 reference field interlaced P field, the two reference fields have the following spatial and temporal relationships with the P-field. The polarity of the temporally closest reference field is opposite to the polarity of the current P field. For example, if the current P field is an even field (consisting of even lines of an interlaced frame), the temporally closest reference field is an odd field, while the other reference field (temporally farther) is an even field.
The encoder and decoder use causal information to predict the reference field selection for the current motion vector. For example, reference field selection information from neighboring previously coded motion vectors is used to predict the reference field for the current motion vector. A binary value then indicates whether the predicted reference field is used. One value indicates that the actual reference field of the current motion vector is the predicted reference field, and the other value indicates that the actual reference field of the current motion vector is the other reference field. In some implementations, the reference field prediction is expressed in terms of the polarity of the previously used reference fields and the expected reference field for the current motion vector (e.g., as a dominant or non-dominant polarity; see section VI). In most cases, with such prediction, the probability distribution of the binary-valued reference field selector is non-uniform, skewed towards the predicted reference field. In experiments, the predicted reference field is used for about 70% of the motion vectors, and the other reference field for about 30%.
It is inefficient to transmit a single bit to signal the reference field selection information with such a probability distribution. A more efficient approach is to jointly encode the reference field selection information together with the differential motion vector information.
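The inefficiency claim can be checked with a short calculation: with the roughly 70/30 split described above, the binary selector carries well under one bit of information. A minimal Python sketch:

```python
import math

def selector_entropy(p):
    """Entropy, in bits, of a binary selector that chooses the
    predicted reference field with probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# At a 70/30 split, the selector carries about 0.88 bits of
# information, so spending a full bit per motion vector wastes
# roughly 0.12 bits per selector; joint coding of the selector
# with the differential can approach the lower figure.
```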
B. Examples of Signal representation mechanisms
Various examples of signaling mechanisms for joint encoding and decoding of motion vector difference information and reference field selection information are provided. Alternatively, the encoder and decoder incorporate another mechanism to jointly encode and decode the information.
The pseudo code in fig. 36 shows the joint encoding of motion vector differential information and reference field selection information according to a general signaling mechanism. In this pseudo code, the variables DMVX and DMVY are the horizontal and vertical differential motion vector components, respectively. The variables AX and AY are the absolute values of the differential components, while the variables SX and SY are the signs of the differential components. The horizontal motion vector components range from -RX to RX + 1, while the vertical motion vector components range from -RY to RY + 1. RX and RY are powers of two, with exponents MX and MY, respectively. The variables ESCX and ESCY (powers of two with exponents KX and KY, respectively) indicate thresholds beyond which an escape code is used. The variable R is the binary reference field selection value.
When an escape condition is triggered (AX > ESCX or AY > ESCY), the encoder sends a VLC that jointly represents the escape mode signal and R. The encoder then sends DMVX and DMVY as fixed length codes of length MX + 1 and MY + 1, respectively. Thus, two elements in the VLC table are used to signal (1) that (MX + MY + 2) bits together encode DMVX and DMVY, and (2) the associated R value. In other words, the two elements are escape codes corresponding to R = 0 and R = 1.
For other events, the variables NX and NY indicate how many bits are used to signal the values of AX and AY, respectively. AX lies in the interval 2^NX <= AX < 2^(NX+1), where NX = 0, 1, 2, ..., KX - 1, and AX = 0 when NX = -1. Likewise, AY lies in the interval 2^NY <= AY < 2^(NY+1), where NY = 0, 1, 2, ..., KY - 1, and AY = 0 when NY = -1.
The VLC table for coding the size information NX and NY and the field reference information R has (KX + 1) x (KY + 1) x 2 + 1 elements, where each element is a (codeword, code size) pair. Of the elements in the table, all but two are used to jointly signal the values of NX, NY, and R. The other two elements are escape codes.
For events signaled with NX and NY, the encoder sends a VLC indicating the combination of NX, NY, and R values. The encoder then transmits AX as NX bits, SX as one bit, AY as NY bits, and SY as one bit. AX need not be transmitted if NX is 0 or -1, and likewise for NY and AY, since in these cases the value of AX or AY can be derived directly from NX or NY.
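A hypothetical Python sketch of the size-class computation described above; the helper names are assumptions, and escape handling and the actual VLC tables are omitted:

```python
def size_class(a, esc):
    """Return N such that 2**N <= a < 2**(N + 1), or -1 when a == 0.
    Values reaching the escape threshold esc are escape-coded."""
    if a == 0:
        return -1
    if a >= esc:
        raise ValueError("escape condition; coded separately")
    return a.bit_length() - 1  # floor(log2(a))

def classify_differential(dmvx, dmvy, kx=8, ky=8):
    """Classify a differential motion vector for joint VLC coding.

    Returns (nx, ny, payload_bits): nx and ny are the size classes
    that would be jointly VLC-coded with the selector R, and
    payload_bits counts the trailing fixed-length bits (AX as nx bits
    plus a sign bit, AY as ny bits plus a sign bit; nothing when the
    magnitude is derivable from the size class alone)."""
    ax, ay = abs(dmvx), abs(dmvy)
    nx = size_class(ax, 2 ** kx)
    ny = size_class(ay, 2 ** ky)
    payload = (nx if nx > 0 else 0) + (ny if ny > 0 else 0)
    payload += (ax != 0) + (ay != 0)  # one sign bit per nonzero component
    return nx, ny, payload
```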
The event AX = 0, AY = 0, and R = 0 is signaled by another mechanism, such as a skip macroblock mechanism or a motion vector block pattern (see section VIII). The VLC table in the pseudo code of fig. 36 has no element for [0, 0, 0], and that event is not handled in the pseudo code.
The corresponding decoder performs joint decoding, which reflects the encoding shown in fig. 36. For example, received bits are decoded instead of transmitted bits, variable length decoding is performed instead of variable length encoding, and so on.
The pseudo code in fig. 50 illustrates the decoding of motion vector differential information and reference field selection information that have been jointly encoded according to the signaling scheme of one combined implementation. The pseudo code in fig. 59 shows the decoding of motion vector differential information and reference field selection information that have been jointly encoded according to the signaling scheme of another combined implementation. The pseudo code in figs. 50 and 59 is explained in detail in section XII. In particular, the pseudo code shows joint coding and decoding of the prediction selector with the vertical differential value, or with the magnitudes of the vertical and horizontal differential values.
The corresponding encoder performs joint encoding, which reflects the decoding shown in fig. 50 or 59. For example, the encoder transmits bits instead of receiving bits, performs variable length encoding instead of variable length decoding, and so on.
C. Coding technique
An encoder, such as the encoder (2000) of fig. 20, jointly encodes reference field prediction selector information and differential motion vector information. For example, the encoder performs the technique (3700) shown in fig. 37A to jointly encode this information. Typically, the encoder performs some form of motion estimation over both reference fields to obtain a motion vector and its reference field. The motion vector is then encoded according to the technique (3700), at which point one of the two possible reference fields is associated with the motion vector by jointly encoding the selector information with, for example, the vertical motion vector differential.
The encoder determines (3710) a motion vector predictor for the motion vector. For example, the encoder determines the motion vector predictor as described in section VII. Alternatively, the encoder determines the motion vector predictor with another mechanism.
The encoder determines (3720) a motion vector difference for the motion vector with respect to the motion vector predictor. In general, the difference is the component level difference between the motion vector and the motion vector predictor.
The encoder also determines (3730) the reference field prediction selector information. For example, the encoder determines the dominant and non-dominant polarities for the motion vector (and thus the dominant reference field, dominant-polarity motion vector predictor, etc.; see section VI), in which case the selector indicates whether or not the dominant polarity is used. Alternatively, the encoder uses a different technique to determine the reference field prediction selector information. For example, the encoder uses a different type of reference field prediction.
The encoder then jointly encodes 3740 motion vector difference information and reference field prediction selector information along with the motion vectors. For example, the encoder encodes this information using the mechanism described in the previous section. Alternatively, the encoder uses another mechanism.
For simplicity, fig. 37A does not show various methods by which the technique (3700) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
D. Decoding technique
A decoder, such as the decoder (2100) of fig. 21, decodes jointly encoded reference field prediction selector information and differential motion vector information. For example, the decoder performs the technique (3750) shown in fig. 37B to decode such jointly encoded information.
The decoder decodes (3760) the jointly encoded motion vector differential information and reference field prediction selector information for a motion vector. For example, the decoder decodes information signaled using one of the mechanisms described in section IX.B. Alternatively, the decoder decodes information signaled using another mechanism.
The decoder then determines (3770) a motion vector predictor for the motion vector. For example, the decoder determines the dominant and non-dominant polarities of the motion vectors (see section VI), applies the selector information, and determines the motion vector predictor for the selected polarity as described in section VII. Alternatively, the decoder uses a different mechanism to determine the motion vector predictor. For example, the decoder uses different types of reference field prediction.
Finally, the decoder reconstructs (3780) the motion vector by combining the motion vector differential with the motion vector predictor.
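A minimal sketch of the decoder-side steps (3760)-(3780), assuming hypothetical helper names and predictors already computed for each polarity:

```python
def reconstruct_motion_vector(differential, predictors, use_dominant):
    """Combine a decoded differential with the motion vector predictor
    for the reference field polarity chosen by the decoded selector.

    predictors maps 'dominant' / 'non-dominant' to candidate
    predictors computed as in the motion vector prediction sections."""
    polarity = 'dominant' if use_dominant else 'non-dominant'
    px, py = predictors[polarity]
    dx, dy = differential
    return (px + dx, py + dy), polarity
```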
For simplicity, fig. 37B does not illustrate various methods by which the techniques (3750) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
X. Deriving chroma motion vectors in interlaced P-fields
In some embodiments, the encoder and decoder derive chroma motion vectors from the luma motion vectors signaled for macroblocks of interlaced P fields. The chroma motion vectors are not explicitly signaled in the bitstream. Instead, they are determined from the luma motion vectors of the macroblocks. The encoder and decoder could reuse the chroma motion vector derivation used for progressive P-frames or interlaced P-frames, but that generally does not provide adequate performance for interlaced P-fields. Thus, the encoder and decoder use a chroma motion vector derivation adapted to the reference field organization of interlaced P fields.
The chroma motion vector derivation has two stages: (1) selection, and (2) subsampling and chroma rounding. Of these stages, the selection stage is particularly adapted to chroma motion vector derivation in interlaced P fields. The output of the selection stage is an initial chroma motion vector, which depends on the number (and possibly polarity) of the luma motion vectors of the macroblock. If no luma motion vector is used for a macroblock (an intra-coded macroblock), no chroma motion vector is derived. If a single luma motion vector is used for a macroblock (a 1MV macroblock), that single luma motion vector is selected and passed to the second stage. If four luma motion vectors are used for a macroblock (a 4MV macroblock), the initial chroma motion vector is selected using logic that favors the most common polarity among the four luma motion vectors.
A. Chroma subsampling and motion vector representation
The chroma motion vector derivation for a macroblock of an interlaced P field depends on the chroma subsampling type for the macroblock and also on the motion vector representation.
Some common chroma subsampling formats are 4:2:0 and 4:1: 1. Fig. 38 shows a sampling grid of YUV 4:2:0 macroblocks, according to which chroma samples are sub-sampled in a regular 4:1 pattern with respect to luma samples. Fig. 38 shows the spatial relationship between luma and chroma samples for a 16x16 macroblock, where there are four 8x8 luma blocks, one 8x8 chroma "U" block, and one 8x8 chroma "V" block (such as shown in fig. 22). In general, the resolution of the chroma grid is half the resolution of the luma grid in both the x and y directions, which is the basis for down-sampling in chroma motion vector derivation. To scale the motion vector distance of the luminance grid to the corresponding distance of the chrominance grid, the motion vector value is divided by a factor of 2. The selection phase techniques described herein may be applied to YUV 4:2:0 macroblocks or to macroblocks having another chroma subsampling format.
The representation of the luminance and chrominance motion vectors for interlaced P fields depends in part on the motion vectors and the accuracy of the motion compensation. Typical motion vector accuracy is 1/2 pixels and 1/4 pixels, which work with 1/2 pixel and 1/4 pixel interpolation in motion compensation, respectively.
In some embodiments, the motion vectors for interlaced P fields may reference either the top or bottom reference field, which is of the same or opposite polarity as the current field. The vertical displacement specified by a motion vector value depends on the polarities of the current P field and the reference field. Motion vector units are generally expressed in terms of field picture lines. For example, if the vertical component of a motion vector is +6 (in 1/4-pixel units), this typically indicates a vertical displacement of 1 1/2 field picture lines (before any adjustment for different polarities of the current P field and reference field, if necessary).
For various motion vector component values and field polarity combinations, fig. 39 shows the corresponding spatial positions in the current and reference fields according to a first convention. Each combination of field polarities has a pair of columns, one (left) for pixels of rows in the current field (numbered 0, 1, 2, etc.) and the other (right) for pixels of rows in the reference field (also numbered 0, 1, 2, etc.). The circles represent samples at integer pixel positions, and the Xs represent interpolated samples at sub-pixel positions. Under this convention, a vertical motion vector component value of 0 refers to an integer pixel position (i.e., a sample on an actual line) in the reference field. If the current field and the reference field have the same polarity, a vertical component value of 0 from line N of the current field references line N of the reference field, which is at the same actual offset within the frame. If the current field and the reference field have opposite polarities, a vertical component value of 0 from line N of the current field still references line N of the reference field, but due to the alternation of odd and even lines, the referenced position is at an actual offset of 1/2 pixel within the frame.
FIG. 48 illustrates the corresponding spatial positions in the current and reference fields according to a second convention. Under this convention, a vertical motion vector component value of 0 refers to samples at the same actual offset in the interlaced frame. The referenced samples are at integer pixel positions in a same-polarity reference field, or at 1/2-pixel positions in an opposite-polarity reference field.
Alternatively, the motion vectors of interlaced P fields use another representation and/or follow another convention for handling vertical displacement of polarity.
B. Selection phase example
In some embodiments, the selection stage of chroma motion vector derivation is adapted to the reference field mode used in motion vectors for interlaced P fields with one or two reference fields. For example, the result of the selection phase of a macroblock depends on the number and polarity of the luma motion vectors of the macroblock.
The simplest case is when the entire macroblock is intra coded. In this case, there is no chroma motion vector, and the second and third stages of chroma motion vector derivation are skipped. The chroma blocks of a macroblock are intra coded/decoded without motion compensation.
The next simplest case is when a macroblock has a single luma motion vector for all four luma blocks. Whether the current P field has one reference field or two reference fields, there is essentially no selection operation because a single luma motion vector is simply passed forward for rounding and subsampling.
The selection stage is more complex when the macroblock has up to four luma motion vectors. In general, the selection stage favors the dominant polarity among the luma motion vectors of the macroblock. If the P field has only one reference field, the dominant polarity is simply the polarity shared by all the luma motion vectors of the macroblock. However, if the P field has two reference fields, different luma motion vectors of the macroblock may point to different reference fields. For example, if the polarity of the current P field is odd, a macroblock may have two luma motion vectors of opposite polarity (referencing the even-polarity reference field) and two luma motion vectors of the same polarity (referencing the odd-polarity reference field). The encoder or decoder determines the dominant polarity of the luma motion vectors of the macroblock and determines an initial chroma motion vector based on the luma motion vectors of the dominant polarity.
In some embodiments, a 4MV macroblock has zero to four motion vectors. The luma blocks of such a 4MV macroblock are intra coded, either with associated luma motion vectors of the same polarity or with associated luma motion vectors of opposite polarity. In other implementations, a 4MV macroblock always has four luma motion vectors, even if some of them are not signaled (e.g., because they have a (0, 0) difference). The luminance blocks of such a 4MV macroblock have motion vectors of opposite polarity or motion vectors of the same polarity. The selection phase logic is slightly different for these different implementations.
1. 4MV macroblocks with 0-4 luma motion vectors
The pseudo code in fig. 40 shows one example of the selection stage logic, which applies to 4MV macroblocks with between 0 and 4 luma motion vectors. Among the luma motion vectors, if the number referencing the same-polarity reference field is at least as great as the number referencing the opposite-polarity reference field, the encoder/decoder derives the initial chroma motion vector from the luma motion vectors that reference the same-polarity reference field. Otherwise, the encoder/decoder derives the initial chroma motion vector from the luma motion vectors that reference the opposite-polarity reference field.
If all four luma motion vectors have the dominant polarity (e.g., all reference the odd field or all reference the even field), the encoder/decoder computes the median of the four luma motion vectors. If only three luma motion vectors have the dominant polarity (e.g., because one luma block is intra coded or has a motion vector of the non-dominant polarity), the encoder/decoder computes the median of those three luma motion vectors. If two luma motion vectors have the dominant polarity, the encoder/decoder computes the average of those two luma motion vectors. (The same polarity as the current P field is favored when the same- and opposite-polarity counts are evenly split.) Finally, if only one luma motion vector has the dominant polarity (e.g., because three luma blocks are intra coded), that luma motion vector is the output of the selection stage. If all luma blocks are intra coded, the macroblock is intra coded and the pseudo code of fig. 40 is not applied.
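The selection logic of fig. 40 can be sketched as follows. This is an illustrative Python rendering in which the component-wise median-of-four (mean of the two middle values) is one plausible definition, not necessarily the one used in any particular implementation:

```python
def select_initial_chroma_mv(luma_mvs):
    """Selection stage for a 4MV macroblock with 0 to 4 luma motion
    vectors. luma_mvs holds (mv, polarity) pairs for the non-intra
    luma blocks, polarity being 'same' or 'opposite' relative to the
    current P field. Returns None for an all-intra macroblock."""
    if not luma_mvs:
        return None  # intra macroblock: no chroma motion vector
    same = [mv for mv, pol in luma_mvs if pol == 'same']
    opposite = [mv for mv, pol in luma_mvs if pol == 'opposite']
    # dominant polarity; ties favor the same polarity as the current field
    dominant = same if len(same) >= len(opposite) else opposite
    if len(dominant) == 1:
        return dominant[0]
    if len(dominant) == 2:
        return ((dominant[0][0] + dominant[1][0]) // 2,
                (dominant[0][1] + dominant[1][1]) // 2)
    # three or four candidates: component-wise median
    xs = sorted(mv[0] for mv in dominant)
    ys = sorted(mv[1] for mv in dominant)
    mid = len(dominant) // 2
    if len(dominant) % 2:  # median of three: middle value
        return (xs[mid], ys[mid])
    # median of four: mean of the two middle values
    return ((xs[mid - 1] + xs[mid]) // 2, (ys[mid - 1] + ys[mid]) // 2)
```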
2. 4MV macroblock with 4 luminance motion vectors
The pseudo code in figs. 55A and 55B shows another example of the selection stage logic, which applies to 4MV macroblocks that always have four luma motion vectors (e.g., because intra-coded luma blocks are not allowed). Fig. 55A handles chroma motion vector derivation for such a 4MV macroblock in a one-reference-field interlaced P field, and fig. 55B handles chroma motion vector derivation for such a 4MV macroblock in a two-reference-field interlaced P field.
Referring to fig. 55B, the encoder/decoder determines which polarity (e.g., odd or even) is dominant among the four luma motion vectors of the 4MV macroblock. If all four luma motion vectors are from the same field (e.g., all odd or all even), the median of the four luma motion vectors is determined. If three of the four are from the same field, the median of those three luma motion vectors is determined. Finally, if there are two luma motion vectors of each polarity, the two luma motion vectors with the same polarity as the current P field are favored, and their average is determined. (If a 4MV macroblock always has four luma motion vectors, it is not possible for only one luma motion vector, or none, to have the dominant polarity.)
Alternatively, the encoder or decoder uses different selection logic in deriving the chroma motion vector from the plurality of luma motion vectors for macroblocks of an interlaced P field. Alternatively, the encoder or decoder considers luma motion vector polarity in the chroma motion vector derivation for another type of macroblock (e.g., macroblocks with a different number of luma motion vectors and/or in a different picture type than interlaced P fields).
C. Subsampling/rounding stage
For the second stage of chroma motion vector derivation, the encoder and decoder typically apply rounding logic to eliminate certain sub-pixel positions from the initial chroma motion vector (e.g., rounding away 3/4-pixel positions so that such chroma motion vectors do not indicate 1/4-pixel displacements after downsampling). The amount of rounding can be adjusted to trade off prediction quality against interpolation complexity. With more aggressive rounding, for example, the encoder or decoder eliminates all 1/4-pixel chroma shifts in the resulting chroma motion vector, allowing only integer-pixel and 1/2-pixel shifts, which simplifies interpolation in motion compensation of the chroma blocks.
In the second stage, the encoder and decoder also down-samples the initial chroma motion vector to obtain the chroma motion vector in a ratio suitable for the chroma resolution. For example, if the chrominance resolution is 1/2 of the luminance resolution in both the horizontal and vertical directions, the horizontal and vertical motion vector components are down-sampled by a factor of 2.
Alternatively, the encoder or decoder applies other and/or additional mechanisms for rounding, sub-sampling, pull-back (pullback), or other adjustment of the chroma motion vectors.
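An illustrative sketch of the subsampling/rounding stage; the specific rounding rule (snapping odd quarter-pixel values outward to half-pixel positions) is an assumption for illustration, not the exact table of any particular codec:

```python
def derive_chroma_mv(initial_mv):
    """Subsampling/rounding stage. Components are in quarter-pixel
    luma units; odd values (quarter-pixel positions) are snapped
    outward to the nearest half-pixel, then halved to scale the luma
    displacement to the half-resolution chroma grid."""
    def adjust(c):
        if c % 2:  # quarter-pixel position: round it away
            c += 1 if c > 0 else -1
        return c // 2  # scale luma distance to the chroma grid
    return (adjust(initial_mv[0]), adjust(initial_mv[1]))
```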
D. Derivation techniques
An encoder, such as the encoder (2000) of fig. 20, derives the chroma motion vectors for the macroblocks of an interlaced P field. Alternatively, a decoder, such as the decoder (2100) of fig. 21, derives the chroma motion vectors for the macroblocks of an interlaced P field. For example, the encoder/decoder performs the technique (4100) shown in fig. 41 to derive a chroma motion vector.
The encoder/decoder determines (4110) whether the current macroblock is an intra-coded macroblock. If so, the encoder/decoder skips chroma motion vector derivation and instead uses intra coding/decoding, without motion compensation, for the macroblock.
If the macroblock is not an intra macroblock, the encoder/decoder determines (4120) whether the macroblock is a 1MV macroblock. If so, the encoder/decoder uses the single luma motion vector of the macroblock as the initial chroma motion vector, which is passed to the adjustment stage (4150) of the technique (4100).
If the macroblock is not a 1MV macroblock, the encoder/decoder determines (4130) the dominant polarity in the luma motion vector of the macroblock. For example, the encoder/decoder determines the dominant polarity in one or more luminance motion vectors of a macroblock, as described in fig. 40 or 55A and 55B. Alternatively, the encoder/decoder applies other and/or additional decision logic to determine the dominant polarity. If the P field containing the macroblock has only one reference field, the dominant polarity in the luma motion vector is simply the polarity of the reference field.
The encoder/decoder then determines (4140) an initial chroma motion vector from those luma motion vectors of macroblocks having a dominant polarity. For example, the encoder/decoder determines an initial chroma motion vector, as shown in fig. 40 or 55A and 55B. Alternatively, the encoder/decoder determines the initial chroma motion vector as a median, average, or other combination of the dominant polarity motion vectors using other and/or additional logic.
Finally, the encoder/decoder adjusts (4150) the initial chroma motion vector generated by one of the previous stages. For example, the encoder/decoder performs rounding and sub-sampling as described above. Alternatively, the encoder/decoder performs other and/or additional adjustments.
Alternatively, the encoder/decoder checks the various macroblock types and polarity conditions in a different order. Alternatively, the encoder/decoder derives chroma motion vectors for other and/or additional types of macroblocks of interlaced P fields or other types of pictures.
For simplicity, fig. 41 does not illustrate various ways in which the technique (4100) may be integrated with other aspects of the encoding and decoding. Various combined implementations are described in detail in section XII.
XI. Intensity compensation of interlaced P fields
Fading, distortion, and color blending are widely used in the creation and editing of video content. These techniques smooth the visual evolution of video across content transitions. In addition, some video sequences include natural fading due to changes in lighting. For predicted images affected by fading, distortion, color blending, and the like, a global change in luminance reduces the effectiveness of conventional motion estimation and compensation relative to the reference image. As a result, motion compensated prediction is poor, and the predicted image requires more bits to represent. The problem is further complicated for interlaced P fields with one reference field or with multiple reference fields.
In some embodiments, the encoder and decoder perform fade compensation (also referred to as intensity compensation) on the reference fields of interlaced P fields. The encoder performs corresponding fade estimation. The fade estimation and compensation, and the signaling mechanisms for the fade compensation parameters, are adapted to the reference field organization of interlaced P fields. For example, for interlaced P fields with one reference field or two reference fields, the decision whether to perform fade compensation is made for each reference field separately. Each reference field using fade compensation may have its own fade compensation parameters. The signaling mechanisms represent the fade compensation decisions and parameters efficiently. As a result, the quality of the interlaced video improves and/or the bit rate is reduced.
A. Fade estimation and compensation for reference fields
Fade compensation involves performing changes to one or more reference fields to compensate for fading, color blending, distortion, and the like. Generally, fade compensation includes any compensation for fading (i.e., fading to or from black), color blending, distortion, or other natural or synthetic lighting effects that affect the intensity of pixel values. For example, a global brightness change may be represented as a change in the brightness and/or contrast of a scene. Typically the changes are linear, but the same framework can also accommodate any smooth non-linear mapping. The current P field is then predicted by motion estimation/compensation from the adjusted reference field or fields.
For the reference field in the YUV color space, the adjustment occurs by adjusting the samples in the luminance and chrominance channels. The adjustment may include scaling and panning the luminance values and scaling and panning the chrominance values. Alternatively, the color spaces are different (e.g., YIQ or RGB), and/or the compensation uses other adjustment techniques.
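A hedged sketch of the linear remapping described above, assuming 8-bit samples, fixed-point scale factors where 256 represents 1.0, and chroma scaled about its midpoint of 128 (all assumptions for illustration):

```python
def fade_compensate(luma, chroma, l_scale, l_shift, c_scale, c_shift):
    """Linearly remap 8-bit reference field samples for fade
    compensation. Scale factors are fixed-point with 256 == 1.0;
    chroma is scaled about its midpoint 128 so neutral color is
    unchanged; results are clipped back to [0, 255]."""
    def clip(v):
        return max(0, min(255, v))
    new_luma = [clip((l_scale * y) // 256 + l_shift) for y in luma]
    new_chroma = [clip((c_scale * (u - 128)) // 256 + 128 + c_shift)
                  for u in chroma]
    return new_luma, new_chroma
```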
The encoder/decoder performs fade estimation/compensation on a field-by-field basis. Alternatively, the encoder/decoder performs fade estimation/compensation on some other basis. In general, the fade compensation adjustments affect a defined region, which may be a field or a portion of a field (e.g., an individual block or macroblock, or a group of macroblocks), and the fade compensation parameters apply to that defined region. In the description that follows, the fade compensation parameters are applied to an entire field, but they may alternatively be applied only where needed for regions within a field.
B. Reference field organization for interlaced P-fields
In some embodiments, interlaced P fields have one or two reference fields for motion compensation. (See, e.g., section IV.) Figs. 24A-24F illustrate the locations of reference fields that may be used in motion compensated prediction of interlaced P fields. The encoder and decoder may use reference fields at other and/or additional locations or timings for motion compensated prediction of P fields. For example, reference fields within the same frame as the current P field are allowed. Alternatively, either the top field or the bottom field of a frame may be encoded/decoded first.
For interlaced P fields with one or two reference fields for motion compensation, a P field may have only one reference field. Alternatively, a P field may have two reference fields, switching between the two reference fields for different motion vectors or on some other basis.
Alternatively, the P field has more reference fields and/or reference fields at different locations.
C. Encoder and decoder
FIG. 42 shows an exemplary encoder framework (4200) for performing intensity estimation and compensation for interlaced P fields with one or two reference fields. In this framework (4200), the encoder conditionally remaps the reference fields using parameters obtained by fade estimation. The encoder performs remapping, or fade compensation, when it detects fading with a good degree of certainty and consistency across a field. Otherwise, fade compensation is an identity operation (i.e., output = input).
Referring to fig. 42, the encoder compares the current P field (4210) with the first reference field (4220) using a fade detection module (4230) to determine whether a fade occurs between the fields (4220, 4210). The encoder independently compares the current P field (4210) to the second reference field (4225) using a fade detection module (4230) to determine whether a fade occurs between those fields (4225, 4210). The encoder generates one or more "fade on" or "fade off" signals based on the results of the fade detection (4240). The signal indicates whether fade compensation is to be used and, if so, whether fade compensation is to be used on the first, second or both of the reference fields (4220, 4225) only.
If the first reference field (4220) is to be fading compensated, a fading estimation module (4250) estimates a fading parameter (4260) of the first reference field (4220). (details of fade estimation are discussed below.) likewise, if the second reference field (4225) is to be fade compensated, the fade estimation module (4250) independently estimates the fade parameters for the second reference field.
The fade compensation modules (4270, 4275) use the fade parameters (4260) to remap one or both of the reference fields (4220, 4225). Although fig. 42 shows two fade compensation modules (4270, 4275) (one per reference field), alternatively, the encoder framework (4200) includes a single fade compensation module that operates on either reference field (4220, 4225).
Other encoder modules (4280) (e.g., motion estimation and compensation, frequency transformer, and quantization modules) compress the current P field (4210). The encoder outputs motion vectors, residuals, and other information (4290) that define the encoded P field (4210). Beyond the specifics of its motion estimation/compensation, the framework (4200) can be applied across a wide variety of motion-compensation-based video codecs.
Fig. 43 shows an exemplary decoder framework (4300) that performs intensity compensation. The decoder produces a decoded P field (4310). To decode an encoded fade-compensated P field, the decoder performs fade compensation on one or two previously decoded reference fields (4320, 4325) using the fade compensation modules (4370, 4375). Alternatively, the decoder framework (4300) includes a single fade compensation module that operates on either reference field (4320, 4325).
If the fade on/off signal (4340) indicates that fade compensation is used for the first reference field (4320) and the P field (4310), the decoder performs fade compensation on the first reference field (4320). Likewise, if the fade on/off signal (4340) indicates that fade compensation is used for the second reference field (4325) and the P field (4310), the decoder performs fade compensation on the second reference field (4325). The decoder performs fade compensation (as it was done in the encoder) using the respective fade parameter sets obtained during fade estimation for the first and second reference fields (4320, 4325). If fade compensation is off, fade compensation is an identity operation (i.e., output = input).
Other decoder modules (4360), such as motion compensation, inverse frequency transformer, and inverse quantization modules, decompress the encoded P field (4310) using the motion vectors, residuals, and other information (4390) provided by the encoder.
D. Parameterization and compensation
Between the P-field and the first reference field and/or between the P-field and the second reference field, the parameter indicates fading, color mixing, distortion, or other changes. These parameters are then applied in the fade compensation.
In video editing, synthetic fades are sometimes achieved by applying a simple pixel-level linear transform to the luminance and chrominance channels. Likewise, cross fades are sometimes implemented as linear sums of two video sequences, with the composition changing over time. Thus, in some embodiments, fades or other intensity-compensation adjustments are parameterized as a pixel-level linear transform, and cross fades are parameterized as a linear sum.
Assume that I(n) is P field n and I(n-1) is a reference field. Where motion is small, a simple fade is modeled by the first-order relationship in the following equation. The relationship is approximate because of possible motion in the video sequence.

I(n) ≈ C1·I(n-1) + B1,
wherein the fade parameters B1 and C1 correspond to the brightness and contrast changes, respectively, relative to the reference field. (Parameters B2 and C2 correspond to the brightness and contrast changes, respectively, for the other reference field.) When non-linear fading occurs, the first-order component generally accounts for most of the change.
The cross fade from image sequence U(n) to image sequence V(n) can be modeled by the relationship in the following equation. Again, the relationship is approximate because of possible motion in the sequences.

I(n) ≈ nα·V(n) + (1 - nα)·U(n),

where n ≈ 0 indicates the start of the cross fade, and n ≈ 1/α indicates the end of the cross fade. For a cross fade spanning several fields, α is small. At the start of the cross fade, the nth field is close to an attenuated (contrast < 1) version of the (n-1)th field. Toward the end, the nth field is an amplified (contrast > 1) version of the (n-1)th field.
The encoder performs intensity compensation by remapping the reference fields. The encoder remaps the reference fields on a pixel-by-pixel basis or on some other basis. The original un-remapped reference fields are actually discarded (although in some implementations the un-remapped reference fields may still be used for motion compensation).
The following linear rule remaps the luminance values of the reference field R to the remapped reference field R′ according to the two parameters B1 and C1:

R′ = C1·R + B1.

The luminance values of the reference field are scaled (or "weighted") by the contrast value and shifted (i.e., offset) by the brightness value. For chroma, the remapping follows the rule:

R′ = C1·(R - μ) + μ,

where μ is the mean of the chrominance values, assumed in one embodiment to be 128 for a representation with unsigned eight-bit chroma values. Unlike the luma remapping, the chroma remapping does not use the brightness parameter. In some embodiments, the linear remapping is extended to higher-order terms; for example, a quadratic rule remaps the intensity values of R using second-order as well as first-order and constant terms.
other embodiments use other remapping rules. In one category of such remapping rules, for non-linear fading, a non-linear mapping is used instead of a linear mapping.
The fade compensation may be applied to the reference field prior to motion compensation. Alternatively, it can be applied to the reference field when needed during motion compensation, e.g., only to those areas of the reference field that are actually referenced by the motion vectors.
E. Estimation of parameters
Estimation is the process of calculating the compensation parameters during encoding. An encoder, such as the encoder in the framework (4200) of fig. 42, calculates the brightness (B1, B2) and contrast (C1, C2) during encoding. Alternatively, such an encoder calculates other compensation parameters.
To speed up the estimation, the encoder estimates the parameters for each reference field independently. Moreover, the encoder analyzes only the luminance channel. Alternatively, the encoder includes chroma in the analysis when more computational resources are available. For example, the encoder solves for C1 (or C2) in both the luma and chroma remapping equations for the first reference field, not just luma, to make C1 (or C2) more robust.
Motion in the scene is ignored during the fade estimation. This is based on the following observations: (a) fading and cross-fading generally occur in static or low motion scenes, and (b) the effectiveness of intensity compensation in high motion scenes is very low. Alternatively, the encoder jointly solves for the fade compensation parameters and the motion information. The motion information is then used to further improve the accuracy of the fade compensation parameters at a later stage of the technique or at some other time. One approach to using motion information is to omit from the fade estimation calculation those portions of the reference frame where movement is detected.
In some embodiments, the sum of absolute errors is used as the measure to determine the presence and parameters of fading. Alternatively, the encoder uses other or additional measures, such as the mean squared error (MSE) or the sum of squared errors (SSE) over the same error term, or the encoder uses a different error term.
The encoder may end the estimation when an exit condition, such as a condition described below, is satisfied. As another exit condition, the encoder checks whether the contrast parameter C1 (or C2) is near 1.0 at the start or at an intermediate stage of estimation (in one implementation, 0.99 < C1 (or C2) < 1.02), and if so, ends the technique.
The encoder starts the estimation by downsampling the current field and the selected reference field (first or second). In one implementation, the encoder downsamples by a factor of 4 in the horizontal and vertical directions. Alternatively, the encoder downsamples by another factor, or does not downsample at all.
The encoder then calculates Σabs(Id(n) - Rd), the sum of absolute errors over the lower-resolution versions Id(n) and Rd of the current and reference fields. The sum of absolute errors measures the difference between the downsampled current field and the downsampled reference field. If the sum of absolute errors is less than a certain threshold (e.g., a predetermined difference measure), the encoder assumes that no fade has occurred and fade compensation is not used.
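The downsampling and absolute-error check can be sketched as follows. Point sampling is assumed for the factor-of-4 downsampling (the actual downsampling filter is not specified here), and the threshold is supplied by the caller.

```python
def detect_fade(cur, ref, threshold):
    """Return True if a fade is suspected between current and reference
    fields, based on the sum of absolute errors over downsampled versions.

    cur, ref: fields as lists of rows of pixel values.
    """
    # Downsample by a factor of 4 in each direction via point sampling.
    cur_d = [row[::4] for row in cur[::4]]
    ref_d = [row[::4] for row in ref[::4]]
    # Sum of absolute errors between the two downsampled fields.
    sad = sum(abs(a - b)
              for row_a, row_b in zip(cur_d, ref_d)
              for a, b in zip(row_a, row_b))
    return sad >= threshold  # below threshold: assume no fade occurred
```

When the result is True, the encoder goes on to estimate the fade parameters; otherwise fade compensation is skipped for this reference field.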
Otherwise, the encoder estimates the brightness B1 (or B2) and contrast C1 (or C2) parameters. First-cut estimates are obtained by modeling Id(n) in terms of Rd over different parameter values. For example, the brightness and contrast parameters are obtained by linear regression over the whole downsampled field. Alternatively, the encoder uses other forms of statistical analysis, such as total least squares, least median of squares, etc., for more robust analysis. For example, the encoder minimizes the MSE or SSE of the error term Id(n) - Rd. In some circumstances, MSE and SSE are not robust, so the encoder also tests the sum of absolute errors of the error term. The encoder discards high error values at particular points (which may be due to motion rather than fading).
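The linear-regression first-cut estimate can be sketched with a closed-form ordinary-least-squares fit over all downsampled pixels. The robustness refinements mentioned above (outlier rejection, alternative estimators) are omitted from this sketch.

```python
def estimate_fade_params(cur, ref):
    """First-cut estimate of contrast C1 and brightness B1 by linear
    regression of current-field pixels against reference-field pixels,
    i.e., fit cur ~= c1 * ref + b1 in the least-squares sense."""
    xs = [p for row in ref for p in row]
    ys = [p for row in cur for p in row]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx            # regression slope = contrast
    b1 = my - c1 * mx         # intercept = brightness shift
    return c1, b1
```

On a field that is exactly a scaled-and-shifted copy of the reference, the fit recovers the contrast and brightness parameters exactly.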
The first-cut parameters are quantized and dequantized to ensure that they lie within the permissible range and to test their fitness. In some embodiments, the parameters are each quantized to 6 bits for typical eight-bit-depth imagery. B1 (or B2) takes integer values from -32 to 31 (represented as a signed six-bit integer). C1 (or C2) varies from 0.5 to 1.484375 in uniform steps of 0.015625 (1/64), corresponding to quantized values 0 to 63 for C1 (or C2). Quantization is performed by rounding B1 (or B2) and C1 (or C2) to the nearest valid dequantized value and choosing the appropriate binary index.
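The 6-bit quantization just described can be sketched as follows. The rounding-to-nearest and the clamping behavior shown here are assumptions consistent with the stated ranges, not a normative procedure.

```python
def quantize_params(b1, c1):
    """Quantize brightness B1 to a signed 6-bit integer in [-32, 31]
    and contrast C1 to an index in [0, 63] over 0.5 + i/64."""
    b_idx = max(-32, min(31, int(round(b1))))
    c_idx = max(0, min(63, int(round((c1 - 0.5) * 64))))
    return b_idx, c_idx

def dequantize_params(b_idx, c_idx):
    """Reconstruct the parameter values from their quantization indices."""
    return float(b_idx), 0.5 + c_idx / 64.0
```

For example, C1 = 1.0 maps to index 32, and index 63 reconstructs to the top of the range, 1.484375.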
The encoder calculates the original bounded sum of absolute errors (SOrgBnd) and the remapped bounded sum of absolute errors (SRmpBnd). In some embodiments, the encoder uses a fitness analysis to calculate these sums. For a random or pseudo-random set of pixels at the original resolution, the encoder computes the remapped bounded sum of absolute errors Σbabs(I(n) - Cf·R - Bf), where Cf and Bf are the quantized-and-dequantized contrast and brightness parameters, and babs(x) = min(abs(x), M) for some bound M, such as a multiple of the quantization parameter of the field being encoded. The bound M is higher when the quantization parameter is coarse and lower when the quantization parameter is fine. The encoder also accumulates the original bounded sum of absolute errors Σbabs(I(n) - R). If computational resources are available, the encoder can compute the bounded sums of errors over the entire field.
Based on the relative values of the original and remapped bounded sums of absolute errors, the encoder determines whether to use fade compensation. For example, in some embodiments, the encoder does not perform fade compensation unless the remapped bounded sum of absolute errors is less than or equal to some threshold fraction σ of the original bounded sum of absolute errors. In one implementation, σ = 0.95.
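Combining the bounded error measure with the threshold test, the decision can be sketched as below. The sketch evaluates every pixel rather than a pseudo-random subset, and the margin M and threshold σ are caller-supplied, as in the description above.

```python
def babs(x, m):
    """Bounded absolute value: min(abs(x), M)."""
    return min(abs(x), m)

def use_fade_compensation(cur, ref, c1, b1, m, sigma=0.95):
    """Decide whether fade compensation pays off: compare the remapped
    bounded SAD against sigma times the original bounded SAD."""
    pairs = [(y, r)
             for row_y, row_r in zip(cur, ref)
             for y, r in zip(row_y, row_r)]
    s_org = sum(babs(y - r, m) for y, r in pairs)          # SOrgBnd
    s_rmp = sum(babs(y - (c1 * r + b1), m) for y, r in pairs)  # SRmpBnd
    return s_rmp <= sigma * s_org
```

When the current field is well modeled by the remapped reference, the remapped sum collapses toward zero and the test passes; when the fields already match, remapping only adds error and the test fails.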
If fade compensation is used, the encoder recalculates the fade parameters, this time based on a linear regression between I(n) and R, but at full resolution. To save computation time, the encoder may perform the repeated linear regression over only a random or pseudo-random sampling of the field. Again, the encoder may alternatively use other forms of statistical analysis (e.g., total least squares, least median of squares, etc.) for more robust analysis.
In some implementations, the encoder allows a special case in which the reconstructed value of C1 (or C2) is -1. This special case is signaled by a syntax element for C1 (or C2) equal to 0. In this "invert" mode, the reference field is first inverted before being shifted by B1 (or B2), and the range of B1 (or B2) is 193 to 319 in uniform steps of 2. Alternatively, some or all of the fade compensation parameters use another representation, or other and/or additional parameters are used.
F. Signal representation
At a high level, the signaled fade compensation information includes (1) compensation on/off information and (2) compensation parameters. The on/off information may further include: (a) whether fade compensation is allowed or disallowed globally (e.g., for the entire sequence); (b) if fade compensation is allowed, whether fade compensation is used for a particular P field; and (c) if fade compensation is used for a particular P field, which reference fields should be adjusted by fade compensation. When fade compensation is used for a reference field, the fade compensation parameters to be applied are also signaled.
1. Integrated on/off signal representation
At the sequence level, one bit indicates whether fade compensation is allowed for the sequence. If fade compensation is allowed, subsequent elements indicate when and how it is performed. Alternatively, fade compensation is enabled/disabled at some other syntax level. Alternatively, fade compensation is always allowed, and the global on/off signaling is skipped.
2. P field on/off signal representation
If fade compensation is allowed, the one or more additional signals indicate when fade compensation is to be used. In a field of a typical interlaced video sequence, intensity compensation occurs very rarely. It is possible to signal the use of fade compensation for P fields by adding one bit per field (e.g. one bit signaled at field level). However, it is more economical to signal the use of fade compensation in conjunction with other information.
One option is to signal the use of fade compensation for a P field jointly with the motion vector mode (e.g., the number and configuration of motion vectors, the sub-pixel interpolation scheme, etc.). For example, a VLC jointly indicates the least common motion vector mode and the activation of fade compensation for the P field. For additional details, see U.S. patent application publication No. 2003-0206593-A1, entitled "Fading Estimation/Compensation." Alternatively, the use/non-use of fade compensation for P fields is signaled along with motion vector mode information, as described in several combined implementations below. (See section XII, MVMODE and MVMODE2 elements.) Alternatively, another mechanism is used to signal the P field fade compensation on/off information.
3. Reference field on/off signal representation
If fade compensation is used for a P field, there are several options for which reference fields undergo fade compensation. When a P field uses fade compensation and has two reference fields, there are three cases; fade compensation is performed for: (1) both reference fields; (2) only the first reference field (e.g., the temporally second most recent reference field); or (3) only the second reference field (e.g., the temporally most recent reference field). The fade-compensated reference field mode information may be signaled as an FLC or VLC for each P field. The table in fig. 44 shows a set of VLCs for the mode information of the INTCOMPFIELD element, which is signaled in the P field header. Alternatively, the table in fig. 47G or another table is used, at the field level or at another syntax level.
In some implementations, the fade-compensated reference field mode is signaled for all P fields. Alternatively, for a one-reference-field P field that uses fade compensation, signaling of the reference field mode is skipped, since fade compensation is automatically applied to the single reference field.
4. Fade compensation parameter signal representation
If fade compensation is used for the reference field, the fade compensation parameters for the reference field are signaled. For example, a first set of fade compensation parameters is present in the header of the P field. If fade compensation is used for only one reference field, then the first set of parameters is used for that reference field. However, if fade compensation is used for both reference fields of the P field, then the first set of parameters is used for one reference field and the second set of fade compensation parameters is present in the fade compensated header for the other reference field.
For example, each set of fade compensation parameters includes a contrast parameter and a brightness parameter. In one combined implementation, the first set of parameters includes the LUMSCALE1 and LUMSHIFT1 elements, which are present in the P field header when intensity compensation is signaled for the P field. If INTCOMPFIELD indicates that both reference fields or only the second most recent reference field use fade compensation, LUMSCALE1 and LUMSHIFT1 are applied to the second most recent reference field. Otherwise (INTCOMPFIELD indicates that only the most recent reference field uses fade compensation), LUMSCALE1 and LUMSHIFT1 are applied to the most recent reference field. When intensity compensation is signaled for the P field and INTCOMPFIELD indicates that fade compensation is used for both reference fields, a second set of parameters, including the LUMSCALE2 and LUMSHIFT2 elements, is present in the P field header. LUMSCALE2 and LUMSHIFT2 are applied to the more recent reference field.
LUMSHIFT1, LUMSCALE1, LUMSHIFT2, and LUMSCALE2 correspond to the parameters B1, C1, B2, and C2. LUMSCALE1, LUMSCALE2, LUMSHIFT1, and LUMSHIFT2 are each signaled using a 6-bit FLC. Alternatively, these parameters are signaled using VLCs. Fig. 56 shows pseudocode for performing fade compensation on the first reference field based on LUMSHIFT1 and LUMSCALE1. A similar process is performed for fade compensation on the second reference field based on LUMSHIFT2 and LUMSCALE2.
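A sketch of packing and unpacking a (LUMSHIFT, LUMSCALE) pair as two 6-bit FLCs follows. The field order and the two's-complement mapping for the signed shift are assumptions for illustration, not the normative bitstream layout.

```python
def pack_fade_params(lumshift, lumscale):
    """Pack a signed 6-bit shift (-32..31) and an unsigned 6-bit scale
    index (0..63) into a 12-character bit string, scale first."""
    return format(lumscale & 0x3F, '06b') + format(lumshift & 0x3F, '06b')

def unpack_fade_params(bits):
    """Invert pack_fade_params: return (lumshift, lumscale)."""
    lumscale = int(bits[:6], 2)
    lumshift = int(bits[6:12], 2)
    if lumshift >= 32:          # undo two's-complement for the signed field
        lumshift -= 64
    return lumshift, lumscale
```

A round trip through pack and unpack recovers the original signed/unsigned pair, which is the essential property regardless of the exact field order a real bitstream uses.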
Alternatively, the fade compensation parameter has a different representation and/or is signaled by a different signaling mechanism.
G. Estimation and signal representation techniques
An encoder, such as the encoder (2000) of fig. 20 or the encoder in the framework (4200) of fig. 42, performs fade estimation and corresponding signal representation for interlaced P fields with two reference fields. For example, the encoder performs the technique shown in fig. 45A (4500).
The encoder performs fade detection (4510) on the first of the two reference fields of the P field. If fade is detected ("yes" path out of decision 4512), the encoder performs fade estimation for the P field relative to the first reference field (4514), which produces fade compensation parameters for the first reference field. The encoder also performs fade detection (4520) on the second of the two reference fields of the P field. If fade is detected ("yes" path out of decision 4522), the encoder performs fade estimation for the P field relative to the second reference field (4524), which produces fade compensation parameters for the second reference field. For example, the encoder performs fade detection and estimation as described in the section entitled "Estimation of parameters" above. Alternatively, the encoder uses a different technique to detect fading and/or obtain fade compensation parameters. If the current P field has only one reference field, the operations for the second reference field can be skipped.
The encoder signals (4530) whether fade compensation is on or off for the P field. For example, the encoder jointly encodes the information together with the motion vector mode information of the P field. Alternatively, the encoder uses other and/or additional signals to indicate whether fade compensation is on or off for the P-fields. If fade compensation is not on for the current P field ("NO" path out of decision 4532), then the technique (4500) ends.
Otherwise ("yes" path out of decision 4532), the encoder signals (4540) the fade-compensated reference field pattern. For example, the encoder signals a VLC that indicates fade compensation for both reference fields, only the first reference field, or only the second reference field. Alternatively, the encoder uses another signaling mechanism (e.g., FLC) to indicate the reference field mode. In this path, the encoder also signals (4542) the first and/or second set of fade compensation parameters calculated in the fade estimation. For example, the encoder uses a signal representation as described in section xi.f. Alternatively, the encoder uses other signal representations.
Although the encoder also typically performs fading compensation, motion estimation, and motion compensation, fig. 45A does not show these operations for simplicity. Also, fade estimation may be performed prior to or concurrently with motion estimation. Fig. 45A does not illustrate various methods by which the techniques (4500) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
H. Decoding and compensation techniques
A decoder, such as the decoder (2100) of fig. 21 or the decoder in the framework (4300) of fig. 43, performs decoding and fade compensation for interlaced P fields with two reference fields. For example, the decoder performs the technique shown in fig. 45B (4550).
The decoder receives and decodes (4560) one or more signals indicating whether fade compensation is on or off for the P field. For example, information is jointly encoded together with motion vector mode information for P fields. Alternatively, the decoder receives and decodes other and/or additional signals indicating whether the fade compensation is on or off for the P-field. If fade compensation is not on for the P field frame ("NO" path out of decision 4562), then the technique (4550) ends.
Otherwise ("yes" path out of decision 4562), the decoder receives and decodes (4570) the fade-compensated reference field pattern. For example, the decoder receives and decodes VLCs indicating fade compensation for two reference fields, only the first reference field, or only the second reference field. Alternatively, the decoder operates in conjunction with another signal representation mechanism (e.g., FLC) to determine the reference field mode.
In this path, the decoder also receives and decodes (4572) a first set of fade compensation parameters. For example, the decoder works with the signal representation as described in section xi.f. Alternatively, the decoder works with other signal representations.
If fade compensation is performed for only one of the two reference fields ("no" path out of decision 4575), the first set of parameters is used for either the first or the second reference field, as indicated by the reference field mode. The decoder performs fade compensation (4592) on the indicated reference field with the first set of fade compensation parameters, and the technique (4550) ends.
Otherwise, fade compensation is performed for both reference fields ("yes" path out of decision 4575), and the decoder receives and decodes (4580) a second set of fade compensation parameters. For example, the decoder works with the signal representation as described in section xi.f. Alternatively, the decoder works with other signal representations. In this case, a first set of parameters is used for one of the two reference fields and a second set of parameters is used for the other. The decoder performs fading compensation (4592) on one reference field with the first set of parameters and performs fading compensation (4582) on another reference field with the second set of parameters.
For simplicity, fig. 45B does not show the various methods by which the techniques (4550) may be integrated with other aspects of encoding and decoding. Various combined implementations are described in detail in section XII.
XII Combined realization
Detailed combined implementations for a bitstream syntax are now described, with emphasis on interlaced P fields. The following description includes a first combined implementation and an alternative second combined implementation. In addition, U.S. patent application serial No. 10/857,473, filed July 27, 2004, discloses aspects of a third combined implementation.
Although the emphasis is on interlaced P fields, at various points in this section the applicability of the syntax elements, semantics, and decoding to other picture types (e.g., interlaced P and B frames, interlaced I, BI, PI, and B fields) is noted.
A. Sequence and semantics in a first combined implementation
In the first combined implementation, a compressed video sequence is made up of data structured into hierarchical layers: the picture layer, macroblock layer, and block layer. A sequence layer precedes the sequence, and entry point layers may be interspersed in the sequence. Figures 46A through 46E show the bitstream elements that make up the various layers.
1. Sequence level syntax and semantics
The sequence-level header contains sequence-level parameters used to decode the sequence of compressed pictures. In some profiles, sequence-related metadata is communicated to the decoder by the transport layer or other means. For the profile with interlaced P fields (the advanced profile), however, this header syntax is part of the video data bitstream.
Fig. 46A shows the syntax elements that make up the sequence header for the advanced profile. The PROFILE (4601) and LEVEL (4602) elements specify the profile used to encode the sequence and the encoding level within the profile, respectively. Of particular interest for interlaced P fields, the INTERLACE (4603) element is a 1-bit syntax element that signals whether the source content is progressive (INTERLACE = 0) or interlaced (INTERLACE = 1). Individual frames may still be coded using either progressive or interlaced syntax when INTERLACE = 1.
2. Entry point layer syntax and semantics
The entry point header is present in the advanced profile. An entry point serves two purposes. First, it signals a random access point within the bitstream. Second, it signals changes in the coding control parameters.
Fig. 46B illustrates the syntax elements that make up the entry point layer. Of particular interest for interlaced P fields, the reference frame distance flag REFDIST_FLAG (4611) element is a 1-bit syntax element. REFDIST_FLAG = 1 indicates that a REFDIST (4624) element is present in the I/I, I/P, P/I, or P/P field picture header. REFDIST_FLAG = 0 indicates that the REFDIST (4624) element is not present in the I/I, I/P, P/I, or P/P field picture header.
The extended motion vector flag EXTENDED_MV (4612) element is a 1-bit element that indicates whether extended motion vector capability is on (EXTENDED_MV = 1) or off (EXTENDED_MV = 0). The extended differential motion vector range flag EXTENDED_DMV (4613) element is a 1-bit element that is present if EXTENDED_MV = 1. If EXTENDED_DMV = 1, motion vector differentials in the extended differential motion vector range are signaled at the picture level within the entry point segment. If EXTENDED_DMV = 0, motion vector differentials in the extended differential motion vector range are not signaled. The extended differential motion vector range is an option for interlaced P and B pictures, including interlaced P fields and P frames and interlaced B fields and B frames.
3. Image level syntax and semantics
The data for a picture consists of a picture header followed by macroblock layer data. Fig. 46C shows the bitstream elements that make up the picture header for an interlaced field picture. In the following description, the emphasis is on elements used with interlaced P fields, but the header shown in fig. 46C applies to various combinations of interlaced I, P, B, and BI fields.
The frame coding mode FCM (4621) element is present only in the advanced profile, and only when the sequence-layer element INTERLACE (4603) has the value 1. FCM (4621) indicates whether a picture is coded as progressive, interlaced field, or interlaced frame. The table in fig. 47A includes the VLCs for FCM, indicating the picture coding type.
A field picture type FPTYPE (4622) element is a 3-bit syntax element, which is present in a picture header of an interlaced field picture. FPTYPE is decoded according to the table in fig. 47B. As shown in the table, an interlaced frame may include two interlaced I fields, one interlaced I field and one interlaced P field, two interlaced P fields, two interlaced B fields, one interlaced B field and one interlaced BI field, or two interlaced BI fields.
The top field first TFF (4623) element is a 1-bit element that is present in advanced profile picture headers if the sequence header element PULLDOWN = 1 and the sequence header element INTERLACE = 1. TFF = 1 implies that the top field is the first decoded field. If TFF = 0, the bottom field is the first decoded field.
The P reference distance REFDIST (4624) element is a variable-size syntax element, present in interlaced field picture headers if the entry-point-level flag REFDIST_FLAG = 1 and if the picture type is not B/B, B/BI, BI/B, or BI/BI. If REFDIST_FLAG = 0, REFDIST (4624) is set to the default value 0. REFDIST (4624) indicates the number of frames between the current frame and the reference frame. The table in fig. 47C includes the VLCs for REFDIST (4624) values. The last row in the table indicates the codewords used to represent reference frame distances greater than 2. These codewords are (binary) 11 followed by N - 3 1s, where N is the reference frame distance; the last bit in the codeword is 0. The value of REFDIST (4624) is less than or equal to 16. For example:
N = 3: VLC codeword = 110, VLC size = 3,
N = 4: VLC codeword = 1110, VLC size = 4, and
N = 5: VLC codeword = 11110, VLC size = 5.
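The codeword pattern for reference frame distances greater than 2 can be sketched as a small encoder/decoder pair. The two-bit codes assumed here for distances 0 through 2 are an assumption extrapolated from the pattern of the (unshown) table in fig. 47C; only the "11 followed by N - 3 ones and a terminating 0" rule for N > 2 is stated in the text.

```python
def encode_refdist(n):
    """Encode a reference frame distance as a bit string."""
    if n <= 2:
        return ['00', '01', '10'][n]  # short codes for 0..2 (assumed)
    # Distances > 2: '11', then (n - 3) ones, then a terminating '0'.
    return '11' + '1' * (n - 3) + '0'

def decode_refdist(bits):
    """Decode a distance from the front of a bit string.
    Returns (value, number_of_bits_consumed)."""
    if bits[:2] != '11':
        return int(bits[:2], 2), 2
    i = 2
    while bits[i] == '1':     # count ones up to the terminating zero
        i += 1
    return (i - 2) + 3, i + 1
```

The examples above round-trip as expected: N = 3 gives 110 (3 bits), N = 4 gives 1110 (4 bits), and N = 5 gives 11110 (5 bits).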
The field picture layer FIELDPICLAYER (4625) element is data for one of the interlaced fields of an interlaced frame. If the interlaced frame is a P/P frame (FPTYPE 011), the bitstream includes two FIELDPICLAYER (4625) elements for two interlaced P fields. Fig. 46D shows bit stream elements constituting the field picture header of the interlaced P field picture.
The reference picture number NUMREF (4631) element is a 1-bit syntax element present in the interlaced P field header. It indicates whether the interlaced P field has one (NUMREF = 0) or two (NUMREF = 1) reference pictures. The reference field picture indicator REFFIELD (4632) is a 1-bit syntax element that is present in the interlaced P field header if NUMREF is 0. It indicates which of the two possible reference pictures is used by the interlaced P field.
The extended MV range flag MVRANGE (4633) is a variable-size syntax element that generally indicates an extended range for motion vectors (i.e., longer possible horizontal and/or vertical displacements for motion vectors). The extended differential MV range flag DMVRANGE (4634) is a variable-size syntax element, which is present if EXTENDED_DMV is 1. The table in fig. 47D is used for the DMVRANGE (4634) element. Both MVRANGE (4633) and DMVRANGE (4634) are used in decoding motion vector differentials. The extended differential motion vector range is an option for interlaced P fields, interlaced P frames, interlaced B fields, and interlaced B frames.
The motion vector mode MVMODE (4635) element is a variable-size syntax element that signals one of four motion vector coding modes or an intensity compensation mode. The motion vector coding modes include three "1MV" modes with different sub-pixel interpolation rules for motion compensation. 1MV means that each macroblock in the picture has at most one motion vector. In "mixed MV" mode, each macroblock in the picture may have one or four motion vectors, or be skipped. One of the tables shown in fig. 47E is used for the MVMODE (4635) element, depending on the value of PQUANT (the quantization factor for the picture).
The motion vector mode 2 MVMODE2 (4636) element is a variable-size syntax element that is present in the interlaced P field header if MVMODE (4635) signals intensity compensation. Depending on the value of PQUANT, one of the tables shown in fig. 47F is used for the MVMODE2 (4636) element.
The intensity compensation field INTCOMPFIELD (4637) is a variable-size syntax element present in the header of an interlaced P field picture. As shown in the table of fig. 47G, INTCOMPFIELD (4637) is used to indicate which reference field(s) are intensity compensated. INTCOMPFIELD (4637) is present even if NUMREF = 0. The field picture luma scale 1 LUMSCALE1 (4638), field picture luma shift 1 LUMSHIFT1 (4639), field picture luma scale 2 LUMSCALE2 (4640), and field picture luma shift 2 LUMSHIFT2 (4641) elements are each 6-bit values used in intensity compensation. If MVMODE (4635) signals intensity compensation, the LUMSCALE1 (4638) and LUMSHIFT1 (4639) elements are present. If the INTCOMPFIELD (4637) element is "1" or "00", LUMSCALE1 (4638) and LUMSHIFT1 (4639) are applied to the top field. Otherwise, LUMSCALE1 (4638) and LUMSHIFT1 (4639) are applied to the bottom field. If MVMODE (4635) signals intensity compensation and the INTCOMPFIELD (4637) element is "1", the LUMSCALE2 (4640) and LUMSHIFT2 (4641) elements are also present. LUMSCALE2 (4640) and LUMSHIFT2 (4641) are applied to the bottom field.
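The presence and applicability rules above can be summarized in a short sketch (the helper name is hypothetical, and the code "01" for the bottom-field-only case is an assumption, since the table of fig. 47G is not reproduced here):

```python
def intensity_comp_targets(intcompfield):
    # Map the decoded INTCOMPFIELD code to the parameter set (1 or 2) applied
    # to each reference field. Per the text: codes "1" and "00" apply set 1
    # (LUMSCALE1/LUMSHIFT1) to the top field; otherwise set 1 applies to the
    # bottom field; set 2 (LUMSCALE2/LUMSHIFT2) is present only for code "1"
    # and applies to the bottom field. The code "01" for the bottom-field-only
    # case is an assumption (fig. 47G is not shown).
    if intcompfield == "1":
        return {"top": 1, "bottom": 2}   # both fields compensated
    if intcompfield == "00":
        return {"top": 1}                # top field only
    if intcompfield == "01":
        return {"bottom": 1}             # bottom field only (assumed code)
    raise ValueError("unknown INTCOMPFIELD code: %r" % intcompfield)
```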
The macroblock mode table MBMODETAB (4642) element is a fixed-length field with a 3-bit value in the interlaced P field header. MBMODETAB (4642) indicates which of eight code tables (e.g., tables 0 through 7 as specified with the 3-bit value) is used to encode/decode the macroblock mode MBMODE (4661) syntax element in the macroblock layer. There are two sets of eight code tables, and the set used depends on whether 4MV macroblocks are possible in the picture. Fig. 47H shows the eight tables available for MBMODE (4661) in interlaced P fields in mixed MV mode. Fig. 47I shows the eight tables available for MBMODE (4661) in interlaced P fields in 1MV mode.
The motion vector table MVTAB (4643) element is a fixed-length field. For interlaced P fields where NUMREF = 0, MVTAB (4643) is a 2-bit syntax element that indicates which of four code tables (e.g., tables 0 through 3 as specified with the 2-bit value) is used to encode/decode the motion vector data. For interlaced P fields where NUMREF = 1, MVTAB (4643) is a 3-bit syntax element that indicates which of eight code tables (e.g., tables 0 through 7 as specified with the 3-bit value) is used to encode/decode the motion vector data.
In the interlaced P field header, if MVMODE (4635) (or MVMODE2 (4636), if MVMODE (4635) is set to intensity compensation) indicates that the picture is of mixed MV type, then the 4MV block pattern table 4MVBPTAB (4644) element is present as a 2-bit value. The 4MVBPTAB (4644) syntax element signals which of four tables (e.g., tables 0 through 3 as specified with the 2-bit value) is used for the 4MV block pattern 4MVBP (4664) syntax element in 4MV macroblocks. Fig. 47J shows the four tables that may be used for 4MVBP (4664).
The interlaced P frame header (not shown) has many of the same elements as the field-coded interlaced frame header shown in fig. 46C and the interlaced P field header shown in fig. 46D. These include FCM (4621), MVRANGE (4633), DMVRANGE (4634), MBMODETAB (4642), and MVTAB (4643), although the exact syntax and semantics for interlaced P frames may differ from those for interlaced P fields. The interlaced P frame header also includes different elements for picture type, for switching between 1MV and 4MV modes, and for intensity compensation signaling.
Since an interlaced P frame may include field-coded macroblocks with two motion vectors per macroblock, the interlaced P frame header includes a 2 motion vector block pattern table 2MVBPTAB element. 2MVBPTAB is a 2-bit value present in interlaced P frames. This syntax element signals which of four tables (tables 0 through 3 as specified with the 2-bit value) is used to decode the 2MV block pattern (2MVBP) element in 2MV field-coded macroblocks. Fig. 47K shows the four tables that may be used for 2MVBP.
Interlaced B fields and interlaced B frames have many of the same elements as interlaced P fields and interlaced P frames. In particular, interlaced B fields may include the 4MVBPTAB (4644) syntax element. Interlaced B frames may include both the 2MVBPTAB and 4MVBPTAB (4644) syntax elements, although the semantics of these elements may differ.
4. Macroblock layer syntax and semantics
The data of a macroblock consists of a macroblock header followed by a block layer. Fig. 46E shows the macroblock layer structure of the interlaced P field.
The macroblock mode MBMODE (4661) element is a variable-size element. It jointly indicates information such as the number of motion vectors for the macroblock (1MV, 4MV, or intra-coded), whether a coded block pattern CBPCY (4662) element is present for the macroblock, and (in some cases) whether motion vector differential data is present for the macroblock. Figs. 47H and 47I show the tables that may be used for MBMODE (4661) in interlaced P fields.
The motion vector data MVDATA (4663) element is a variable-size element that encodes motion information (e.g., horizontal and vertical differentials) for a motion vector. For interlaced P fields with two reference fields, MVDATA (4663) also encodes information for selecting between the multiple possible motion vector predictors for the motion vector.
The four motion vector block pattern 4MVBP (4664) element is a variable-size syntax element that may be present in macroblocks of interlaced P fields, interlaced B fields, interlaced P frames, and interlaced B frames. In macroblocks of interlaced P fields, B fields, and P frames, the 4MVBP (4664) element is present if MBMODE (4661) indicates that the macroblock has 4 motion vectors. In this case, 4MVBP (4664) indicates which of the 4 luma blocks contain non-zero motion vector differentials.
In a macroblock of an interlaced B frame, 4MVBP (4664) is present if MBMODE (4661) indicates that the macroblock contains 2 field motion vectors and if the macroblock is an interpolated macroblock. In this case, 4MVBP (4664) indicates which of the four motion vectors (the top and bottom field forward motion vectors and the top and bottom field backward motion vectors) are present.
The two motion vector block pattern 2MVBP element (not shown) is a variable-size syntax element that may be present in macroblocks of interlaced P frames and interlaced B frames. In an interlaced P frame macroblock, 2MVBP is present if MBMODE (4661) indicates that the macroblock has 2 field motion vectors. In this case, 2MVBP indicates which of the 2 fields (top and bottom) contain non-zero motion vector differentials. In an interlaced B frame macroblock, 2MVBP is present if the macroblock contains 1 motion vector and the macroblock is an interpolated macroblock. In this case, 2MVBP indicates which of the two motion vectors (the forward and backward motion vectors) are present.
The block-level motion vector data BLKMVDATA (4665) element is a variable size element that may be present in some cases. It contains motion information for the blocks of the macroblock.
The hybrid motion vector prediction HYBRIDPRED (4666) element is a 1-bit-per-motion-vector syntax element that may be present in macroblocks of interlaced P fields. When hybrid motion vector prediction is used, HYBRIDPRED (4666) indicates which of the two motion vector predictors to use.
5. Block level syntax and semantics
The block layer of an interlaced image follows the syntax and semantics of the block layer of a progressive image. Typically, the information of the DC and AC coefficients of a block or sub-block is signaled at the block level.
B. Decoding in a first combined implementation
When the video sequence consists of interlaced video frames or includes a mix of interlaced and progressive frames, the FCM (4621) element indicates whether a given picture is encoded as a progressive frame, an interlaced field, or an interlaced frame. For frames encoded as interlaced fields, FPTYPE (4622) indicates that the frame includes two interlaced I fields, one interlaced I field and one interlaced P field, two interlaced P fields, two interlaced B fields, one interlaced B field and one interlaced BI field, or two interlaced BI fields. The decoding of interlaced fields is as follows. The following sections focus on the decoding process for interlaced P fields.
1. Reference for interlaced P-field decoding
Interlaced P fields may reference one or two previously decoded fields in motion compensation. The NUMREF (4631) element indicates whether the current P field may reference one or two previous reference fields. If NUMREF is 0, only one field can be referenced by the current P field. In this case, a REFFIELD (4632) element follows in the bitstream. REFFIELD (4632) indicates which decoded field is used as the reference. If REFFIELD is 0, the temporally nearest (in display order) I or P field is used as the reference. If REFFIELD is 1, the second temporally nearest I or P field is used as the reference. If NUMREF is 1, the current P field uses the two temporally nearest (in display order) I or P fields as references. The examples of reference field pictures for NUMREF = 0 and NUMREF = 1 shown in figs. 24A-24F, as described above, apply to the first combined implementation.
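The NUMREF/REFFIELD selection rules can be sketched as follows (a hypothetical helper for illustration; "nearest" and "second" denote the temporally nearest and second nearest I or P fields in display order):

```python
def reference_fields(numref, reffield=0):
    # Which previously decoded I/P field(s) the current interlaced P field
    # references. With NUMREF = 1, REFFIELD is absent from the bitstream and
    # both of the temporally closest I/P fields serve as references.
    if numref == 1:
        return ["nearest", "second"]
    # NUMREF = 0: REFFIELD picks one of the two possible reference fields.
    return ["nearest"] if reffield == 0 else ["second"]
```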
2. Image type
Interlaced P fields can be one of two types: 1MV or mixed MV. In a 1MV P field, each macroblock is a 1MV macroblock. In a mixed MV P field, each macroblock can be coded as a 1MV or 4MV macroblock, as indicated by MBMODE (4661) for each macroblock. The 1MV or mixed MV mode is signaled for interlaced P fields by the MVMODE (4635) or MVMODE2 (4636) elements.
3. Macroblock mode
Macroblocks in interlaced P fields can be one of 3 possible types: 1MV, 4MV, and intra-coded. The MBMODE (4661) element indicates the macroblock type (1MV, 4MV, or intra) and also indicates the presence of CBP and MV data. Depending on whether the MVMODE (4635)/MVMODE2 (4636) syntax elements indicate a mixed MV or all-1MV interlaced P field, MBMODE (4661) signals the information as follows.
The table in fig. 26 shows how MBMODE (4661) signals information about macroblocks in all-1MV P fields. As shown in fig. 47I, one of 8 tables is used to encode/decode MBMODE (4661) in all-1MV P fields. The table in fig. 27 shows how MBMODE (4661) signals information about macroblocks in mixed MV P fields. As shown in fig. 47H, one of 8 tables is used to encode/decode MBMODE (4661) in mixed MV P fields.
Thus, 1MV macroblocks can appear in 1MV and mixed MV interlaced P fields. In a 1MV macroblock, a single motion vector represents the displacement between the current and reference pictures for all 6 blocks in the macroblock. For a 1MV macroblock, the MBMODE (4661) element indicates three items: (1) that the macroblock type is 1MV; (2) whether a CBPCY (4662) element is present for the macroblock; and (3) whether an MVDATA (4663) element is present for the macroblock.
If the MBMODE (4661) element indicates that a CBPCY (4662) element is present, then the CBPCY (4662) element is present in the macroblock layer in the corresponding position. CBPCY (4662) indicates which of the 6 blocks are coded in the block layer. If the MBMODE (4661) element indicates that CBPCY (4662) is not present, CBPCY (4662) is assumed to be equal to 0, and no block data is present for any of the 6 blocks in the macroblock.
If the MBMODE (4661) element indicates that an MVDATA (4663) element is present, the MVDATA (4663) element is present in the macroblock layer in the corresponding location. The MVDATA (4663) element encodes the motion vector differential, which is combined with the motion vector predictor to reconstruct the motion vector. If the MBMODE (4661) element indicates that an MVDATA (4663) element is not present, the motion vector differential is assumed to be zero and thus the motion vector is equal to the motion vector predictor.
4MV macroblocks appear in mixed MV P fields. In a 4MV macroblock, each of the 4 luma blocks in the macroblock may have an associated motion vector, which indicates the displacement between the current and reference pictures for that block. The displacement for the chroma blocks is derived from the 4 luma motion vectors. The difference between the current and reference blocks is encoded in the block layer. For a 4MV macroblock, the MBMODE (4661) element indicates two items: (1) that the macroblock type is 4MV; and (2) whether a CBPCY (4662) element is present.
Intra-coded macroblocks may appear in 1MV or mixed MV P fields. In an intra-coded macroblock, all six blocks are coded without reference to any previous picture data. For an intra-coded macroblock, the MBMODE (4661) element indicates two items: (1) that the macroblock type is intra-coded; and (2) whether a CBPCY (4662) element is present. For intra-coded macroblocks, the CBPCY (4662) element, when present, indicates which of the 6 blocks have AC coefficient data coded in the block layer. In all cases, a DC coefficient is still present for each block.
4. Motion vector block patterns
The 4MVBP (4664) element indicates which of the 4 luma blocks contain non-zero motion vector differentials. 4MVBP (4664) decodes to a value between 0 and 15, which, when expressed as a binary value, represents 4 1-bit flags that indicate whether a motion vector differential is present for the corresponding luma block. The table in fig. 34 shows the association of luma blocks with bits in 4MVBP (4664). As shown in fig. 47J, one of 4 tables is used to encode/decode 4MVBP (4664).
For each of the 4 bit positions in 4MVBP (4664), a value of 0 indicates that there is no motion vector differential (in BLKMVDATA) for the block in the corresponding position, and the motion vector differential is assumed to be 0. A value of 1 indicates that there is a motion vector differential (in BLKMVDATA) for the block in the corresponding position. For example, if 4MVBP (4664) decodes to binary value 1100, the bitstream contains BLKMVDATA (4665) for blocks 0 and 1, and blocks 2 and 3 do not have BLKMVDATA (4665). The 4MVBP (4664) is similarly used to indicate the presence/absence of motion vector difference information of 4MV macroblocks in interlaced B fields and interlaced P frames.
A field coded macroblock in an interlaced P frame or interlaced B frame may include 2 motion vectors. In the case of 2 field MV macroblocks, the 2MVBP element indicates which of the two fields has a non-zero differential motion vector. As shown in fig. 47K, one of 4 tables is used for encoding/decoding 2 MVBP.
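The bit semantics of 4MVBP and 2MVBP can be illustrated with a small sketch (the helper name is hypothetical; the most-significant-bit-first ordering follows the binary 1100 example above):

```python
def mv_data_present(mvbp, nbits):
    # Expand a decoded 4MVBP (nbits = 4) or 2MVBP (nbits = 2) value into
    # per-block (or per-field) presence flags. The most significant bit
    # corresponds to block 0 (or the top field), matching the example above
    # in which 4MVBP = binary 1100 means blocks 0 and 1 carry BLKMVDATA and
    # blocks 2 and 3 do not.
    return [bool((mvbp >> (nbits - 1 - k)) & 1) for k in range(nbits)]
```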
5. Field picture coordinate system
In the following sections, motion vector units are expressed in field picture units. For example, if a motion vector has a vertical displacement component of +6 (in quarter-pixel units), this indicates a displacement of 1½ field picture lines.
Fig. 48 shows the relationship between the vertical component of a motion vector and the spatial location for the two combinations of current and reference field polarity (opposite and same). Fig. 48 shows a vertical column of pixels in the current and reference fields. Circles represent integer pixel positions and x's represent quarter-pixel positions. A value of 0 indicates no vertical displacement between the current and reference field positions. If the current and reference fields have opposite polarities, the 0 vertical vector points to a position midway between the field lines in the reference field (a 1/2 pixel shift). If the current and reference fields have the same polarity, the 0 vertical vector points to the corresponding field line in the reference field.
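A minimal sketch of the field picture units described above (hypothetical helper; the sign of the half-line shift for opposite-polarity fields is a simplifying assumption for illustration):

```python
def vertical_offset_in_lines(dv_quarter_pel, same_polarity):
    # Convert a vertical motion vector component in quarter-pel field picture
    # units into a displacement in reference field lines. Per the text, a
    # zero vector between opposite-polarity fields points midway between
    # reference field lines, modeled here as an extra half-line offset.
    offset = dv_quarter_pel / 4.0
    if not same_polarity:
        offset += 0.5
    return offset
```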
6. Decoding motion vector differences
The MVDATA (4663) and BLKMVDATA (4665) elements encode motion information for a macroblock or for a block in a macroblock. 1MV macroblocks have a single MVDATA (4663) element, while 4MV macroblocks may have zero to four BLKMVDATA (4665) elements. The process of computing the motion vector differential from MVDATA (4663) or BLKMVDATA (4665) differs for the one-reference case (NUMREF = 0) and the two-reference case (NUMREF = 1).
In a field picture with one reference field, each MVDATA (4663) or BLKMVDATA (4665) syntax element jointly encodes the following two items: (1) a horizontal motion vector difference component; and (2) a vertical motion vector differential component. The MVDATA (4663) or BLKMVDATA (4665) element is a VLC followed by an FLC. The value of VLC determines the size of the FLC. The MVTAB (4643) syntax element specifies the table used to decode VLC.
Fig. 49A shows pseudo code for decoding the motion vector differential for blocks or macroblocks in field pictures with one reference field. In this pseudo code, the values dmv_x and dmv_y are computed, where dmv_x is the differential horizontal motion vector component and dmv_y is the differential vertical motion vector component. The variables k_x and k_y are fixed-length values that depend on the motion vector range as defined by MVRANGE (4633), according to the table shown in fig. 49B.
The variable extend_x is used for extended-range horizontal motion vector differentials, and the variable extend_y is used for extended-range vertical motion vector differentials. The variables extend_x and extend_y are derived from the DMVRANGE (4634) syntax element. If DMVRANGE (4634) indicates that the extended range is used for the horizontal component, then extend_x = 1; otherwise, extend_x = 0. Likewise, if DMVRANGE (4634) indicates that the extended range is used for the vertical component, then extend_y = 1; otherwise, extend_y = 0. The offset tables are arrays defined as:

offset_table1[9] = {0, 1, 2, 4, 8, 16, 32, 64, 128}, and

offset_table2[9] = {0, 1, 3, 7, 15, 31, 63, 127, 255},

where offset_table2[] is used for the horizontal or vertical component when the differential range is extended for that component. Although figs. 49A and 49B illustrate extended differential motion vector decoding for interlaced P fields, extended differential motion vector decoding is also used for interlaced B fields, interlaced P frames, and interlaced B frames in the first combined implementation.
In field pictures with two reference fields, each MVDATA (4663) or BLKMVDATA (4665) syntax element jointly encodes three items: (1) the horizontal motion vector differential component; (2) the vertical motion vector differential component; and (3) whether the dominant or non-dominant predictor is used, i.e., which of the two fields is referenced by the motion vector. As in the one-reference-field case, the MVDATA (4663) or BLKMVDATA (4665) element is a VLC followed by an FLC, and the MVTAB (4643) syntax element specifies the table used to decode the VLC.
Fig. 50 shows pseudo code for decoding the motion vector differential and the dominant/non-dominant predictor selection for motion vectors of blocks or macroblocks in field pictures with two reference fields. In this pseudo code, the value predictor_flag is a binary flag indicating whether the dominant or non-dominant motion vector predictor is used. If predictor_flag = 0, the dominant predictor is used, and if predictor_flag = 1, the non-dominant predictor is used. The various other variables (including dmv_x, dmv_y, k_x, k_y, extend_x, extend_y, offset_table1[], and offset_table2[]) are as described for the one-reference-field case. The array size_table is defined as:

size_table[16] = {0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7}.
7. Motion vector predictors
The motion vector is calculated by adding the motion vector differential computed in the previous section to a motion vector predictor. The predictor is computed from up to three neighboring motion vectors. The motion vector predictor is calculated in 1/4-pixel units, even if the motion vector mode is half-pixel.
In a 1MV interlaced P field, up to three neighboring motion vectors are used to calculate the predictor of the current macroblock. The positions of the neighboring macroblocks with the considered motion vectors are shown in fig. 5A and 5B and are described for 1MV progressive P-frames.
In a mixed MV interlaced P field, up to three neighboring motion vectors are used to compute the predictor for the current block or macroblock. The locations of the neighboring blocks and/or macroblocks with the considered motion vectors are shown in figs. 6A-10, as described for mixed MV progressive P frames.
If the NUMREF (4631) syntax element in the picture header is 0, then the current interlaced P field may reference only one previously coded field. If NUMREF is 1, the current interlaced P field may reference the two nearest reference field pictures. In the former case, a single predictor is calculated for each motion vector. In the latter case, two motion vector predictors are calculated. The pseudo code in figs. 51A and 51B describes how the motion vector predictor is calculated for the one-reference-field case. The variables fieldpred_x and fieldpred_y in the pseudo code represent the horizontal and vertical components of the motion vector predictor.
In two-reference-field interlaced P fields (NUMREF = 1), the current field may reference the two nearest reference fields. In this case, two motion vector predictors are calculated for each inter-coded macroblock. One predictor is from the reference field of the same polarity, and the other is from the reference field of opposite polarity. Of the same-polarity field and the opposite-polarity field, one is the dominant field and the other is the non-dominant field. The dominant field is the one containing the majority of the candidate motion vector predictors. In the case of a tie, the motion vector predictor derived from the opposite field is considered the dominant predictor. Intra-coded macroblocks are not considered in the determination of the dominant/non-dominant predictor. If all candidate predictor macroblocks are intra-coded, the dominant and non-dominant motion vector predictors are set to zero and the dominant predictor is taken from the opposite field.
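The dominant-field rule can be sketched as follows (hypothetical helper; the inputs are the polarities of the up-to-three candidate motion vector predictors):

```python
def dominant_field(candidate_polarities):
    # candidate_polarities holds up to three entries: "same", "opposite", or
    # "intra". Intra candidates are ignored. The dominant field is the one
    # contributing the majority of candidate predictors; a tie (including
    # the all-intra case) makes the opposite field dominant.
    same = candidate_polarities.count("same")
    opposite = candidate_polarities.count("opposite")
    return "same" if same > opposite else "opposite"
```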
The pseudo code in figs. 52A-52F describes how the motion vector predictors are calculated for the two-reference-field case, given 3 candidate motion vector predictors. The variables samefieldpred_x and samefieldpred_y represent the horizontal and vertical components of the motion vector predictor from the same field, and the variables oppositefieldpred_x and oppositefieldpred_y represent the horizontal and vertical components of the motion vector predictor from the opposite field. The variables samecount and oppositecount are initialized to 0. The variable dominantpredictor indicates which field contains the dominant predictor. The value predictor_flag (decoded from the motion vector differential) indicates whether the dominant or non-dominant predictor is used.
The pseudo code in figs. 52G and 52H shows the scaling operations referenced by the pseudo code in figs. 52A-52F, which are used to derive the predictor for one field from the predictor for the other field. The values of SCALEOPP, SCALESAME1, SCALESAME2, SCALEZONE1_X, SCALEZONE1_Y, ZONE1OFFSET_X, and ZONE1OFFSET_Y for the case where the current field is the first field are shown in the table of fig. 52I, and the values for the case where the current field is the second field are shown in the table of fig. 52J. The reference frame distance is encoded in the REFDIST (4624) element in the picture header. The reference frame distance is REFDIST + 1.
Figs. 52K-52N show alternative pseudo code and tables for the scaling operations shown in figs. 52H-52J. In this alternative, the scaling pseudo code and tables of figs. 52K-52N are used in place of those in figs. 52H-52J (while the pseudo code in figs. 52A-52G is still used). The reference frame distance is obtained from an element of the field layer header. The value of N depends on the motion vector range, as shown in the table of fig. 52N.
8. Hybrid motion vector prediction
The motion vector predictor calculated in the previous section is tested against the A (top) and C (left) predictors to determine whether the predictor is explicitly coded in the bitstream. If so, a bit is present that indicates whether predictor A or predictor C is used as the motion vector predictor. The pseudo code in fig. 53 shows hybrid motion vector prediction decoding. In this pseudo code, the variables predictor_pre_x and predictor_pre_y are the horizontal and vertical motion vector predictors, respectively, as calculated in the previous section. The variables predictor_post_x and predictor_post_y are the horizontal and vertical motion vector predictors, respectively, after the hybrid motion vector prediction check. The variables predictor_pre, predictor_post, predictorA, predictorB, and predictorC all represent values for the field polarity indicated by the value of predictor_flag. For example, if predictor_flag indicates that the opposite-field predictor is used, then:
predictor_pre_x=oppositefieldpred_x
predictor_pre_y=oppositefieldpred_y
predictorA_x=oppositefieldpredA_x
predictorA_y=oppositefieldpredA_y
predictorB_x=oppositefieldpredB_x
predictorB_y=oppositefieldpredB_y
predictorC_x=oppositefieldpredC_x
predictorC_y=oppositefieldpredC_y
Likewise, if predictor_flag indicates that the same-field predictor is used, then:
predictor_pre_x=samefieldpred_x
predictor_pre_y=samefieldpred_y
predictorA_x=samefieldpredA_x
predictorA_y=samefieldpredA_y
predictorB_x=samefieldpredB_x
predictorB_y=samefieldpredB_y
predictorC_x=samefieldpredC_x
predictorC_y=samefieldpredC_y
where the values of oppositefieldpred and samefieldpred are calculated as described in the previous section.
9. Reconstructing luminance motion vectors
For both 1MV and 4MV macroblocks, the luma motion vector is reconstructed by adding the differential to the predictor as follows, where the variables range_x and range_y depend on MVRANGE (4633) and are specified in the table shown in fig. 49B. For NUMREF = 0 (one-reference-field interlaced P fields):

mv_x = (dmv_x + predictor_x) smod range_x, and

mv_y = (dmv_y + predictor_y) smod range_y.

For NUMREF = 1 (two-reference-field interlaced P fields):

mv_x = (dmv_x + predictor_x) smod range_x, and

mv_y = (dmv_y + predictor_y) smod (range_y / 2).
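A sketch of the reconstruction formulas above; the signed-modulus (smod) definition is not reproduced in this document, so the common form mapping the sum into [-b, b - 1] is an assumption here:

```python
def smod(a, b):
    # Signed modulus mapping a into the range [-b, b - 1]. This definition is
    # an assumption; the document does not reproduce the smod operator.
    return ((a + b) % (2 * b)) - b

def reconstruct_luma_mv(dmv_x, dmv_y, predictor_x, predictor_y,
                        range_x, range_y, numref):
    # mv = (dmv + predictor) smod range; per the formulas above, the vertical
    # wrap range is halved for two-reference-field P fields (NUMREF = 1).
    ry = range_y // 2 if numref == 1 else range_y
    mv_x = smod(dmv_x + predictor_x, range_x)
    mv_y = smod(dmv_y + predictor_y, ry)
    return mv_x, mv_y
```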
If an interlaced P field uses two reference pictures (NUMREF = 1), then the values of predictor_flag (derived in decoding the motion vector differential) and dominantpredictor (derived in motion vector prediction) are combined to determine which field is used as the reference, as shown in fig. 54.
In a 1MV macroblock, there is a single motion vector for the 4 blocks that make up the luma component of the macroblock. If the MBMODE (4661) syntax element indicates that no MV data is present in the macroblock layer, then dmv_x = 0 and dmv_y = 0 (mv_x = predictor_x and mv_y = predictor_y).
In a 4MV macroblock, each of the inter-coded luma blocks in the macroblock has its own motion vector. Thus, there are up to 4 luma motion vectors in each 4MV macroblock. If the 4MVBP (4664) syntax element indicates that no motion vector information is present for a block, then dmv_x = 0 and dmv_y = 0 for that block (mv_x = predictor_x and mv_y = predictor_y).
10. Deriving chroma motion vectors
The chroma motion vectors are derived from the luma motion vectors. Chroma motion vectors are reconstructed in two steps. In the first step, a nominal chroma motion vector is obtained by appropriately combining and scaling the luma motion vector(s). The scaling is performed in such a way that half-pixel offsets are favored over quarter-pixel offsets. In the second step, the 1-bit FASTUVMC syntax element is used to determine whether further rounding of the chroma motion vectors is necessary. If FASTUVMC = 0, no rounding is performed in the second step. If FASTUVMC = 1, chroma motion vectors at quarter-pixel offsets are rounded to the nearest half- or full-pixel positions, and only bilinear filtering is used for all chroma interpolation. The variables cmv_x and cmv_y represent the chroma motion vector components, and lmv_x and lmv_y represent the luma motion vector components.
In a 1MV macroblock, the chroma motion vector is derived from the luma motion vector as follows:

cmv_x = (lmv_x + round[lmv_x & 3]) >> 1, and

cmv_y = (lmv_y + round[lmv_y & 3]) >> 1,

where round[0] = 0, round[1] = 0, round[2] = 0, and round[3] = 1.
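The 1MV chroma derivation can be written directly from the formulas above (the function name is hypothetical):

```python
ROUND = [0, 0, 0, 1]  # the round[] table from the text

def chroma_mv_1mv(lmv_x, lmv_y):
    # Derive the nominal chroma motion vector of a 1MV macroblock from its
    # luma motion vector. The round[] bias makes quarter-pel luma positions
    # map to half-pel chroma positions after the halving shift.
    cmv_x = (lmv_x + ROUND[lmv_x & 3]) >> 1
    cmv_y = (lmv_y + ROUND[lmv_y & 3]) >> 1
    return cmv_x, cmv_y
```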
The pseudo code in figs. 55A and 55B shows the first step of deriving the chroma motion vector from the motion information in the four luma blocks of a 4MV macroblock. In this pseudo code, ix and iy are temporary variables. Fig. 55A shows the pseudo code for chroma motion vector derivation for one-reference-field interlaced P fields, while fig. 55B shows the pseudo code for chroma motion vector derivation for two-reference-field interlaced P fields.
11. Intensity compensation
If MVMODE (4635) indicates that intensity compensation is used for the interlaced P field, the pixels in one or both reference fields are remapped before they are used as predictors for the current P field. When intensity compensation is used, the LUMSCALE1 (4638) and LUMSHIFT1 (4639) syntax elements are present in the bitstream for the first reference field, and the LUMSCALE2 (4640) and LUMSHIFT2 (4641) elements may also be present for the second reference field. The pseudo code in fig. 56 shows how the LUMSCALE1 (4638) and LUMSHIFT1 (4639) values are used to build the lookup tables for remapping the reference field pixels of the first reference field. (The pseudo code applies similarly to LUMSCALE2 (4640) and LUMSHIFT2 (4641) for the second reference field.)
The Y component of the reference field is remapped using the LUTY[] table, and the Cb/Cr components are remapped using the LUTUV[] table, as follows:

p′Y = LUTY[pY], and

p′UV = LUTUV[pUV],

where pY is an original luma pixel value in the reference field, p′Y is the remapped luma pixel value in the reference field, pUV is an original Cb or Cr pixel value in the reference field, and p′UV is the remapped Cb or Cr pixel value in the reference field.
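Applying the lookup tables can be sketched as follows (hypothetical helper; construction of the 256-entry tables from LUMSCALE/LUMSHIFT per fig. 56 is not reproduced here, so arbitrary tables are passed in):

```python
def remap_reference_field(luma_samples, chroma_samples, lut_y, lut_uv):
    # Remap a reference field's luma samples through LUTY[] and its Cb/Cr
    # samples through LUTUV[] before motion compensation, per the formulas
    # above. lut_y and lut_uv are any 256-entry tables.
    return ([lut_y[p] for p in luma_samples],
            [lut_uv[p] for p in chroma_samples])
```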
12. Residual decoding
The decoder decodes the CBPCY (4662) element of the macroblock when it is present; the CBPCY (4662) element indicates the presence/absence of coefficient data. At the block level, the decoder decodes coefficient data for inter-coded blocks and intra-coded blocks (except for 4MV macroblocks). To reconstruct an inter-coded block, the decoder: (1) selects a transform type (8x8, 8x4, 4x8, or 4x4), (2) decodes the sub-block mode, (3) decodes the coefficients, (4) performs an inverse transform, (5) performs inverse quantization, (6) obtains a prediction for the block, and (7) adds the prediction and error blocks.
C. Sequence and semantics in a second combined implementation
In a second combined implementation, a compressed video sequence consists of data structured into hierarchical layers. From top to bottom, the layers are: picture layer, macroblock layer, and block layer. Sequence-level data precedes the sequence. FIGS. 57A to 57C show the bitstream elements that make up the various layers.
1. Sequence level syntax and semantics
The sequence-level header contains sequence-level parameters used to decode the sequence of compressed pictures. This header is made available to the decoder either as externally communicated decoder configuration information or as part of the video data bitstream. FIG. 57A is a syntax diagram for the sequence layer bitstream, showing the elements that make up the sequence layer. The PROFILE (5701) element specifies the encoding profile used to produce the clip. If PROFILE is the "advanced" profile, the LEVEL (5702) element specifies the encoding level of the clip. Otherwise (e.g., for other profiles), the clip level is conveyed to the decoder by external means.
The INTERLACE (5703) element is a 1-bit field that is present if PROFILE is the advanced profile. INTERLACE (5703) specifies whether the video is encoded in progressive or interlaced mode. If INTERLACE is 0, the video frames are encoded in progressive mode. If INTERLACE is 1, the video frames are encoded in interlaced mode. If PROFILE (5701) is not the advanced profile, the video is encoded in progressive mode.
The EXTENDED motion vector EXTENDED _ MV (5704) element is a 1-bit field that indicates whether the EXTENDED motion vector capability is on or off. If EXTENDED _ MV is 1, the motion vector has an EXTENDED range. If EXTENDED _ MV is 0, the motion vector has no EXTENDED range.
2. Image layer syntax and semantics
The data of an image is composed of an image header and a macroblock layer data following it. Fig. 57B is a syntax diagram of a picture layer bitstream, which shows elements of picture layers constituting an interlaced P field.
The picture type PTYPE (5722) element is a 1-bit field, or a variable-size field. If there are no B pictures, there are only I and P pictures in the sequence, and the PTYPE is encoded with a single bit. If PTYPE is 0, the picture type is I. If PTYPE is 1, the picture type is P. If the number of B pictures is greater than 0, then PTYPE (5722) is a variable size field indicating the picture type of the frame. If PTYPE is 1, the picture type is P. If PTYPE is 01 (binary), the image type is I. Also, if PTYPE is 00 (binary), the image type is B.
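The PTYPE coding above is a simple prefix code. The sketch below illustrates it (not normative; `bits` is assumed to be a string of '0'/'1' characters read from the bitstream, and the function returns the picture type together with the number of bits consumed):

```python
def decode_ptype(bits, b_frames_present):
    """Decode PTYPE per the rules above.
    Without B pictures: 0 -> I, 1 -> P (1 bit).
    With B pictures:    1 -> P, 01 -> I, 00 -> B (variable size)."""
    if not b_frames_present:
        return ('P' if bits[0] == '1' else 'I', 1)
    if bits[0] == '1':
        return ('P', 1)
    return ('I' if bits[1] == '1' else 'B', 2)
```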
The number of reference pictures NUMREF (5731) element is a 1-bit syntax element present in the interlaced P field header. It indicates whether the interlaced P field has 1 (NUMREF = 0) or 2 (NUMREF = 1) reference pictures. The reference field picture indicator REFFIELD (5732) is a 1-bit syntax element that is present in the interlaced P field header if NUMREF is 0. It indicates which of the two possible reference pictures is used for the interlaced P field. The extended MV range flag MVRANGE (5733) is a variable-size syntax element present in P pictures of sequences encoded using a particular profile (the "main" profile) whose BROADCAST element is set to 1. In general, MVRANGE (5733) indicates an extended range for motion vectors (i.e., longer possible horizontal and/or vertical displacements for the motion vectors). MVRANGE (5733) is used in decoding motion vector differentials.
The motion vector mode MVMODE (5735) element is a variable-size syntax element that signals one of four motion vector coding modes or one intensity compensation mode. The motion vector coding modes include three "1MV" modes with different sub-pixel interpolation rules for motion compensation. 1MV means that each macroblock in the picture has at most one motion vector. In "hybrid MV" mode, each macroblock in the picture may have one or four motion vectors, or may be skipped. One of the tables shown in FIG. 47E is used for the MVMODE (5735) element, depending on the value of PQUANT (the quantization factor for the picture).
The motion vector mode 2 MVMODE2 (5736) element is a variable-size syntax element that is present in the interlaced P field header if MVMODE (5735) signals intensity compensation. The same table as for MVMODE (5735), minus the intensity compensation code, may be used for MVMODE2 (5736).
The luminance scale LUMSCALE (5738) and luminance shift LUMSHIFT (5739) elements are each 6-bit values used in intensity compensation. If MVMODE (5735) signals intensity compensation, LUMSCALE (5738) and LUMSHIFT (5739) are present in the interlaced P field header.
The macroblock mode table MBMODETAB (5742) element is a 2-bit field in the interlaced P field header. MBMODETAB (5742) indicates which of four code tables (tables 0 to 3, specified with the 2-bit value) is used to encode/decode the macroblock mode MBMODE (5761) syntax element in the macroblock layer.
The motion vector table MVTAB (5743) element is a 2-bit field of an interlaced P field. MVTAB (5743) indicates which of the four code tables (tables 0 to 3 specified with two bit values) is used to encode/decode motion vector data.
4MV Block mode Table 4MVBPTAB (5744) element is a 2-bit value, which is present in an interlaced P field if MVMODE (5735) (or MVMODE2(5736), if MVMODE (5735) is set to intensity compensation) indicates that the image is of mixed MV type. The 4MVBPTAB (5744) signals which of the four code tables (tables 0 to 3 specified with two bit values) is used to encode/decode the 4MV block mode 4MVBP (5764) field in a 4MV macroblock.
The interlaced P frame header (not shown) has many of the same elements as the interlaced P field header shown in FIG. 57B. These include PTYPE (5722), MBMODETAB (5742), MVTAB (5743), and 4MVBPTAB (5744), although the exact syntax and semantics for interlaced P frames may differ from interlaced P fields. For example, 4MVBPTAB is again a 2-bit field that indicates which of four code tables (tables 0 to 3, specified with the 2-bit value) is used to encode/decode the 4MV block mode 4MVBP element in a 4MV macroblock. The interlaced P frame header also includes different elements for switching between 1MV and 4MV modes and for signaling intensity compensation.
Since an interlaced P frame may include field-coded macroblocks with two motion vectors per macroblock, the interlaced P frame header includes a two motion vector block mode table 2MVBPTAB element. 2MVBPTAB is a 2-bit field present in interlaced P frames. The syntax element signals which of four tables (tables 0 to 3, specified with the 2-bit value) is used to encode/decode the 2MV block mode (2MVBP) element in 2MV field-coded macroblocks. FIG. 47K shows the four tables that may be used for 2MVBP.
Interlaced B fields and interlaced B frames have many of the same elements as interlaced P fields and interlaced P frames. Specifically, interlaced B frames include 2MVBPTAB and 4MVBPTAB (5721) syntax elements, although the semantics of the elements may differ from interlaced P fields and frames.
3. Macroblock layer syntax and semantics
The data of a macroblock consists of a macroblock header followed by a block layer. Fig. 57C is a syntax diagram of a macroblock layer bitstream, which shows elements of macroblock layers of macroblocks constituting an interlaced P field.
The macroblock mode MBMODE (5761) element is a variable-size element. It jointly indicates information such as the number of motion vectors for the macroblock (1MV, 4MV, or intra-coded), whether the coded block pattern CBPCY (5762) is present for the macroblock, and (in some cases) whether motion vector differential data is present for the macroblock.
The motion vector data MVDATA (5763) element is a variable size element that encodes motion vector information (e.g., horizontal and vertical differences) of a motion vector of a macroblock. For interlaced P fields with two reference fields, MVDATA (5763) also encodes information for selecting between the primary and non-dominant motion vector predictors of the motion vectors.
If MBMODE (5761) indicates that the macroblock has four motion vectors, the four motion vector block mode 4MVBP (5764) element is present. The 4MVBP (5764) element indicates which of the four luminance blocks contain non-zero motion vector differentials. The 4MVBP (5764) element is decoded to a value between 0 and 14 using a code table. When represented in binary, each bit of the decoded value acts as a 1-bit field indicating whether motion vector data is present for the corresponding luminance block, as shown in FIG. 34.
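The bit-field interpretation of a decoded 4MVBP value can be sketched as below. This is illustrative only; the mapping of bit positions to luminance blocks (most-significant bit to block 0) is an assumption for the sketch, not taken from the text:

```python
def mv_present_flags(four_mvbp):
    """Expand a decoded 4MVBP value into four per-block presence flags.
    Hypothetical ordering: most-significant bit -> luma block 0."""
    assert 0 <= four_mvbp <= 15
    return [(four_mvbp >> (3 - blk)) & 1 == 1 for blk in range(4)]
```

For example, a decoded value of 10 (binary 1010) would mean blocks 0 and 2 have motion vector data signaled and blocks 1 and 3 do not.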
Two motion vector block mode 2MVBP elements (not shown) are variable size syntax elements present in the macroblocks of an interlaced P frame. In an interlaced frame macroblock, 2MVBP exists if MBMODE (5761) indicates that the macroblock has 2 field motion vectors. In this case, 2MVBP indicates which of the 2 fields (up and down) contains a non-zero motion vector differential.
The block-level motion vector data BLKMVDATA (5765) element is a variable size element that exists in some cases. It contains motion information for the blocks of the macroblock.
The hybrid motion vector prediction HYBRIDPRED (5766) element is a 1-bit-per-motion-vector syntax element that may be present in macroblocks of an interlaced P field. When hybrid motion vector prediction is used, HYBRIDPRED (5766) indicates which of the two motion vector predictors to use.
4. Block level syntax and semantics
The block layer of an interlaced image follows the syntax and semantics of the block layer of a progressive image. Typically, the information of the DC and AC coefficients of the block and sub-blocks is signaled at the block level.
D. Decoding in a second combined implementation
The following sections focus on the decoding process for interlaced P fields.
1. Reference for interlaced P-field decoding
An interlaced P field may reference one or two previously decoded fields in motion compensation. The NUMREF (5731) field in the picture layer indicates whether the current field can reference one or two previous reference field pictures. If NUMREF is 0, the current interlaced P field can reference only one field. In this case, the REFFIELD (5732) element follows in the picture layer bitstream and indicates which field is used as the reference. If REFFIELD is 0, the temporally nearest (in display order) I or P field is used as the reference. If REFFIELD is 1, the temporally second-nearest I or P field picture is used as the reference. If NUMREF is 1, the current interlaced P field picture uses the two temporally nearest (in display order) I or P field pictures as references. As described above, the examples of reference field pictures for NUMREF = 0 and NUMREF = 1 shown in FIGS. 24A to 24F apply to the second combined implementation.
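The NUMREF/REFFIELD selection rule above can be sketched as follows (illustrative only; `prior_ip_fields` is an assumed list of previously decoded I/P fields ordered most-recent-first in display order):

```python
def reference_fields(numref, reffield, prior_ip_fields):
    """Select the reference field(s) for the current interlaced P field.
    numref == 1: the two temporally nearest I/P fields.
    numref == 0: one field, chosen by reffield (0 -> nearest,
    1 -> second nearest)."""
    if numref == 1:
        return prior_ip_fields[:2]
    return [prior_ip_fields[reffield]]
```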
2. Image type and image layer table selection
Interlaced P fields can be one of two types: 1MV or mixed MV. In a 1MV P field, a single motion vector per 1MV macroblock indicates the displacement of the predicted blocks for all 6 blocks in the macroblock. In a mixed-MV P field, macroblocks can be encoded as 1MV or 4MV macroblocks. For a 4MV macroblock, each of the four luminance blocks may have a motion vector associated with it. The 1MV mode or mixed-MV mode is signaled by the MVMODE (5735) and MVMODE2 (5736) picture layer elements.
For an interlaced P field, the picture layer contains syntax elements that control the motion compensation mode and intensity compensation for the field. MVMODE (5735) signals either: (1) one of four motion vector modes for the field, or (2) that intensity compensation is used in the field. If intensity compensation is signaled, the MVMODE2 (5736), LUMSCALE (5738), and LUMSHIFT (5739) fields follow in the picture layer. One of the two tables in FIG. 47E is used to decode the MVMODE (5735) and MVMODE2 (5736) fields, depending on whether PQUANT is greater than 12.
If the motion vector mode is hybrid MV, MBMODETAB (5742) signals which of four hybrid-MV MBMODE tables is used to signal the mode of each macroblock in the field. If the motion vector mode is not hybrid MV (in which case one motion vector is used for all inter-coded macroblocks), MBMODETAB (5742) signals which of four 1MV MBMODE tables is used to signal the mode of each macroblock in the field.
MVTAB (5743) indicates the code table used to decode motion vector differentials for macroblocks in the interlaced P field. 4MVBPTAB (5744) indicates the code table used to decode 4MVBP (5764) for 4MV macroblocks in the interlaced P field.
3. Macroblock mode and motion vector block mode
Macroblocks in interlaced P fields can be one of 3 possible types: 1MV, 4MV, and intra-coded. The macroblock type is signaled by MBMODE (5761) in the macroblock layer.
1MV macroblocks may occur in 1MV and hybrid-MV P fields. In a 1MV macroblock, a single motion vector represents the displacement between the current and reference pictures for all 6 blocks of the macroblock. The difference between the current and reference blocks is encoded in the block layer. For a 1MV macroblock, MBMODE (5761) indicates three things: (1) that the macroblock type is 1MV; (2) whether CBPCY (5762) is present; and (3) whether MVDATA (5763) is present.
If MBMODE (5761) indicates that CBPCY (5762) is present, CBPCY (5762) is present in the macroblock layer and indicates which of the 6 blocks are encoded in the block layer. If MBMODE (5761) indicates that CBPCY (5762) does not exist, CBPCY (5762) is assumed to be equal to 0 and no block data exists for any of the 6 blocks in the macroblock.
If MBMODE (5761) indicates that MVDATA (5763) is present, MVDATA (5763) is present in the macroblock layer and encodes a motion vector difference, which is combined with the motion vector predictor to reconstruct the motion vector. If MBMODE (5761) indicates that MVDATA (5763) is not present, the motion vector differential is assumed to be zero, and thus the motion vector is equal to the motion vector predictor.
4MV macroblocks appear only in mixed-MV P fields. In a 4MV macroblock, each of the four luminance blocks in the macroblock may have an associated motion vector that indicates the displacement between the current and reference pictures for that block. The displacement of the chrominance blocks is derived from the four luminance motion vectors. The difference between the current and reference blocks is encoded in the block layer. For a 4MV macroblock, MBMODE (5761) indicates three things: (1) that the macroblock type is 4MV; (2) whether CBPCY (5762) is present; and (3) whether 4MVBP (5764) is present.
If MBMODE (5761) indicates that 4MVBP (5764) is present, 4MVBP (5764) is present in the macroblock layer and indicates which of the four luminance blocks contain non-zero motion vector differentials. 4MVBP (5764) is decoded to a value between 0 and 14; when represented in binary, each bit of this value is a 1-bit field indicating whether motion vector data is present for the corresponding luminance block, as shown in FIG. 27. For each of the 4 bit positions in 4MVBP (5764), a value of 0 indicates that no motion vector differential (BLKMVDATA (5765)) is present for the block, and the motion vector differential is assumed to be 0. A value of 1 indicates that a motion vector differential (BLKMVDATA (5765)) is present for the block. If MBMODE (5761) indicates that 4MVBP (5764) is not present, motion vector differential data (BLKMVDATA (5765)) is assumed to be present for all four luminance blocks.
A field coded macroblock in an interlaced frame may include 2 motion vectors. In the case of 2 field MV macroblocks, the 2MVBP element indicates which of the two fields have non-zero differential motion vectors.
Intra-coded macroblocks can occur in 1MV or hybrid-MV P fields. In an intra macroblock, all six blocks are encoded without reference to any previous picture data. The difference between the current block pixels and the constant value 128 is encoded in the block layer. For an intra-coded macroblock, MBMODE (5761) indicates two things: (1) that the macroblock type is intra-coded; and (2) whether CBPCY (5762) is present. For intra-coded macroblocks, CBPCY (5762), when present, indicates which of the six blocks have AC coefficient data encoded in the block layer.
4. Decoding motion vector differences
The MVDATA (5763) and BLKMVDATA (5765) fields encode motion information for a macroblock, or for a block within a macroblock. A 1MV macroblock has a single MVDATA (5763) field, while a 4MV macroblock may have zero to four BLKMVDATA (5765) fields. The calculation of the motion vector differential is performed differently for the one-reference case (NUMREF = 0) and the two-reference case (NUMREF = 1).
In a field picture with only one reference field, each MVDATA (5763) or BLKMVDATA (5765) field in the macroblock layer jointly encodes the following two: (1) a horizontal motion vector difference component; and (2) a vertical motion vector differential component. The MVDATA (5763) or BLKMVDATA (5765) field is a Huffman VLC followed by a FLC. The value of VLC determines the size of the FLC. The MVTAB (5743) field in the image layer specifies the table used to decode the VLC.
FIG. 58A shows pseudo-code for decoding the motion vector differential for motion vectors of blocks or macroblocks in field pictures having one reference field. In this pseudo-code, the values dmv_x and dmv_y are computed. The value dmv_x is the differential horizontal motion vector component, and the value dmv_y is the differential vertical motion vector component. The variables k_x and k_y are fixed lengths for long motion vectors and depend on the motion vector range defined by MVRANGE (5733), as shown in the table of FIG. 58B. The value halfpel_flag is a binary value indicating whether half-pel or quarter-pel precision is used for motion compensation of the picture. The value of halfpel_flag is determined by the motion vector mode: if the mode is 1MV or hybrid MV, halfpel_flag is 0 and quarter-pel precision is used for motion compensation; if the mode is 1MV half-pel or 1MV half-pel bilinear, halfpel_flag is 1 and half-pel precision is used. The offset_table is an array defined as follows:
offset_table[9]={0,1,2,4,8,16,32,64,128}.
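The role of offset_table in the differential decoding can be illustrated as below. This is a hedged sketch of the general VLC-then-FLC layout only, not the normative pseudo-code of FIG. 58A: the Huffman VLC is assumed already decoded to a size index, and the magnitude composition from offset_table is an illustration of how the FLC value extends the VLC-selected range:

```python
OFFSET_TABLE = [0, 1, 2, 4, 8, 16, 32, 64, 128]

def dmv_component(size_index, flc_value, negative=False):
    """Illustrative reconstruction of one differential MV component:
    the VLC selects size_index (how many FLC bits follow), and the FLC
    value is offset into the magnitude range starting at
    OFFSET_TABLE[size_index]."""
    magnitude = OFFSET_TABLE[size_index] + flc_value
    return -magnitude if negative else magnitude
```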
in a field picture with two reference fields, each MVDATA (5763) or BLKMVDATA (5765) field in the macroblock layer jointly encodes the following three items: (1) a horizontal motion vector difference component; (2) a vertical motion vector difference component; and (3) whether the dominant or non-dominant motion vector predictor is used. The MVDATA (5763) or BLKMVDATA (5765) field is a huffman VLC followed by an FLC, and the value of the VLC determines the size of the FLC. The MVTAB (5743) field specifies a table for decoding VLC.
FIG. 59 shows pseudo-code for decoding the motion vector differential and the dominant/non-dominant predictor selection for motion vectors of blocks or macroblocks in field pictures having two reference fields. In this pseudo-code, the value predictor_flag is a binary flag indicating whether the dominant or non-dominant motion vector predictor is used (0 = dominant predictor used, 1 = non-dominant predictor used). Various other variables (including dmv_x, dmv_y, k_x, k_y, halfpel_flag, and offset_table[]) are as described for the one-reference-field case. The size_table is an array defined as follows:
size_table[14] = {0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6}.
5. Motion vector predictor
The motion vector is calculated by adding the motion vector difference calculated in the previous section to the motion vector predictor. The prediction value is calculated from at most three neighboring motion vectors.
In a 1MV interlaced P field, up to three motion vectors are used to calculate the predictor of the current macroblock. The locations of the adjacent predictors A, B and C are shown in FIGS. 5A and 5B. As described for progressive P frames, the neighboring predictors are taken from the left, top, and top-right macroblocks, except in the case where the current macroblock is the last macroblock in the row. In this case, predictor B is taken from the top left (instead of top right) macroblock. For the special case where the frame is one macroblock wide, the predictor is always predictor a (the up predictor).
In a hybrid MV interlaced P-field, up to three motion vectors are used to calculate the prediction value of the current block or macroblock. Fig. 6A-10 show three candidate motion vectors for 1MV and 4MV macroblocks in a hybrid MVP field, as described for a progressive P frame. For the special case where the frame is one macroblock wide, the predictor is always predictor a (the up predictor).
If the NUMREF (5731) field in the picture header is 0, the current interlaced P field can reference only one previously encoded picture. If NUMREF is 1, the current interlaced P field can reference the two nearest reference field pictures. In the former case, a single predictor is calculated for each motion vector. In the latter case, two motion vector predictors are calculated. The pseudo-code in FIGS. 60A and 60B shows how the motion vector predictor is calculated in the one-reference-field case. The variables fieldpred_x and fieldpred_y represent the horizontal and vertical components of the motion vector predictor.
In a 2 reference field interlaced P-field (NUMREF ═ 1), the current field can reference the two nearest reference fields. In this case, two motion vector predictors are calculated for each inter-coded macroblock. One predictor is from a reference field of the same polarity and the other from a reference field of opposite polarity.
The pseudo-code in FIGS. 61A-61F describes how the motion vector predictors are computed in the two-reference-field case, given the 3 candidate motion vector predictors. The variables samefieldpred_x and samefieldpred_y represent the horizontal and vertical components of the motion vector predictor from the same-polarity field, while the variables oppositefieldpred_x and oppositefieldpred_y represent the horizontal and vertical components of the motion vector predictor from the opposite-polarity field. The variable dominantpredictor indicates which field contains the dominant predictor. The value predictor_flag (decoded as part of the motion vector differential) indicates whether the dominant or non-dominant predictor is used.
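The final choice between the two computed predictors can be sketched as follows (illustrative only; the string values for `dominantpredictor` are an assumption of the sketch, not the encoding used in the pseudo-code of FIGS. 61A-61F):

```python
def select_predictor(predictor_flag, dominantpredictor,
                     samefieldpred, oppositefieldpred):
    """Pick the motion vector predictor for a 2-reference-field P field.
    dominantpredictor: 'same' or 'opposite' (which field holds the
    dominant predictor); predictor_flag: 0 = use dominant,
    1 = use non-dominant."""
    use_same = (dominantpredictor == 'same') == (predictor_flag == 0)
    return samefieldpred if use_same else oppositefieldpred
```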
6. Hybrid motion vector prediction
If the interlaced P field is 1MV or hybrid MV, the motion vector predictor calculated in the previous section is tested against the A (top) and C (left) predictors to determine whether the predictor selection is explicitly encoded in the bitstream. If so, a bit is present that indicates whether predictor A or predictor C is used as the motion vector predictor. The pseudo-code in FIGS. 14A and 14B shows hybrid motion vector prediction decoding, using the following variables: the variables predictor_pre_x and predictor_pre_y and the candidate predictors A, B, and C are as calculated in the previous section (i.e., they are either opposite-field predictors or same-field predictors, as indicated by predictor_flag). The variables predictor_post_x and predictor_post_y are the horizontal and vertical motion vector predictors, respectively, after the hybrid motion vector prediction check.
7. Reconstructing motion vectors
For both 1MV and 4MV macroblocks, the luminance motion vector is reconstructed by adding the differential to the predictor as follows:
mv_x = (dmv_x + predictor_x) smod range_x, and
mv_y = (dmv_y + predictor_y) smod range_y,
where the variables range_x and range_y depend on MVRANGE (5733) and are specified in the table shown in FIG. 58B, and where the operation "smod" is a signed modulus defined as follows:
A smod b = ((A + b) % 2b) - b,
which ensures that the reconstructed vector is valid: (A smod b) lies between -b and b - 1.
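A minimal sketch of this reconstruction (illustrative only; `mv_range` stands in for range_x or range_y from the table of FIG. 58B):

```python
def smod(a, b):
    """Signed modulus as defined above; the result always lies in
    [-b, b - 1], keeping reconstructed MV components in range."""
    return ((a + b) % (2 * b)) - b

def reconstruct_mv(dmv, predictor, mv_range):
    # mv = (dmv + predictor) smod range
    return smod(dmv + predictor, mv_range)
```

Note that the wrap-around means a differential that would push the vector past the range boundary re-enters from the other end, rather than being clipped.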
In a 1MV macroblock, there is a single motion vector for the four blocks that make up the luminance component of the macroblock. If dmv_x indicates that the macroblock is intra-coded, no motion vector is associated with the macroblock. If the macroblock is skipped, dmv_x = 0 and dmv_y = 0, so mv_x = predictor_x and mv_y = predictor_y.
In a 4MV macroblock, each of the inter-coded luminance blocks in the macroblock has its own motion vector. Therefore there are 0 to 4 luminance motion vectors in each 4MV macroblock. An uncoded block in a 4MV macroblock can occur in one of two ways: (1) the macroblock is skipped and the macroblock is 4MV (in which case all blocks in the macroblock are skipped); or (2) the CBPCY (5762) of the macroblock indicates that the block is not coded. If a block is not coded, dmv_x = 0 and dmv_y = 0, so mv_x = predictor_x and mv_y = predictor_y.
8. Deriving chroma motion vectors
The chrominance motion vectors are derived from the luminance motion vectors. Also, for 4MV macroblocks, the decision of whether to encode a chrominance block as inter or intra is made based on the status of the luminance blocks. The chrominance motion vectors are reconstructed in two steps. As a first step, a nominal chrominance motion vector is obtained by appropriately combining and scaling the luminance motion vectors. The scaling is done in such a way that a half-pixel offset is preferred over a quarter-pixel offset. In the second step, the sequence-level 1-bit FASTUVMC field is used to determine whether further rounding of the chrominance motion vectors is necessary. If FASTUVMC is 0, no rounding is done in the second step. If FASTUVMC is 1, the chrominance motion vectors at quarter-pixel offsets are rounded to the nearest full-pixel positions. In addition, when FASTUVMC is 1, only bilinear filtering is used for all chrominance interpolation.
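The second-stage rounding for FASTUVMC = 1 can be sketched as follows. This is a hypothetical illustration only (the tie-breaking rule, rounding away from zero, is an assumption of the sketch); it assumes chroma MV components are expressed in quarter-pel units, so full-pel positions are multiples of 4:

```python
def fastuvmc_round(cmv):
    """Snap a chroma MV component (quarter-pel units) to the nearest
    full-pel position (multiple of 4). Ties round away from zero here."""
    sign = -1 if cmv < 0 else 1
    return sign * (((abs(cmv) + 2) // 4) * 4)
```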
In a 1MV macroblock, the chrominance motion vector is derived from the luminance motion vector as follows:
// s_RndTbl[0] = 0, s_RndTbl[1] = 0, s_RndTbl[2] = 0, s_RndTbl[3] = 1
cmv_x = (lmv_x + s_RndTbl[lmv_x & 3]) >> 1
cmv_y = (lmv_y + s_RndTbl[lmv_y & 3]) >> 1
The pseudo-code in FIG. 16B shows the first step, i.e., how the chrominance motion vectors are derived from the motion information of the four luminance blocks in a 4MV macroblock. The dominant polarity among the (up to four) luminance motion vectors of the 4MV macroblock is determined, and the chrominance motion vector is derived from the luminance motion vectors having the dominant polarity (but not from the luminance motion vectors of the other polarity).
9. Intensity compensation
If intensity compensation is used for a reference field, the pixels in the reference field are remapped before they are used as predictors. LUMSCALE (5738) and LUMSHIFT (5739) are present in the picture bitstream when intensity compensation is used. The pseudo-code in FIG. 18 or FIG. 56 shows how LUMSCALE (5738) and LUMSHIFT (5739) are used to remap the reference field pixels. The Y component of the reference is remapped using the LUTY[] table, and the U and V components are remapped using the LUTUV[] table, as follows:
p′Y = LUTY[pY], and
p′UV = LUTUV[pUV],
where pY is the original luminance pixel value in the reference field, p′Y is the remapped luminance pixel value in the reference field, pUV is the original U or V pixel value in the reference field, and p′UV is the remapped U or V pixel value in the reference field.
10. Residual decoding
The decoder decodes the CBPCY (5762) element of a macroblock when it is present; the CBPCY (5762) element indicates the presence/absence of coefficient data. At the block level, the decoder decodes coefficient data for inter-coded blocks and intra-coded blocks. To reconstruct an inter-coded block, the decoder: (1) selects a transform type (8x8, 8x4, 4x8, or 4x4), (2) decodes the sub-block mode, (3) decodes the coefficients, (4) performs an inverse transform, (5) performs inverse quantization, (6) obtains a prediction for the block, and (7) adds the prediction and error blocks.
Having described and illustrated the principles of our invention with reference to various embodiments, it will be recognized that the various embodiments can be modified in arrangement and detail without departing from such principles. It should be understood that the programs, processes, or methods described herein are not related or limited to any particular type of computing environment, unless indicated otherwise. Various types of general-purpose or specialized computing environments may be used with, or perform operations in accordance with, the teachings described herein. Elements of embodiments shown in software may be implemented in hardware, and vice versa.
In view of the many possible embodiments to which the principles of our invention may be applied, it is intended that all such embodiments be claimed as the invention within the scope and spirit of the appended claims and their equivalents.

Claims (34)

1. A computer-implemented method for encoding video information, the method comprising:
determining a first variable length code representing first information for a macroblock having a plurality of luma motion vectors, wherein the first information includes a motion vector data present indicator for each luma motion vector of the macroblock, wherein each motion vector data present indicator is represented by a corresponding bit in a bit field, wherein a bit value of 0 indicates that no motion vector data is signaled for a corresponding luma motion vector of the plurality of luma motion vectors and a bit value of 1 indicates that motion vector data is signaled for a corresponding luma motion vector of the plurality of luma motion vectors;
signaling the first variable length code in a bitstream;
determining a second variable length code representing second information for the macroblock, wherein the second information comprises a plurality of transform coefficient data presence indicators for a plurality of blocks of the macroblock; and
the second variable length code is signaled in a bitstream.
2. The method of claim 1, further comprising:
for each of a plurality of luminance motion vectors whose motion vector data is indicated by the first information to be present, its motion vector data is signaled in the bitstream.
3. The method of claim 2, wherein the motion vector data comprises motion vector difference information and/or predictor polarity selection.
4. The method of claim 1, wherein the macroblock has four luma motion vectors corresponding to four luma blocks, and wherein the first information consists of four motion vector data presence indicators.
5. The method of claim 1, wherein the macroblock has two luma motion vectors, and wherein the first information consists of two motion vector data presence indicators.
6. The method of claim 1, further comprising signaling a table selection code in the bitstream that indicates which of a plurality of variable length code tables is to be used to determine the first variable length code.
7. The method of claim 6, wherein the table selection code is signaled at a picture level or a slice level.
8. The method of claim 6, wherein the table selection code is a fixed-length code.
9. A computer-implemented method for encoding video information, the method comprising:
For a macroblock having a first number of luma motion vectors, wherein the first number is greater than one, determining a motion vector block pattern, the motion vector block pattern consisting of a second number of bits, wherein the second number is equal to the first number, and wherein a bit in each of the motion vector block patterns indicates whether a respective one of the luma motion vectors has associated motion vector data signaled in the bitstream, wherein a bit value of 0 indicates that no motion vector data is signaled for a corresponding luma motion vector in the luma motion vectors, and a bit value of 1 indicates that motion vector data is signaled for a corresponding luma motion vector in the luma motion vectors;
signaling the motion vector block pattern in the bitstream, wherein the motion vector block pattern is signaled as a variable length code; and
for each of the luma motion vectors whose associated motion vector data is indicated to be signaled in the bitstream, signaling the associated motion vector data in the bitstream.
10. The method of claim 9, further comprising:
determining a coded block pattern that indicates which of a plurality of blocks of the macroblock have associated transform coefficient data signaled in the bitstream.
11. The method of claim 9, wherein the associated motion vector data comprises motion vector difference information.
12. The method of claim 11, wherein the associated motion vector data further comprises predictor polarity selection.
13. The method of claim 9, wherein the macroblock has four luma motion vectors for four luma blocks of the macroblock.
14. The method of claim 9, wherein the macroblock has two luma motion vectors for a top field and a bottom field of the macroblock, respectively.
15. The method of claim 9, wherein the macroblock has four luma motion vectors for left and right halves of a top field and a bottom field of the macroblock, respectively.
16. The method of claim 9, further comprising signaling a table selection code in the bitstream, the table selection code indicating which of a plurality of variable length code tables is to be used for determining the motion vector block pattern.
17. A video decoder, comprising:
means for decoding a plurality of variable length codes representing a plurality of motion vector block patterns, wherein each of the plurality of motion vector block patterns has one bit for each respective luma motion vector of a macroblock having a plurality of luma motion vectors, the one bit indicating whether motion vector data for the respective luma motion vector is signaled, wherein a bit value of 0 indicates that motion vector data is not signaled for the corresponding luma motion vector and a bit value of 1 indicates that motion vector data is signaled for the corresponding luma motion vector; and
means for decoding the motion vector data.
18. The video decoder of claim 17, further comprising means for selecting a variable length code table from a plurality of available variable length code tables for decoding the plurality of variable length codes representing the plurality of motion vector block patterns.
19. A computer-implemented method for decoding video information, the method comprising:
receiving a first variable length code in a bitstream;
decoding the first variable length code, the first variable length code representing first information for a macroblock having a plurality of luma motion vectors, wherein the first information includes a motion vector data presence indicator for each luma motion vector of the macroblock, wherein each motion vector data presence indicator is represented by a corresponding bit in a bit field, wherein a bit value of 0 indicates that no motion vector data is signaled for a corresponding luma motion vector of the plurality of luma motion vectors and a bit value of 1 indicates that motion vector data is signaled for a corresponding luma motion vector of the plurality of luma motion vectors;
Receiving a second variable length code in the bitstream; and
decoding the second variable length code, the second variable length code representing second information for the macroblock, wherein the second information includes a plurality of transform coefficient data presence indicators for a plurality of blocks of the macroblock.
20. The method of claim 19, further comprising:
for each of the plurality of luma motion vectors whose motion vector data is indicated by the first information to be present, receiving its motion vector data in the bitstream.
21. The method according to claim 20, wherein the motion vector data comprises motion vector difference information and/or predictor polarity selection.
22. The method of claim 19, wherein the macroblock has four luma motion vectors corresponding to four luma blocks, and wherein the first information consists of four motion vector data presence indicators.
23. The method of claim 19, wherein the macroblock has two luma motion vectors, and wherein the first information consists of two motion vector data presence indicators.
24. The method of claim 19, further comprising: receiving, in the bitstream, a table selection code that indicates which of a plurality of variable length code tables is to be used for decoding the first variable length code.
25. The method of claim 24, wherein the table selection code is signaled at a picture level or a slice level.
26. The method of claim 24, wherein the table selection code is a fixed-length code.
27. A computer-implemented method for decoding video information, the method comprising:
for a macroblock having a first number of luma motion vectors, wherein the first number is greater than one, receiving a motion vector block pattern in the bitstream, the motion vector block pattern consisting of a second number of bits, wherein the second number is equal to the first number, and wherein each bit of the motion vector block pattern indicates whether a respective one of the luma motion vectors has associated motion vector data signaled in the bitstream, a bit value of 0 indicating that no motion vector data is signaled for the corresponding luma motion vector and a bit value of 1 indicating that motion vector data is signaled for the corresponding luma motion vector, and wherein the motion vector block pattern is received as a variable length code; and
for each of the luma motion vectors whose associated motion vector data is indicated to be signaled in the bitstream, receiving the associated motion vector data in the bitstream.
28. The method of claim 27, further comprising:
receiving a coded block pattern in the bitstream, the coded block pattern indicating which of a plurality of blocks of the macroblock have associated transform coefficient data signaled in the bitstream.
29. The method of claim 27, wherein the associated motion vector data comprises motion vector difference information.
30. The method of claim 29, wherein the associated motion vector data further comprises predictor polarity selection.
31. The method of claim 27, wherein the macroblock has four luma motion vectors for four luma blocks of the macroblock.
32. The method of claim 27, wherein the macroblock has two luma motion vectors for a top field and a bottom field of the macroblock, respectively.
33. The method of claim 27, wherein the macroblock has four luma motion vectors for left and right halves of a top field and a bottom field of the macroblock, respectively.
34. The method of claim 27, further comprising receiving a table selection code in the bitstream, the table selection code indicating which of a plurality of variable length code tables is to be used for decoding the motion vector block pattern.
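The motion vector block pattern signaling recited in claims 9 and 27 can be illustrated with a small sketch. This is a hypothetical example, not an implementation from the patent: the one-bit/five-bit prefix code below stands in for the selectable variable length code tables (a real codec would choose among several tables via the table selection code), and all function names are invented for this illustration.

```python
def make_pattern(present):
    """Pack per-motion-vector data-present flags (first flag = MSB) into a bit pattern.

    A 1 bit means motion vector data is signaled for that luma motion vector;
    a 0 bit means no motion vector data is signaled for it.
    """
    pattern = 0
    for flag in present:
        pattern = (pattern << 1) | (1 if flag else 0)
    return pattern

def encode_mvbp(present):
    """Encode the motion vector block pattern for a 4-MV macroblock as a
    toy prefix-free variable length code (assumed, not the patent's tables):
    the all-zero pattern (no MV data at all) gets a 1-bit code; every other
    pattern gets an escape bit plus the raw 4-bit pattern."""
    pattern = make_pattern(present)
    if pattern == 0:
        return "1"                       # 1-bit code for the common case
    return "0" + format(pattern, "04b")  # escape '0' + raw 4-bit pattern

def decode_mvbp(bits):
    """Decode one motion vector block pattern from a bit string.

    Returns (pattern, bits_consumed); the decoder would then read motion
    vector data for each luma motion vector whose pattern bit is 1.
    """
    if bits[0] == "1":
        return 0, 1
    return int(bits[1:5], 2), 5

# Example: luma blocks 0, 2, and 3 carry motion vector data.
code = encode_mvbp([True, False, True, True])
pattern, consumed = decode_mvbp(code)
print(code, bin(pattern), consumed)  # 01011 0b1011 5
```

For a field-coded macroblock with two luma motion vectors (claims 14 and 32), the same scheme would use a two-bit pattern. The design motivation mirrors the coded block pattern for transform coefficients: a single short code replaces per-vector flags, so the frequent "no motion vector data" cases cost very few bits.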
HK11101301.9A 2003-09-07 2011-02-10 Coding and decoding for interlaced video HK1147373B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US50108103P 2003-09-07 2003-09-07
US60/501,081 2003-09-07
US10/857,473 2004-05-27
US10/857,473 US7567617B2 (en) 2003-09-07 2004-05-27 Predicting motion vectors for fields of forward-predicted interlaced video frames
US10/933,958 US7599438B2 (en) 2003-09-07 2004-09-02 Motion vector block pattern coding and decoding
US10/933,958 2004-09-02

Publications (2)

Publication Number Publication Date
HK1147373A1 HK1147373A1 (en) 2011-08-05
HK1147373B true HK1147373B (en) 2013-03-22

Similar Documents

Publication Publication Date Title
CN101411195B (en) Encoding and decoding of interlaced video
US20050053295A1 (en) Chroma motion vector derivation for interlaced forward-predicted fields
US20050053144A1 (en) Selecting between dominant and non-dominant motion vector predictor polarities
US20050053134A1 (en) Number of reference fields for an interlaced forward-predicted field
KR101038794B1 (en) Coding and Decoding Interlaced Video
HK1147373B (en) Coding and decoding for interlaced video
HK1144989B (en) Coding and decoding for interlaced video
HK1149405B (en) Coding and decoding for interlaced video
HK1149657B (en) Coding and decoding for interlaced video
HK1150484B (en) Coding and decoding for interlaced video
HK1149658B (en) Coding and decoding for interlaced video