US20130235928A1 - Advanced coding techniques - Google Patents
- Publication number
- US20130235928A1 (application US13/652,311)
- Authority
- US
- United States
- Prior art keywords
- coding
- frames
- video sequence
- frame
- current frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, including:
- H04N19/577—Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
- H04N19/107—Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
- H04N19/117—Filters, e.g. for pre-processing or post-processing
- H04N19/124—Quantisation
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
- H04N19/149—Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
- H04N19/15—Data rate or code amount at the encoder output by monitoring actual compressed data size at the memory before deciding storage at the transmission buffer
- H04N19/154—Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
- H04N19/156—Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
- H04N19/172—Adaptive coding characterised by the coding unit, the unit being an image region such as a picture, frame or field
- H04N19/192—Adaptive coding in which the adaptation method, adaptation tool or adaptation type is iterative or recursive
- H04N19/58—Motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
- H04N19/46—Embedding additional information in the video signal during the compression process
Definitions
- a video coder may code a source video sequence into a coded representation that has a smaller bit rate than the source video and thereby may achieve data compression.
- the video coder may code processed video data according to any of a variety of different coding techniques to achieve compression.
- One common technique for data compression uses predictive coding techniques (e.g., temporal/motion predictive coding). For example, some frames in a video stream may be coded independently (I-frames) and some other frames (e.g., P-frames or B-frames) may be coded using other frames as reference frames.
- P-frames may be coded with reference to a single previously coded frame (called a “reference frame”) and B-frames may be coded with reference to a pair of previously coded reference frames, typically one reference frame that occurs prior to the B-frame in display order and another that occurs subsequent to the B-frame in display order.
- the resulting compressed sequence (bit stream) may be transmitted to a decoder via a channel.
- the bit stream may be decompressed at the decoder by inverting the coding processes performed by the coder, yielding a recovered video sequence.
- a video coder may need to achieve a particular target compression ratio based on factors such as network bandwidth.
- certain frames of a video sequence may be coded at higher compression than other frames in the video sequence.
- the higher the compression, the lower the resulting image quality. Consequently, frames coded with relatively high compression may have a lower visual quality than adjacent frames, leading to sudden changes in visual quality in the video sequence. Therefore, designers of video coding systems endeavor to provide coding systems that maintain smooth transitions in the visual quality of video.
- FIG. 1 is a simplified block diagram of a video coding system according to an embodiment of the present invention.
- FIG. 2 is a functional block diagram of a video coding system according to an embodiment of the present invention.
- FIG. 3 is a simplified block diagram of a video coding system of another embodiment of the present invention.
- FIG. 4 illustrates a method to iteratively code a video sequence to achieve a target bitrate according to an embodiment of the present invention.
- FIG. 5 illustrates a method to code a video sequence based on information generated on a previous coding pass according to an embodiment.
- FIG. 6 illustrates a method to estimate a file size of a video sequence according to an embodiment.
- FIG. 7 illustrates a video sequence including a selected random access picture (RAP) frame, according to an embodiment.
- FIG. 8 illustrates a method to select a RAP frame according to an embodiment.
- Embodiments of the present invention provide techniques for efficiently coding/decoding video data during circumstances when constraints are imposed on the video data.
- coding parameters of a video sequence may be selected based on a target bit rate.
- the video sequence may be predictively coded based on the parameters. If the target bit rate is not achieved, regions of the video sequence with high bit rates may be identified, a filtering strength applied to the identified regions may be increased, and the video sequence may be predictively coded with the increased filtering strength.
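The select-code-check loop described above can be sketched in outline. This is a hypothetical illustration: the function names and the simple model in which a frame's bit cost shrinks as its pre-filter strength grows are assumptions, not the patent's implementation.

```python
def code_sequence(frames, filter_strength):
    # Stand-in for a real coding pass: per-frame bit cost falls as the
    # pre-filter strength applied to that frame rises (assumed model).
    return [max(1, int(c / (1 + filter_strength[i])))
            for i, c in enumerate(frames)]

def iterative_code(frames, target_bits, max_passes=10):
    strength = [0.0] * len(frames)            # initial filtering parameters
    sizes = code_sequence(frames, strength)
    for _ in range(max_passes):
        sizes = code_sequence(frames, strength)
        if sum(sizes) <= target_bits:         # target bit rate achieved
            break
        avg = sum(sizes) / len(sizes)
        for i, s in enumerate(sizes):         # regions with high bit rates
            if s > avg:
                strength[i] += 0.5            # increase filtering strength there
    return sizes, strength

sizes, strength = iterative_code([100, 400, 120], target_bits=400)
```

With these toy inputs, only the middle frame is repeatedly flagged as a high-bit-rate region, so only its filtering strength is raised until the total meets the target.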
- a video sequence on a first coding pass, may be coded based on a first set of coding parameters. Values of a characteristic of frames from the video sequence may be stored during the first coding pass. Frames which violate a constraint imposed on the characteristic based on the stored values may be identified. Target characteristic values for the frames from the video sequence may be determined. The target characteristic values may be lower than the constraint. A second set of coding parameters to achieve the target characteristic values may be computed. On a second pass, the video sequence may be coded based on the second set of coding parameters.
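The two-pass scheme above, taking bits per frame as the stored characteristic and a per-frame bit cap as the constraint, might look like the following sketch. The inverse-linear bits-versus-QP model and all names are illustrative assumptions, not the patent's method.

```python
def first_pass(complexities, qp):
    # Pass 1: code with one QP, storing per-frame bit counts
    # (bits ~ complexity / QP is an assumed model).
    return [c // qp for c in complexities]

def second_pass_qps(bits, cap, base_qp, margin=0.9):
    # Identify frames whose bit count violates the cap, then compute QPs
    # targeting a characteristic value below the constraint.
    qps = []
    for b in bits:
        if b > cap:
            target = margin * cap             # target below the constraint
            qps.append(base_qp * b / target)  # invert bits ~ k / QP
        else:
            qps.append(base_qp)
    return qps

complexities = [1000, 5000, 1200]
bits1 = first_pass(complexities, qp=10)            # stored on pass 1
qps = second_pass_qps(bits1, cap=300, base_qp=10)  # second parameter set
bits2 = [int(c / q) for c, q in zip(complexities, qps)]
```

Only the middle frame violates the cap on the first pass, so only its QP is raised for the second pass.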
- perceptual model values may be determined from a video sequence.
- An index into a matrix may be computed based on the perceptual model values.
- the matrix may store associations between parameter range(s) and file sizes.
- a file size may be retrieved from the matrix corresponding to the computed index.
- the video sequence may be predictively coded with parameters associated with the file size.
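One way to read the matrix-lookup idea above is as a quantized table keyed by an average perceptual (masking) score. The bounds, file sizes, and names below are invented placeholder values for illustration only.

```python
import bisect

# Assumed table: upper bounds on an average masking score, and the expected
# coded file size (kB) for parameters tuned to that masking level.
MASKING_BOUNDS = [0.2, 0.5, 0.8, 1.0]
FILE_SIZES_KB = [900, 600, 400, 250]   # stronger masking -> smaller file

def estimate_file_size(perceptual_values):
    avg = sum(perceptual_values) / len(perceptual_values)
    idx = bisect.bisect_left(MASKING_BOUNDS, avg)   # index into the matrix
    idx = min(idx, len(FILE_SIZES_KB) - 1)
    return FILE_SIZES_KB[idx]
```

For example, frames with masking scores around 0.7 land in the third bucket of this toy table.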
- a motion compensated error energy of a current frame of a video sequence may be computed.
- a weighted average motion compensated error energy of frames successive to the current frame in coding order may be computed. If the motion compensated error energy of the current frame is higher than the weighted average motion compensated error energy of the successive frames by a first factor, a weighted average motion compensated error energy of the current frame and frames adjacent to the current frame may be computed. If the weighted average motion compensated error energy of the current frame and the adjacent frames is higher than the weighted average motion compensated error energy of the successive frames by a second factor, the current frame may be marked as a random access picture (RAP) frame. The current frame may be predictively coded.
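The RAP-selection test above can be sketched as a small predicate. The lookahead window, the weighting scheme, and the two factors are assumed values, and `energies` stands in for per-frame motion compensated error energy.

```python
def weighted_avg(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def is_rap(energies, i, factor1=2.0, factor2=1.5, lookahead=4):
    succ = energies[i + 1 : i + 1 + lookahead]   # successors in coding order
    if not succ:
        return False
    w = list(range(len(succ), 0, -1))            # nearer frames weigh more (assumed)
    succ_avg = weighted_avg(succ, w)
    if energies[i] <= factor1 * succ_avg:        # current frame is not a spike
        return False
    neigh = energies[max(0, i - 1) : i + 2]      # current frame and its neighbors
    neigh_avg = weighted_avg(neigh, [1.0] * len(neigh))
    return neigh_avg > factor2 * succ_avg        # sustained change -> mark as RAP
```

In the toy sequence below, the spike at index 2 (e.g., a scene change) is flagged while its quiet predecessor is not.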
- a frame from a video sequence may be marked as a delayed decoder refresh frame.
- Frames successive to the delayed decoder refresh frame in coding order may be predictively coded without reference to frames preceding the delayed decoder refresh frame in coding order.
- the distance between the delayed decoder refresh frame and the successive frames may exceed a distance threshold.
- frames successive to a current frame in decoding order may be decoded without reference to frames preceding the current frame in decoding order.
- the distance between the current frame and the successive frames may exceed a distance threshold.
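The reference restriction around a delayed decoder refresh (DDR) frame described above can be expressed as a small check. The function and its argument names are hypothetical, introduced only for illustration.

```python
def allowed_references(frame_idx, ddr_idx, distance):
    # Candidate references: any previously coded frame.
    refs = set(range(frame_idx))
    # Once the frame is at least `distance` frames past the DDR frame, it
    # may no longer reference anything preceding the DDR frame.
    if frame_idx - ddr_idx >= distance:
        refs = {r for r in refs if r >= ddr_idx}
    return refs
```

With the DDR frame at index 5 and a distance threshold of 2, frame 6 may still reach back before the refresh point, but frame 7 may not.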
- FIG. 1 is a simplified block diagram of a video coding system 100 according to an embodiment of the present invention.
- the system 100 may include at least two terminals 110 - 120 interconnected via a network 150 .
- a first terminal 110 may code video data at a local location for transmission to the other terminal 120 via the network 150 .
- the second terminal 120 may receive the coded video data of the other terminal from the network 150 , decode the coded data and display the recovered video data.
- Unidirectional data transmission is common in media serving applications and the like.
- FIG. 1 illustrates a second pair of terminals 130 , 140 provided to support bidirectional transmission of coded video that may occur, for example, during videoconferencing.
- each terminal 130 , 140 may code video data captured at a local location for transmission to the other terminal via the network 150 .
- Each terminal 130 , 140 also may receive the coded video data transmitted by the other terminal, may decode the coded data and may display the recovered video data at a local display device.
- the terminals 110 - 140 are illustrated as servers, personal computers and smart phones but the principles of the present invention are not so limited. Embodiments of the present invention find application with laptop computers, tablet computers, media players and/or dedicated video conferencing equipment.
- the network 150 represents any number of networks that convey coded video data among the terminals 110 - 140 , including, for example, wireline and/or wireless communication networks.
- the communication network 150 may exchange data in circuit-switched and/or packet-switched channels.
- Representative networks include telecommunications networks, local area networks, wide area networks and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network 150 are immaterial to the operation of the present invention unless explained hereinbelow.
- FIG. 2 is a functional block diagram of a video coding system 200 according to an embodiment of the present invention.
- the system 200 may include a video source 210 that provides video data to be coded by the system 200 , a pre-processor 220 , a video coder 230 , a transmitter 240 and a controller 250 to manage operation of the system 200 .
- the video source 210 may provide video to be coded by the rest of the system 200 .
- the video source 210 may be a storage device storing previously prepared video.
- the video source 210 may be a camera that captures local image information as a video sequence.
- Video data typically is provided as a plurality of individual frames that impart motion when viewed in sequence. The frames themselves typically are organized as a spatial array of pixels.
- the pre-processor 220 may perform various analytical and signal conditioning operations on video data.
- the pre-processor 220 may parse input frames into color components (for example, luminance and chrominance components) and also may parse the frames into pixel blocks, spatial arrays of pixel data, which may form the basis of further coding.
- the pre-processor 220 also may apply various filtering operations to the frame data to improve efficiency of coding operations applied by a video coder 230 .
- the video coder 230 may perform coding operations on the video sequence to reduce the video sequence's bit rate.
- the video coder 230 may include a coding engine 232 , a local decoder 233 , a reference picture cache 234 , a predictor 235 and a controller 236 .
- the coding engine 232 may code the input video data by exploiting temporal and spatial redundancies in the video data and may generate a datastream of coded video data, which typically has a reduced bit rate as compared to the datastream of source video data.
- the video coder 230 may perform motion compensated predictive coding, which codes an input frame predictively with reference to one or more previously-coded frames from the video sequence that were designated as “reference frames.” In this manner, the coding engine 232 codes differences between pixel blocks of an input frame and pixel blocks of reference frame(s) that are selected as prediction reference(s) to the input frame.
- a video coder 230 may be provided as a multi-pass coder in which portions of the video sequence may be coded iteratively in a trial-and-error manner using different coding parameters to achieve a target bit rate.
- the target bit rate represents a number of bits per unit time. Results of one coding pass may be exchanged with the pre-processor 220 to improve results of a subsequent coding pass.
- the local decoder 233 may decode coded video data of frames that are designated as reference frames. Operations of the coding engine 232 typically are lossy processes. When the coded video data is decoded at a video decoder (not shown in FIG. 2 ), the recovered video sequence typically is a replica of the source video sequence with some errors.
- the local decoder 233 replicates decoding processes that will be performed by the video decoder on reference frames and may cause reconstructed reference frames to be stored in the reference picture cache 234 . In this manner, the system 200 may store local copies of reconstructed reference frames whose content matches (absent transmission errors) the reconstructed reference frames that will be obtained by a far-end video decoder.
- the predictor 235 may perform prediction searches for the coding engine 232 . That is, for a new frame to be coded, the predictor 235 may search the reference picture cache 234 for image data that may serve as an appropriate prediction reference for the new frame. The predictor 235 may operate on a pixel block-by-pixel block basis to find appropriate prediction references. In some cases, as determined by search results obtained by the predictor 235 , an input frame may have prediction references drawn from multiple frames stored in the reference picture cache 234 .
- the controller 236 may manage coding operations of the video coder 230 , including, for example, selection of coding parameters to meet a target bit rate of coded video.
- video coders operate according to constraints imposed by bit rate requirements, quality requirements and/or error resiliency policies; the controller 236 may select coding parameters for frames of the video sequence in order to meet these constraints.
- the controller 236 may assign coding modes and/or quantization parameters to frames and/or pixel blocks within frames.
- the transmitter 240 may buffer coded video data to prepare it for transmission to the far-end terminal (not shown).
- the transmitter 240 may merge coded video data from the video coder 230 with other data to be transmitted to the terminal, for example, coded audio data and/or ancillary data streams (sources not shown).
- the controller 250 may manage operation of the system 200 .
- the controller 250 may assign to each frame a certain frame type (either of its own accord or in cooperation with the controller 236 ), which can affect the coding techniques that are applied to the respective frame. For example, frames often are assigned as one of the following frame types: an intra frame (I frame), coded without reference to any other frame; a predictive frame (P frame), coded with reference to at most one previously coded frame; and a bidirectionally predictive frame (B frame), coded with reference to up to two previously coded frames.
- Frames commonly are parsed spatially into a plurality of pixel blocks (for example, blocks of 4×4, 8×8 or 16×16 pixels each) and coded on a pixel block-by-pixel block basis.
- Pixel blocks may be coded predictively with reference to other coded pixel blocks as determined by the coding assignment applied to the pixel blocks' respective frames.
- pixel blocks of I frames can be coded non-predictively or they may be coded predictively with reference to pixel blocks of the same frame (spatial prediction).
- Pixel blocks of P frames may be coded non-predictively, via spatial prediction or via temporal prediction with reference to one previously coded reference frame.
- Pixel blocks of B frames may be coded non-predictively, via spatial prediction or via temporal prediction with reference to one or two previously coded reference frames.
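The per-frame-type prediction rules in the three bullets above can be captured in a small table; the mode labels are illustrative shorthand, not codec syntax.

```python
# Which prediction modes each frame type permits for its pixel blocks
# (per the description above; labels are invented for this sketch).
ALLOWED_MODES = {
    "I": {"non-predictive", "spatial"},
    "P": {"non-predictive", "spatial", "temporal-1ref"},
    "B": {"non-predictive", "spatial", "temporal-1ref", "temporal-2ref"},
}

def mode_is_valid(frame_type, mode):
    return mode in ALLOWED_MODES[frame_type]
```

For instance, bidirectional (two-reference) temporal prediction is valid only within B frames.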
- FIG. 3 is a simplified block diagram of a video coding system 300 of another embodiment of the present invention, illustrating the operation of pixel-block coding operations.
- the system 300 may include a pre-processor 310 , a block-based coder 320 , a reference frame decoder 330 , a reference picture cache 340 , a predictor 350 , a transmit buffer 360 and a controller 370 .
- the block-based coder 320 may include a subtractor 321 , a transform unit 322 , a quantizer 323 and an entropy coder 324 .
- the subtractor 321 may generate data representing a difference between the source pixel block and a reference pixel block developed for prediction.
- the subtractor 321 may operate on a pixel-by-pixel basis, developing residuals at each pixel position over the pixel block.
- Non-predictively coded blocks may be coded without comparison to reference pixel blocks, in which case the pixel residuals are the same as the source pixel data.
- the coder 320 may be provided as a multi-pass coder in which portions of the video sequence may be coded iteratively in a trial-and-error manner using different coding parameters to achieve a target bit rate. Results of one coding pass may be exchanged with the pre-processor 310 to improve results of a subsequent coding pass.
- the transform unit 322 may convert the source pixel block data to an array of transform coefficients, such as by a discrete cosine transform (DCT) process or a wavelet transform.
- the quantizer unit 323 may quantize (divide) the transform coefficients obtained from the transform unit 322 by a quantization parameter QP.
- the entropy coder 324 may code quantized coefficient data by run-value coding, run-length coding or the like. Data from the entropy coder may be output to the channel as coded video data of the pixel block.
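The subtract-transform-quantize-entropy path of the block-based coder 320 can be illustrated end to end on toy data. A real coder applies a 2-D transform to pixel blocks; here a tiny unnormalized 1-D DCT-II keeps the sketch self-contained, and the run-length scheme is a simplification of the entropy coder.

```python
import math

def dct_1d(x):
    # Unnormalized 1-D DCT-II (a real coder uses a 2-D block transform).
    n = len(x)
    return [sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, v in enumerate(x)) for k in range(n)]

def quantize(coeffs, qp):
    return [round(c / qp) for c in coeffs]     # divide by quantization parameter

def run_length(levels):
    out, run = [], 0
    for v in levels:
        if v == 0:
            run += 1
        else:
            out.append((run, v))               # (preceding zero run, value)
            run = 0
    return out

source = [52, 55, 61, 66]
reference = [50, 54, 60, 65]                   # predictor's reference data
residual = [s - r for s, r in zip(source, reference)]  # subtractor 321
coded = run_length(quantize(dct_1d(residual), qp=2))   # units 322-324
```

The small residual concentrates its energy in the lowest-frequency coefficient, so after quantization only one nonzero value survives to be entropy coded.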
- the reference frame decoder 330 may decode pixel blocks of reference frames and assemble decoded data for such reference frames. Decoded reference frames may be stored in the reference picture cache 340 .
- the predictor 350 may generate and output prediction blocks to the subtractor 321 .
- the predictor 350 also may output metadata identifying type(s) of predictions performed.
- the predictor 350 may search among the reference picture cache for pixel block data of previously coded and decoded frames that exhibits strong correlation with the source pixel block.
- the predictor 350 finds an appropriate prediction reference for the source pixel block, it may generate motion vector data that is output to the decoder as part of the coded video data stream.
- the predictor 350 may retrieve a reference pixel block from the reference cache that corresponds to the motion vector and may output it to the subtractor 321 .
- the predictor 350 may search among the previously coded and decoded pixel blocks of the same frame being coded for pixel block data that exhibits strong correlation with the source pixel block. Operation of the predictor 350 may be constrained by a mode selection provided by the controller 370 . For example, if the controller selects an intra-coding mode for application to a frame, the predictor 350 will be constrained to use intra-coding techniques. If the controller selects an inter-prediction mode for the frame, the predictor may select among inter-coding modes and intra-coding modes depending upon results of its searches.
- a transmit buffer 360 may accumulate metadata representing pixel block coding order, coded pixel block data and metadata representing coding parameters applied to the coded pixel blocks.
- the metadata can include prediction modes, motion vectors and quantization parameters applied during coding. Accumulated data may be formatted and transmitted to the channel.
- a controller 370 may manage coding of the source video, including selection of a coding mode for use by the predictor 350 and selection of quantization parameters to be applied to pixel blocks.
- FIG. 4 illustrates a method 400 to iteratively code a video sequence to achieve a target bitrate according to an embodiment of the present invention.
- a pre-processor may filter a portion of a source video sequence according to an initial set of coding parameters (box 410 ).
- a video coder may select coding parameters for the sequence portion based on an estimate of the target bit rate (box 420 ).
- the video coder may code the sequence portion according to the coding parameters (box 430 ).
- the video coder may then determine whether the coded video data obtained from coding satisfies the target bit rate (box 440 ).
- If the target bit rate is satisfied, coding of the current portion of the sequence may conclude (box 490 ), and the coding operation may advance to another portion of the sequence, if available. Otherwise, the video coder may identify regions of the coded sequence that generate high bit rates (box 450 ). In response to the identified regions, the pre-processor may increase the filtering strengths applied to those regions in the source data (box 460 ) and operation may return to box 420 for another coding pass.
- the initial set of coding parameters utilized by the pre-processor may be predetermined, may be a default set of parameters, or they may be derived from an analysis of the source video performed by the pre-processor and related controls.
- the coding parameters selected by the video coder may involve selections of prediction mode for frames within the sequence portion and quantization parameters applied to pixel blocks within the frames.
- Filtering and coding operations performed by method 400 may vary among different regions of the video sequence. Typically, due to correlation among frames, the identified regions will persist across a common spatial region of multiple frames in a portion of the video sequence being coded. Accordingly, recursive operations of the video coder and pre-processor may be performed on a single frame of the video or they may be performed on a set of several frames (say, 10 frames) of the video sequence.
- the video coder may adjust target bit rates of frames that contain such regions at the expense of frames that do not (box 470 ).
- the video coder may select coding parameters corresponding to each frame's target bit rate. For example, the video coder may select relatively higher quantization parameters for frames that had high bit rate regions in prior passes, which tends to reduce the bit rates of such frames at the expense of lower coding quality.
- the sequence may be governed by a coding policy that imposes constraints on certain characteristics of the video sequence. For example, a constraint may limit a bit rate over the video sequence. Another constraint may limit the bit rate over a fixed window of frames. In another example, a constraint may define a minimum threshold on the visual quality of a window of frames.
- the video coder may adjust coding parameters such as a quantization parameter (QP) and mode selection. Specifically, the video coder may react to constraint breakages as they occur and manage the parameter through models which map QPs to characteristic levels.
- For constraints on data rate, a QP-to-bits mapping model may be utilized; for constraints on visual quality, a QP-to-peak-signal-to-noise-ratio (QP-to-PSNR) mapping model may be utilized.
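A QP-to-bits model of this kind can be very simple. The sketch below assumes the common H.264-style behavior in which the quantizer step doubles every 6 QP, so coded size roughly halves; the exact relationship in a real coder is content-dependent and would be fit adaptively:

```python
import math

def qp_to_bits(qp, ref_qp, ref_bits):
    """Predicted coded size at `qp`, given an observed (ref_qp, ref_bits)
    point from a previous pass.  Hypothetical halve-every-6-QP model."""
    return ref_bits * 2.0 ** ((ref_qp - qp) / 6.0)

def qp_for_target_bits(target_bits, ref_qp, ref_bits):
    """Invert the model to pick a QP expected to hit `target_bits`."""
    return ref_qp + 6.0 * math.log2(ref_bits / target_bits)
```

Reacting to a constraint breakage then reduces to one model inversion instead of a blind re-encode at a guessed QP.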
- FIG. 5 illustrates a method 500 to code a video sequence based on information generated on a previous coding pass according to an embodiment.
- a video coder may code, on a first coding pass, a video sequence based on an initial set of coding parameters (box 510 ).
- a controller controlling the operations of the video coder may compute revised parameters for the video coder based on the first coding pass (box 520 ).
- the video coder may then re-code the video sequence again utilizing the revised parameters (box 530 ).
- attributes of frames affecting the coding parameters may be stored. Based on the stored attributes, frames which violate constraints may be identified. Then, the coding parameters for a subsequent coding iteration may be adjusted so that parameter changes from one frame to another are gradual, while simultaneously ensuring that constraints are not violated for any of the frames in the video sequence. For example, when a portion of the video sequence that violates one or more constraints is identified, a window of support may be developed to the beginning of a scene in which the constraint is violated when shaping a characteristic curve of the frames in that window. Thus, coding parameters may be adjusted smoothly to avoid sudden changes in visual quality within a scene.
- the characteristic values resulting from the first coding may be stored and transformed to form a list (called, for example, “targetCharacteristicCurveArray”) of desired characteristic values that satisfy the constraint.
- a second coding of the sequence of frames may generate (within a tolerance) the desired characteristic values as stored in targetCharacteristicCurveArray.
- the values in targetCharacteristicCurveArray for coded frames may be updated with the actual characteristic values, and values in targetCharacteristicCurveArray for future frames may be adjusted to reflect the actual values that are generated.
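The second-pass bookkeeping described above may be sketched as follows. The name targetCharacteristicCurveArray comes from the text; the redistribution policy (spread any surplus or deficit evenly over future frames) is an assumption, since the text does not fix one:

```python
def update_target_curve(curve, i, actual):
    """Record the characteristic value actually produced for frame i
    and adjust the entries for future frames so the sequence-level
    target is preserved (`curve` plays the role of
    targetCharacteristicCurveArray)."""
    surplus = curve[i] - actual          # positive: frame came in under target
    curve[i] = actual
    remaining = len(curve) - (i + 1)
    if remaining:
        share = surplus / remaining
        for j in range(i + 1, len(curve)):
            curve[j] += share            # hand the surplus to future frames
    return curve
```

Because the total across the curve is invariant under the update, the sequence-level constraint checked against the curve stays satisfied as actual values replace targets.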
- a data rate constraint may be expressed as a pair of values: maximum data rate and fixed length window of presentation times over which to compute the data rate.
- a data rate constraint may be inferred from a model such as the Hypothetical Reference Decoder.
- a visual quality constraint may impose a minimum visual quality (given some metric, such as PSNR) over a specified set of frames.
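The two constraint types above can be checked mechanically. This sketch (names hypothetical) flags frames involved in a windowed data-rate violation or falling below a PSNR floor:

```python
def constraint_violations(frame_bits, frame_psnr,
                          max_window_bits, window, min_psnr):
    """Return indices of frames that participate in a violation of
    either constraint: a data-rate cap over every fixed-length window
    of frames, or a per-frame minimum visual quality (PSNR)."""
    bad = set()
    for i in range(len(frame_bits) - window + 1):
        if sum(frame_bits[i:i + window]) > max_window_bits:
            bad.update(range(i, i + window))        # whole window implicated
    bad.update(i for i, q in enumerate(frame_psnr) if q < min_psnr)
    return sorted(bad)
```

The output is exactly the set of frames whose coding parameters a subsequent pass would revisit.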
- a constraint may be the decoder complexity required to decode a video sequence. Another constraint may be the amount of heat dissipated by a decoder while decoding a video sequence (decoder thermal generation). A constraint may be the amount of energy utilized by a decoder to decode a video sequence (decoder power usage or battery drainage). Another constraint may be the visual quality of a video sequence in dark scenes. Still another constraint may be the quality degradation through visual masking.
- the set of frames over which a constraint is imposed need not be successive frames and can be imposed over all frames with a perceptual model score in a particular range, all frames with average luma in some range, etc.
- the perceptual model score may be based on the visual quality and/or video complexity of a video sequence.
- the perceptual model score may be computed from spatial, temporal, and spatiotemporal visual masking values derived by a pre-processor.
- the transformation applied to characteristic values resulting from the first coding pass may include scaling the values by a fixed constant. For example, for a data rate constraint, if a set of frames totals A bits and is greater than a specified maximum of B bits, individual frame sizes can be scaled by B/A. Further, the scaling factor applied to each frame may be modulated by something as simple as the relative size of the frame; it may also be modulated by the perceptual significance of the frame.
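The B/A scaling above can be sketched directly; the perceptually modulated variant shown (less significant frames absorb more of the excess) is one possible scheme, not the patent's:

```python
def scale_frame_sizes(frame_bits, max_bits, significance=None):
    """Uniform B/A scaling of per-frame sizes, optionally modulated by
    hypothetical per-frame perceptual significance weights in [0, 1]."""
    A = float(sum(frame_bits))
    if A <= max_bits:
        return list(frame_bits)                     # constraint already met
    if significance is None:
        return [b * max_bits / A for b in frame_bits]   # scale by B/A
    # Remove the excess A - B in proportion to (1 - significance) * size,
    # so perceptually important frames give up fewer of their bits.
    weights = [(1.0 - s) * b for s, b in zip(significance, frame_bits)]
    total_w = sum(weights)
    excess = A - max_bits
    return [b - excess * w / total_w
            for b, w in zip(frame_bits, weights)]
```

Both variants return targets that total exactly B, so the transformed curve satisfies the data rate constraint by construction.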
- the transformation may be computed jointly or sequentially. For example, a set of frames may violate both data rate and visual quality constraints.
- a sequential transformation may update the targetCharacteristicCurveArray based on the data rate constraint and then on the visual quality constraint.
- a joint transformation may update the targetCharacteristicCurveArray on both constraints simultaneously, for example, by targeting a weighted average of both measures.
- the method 500 may be performed on portions of a video sequence rather than the entirety of a video sequence and may operate at various granularities such as a sequence of frames belonging to a common scene, a group of dependent frames, short clips, single frames, slices, and coding units (pixel blocks).
- a number of coding strategies can be employed to minimize the number of encodes for any frame, including the use of an adaptive QP-to-characteristic model.
- QP-to-characteristic models may include QP-to-bits and QP-to-PSNR as explained above.
- analytical operations performed for curve shaping may be performed by a pre-processor.
- Management of the targetCharacteristicCurveArray and selection of coding parameters may be performed by controller(s) within the system.
- a video coder may receive a data rate value or a quality level as a control input along with a video sequence to determine the size of the output coded bitstream.
- the data rate may be specified without regard to the content of the video; likewise, often, the quality level may be specified without regard to the resulting data rate.
- FIG. 6 illustrates a method 600 to estimate a file size of a video sequence according to an embodiment.
- the method 600 may scan the video sequence and compute perceptual model values therefrom (box 610 ), based on, for example, spatial, temporal, and spatiotemporal visual masking values derived by a pre-processor from content of the video sequence. Based on the computed values, the method 600 may develop an index into a Data Rate/Quality matrix, taking into account the resolution and the frame rate of the video (box 620 ). The method 600 may then retrieve a file size estimate based on the information in the matrix (box 630 ).
- Perceptual model values may be developed in a variety of ways. For example, a single number may be distilled from a number of values. In an embodiment, a single weighted value may be computed from, for example, spatial, temporal, and spatiotemporal visual masking values derived by a pre-processor from content of the video sequence.
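Boxes 610-630 of method 600 can be sketched as a weighted distillation followed by a table lookup. All bin edges and matrix contents here are hypothetical stand-ins for training-derived data:

```python
import numpy as np

def estimate_file_size(masking_vals, weights, score_bins,
                       quality, quality_bins, matrix):
    """Distill spatial/temporal/spatiotemporal masking values into one
    weighted perceptual score (box 610), map the score and requested
    quality level to indices into a precomputed Data Rate/Quality
    matrix (box 620), and read out the stored file-size estimate
    (box 630)."""
    score = float(np.dot(weights, masking_vals))    # box 610
    row = int(np.digitize(score, score_bins))       # box 620
    col = int(np.digitize(quality, quality_bins))
    return matrix[row][col]                         # box 630
```

A real deployment would carry one matrix per (resolution, frame rate) pair, per the text; the lookup itself is unchanged.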
- the method 600 may be performed by a coder controller, which also may store the Data Rate/Quality matrix.
- the Data Rate/Quality matrix may store values representing optimum file sizes for video along a multi-dimensional parameter range.
- An example of a multi-dimensional parameter range may be a data rate range and a quality range, where the Data Rate/Quality matrix is dependent on the resolution, duration and frame rate of the video.
- the Data Rate/Quality matrix may also store file size values based on thermal output range, power utilization range, and decoder complexity range.
- file size values stored in the matrix may be derived from operation of similar coders on other training sequences having their own perceptual model values, resolution, duration and frame rate and the file sizes generated by those coders.
- the optimum file size may minimize the weighted sum of a quality degradation value and the resulting bit rate.
- the quality degradation value could include metrics for visual quality through a perceptual model over a noiseless channel and for a number of noisy channels.
- the optimum file size may minimize the weighted sum of the resulting decoder complexity and the resulting decoder power/thermal output.
- the coder may search for the optimum coding by sampling the specified Data Rate/Quality space.
- the method 600 may be performed on portions of a video sequence rather than the entirety of a video sequence and may operate at various granularities such as short clips, single frames, slices, and coding units (pixel blocks). For example, in the case of video capture, buffered frames can be processed to calculate the optimum file size.
- a quality level that is predefined or calculated based on the history can be used to avoid spending excessive bits in frames that already reach acceptable quality.
- the bits saved can be used in later frames that require more bits to reach the quality level, or not used at all such that the bit rate of the entire stream can be reduced.
- Random access pictures (RAPs) represent another coding constraint of a system.
- RAPs facilitate random access within a bitstream.
- When RAPs are placed where no natural scene change exists, they incur bit rate spikes and often cause sudden changes in the visual quality of a scene (visual flashes). Inserting RAPs in the middle of scenes may be inefficient because RAPs are often coded as Instantaneous Decoder Refresh (IDR) frames.
- An IDR is a type of I-frame that forces the decoder to refresh its state immediately, guaranteeing that no state prior (in decode order) to the IDR is necessary for decoding the IDR or any frame subsequent to it. This break in coding dependency causes the aforementioned bit rate spike in coding and visual flashes when the coded video is decoded and displayed.
- a frame may be identified as a RAP frame based on relative motion masking.
- FIG. 7 illustrates a video sequence 700 including a selected RAP frame 710 , according to an embodiment.
- Frame 710 from the video sequence 700 may be selected as a RAP frame if the frames 720 before frame 710 in coding order have a relatively high motion masking and the frames 730 after frame 710 in coding order have a relatively low motion masking.
- Motion masking may be computed as a weighted average over the video segment of motion compensated error energy (MCEE).
- the MCEE represents the amount of pixel changes between frames, along motion trajectories (motion vectors).
- the MCEE may be computed, for example, as a sum of absolute differences (SAD), sum of squared differences (SSD), etc., between successive frames in that video segment.
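The MCEE measures named above reduce to simple array arithmetic. In this sketch the motion-compensated prediction is taken as given (produced elsewhere by motion estimation), and the weighting scheme of the average is an assumption:

```python
import numpy as np

def mcee(frame, mc_prediction, metric="sad"):
    """Motion compensated error energy between a frame and its
    motion-compensated prediction, as SAD or SSD."""
    diff = frame.astype(np.int64) - mc_prediction.astype(np.int64)
    return float(np.abs(diff).sum() if metric == "sad"
                 else (diff * diff).sum())

def wmcee(mcees, weights):
    """Weighted average MCEE over the frames of a video segment."""
    return float(np.dot(weights, mcees) / np.sum(weights))
```

SAD is cheaper; SSD penalizes large localized changes more heavily, which can matter when deciding whether a segment truly masks motion.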
- the motion masking levels may be determined by a pre-processor and/or a controller.
- FIG. 8 illustrates a method 800 to select a RAP frame according to an embodiment.
- the method 800 may compute the MCEE of the current frame and determine whether it exceeds a threshold (box 810 ). If it does, the method 800 may compute a weighted average of motion compensated error energy (WMCEE) of the frames successive to the current frame (box 820 ). If the MCEE of the current frame exceeds the WMCEE of the successive frames by a first factor (box 830 ), the method 800 may compute a WMCEE of the current frame and frames adjacent to the current frame (box 840 ). If the WMCEE of the current frame and adjacent frames exceeds the WMCEE of the successive frames by a second factor (box 850 ), the current frame may be selected as a RAP frame. Otherwise, the method 800 may determine whether the next frame qualifies as a RAP frame.
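The decision chain of boxes 810-850 can be sketched as below. The threshold and the two factors are tunable values the text leaves open, so the numbers in the test are purely illustrative:

```python
def is_rap(mcee_cur, wmcee_around_cur, wmcee_after,
           threshold, factor1, factor2):
    """RAP test in the spirit of method 800: the candidate frame must
    itself show large motion-compensated change (box 810), must exceed
    the following frames' weighted average by factor1 (box 830), and
    the segment around it must exceed the following segment by
    factor2 (box 850)."""
    if mcee_cur <= threshold:                       # box 810
        return False
    if mcee_cur <= factor1 * wmcee_after:           # box 830
        return False
    return wmcee_around_cur > factor2 * wmcee_after # box 850
```

The pattern being detected is exactly the one in FIG. 7: high motion masking before the candidate, low motion masking after it, so the refresh cost is perceptually hidden.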
- If multiple frames within a video sequence are identified as potential RAP frames by method 800 and the potential RAP frames are within a proximity threshold of each other, not all of them need to be selected as RAP frames.
- the last potential RAP frame (in decoding order) can be selected as a RAP frame.
- a subsampling of potential RAP frames may be selected as RAP frames.
- the highest motion masking video segment within a video sequence may be identified and a selected RAP frame may be inserted into that video segment.
- RAPs within a scene may be re-used to reduce the bit rate overhead.
- a re-usable RAP frame may be defined as a “Delayed Decoder Refresh” (DDR) frame.
- DDR Delayed Decoder Refresh
- a DDR frame may not force an immediate state refresh at a decoder but guarantees that state information from frames prior to the DDR frame (in decode order) is not necessary to decode the DDR frame itself or to decode frames subsequent to the DDR frame (in decode order) that are more than a specified number N_delay of frames from the DDR frame.
- the DDR frame may be used as a reference frame for the frames immediately after it (in decode order) and as a RAP for the frames N_delay+1 frames after it (in decode order).
- multiple delays (say, X of them) may be associated with a single DDR frame to indicate that the DDR frame may be used as a RAP for X+1 video segments.
- the frame at the beginning (in decode order) of each such segment may include the appropriate delay value, N_delay.
- setting N_delay to 0 may indicate that the DDR is to be inserted immediately as an IDR frame in the associated video segment.
- information pertaining to the DDR frame, such as the number N_delay, may be signaled in the channel data 260 ( FIG. 2 ) as part of the syntax defining a DDR frame.
- a frame may specify the DDR frame which it needs as a reference frame. This may be done by giving each DDR frame an identifier which may be signaled in the bitstream using a specified number of bits.
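The DDR guarantee can be expressed as a simple reference-validity check (indices in decode order; the function and parameter names are hypothetical):

```python
def may_reference(frame_idx, ref_idx, ddr_idx, n_delay):
    """True if a frame may use ref_idx as a prediction reference
    without breaking the DDR guarantee: frames more than n_delay
    frames past the DDR must not depend on any state prior to the
    DDR, so a decoder that tunes in at the DDR can decode them.
    With n_delay == 0 the DDR behaves as an IDR."""
    if frame_idx > ddr_idx + n_delay:
        return ref_idx >= ddr_idx        # pre-DDR state forbidden
    return True                          # inside the delay window: allowed
```

An encoder would enforce this check when building reference lists; a decoder joining at the DDR simply skips output until the delay window has elapsed.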
- video coders are provided as electronic devices. They can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on personal computers, notebook computers or computer servers.
- decoders can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays and/or digital signal processors, or they can be embodied in computer programs that execute on personal computers, notebook computers or computer servers. Decoders commonly are packaged in consumer electronics devices, such as gaming systems, DVD players, portable media players and the like and they also can be packaged in consumer software applications such as video games, browser-based media players and the like.
Abstract
Embodiments of the present invention provide techniques for efficiently coding/decoding video data during circumstances when constraints are imposed on the video data. A frame from a video sequence may be marked as a delayed decoder refresh frame. Frames successive to the delayed decoder refresh frame in coding order may be predictively coded without reference to frames preceding the delayed decoder refresh frame in coding order. The distance between the delayed decoder refresh frame and the successive frames may exceed a distance threshold. Frames successive to a current frame in decoding order may be decoded without reference to frames preceding the current frame in decoding order. The distance between the current frame and the successive frames may exceed a distance threshold.
Description
- This application claims the benefit of priority afforded by provisional application Ser. No. 61/607,484, filed Mar. 6, 2012, entitled “Improvements in Video Preprocessors and Video Coders.”
- In video coder/decoder systems, a video coder may code a source video sequence into a coded representation that has a smaller bit rate than does the source video and, thereby may achieve data compression. The video coder may code processed video data according to any of a variety of different coding techniques to achieve compression. One common technique for data compression uses predictive coding techniques (e.g., temporal/motion predictive coding). For example, some frames in a video stream may be coded independently (I-frames) and some other frames (e.g., P-frames or B-frames) may be coded using other frames as reference frames. P-frames may be coded with reference to a single previously coded frame (called, a “reference frame”) and B-frames may be coded with reference to a pair of previously-coded reference frames, typically a reference frame that occurs prior to the B-frame in display order and another reference frame that occurs subsequently to the B-frame in display order. The resulting compressed sequence (bit stream) may be transmitted to a decoder via a channel. To recover the video data, the bit stream may be decompressed at the decoder by inverting the coding processes performed by the coder, yielding a recovered video sequence.
- A video coder may need to achieve a particular target compression ratio based on factors such as network bandwidth. Thus, certain frames of a video sequence may be coded with a higher compression than other frames in the video sequence. Typically, the higher the compression, the lower the resulting image quality. Consequently, the frames with relatively high compression may have a lower visual quality than adjacent frames, leading to sudden changes in visual quality in the video sequence. Therefore, designers of video coding systems endeavor to provide coding systems that maintain smooth transitions in the visual quality of video.
-
FIG. 1 is a simplified block diagram of a video coding system according to an embodiment of the present invention. -
FIG. 2 is a functional block diagram of a video coding system according to an embodiment of the present invention. -
FIG. 3 is a simplified block diagram of a video coding system of another embodiment of the present invention. -
FIG. 4 illustrates a method to iteratively code a video sequence to achieve a target bitrate according to an embodiment of the present invention. -
FIG. 5 illustrates a method to code a video sequence based on information generated on a previous coding pass according to an embodiment. -
FIG. 6 illustrates a method to estimate a file size of a video sequence according to an embodiment. -
FIG. 7 illustrates a video sequence including a selected random access picture (RAP) frame, according to an embodiment. -
FIG. 8 illustrates a method to select a RAP frame according to an embodiment. - Embodiments of the present invention provide techniques for efficiently coding/decoding video data during circumstances when constraints are imposed on the video data. According to the embodiments, coding parameters of a video sequence may be selected based on a target bit rate. The video sequence may be predictively coded based on the parameters. If the target bit rate is not achieved, regions of the video sequence with high bit rates may be identified, a filtering strength applied to the identified regions may be increased, and the video sequence may be predictively coded with the increased filtering strength.
- In an embodiment, on a first coding pass, a video sequence may be coded based on a first set of coding parameters. Values of a characteristic of frames from the video sequence may be stored during the first coding pass. Frames which violate a constraint imposed on the characteristic based on the stored values may be identified. Target characteristic values for the frames from the video sequence may be determined. The target characteristic values may be lower than the constraint. A second set of coding parameters to achieve the target characteristic values may be computed. On a second pass, the video sequence may be coded based on the second set of coding parameters.
- In an embodiment, perceptual model values may be determined from a video sequence. An index into a matrix may be computed based on the perceptual model values. The matrix may store associations between parameter range(s) and file sizes. A file size may be retrieved from the matrix corresponding to the computed index. The video sequence may be predictively coded with parameters associated with the file size.
- In an embodiment, a motion compensated error energy of a current frame of a video sequence may be computed. A weighted average motion compensated error energy of frames successive to the current frame in coding order may be computed. If the motion compensated error energy of the current frame is higher than the weighted average motion compensated error energy of the successive frames by a first factor, a weighted average motion compensated error energy of the current frame and frames adjacent to the current frame may be computed. If the weighted average motion compensated error energy of the current frame and the adjacent frames is higher than the weighted average motion compensated error energy of the successive frames by a second factor, the current frame may be marked as a random access picture (RAP) frame. The current frame may be predictively coded.
- In an embodiment, a frame from a video sequence may be marked as a delayed decoder refresh frame. Frames successive to the delayed decoder refresh frame in coding order may be predictively coded without reference to frames preceding the delayed decoder refresh frame in coding order. The distance between the delayed decoder refresh frame and the successive frames may exceed a distance threshold.
- In an embodiment, frames successive to a current frame in decoding order may be decoded without reference to frames preceding the current frame in decoding order. The distance between the current frame and the successive frames may exceed a distance threshold.
-
FIG. 1 is a simplified block diagram of a video coding system 100 according to an embodiment of the present invention. The system 100 may include at least two terminals 110-120 interconnected via a network 150. For unidirectional transmission of data, a first terminal 110 may code video data at a local location for transmission to the other terminal 120 via the network 150. The second terminal 120 may receive the coded video data of the other terminal from the network 150, decode the coded data and display the recovered video data. Unidirectional data transmission is common in media serving applications and the like. -
FIG. 1 illustrates a second pair of terminals 130, 140 provided to support bidirectional transmission of coded video that may occur, for example, during videoconferencing. For bidirectional transmission of data, each terminal 130, 140 may code video data captured at a local location for transmission to the other terminal via the network 150. Each terminal 130, 140 also may receive the coded video data transmitted by the other terminal, may decode the coded data and may display the recovered video data at a local display device. - In
FIG. 1, the terminals 110-140 are illustrated as servers, personal computers and smart phones but the principles of the present invention are not so limited. Embodiments of the present invention find application with laptop computers, tablet computers, media players and/or dedicated video conferencing equipment. The network 150 represents any number of networks that convey coded video data among the terminals 110-140, including, for example, wireline and/or wireless communication networks. The communication network 150 may exchange data in circuit-switched and/or packet-switched channels. Representative networks include telecommunications networks, local area networks, wide area networks and/or the Internet. For the purposes of the present discussion, the architecture and topology of the network 150 are immaterial to the operation of the present invention unless explained hereinbelow. -
FIG. 2 is a functional block diagram of a video coding system 200 according to an embodiment of the present invention. The system 200 may include a video source 210 that provides video data to be coded by the system 200, a pre-processor 220, a video coder 230, a transmitter 240 and a controller 250 to manage operation of the system 200. -
The video source 210 may provide video to be coded by the rest of the system 200. In a media serving system, the video source 210 may be a storage device storing previously prepared video. In a videoconferencing system, the video source 210 may be a camera that captures local image information as a video sequence. Video data typically is provided as a plurality of individual frames that impart motion when viewed in sequence. The frames themselves typically are organized as a spatial array of pixels. -
The pre-processor 220 may perform various analytical and signal conditioning operations on video data. The pre-processor 220 may parse input frames into color components (for example, luminance and chrominance components) and also may parse the frames into pixel blocks, spatial arrays of pixel data, which may form the basis of further coding. The pre-processor 220 also may apply various filtering operations to the frame data to improve efficiency of coding operations applied by a video coder 230. -
The video coder 230 may perform coding operations on the video sequence to reduce the video sequence's bit rate. The video coder 230 may include a coding engine 232, a local decoder 233, a reference picture cache 234, a predictor 235 and a controller 236. The coding engine 232 may code the input video data by exploiting temporal and spatial redundancies in the video data and may generate a datastream of coded video data, which typically has a reduced bit rate as compared to the datastream of source video data. As part of its operation, the video coder 230 may perform motion compensated predictive coding, which codes an input frame predictively with reference to one or more previously-coded frames from the video sequence that were designated as "reference frames." In this manner, the coding engine 232 codes differences between pixel blocks of an input frame and pixel blocks of reference frame(s) that are selected as prediction reference(s) to the input frame. - In an embodiment, a
video coder 230 may be provided as a multi-pass coder in which portions of the video sequence may be coded iteratively in a trial-and-error manner using different coding parameters to achieve a target bit rate. Typically, the target bit rate represents a number of bits per unit time. Results of one coding pass may be exchanged with the pre-processor 220 to improve results of a subsequent coding pass. -
The local decoder 233 may decode coded video data of frames that are designated as reference frames. Operations of the coding engine 232 typically are lossy processes. When the coded video data is decoded at a video decoder (not shown in FIG. 2), the recovered video sequence typically is a replica of the source video sequence with some errors. The local decoder 233 replicates decoding processes that will be performed by the video decoder on reference frames and may cause reconstructed reference frames to be stored in the reference picture cache 234. In this manner, the system 200 may store local copies of reconstructed reference frames that have common content with the reconstructed reference frames that will be obtained by a far-end video decoder (absent transmission errors). -
The predictor 235 may perform prediction searches for the coding engine 232. That is, for a new frame to be coded, the predictor 235 may search the reference picture cache 234 for image data that may serve as an appropriate prediction reference for the new frame. The predictor 235 may operate on a pixel block-by-pixel block basis to find appropriate prediction references. In some cases, as determined by search results obtained by the predictor 235, an input frame may have prediction references drawn from multiple frames stored in the reference picture cache 234. -
The controller 236 may manage coding operations of the video coder 230, including, for example, selection of coding parameters to meet a target bit rate of coded video. Typically, video coders operate according to constraints imposed by bit rate requirements, quality requirements and/or error resiliency policies; the controller 236 may select coding parameters for frames of the video sequence in order to meet these constraints. For example, the controller 236 may assign coding modes and/or quantization parameters to frames and/or pixel blocks within frames. -
The transmitter 240 may buffer coded video data to prepare it for transmission to the far-end terminal (not shown). The transmitter 240 may merge coded video data from the video coder 230 with other data to be transmitted to the terminal, for example, coded audio data and/or ancillary data streams (sources not shown). - The
controller 250 may manage operation of the system 200. During coding, the controller 250 may assign to each frame a certain frame type (either of its own accord or in cooperation with the controller 236), which can affect the coding techniques that are applied to the respective frame. For example, frames often are assigned as one of the following frame types: -
- An Intra Frame (I frame) is one that is coded and decoded without using any other frame in the sequence as a source of prediction,
- A Predictive Frame (P frame) is one that is coded and decoded using earlier frames in the sequence as a source of prediction.
- A Bidirectionally Predictive Frame (B frame) is one that is coded and decoded using both earlier and future frames in the sequence as sources of prediction.
- Frames commonly are parsed spatially into a plurality of pixel blocks (for example, blocks of 4×4, 8×8 or 16×16 pixels each) and coded on a pixel block-by-pixel block basis. Pixel blocks may be coded predictively with reference to other coded pixel blocks as determined by the coding assignment applied to the pixel blocks' respective frames. For example, pixel blocks of I frames can be coded non-predictively or they may be coded predictively with reference to pixel blocks of the same frame (spatial prediction). Pixel blocks of P frames may be coded non-predictively, via spatial prediction or via temporal prediction with reference to one previously coded reference frame. Pixel blocks of B frames may be coded non-predictively, via spatial prediction or via temporal prediction with reference to one or two previously coded reference frames.
-
FIG. 3 is a simplified block diagram of a video coding system 300 of another embodiment of the present invention, illustrating pixel-block coding operations. The system 300 may include a pre-processor 310, a block-based coder 320, a reference frame decoder 330, a reference picture cache 340, a predictor 350, a transmit buffer 360 and a controller 370. - The block-based
coder 320 may include a subtractor 321, a transform unit 322, a quantizer 323 and an entropy coder 324. The subtractor 321 may generate data representing a difference between the source pixel block and a reference pixel block developed for prediction. The subtractor 321 may operate on a pixel-by-pixel basis, developing residuals at each pixel position over the pixel block. Non-predictively coded blocks may be coded without comparison to reference pixel blocks, in which case the pixel residuals are the same as the source pixel data. - The
coder 320 may be provided as a multi-pass coder in which portions of the video sequence may be coded iteratively in a trial-and-error manner using different coding parameters to achieve a target bit rate. Results of one coding pass may be exchanged with the pre-processor 310 to improve results of a subsequent coding pass. - The
transform unit 322 may convert the source pixel block data to an array of transform coefficients, such as by a discrete cosine transform (DCT) process or a wavelet transform. The quantizer unit 323 may quantize (divide) the transform coefficients obtained from the transform unit 322 by a quantization parameter QP. The entropy coder 324 may code quantized coefficient data by run-value coding, run-length coding or the like. Data from the entropy coder may be output to the channel as coded video data of the pixel block. The reference frame decoder 330 may decode pixel blocks of reference frames and assemble decoded data for such reference frames. Decoded reference frames may be stored in the reference picture cache 340. - The
predictor 350 may generate and output prediction blocks to the subtractor 321. The predictor 350 also may output metadata identifying type(s) of predictions performed. For inter-prediction coding, the predictor 350 may search among the reference picture cache for pixel block data of previously coded and decoded frames that exhibits strong correlation with the source pixel block. When the predictor 350 finds an appropriate prediction reference for the source pixel block, it may generate motion vector data that is output to the decoder as part of the coded video data stream. The predictor 350 may retrieve a reference pixel block from the reference cache that corresponds to the motion vector and may output it to the subtractor 321. For intra-prediction coding, the predictor 350 may search among the previously coded and decoded pixel blocks of the same frame being coded for pixel block data that exhibits strong correlation with the source pixel block. Operation of the predictor 350 may be constrained by a mode selection provided by the controller 370. For example, if the controller selects an intra-coding mode for application to a frame, the predictor 350 will be constrained to use intra-coding techniques. If the controller selects an inter-prediction mode for the frame, the predictor may select among inter-coding modes and intra-coding modes depending upon results of its searches. - A transmit
buffer 360 may accumulate metadata representing pixel block coding order, coded pixel block data and metadata representing coding parameters applied to the coded pixel blocks. The metadata can include prediction modes, motion vectors and quantization parameters applied during coding. Accumulated data may be formatted and transmitted to the channel. - A
controller 370 may manage coding of the source video, including selection of a coding mode for use by the predictor 350 and selection of quantization parameters to be applied to pixel blocks. -
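The subtractor/transform/quantizer chain of FIG. 3 can be illustrated with a toy Python sketch. This is not the patent's implementation: the 1-D DCT stands in for whatever separable transform the transform unit 322 applies, and the uniform divide-and-round stands in for the quantizer unit 323.

```python
import math

def pixel_residuals(source_block, prediction_block=None):
    """Subtractor 321: per-pixel difference between source and prediction.
    With no prediction (non-predictive coding), residuals equal the source."""
    if prediction_block is None:
        return [row[:] for row in source_block]
    return [[s - p for s, p in zip(srow, prow)]
            for srow, prow in zip(source_block, prediction_block)]

def dct_1d(x):
    """Orthonormal type-II DCT of one row of residuals; the transform
    unit 322 would apply such a transform over rows and columns."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)) * s)
    return out

def quantize(coeffs, qp):
    """Quantizer 323: divide coefficients by QP and round; a larger QP
    discards more detail and yields fewer bits after entropy coding."""
    return [round(c / qp) for c in coeffs]
```

Note how a constant (flat) block concentrates all its energy in the DC coefficient, which is what makes the subsequent quantization and entropy coding effective.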
FIG. 4 illustrates a method 400 to iteratively code a video sequence to achieve a target bit rate according to an embodiment of the present invention. A pre-processor may filter a portion of a source video sequence according to an initial set of coding parameters (box 410). A video coder may select coding parameters for the sequence portion based on an estimate of the target bit rate (box 420). The video coder may code the sequence portion according to the coding parameters (box 430). The video coder may then determine whether the coded video data obtained from coding satisfies the target bit rate (box 440). If so, the coding operation may be terminated for the current portion of the sequence (box 490), and coding may advance to another portion of the sequence, if available. Otherwise, the video coder may identify regions of the coded sequence that generate high bit rates (box 450). In response to the identified regions, the pre-processor may increase filtering strengths as applied to the regions in the source data (box 460) and operation may return to box 420 for another coding pass. - The initial set of coding parameters utilized by the pre-processor (box 410) may be predetermined, may be a default set of parameters, or may be derived from an analysis of the source video performed by the pre-processor and related controls. The coding parameters selected by the video coder may involve selections of prediction mode for frames within the sequence portion and quantization parameters applied to pixel blocks within the frames.
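The control flow of FIG. 4 can be sketched as a loop; `code_fn` and `strengthen_fn` are hypothetical stand-ins for the video coder and pre-processor, not names from the patent, and the pass limit is an assumption.

```python
def code_to_target(portion, target_bits, code_fn, strengthen_fn, max_passes=5):
    """Iteratively code a portion of a sequence until it meets a target
    bit rate. code_fn(portion) -> (coded_bits, high_rate_regions);
    strengthen_fn(portion, regions) returns a more strongly filtered portion."""
    for _ in range(max_passes):
        coded_bits, regions = code_fn(portion)        # boxes 420-430
        if coded_bits <= target_bits:                 # box 440
            return portion, coded_bits                # box 490: done
        portion = strengthen_fn(portion, regions)     # boxes 450-460
    return portion, coded_bits  # give up after max_passes
```

Each failed pass strengthens the pre-filter on the expensive regions, trading detail for bit rate on the next attempt.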
- Filtering and coding operations performed by
method 400 may vary among different regions of the video sequence. Typically, due to correlation among frames, the identified regions will persist across a common spatial region of multiple frames in a portion of the video sequence being coded. Accordingly, recursive operations of the video coder and pre-processor may be performed on a single frame of the video or they may be performed on a set of several frames (say, 10 frames) of the video sequence. - In an embodiment, when a video coder identifies a region with high bit rates (box 450), the video coder may adjust target bit rates of frames containing such regions at the expense of frames that do not (box 470). When operation returns to box 420 for another coding pass, the video coder may select coding parameters corresponding to each frame's target bit rate. For example, the video coder may select relatively higher quantization parameters for frames that had high bit rate regions in prior passes, which tends to reduce the bit rates of such frames at the expense of lower coding quality.
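The budget shift of box 470 might look like the following sketch; the 20% fraction and the proportional take-from-everyone policy are illustrative assumptions, not values from the patent.

```python
def reallocate_targets(targets, hot_frames, fraction=0.2):
    """Move a fraction of the bit budget of frames without high-bit-rate
    regions to the frames that contain them; the total budget is preserved."""
    out = list(targets)
    taken = 0.0
    for i in range(len(out)):
        if i not in hot_frames:
            cut = out[i] * fraction
            out[i] -= cut
            taken += cut
    for i in hot_frames:
        out[i] += taken / len(hot_frames)
    return out
```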
- When a video sequence is coded, the sequence may be governed by a coding policy that imposes constraints on certain characteristics of the video sequence. For example, a constraint may limit the bit rate over the video sequence. Another constraint may limit the bit rate over a fixed window of frames. In another example, a constraint may define a minimum threshold on the visual quality of a window of frames. To adhere to the constraints, the video coder may adjust coding parameters such as a quantization parameter (QP) and mode selection. Specifically, the video coder may react to constraint violations as they occur and manage the parameters through models that map QPs to characteristic levels. For example, for constraints on data rate, a QP-to-bits mapping model may be utilized and, for constraints on visual quality, a QP-to-peak signal-to-noise ratio (QP-to-PSNR) mapping model may be utilized. However, this may lead to isolated frames which have noticeably worse characteristics than neighboring frames. In the case of constraints on data rate, one frame can be coded much smaller and have much poorer visual quality than surrounding frames; in the case of constraints on visual quality, one frame can have much higher visual quality and be coded much bigger than surrounding frames, resulting in a poor viewing experience for the end-viewer.
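As one deliberately simple example of a QP-to-characteristic model, assume bits ≈ k/QP: a single observed (QP, bits) pair calibrates k, which can then be inverted to pick a QP for a target frame size. Real QP-to-bits and QP-to-PSNR models are richer; this one-parameter form is an assumption for illustration only.

```python
def fit_inverse_model(qp, bits):
    """Calibrate the constant k of a one-parameter model bits = k / QP
    from one observed coding result."""
    return qp * bits

def qp_for_target(k, target_bits):
    """Invert the model: the QP expected to produce target_bits."""
    return k / target_bits
```

For a QP-to-PSNR model, an analogous fit-and-invert pair would map QP to a quality level rather than a size.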
-
FIG. 5 illustrates a method 500 to code a video sequence based on information generated on a previous coding pass according to an embodiment. A video coder may code, on a first coding pass, a video sequence based on an initial set of coding parameters (box 510). A controller controlling the operations of the video coder may compute revised parameters for the video coder based on the first coding pass (box 520). The video coder may then re-code the video sequence utilizing the revised parameters (box 530). - In an embodiment, to compute the revised parameters, during the first coding pass, attributes of frames affecting the coding parameters may be stored. Based on the stored attributes, frames which violate constraints may be identified. Then, the coding parameters for a subsequent coding iteration may be adjusted so that parameter changes from one frame to another are gradual, while simultaneously ensuring that constraints are not violated for any of the frames in the video sequence. For example, when a portion of the video sequence that violates one or more constraints is identified, a window of support may be extended back to the beginning of the scene in which the constraint is violated when shaping a characteristic curve of the frames in that window. Thus, coding parameters may be adjusted smoothly to avoid sudden changes in visual quality within a scene.
- Given a first coding pass of a sequence of frames and a constraint, the characteristic values resulting from the first coding may be stored and transformed to form a list (called, for example, “targetCharacteristicCurveArray”) of desired characteristic values that satisfy the constraint. A second coding of the sequence of frames may generate (within a tolerance) the desired characteristic values as stored in targetCharacteristicCurveArray.
- In an embodiment, as the second coding progresses, the values in targetCharacteristicCurveArray for coded frames may be updated with the actual characteristic values, and values in targetCharacteristicCurveArray for future frames may be adjusted to reflect the actual values that are generated.
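The running update described above might be sketched as follows; spreading the surplus or deficit evenly over the remaining frames is an illustrative policy, as the patent does not fix a redistribution strategy.

```python
def update_target_curve(curve, i, actual):
    """After frame i is coded on the second pass, record its actual
    characteristic value in the target curve and distribute the
    difference from the target evenly across the not-yet-coded frames."""
    surplus = curve[i] - actual
    curve[i] = actual
    remaining = len(curve) - i - 1
    if remaining:
        share = surplus / remaining
        for j in range(i + 1, len(curve)):
            curve[j] += share
    return curve
```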
- A data rate constraint may be expressed as a pair of values: a maximum data rate and a fixed-length window of presentation times over which to compute the data rate. In another example, a data rate constraint may be inferred from a model such as the Hypothetical Reference Decoder. A visual quality constraint may impose a minimum visual quality (given some metric, such as PSNR) over a specified set of frames.
- A constraint may be the decoder complexity required to decode a video sequence. Another constraint may be the amount of heat dissipated by a decoder while decoding a video sequence (decoder thermal generation). A constraint may be the amount of energy utilized by a decoder to decode a video sequence (decoder power usage or battery drainage). Another constraint may be the visual quality of a video sequence in dark scenes. Still another constraint may be the quality degradation through visual masking.
- The set of frames over which a constraint is imposed need not be successive frames and can be imposed over all frames with a perceptual model score in a particular range, all frames with average luma in some range, etc. The perceptual model score may be based on the visual quality and/or video complexity of a video sequence. The perceptual model score may be computed from spatial, temporal, and spatiotemporal visual masking values derived by a pre-processor.
- The transformation applied to characteristic values resulting from the first coding pass may include scaling the values by a fixed constant. For example, for a data rate constraint, if a set of frames totals A bits and A is greater than a specified maximum of B bits, individual frame sizes can be scaled by B/A. Further, the scaling factor applied to each frame may be modulated by something as simple as the relative size of the frame; it may also be modulated by the perceptual significance of the frame.
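The B/A scaling above, in sketch form (uniform scaling only; per-frame modulation by relative size or perceptual significance is omitted for brevity):

```python
def scale_to_budget(frame_bits, max_total):
    """If the frames total A bits and A exceeds the maximum B, scale every
    frame size by B/A so the set satisfies the data rate constraint."""
    total = sum(frame_bits)          # A
    if total <= max_total:
        return list(frame_bits)
    factor = max_total / total       # B / A
    return [b * factor for b in frame_bits]
```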
- In the case of multiple constraints, the transformation may be computed jointly or sequentially. For example, a set of frames may violate both data rate and visual quality constraints. A sequential transformation may update the targetCharacteristicCurveArray based on the data rate constraint and then on the visual quality constraint. A joint transformation may update the targetCharacteristicCurveArray based on both constraints simultaneously, for example, by targeting a weighted average of both measures.
- In an embodiment, the
method 500 may be performed on portions of a video sequence rather than the entirety of a video sequence and may operate at various granularities such as a sequence of frames belonging to a common scene, a group of dependent frames, short clips, single frames, slices, and coding units (pixel blocks). - In an embodiment, a number of coding strategies can be employed to minimize the number of encodes for any frame, including the use of an adaptive QP-to-characteristic model. QP-to-characteristic models may include QP-to-bits and QP-to-PSNR as explained above.
- In an embodiment, analytical operations performed for curve shaping may be performed by a pre-processor. Management of the targetCharacteristicCurveArray and selection of coding parameters may be performed by controller(s) within the system.
- A video coder may receive a data rate value or a quality level as a control input along with a video sequence to determine the size of the output coded bitstream. Often, the data rate may be specified without regard to the content of the video; likewise, often, the quality level may be specified without regard to the resulting data rate.
-
FIG. 6 illustrates a method 600 to estimate a file size of a video sequence according to an embodiment. The method 600 may scan the video sequence and compute perceptual model values therefrom (box 610), based on, for example, spatial, temporal, and spatiotemporal visual masking values derived by a pre-processor from content of the video sequence. Based on the computed values, the method 600 may develop an index into a Data Rate/Quality matrix, taking into account the resolution and the frame rate of the video (box 620). The method 600 may then retrieve a file size estimate based on the information in the matrix (box 630). - Perceptual model values may be developed in a variety of ways. For example, a single number may be distilled from a number of values. In an embodiment, a single weighted value may be computed from, for example, spatial, temporal, and spatiotemporal visual masking values derived by a pre-processor from content of the video sequence. The
method 600 may be performed by a coder controller, which also may store the Data Rate/Quality matrix. - In an embodiment, the Data Rate/Quality matrix may store values representing optimum file sizes for video along a multi-dimensional parameter range. An example of a multi-dimensional parameter range may be a data rate range and a quality range, where the Data Rate/Quality matrix is dependent on the resolution, duration and frame rate of the video. The Data Rate/Quality matrix may also store file size values based on thermal output range, power utilization range, and decoder complexity range. In an embodiment, file size values stored in the matrix may be derived from operation of similar coders on other training sequences having their own perceptual model values, resolution, duration and frame rate and the file sizes generated by those coders.
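A file size lookup per boxes 620-630 might be sketched as below. The bucketing of the perceptual score and the matrix key layout are assumptions, since the patent leaves the indexing scheme open; the matrix values would come from training runs as described above.

```python
def estimate_file_size(perceptual_score, resolution, frame_rate, matrix):
    """Quantize the perceptual model value into a coarse bucket and use it,
    together with resolution and frame rate, as an index into the
    Data Rate/Quality matrix of pre-trained file size estimates."""
    key = (round(perceptual_score, 1), resolution, frame_rate)
    return matrix[key]
```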
- In an embodiment, the optimum file size may minimize the weighted sum of a quality degradation value and the resulting bit rate. The quality degradation value could include metrics for visual quality through a perceptual model over a noiseless channel and for a number of noisy channels. In other embodiments, the optimum file size may minimize the weighted sum of the resulting decoder complexity and the resulting decoder power/thermal output. In an embodiment, the coder may search for the optimum coding by sampling the specified Data Rate/Quality space.
- In an embodiment, the
method 600 may be performed on portions of a video sequence rather than the entirety of a video sequence and may operate at various granularities such as short clips, single frames, slices, and coding units (pixel blocks). For example, in the case of video capture, buffered frames can be processed to calculate the optimum file size. - When frames cannot be buffered (e.g. in a real-time application scenario) or only a small number of frames can be buffered, an accurate global optimum file size may be difficult to calculate. Therefore, a quality level that is predefined or calculated based on the history can be used to avoid spending excessive bits in frames that already reach acceptable quality. The bits saved can be used in later frames that require more bits to reach the quality level, or not used at all such that the bit rate of the entire stream can be reduced.
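The low-latency fallback described above can be sketched as a per-frame budget decision; the halving policy and function names are illustrative assumptions.

```python
def frame_budget(predicted_quality, nominal_bits, quality_target, cut=0.5):
    """If a frame is predicted to reach the target quality anyway, spend
    fewer bits on it and bank the savings for later frames (or drop them
    entirely to lower the stream's overall bit rate)."""
    if predicted_quality >= quality_target:
        spend = nominal_bits * cut
    else:
        spend = nominal_bits
    return spend, nominal_bits - spend  # (bits to spend, bits saved)
```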
- Random access pictures (RAPs) represent another coding constraint of a system. RAPs facilitate random access within a bitstream. When RAPs are placed where no natural scene change exists, they incur bit rate spikes and often cause sudden changes in the visual quality of a scene (visual flashes). Inserting RAPs in the middle of scenes may be inefficient because RAPs are often coded as Instantaneous Decoder Refresh (IDR) frames. An IDR is a type of I-frame which forces the decoder to refresh its state immediately, guaranteeing that no state prior (in decode order) to the IDR is necessary for decoding the IDR or any subsequent frame. This break in coding dependency causes the aforementioned bit rate spikes during coding and visual flashes when the coded video is decoded and displayed.
- To minimize bit rate spikes and visual flashes, a frame may be identified as a RAP frame based on relative motion masking.
FIG. 7 illustrates a video sequence 700 including a selected RAP frame 710, according to an embodiment. Frame 710 from the video sequence 700 may be selected as a RAP frame if the frames 720 before frame 710 in coding order have relatively high motion masking and the frames 730 after frame 710 in coding order have relatively low motion masking. - Motion masking may be computed as a weighted average, over a video segment, of motion compensated error energy (MCEE). The MCEE represents the amount of pixel change between frames along motion trajectories (motion vectors). The MCEE may be computed, for example, as a sum of absolute differences (SAD), sum of squared differences (SSD), etc., between successive frames in that video segment.
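A minimal MCEE sketch using SAD with a zero motion field (a real coder differences pixels along motion trajectories), plus the weighted average used for motion masking:

```python
def mcee_sad(frame_a, frame_b):
    """Motion compensated error energy between successive frames, computed
    here as a plain sum of absolute differences for brevity."""
    return sum(abs(a - b)
               for row_a, row_b in zip(frame_a, frame_b)
               for a, b in zip(row_a, row_b))

def motion_masking(mcees, weights):
    """Weighted average of per-frame MCEE values over a video segment."""
    return sum(m * w for m, w in zip(mcees, weights)) / sum(weights)
```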
- In an embodiment, the motion masking levels may be determined by a pre-processor and/or a controller.
-
FIG. 8 illustrates a method 800 to select a RAP frame according to an embodiment. The method 800 may compute the MCEE of the current frame and determine whether it exceeds a threshold (box 810). If it does, the method 800 may compute a weighted average of motion compensated error energy (WMCEE) of the frames successive to the current frame (box 820). If the MCEE of the current frame exceeds the WMCEE of the successive frames by a first factor (box 830), the method 800 may compute a WMCEE of the current frame and frames adjacent to the current frame (box 840). If the WMCEE of the current frame and adjacent frames exceeds the WMCEE of the successive frames by a second factor (box 850), the current frame may be selected as a RAP frame. Otherwise, the method 800 may determine whether the next frame qualifies as a RAP frame. - In an embodiment, if multiple frames within a video sequence are identified as potential RAP frames by
method 800 and the potential RAP frames are within a proximity threshold to each other, not all potential RAP frames need to be selected as RAP frames. In an embodiment, the last potential RAP frame (in decoding order) can be selected as a RAP frame. In another embodiment, a subsampling of potential RAP frames may be selected as RAP frames. - In an embodiment, the highest motion masking video segment within a video sequence may be identified and a selected RAP frame may be inserted into that video segment.
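The decision chain of method 800 above, collapsed into a single predicate over precomputed energies (the threshold and factors are tuning inputs; the patent does not fix their values):

```python
def is_rap_candidate(mcee_cur, wmcee_adjacent, wmcee_successive,
                     threshold, factor1, factor2):
    """Return True when the current frame qualifies as a RAP frame: its
    MCEE exceeds a threshold (box 810) and exceeds the successive frames'
    WMCEE by factor1 (box 830), and the WMCEE around the current frame
    exceeds the successive frames' WMCEE by factor2 (box 850)."""
    if mcee_cur <= threshold:
        return False
    if mcee_cur <= factor1 * wmcee_successive:
        return False
    return wmcee_adjacent > factor2 * wmcee_successive
```

A high-motion frame followed by low-motion frames passes all three tests, matching the FIG. 7 placement criterion.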
- In an embodiment, RAPs within a scene may be re-used to reduce the bit rate overhead. A re-usable RAP frame may be defined as a “Delayed Decoder Refresh” (DDR) frame. A DDR frame may not force an immediate state refresh at a decoder but guarantees that state information from frames prior to the DDR frame (in decode order) is not necessary to decode the DDR frame itself or to decode frames subsequent to the DDR frame (in decode order) that are more than a specified number Ndelay of frames from the DDR frame. Thus, the DDR frame may be used as a reference frame for the frames immediately after it (in decode order) and as a RAP for the frames Ndelay+1 frames after it (in decode order).
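The DDR referencing guarantee can be expressed as a legality check over decode-order frame indices; the function name and the 0-based index convention are illustrative, not from the patent.

```python
def may_reference(frame_idx, ref_idx, ddr_idx, n_delay):
    """Whether the frame at decode-order position frame_idx may use the
    frame at ref_idx as a reference, given a DDR frame at ddr_idx with
    delay n_delay. The DDR itself and frames more than n_delay frames
    past it must not depend on anything before the DDR; frames in the
    intervening window may still reference pre-DDR state."""
    if frame_idx == ddr_idx or frame_idx > ddr_idx + n_delay:
        return ddr_idx <= ref_idx < frame_idx
    return ref_idx < frame_idx  # ordinary causal referencing
```

With a DDR at position 10 and n_delay = 3, frame 12 may still reference frame 9, but frame 14 may not; this is exactly what lets the DDR serve as a RAP for frames Ndelay+1 positions after it.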
- In an embodiment, multiple delays (delay0, delay1, . . . , delayX) may be associated with a single DDR frame to indicate that the DDR frame may be used as a RAP for X+1 video segments. The frame at the beginning (in decode order) of each such segment may include the appropriate delay value, Ndelay. In an embodiment, setting Ndelay to 0 may indicate that the DDR is to be treated immediately as an IDR frame in the associated video segment.
- In an embodiment, information pertaining to the DDR frame such as the number Ndelay may be signaled in the channel data 260 (
FIG. 2 ) as part of syntax defining a DDR frame. - In an embodiment, a frame may specify the DDR frame which it needs as a reference frame. This may be done by giving each DDR frame an identifier which may be signaled in the bitstream using a specified number of bits.
- The foregoing discussion has described operation of the embodiments of the present invention in the context of coders and decoders. Commonly, video coders are provided as electronic devices. They can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays and/or digital signal processors. Alternatively, they can be embodied in computer programs that execute on personal computers, notebook computers or computer servers. Similarly, decoders can be embodied in integrated circuits, such as application specific integrated circuits, field programmable gate arrays and/or digital signal processors, or they can be embodied in computer programs that execute on personal computers, notebook computers or computer servers. Decoders commonly are packaged in consumer electronics devices, such as gaming systems, DVD players, portable media players and the like and they also can be packaged in consumer software applications such as video games, browser-based media players and the like.
- Several embodiments of the invention are specifically illustrated and/or described herein. However, it will be appreciated that modifications and variations of the invention are covered by the above teachings and within the purview of the appended claims without departing from the spirit and intended scope of the invention.
Claims (36)
1. A video coding method, comprising:
selecting coding parameters of a video sequence based on a target bit rate;
predictively coding the video sequence based on the parameters; and
if the target bit rate is not achieved:
identifying regions of the video sequence with high bit rates,
increasing a filtering strength applied to the identified regions, and
predictively coding the video sequence with the increased filtering strength.
2. A video coding method, comprising:
predictively coding, on a first coding pass, a video sequence based on a first set of coding parameters;
storing values of a characteristic of frames from the video sequence during the first coding pass;
identifying frames which violate a constraint imposed on the characteristic based on the stored values;
determining target characteristic values for the frames from the video sequence, wherein the target characteristic values are lower than the constraint;
computing a second set of coding parameters to achieve the target characteristic values; and
predictively coding, on a second pass, the video sequence based on the second set of coding parameters.
3. The method of claim 2 , wherein the characteristic is at least one of data rate, visual quality, decoder complexity, decoder thermal generation, and decoder power usage.
4. The method of claim 2 , wherein the second set of coding parameters includes a quantization parameter.
5. A video coding method, comprising:
determining perceptual model values from a video sequence;
computing an index based on the perceptual model values into a matrix, wherein the matrix stores associations between at least one parameter range and file sizes;
retrieving a file size from the matrix corresponding to the computed index; and
predictively coding the video sequence with parameters associated with the file size.
6. The method of claim 5 , wherein the perceptual model values are determined from at least one of spatial and temporal visual masking values.
7. The method of claim 5 , wherein the at least one parameter range includes at least one of data rate range, resolution range, thermal output range, power utilization range, and decoder complexity range.
8. A video coding method, comprising:
determining a number of buffered frames from a video sequence available to calculate an optimum file size; and
if the number of available buffered frames is below a threshold, predictively coding the video sequence to achieve one of a predetermined video quality level or a video quality level determined from previous coding history.
9. A video coding method, comprising:
computing a motion compensated error energy of a current frame of a video sequence;
computing a weighted average motion compensated error energy of frames successive to the current frame in coding order;
if the motion compensated error energy of the current frame is higher than the weighted average motion compensated error energy of the successive frames by a first factor, computing a weighted average motion compensated error energy of the current frame and frames adjacent to the current frame;
if the weighted average motion compensated error energy of the current frame and the adjacent frames is higher than the weighted average motion compensated error energy of the successive frames by a second factor, marking the current frame as a random access picture frame; and
predictively coding the current frame.
10. A video coding method, comprising:
marking a frame from a video sequence as a delayed decoder refresh frame; and
predictively coding frames successive to the delayed decoder refresh frame in coding order without reference to frames preceding the delayed decoder refresh frame in coding order, wherein the distance between the delayed decoder refresh frame and the successive frames exceeds a distance threshold.
11. The method of claim 10 , further comprising:
predictively coding the delayed decoder refresh frame without reference to the frames preceding the delayed decoder refresh frame in coding order.
12. The method of claim 10 , further comprising:
communicating the distance threshold to a decoder.
13. A decoding method, comprising:
decoding frames successive to a current frame in decoding order without reference to frames preceding the current frame in decoding order, wherein a distance between the current frame and the successive frames exceeds a distance threshold.
14. The method of claim 13 , wherein the current frame is marked by a coder as a delayed decoder refresh frame.
15. The method of claim 13 , further comprising:
decoding the current frame without reference to the frames preceding the current frame in decoding order.
16. A coding apparatus, comprising:
a controller to select coding parameters of a video sequence based on a target bit rate; and
a coding engine to:
predictively code the video sequence based on the parameters, and
if the target bit rate is not achieved:
identify regions of the video sequence with high bit rates, and
predictively code the video sequence with an increased filtering strength applied to the identified regions by a pre-processor.
17. A coding apparatus, comprising:
a coding engine to:
predictively code, on a first coding pass, a video sequence based on a first set of coding parameters, and
predictively code, on a second coding pass, the video sequence based on a second set of coding parameters;
a storage device to store values of a characteristic of frames from the video sequence during the first coding pass; and
a controller to:
identify frames which violate a constraint imposed on the characteristic based on the stored values,
determine target characteristic values for the frames from the video sequence, wherein the target characteristic values are lower than the constraint, and
compute the second set of coding parameters to achieve the target characteristic values.
18. The apparatus of claim 17 , wherein the characteristic is at least one of data rate, visual quality, decoder complexity, decoder thermal generation, and decoder power usage.
19. The apparatus of claim 17 , wherein the second set of coding parameters includes a quantization parameter.
20. A coding apparatus, comprising:
a pre-processor to determine perceptual model values from a video sequence;
a controller to:
compute an index based on the perceptual model values into a matrix, wherein the matrix stores associations between at least one parameter range and file sizes, and
retrieve a file size from the matrix corresponding to the computed index; and
a coding engine to predictively code the video sequence with parameters associated with the file size.
21. The apparatus of claim 20 , wherein the perceptual model values are determined from at least one of spatial and temporal visual masking values.
22. The apparatus of claim 20 , wherein the at least one parameter range includes at least one of data rate range, resolution range, thermal output range, power utilization range, and decoder complexity range.
23. A coding apparatus, comprising:
a controller to:
determine a number of buffered frames from a video sequence available to calculate an optimum file size, and
determine if the number of available buffered frames is below a threshold; and
a coding engine to:
if the number of available buffered frames is below the threshold, predictively code the video sequence to achieve one of a predetermined video quality level or a video quality level determined from previous coding history.
24. A coding apparatus, comprising:
a controller to:
compute a motion compensated error energy of a current frame of a video sequence,
compute a weighted average motion compensated error energy of frames successive to the current frame in coding order,
if the motion compensated error energy of the current frame is higher than the weighted average motion compensated error energy of the successive frames by a first factor, compute a weighted average motion compensated error energy of the current frame and frames adjacent to the current frame, and
if the weighted average motion compensated error energy of the current frame and the adjacent frames is higher than the weighted average motion compensated error energy of the successive frames by a second factor, mark the current frame as a random access picture frame; and
a coding engine to predictively code the current frame.
25. A coding apparatus, comprising:
a controller to mark a frame from a video sequence as a delayed decoder refresh frame; and
a coding engine to predictively code frames successive to the delayed decoder refresh frame in coding order without reference to frames preceding the delayed decoder refresh frame in coding order, wherein the distance between the delayed decoder refresh frame and the successive frames exceeds a distance threshold.
26. The coding apparatus of claim 25 , wherein the coding engine is further configured to predictively code the delayed decoder refresh frame without reference to the frames preceding the delayed decoder refresh frame in coding order.
27. The coding apparatus of claim 25 , further comprising:
a channel to communicate the distance threshold to a decoder.
28. A decoding apparatus, comprising:
a decoding engine to decode frames successive to a current frame in decoding order without reference to frames preceding the current frame in decoding order, wherein a distance between the current frame and the successive frames exceeds a distance threshold.
29. The decoding apparatus of claim 28 , wherein the current frame is marked by a coder as a delayed decoder refresh frame.
30. The decoding apparatus of claim 28 , wherein the decoding engine is further configured to decode the current frame without reference to the frames preceding the current frame in decoding order.
31. A storage device storing program instructions that, when executed by a processor, cause the processor to:
select coding parameters of a video sequence based on a target bit rate;
predictively code the video sequence based on the parameters; and
if the target bit rate is not achieved:
identify regions of the video sequence with high bit rates,
increase a filtering strength applied to the identified regions, and
predictively code the video sequence with the increased filtering strength.
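The re-coding loop of claim 31 can be sketched as below. The `code_fn` interface (mapping per-region filter strengths to per-region bit rates), the step size, and the pass limit are all assumptions introduced for illustration.

```python
def rate_control_pass(code_fn, regions, target_bit_rate,
                      base_strength=1.0, strength_step=0.5, max_passes=4):
    """Hypothetical loop: re-code with stronger pre-filtering on the
    highest-bit-rate region until the target bit rate is achieved."""
    strengths = {r: base_strength for r in regions}
    for _ in range(max_passes):
        rates = code_fn(strengths)  # assumed coder interface
        if sum(rates.values()) <= target_bit_rate:
            return strengths, rates
        # Identify the region with the highest bit rate and
        # increase the filtering strength applied to it.
        worst = max(rates, key=rates.get)
        strengths[worst] += strength_step
    return strengths, rates
```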
32. A storage device storing program instructions that, when executed by a processor, cause the processor to:
predictively code, on a first coding pass, a video sequence based on a first set of coding parameters;
store values of a characteristic of frames from the video sequence during the first coding pass;
identify frames which violate a constraint imposed on the characteristic based on the stored values;
determine target characteristic values for the frames from the video sequence, wherein the target characteristic values are lower than the constraint;
compute a second set of coding parameters to achieve the target characteristic values; and
predictively code, on a second pass, the video sequence based on the second set of coding parameters.
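The target-selection step of the two-pass scheme in claim 32 can be sketched as follows; the 0.9 safety margin is an illustrative assumption, since the claim only requires targets lower than the constraint.

```python
def two_pass_targets(first_pass_values, constraint, margin=0.9):
    """Derive per-frame second-pass targets from first-pass statistics:
    frames that violated the constraint get a target safely below it."""
    violators = [i for i, v in enumerate(first_pass_values)
                 if v > constraint]
    targets = list(first_pass_values)
    for i in violators:
        # Aim under the constraint by an assumed safety margin.
        targets[i] = constraint * margin
    return violators, targets
```

A second set of coding parameters (e.g. quantization levels) would then be computed to hit these targets on the second pass.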
33. A storage device storing program instructions that, when executed by a processor, cause the processor to:
determine perceptual model values from a video sequence;
compute an index based on the perceptual model values into a matrix, wherein the matrix stores associations between at least one parameter range and file sizes;
retrieve a file size from the matrix corresponding to the computed index; and
predictively code the video sequence with parameters associated with the file size.
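The matrix lookup of claim 33 can be illustrated as below. The bucket layout, the tuple-keyed dictionary standing in for the matrix, and the clamping behavior are assumptions made for the sketch.

```python
def lookup_file_size(perceptual_values, param_ranges, size_matrix):
    """Quantize each perceptual-model value into its parameter range
    to form an index, then read the file size stored at that index."""
    index = []
    for value, (low, high, buckets) in zip(perceptual_values,
                                           param_ranges):
        # Clamp, then map the value to one of `buckets` sub-ranges.
        clamped = min(max(value, low), high)
        step = (high - low) / buckets
        index.append(min(int((clamped - low) / step), buckets - 1))
    return size_matrix[tuple(index)]
```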
34. A storage device storing program instructions that, when executed by a processor, cause the processor to:
compute a motion compensated error energy of a current frame of a video sequence;
compute a weighted average motion compensated error energy of frames successive to the current frame in coding order;
if the motion compensated error energy of the current frame is higher than the weighted average motion compensated error energy of the successive frames by a first factor, compute a weighted average motion compensated error energy of the current frame and frames adjacent to the current frame;
if the weighted average motion compensated error energy of the current frame and the adjacent frames is higher than the weighted average motion compensated error energy of the successive frames by a second factor, mark the current frame as a random access picture frame; and
predictively code the current frame.
35. A storage device storing program instructions that, when executed by a processor, cause the processor to:
mark a frame from a video sequence as a delayed decoder refresh frame; and
predictively code frames successive to the delayed decoder refresh frame in coding order without reference to frames preceding the delayed decoder refresh frame in coding order, wherein the distance between the delayed decoder refresh frame and the successive frames exceeds a distance threshold.
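The reference-picture restriction implied by claim 35 (and, on the decoder side, claim 36) can be sketched as follows; the frame indexing and return convention are illustrative assumptions.

```python
def allowed_references(frame_index, ddr_index, distance_threshold):
    """Hypothetical reference filter: frames farther than the distance
    threshold past the delayed decoder refresh (DDR) frame may not
    reference anything preceding the DDR frame in coding order."""
    if frame_index - ddr_index > distance_threshold:
        # Refresh has taken effect: only the DDR frame and later
        # frames remain usable as prediction references.
        return list(range(ddr_index, frame_index))
    # Within the threshold window, earlier frames are still usable,
    # which is what distinguishes a DDR frame from an instantaneous
    # decoder refresh.
    return list(range(frame_index))
```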
36. A storage device storing program instructions that, when executed by a processor, cause the processor to:
decode frames successive to a current frame in decoding order without reference to frames preceding the current frame in decoding order, wherein a distance between the current frame and the successive frames exceeds a distance threshold.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/652,311 US20130235928A1 (en) | 2012-03-06 | 2012-10-15 | Advanced coding techniques |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201261607484P | 2012-03-06 | 2012-03-06 | |
| US13/652,311 US20130235928A1 (en) | 2012-03-06 | 2012-10-15 | Advanced coding techniques |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20130235928A1 true US20130235928A1 (en) | 2013-09-12 |
Family
ID=49114112
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/631,428 Expired - Fee Related US9432694B2 (en) | 2012-03-06 | 2012-09-28 | Signal shaping techniques for video data that is susceptible to banding artifacts |
| US13/652,311 Abandoned US20130235928A1 (en) | 2012-03-06 | 2012-10-15 | Advanced coding techniques |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US13/631,428 Expired - Fee Related US9432694B2 (en) | 2012-03-06 | 2012-09-28 | Signal shaping techniques for video data that is susceptible to banding artifacts |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US9432694B2 (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150181208A1 (en) * | 2013-12-20 | 2015-06-25 | Qualcomm Incorporated | Thermal and power management with video coding |
| CN109040765A (en) * | 2018-07-25 | 2018-12-18 | 成都鼎桥通信技术有限公司 | Video data playback method and device |
| US20190020810A1 (en) * | 2017-07-11 | 2019-01-17 | Hanwha Techwin Co., Ltd. | Apparatus for processing image and method of processing image |
| CN109479147A (en) * | 2016-07-14 | 2019-03-15 | 诺基亚技术有限公司 | Method and technical equipment for temporal inter-view prediction |
| US20250077532A1 (en) * | 2013-09-27 | 2025-03-06 | Lucas J. Myslinski | Apparatus, systems and methods for scoring and distributing the reliability of online information |
| CN120416497A (en) * | 2025-07-01 | 2025-08-01 | 瀚博半导体(上海)有限公司 | Parallel decoding method, device and computer equipment for hardware decoder |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014171001A1 (en) * | 2013-04-19 | 2014-10-23 | 日立マクセル株式会社 | Encoding method and encoding device |
| US11477351B2 (en) | 2020-04-10 | 2022-10-18 | Ssimwave, Inc. | Image and video banding assessment |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090148058A1 (en) * | 2007-12-10 | 2009-06-11 | Qualcomm Incorporated | Reference selection for video interpolation or extrapolation |
| WO2011115045A1 (en) * | 2010-03-17 | 2011-09-22 | 株式会社エヌ・ティ・ティ・ドコモ | Moving image prediction encoding device, moving image prediction encoding method, moving image prediction encoding program, moving image prediction decoding device, moving image prediction decoding method, and moving image prediction decoding program |
| US20140146885A1 (en) * | 2011-07-02 | 2014-05-29 | Samsung Electronics Co., Ltd. | Method and apparatus for multiplexing and demultiplexing video data to identify reproducing state of video data |
Family Cites Families (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP1603338A4 (en) | 2003-03-10 | 2007-12-05 | Mitsubishi Electric Corp | VIDEO SIGNAL ENCODING DEVICE AND VIDEO SIGNAL ENCODING METHOD |
| US7394856B2 (en) | 2003-09-19 | 2008-07-01 | Seiko Epson Corporation | Adaptive video prefilter |
| US7697759B2 (en) * | 2004-05-11 | 2010-04-13 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | Split-remerge method for eliminating processing window artifacts in recursive hierarchical segmentation |
| SG130962A1 (en) | 2005-09-16 | 2007-04-26 | St Microelectronics Asia | A method and system for adaptive pre-filtering for digital video signals |
| US8126283B1 (en) | 2005-10-13 | 2012-02-28 | Maxim Integrated Products, Inc. | Video encoding statistics extraction using non-exclusive content categories |
| US8009963B2 (en) | 2006-01-26 | 2011-08-30 | Qualcomm Incorporated | Adaptive filtering to enhance video bit-rate control performance |
| WO2008085377A2 (en) * | 2006-12-28 | 2008-07-17 | Thomson Licensing | Banding artifact detection in digital video content |
| US8107571B2 (en) | 2007-03-20 | 2012-01-31 | Microsoft Corporation | Parameterized filters and signaling techniques |
| US7973977B2 (en) * | 2007-05-18 | 2011-07-05 | Reliance Media Works | System and method for removing semi-transparent artifacts from digital images caused by contaminants in the camera's optical path |
| JP2010532628A (en) * | 2007-06-29 | 2010-10-07 | トムソン ライセンシング | Apparatus and method for reducing artifacts in images |
| CN101855910B (en) | 2007-09-28 | 2014-10-29 | 杜比实验室特许公司 | Video compression and transmission techniques |
| JP5276170B2 (en) * | 2008-08-08 | 2013-08-28 | トムソン ライセンシング | Method and apparatus for detecting banding artifacts |
| WO2011081637A1 (en) | 2009-12-31 | 2011-07-07 | Thomson Licensing | Methods and apparatus for adaptive coupled pre-processing and post-processing filters for video encoding and decoding |
- 2012-09-28: US 13/631,428 → US9432694B2 (not active: Expired - Fee Related)
- 2012-10-15: US 13/652,311 → US20130235928A1 (not active: Abandoned)
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20090148058A1 (en) * | 2007-12-10 | 2009-06-11 | Qualcomm Incorporated | Reference selection for video interpolation or extrapolation |
| WO2011115045A1 (en) * | 2010-03-17 | 2011-09-22 | 株式会社エヌ・ティ・ティ・ドコモ | Moving image prediction encoding device, moving image prediction encoding method, moving image prediction encoding program, moving image prediction decoding device, moving image prediction decoding method, and moving image prediction decoding program |
| US20130044813A1 (en) * | 2010-03-17 | 2013-02-21 | Ntt Docomo, Inc. | Moving image prediction encoding/decoding system |
| US20140146885A1 (en) * | 2011-07-02 | 2014-05-29 | Samsung Electronics Co., Ltd. | Method and apparatus for multiplexing and demultiplexing video data to identify reproducing state of video data |
Non-Patent Citations (3)
| Title |
|---|
| Boon et al. (WO 2011/115045 A1) translation from Espacenet. * |
| Chen et al., "Comments on Clean Decoding Refresh Pictures," JCTVC-E400, March 2011. * |
| Fujibayashi et al., "Random access support for HEVC," JCTVC-D234, January 2011. * |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20250077532A1 (en) * | 2013-09-27 | 2025-03-06 | Lucas J. Myslinski | Apparatus, systems and methods for scoring and distributing the reliability of online information |
| US20150181208A1 (en) * | 2013-12-20 | 2015-06-25 | Qualcomm Incorporated | Thermal and power management with video coding |
| WO2015094776A3 (en) * | 2013-12-20 | 2015-09-03 | Qualcomm Incorporated | Thermal and power management with video coding |
| US20160007024A1 (en) * | 2013-12-20 | 2016-01-07 | Qualcomm Incorporated | Thermal and power management with video coding |
| CN109479147A (en) * | 2016-07-14 | 2019-03-15 | 诺基亚技术有限公司 | Method and technical equipment for temporal inter-view prediction |
| US20190020810A1 (en) * | 2017-07-11 | 2019-01-17 | Hanwha Techwin Co., Ltd. | Apparatus for processing image and method of processing image |
| US10778878B2 (en) * | 2017-07-11 | 2020-09-15 | Hanwha Techwin Co., Ltd. | Apparatus for processing image and method of processing image |
| CN109040765A (en) * | 2018-07-25 | 2018-12-18 | 成都鼎桥通信技术有限公司 | Video data playback method and device |
| CN120416497A (en) * | 2025-07-01 | 2025-08-01 | 瀚博半导体(上海)有限公司 | Parallel decoding method, device and computer equipment for hardware decoder |
Also Published As
| Publication number | Publication date |
|---|---|
| US20130235942A1 (en) | 2013-09-12 |
| US9432694B2 (en) | 2016-08-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20130235928A1 (en) | Advanced coding techniques | |
| US9215466B2 (en) | Joint frame rate and resolution adaptation | |
| AU2014275405B2 (en) | Tuning video compression for high frame rate and variable frame rate capture | |
| JP5351040B2 (en) | Improved video rate control for video coding standards | |
| EP3207701B1 (en) | Metadata hints to support best effort decoding | |
| USRE44457E1 (en) | Method and apparatus for adaptive encoding framed data sequences | |
| US9584832B2 (en) | High quality seamless playback for video decoder clients | |
| US9025664B2 (en) | Moving image encoding apparatus, moving image encoding method, and moving image encoding computer program | |
| US10574997B2 (en) | Noise level control in video coding | |
| US9888240B2 (en) | Video processors for preserving detail in low-light scenes | |
| US10721476B2 (en) | Rate control for video splicing applications | |
| US20180184089A1 (en) | Target bit allocation for video coding | |
| US20090074075A1 (en) | Efficient real-time rate control for video compression processes | |
| US9565404B2 (en) | Encoding techniques for banding reduction | |
| US12413738B2 (en) | Video encoding method and apparatus and electronic device | |
| US9451288B2 (en) | Inferred key frames for fast initiation of video coding sessions | |
| WO2012027892A1 (en) | Rho-domain metrics | |
| EP1204279A2 (en) | A method and apparatus for adaptive encoding framed data sequences | |
| US20150350688A1 (en) | I-frame flashing fix in video encoding and decoding | |
| CN117812268A (en) | Video transcoding method, device, equipment and medium | |
| CN120303925A (en) | Method and apparatus for video codec performance measurement and evaluation | |
| US8345992B2 (en) | Method and device of image encoding and image processing apparatus |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: APPLE INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHUNG, CHRIS Y.;PAN, HAO;ZHAI, JIEFU;AND OTHERS;SIGNING DATES FROM 20121005 TO 20121015;REEL/FRAME:029131/0878 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |