
WO2018164930A1 - Motion vector accuracy improvement by sharing cross information between normal and zoom views - Google Patents

Motion vector accuracy improvement by sharing cross information between normal and zoom views

Info

Publication number
WO2018164930A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
field
view
video
zoom
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2018/020414
Other languages
English (en)
Inventor
Kumar Ramaswamy
Jeffrey Allen Cooper
Louis Kerofsky
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vid Scale Inc
Original Assignee
Vid Scale Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vid Scale Inc filed Critical Vid Scale Inc
Publication of WO2018164930A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/204Image signal generators using stereoscopic image cameras
    • H04N13/25Image signal generators using stereoscopic image cameras using two or more image sensors with different characteristics other than in their location or field of view, e.g. having different resolutions or colour pickup characteristics; using image signals from one sensor to control the characteristics of another sensor
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/192Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
    • H04N19/194Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive involving only two passes
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/56Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Definitions

  • User devices such as smartphones are equipped in some cases with multiple cameras with different resolutions and different fields of view.
  • light field cameras may consist of multiple conventional cameras or, in some cases, multiple camera sensors using shared optics.
  • Different cameras on a platform (such as a smartphone or a light field camera) offer different functionalities. For example, in a recent device from a major smartphone manufacturer, there are two cameras with different lens arrangements. One uses a 28 mm-equivalent lens (with f/1.8) and the other uses a 56 mm-equivalent lens (with f/2.8). Both cameras can operate simultaneously to capture video of the same scene in wide-angle and zoom views.
  • Exemplary embodiments disclosed herein relate to image motion estimation for video compression.
  • Motion estimation uses a search algorithm to explore and evaluate different candidate motion vectors.
  • complexity constraints lead to use of a fast search algorithm rather than an exhaustive full search.
  • Complexity limits can affect the level of compression, e.g. by limiting the maximum search range and/or by limiting the motion vector precision.
  • a video compression method may comprise receiving a first set of video frames with a first field of view and a first resolution, simultaneously receiving a second set of video frames with a second field of view and a second resolution different than the first resolution, wherein the second field of view is wider than the first field of view, and the first set of video frames is a set of zoom video frames.
  • the method may further comprise performing a motion search in the first set of video frames to produce a first motion field, compressing the first set of video frames using the first motion field, scaling the first motion field using the first resolution and the second resolution, to produce an estimated motion field, refining the estimated motion field to produce a refined motion field, and compressing the second set of video frames using the refined motion field.
  • portions of a video frame in the first set of video frames that correspond to edges of the second field of view may be spatially smoothed.
  • the refining may occur within second pass encoding of the second set of video frames.
  • the first motion field may be derived in a first pass encoding and prior to a second pass encoding of the first set of video frames.
  • the method may further comprise selecting from among multiple derived motion fields, wherein the first motion field is the selected motion field.
  • the first field of view may or may not be fully contained within the second field of view.
  • a video compression method may comprise receiving a first set of video frames with a first field of view and a first resolution, simultaneously receiving a second set of video frames with a second field of view and a second resolution different than the first resolution, wherein the first field of view is wider than the second field of view, and the second set of video frames is a set of zoom video frames.
  • the method may further comprise performing a motion search in the first set of video frames to produce a first motion field, compressing the first set of video frames using the first motion field, scaling the first motion field using the first resolution and the second resolution, to produce an estimated motion field, refining the estimated motion field to produce a refined motion field, and compressing the second set of video frames using the refined motion field.
  • refining may occur within second pass encoding of the second set of video frames or within first pass encoding of the second set of video frames.
  • the first motion field may be derived in a first pass encoding and prior to a second pass encoding of the first set of video frames.
  • the method may further comprise selecting from among multiple derived motion fields, wherein the first motion field is the selected motion field.
  • the second field of view may or may not be fully contained within the first field of view.
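Taken together, the steps recited above admit a compact sketch. The Python outline below is illustrative only: motion_search, refine, and compress are hypothetical placeholders for encoder internals (they are not part of any disclosed API), and scale stands in for the ratio between the two resolutions.

```python
def compress_with_cross_view_motion(zoom_frames, wide_frames,
                                    motion_search, refine, compress, scale):
    """Sketch of the claimed flow: search one view, scale the resulting
    motion field by the resolution ratio, refine it in the other view,
    and compress both sets of frames."""
    # First motion field from a motion search in the zoom view.
    first_fields = [motion_search(f) for f in zoom_frames]
    zoom_bitstream = compress(zoom_frames, first_fields)

    # Scale by the resolution ratio to obtain an estimated motion field,
    # then refine it (e.g., within second pass encoding) for the wide view.
    estimated = [field * scale for field in first_fields]
    refined = [refine(frame, est) for frame, est in zip(wide_frames, estimated)]
    wide_bitstream = compress(wide_frames, refined)
    return zoom_bitstream, wide_bitstream
```

The same outline applies with the roles of the views swapped, as in the second method above.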
  • FIG. 1 is an illustration of comparative pixel densities of a normal lens and a zoom lens according to a possible configuration used in some embodiments.
  • FIG. 2 is a functional block diagram of an exemplary system according to some embodiments using cross information to improve encoding in normal and zoom streams.
  • FIG. 3 is a functional block diagram of an exemplary system according to some embodiments using zoom video motion vectors to drive encoding of a normal (i.e., wider) view.
  • FIG. 4 is a functional block diagram of an exemplary system according to some embodiments sharing information from the wider view encoder into the zoom view encoder.
  • FIG. 5 is a functional block diagram of an exemplary system according to some embodiments that uses wide view motion vectors to simplify calculations in the zoom view encoder path.
  • FIG. 6 is an illustration of a smartphone used as a two-camera video capture device according to some embodiments.
  • FIG. 7 is a block diagram of an exemplary method, according to some embodiments, of using cross information to improve encoding in normal and zoom streams.
  • FIG. 8 depicts an example wireless transmit/receive unit (WTRU) that may be used within some embodiments.
  • FIG. 9 depicts an exemplary network entity that may be used within a communication system in accordance with some embodiments.
  • Video cameras are increasingly equipped with multiple lens configurations.
  • the most common functionality offered by such an arrangement is to provide both a normal capture (i.e., a field of view that is wider relative to a zoomed capture) and a zoom capture capability.
  • this functionality is of a fixed nature.
  • This arrangement enables 2x optical zoom functions, and is illustrated in FIG. 6 on a device 601, with a wider view image 602 and a zoom image 603.
  • a dual-camera device may avail itself of a video coding technique referred to as multi-view coding (MVC).
  • multiple video sequences are simultaneously captured using an array of video cameras that are spatially separated. This usually presumes that the cameras are matched in resolution and frame rate. Then, information (such as motion estimates) can be shared between the video sequences to improve the quality of encoding in each of the views. Prediction across views is used in MVC, coupling the data between views in order to reduce the coded bitrate.
  • a primary stream may use conventional motion estimation while additional views of MVC may include disparity compensation, giving dependence on other views, in addition to motion compensation based on other frames of the same view.
  • In another video coding technique, known as scalable video coding (SVC), a single high-resolution video is broken down into a hierarchy of two or more lower-resolution videos and encoded so that progressive refinement of the higher quality video can be achieved with a base encoded stream and an enhancement encoded stream corresponding to the hierarchical layers.
  • the different video streams consist of the same entire scene represented at different pixel densities produced from a single high resolution capture. During this process, motion information may be shared between the layers for improving the quality and/or efficiency of encoding.
  • FIG. 1 is an illustration 100 of comparative pixel densities of a normal lens and a zoom lens.
  • pixel densities of two cameras are different, providing different resolution.
  • FIG. 1 schematically represents pixel densities of two cameras - one with a 28 mm (or equivalent) lens and one with a 56 mm (or equivalent) lens.
  • a lower pixel density image 101 (i.e., a normal, wider-angle view image) is illustrated as containing a wide field of view that includes a zoomed view with a higher pixel density image 102.
  • both cameras may have a similar image sensor (e.g., with the same pixel count), but the wider lens frame has a lower pixel capture density than the zoom camera in the areas of overlap of the two views. That is, although the pixel counts may be the same, lower pixel density image 101 spreads its set of pixels out over a wider field of view than does higher pixel density image 102, with its zoomed, narrower field of view. One image may correspond to a wider field of view than the other, giving different resolutions, even if the pixel count is the same for both images. In this arrangement, the image with the narrower field of view may be considered a zoom image. For each image, the resolution may be determined by the field of view and the number of pixels (i.e., the size of the pixel matrix), which gives the pixels per degree visual angle.
  • the zoom camera motion estimation calculations for higher pixel density image 102 will be of a finer nature than for lower pixel density image 101, due to the differences in the densities of the pixel information. As illustrated in FIG. 1, in that central overlap region, the zoom factor is approximately 2. Thus, the calculated native integer pixel based estimation for higher pixel density image 102 will be the equivalent of performing ½ sub-pel motion estimation in lower pixel density image 101. This applies both to intra-image and inter-image prediction.
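As a rough numeric check of this 2x relationship, the angular pixel densities can be compared directly. A minimal sketch, assuming full-frame-equivalent optics (36 mm sensor width) and a hypothetical 4000-pixel-wide sensor shared by both cameras:

```python
import math

def horizontal_fov_deg(focal_mm, sensor_width_mm=36.0):
    """Horizontal angle of view for a full-frame-equivalent focal length."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_mm)))

def pixels_per_degree(pixel_count, fov_deg):
    """Average angular pixel density across the field of view."""
    return pixel_count / fov_deg

WIDTH_PX = 4000  # hypothetical: same pixel count for both sensors

wide_ppd = pixels_per_degree(WIDTH_PX, horizontal_fov_deg(28.0))
zoom_ppd = pixels_per_degree(WIDTH_PX, horizontal_fov_deg(56.0))

# The zoom view samples the overlap region roughly twice as densely, so
# integer-pel search there acts like half-pel search in the wide view.
print(f"wide: {wide_ppd:.1f} px/deg  zoom: {zoom_ppd:.1f} px/deg  "
      f"ratio: {zoom_ppd / wide_ppd:.2f}")
```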
  • an exemplary motion estimate vector 103 is within the field of view of both images 101 and 102.
  • motion estimate vector 103 is a generic motion vector which will have corresponding versions in each of images 101 and 102.
  • the calculated magnitudes of the corresponding vectors in each of images 101 and 102 will have a ratio tied to the zoom factor, as described above. To locate the specific vectors at the corresponding positions, though, image registration is generally necessary.
  • Image registration is the process of translating different sets of data (in this case, two different sets of pixels) into a common coordinate system.
  • Image registration may be accomplished by any of multiple different methods, known in the art, that spatially transform one image to align with the other. Thus, the location of a particular pixel or set of pixels in one image can be mapped to the properly-located pixel or set of pixels in the other image.
  • When the cameras are in fixed relative positions, image registration may use relatively quicker algorithms, although it should be understood that any two images that can be registered can be used with the systems and inventive processes described herein, including images that do not fully overlap and images collected using cameras that are not in fixed relative positions.
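For cameras in fixed relative positions, the registration can often be captured by a simple similarity transform (uniform scale plus offset). The sketch below is a toy model under that assumption; ZOOM_FACTOR and OFFSET are hypothetical values that a real system would estimate or calibrate.

```python
import numpy as np

ZOOM_FACTOR = 2.0                   # zoom pixels per wide pixel in the overlap
OFFSET = np.array([480.0, 270.0])   # wide-view position of the zoom-view origin

def wide_to_zoom(pt_wide):
    """Map an (x, y) pixel position in the wide view to zoom-view coordinates."""
    return (np.asarray(pt_wide, dtype=float) - OFFSET) * ZOOM_FACTOR

def zoom_to_wide(pt_zoom):
    """Inverse mapping: zoom-view coordinates back to the wide view."""
    return np.asarray(pt_zoom, dtype=float) / ZOOM_FACTOR + OFFSET
```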
  • a motion vector (such as motion estimate vector 103) produced by a search on zoom view image 102 may have finer precision than the motion vector obtained by a similar search on wider view image 101.
  • The motion vector from a zoom view may thus be used to provide a better-quality picture, via improved motion estimation, for a given bit rate, or to improve compression efficiency by providing the same or similar quality in the wider picture at a lower bit rate.
  • If a wider (non-zoomed) view image has motion vectors that extend beyond the range explored by a motion search in a zoom view region, then in the presence of large motion, higher quality information may be obtained from the wider video encoding process and then used in the zoom encoding process.
  • Although a single motion estimate vector is shown, in general there will be a motion search to produce a motion field (a set of multiple motion vectors) that can be used to compress video frames.
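For concreteness, a brute-force block-matching search of the kind implied here can be sketched as follows (assuming 8-bit grayscale frames as numpy arrays; a practical encoder would use a fast search rather than this exhaustive one):

```python
import numpy as np

def motion_search(ref, cur, block=16, search=8):
    """Exhaustive SAD block matching: one (dy, dx) vector per block of `cur`,
    pointing into the reference frame `ref`."""
    h, w = cur.shape
    field = np.zeros((h // block, w // block, 2), dtype=np.int32)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            target = cur[y:y + block, x:x + block].astype(np.int32)
            best_sad, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue  # candidate block falls outside the frame
                    cand = ref[yy:yy + block, xx:xx + block].astype(np.int32)
                    sad = int(np.abs(target - cand).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            field[by, bx] = best_mv
    return field
```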
  • FIG. 2 is a functional block diagram of an exemplary system 200 according to some embodiments that uses cross information to improve encoding in the wide lens path (upper portion of the diagram) and the zoom lens path (lower portion of the diagram).
  • information of motion vectors that point beyond the range explored by the zoom motion search is fed to the zoom dual pass encoder 207b to improve the quality of the motion vectors for video containing large motion.
  • the relevant motion vectors from the wide view may be used to feed the first pass of the zoom view (thus simplifying or eliminating motion calculations in the first pass of the zoom view) as an initial estimate of motion vectors.
  • higher resolution (and thus more accurate) motion information 212 from the zoom coding path may be fed back to the wide path dual pass encoder 207a.
  • An illustrated device 201 is a dual camera device, having a normal (wider angle view) lens 202 and a zoom lens 203.
  • Device 201 may be similar to device 601 of FIG. 6, with one possible image produced through wide lens 202 corresponding to example wide view image 602 and one possible image produced through zoom lens 203 corresponding to example zoom image 603.
  • the compressed output 204 from wide lens 202 camera is sent to a decompression box 205, which produces a baseband video output 206 that is fed into a wide path dual pass encoder 207a.
  • the compressed output 208 from zoom lens 203 camera is sent to a decompression box 209, which produces a zoom baseband video output 210 that is fed into a zoom path dual pass encoder 207b.
  • Higher resolution motion output information 212 from the zoom region which may include motion estimate vectors (a motion field) is fed from dual pass encoder 207b into dual pass encoder 207a.
  • wide view (lower pixel density) motion output information 211 which may include motion estimate vectors (such as an equivalent of motion estimate vector 103 of FIG. 1) is fed from dual pass encoder 207a into dual pass encoder 207b.
  • One result of this advantageous process is the production of higher quality zoom view video output 213 from dual pass encoder 207b along the zoom path. Another result is that, as output 214 from dual pass encoder 207a along the wide path is passed through edge filtering 215 at the edges of the zoom region (i.e., the locations where the edges of the image of zoom lens compressed output 208 register within the image of normal lens compressed output 204), a higher quality wide view video output 216 is achieved.
  • the above-described cross information process may thus help improve both encoder (207a and 207b) outputs.
  • FIG. 3 illustrates a functional block diagram of an exemplary system 300 according to some embodiments that uses zoom video motion vectors in the encoding of the wide view.
  • System 300 illustrates a case of three choices:
  • Prediction vectors are derived directly from the output 304 of a zoom camera that already has a built-in compressor. (Option A of FIG. 3)
  • the content in the zoom chain is decompressed and recompressed with a two-pass encoder; the output 309 of the first pass re-encoder in the zoom chain may be used. (Option B of FIG. 3)
  • the content in the zoom chain is decompressed and recompressed with a two-pass encoder; the output 312 of the second pass of the two-pass encoder may be used. (Option C of FIG. 3)
  • the wide lens 202 camera output 204 may go through an optional decompression cycle in box 205, with that baseband output 301 sent to a first pass encoder 302a of a two-pass encoder (combination of first pass encoder 302a and a second pass encoder 310a).
  • In some embodiments, the decompression cycle is not used.
  • the first pass estimates the bit rate allocation for each of the frames.
  • the zoom lens 203 camera output 303 (which may have been compressed by the camera module) is split into derived motion field information 304, which is fed as Option A into a selector 305. It should be noted that prediction vectors may be a significant portion of the compressed bit stream.
  • selector 305 may be controlled by a user input 306 that permits selection of the three options (A, B and C) described above. In some embodiments, the input 306 may be automatically derived from instructions stored in system memory.
  • Baseband zoom lens video output 307 from the decompression cycle in box 209 is sent to a first pass encoder 302b of a two-pass encoder (combination of first pass encoder 302b and a second pass encoder 310b).
  • the first pass estimates the bit rate allocation for each of the frames.
  • the first pass encoder output 308 is split into derived motion field information 309, which is fed as Option B into selector 305; first pass encoder output 308 is also sent to second pass encoder 310b.
  • the output 311 from second pass encoder 310b is split into derived motion field information 312, which is fed as Option C into selector 305.
  • Zoom video output 311 is compressed with the motion field derived from the zoom view.
  • the output 313 of selector 305 is input to a motion vector scaling and temporal alignment process 314 to produce an estimated motion field.
  • Process 314 may perform image registration as part of the scaling process. Temporal alignment is used in the situation of motion video to synchronize vector information from the zoom path with the proper frame in the wide path.
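A simple way to picture process 314: scale each vector by the resolution ratio, and pick for each wide-path frame the temporally closest zoom-path motion field. The sketch below assumes the two captures started at the same instant and that registration scaling is folded into a single `scale` factor (both simplifying assumptions):

```python
def scale_and_align(motion_fields, src_fps, dst_fps, scale):
    """Scale a per-frame list of motion fields (numpy arrays) and select,
    for each destination frame, the temporally nearest source field."""
    aligned = []
    n_dst = int(len(motion_fields) * dst_fps / src_fps)
    for j in range(n_dst):
        t = j / dst_fps                                  # destination timestamp
        i = min(round(t * src_fps), len(motion_fields) - 1)
        aligned.append(motion_fields[i] * scale)         # scaled estimate
    return aligned
```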
  • the output 315 of process 314 is provided as a secondary input, along with output 316 from first pass encoder 302a, into second pass encoder 310a.
  • the wide video compression chain may directly, and beneficially, use some of the motion vector information coming in from the zoom lens path, so that the estimated motion field is transformed into a refined motion field.
  • As output 317 from second pass encoder 310a is passed through edge filtering 318 at the edges of the zoom region (i.e., the locations where the edges of the image of output 303 register within the image of output 204), the portions of the wide view video frames that correspond to edges of the zoom field of view are spatially smoothed.
  • Higher quality wide view video output 319 is achieved when compressed using the refined motion field.
  • the overall bit rate may be lower for the same quality, or a higher quality video may be achieved using the higher quality motion estimates (at the same bit rate).
  • FIG. 4 is a functional block diagram of an exemplary system 400, according to some embodiments, that shares information from a wider view encoder into a zoom view encoder.
  • the motion vectors to feed the zoom view encoding may be derived from one of three locations in the wide view marked D, E, and F in FIG. 4.
  • a user input 411 selection may determine which information option is to be used.
  • the input 411 may be automatically derived from instructions stored in system memory.
  • System 400 may be useful when motion vectors are within the wide view but, according to the registration information, are outside (i.e., beyond the edges of) the zoom view.
  • System 400 illustrates a case of three options for estimated motion vector use:
  • Prediction vectors are derived directly from the output 402 of a wide view camera that already has a built-in compressor. (Option D of FIG. 4)
  • the content in the wide view chain is decompressed and recompressed with a two-pass encoder; the output 407 of the first pass re-encoder in the wide view chain may be used. (Option E of FIG. 4)
  • the content in the wide view chain is decompressed and recompressed with a two-pass encoder; the output 409 of the second pass of the two-pass encoder may be used. (Option F of FIG. 4)
  • the wide lens 202 camera output 401 may be split into derived motion information 402, which is fed as Option D into a selector 410. It should be noted that prediction vectors may be a significant portion of the compressed bit stream. As illustrated in FIG. 4, selector 410 may be controlled by a user input 411 that permits selection of the three options (D, E, and F) described above. Wide lens 202 camera output 401 may pass through an optional decompression cycle in box 205, with that baseband output 403 being sent to a first pass encoder 404a of a two-pass encoder (combination of first pass encoder 404a and a second pass encoder 406). In some embodiments, a decompression cycle is not used.
  • first pass encoder 404a estimates the bit rate allocation for each of the frames.
  • the first pass encoder output 405 is sent to second pass encoder 406, and also split into derived motion information 407 that is fed as Option E into selector 410.
  • the output 408 from second pass encoder 406 is split into derived motion information 409, which is fed as Option F into selector 410.
  • Each of Options D, E, and F is available at selector 410, with one selected using user input 411; the output 412 of selector 410 is input to a motion vector scaling and temporal alignment process 413 to produce an estimated motion field.
  • Process 413 may perform image registration as part of the scaling process.
  • Temporal alignment is used in the situation of motion video to synchronize vector information from the wide view path with the proper frame in the zoom path.
  • the zoom lens 203 camera output 208 is sent through an optional decompression cycle in box 209. In some embodiments, a decompression cycle is not used.
  • Baseband output 307 from decompression cycle in box 209, is sent to a first pass encoder 404b of a two-pass encoder (combination of first pass encoder 404b and a second pass encoder 415).
  • Output 414 of process 413 is provided as a secondary input, along with output 416 from first pass encoder 404b, into second pass encoder 415.
  • the zoom video compression chain may directly, and beneficially, use some of the motion vector information coming in from the wide view lens chain to transform the estimated motion field into a refined motion field.
  • Output 417, from second pass encoder 415 may thus be higher quality and be compressed with the refined motion field.
  • FIG. 5 is a functional block diagram of an exemplary system 500, according to some embodiments, that uses wider view motion vectors to simplify calculations in the zoom view encoder path.
  • System 500 is an alternative variation of system 400 (of FIG. 4), in which the relevant motion vector information from the wide path (corresponding to the zoom region) is used in the first pass encoding of the zoom coder. This can significantly simplify (or even eliminate) the need for motion vector calculations in the first pass.
  • System 500 may be useful when motion vectors from the wide view are also, according to the registration information, within the zoom view.
  • System 500 uses the same three source options for estimated motion vectors as system 400, identified as D, E and F.
  • the wide lens 202 camera output 401 may be split into derived motion information 402, which is fed as Option D into a selector 410.
  • Selector 410 may be controlled by a user input 501 that permits selection of the three options (D, E, and F) described above.
  • the input 501 may be automatically derived from instructions stored in system memory.
  • Wide lens 202 camera output 401 may pass through an optional decompression cycle in box 205, with that baseband output 403 being sent to a first pass encoder 505a of a two-pass encoder (combination of first pass encoder 505a and a second pass encoder 406).
  • a decompression cycle is not used.
  • the first pass encoder output 405 is sent to second pass encoder 406, and also split into derived motion information 407 that is fed as Option E into selector 410.
  • the output 408 from second pass encoder 406 is split into derived motion information 409, which is fed as Option F into selector 410.
  • Each of Options D, E, and F is available at selector 410, with one selected using user input 501; the output 412 of selector 410 is input to a motion vector scaling and temporal alignment process 502 to produce an estimated motion field.
  • Process 502 may perform image registration as part of the scaling process.
  • the zoom lens 203 camera output 208 may be sent through an optional decompression cycle in box 209.
  • Baseband output 504 from the decompression cycle in box 209 is sent to a first pass encoder 505b of a two-pass encoder (combination of first pass encoder 505b and a second pass encoder 507). In some embodiments, a decompression cycle is not used.
  • Output 503 of process 502 is provided as a secondary input, along with output 504 from the decompression cycle in box 209, into first pass encoder 505b.
  • This dual input arrangement can significantly simplify (or even eliminate) the need for motion vector calculations in first pass encoder 505b.
  • Output 506 from first pass encoder 505b is fed into second pass encoder 507, resulting in higher quality zoom output 508.
  • additional information about the scene may be shared across the views. For example, object detection may be run on one image (either the wide view image or the zoom image) and used to guide encoding in the other one of the images.
  • the images must be co-registered so that a detected object location in one of the images can be mapped to the proper corresponding location in the other of the images.
  • Various scene behaviors such as flashes, scene cuts or general activity measures may also be used across the different images.
  • FIG. 6 is an illustration of an exemplary dual-camera smartphone device 601 according to some embodiments, showing wider view image 602 and zoom image 603.
  • Device 601 may be configured as any of the above-described embodiments.
  • device 601 may have two lenses: a 28 mm equivalent lens with f/1.8 and a 56 mm equivalent lens with f/2.8, although other sizes, configurations, and combinations may be used. This arrangement enables 2x optical zoom functions, and is illustrated in FIG. 6.
  • Embodiments disclosed herein operate to improve the coding quality and/or efficiency of video coding using information from a wide view video capture lens and one or more zoom view video capture lenses. These improvements may be made to either or both of the wide view and zoom images.
  • zoom image 603 may be entirely contained within wider view image 602, as determined by an image registration process.
  • zoom image 603 may only partially overlap with wider view image 602, as determined by the image registration process, such that only a portion of zoom image 603 is within wider view image 602, while the remainder of zoom image 603 is outside of the boundaries of wider view image 602.
  • compression quality of the video produced from the wide camera may be improved.
  • motion estimation can be improved without significant increase in complexity.
  • Some embodiments make use of a camera with multiple lenses (in some cases with at least one having a telephoto or zoom view arrangement).
  • Estimated motion vectors for the zoom video may be used in the wide video motion estimation loop for encoding the non-zoom video. Such embodiments may improve the quality of the motion vector estimation and can result in a higher quality video compression at comparable bit rates (or the same quality at a lower bit rate).
  • extended motion information from the wide view camera is used to improve the motion estimation of the zoom view.
  • Some exemplary embodiments operate by making a motion vector found in one view available for use in encoding of the other view, with the goal of advantageously affecting compression performance. Motion vectors from the zoomed view can contribute higher precision to the wider view without the need for subpixel motion estimation.
  • motion vectors from the wide view can describe motion outside of the range the zoomed view may ordinarily explore. Even if not used directly, the motion vector information from the alternate view may be used to narrow the motion search to reduce complexity, or to conduct a more complete search around a seed value provided by the motion in the other view. By combining the effectively finer search due to the zoom view and the effectively wider search on the wide view, motion estimation can be improved without significant increase in complexity.
  • Initial integer pixel motion vector searches on both the wide field of view and the zoomed image may be used to enhance the motion vector search of the opposite image.
  • an initial search at precision P is conducted on one of the wide view and zoom images over an image pixel matrix of size NxN.
  • Results of the initial search are used as input to a refined search on the opposite image; the corresponding position in the other matrix being calculated using co-registration of the two different images.
  • the NxN search on the wide image corresponds to a larger range, coarse, search on the zoom image.
  • a precision P search on the zoom image corresponds to a finer search on the wide image, over a limited range.
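One way to realize this cross-seeding, sketched under the same assumptions as the earlier search example (grayscale numpy frames, SAD cost): the scaled, registered vector from the other view seeds a small refinement window instead of a full-range search.

```python
import numpy as np

def refine_around_seed(ref, cur, y, x, seed_mv, block=16, radius=2):
    """Search only a (2*radius + 1)^2 window around `seed_mv`, the vector
    mapped in from the alternate view after scaling and registration."""
    sdy, sdx = int(round(seed_mv[0])), int(round(seed_mv[1]))
    h, w = cur.shape
    target = cur[y:y + block, x:x + block].astype(np.int32)
    best_sad, best_mv = None, (sdy, sdx)
    for dy in range(sdy - radius, sdy + radius + 1):
        for dx in range(sdx - radius, sdx + radius + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                continue
            cand = ref[yy:yy + block, xx:xx + block].astype(np.int32)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv
```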
  • a first video of a scene is captured with a first camera on a device (e.g. a camera with a standard field of view).
  • the first video is represented as a plurality of frames, with each frame comprising a plurality of pixels having positions on a first grid.
  • the device captures a second video of the scene with a second camera on the device (e.g. a zoom camera).
  • the second video is represented as a plurality of frames, each frame comprising a plurality of pixels having positions on a second grid.
  • the positions in the second grid have corresponding positions in the first grid according to a predetermined mapping (based on the registration).
  • the predetermined mapping may be different for different settings of the optics (e.g. according to an internally-stored table of parameters).
  • the mapping is a linear function.
  • the mapping is a non-linear function that may, e.g. take into consideration different levels of pincushion or barrel distortion.
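The two cases can be illustrated with a single-coefficient radial model (a common approximation, not necessarily the model used in any given device): k1 = 0 reduces to the linear mapping, while nonzero k1 bends the grid as barrel or pincushion distortion would.

```python
import numpy as np

def map_point(pt, scale, offset, k1=0.0):
    """Map a point (relative to the optical center) from one grid to another.
    k1 = 0: purely linear (scale + offset); k1 != 0: a first-order radial
    distortion term applied before the linear part."""
    p = np.asarray(pt, dtype=float)
    r2 = float(p @ p)                       # squared radius from the center
    return p * (1.0 + k1 * r2) * scale + np.asarray(offset, dtype=float)
```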
  • the second video may be encoded using block-based video coding in which at least some blocks are inter-coded using motion vectors.
  • At least one current block of pixels in a current frame of the second video is inter coded using an initial motion vector.
  • This initial motion vector from the second video is used in the encoding of the first video.
  • the method includes using the predetermined mapping to identify a current block of pixels in a current frame of the first video that corresponds to the current block of pixels in the current frame of the second video.
  • the method further includes transforming the initial motion vector to a transformed motion vector using the predetermined mapping (e.g. separately applying the mapping to the start and end points of the vector).
  • the current block of pixels in the current frame is inter coded using the transformed motion vector.
  • the transformed motion vector is used as a candidate motion vector in a search for a selected motion vector for encoding of the current block of pixels, and the current block of pixels is encoded using the selected motion vector.
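The endpoint construction mentioned above is direct to express. In this sketch, `mapping` is any grid-to-grid point transform (for instance, the hypothetical zoom_to_wide function sketched earlier):

```python
import numpy as np

def transform_motion_vector(block_pos, mv, mapping):
    """Apply `mapping` separately to the start and end points of a motion
    vector and return the resulting vector in the target grid."""
    start = np.asarray(block_pos, dtype=float)
    end = start + np.asarray(mv, dtype=float)
    return mapping(end) - mapping(start)

# Illustrative use with hypothetical values:
# mv_wide = transform_motion_vector((120.0, 64.0), (3.0, -2.0), zoom_to_wide)
```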
  • FIG. 7 is a block diagram of an exemplary method 700, according to some embodiments, of using cross information to improve encoding in normal and zoom streams.
  • Method 700 can be used for video compression for video captured on a device supporting two cameras with different resolutions.
  • Method 700 starts in blocks 701 and 702, which may be performed simultaneously.
  • a first set (possibly a plurality) of video frames (images), perhaps video streams or still images, from a first camera is received at a first resolution and with a first field of view.
  • a second set (possibly a plurality) of video frames from a second camera is received at a second resolution and with a second field of view.
  • the received images may be captured from cameras on the device performing method 700, or external to the device.
  • Each of the frames comprises a plurality of pixels in a grid, specifically a pixel matrix.
  • the first camera has a wider field of view than the second camera, giving the different resolutions, even if the pixel count is the same for both of the cameras.
  • the second camera (corresponding to box 711) may be considered a zoom camera, because it has a narrower field of view than the first camera.
  • the resolution may be determined by the field of view and the number of pixels (i.e., the size of the pixel matrix), which gives the pixels per degree visual angle. These values may be different for the different cameras.
  • At least a portion of the images produced in box 711 will overlap at least a portion of the images produced in box 701, although they may be entirely contained within them.
  • the respective images are optionally compressed by the camera modules, and in boxes 703 and 713, motion estimate prediction vectors are optionally derived from the respective images (frames).
  • Generating prediction vectors may include conducting a motion search on the video frames to produce a motion field. If the images had been compressed, they are decompressed in boxes 704 and 714, respectively.
  • the respective image streams undergo first pass encoding in boxes 705 and 715, which may be a further source of prediction vectors.
  • the respective image streams then undergo second pass encoding in boxes 706 and 716, which may be yet a further source of prediction vectors.
  • the various prediction vector options are input into a selection process in box 720, which may take as input received user input from box 721.
  • the input 721 may be automatically derived from instructions stored in system memory.
  • the selected derived motion field (motion vectors) undergoes a scale and temporal alignment process in box 722 to produce an estimated motion field.
  • This process may include image registration (alignment, scaling, shifting) and frame delays.
  • Image registration may use the known relative position of the cameras (compensating for displacement between the two camera image sensors), as well as relative zoom factors. Image registration may also calculate where the edges of the zoom images are located within the wider view images, and whether the zoom images are entirely contained within the wide view images or only partially overlap.
  • zoom or resolution information is used to scale the motion vectors upward or downward by an amount corresponding to the ratio of pixels per degree angle for the different images. That is, the motion vectors are scaled up or scaled down, based on whether they were extracted from the lower pixel density image or the higher pixel density image (for use in the other).
  • extracted motion field vectors may be extracted from the lower pixel density images and scaled for use as initial values for the higher pixel density images, or extracted motion field vectors may be extracted from the higher pixel density images and scaled for use as initial values for the lower pixel density images.
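In code, the scaling step amounts to multiplying the motion field by the ratio of angular pixel densities; a minimal sketch (the density values below are illustrative):

```python
def rescale_motion_field(field, src_ppd, dst_ppd):
    """Scale motion vectors (numpy array of shape (H, W, 2)) by the ratio
    of destination to source pixels per degree: up when going to the
    denser image, down when going to the sparser one."""
    return field * (dst_ppd / src_ppd)

# e.g., zoom-to-wide with the ~2x optics of FIG. 1:
# wide_estimate = rescale_motion_field(zoom_field, src_ppd=112.0, dst_ppd=61.0)
```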
  • the resulting motion field information is provided as input to at least one of first pass encoding 715, second pass encoding 716, or second pass encoding 706, which transform the estimated motion field into a refined motion field.
  • a second motion search may be conducted to estimate the motion field for the video frames processed in the box receiving the output of box 722.
  • the initial values are used to produce a refined motion field for the other set of video frames (i.e., other than the video frames from which the initial values were derived or extracted).
  • the output of second pass encoding box 706 undergoes edge filtering in box 707, in which portions of the video frames corresponding to the edge of the field of view of the narrower (zoom) camera are spatially smoothed.
  • the edge filtering of box 707 may include information from the scale and temporal alignment process of box 722, specifically the image registration information.
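A toy version of this edge filtering, assuming a grayscale frame and a registered zoom rectangle (band width and blur strength are illustrative choices, and scipy's gaussian_filter stands in for whatever smoothing filter an encoder would actually apply):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_zoom_edges(frame, zoom_rect, band=8, sigma=1.5):
    """Blend a blurred copy of `frame` into a band of pixels around the
    registered zoom-region boundary (y0, x0, y1, x1)."""
    y0, x0, y1, x1 = zoom_rect
    out = frame.astype(np.float32).copy()
    blurred = gaussian_filter(out, sigma=sigma)
    mask = np.zeros(frame.shape, dtype=bool)
    mask[max(y0 - band, 0):y0 + band, max(x0 - band, 0):x1 + band] = True  # top
    mask[max(y1 - band, 0):y1 + band, max(x0 - band, 0):x1 + band] = True  # bottom
    mask[max(y0 - band, 0):y1 + band, max(x0 - band, 0):x0 + band] = True  # left
    mask[max(y0 - band, 0):y1 + band, max(x1 - band, 0):x1 + band] = True  # right
    out[mask] = blurred[mask]
    return out.astype(frame.dtype)
```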
  • the respective images are then compressed in boxes 708 and 718, possibly using the refined motion fields (or alternatively, the derived motion fields for that particular video stream).
  • the compression may be still image compression and/or motion picture compression, and can beneficially use the improved motion vectors to encode the images or video with higher quality results.
  • At least one block of pixels in a first frame of one of the video streams is inter-coded using a reference block in a reference frame of the other video.
  • Using a predetermined mapping, it is possible to determine the current block of pixels in the current frame of one video stream that corresponds to a candidate block of pixels in a reference frame of the other video stream.
  • Using the predetermined mapping, corresponding positions of pixels within a first pixel grid (pixel matrix) can be found within the other pixel grid.
  • a candidate block of pixels may be used for inter-coding the current block of pixels in the current frame, using an initial motion vector.
  • A wireless transmit/receive unit (WTRU) may be used as a dual-camera tablet or other mobile device in embodiments described herein.
  • FIG. 8 depicts an example WTRU 9001.
  • WTRU 9001 may include a processor 9003, a transceiver 9005, a transmit/receive element 9007, a speaker/microphone 9009, a keypad 9011, a display/touchpad 9013, a non-removable memory 9015, a removable memory 9017, a power source 9019, a global positioning system (GPS) chipset 9021, and other peripherals 9023.
  • Transceiver 9005 may be implemented as a component of decoder logic in communication interface 9025.
  • the transceiver 9005 and decoder logic within a communication interface 9025 may be implemented on a single LTE or LTE-A chip or other communications system protocol chip.
  • the decoder logic may include a processor operative to perform instructions stored in a non-transitory computer-readable medium.
  • the decoder logic may be implemented using custom and/or programmable digital logic circuitry.
  • Processor 9003 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), a state machine, and the like.
  • Processor 9003 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables WTRU 9001 to operate in a wireless environment.
  • Processor 9003 may be coupled to transceiver 9005, which may be coupled to transmit/receive element 9007. While FIG. 8 depicts processor 9003 and transceiver 9005 as separate components, processor 9003 and transceiver 9005 may be integrated together in an electronic package or chip.
  • Transmit/receive element 9007 may be configured to transmit signals to, or receive signals from, a base station over an air interface 9027.
  • transmit/receive element 9007 may be an antenna configured to transmit and/or receive RF signals.
  • transmit/receive element 9007 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, as examples.
  • transmit/receive element 9007 may be configured to transmit and receive both RF and light signals.
  • Transmit/receive element 9007 may be configured to transmit and/or receive any combination of wireless signals.
  • Although transmit/receive element 9007 is depicted in FIG. 8 as a single element, WTRU 9001 may include any number of transmit/receive elements 9007. More specifically, WTRU 9001 may employ MIMO technology. Thus, in some embodiments, WTRU 9001 may include two or more transmit/receive elements 9007 (e.g., multiple antennas) for transmitting and receiving wireless signals over air interface 9027. Transceiver 9005 may be configured to modulate the signals that are to be transmitted by transmit/receive element 9007 and to demodulate the signals that are received by transmit/receive element 9007. As noted above, WTRU 9001 may have multi-mode capabilities. Thus, transceiver 9005 may include multiple transceivers for enabling WTRU 9001 to communicate via multiple RATs, such as UTRA and IEEE 802.11, as examples.
  • Processor 9003 of WTRU 9001 may be coupled to, and may receive user input data from, speaker/microphone 9009, keypad 9011, and/or display/touchpad 9013 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit). Processor 9003 may also output user data to speaker/microphone 9009, keypad 9011, and/or display/touchpad 9013. In addition, processor 9003 may access information from, and store data in, any type of suitable memory, such as non-removable memory 9015 and/or removable memory 9017.
  • Non-removable memory 9015 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
  • Removable memory 9017 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
  • Non-removable memory 9015 and removable memory 9017 both comprise non-transitory computer-readable media.
  • processor 9003 may access information from, and store data in, memory that is not physically located on the WTRU 9001, such as on a server or a home computer (not shown).
  • Processor 9003 may receive power from power source 9019, and may be configured to distribute and/or control the power to the other components in WTRU 9001.
  • Power source 9019 may be any suitable device for powering WTRU 9001.
  • power source 9019 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), and the like), solar cells, fuel cells, and the like.
  • Processor 9003 may also be coupled to GPS chipset 9021, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of WTRU 9001.
  • WTRU 9001 may receive location information over air interface 9027 from a base station and/or determine its location based on the timing of the signals being received from two or more nearby base stations. WTRU 9001 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
  • Processor 9003 may further be coupled to other peripherals 9023, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
  • peripherals 9023 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
  • FIG. 9 depicts an exemplary network entity 9101 that may be used within embodiments of systems described herein.
  • a network entity 9101 includes a communication interface 9103, a processor 9105, and non-transitory computer-readable data storage 9107, all of which are communicatively linked by a bus, network, or other communication path 9109.
  • Communication interface 9103 may include one or more wired communication interfaces and/or one or more wireless communication interfaces. With respect to wired communication, communication interface 9103 may include one or more interfaces such as Ethernet interfaces, as an example.
  • communication interface 9103 may include components such as one or more antennae, one or more transceivers/chipsets designed and configured for one or more types of wireless (e.g., LTE) communication, and/or any other components deemed suitable by those of skill in the relevant art.
  • communication interface 9103 may be equipped at a scale and with a configuration appropriate for acting on the network side, rather than the client side, of wireless communications (e.g., LTE communications, Wi-Fi communications, and the like).
  • communication interface 9103 may include the appropriate equipment and circuitry (including multiple transceivers) for serving multiple mobile stations, UEs, or other access terminals in a coverage area.
  • Processor 9105 may include one or more processors of any type deemed suitable by those of skill in the relevant art, some examples including a general-purpose microprocessor and a dedicated DSP.
  • Data storage 9107 may take the form of any non-transitory computer-readable medium or combination of such media, some examples including flash memory, read-only memory (ROM), and random-access memory (RAM), as any one or more types of non-transitory data storage deemed suitable by those of skill in the relevant art may be used.
  • data storage 9107 contains program instructions 9111 executable by processor 9105 for carrying out various combinations of the various network-entity functions described herein.
  • network-entity functions described herein may be carried out by a network entity having a structure similar to that of network entity 9101 of FIG. 9.
  • one or more of such functions are carried out by a set of multiple network entities in combination, where each network entity has a structure similar to that of network entity 9101 of FIG. 9.
  • other network entities and/or combinations of network entities may be used in various embodiments for carrying out the network-entity functions described herein, as the foregoing list is provided by way of example and not by way of limitation.
  • Various hardware elements of one or more of the described embodiments are referred to as "modules" that carry out (perform or execute) various functions that are described herein in connection with the respective modules.
  • a module includes hardware (e.g., one or more processors, one or more microprocessors, one or more microcontrollers, one or more microchips, one or more application-specific integrated circuits (ASICs), one or more field programmable gate arrays (FPGAs), one or more memory devices) deemed suitable by those of skill in the relevant art for a given implementation.
  • Each described module may also include instructions executable for carrying out the one or more functions described as being carried out by the respective module, and those instructions may take the form of or include hardware (hardwired) instructions, firmware instructions, software instructions, and/or the like, and may be stored in any suitable non-transitory computer-readable medium or media, such as those commonly referred to as RAM or ROM.
  • Examples of such computer-readable storage media include a read-only memory (ROM), a random-access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs).
  • a processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Systems and methods of video coding are described for devices equipped with two video cameras, particularly when one of the video cameras is a zoom camera. Videos of a scene are captured simultaneously by the two video cameras. Motion information (such as a motion field and/or motion vectors) gathered in one video stream is used in the encoding of the other. For example, a motion vector from one video may be transformed into the grid of the other video. The transformed motion vector may be used to predict a block of pixels in the other video, or it may be used as a candidate or a starting point in an algorithm for selecting a motion vector. Transforming the motion vector may include aligning and scaling the vector, or other linear or non-linear transformations may be used.
PCT/US2018/020414 2017-03-08 2018-03-01 Motion vector accuracy improvement by sharing cross information between normal and zoom views Ceased WO2018164930A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201762468779P 2017-03-08 2017-03-08
US62/468,779 2017-03-08

Publications (1)

Publication Number Publication Date
WO2018164930A1 (fr) 2018-09-13

Family

ID=61899342

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2018/020414 Ceased WO2018164930A1 (fr) 2017-03-08 2018-03-01 Motion vector accuracy improvement by sharing cross information between normal and zoom views

Country Status (1)

Country Link
WO (1) WO2018164930A1 (fr)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008014350A1 (fr) * 2006-07-25 2008-01-31 Qualcomm Incorporated Stereo image and video capturing device with dual digital sensors and methods of using the same
US20090290641A1 (en) * 2008-05-22 2009-11-26 Microsoft Corporation Digital video compression acceleration based on motion vectors produced by cameras
US20110229109A1 (en) * 2010-03-18 2011-09-22 Canon Kabushiki Kaisha Chapter information creation apparatus and control method therefor
WO2013073107A1 (fr) * 2011-11-14 2013-05-23 Sony Corporation Image display in a three-dimensional image capture device used in a two-dimensional capture mode
US20130322517A1 (en) * 2012-05-31 2013-12-05 Divx, Inc. Systems and Methods for the Reuse of Encoding Information in Encoding Alternative Streams of Video Data
WO2014190308A1 (fr) * 2013-05-24 2014-11-27 Sonic Ip, Inc. Systems and methods for encoding multiple video streams with adaptive quantization for adaptive bitrate streaming
US20160007008A1 (en) * 2014-07-01 2016-01-07 Apple Inc. Mobile camera system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
EDOUARD FRANCOIS ET AL: "Extended Spatial Scalability : A Generalization of Spatial Scalability for Non Dyadic Configurations", IMAGE PROCESSING, 2006 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PI, 1 October 2006 (2006-10-01), pages 169 - 172, XP031048600, ISBN: 978-1-4244-0480-3 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12483714B2 (en) 2023-04-03 2025-11-25 Axis Ab Encoding of video stream during changing camera field-of-view

Similar Documents

Publication Publication Date Title
US10244167B2 (en) Apparatus and methods for image encoding using spatially weighted encoding quality parameters
  • EP3306925B1 Image processing device and image processing method
US9167224B2 (en) Image processing device, imaging device, and image processing method
US20160191759A1 (en) Method and system of lens shift correction for a camera array
  • CN111279673A Image stitching with electronic rolling shutter correction
  • WO2018164932A1 Zoom coding using simultaneous and synchronous multiple camera captures
  • EP3703372B1 Inter-frame prediction method and apparatus, and terminal device
  • WO2017176400A1 Method and system of video coding using an image data correction mask
US9967586B2 (en) Method and apparatus of spatial motion vector prediction derivation for direct and skip modes in three-dimensional video coding
US12309386B2 (en) Inter prediction method and apparatus, and corresponding encoder and decoder that reduce redundancy in video coding
US10356417B2 (en) Method and system of video coding using projected motion vectors
  • CN103096098B Imaging device
  • WO2019157427A1 Image processing
  • CN112585962A Method and system for forming extended focal planes for large viewpoint changes
  • WO2018017599A1 System and method of quality assessment for 360-degree video
  • WO2018164930A1 Motion vector accuracy improvement by sharing cross information between normal and zoom views
US12008776B2 (en) Depth map processing
US12316844B2 (en) 3D point cloud enhancement with multiple measurements
US20250337952A1 (en) Providing segmentation information for immersive video
US20240095962A1 (en) Image data re-arrangement for improving data compression effectiveness
  • WO2022023157A1 Demosaicing by multiple camera rotations
  • WO2018222532A1 Zoom lens tracking of objects of interest based on radio frequency identification (RFID)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18715830

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18715830

Country of ref document: EP

Kind code of ref document: A1