
WO2025108779A1 - 3d motion maps for compression of time varying mesh textures - Google Patents


Info

Publication number
WO2025108779A1
Authority
WO
WIPO (PCT)
Prior art keywords
mesh
motion
map
sequence
texture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2024/082036
Other languages
French (fr)
Inventor
Jean-Eudes Marvie
Gurdeep BHULLAR
Franck Galpin
Olivier Mocquard
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
InterDigital CE Patent Holdings SAS
Original Assignee
InterDigital CE Patent Holdings SAS
Application filed by InterDigital CE Patent Holdings SAS
Publication of WO2025108779A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/001Model-based coding, e.g. wire frame
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/137Motion inside a coding unit, e.g. average field, frame or block difference
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • A mesh can define multiple per-face / per-vertex attributes.
  • A syntax element is present in the base mesh sequence parameter set, namely bmsps_motion_constraints_facegroup_attribute_index.
  • The facegroup attribute index remains common for the entire base mesh sequence.
  • Each sub-layer may signal the facegroup of motion constraint information in a different attribute index than the others. Therefore, each sub-layer may signal its own facegroup attribute index for identification. If not present, the base sub-layer facegroup attribute index of the motion constraint information is used.
  • Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding, data decoding, view generation, texture processing, and other processing of images and related texture information and/or depth information.
  • Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices.
  • the equipment may be mobile and even installed in a mobile vehicle.
  • the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”).
  • the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination.
  • a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
  • implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
  • the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries may be, for example, analog or digital information.
  • the signal may be transmitted over a variety of different wired or wireless links, as is known.
  • the signal may be stored on a processor-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Methods and devices are provided to encode and decode time-varying textured 3D meshes in a data stream. By exploiting the 3D motion information from the base mesh encoder, a 3D motion map is generated that can be derived into different types of maps such as motion amplitude. Those maps have the property of being a bijection mapping between the object texture map and the motion texture map. As the 3D movement cannot be directly used as a motion estimator for the video coding, the amplitude of the 3D motion is interpreted in order to build an importance map that is transformed into a QP map provided to the video texture encoder and decoder.

Description

3D MOTION MAPS FOR COMPRESSION OF TIME VARYING MESH TEXTURES
1. Technical Field
The present principles generally relate to the domain of encoding, decoding and rendering volumetric videos represented as varying 3D meshes, varying texture images and geometry motion information. In particular, the present principles relate to lowering the complexity of encoding and decoding of dynamic texture images for transmission over a network.
2. Background
The present section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present principles that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present principles. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Recent volumetric video encoders and decoders propose a motion detection stage where two variants of a mesh are candidates for a frame: the original base mesh M(i) or a base mesh M'(i) using the topology of the previous frame. In the second case, encoding of the vertex motion is used, and the topology is skipped since the one from the previous frame is reused. However, time varying texture images do not benefit from motion encoding.
There is a lack of a Time Varying Textured Mesh (TVTM) codec with better compression gains. According to the present principles, such a codec uses a “3D motion” guided video encoding of the animated texture map (i.e. the dynamic photometric attributes) for the animated mesh (i.e. the dynamic geometry).
3. Summary
The following presents a simplified summary of the present principles to provide a basic understanding of some aspects of the present principles. This summary is not an extensive overview of the present principles. It is not intended to identify key or critical elements of the present principles. The following summary merely presents some aspects of the present principles in a simplified form as a prelude to the more detailed description provided below.
The present principles relate to a method for encoding a 3D mesh of a sequence of 3D meshes in a data stream. The method comprises deriving a motion map from the 3D mesh according to a different 3D mesh of the sequence. This different 3D mesh may be the previous 3D mesh in the sequence or the I reference 3D mesh or any other already encoded one. Then, a mapping is generated between motion amplitude of the motion map and quantization parameters (QP) to obtain a per block QP table. A texture map is generated by using the per block QP table. The 3D mesh and the texture map are encoded in the data stream. In a variant, deriving the motion map is performed from a per triangle metadata indicating surface areas to be coded with highest quality.
The present principles also relate to a device comprising a processor and a memory associated with the processor that is configured to implement the method above.
The present principles also relate to a method that comprises decoding a 3D mesh from a data stream. A motion map is derived from the 3D mesh according to a different 3D mesh of the sequence. This different 3D mesh may be the previous 3D mesh in the sequence or the I reference 3D mesh or any other already encoded one. A mapping is generated between motion amplitude of the motion map and quantization parameters (QP) to obtain a per block QP table. Then, a texture map related to the 3D mesh is decoded from the data stream by using the per block QP table.
The present principles also relate to a device comprising a processor and a memory associated with the processor that is configured to implement the method above.
4. Brief Description of Drawings
The present disclosure will be better understood, and other specific features and advantages will emerge upon reading the following description, the description making reference to the annexed drawings wherein:
- Figure 1 shows an example of an original mesh frame together with the corresponding UV atlas of patches (i.e. a parameterization of mesh) and the associated texture map with inter-patch padding (i.e. color gradients filling empty spaces between patches);
- Figure 2 illustrates a base mesh with a new UV atlas and the texture map obtained after the attribute transfer;
- Figure 3 shows an example architecture of a device which may be configured to implement encoding and/or decoding methods according to an embodiment of the present principles;
- Figure 4 shows an example of an embodiment of the syntax of a data stream encoding a volumetric video as a time varying textured mesh according to the present principles;
- Figure 5 shows midpoint subdivision and motion interpolation.
5. Detailed description of embodiments
The present principles will be described more fully hereinafter with reference to the accompanying figures, in which examples of the present principles are shown. The present principles may, however, be embodied in many alternate forms and should not be construed as limited to the examples set forth herein. Accordingly, while the present principles are susceptible to various modifications and alternative forms, specific examples thereof are shown by way of examples in the drawings and will herein be described in detail. It should be understood, however, that there is no intent to limit the present principles to the particular forms disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present principles as defined by the claims.
The terminology used herein is for the purpose of describing particular examples only and is not intended to be limiting of the present principles. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises", "comprising," "includes" and/or "including" when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Moreover, when an element is referred to as being "responsive" or "connected" to another element, it can be directly responsive or connected to the other element, or intervening elements may be present. In contrast, when an element is referred to as being "directly responsive" or "directly connected" to other element, there are no intervening elements present. As used herein the term "and/or" includes any and all combinations of one or more of the associated listed items and may be abbreviated as"/". It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element without departing from the teachings of the present principles.
Although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
Some examples are described with regard to block diagrams and operational flowcharts in which each block represents a circuit element, module, or portion of code which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in other implementations, the function(s) noted in the blocks may occur out of the order noted. For example, two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.
Reference herein to “in accordance with an example” or “in an example” means that a particular feature, structure, or characteristic described in connection with the example can be included in at least one implementation of the present principles. The appearances of the phrase in accordance with an example” or “in an example” in various places in the specification are not necessarily all referring to the same example, nor are separate or alternative examples necessarily mutually exclusive of other examples.
Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims. While not explicitly described, the present examples and variants may be employed in any combination or sub-combination.
Figure 1 shows an example of an original mesh 10 frame together with the corresponding UV atlas 11 of patches (i.e. a parameterization of mesh 10) and the associated texture map 12 with inter-patch padding (i.e. color gradients filling empty spaces between patches). The UV parameterization provides a link between the mesh vertices in 3D space and the color values in the 2D space of the texture map. The padding is usually added to help the spatial coding of the texture map, especially at the edges of the patches. Padding also leads to better rendering of the object during consumption of the content. The attributes, like texture map 12, are transferred. Other attributes like normal, transparency or reflectance can be obtained and are transferred at the same step. Figure 2 illustrates a base mesh 20 with a new UV atlas 21 and the texture map 22 obtained after the attribute transfer. Base mesh 20 is obtained either by decimation of an original mesh or by applying motion to a base mesh of the previous frame (mesh 10 of Figure 1 in the example). Texture map 22 also includes a padding applied after the attribute transfer. A reconstructed deformed mesh 23 is obtained after a tessellation of base mesh 20 and the displacement of its vertices using wavelet transforms. Corresponding UV map 24 may be obtained. Texture map 25 is texture map 22 of the base mesh.
According to the present principles, a time varying textured mesh (TVTM) codec is proposed to enhance compression gains, by introducing a “3D motion” guided video encoding of the animated texture map (the dynamic photometric attributes) of the animated mesh (the dynamic geometry). A 3D motion map that can be derived into different types of maps such as motion amplitude is generated by exploiting the 3D motion information from the base mesh encoder. Those maps have the property of being a bijection mapping between the object texture map and the motion texture map. 3D motion does not represent the full motion of the pixels in the texture video. Indeed, some movement over the texture frames can be due to the mesh deformation and some movement can be intrinsic to the texture and due, for instance, to lighting evolutions (e.g. reflections or light blinks) during the capture of the model. At the end, the object texture atlas might contain some motion due to geometry evolution and some due to photometry evolutions. Consequently, the 3D movement cannot be directly used as a motion estimator for the video coding. This technique is used to reduce the quality of the texture where object motion is important. Indeed, when an object moves fast, a blur can be tolerated at the rendering, so the quality of the texture can be reduced. So, according to the present principles, the amplitude of the 3D motion is interpreted in order to build an importance map that is transformed into a quantization parameter map (QP map) provided to the video texture encoder/decoder. In the following embodiments, the video encoder and decoder are able to ingest a per-block QP table, each block being encoded or decoded using its associated specific QP, thus not uniformly on the entire frame.
In an embodiment, an encoding method using the motion map as an importance map to decide the QP map is proposed. At a first step, the motion map is derived from the 3D mesh. In a variant, this first step is performed from a per-triangle metadata indicating surface areas to be coded with the highest quality (e.g. to preserve the quality of faces or hands). At a second step, a mapping between motion amplitude and QP is set up to obtain a per block QP table. At a third step, the texture map is encoded by using the QP table.
In an embodiment, a normative method at decoder side is proposed. The motion map is derived from the 3D mesh, from a per-triangle metadata indicating surface areas to be coded with the highest quality (like faces or hands) in a variant. A mapping between motion amplitude and QP is set up to obtain a per block QP table as in the encoding stage. The texture map is then decoded by using the QP table.
According to the present principles, the video encoder/decoder that is used to encode the texture map can ingest per-frame information (i.e. the QP map) that associates each block of the video with a specific QP value instead of using a global QP for the whole frame and the whole video. This information is ingested by the encoder to perform the compression and by the decoder to perform the decompression. With this approach, the per-frame QP map (i.e., the per-block QP metadata) does not need to be encoded in the bitstream since it is regenerated at the decoding prior to the texture map decoding stage which uses it. At the coding of each new frame of a volumetric video (mesh + texture map), if the current frame is detected as static, it is considered as an intra-frame and an all-zero motion map (no motion) is generated, except for the padding areas which are set to maximum motion (no need to preserve quality in padding areas). If the current frame is detected as dynamic, it is considered as an inter-frame and a motion map is generated by using the motion data computed by the 3D geometry codec.
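As an illustration, a minimal Python sketch of this per-frame decision is given below; the function and parameter names, the use of NumPy arrays and the explicit padding mask are assumptions made for the sketch and are not mandated by the described codec.

import numpy as np

def init_motion_map(w, h, is_static, padding_mask, rasterize_motion, max_motion=1e6):
    """Per-frame motion map (h x w x 3) initialization, as described above.

    padding_mask: boolean (h, w) array, True on inter-patch padding pixels.
    rasterize_motion: callable returning the (h, w, 3) motion map of a dynamic
    frame from the motion data of the 3D geometry codec (sketched further below).
    """
    if is_static:
        # Intra-frame: all-zero motion map ...
        motion = np.zeros((h, w, 3), dtype=np.float32)
        # ... except the padding areas, set to a large "maximal" motion so that
        # no quality is preserved there.
        motion[padding_mask] = max_motion
        return motion
    # Inter-frame: motion map generated from the 3D geometry codec motion data.
    return rasterize_motion()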
For each new frame, the motion map is then converted to an importance map. The importance map is then down sampled to obtain one importance value per video block. The importance value per video block is converted into a 2D QP table (one QP per video block) and is ingested by the video encoder that is used to adaptively encode the animated texture maps. Similar steps are performed at the decoding, using the decoded 3D geometry to reconstruct the 2D QP table and decode the texture map adaptively. In another embodiment, an extension is proposed, using marked faces, to force high texture quality on specific areas of the mesh surface.
The generated map is a 2D array whose dimensions may be identical (not mandatory as seen later) to the texture images. Each pixel (i,j) of this motion map contains an unquantized 3D vector in which 3D motion information is stored. When working on an inter-frame, the base mesh is coded through a set of motion vectors, each vector modifying a vertex of the base mesh from a previous frame (e.g. the previous frame or the reference I frame). The rest of the mesh is defined by using the topology and the texture UV coordinates from the I frame. According to the present principles, the UV coordinates of the base mesh are used to reconstruct the motion map and the UV coordinates of the tessellated (fully reconstructed) model to interpolate missing 3D motion information for the pixels of projected triangles where no vertex projection falls.
Figure 5 shows midpoint subdivision and motion interpolation. A triangle m1, m2, m3 of the base mesh is retrieved. Mid-point tessellation m'31, m'12, m'23 with motion interpolation is introduced. On the right of the figure, a two-step subdivision is shown. Motion is reconstructed for all the vertices that come from the base mesh tessellation (called mesh subdivision) by interpolating the motion vectors associated with the vertices of the base mesh. In some models, for example the VMesh Test Model, the base mesh is tessellated (subdivided) using a mid-point recursive subdivision scheme (other subdivision schemes may be used so motion vectors are adapted). During the subdivision, the system uses the UV coordinates of the vertices to interpolate new UV coordinates at the newly generated vertices according to equation Eq1.
Eq1: uv12 = ( uv1 + uv2 ) / 2
Where uv1 and uv2 are vectors representing the 2D coordinates in the UV space at base mesh triangle vertices. A similar method is used to generate some interpolated motions at the newly generated vertices, for example, according to equation Eq2.
Eq2: m12 = ( m1 + m2 ) / 2
Where m1 and m2 are vectors representing the 3D motion at base mesh triangle vertices. In this example, two opposite vectors of similar norm will cancel each other and generate a null motion at the new vertex V12, as expected. Once the per-vertex motion values are obtained for all the vertices of the tessellated mesh, this mesh, that is, its per-vertex UV coordinates and per-vertex motion values, is used to generate the motion map. This operation is performed by rendering each triangle of the mesh into the texture image space, setting every motion pixel that is part of the projection of a triangle to the motion value interpolated from the three motion values from the vertices of the triangle.
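As an illustration of the subdivision and interpolation described above (Eq1 and Eq2), a minimal Python sketch of one mid-point subdivision step is given below; the data layout (NumPy arrays indexed by vertex, triangles as index triplets) is an assumption made for the sketch.

import numpy as np

def midpoint_subdivide(vertices_uv, vertices_motion, triangles):
    """One mid-point subdivision step interpolating UV (Eq1) and motion (Eq2).

    vertices_uv:     (N, 2) array of per-vertex UV coordinates.
    vertices_motion: (N, 3) array of per-vertex 3D motion vectors.
    triangles:       list of (i1, i2, i3) vertex index triplets.
    Returns the augmented attribute arrays and the refined triangle list.
    """
    uv = list(map(np.asarray, vertices_uv))
    motion = list(map(np.asarray, vertices_motion))
    midpoint_cache = {}   # edge (min, max) -> index of the created vertex
    new_triangles = []

    def midpoint(a, b):
        key = (min(a, b), max(a, b))
        if key not in midpoint_cache:
            uv.append((uv[a] + uv[b]) / 2.0)              # Eq1
            motion.append((motion[a] + motion[b]) / 2.0)  # Eq2
            midpoint_cache[key] = len(uv) - 1
        return midpoint_cache[key]

    for (i1, i2, i3) in triangles:
        m12, m23, m31 = midpoint(i1, i2), midpoint(i2, i3), midpoint(i3, i1)
        # Each input triangle is split into four sub-triangles.
        new_triangles += [(i1, m12, m31), (i2, m23, m12),
                          (i3, m31, m23), (m12, m23, m31)]
    return np.array(uv), np.array(motion), new_triangles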
Processing the full mesh is performed as follows:
- Initializing the motion map of size w, h with null motions (0,0,0) and initializing a boolean occupancy map of size w, h to false.
- For each triangle T of the subdivided mesh with texture coordinates uv1, uv2 and uv3 and motion vectors m1, m2 and m3:
  - Computing the bounding box of the triangle in image space using uv1, uv2 and uv3.
  - For each motion pixel P(i,j) of the box, computing the coordinates of the center C of the pixel P in UV space:
    - Finding the barycentric coordinates of C in the space of triangle T using its UV coordinates. Barycentric coordinates make the link between the different spaces where T lies (image and UV). They also permit knowing if a UV coordinate uv' is part of a triangle with per-vertex UV coordinates uv1, uv2 and uv3.
    - If C is covered by the projection of T in the image space (the barycentric coordinates are used to test this):
      - Computing m' by interpolating m1, m2 and m3 using the barycentric coordinates;
      - motion[i, j] = m';
      - occupancy[i, j] = true.
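As an illustration, a minimal Python sketch of this rasterization is given below; the array layout and loop bounds are assumptions, and the helpers pixel_center_uv and barycentric correspond to equations Eq3 and Eq4 and are sketched after those equations.

import numpy as np

def rasterize_motion_map(w, h, uv, motion, triangles):
    """Render per-vertex motion into a (h, w, 3) motion map (see pseudocode above).

    uv, motion: per-vertex UV coordinates and 3D motion of the subdivided mesh.
    triangles:  (i1, i2, i3) vertex index triplets.
    pixel_center_uv() and barycentric() implement Eq3 and Eq4 (sketched below).
    """
    motion_map = np.zeros((h, w, 3), dtype=np.float32)   # null motions (0,0,0)
    occupancy = np.zeros((h, w), dtype=bool)             # all false

    for (i1, i2, i3) in triangles:
        uv1, uv2, uv3 = uv[i1], uv[i2], uv[i3]
        m1, m2, m3 = motion[i1], motion[i2], motion[i3]
        # Bounding box of the triangle in image space.
        us = [uv1[0], uv2[0], uv3[0]]
        vs = [uv1[1], uv2[1], uv3[1]]
        i_min, i_max = int(min(us) * w), int(np.ceil(max(us) * w))
        j_min, j_max = int(min(vs) * h), int(np.ceil(max(vs) * h))
        for j in range(max(j_min, 0), min(j_max + 1, h)):
            for i in range(max(i_min, 0), min(i_max + 1, w)):
                c = pixel_center_uv(i, j, w, h)                   # Eq3
                u, v, wc = barycentric(c, uv1, uv2, uv3)          # Eq4
                if u >= 0 and v >= 0 and wc >= 0:                 # C covered by T
                    motion_map[j, i] = u * m1 + v * m2 + wc * m3  # interpolated m'
                    occupancy[j, i] = True
    return motion_map, occupancy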
Barycentric coordinates permit locating a point in the space of a triangle. Several related properties can be defined using such coordinates. Finding the UV coordinates of the center C of a pixel i,j of a map with dimensions w, h is performed according to equation Eq3.
Eq3: uvc = ( (i + 0.5) / w , (j + 0.5) / h )
And finding the barycentric coordinates (u, v, w) of a point C with 2D coordinates uvc with respect to a triangle (uv1, uv2, uv3) is performed according to equation Eq4.
Eq4: with e1 = uv2 - uv1, e2 = uv3 - uv1 and ec = uvc - uv1: v = (ec × e2) / (e1 × e2), w = (e1 × ec) / (e1 × e2), u = 1 - v - w, where × denotes the scalar 2D cross product.
So, to test if point C with given barycentric coordinates (u, v, w) expressed in the space of triangle T lies inside this triangle, the following conditions have to be met: inside = ( 0 < u and u < 1 and 0 < v and v < 1 and u + v < 1 ). In a variant, since the subdivision in VDMC is a simple mid-point subdivision, the motion map could be derived before the subdivision stage using the same projection and interpolation approach. This would lead to similar results. In the described embodiment, the motions are interpolated at the same block of VDMC as the UV coordinates interpolation one. For an efficient implementation, in a variant, the algorithm uses the base mesh instead of the tessellated mesh and motions are not interpolated during subdivision since they are used before. In counterpart, if the subdivision introduces new points in a non-linear manner, or if, for instance, UV coordinates are modified with some displacements (or other fitting) after the subdivision, then the main embodiment shall be used and the computation of the motion vectors for the introduced vertices can also be adapted to the specific subdivision/fitting.
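As an illustration, Eq3, Eq4 and the inside test may be implemented as follows; the scalar 2D cross-product formulation is one standard way of obtaining barycentric coordinates and is an assumption about the exact form used in the original equations.

import numpy as np

def pixel_center_uv(i, j, w, h):
    """Eq3: UV coordinates of the center C of pixel (i, j) of a (w, h) map."""
    return np.array([(i + 0.5) / w, (j + 0.5) / h])

def barycentric(c, uv1, uv2, uv3):
    """Eq4: barycentric coordinates (u, v, w) of point c w.r.t. triangle (uv1, uv2, uv3)."""
    e1, e2, ec = uv2 - uv1, uv3 - uv1, c - uv1
    cross = lambda a, b: a[0] * b[1] - a[1] * b[0]   # scalar 2D cross product
    denom = cross(e1, e2)                            # twice the signed triangle area
    v = cross(ec, e2) / denom
    w = cross(e1, ec) / denom
    u = 1.0 - v - w
    return u, v, w

def inside(u, v, w):
    """Inside test: C lies in the triangle when all barycentric coordinates are in [0, 1]."""
    return 0.0 <= u <= 1.0 and 0.0 <= v <= 1.0 and u + v <= 1.0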
According to the present principles, the quality of the padding parts between the patches is not important. Their role is to help the spatial coding. Thus, inter-patch areas are set to a high motion value, which will lead the encoder to disregard quality in these areas. The amplitude map is generated according to the following algorithm:
Let maxnorm = max over all i,j of norm( motion[i,j] ) + epsilon
For each pixel i,j:
- if occupancy[i,j] = 1: amplitude[i,j] = norm( motion[i,j] )
- else: amplitude[i,j] = maxnorm
maxnorm may be set to a higher value by using an epsilon greater than zero to systematically have a “high fake motion” for the gradient areas that is always higher than the motion of the other pixels. A pass is further performed on the amplitude map to create the normalized importance map according to equation Eq5.
Eq5: importance[i,j] = 1.0 - amplitude[i,j] / maxnorm
An importance set to 0 means huge degradation is possible on the pixel since its motion is large. At the opposite, an importance set to 1 means that the quality of the pixel has to be preserved as much as possible.
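As an illustration, a minimal Python sketch of the amplitude and importance map generation (the algorithm above and Eq5) is given below; the NumPy array layout is an assumption made for the sketch.

import numpy as np

def importance_map(motion_map, occupancy, epsilon=1e-6):
    """Amplitude map and normalized importance map (Eq5).

    motion_map: (h, w, 3) per-pixel 3D motion, occupancy: (h, w) boolean map.
    Padding pixels (occupancy false) receive the maximal amplitude so that
    their importance is close to 0.
    """
    norms = np.linalg.norm(motion_map, axis=-1)
    maxnorm = norms[occupancy].max() + epsilon if occupancy.any() else epsilon
    amplitude = np.where(occupancy, norms, maxnorm)
    # Eq5: importance in [0, 1]; 0 -> strong degradation allowed, 1 -> preserve quality.
    return 1.0 - amplitude / maxnorm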
Using the presented principles, generating motion and importance maps with lower (or even higher) resolutions than the texture map is facilitated. One just needs to set a different w and h in the previous equations. However, reducing the size of the motion maps this way performs a sampling that collects the motions at specific points and might not reflect an average motion over an area. To overcome this, according to a second embodiment, motion and/or importance maps are generated at the original texture map resolution and square areas of the maps are then summarized using low pass filtering to obtain lower resolution maps. These lower resolution maps can present, for instance, one summary pixel per video coding block (e.g. 8x8, 16x16 or 32x32) of the original map. Using for instance a minimum filter for each of the blocks permits retaining coding quality on patch edges, discarding quality on full padding blocks and adapting quality to the lowest motion inside the block for the other blocks. Other kinds of filters may be envisioned to preserve contour quality in case border blocks contain large motions. It is possible to preserve border details even if the block has strong motion.
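As an illustration, a minimal Python sketch of the block-wise summary is given below; it assumes the map dimensions are multiples of the block size (remaining rows and columns are trimmed), and it operates on the importance map, where a per-block maximum is equivalent to a minimum filter on the motion amplitude.

import numpy as np

def downsample_importance(importance, block=16):
    """One summary value per video coding block.

    Taking the block maximum of the importance corresponds to a minimum filter
    on the motion amplitude: quality is retained on patch edges, discarded on
    blocks made only of padding, and otherwise adapted to the lowest motion
    inside the block.
    """
    h, w = importance.shape
    trimmed = importance[: h - h % block, : w - w % block]
    tiles = trimmed.reshape(h // block, block, w // block, block)
    return tiles.max(axis=(1, 3))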
At this step, a down-sampled map importance[i, j] is set and provides, for each block of the video, a value between 0 and 1. To build the QP map (2D table) for the video encoder/decoder (one entry per video block), a target QPvid is determined for the video coder together with a maximum variation QPvar around the target QP. The values can be set by an operator or automatically determined, for example according to equation Eq6 or Eq7, depending on whether the video encoder/decoder uses higher or lower quality for lower QP.
Eq6: QP[i,j] = QPvid + QPvar · importance[i,j]
Eq7: QP[i,j] = QPvid - QPvar · importance[i,j]
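As an illustration, a minimal Python sketch of building the per-block QP table from the down-sampled importance (Eq6 / Eq7) is given below; rounding to integer QP values is an assumption made for the sketch.

import numpy as np

def qp_table(block_importance, qp_vid, qp_var, lower_qp_is_higher_quality=True):
    """Per-block QP table from the per-block importance map (Eq6 / Eq7).

    qp_vid: target QP of the video coder, qp_var: maximal variation around it.
    """
    if lower_qp_is_higher_quality:
        # Eq7: important blocks (importance close to 1) get a lower QP.
        qp = qp_vid - qp_var * block_importance
    else:
        # Eq6: important blocks get a higher QP.
        qp = qp_vid + qp_var * block_importance
    return np.rint(qp).astype(int)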
In some cases, like for encoding character performances, a high quality may be wanted on some parts of the surface even if they move quickly (e.g. preserve quality on moving faces or hands). In such an embodiment, an additional step is performed, forcing a null motion for some triangles of the mesh. To do so, for instance, a preprocessing like face detection or hand detection can be performed on the 3D model and the triangles of the base mesh that are covered by such information are marked so the 3D motion generator always considers those triangles as static (null motion), hence using the highest quality at encoding/decoding. For instance, FaceId=0 can be used on faces having to be skipped by the motion and FaceId=1 on other faces. The pre-processor (face detector, other) only provides this additional information to the encoder. Then, it is possible to use a FaceId mechanism to encode/decode those FaceIds. So, during motion map generation, if FaceIds are provided with the face skip option activated, those faces with FaceId=0 are set to motion = 0. At decoding, once the mesh is decoded, if the face skip option is activated, the decoded FaceIds are used by the motion map generator in the same manner.
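As an illustration, a minimal Python sketch of this face-skip constraint is given below; it follows the FaceId=0 convention of the example above, and the per-vertex representation of the marked triangles is an assumption made for the sketch.

import numpy as np

def apply_face_skip(vertices_motion, triangles, face_ids, face_skip_enabled=True):
    """Force a null motion on vertices of triangles marked with FaceId == 0.

    vertices_motion: (N, 3) per-vertex motion; a modified copy is returned.
    triangles: (i1, i2, i3) index triplets, face_ids: one identifier per triangle.
    Marked triangles are treated as static so that the corresponding texture
    areas are encoded and decoded at the highest quality.
    """
    motion = np.array(vertices_motion, dtype=np.float32, copy=True)
    if not face_skip_enabled:
        return motion
    for tri, fid in zip(triangles, face_ids):
        if fid == 0:                      # FaceId == 0: skip motion for this face
            motion[list(tri)] = 0.0
    return motion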
Figure 3 shows an example architecture of a device 30 which may be configured to implement encoding and/or decoding methods according to an embodiment of the present principles. The device is linked with other devices via their bus 31 and/or via I/O interface 36.
Device 30 comprises the following elements that are linked together by a data and address bus 31:
- a processor 32 (or CPU), which is, for example, a DSP (or Digital Signal Processor);
- a ROM (or Read Only Memory) 33;
- a RAM (or Random Access Memory) 34;
- a storage interface 35;
- an I/O interface 36 for reception of data to transmit, from an application; and
- a power supply (not represented in Figure 3), e.g. a battery.
In accordance with an example, the power supply is external to the device. In each of the mentioned memories, the word « register » used in the specification may correspond to an area of small capacity (some bits) or to a very large area (e.g. a whole program or a large amount of received or decoded data). The ROM 33 comprises at least a program and parameters. The ROM 33 may store algorithms and instructions to perform techniques in accordance with the present principles. When switched on, the CPU 32 uploads the program into the RAM and executes the corresponding instructions.
The RAM 34 comprises, in a register, the program executed by the CPU 32 and uploaded after switch-on of the device 30, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a computer program product, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Device 30 is linked, for example via bus 31, to a set of sensors 37 and to a set of rendering devices 38. Sensors 37 may be, for example, cameras, microphones, temperature sensors, Inertial Measurement Units, GPS, hygrometry sensors, IR or UV light sensors, or wind sensors. Rendering devices 38 may be, for example, displays, speakers, vibrators, heaters, fans, etc.
In accordance with examples, the device 30 is configured to implement a method according to the present principles of encoding, decoding and rendering a 3D scene or a volumetric video, and belongs to a set comprising:
- a mobile device;
- a communication device;
- a game device;
- a tablet (or tablet computer);
- a laptop;
- a still picture camera;
- a video camera.
Figure 4 shows an example of an embodiment of the syntax of a data stream encoding a volumetric video as a time varying textured mesh according to the present principles. The structure consists of a container which organizes the stream in independent elements of syntax. The structure may comprise a header part 41 which is a set of data common to every syntax element of the stream. For example, the header part comprises metadata about the syntax elements, describing the nature and the role of each of them. The structure also comprises a payload comprising an element of syntax 42 and an element of syntax 43. Syntax element 42 comprises data representative of the media content items, comprising the encoded geometry and texture images of the sequence. Element of syntax 43 is a part of the payload of the data stream and comprises metadata like the subsampling ratios according to the present principles.
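For illustration only, the container organization described above could be modeled as follows; the class and field names (StreamHeader, VolumetricStream, subsampling_ratios) are assumptions and do not reflect the actual V3C container layout:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StreamHeader:
    """Header part 41: metadata common to every syntax element of the stream."""
    element_descriptions: List[str] = field(default_factory=list)

@dataclass
class VolumetricStream:
    """Container for an encoded time varying textured mesh sequence."""
    header: StreamHeader   # part 41
    media_content: bytes   # syntax element 42: encoded geometry and texture images
    metadata: dict         # syntax element 43: e.g. subsampling ratios

# Example instantiation:
stream = VolumetricStream(
    header=StreamHeader(element_descriptions=["geometry+texture", "metadata"]),
    media_content=b"",                       # encoded payload would go here
    metadata={"subsampling_ratios": [2, 4]},
)
```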
The syntax proposed in the present document is an extension of the V3C syntax, which is a reference syntax for volumetric video represented by time varying textured meshes. The additional information to store in the bitstream according to the present principles is an information element (e.g. a flag) indicating whether the motion map is used for guiding the encoding/decoding of texture maps, together with the QPvar parameter. Local triangle constraints require one additional information element for faceSkip activation/deactivation. Of course, other syntaxes may be used.
In V3C, the vdmc_ext_motion_maps_flag syntax element is present in the bitstream to indicate whether a motion map is used for the texture video stream (i.e. the attribute video). If enabled, additional parameters may be present, which may be used to derive quantities such as the importance, the parameters required for the QP calculations of the texture video stream, and the local per-triangle constraint. The syntax element relates to a sequence of the dynamic mesh stream and carries information that is expressed to the V-DMC decoder such that the base mesh decoder explicitly generates the motion maps. Therefore, the syntax element is signaled at the sequence parameter set level in the atlas bitstream (i.e. the Atlas Sequence Parameter Set (ASPS)) of the dynamic mesh sequence. Since an extension to the ASPS exists to define metadata for the V-DMC bitstream in a V3C syntax, the syntax may be contained in a V-DMC extension. The additional parameters related to motion maps may be signaled at the patch data unit level or at the ASPS extension level. If the motion map parameters remain constant through the sequence as well as across the sub-meshes, it is efficient to contain the parameters in the ASPS extension.
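A minimal sketch of how such an ASPS V-DMC extension payload could be read is given below; apart from vdmc_ext_motion_maps_flag, which is named above, the element names, bit widths and the BitReader helper are assumptions for illustration, not the normative V-DMC syntax:

```python
from dataclasses import dataclass
from typing import Optional

class BitReader:
    """Minimal MSB-first bit reader over a bytes object (illustrative only)."""
    def __init__(self, data: bytes):
        self.bits = ''.join(f'{b:08b}' for b in data)
        self.pos = 0
    def read_uint(self, n: int) -> int:
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value
    def read_flag(self) -> bool:
        return self.read_uint(1) == 1

@dataclass
class MotionMapAspsExtension:
    motion_maps_flag: bool                 # vdmc_ext_motion_maps_flag (named in the text)
    qp_var: Optional[int] = None           # hypothetical element carrying the QPvar parameter
    face_skip_flag: Optional[bool] = None  # hypothetical element for faceSkip on/off

def parse_motion_map_extension(r: BitReader) -> MotionMapAspsExtension:
    """Parse the motion map flag and, if enabled, its dependent parameters."""
    if not r.read_flag():
        return MotionMapAspsExtension(motion_maps_flag=False)
    qp_var = r.read_uint(6)    # 6-bit width is an assumption for illustration
    face_skip = r.read_flag()
    return MotionMapAspsExtension(True, qp_var, face_skip)

# Example: flag=1, qp_var=12 (001100), face_skip=1 -> bit pattern 1001 1001
ext = parse_motion_map_extension(BitReader(bytes([0b10011001])))
```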
In the Atlas frame sequence set attribute tile information structure, an additional syntax element to signal the QPvar is added, according to the present principles. A possible syntax is the following one.
[Syntax tables: Figures imgf000015_0001, imgf000016_0001 and imgf000016_0002]
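Independently of the exact syntax above, a minimal sketch of how a per-block QP table for the texture video could be derived from the motion map and a QPvar-like parameter is given below; the linear mapping, the block size and the function name are assumptions for illustration, not the normative derivation:

```python
import numpy as np

def per_block_qp_table(motion_map: np.ndarray,
                       base_qp: int,
                       qp_var: int,
                       block_size: int = 64) -> np.ndarray:
    """Map motion amplitude to per-block QPs for the texture video encoder.

    motion_map: (H, W) motion amplitude aligned with the texture atlas, in [0, 1].
    base_qp:    QP used for static areas (highest quality).
    qp_var:     maximum QP increase allowed on fast-moving areas.
    """
    h, w = motion_map.shape
    bh = (h + block_size - 1) // block_size
    bw = (w + block_size - 1) // block_size
    qp = np.full((bh, bw), base_qp, dtype=np.int32)
    for by in range(bh):
        for bx in range(bw):
            block = motion_map[by*block_size:(by+1)*block_size,
                               bx*block_size:(bx+1)*block_size]
            # Faster motion -> larger QP -> coarser quantization of that texture block.
            qp[by, bx] = base_qp + int(round(block.max() * qp_var))
    return qp

# Example usage on a synthetic 256x256 motion map:
qp_table = per_block_qp_table(np.random.rand(256, 256), base_qp=32, qp_var=10)
```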
A mesh can define multiple per-face/per-vertex attributes. In order to facilitate the signaling of which face(s) (i.e. group of faces) are motion constrained, a syntax element is present in the basemesh sequence parameter set, namely bmsps_motion_constraints_facegroup_attribute_index. The facegroup attribute index remains common for the entire basemesh sequence.
[Syntax table: Figure imgf000016_0003]
There may be additional sub-layers specified which may have a different attribute index for the facegroup carrying the motion constraint information. Each sub-layer may signal the facegroup of motion constraint information in a different attribute index than the others. Therefore, each sub-layer may signal its own facegroup attribute index for identification. If not present, the base sub-layer facegroup attribute index of the motion constraint information is used.
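As a small sketch of this fallback rule (the function and argument names are assumptions, not normative syntax), resolving the attribute index used by a given sub-layer could look like:

```python
from typing import Dict, Optional

def resolve_facegroup_attribute_index(base_index: int,
                                      sublayer_indices: Dict[int, Optional[int]],
                                      sublayer_id: int) -> int:
    """Return the facegroup attribute index carrying motion constraint information.

    base_index:       index signaled for the base sub-layer
                      (bmsps_motion_constraints_facegroup_attribute_index).
    sublayer_indices: optional per-sub-layer overrides; a missing or None entry
                      means the sub-layer did not signal its own index.
    """
    index = sublayer_indices.get(sublayer_id)
    return index if index is not None else base_index

# Example: sub-layer 2 signals its own index, sub-layer 1 falls back to the base one.
assert resolve_facegroup_attribute_index(0, {2: 3}, 2) == 3
assert resolve_facegroup_attribute_index(0, {2: 3}, 1) == 0
```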
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a computer program product, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or a device), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, Smartphones, tablets, computers, mobile phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding, data decoding, view generation, texture processing, and other processing of images and related texture information and/or depth information. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD”), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.

Claims

1. A method for encoding a 3D mesh of a sequence of 3D meshes in a data stream, the method comprising:
- deriving a motion map from the 3D mesh according to a different 3D mesh of the sequence;
- generating a mapping between motion amplitude of the motion map and quantization parameters (QP) to obtain a per block QP table;
- generating a texture map by using the per block QP table; and
- encoding the 3D mesh and the texture map in the data stream.
2. The method of claim 1, wherein deriving the motion map is performed from a per triangle metadata indicating surface areas to be coded with highest quality.
3. The method of claim 1 or 2, wherein the different 3D mesh of the sequence is a 3D mesh right before in the sequence or wherein the different 3D mesh of the sequence is an I reference 3D mesh.
4. A device for encoding a 3D mesh of a sequence of 3D meshes in a data stream, the device comprising a memory associated with a processor configured for:
- deriving a motion map from the 3D mesh according to a different 3D mesh of the sequence;
- generating a mapping between motion amplitude of the motion map and quantization parameters (QP) to obtain a per block QP table;
- generating a texture map by using the per block QP table; and
- encoding the 3D mesh and the texture map in the data stream.
5. The device of claim 4, wherein deriving the motion map is performed from a per triangle metadata indicating surface areas to be coded with highest quality.
6. The device of claim 4 or 5, wherein the different 3D mesh of the sequence is a 3D mesh right before in the sequence or wherein the different 3D mesh of the sequence is an I reference 3D mesh.
7. A method comprising:
- decoding a 3D mesh from a data stream;
- deriving a motion map from the 3D mesh according to a different 3D mesh of a sequence of 3D meshes;
- generating a mapping between motion amplitude of the motion map and quantization parameters (QP) to obtain a per block QP table; and
- decoding a texture map from the data stream by using the per block QP table.
8. The method of claim 7, wherein deriving the motion map is performed from a per triangle metadata indicating surface areas to be coded with highest quality obtained from the data stream.
9. The method of claim 7 or 8, wherein the different 3D mesh of the sequence is a 3D mesh right before in the sequence or wherein the different 3D mesh of the sequence is an I reference 3D mesh.
10. The method of one of claims 7 to 9, further comprising rendering the 3D mesh for a 3D point of view.
11. A device comprising a memory associated with a processor configured for:
- decoding a 3D mesh from a data stream;
- deriving a motion map from the 3D mesh according to a different 3D mesh of a sequence of 3D meshes;
- generating a mapping between motion amplitude of the motion map and quantization parameters (QP) to obtain a per block QP table; and
- decoding a texture map from the data stream by using the per block QP table.
12. The device of claim 11, wherein deriving the motion map is performed from a per triangle metadata indicating surface areas to be coded with highest quality obtained from the data stream.
13. The device of claim 11 or 12, wherein the different 3D mesh of the sequence is a 3D mesh right before in the sequence or wherein the different 3D mesh of the sequence is an I reference 3D mesh.
14. The device of one of claims 11 to 13, wherein the processor is further configured for rendering the 3D mesh for a 3D point of view.