
WO2025080446A1 - Explicit predictive coding for point cloud compression - Google Patents

Explicit predictive coding for point cloud compression

Info

Publication number
WO2025080446A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
cloud frame
feature
feature map
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/US2024/049050
Other languages
English (en)
Inventor
Jiahao PANG
Junghyun Ahn
Dong Tian
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
InterDigital VC Holdings Inc
Original Assignee
InterDigital VC Holdings Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by InterDigital VC Holdings Inc filed Critical InterDigital VC Holdings Inc
Publication of WO2025080446A1 publication Critical patent/WO2025080446A1/fr
Pending legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/001 Model-based coding, e.g. wire frame
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 Motion estimation or motion compensation
    • H04N 19/537 Motion estimation other than block-based
    • H04N 19/54 Motion estimation other than block-based using feature points or meshes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Definitions

  • Patent Application Serial No. 63/526,130 entitled “Generative-based Predictive Coding for LiDAR Point Cloud
  • the reference point cloud frame and the current point cloud frame are point-based representations
  • performing the feature extraction on the reference point cloud frame includes: performing a block partition on the reference point cloud frame; and performing a blockwise feature extraction on the block partitioned reference point cloud frame to generate the first feature map
  • performing the feature extraction on the current point cloud frame includes: performing a block partition on the current point cloud frame; and performing a blockwise feature extraction on the block partitioned current point cloud frame to generate the second feature map.
  • obtaining the motion feature that describes motion between the reference point cloud frame and the current point cloud frame includes: performing motion analysis to generate a motion feature map using the reference point cloud frame and the current point cloud frame as inputs; and entropy encoding the motion feature map to generate the motion feature.
  • obtaining the motion feature that describes motion between the reference point cloud frame and the current point cloud frame includes: performing a feature extraction on the reference point cloud frame to generate a first feature map; performing a feature extraction on the current point cloud frame to generate a second feature map; fusing the first and second feature maps to generate a fused feature map; and performing a feature aggregation on the fused feature map to generate the motion feature.
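
The bullet above describes the motion-analysis path end to end. The following is a minimal PyTorch sketch of that path, assuming a shared pointwise feature extractor for both frames, fusion by channel-wise concatenation, equal point counts in the two frames, and illustrative layer widths; the entropy encoding of the motion feature is omitted. None of these specifics are fixed by the application.

```python
import torch
import torch.nn as nn

class MotionAnalysis(nn.Module):
    """Sketch of the motion-analysis path: per-frame feature extraction,
    fusion of the two feature maps, then aggregation into a motion feature.
    Fusion-by-concatenation and all layer widths are assumptions."""

    def __init__(self, feat_ch: int = 64, motion_ch: int = 64):
        super().__init__()
        # Shared pointwise extractor applied to each frame.
        self.extract = nn.Sequential(
            nn.Linear(3, 32), nn.ReLU(),
            nn.Linear(32, feat_ch), nn.ReLU(),
        )
        # Aggregation over the fused (concatenated) feature maps.
        self.aggregate = nn.Sequential(
            nn.Linear(2 * feat_ch, 128), nn.ReLU(),
            nn.Linear(128, motion_ch),
        )

    def forward(self, pc_ref: torch.Tensor, pc_curr: torch.Tensor) -> torch.Tensor:
        # pc_ref, pc_curr: (N, 3) coordinates; for simplicity the two
        # frames are assumed to have the same number of points.
        f1 = self.extract(pc_ref)            # first feature map
        f2 = self.extract(pc_curr)           # second feature map
        fused = torch.cat([f1, f2], dim=-1)  # fused feature map
        return self.aggregate(fused)         # motion feature
```
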
  • An example signal in accordance with some embodiments may include a bitstream generated according to any one of the methods listed above.
  • FIG. 3A is a schematic side view illustrating an example waveguide display that may be used with extended reality (XR) applications according to some embodiments.
  • XR extended reality
  • FIG. 3C is a schematic side view illustrating an example alternative display type that may be used with extended reality applications according to some embodiments.
  • FIG. 4 is a process diagram illustrating an example codec encoding and decoding process with explicit prediction according to some embodiments.
  • FIG. 5 is a process diagram illustrating an example training process for the motion analysis according to some embodiments.
  • FIG. 6 is a process diagram illustrating an example training process for a predictor feature generation block according to some embodiments.
  • FIG. 7 is a process diagram illustrating an example motion analysis process according to some embodiments.
  • FIG. 10 is a process diagram illustrating an example feature aggregation (FC block) according to some embodiments.
  • FIG. 11 is a process diagram illustrating an example feature predictor generation (FA block) according to some embodiments.
  • FIG. 16 is a process diagram illustrating an example point synthesis block (PS block) according to some embodiments.
  • FIG. 1A is a diagram illustrating an example communications system 100 in which one or more disclosed embodiments may be implemented.
  • the communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users.
  • the communications system 100 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth.
  • the communications systems 100 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), and the like.
  • CDMA code division multiple access
  • TDMA time division multiple access
  • FDMA frequency division multiple access
  • OFDMA orthogonal FDMA
  • SC-FDMA single-carrier FDMA
  • ZT UW DTS-s OFDM zero-tail unique-word DFT-Spread OFDM
  • UW-OFDM unique word OFDM
  • FBMC filter bank multicarrier
  • the communications systems 100 may also include a base station 114a and/or a base station 114b.
  • the cell associated with the base station 114a may be divided into three sectors.
  • the base station 114a may include three transceivers, i.e., one for each sector of the cell.
  • the base station 114a may employ multiple-input multiple-output (MIMO) technology and may utilize multiple transceivers for each sector of the cell.
  • MIMO multiple-input multiple-output
  • beamforming may be used to transmit and/or receive signals in desired spatial directions.
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement multiple radio access technologies.
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement LTE radio access and NR radio access together, for instance using dual connectivity (DC) principles.
  • DC dual connectivity
  • the air interface utilized by WTRUs 102a, 102b, 102c may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., an eNB and a gNB).
  • the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN).
  • the base station 114b and the WTRUs 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, LTE-A Pro, NR etc.) to establish a picocell or femtocell.
  • the base station 114b may have a direct connection to the Internet 110.
  • the base station 114b may not be required to access the Internet 110 via the CN 106.
  • the processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like.
  • the processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment.
  • the processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
  • the processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102.
  • the power source 134 may be any suitable device for powering the WTRU 102.
  • the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
  • the processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
  • the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like.
  • FM frequency modulated
  • the peripherals 138 may include one or more sensors; the sensors may be one or more of a gyroscope, an accelerometer, a hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
  • the WTRU 102 may include a half-duplex radio for which transmission and reception of some or all of the signals are associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception).
  • although the WTRU is described in FIGs. 1A-1B as a wireless terminal, it is contemplated that, in certain representative embodiments, such a terminal may use (e.g., temporarily or permanently) wired communication interfaces with the communication network.
  • FIG. 1C is a system diagram illustrating an example set of interfaces for a system according to some embodiments.
  • An extended reality display device together with its control electronics, may be implemented for some embodiments.
  • System 150 can be embodied as a device including the various components described below and is configured to perform one or more of the aspects described in this document. Examples of such devices include, but are not limited to, various electronic devices such as personal computers, laptop computers, smartphones, tablet computers, digital multimedia set top boxes, digital television receivers, personal video recording systems, connected home appliances, and servers. Elements of system 150, singly or in combination, can be embodied in a single integrated circuit (IC), multiple ICs, and/or discrete components.
  • IC integrated circuit
  • the processing and encoder/decoder elements of system 150 are distributed across multiple ICs and/or discrete components.
  • the system 150 is communicatively coupled to one or more other systems, or other electronic devices, via, for example, a communications bus or through dedicated input and/or output ports.
  • the system 150 is configured to implement one or more of the aspects described in this document.
  • the system 150 includes at least one processor 152 configured to execute instructions loaded therein for implementing, for example, the various aspects described in this document.
  • Processor 152 may include embedded memory, input output interface, and various other circuitries as known in the art.
  • the system 150 includes at least one memory 154 (e.g., a volatile memory device, and/or a non-volatile memory device).
  • System 150 includes an encoder/decoder module 156 configured, for example, to process data to provide an encoded video or decoded video, and the encoder/decoder module 156 can include its own processor and memory.
  • the encoder/decoder module 156 represents module(s) that can be included in a device to perform the encoding and/or decoding functions. As is known, a device can include one or both of the encoding and decoding modules. Additionally, encoder/decoder module 156 can be implemented as a separate element of system 150 or can be incorporated within processor 152 as a combination of hardware and software as known to those skilled in the art.
  • the input to the elements of system 150 can be provided through various input devices as indicated in block 172.
  • Such input devices include, but are not limited to, (i) a radio frequency (RF) portion that receives an RF signal transmitted, for example, over the air by a broadcaster, (ii) a Component (COMP) input terminal (or a set of COMP input terminals), (iii) a Universal Serial Bus (USB) input terminal, and/or (iv) a High Definition Multimedia Interface (HDMI) input terminal.
  • RF radio frequency
  • COMP Component
  • USB Universal Serial Bus
  • HDMI High Definition Multimedia Interface
  • the input devices of block 172 have associated respective input processing elements as known in the art.
  • the RF portion of various embodiments includes one or more elements to perform these functions, for example, frequency selectors, signal selectors, bandlimiters, channel selectors, filters, downconverters, demodulators, error correctors, and demultiplexers.
  • the RF portion can include a tuner that performs various of these functions, including, for example, downconverting the received signal to a lower frequency (for example, an intermediate frequency or a near-baseband frequency) or to baseband.
  • the RF portion and its associated input processing element receives an RF signal transmitted over a wired (for example, cable) medium, and performs frequency selection by filtering, downconverting, and filtering again to a desired frequency band.
  • the embodiments can be carried out by computer software implemented by the processor 152 or by hardware, or by a combination of hardware and software. As a non-limiting example, the embodiments can be implemented by one or more integrated circuits.
  • the memory 154 can be of any type appropriate to the technical environment and can be implemented using any appropriate data storage technology, such as optical memory devices, magnetic memory devices, semiconductor-based memory devices, fixed memory, and removable memory, as non-limiting examples.
  • the processor 152 can be of any type appropriate to the technical environment, and can encompass one or more of microprocessors, general purpose computers, special purpose computers, and processors based on a multi-core architecture, as non-limiting examples.
  • Before being encoded, a video sequence may go through pre-encoding processing (204), for example, applying a color transform to an input color picture (e.g., conversion from RGB 4:4:4 to YCbCr 4:2:0), or performing a remapping of the input picture components in order to get a signal distribution more resilient to compression (for instance using a histogram equalization of one of the color components).
  • Metadata can be associated with the pre-processing and attached to the bitstream.
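
As a concrete, hedged illustration of such a pre-encoding color transform, the sketch below converts an RGB picture to YCbCr 4:2:0; the BT.601 coefficients and the 2x2 averaging used for chroma subsampling are assumptions, since the text does not fix a particular matrix or filter.

```python
import numpy as np

def rgb_to_ycbcr_420(rgb: np.ndarray):
    """Convert an (H, W, 3) float RGB image in [0, 1] to YCbCr 4:2:0.
    Assumes BT.601 coefficients and even H, W."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)   # scaled color-difference signals
    cr = 0.713 * (r - y)
    # 4:2:0 chroma subsampling: average each 2x2 block.
    cb = cb.reshape(cb.shape[0] // 2, 2, cb.shape[1] // 2, 2).mean(axis=(1, 3))
    cr = cr.reshape(cr.shape[0] // 2, 2, cr.shape[1] // 2, 2).mean(axis=(1, 3))
    return y, cb, cr
```
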
  • the mode decision block (214) in the encoder chooses the best prediction mode, for example based on a rate-distortion optimization method. This selection may be made after spatial and/or temporal prediction is performed.
  • the intra/inter decision may be indicated by, for example, a prediction mode flag.
  • the prediction block is subtracted from the current video block (216) to generate a prediction residual.
  • the prediction residual is de-correlated using transform (218) and quantized (220).
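
A minimal sketch of this residual path (subtract the prediction, transform, quantize) follows; the 2-D DCT and the uniform quantizer with an illustrative step size stand in for whatever transform and quantizer a given codec actually uses.

```python
import numpy as np
from scipy.fft import dctn, idctn

def code_residual(block: np.ndarray, prediction: np.ndarray, qstep: float = 8.0):
    """Sketch of steps (216)-(220): subtract the prediction, de-correlate
    the residual with a 2-D DCT, and quantize. The uniform quantizer and
    the qstep value are illustrative assumptions."""
    residual = block - prediction                         # subtraction (216)
    coeffs = dctn(residual, norm="ortho")                 # transform (218)
    q_coeffs = np.round(coeffs / qstep).astype(np.int32)  # quantization (220)
    # Decoder-side mirror: dequantize and inverse-transform.
    recon_residual = idctn(q_coeffs * qstep, norm="ortho")
    return q_coeffs, prediction + recon_residual
```
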
  • Light representing an image 312 generated by the image generator 302 is coupled into a waveguide 304 by a diffractive in-coupler 306.
  • the in-coupler 306 diffracts the light representing the image 312 into one or more diffractive orders.
  • light ray 308, which is one of the light rays representing a portion of the bottom of the image, is diffracted by the in-coupler 306, and one of the diffracted orders 310 (e.g., the second order) is at an angle that is capable of being propagated through the waveguide 304 by total internal reflection.
  • the image generator 302 displays images as directed by a control module 324, which operates to render image data, video data, point cloud data, or other displayable data.
  • a control module 342 controls a display 344, which may be an LCD, to display an image.
  • the image is focused by one or more lenses of display optics 346 to make the image visible to the user.
  • exterior light does not reach the user’s eyes directly.
  • an exterior camera 348 may be used to capture images of the exterior environment and display such images on the display 344 together with any virtual content that may also be displayed.
  • The point cloud is a universal data format across several business domains, from autonomous driving, robotics, AR/VR, civil engineering, and computer graphics to the animation/movie industry.
  • 3D LiDAR sensors have been deployed in self-driving cars, and affordable LiDAR sensors have been released, such as the Velodyne Velabit, the Apple iPad Pro 2020, and the Intel RealSense LiDAR camera L515. With advances in sensing technologies, 3D point cloud data has become more practical than ever and is expected to be an ultimate enabler in the applications mentioned.
  • VR Virtual Reality
  • immersive worlds have become a hot topic and are foreseen by many as the future of 2D flat video.
  • the basic idea is to immerse the viewer in a surrounding environment, as opposed to standard TV, where the viewer can only look at the virtual world in front of them.
  • Point clouds are a good candidate format for distributing VR worlds. They may be static or dynamic and are typically of moderate size, say no more than a few million points at a time.
  • Dynamic point clouds, whether captured by LiDAR for autonomous driving or captured for VR/AR applications, may impose great challenges when being stored or transmitted due to the huge amount of data.
  • the proposed technology in this application is called explicit predictive coding because an explicit motion analysis is present.
  • the third section is known as the coding section, or deep feature-based coding section.
  • This section is mainly composed of two steps.
  • a feature encoder / feature extraction (FE) step is followed by entropy encoding.
  • the FE block 412 takes the current point cloud PC_CURR as its main input and outputs a feature map F_M.
  • the entropy coding starts with a rounding or quantization to allow entropy (arithmetic) coding on a sequence of symbols.
  • a bitstream BS_M is then generated by an encoder (ENC_M) block 414.
  • the feature extraction block is restricted by a condition input (see the '798 and '130 applications), which is the predictor feature map F_P. Later, on the decoding side, the same condition is applied to a corresponding decoding block.
  • a conditional autoencoder architecture is employed in the proposed method.
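
A hedged sketch of this conditioning is shown below: the predictor feature map F_P enters the feature encoder by channel-wise concatenation, one common way to realize a conditional autoencoder, and rounding produces the symbol sequence handed to the arithmetic coder (the ENC_M block, not sketched). The concatenation strategy and layer widths are assumptions, not details fixed by the application.

```python
import torch
import torch.nn as nn

class ConditionalFeatureEncoder(nn.Module):
    """Sketch of an FE block conditioned on the predictor feature map F_P.
    Concatenation-based conditioning and layer widths are assumptions."""

    def __init__(self, in_ch: int = 64, cond_ch: int = 64, out_ch: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_ch + cond_ch, 128), nn.ReLU(),
            nn.Linear(128, out_ch),
        )

    def forward(self, f_curr: torch.Tensor, f_pred: torch.Tensor) -> torch.Tensor:
        # f_curr: features of the current frame; f_pred: the condition F_P.
        f_m = self.net(torch.cat([f_curr, f_pred], dim=-1))
        # Rounding yields integer symbols for the arithmetic coder.
        # (Training would replace this with a differentiable proxy,
        # e.g., additive uniform noise.)
        return torch.round(f_m)
```
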
  • a consistency metric may be computed between the reconstructed motion feature map M_m′ and the original M_m as a loss function for training purposes.
  • PointNet: see Qi, Charles R., et al., PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017). It consists of a few pointwise MLP layers with dimensions of (3, 32, 64, 128, 128) and a global max pooling to obtain a block feature with 128 channels. It is then followed by three MLP layers with dimensions of (128, 64, 64) so that the output feature maps F_S1 and F_S2 have 64 channels.
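
Under one reading of the layer tuples above (each tuple as a channel progression), the PointNet-style block could be sketched as follows; that reading, and the PyTorch specifics, are assumptions.

```python
import torch
import torch.nn as nn

class BlockFeatureExtractor(nn.Module):
    """PointNet-style FB block following the quoted channel progression:
    pointwise MLPs (3 -> 32 -> 64 -> 128 -> 128), a global max pool to a
    128-channel block feature, then MLPs down to 64 output channels."""

    def __init__(self):
        super().__init__()
        dims = (3, 32, 64, 128, 128)
        layers = []
        for c_in, c_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(c_in, c_out), nn.ReLU()]
        self.pointwise = nn.Sequential(*layers)
        self.head = nn.Sequential(
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 64),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) coordinates of one partitioned block
        f = self.pointwise(points)   # (B, N, 128) pointwise features
        f, _ = f.max(dim=1)          # global max pooling -> (B, 128)
        return self.head(f)          # 64-channel block feature
```
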
  • both the encoder and the decoder share the same design of the predictor generation block.
  • the predictor generation plays an important role in inter-frame prediction, to the benefit of compression.
  • the feature decoder FD block may be used as the FS block.
  • both the motion analysis block in FIG. 5 and the predictor generation block in FIG. 6 may be viewed as an “encoding” network.
  • encoding network
  • FIG. 11 is a process diagram illustrating an example feature predictor generation (FA block) according to some embodiments.
  • FIG. 11 shows an example design of the predictor generation block 1100.
  • feature map F_S1 is extracted for the reference point cloud frame PC_REF using a spatial feature extractor FB 1102.
  • the FB block 1102 may have the same design as the FB blocks 702, 704 in FIG. 7 for motion analysis.
  • the feature F_S1 is concatenated (symbol 1104 in FIG. 11) with the motion feature F_m; the concatenation is followed by a series of MLPs 1106, 1108 and convolutional layers 1110, 1112 to output the predictor feature map F_P.
  • the dimensions of the layers in FIG. 11 are (128, 256, 256, 64).
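
A sketch of the FIG. 11 block under the stated dimensions follows, treating F_m as 64-channel so that the concatenation is 128 channels wide; that width, and the placement of the ReLUs, are assumptions.

```python
import torch
import torch.nn as nn

class PredictorGeneration(nn.Module):
    """Sketch of the predictor-generation block: concatenate the reference
    feature F_S1 with the motion feature F_m, then apply MLPs and
    convolutional layers with widths (128, 256, 256, 64)."""

    def __init__(self):
        super().__init__()
        self.mlps = nn.Sequential(            # MLPs 1106, 1108
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.convs = nn.Sequential(           # conv layers 1110, 1112
            nn.Conv1d(256, 256, kernel_size=1), nn.ReLU(),
            nn.Conv1d(256, 64, kernel_size=1),
        )

    def forward(self, f_s1: torch.Tensor, f_m: torch.Tensor) -> torch.Tensor:
        # f_s1, f_m: (B, N, 64) per-point feature maps
        x = self.mlps(torch.cat([f_s1, f_m], dim=-1))    # concatenation 1104
        x = self.convs(x.transpose(1, 2)).transpose(1, 2)
        return x                                         # predictor feature map F_P
```
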
  • IRN or transformer blocks may be used to enhance the feature aggregation. See Szegedy, Christian, et al., Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 31:1 Proceedings of the AAAI Conference on Artificial Intelligence (2017) (“Szegedy”); Zhao, Hengshuang, et al., Point Transformer, ICCV 16239-16248 (2021) (“Zhao”); Mao, Jiageng, et al., Voxel Transformer for 3D Object Detection, Proceedings of the IEEE/CVF International Conference on Computer Vision 3164-3173 (2021) (“Mao”); and Zhang, Cheng, et al., PVT: Point-Voxel Transformer for Point Cloud Learning, arXiv preprint arXiv:2108.06076 (2021) (“Zhang”).
  • FIG. 18 is a process diagram illustrating an example transformer block for feature aggregation according to some embodiments.
  • the diagram of a transformer block 1800 is shown in FIG. 18, where, again, the symbols (1804, 1808) denote summation.
  • FIG. 18 shows the basic diagram of a transformer block 1800, which consists of a self-attention block 1802 with a residual connection and an MLP block 1806 (consisting of a few MLP layers) with a residual connection.
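
A minimal sketch of block 1800 follows; a stock full self-attention module stands in here for the kNN-local point attention detailed in the next bullets, and the head count and MLP width are illustrative.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Sketch of FIG. 18: self-attention (1802) with a residual summation
    (1804), then an MLP block (1806) with a residual summation (1808)."""

    def __init__(self, ch: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(ch, 2 * ch), nn.ReLU(),
            nn.Linear(2 * ch, ch),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        # f: (B, N, ch) feature map over point locations
        a, _ = self.attn(f, f, f)    # self-attention block 1802
        f = f + a                    # residual summation 1804
        return f + self.mlp(f)       # MLP block 1806 + summation 1808
```
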
  • the points A_t (1904) are obtained by passing the feature f_A (1902) through a kNN block 1926 to perform a k-nearest-neighbor (kNN) search based on the coordinates of A. Then the query embedding Q_A for A is computed with:
  • the transformer block updates the feature map for all locations in the same way and then outputs the updated feature map.
  • MLP_Q(·), MLP_K(·), MLP_V(·), and MLP_P(·) may each contain only one fully-connected layer, which corresponds to a linear projection.
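
Putting the last few bullets together, a hedged sketch of the kNN-restricted attention with single fully-connected MLP_Q, MLP_K, MLP_V, and MLP_P layers follows; the scaled-softmax weighting is an assumed standard formulation, since the exact equation is not reproduced above.

```python
import torch
import torch.nn as nn

def knn_indices(coords: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbors of every point. coords: (N, 3).
    Note each point is returned as its own nearest neighbor."""
    dists = torch.cdist(coords, coords)          # (N, N) pairwise distances
    return dists.topk(k, largest=False).indices  # (N, k)

class KNNAttention(nn.Module):
    """Sketch of kNN-local attention with single linear-projection
    MLP_Q, MLP_K, MLP_V, and MLP_P layers, as described above."""

    def __init__(self, ch: int = 64, k: int = 16):
        super().__init__()
        self.k = k
        self.q = nn.Linear(ch, ch)      # MLP_Q
        self.kproj = nn.Linear(ch, ch)  # MLP_K
        self.v = nn.Linear(ch, ch)      # MLP_V
        self.p = nn.Linear(ch, ch)      # MLP_P (output projection)

    def forward(self, coords: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # coords: (N, 3) point coordinates; feats: (N, ch) feature map
        idx = knn_indices(coords, self.k)            # neighbors of each point A
        q = self.q(feats).unsqueeze(1)               # (N, 1, ch): query Q_A
        kf = self.kproj(feats)[idx]                  # (N, k, ch): neighbor keys
        vf = self.v(feats)[idx]                      # (N, k, ch): neighbor values
        w = torch.softmax((q * kf).sum(-1) / kf.shape[-1] ** 0.5, dim=-1)
        out = (w.unsqueeze(-1) * vf).sum(dim=1)      # weighted neighbor sum
        return self.p(out)                           # updated feature map
```
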
  • obtaining the motion feature that describes motion between the reference point cloud frame and the current point cloud frame includes: performing a feature extraction on the reference point cloud frame to generate a first feature map; performing a feature extraction on the current point cloud frame to generate a second feature map; fusing the first and second feature maps to generate a fused feature map; and performing a feature aggregation on the fused feature map to generate the motion feature.
  • the reference point cloud frame and the current point cloud frame are voxel-based representations
  • performing the feature extraction on the reference point cloud frame includes: passing the reference point cloud frame through one or more multi-layer perceptron (MLP) blocks; and passing an output of the one or more MLP blocks through one or more convolutional layers to generate the first feature map
  • performing the feature extraction on the current point cloud frame includes: passing the current point cloud frame through one or more multi-layer perceptron (MLP) blocks; and passing an output of the one or more MLP blocks through one or more convolutional layers to generate the second feature map.
  • MLP multi-layer perceptron
  • performing the feature extraction on at least one of the reference point cloud frame and the current point cloud frame includes using an Inception-ResNet block.
  • a second example method/apparatus in accordance with some embodiments may include: a processor; and a non-transitory computer-readable medium storing instructions operative, when executed by the processor, to cause the apparatus to perform any one of the methods listed above.
  • a fifth example apparatus in accordance with some embodiments may include at least one processor and at least one non-transitory computer-readable medium storing instructions for causing the at least one processor to perform any one of the methods listed above.
  • At least one of the aspects generally relates to video encoding and decoding, and at least one other aspect generally relates to transmitting a generated or encoded bitstream.
  • At least one of the aspects can be implemented as a method, an apparatus, a computer readable storage medium having stored thereon instructions for encoding or decoding video data according to any of the methods described, and/or a computer readable storage medium having stored thereon a bitstream generated according to any of the methods described.
  • Decoding can encompass all or part of the processes performed, for example, on a received encoded sequence in order to produce a final output suitable for display.
  • processes include one or more of the processes typically performed by a decoder, for example, entropy decoding, inverse quantization, inverse transformation, and differential decoding.
  • processes also, or alternatively, include processes performed by a decoder of various implementations described in this application, for example, extracting a picture from a tiled (packed) picture, determining an upsampling filter to use and then upsampling a picture, and flipping a picture back to its intended orientation.
  • decoding refers only to entropy decoding
  • decoding refers only to differential decoding
  • decoding refers to a combination of entropy decoding and differential decoding. Whether the phrase “decoding process” is intended to refer specifically to a subset of operations or generally to the broader decoding process will be clear based on the context of the specific descriptions.
  • encoding can encompass all or part of the processes performed, for example, on an input video sequence in order to produce an encoded bitstream.
  • processes include one or more of the processes typically performed by an encoder, for example, partitioning, differential encoding, transformation, quantization, and entropy encoding.
  • processes also, or alternatively, include processes performed by an encoder of various implementations described in this application.
  • a mix of these two approaches can also be used, such as by using an approximated distortion for only some of the possible encoding options, and a complete distortion for other encoding options.
  • Other approaches only evaluate a subset of the possible encoding options. More generally, many approaches employ any of a variety of techniques to perform the optimization, but the optimization is not necessarily a complete evaluation of both the coding cost and related distortion.
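
As a small illustration of such a mode decision, a Lagrangian cost J = D + λR can be minimized over the evaluated options; the λ value and candidate numbers below are purely illustrative.

```python
def choose_mode(candidates, lam: float = 0.1):
    """Pick the (mode, distortion, rate_bits) tuple minimizing
    J = D + lam * R. The lambda value is illustrative."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

# An approximated distortion may be used for some options and a complete
# distortion for others, as described above.
best = choose_mode([("intra", 120.0, 300), ("inter", 95.0, 420)])
print(best)  # ("inter", ...): 95 + 0.1*420 = 137 < 120 + 0.1*300 = 150
```
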
  • references to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, mean that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment.
  • the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well as any other variations, appearing in various places throughout this application are not necessarily all referring to the same embodiment.
  • receiving is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • signaling can be accomplished in a variety of ways. For example, one or more syntax elements, flags, and so forth are used to signal information to a corresponding decoder in various embodiments. While the preceding relates to the verb form of the word “signal”, the word “signal” can also be used herein as a noun.
  • Implementations can produce a variety of signals formatted to carry information that can be, for example, stored or transmitted.
  • the information can include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal can be formatted to carry the bitstream of a described embodiment.
  • Such a signal can be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting can include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries can be, for example, analog or digital information.
  • the signal can be transmitted over a variety of different wired or wireless links, as is known.
  • the signal can be stored on a processor-readable medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Some embodiments of a method may include: obtaining a previously decoded point cloud frame as a reference point cloud frame; obtaining a motion feature that describes motion between the reference point cloud frame and a current point cloud frame; determining a predictor feature map based on the reference point cloud frame and the motion feature; obtaining a current feature map that represents the current point cloud frame with the predictor feature map as a condition; and reconstructing the current point cloud frame based on the current feature map, using the predictor feature map as a condition.
PCT/US2024/049050 2023-10-10 2024-09-27 Explicit predictive coding for point cloud compression Pending WO2025080446A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363543479P 2023-10-10 2023-10-10
US63/543,479 2023-10-10

Publications (1)

Publication Number Publication Date
WO2025080446A1 (fr) 2025-04-17

Family

ID=93150227

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/049050 Pending WO2025080446A1 (fr) Explicit predictive coding for point cloud compression

Country Status (1)

Country Link
WO (1) WO2025080446A1 (fr)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023132919A1 (fr) * 2022-01-10 2023-07-13 Interdigital Vc Holdings, Inc. Scalable framework for point cloud compression
WO2023133350A1 (fr) * 2022-01-10 2023-07-13 Interdigital Vc Holdings, Inc. Coordinate refinement and upsampling from quantized point cloud reconstruction
WO2024220568A1 (fr) * 2023-04-20 2024-10-24 Interdigital Vc Holdings, Inc. Generative predictive coding for point cloud compression

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
BALLÉ, JOHANNES: "Variational image compression with a scale hyperprior", INTERNATIONAL CONFERENCE ON LEARNING REPRESENTATIONS, 2018
DAVID MINNEN ET AL.: "Joint Autoregressive and Hierarchical Priors for Learned Image Compression", ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS, 2018
MAO, JIAGENG ET AL.: "Voxel transformer for 3d object detection", IN PROCEEDINGS OF THE IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, 2021, pages 3164 - 3173
QI, CHARLES R. ET AL.: "Pointnet: Deep Learning on Point sets for 3D Classification and Segmentation", PROCEEDINGS OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2017
SZEGEDY, CHRISTIAN ET AL.: "Inception-v4, inception-resnet and the impact of residual connections on learning", IN PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, vol. 31, 2017, pages 1
WANG, JIANQIANG ET AL.: "Multiscale Point Cloud Geometry Compression", 2021 DATA COMPRESSION CONFERENCE (DCC), IEEE, 2021
ZHANG, CHENG ET AL.: "PVT: Point-Voxel Transformer for Point Cloud Learning", ARXIV:2108.06076, 2021
ZHAO, HENGSHUANG ET AL.: "Point Transformer", ICCV, 2021, pages 16239 - 16248, XP034093296, DOI: 10.1109/ICCV48922.2021.01595

Similar Documents

Publication Publication Date Title
US20240212220A1 (en) System and method for procedurally colorizing spatial data
US20250119579A1 (en) Coordinate refinement and upsampling from quantized point cloud reconstruction
EP4588242A1 (fr) Context-aware voxel-based upsampling for point cloud processing
EP4537303A1 (fr) Deep learning distribution-aware point feature extractor for AI-based point cloud compression
WO2024220568A1 (fr) Generative predictive coding for point cloud compression
KR20250108615A (ko) Heterogeneous mesh autoencoder
WO2025049125A1 (fr) Improved feature processing for image compression based on feature distribution learning
US12316844B2 (en) 3D point cloud enhancement with multiple measurements
JP2025536907A (ja) Point-based attribute transfer for textured meshes
WO2025080446A1 (fr) Explicit predictive coding for point cloud compression
WO2025080447A1 (fr) Implicit predictive coding for point cloud compression
WO2025080594A1 (fr) Octree feature for deep feature-based point cloud compression
US20250365427A1 (en) Multi-resolution motion feature for dynamic pcc
WO2025080438A1 (fr) Intra-frame dynamics for LiDAR point cloud compression
US20250324089A1 (en) Reproducible learning-based point cloud coding
EP4636611A1 (fr) Learning-based RAHT prediction
US20250343920A1 (en) Rate control for point cloud coding with a hyperprior model
WO2025049126A1 (fr) Improved feature processing for point cloud compression based on feature distribution learning
WO2025014553A1 (fr) Generative predictive coding for LiDAR point cloud compression
WO2025078267A1 (fr) Hybrid point cloud coding method with local surface representation
WO2025078201A1 (fr) Two-stage point cloud attribute coding scheme with nested local and global transforms
WO2025101793A1 (fr) Adaptive geometry measurement for 3D point clouds
WO2025149464A1 (fr) Alternative prediction methods for lifting wavelet transform on subdivision mesh surfaces
WO2025078337A1 (fr) Avatar media representation for transmission
WO2025153193A1 (fr) Geometric avatar media codec for transmission

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 24791172

Country of ref document: EP

Kind code of ref document: A1