WO2024011426A1

WO2024011426A1 - Point cloud geometry data augmentation method and apparatus, encoding method and apparatus, decoding method and apparatus, and encoding and decoding system

Info

Publication number: WO2024011426A1
Application number: PCT/CN2022/105285
Authority: WO
Inventors: 马展; 薛瑞翔; 魏红莲
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2022-07-12
Filing date: 2022-07-12
Publication date: 2024-01-18
Anticipated expiration: 2025-01-12
Also published as: CN119213771A; TW202406344A

Abstract

A point cloud geometry data augmentation method and apparatus, an encoding method and apparatus, a decoding method and apparatus, and an encoding and decoding system. According to the methods in embodiments of the present disclosure, at an encoding end, pixel down-sampling and feature extraction are performed multiple times on geometry data of an Nth scale to obtain feature data for augmenting geometry data of a (N+1)th-scale point cloud; at a decoding end, after pixel up-sampling and feature induction are performed on the feature data, the output feature data is spliced with the geometry data of the (N+1)th-scale point cloud to obtain augmented geometry data of the (N+1)th-scale point cloud, and then subsequent decoding processing is performed to obtain reconstructed geometry features of a Nth-scale point cloud. According to the embodiments of the present disclosure, point cloud coding performance can be improved, and an auto-encoder model used is convenient to train and use.

Description

A point cloud geometric data enhancement, encoding and decoding method, device and system

Technical field

本公开实施例涉及但不限于点云压缩技术，更具体地，涉及一种点云几何数据的增强方法、编解码方法、装置和系统。Embodiments of the present disclosure relate to, but are not limited to, point cloud compression technology, and more specifically, to an enhancement method, encoding and decoding method, device and system for point cloud geometric data.

Background technique

点云是空间中一组无规则分布的、表达三维物体或场景的空间结构及表面属性的离散点集，点云是一种三维数据，是在一个三维坐标系统下的一组向量的集合，这些向量可以表示(x,y,z)三维坐标，还可以表示颜色、反射率等属性信息。随着增强现实、虚拟现实、自动驾驶和机器人等新兴技术的蓬勃发展，点云数据因其对三维空间的简洁表达成为其主要的数据形式之一，但是点云数据量庞大，直接存储点云数据会消耗大量内存，不利于传输，因此需要不断提高点云压缩的性能。Point cloud is a set of discrete points randomly distributed in space that expresses the spatial structure and surface properties of a three-dimensional object or scene. Point cloud is a kind of three-dimensional data, which is a collection of vectors under a three-dimensional coordinate system. These vectors can represent (x, y, z) three-dimensional coordinates, and can also represent attribute information such as color and reflectivity. With the vigorous development of emerging technologies such as augmented reality, virtual reality, autonomous driving and robots, point cloud data has become one of its main data forms due to its concise expression of three-dimensional space. However, the amount of point cloud data is huge, and it is difficult to directly store point cloud data. Data consumes a lot of memory and is not conducive to transmission, so the performance of point cloud compression needs to be continuously improved.

发明概述Summary of the invention

以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。The following is an overview of the topics described in detail in this article. This summary is not intended to limit the scope of the claims.

本公开一实施例提供了一种点云几何数据增强方法，应用于点云解码器，包括：An embodiment of the present disclosure provides a point cloud geometric data enhancement method, which is applied to a point cloud decoder, including:

解析码流，得到用于增强第i+1尺度点云几何数据的特征数据；Parse the code stream to obtain feature data used to enhance the i+1th scale point cloud geometric data;

通过第i解码器网络的部分解码器对所述特征数据进行M _i-1次体素上采样和特征推理，输出的特征数据与第i+1尺度点云待增强的几何数据拼接，得到第i+1尺度点云增强后的几何数据； The feature data is subjected to M _i -1 times of voxel upsampling and feature inference through the partial decoder of the i-th decoder network, and the output feature data is spliced with the geometric data to be enhanced in the i+1-th scale point cloud to obtain the i-th i+1 scale point cloud enhanced geometric data;

其中，i为大于等于1的整数，M _i是大于等于2的整数。 Among them, i is an integer greater than or equal to 1, and M _i is an integer greater than or equal to 2.

本公开一实施例还提供了一种点云几何数据的解码方法，应用于点云解码器，包括：An embodiment of the present disclosure also provides a method for decoding point cloud geometric data, which is applied to a point cloud decoder, including:

解析码流，得到的第N+1尺度点云的几何数据作为待增强的几何数据，按照本公开任一实施例所述的点云几何数据增强方法进行数据增强，得到第N+1尺度点云增强后的几何数据，N≥1；The code stream is parsed, and the geometric data of the N+1th scale point cloud obtained is used as the geometric data to be enhanced. Data enhancement is performed according to the point cloud geometric data enhancement method described in any embodiment of the present disclosure, and the N+1th scale point is obtained. Cloud-enhanced geometric data, N≥1;

通过第N解码器网络其余的解码器对第N+1尺度点云增强后的几何数据进行一次体素上采样和特征推理，输出的数据再经过概率预测和点云裁剪，得到第N尺度点云的重建几何数据。The remaining decoders in the N-th decoder network perform voxel upsampling and feature inference on the enhanced geometric data of the N+1 scale point cloud. The output data is then subjected to probability prediction and point cloud clipping to obtain the N-th scale point. Reconstructed geometric data of clouds.

本公开一实施例还提供了一种点云几何数据的编码方法，应用于点云编码器，包括：An embodiment of the present disclosure also provides a method for encoding point cloud geometric data, which is applied to a point cloud encoder, including:

对第一尺度点云的几何数据进行N次体素下采样，得到第二尺度点云至第N+1尺度点云的几何数据，N≥1；Perform N times of voxel downsampling on the geometric data of the first-scale point cloud to obtain the geometric data of the second-scale point cloud to the N+1th scale point cloud, N≥1;

将第N尺度点云的几何数据输入第N自编码器模型的第N编码器网络进行M _N次体素下采样和特征提取，输出用于增强第N+1尺度点云几何数据的特征数据，M _N≥2； Input the geometric data of the Nth scale point cloud into the Nth encoder network of the Nth autoencoder model for M _N times of voxel downsampling and feature extraction, and output the feature data used to enhance the N+1th scale point cloud geometric data. , M _N ≥ 2;

对所述第N+1尺度点云的几何数据和所述第N编码器网络输出的所述特征数据进行熵编码。Entropy encoding is performed on the geometric data of the N+1th scale point cloud and the feature data output by the Nth encoder network.

本公开一实施例还提供了一种点云几何码流，其中，所述几何码流按照本公开任一实施例所述的点云几何数据的编码方法得到，包括第N+1尺度点云的几何数据和第N编码器网络输出的所述特征数据。An embodiment of the present disclosure also provides a point cloud geometric code stream, wherein the geometric code stream is obtained according to the encoding method of point cloud geometric data described in any embodiment of the present disclosure, including the N+1th scale point cloud The geometric data and the feature data output by the Nth encoder network.

本公开一实施例还提供了一种点云几何数据增强装置，包括处理器以及存储有计算机程序的存储器，其中，所述处理器执行所述计算机程序时能够实现如本公开任一实施例所述的点云几何数据增强方法。An embodiment of the present disclosure also provides a point cloud geometric data enhancement device, including a processor and a memory storing a computer program, wherein when the processor executes the computer program, it can implement the method described in any embodiment of the present disclosure. The point cloud geometric data enhancement method described above.

本公开一实施例还提供了一种点云解码器，包括处理器以及存储有计算机程序的存储器，其中，所述处理器执行所述计算机程序时能够实现如本公开任一实施例所述的点云几何数据的解码方法。An embodiment of the present disclosure also provides a point cloud decoder, including a processor and a memory storing a computer program, wherein when the processor executes the computer program, it can implement the method described in any embodiment of the present disclosure. Decoding method of point cloud geometric data.

本公开一实施例还提供了一种点云编码器，包括处理器以及存储有计算机程序的存储器，其中，所述处理器执行所述计算机程序时能够实现如本公开任一实施例所述的点云几何数据的编码方法。An embodiment of the present disclosure also provides a point cloud encoder, including a processor and a memory storing a computer program, wherein when the processor executes the computer program, it can implement the method described in any embodiment of the present disclosure. Encoding method for point cloud geometric data.

本公开一实施例还提供了一种点云编解码系统，其中，包括本公开任一实施例所述的点云编码器，及本公开任一实施例所述的点云解码器。An embodiment of the present disclosure also provides a point cloud encoding and decoding system, which includes the point cloud encoder described in any embodiment of the present disclosure, and the point cloud decoder described in any embodiment of the present disclosure.

本公开一实施例还提供了一种非瞬态计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，其中，所述计算机程序时被处理器执行时能够实现本公开任一实施例所述的点云几何数据增强方法，或能够实现本公开任一实施例所述的点云几何信息的解码方法，或能够实现本公开任一实施例所述的点云几何信息的编码方法。An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium. The computer-readable storage medium stores a computer program, wherein the computer program, when executed by a processor, can implement any implementation of the present disclosure. The point cloud geometric data enhancement method described in the example may be able to implement the decoding method of point cloud geometric information described in any embodiment of the present disclosure, or may be able to implement the encoding method of point cloud geometric information described in any embodiment of the present disclosure. .

本公开一实施例还提供了一种点云裁剪方法，应用于点云解码器，包括：An embodiment of the present disclosure also provides a point cloud cropping method, which is applied to a point cloud decoder, including:

解析码流，得到待裁剪点云中的被占据体素的数量K；Analyze the code stream to obtain the number K of occupied voxels in the point cloud to be cropped;

确定所述待裁剪点云中的体素的占据概率；Determine the occupancy probability of the voxels in the point cloud to be clipped;

将所述待裁剪点云中由同一体素分解得到的M个体素分为一组，将每一组中占据概率最高的m个体素的占据概率置为1，然后对所述待裁剪点云中所有体素的占据概率排序，将占据概率最高的K个体素确定为所述待裁剪点云中的被占据体素，1≤m<M<K。Divide the M voxels decomposed from the same voxel in the point cloud to be clipped into one group, set the occupation probability of the m voxels with the highest occupancy probability in each group to 1, and then classify the point cloud to be clipped The occupancy probabilities of all voxels in are sorted, and the K voxels with the highest occupancy probabilities are determined as occupied voxels in the point cloud to be clipped, 1≤m<M<K.

在阅读并理解了附图和详细描述后，可以明白其他方面。Other aspects will be apparent after reading and understanding the drawings and detailed description.

附图概述Figure overview

附图用来提供对本公开实施例的理解，并且构成说明书的一部分，与本公开实施例一起用于解释本公开的技术方案，并不构成对本公开技术方案的限制。The drawings are used to provide an understanding of the embodiments of the present disclosure and constitute a part of the specification. Together with the embodiments of the present disclosure, they are used to explain the technical solutions of the present disclosure and do not constitute a limitation of the technical solutions of the present disclosure.

图1是G-PCC编码的流程框图；Figure 1 is a flow chart of G-PCC encoding;

图2是G-PCC解码的流程框图；Figure 2 is a flow chart of G-PCC decoding;

图3是本公开一实施例点云几何信息的编解码方法的示意图；Figure 3 is a schematic diagram of a method for encoding and decoding point cloud geometric information according to an embodiment of the present disclosure;

图4是本公开另一实施例点云几何数据的编解码方法的示意图；Figure 4 is a schematic diagram of a method for encoding and decoding point cloud geometric data according to another embodiment of the present disclosure;

图5是本公开一实施例点云几何数据的编码方法的流程图；Figure 5 is a flow chart of a method for encoding point cloud geometric data according to an embodiment of the present disclosure;

图6是本公开一实施例残差层的网络结构示意图；Figure 6 is a schematic diagram of the network structure of the residual layer according to an embodiment of the present disclosure;

图7是本公开一实施例编码器的网络结构示意图；Figure 7 is a schematic diagram of the network structure of an encoder according to an embodiment of the present disclosure;

图8是本公开一实施例自注意力层的网络结构示意图；Figure 8 is a schematic diagram of the network structure of the self-attention layer according to an embodiment of the present disclosure;

图9是本公开一实施例点云邻域自注意力层从输入数据中得到点云空间中的邻域上下文特征的过程的示意图；Figure 9 is a schematic diagram of the process of obtaining neighborhood context features in the point cloud space from input data by the point cloud neighborhood self-attention layer according to an embodiment of the present disclosure;

图10是本公开一实施例点云几何数据增强方法的流程图；Figure 10 is a flow chart of a point cloud geometric data enhancement method according to an embodiment of the present disclosure;

图11是本公开一实施例点云几何数据的解码方法的流程图；Figure 11 is a flow chart of a method for decoding point cloud geometric data according to an embodiment of the present disclosure;

图12是本公开一实施例概率预测器的网络结构示意图；Figure 12 is a schematic network structure diagram of a probability predictor according to an embodiment of the present disclosure;

图13A是本公开一实施例第二尺度点云中体素被占据情况的示意图；Figure 13A is a schematic diagram of the occupied status of voxels in the second scale point cloud according to an embodiment of the present disclosure;

图13B是本公开一实施例第三尺度点云中体素被占据情况的示意图；Figure 13B is a schematic diagram of the occupied status of voxels in the third scale point cloud according to an embodiment of the present disclosure;

图13C是本公开一实施例概率预测后得到的第二尺度点云中体素的占据概率的示意图；Figure 13C is a schematic diagram of the occupancy probability of voxels in the second-scale point cloud obtained after probability prediction according to an embodiment of the present disclosure;

图14是本公开一实施例解码器的网络结构示意图；Figure 14 is a schematic network structure diagram of a decoder according to an embodiment of the present disclosure;

图15是本公开一实施例点云几何数据增强装置的示意图。Figure 15 is a schematic diagram of a point cloud geometric data enhancement device according to an embodiment of the present disclosure.

详述Elaborate

本公开描述了多个实施例，但是该描述是示例性的，而不是限制性的，并且对于本邻域的普通技术人员来说显而易见的是，在本公开所描述的实施例包含的范围内可以有更多的实施例和实现方案。The present disclosure describes multiple embodiments, but the description is illustrative rather than restrictive, and it will be obvious to a person of ordinary skill in the art that within the scope of the embodiments described in the present disclosure Many more embodiments and implementations are possible.

本公开的描述中，“示例性的”或者“例如”等词用于表示作例子、例证或说明。本公开中被描述为“示例性的”或者“例如”的任何实施例不应被解释为比其他实施例更优选或更具优势。本文中的“和/或”是对关联对象的关联关系的一种描述，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。“多个”是指两个或多于两个。另外，为了便于清楚描述本公开实施例的技术方案，采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。本邻域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定，并且“第一”、“第二”等字样也并不限定一定不同。In the description of the present disclosure, the words "exemplary" or "such as" are used to mean an example, illustration, or explanation. Any embodiment described in this disclosure as "exemplary" or "such as" is not intended to be construed as preferred or advantageous over other embodiments. "And/or" in this article is a description of the relationship between associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A exists alone, A and B exist simultaneously, and they exist alone. B these three situations. "Plural" means two or more than two. In addition, in order to facilitate a clear description of the technical solutions of the embodiments of the present disclosure, words such as “first” and “second” are used to distinguish the same or similar items with basically the same functions and effects. Those skilled in the art can understand that words such as "first" and "second" do not limit the number and execution order, and words such as "first" and "second" do not limit the number and execution order.

在描述具有代表性的示例性实施例时，说明书可能已经将方法和/或过程呈现为特定的步骤序列。然而，在该方法或过程不依赖于本文所述步骤的特定顺序的程度上，该方法或过程不应限于所述的特定顺序的步骤。如本邻域普通技术人员将理解的，其它的步骤顺序也是可能的。因此，说明书中阐述的步骤的特定顺序不应被解释为对权利要求的限制。此外，针对该方法和/或过程的权利要求不应限于按照所写顺序执行它们的步骤，本邻域技术人员可以容易地理解，这些顺序可以变化，并且仍然保持在本公开实施例的精神和范围内。In describing representative exemplary embodiments, the specification may have presented methods and/or processes as a specific sequence of steps. However, to the extent that the method or process does not rely on the specific order of steps described herein, the method or process should not be limited to the specific order of steps described. As one of ordinary skill in the art will appreciate, other sequences of steps are possible. Therefore, the specific order of steps set forth in the specification should not be construed as limiting the claims. Furthermore, claims directed to the method and/or process should not be limited to steps performing them in the order written, as those skilled in the art will readily understand that such order may be varied and still remain within the spirit and spirit of the disclosed embodiments. within the range.

点云压缩算法包括基于几何的点云压缩(Geometry-based Point Cloud Compression，G-PCC)，G-PCC中的几何压缩主要通过八叉树模型和/或三角形表面模型实现。Point cloud compression algorithms include geometry-based point cloud compression (G-PCC). Geometric compression in G-PCC is mainly implemented through octree models and/or triangular surface models.

为了便于对本公开实施例所提供的技术方案的理解，首先提供一种G-PCC编码的流程框图和G-PCC解码的流程框图。需要说明的是，本公开实施例描述的G-PCC编码的流程框图和G-PCC解码的流程框图仅是为了更加清楚地说明本公开实施例的技术方案，并不构成对于本公开实施例的限定。本领域技术人员可知，随着点云压缩技术的演变和新业务场景的出现，本公开实施例提供的技术方案对于类似G-PCC的点云压缩架构同样适用，本公开实施例压缩的点云可以是视频中的点云，但不局限于此。In order to facilitate understanding of the technical solutions provided by the embodiments of the present disclosure, a flow chart of G-PCC encoding and a flow chart of G-PCC decoding are first provided. It should be noted that the flow chart of G-PCC encoding and the flow chart of G-PCC decoding described in the embodiment of the present disclosure are only for the purpose of more clearly illustrating the technical solutions of the embodiment of the present disclosure, and do not constitute a guarantee for the embodiment of the present disclosure. limited. Those skilled in the art know that with the evolution of point cloud compression technology and the emergence of new business scenarios, the technical solutions provided by the embodiments of the present disclosure are also applicable to point cloud compression architectures similar to G-PCC. The point cloud compressed by the embodiments of the present disclosure It can be a point cloud in the video, but is not limited to this.

在点云G-PCC编码器框架中，将输入三维图像模型的点云进行切片(slice)划分后，对每一个slice进行独立编码。In the point cloud G-PCC encoder framework, the point cloud of the input three-dimensional image model is divided into slices, and each slice is independently encoded.

如图1所示的G-PCC编码的流程框图中，应用于点云编码器中，针对待编码的点云数据，先通过slice划分，将点云数据划分为多个slice。在每一个slice中，点云的几何信息和属性信息是分开进行编码的。在几何编码过程中，对几何信息进行坐标转换，使点云全都包含在一个包围盒(bounding box)中，然后再进行量化，量化主要起到缩放的作用，由于量化取整，使得一部分点云的几何信息相同，可以基于参数来决定是否移除重复点，量化和移除重复点这一过程又被称为体素化过程。接着对bounding box进行八叉树划分。在基于八叉树的几何信息编码流程中，将包围盒八等分为8个子立方体，对非空的(包含点云中的点)的子立方体继续进行八等分，直到划分得到的叶子结点为1x1x1的单位立方体时停止划分，对叶子结点中的点进行算术编码，生成二进制的几何比特流，即几何码流。在基于三角面片集(triangle soup，trisoup)的几何信息编码过程中，同样也要先进行八叉树划分，但区别于基于八叉树的几何信息编码，该trisoup不需要将点云逐级划分到边长为1x1x1的单位立方体，而是划分到子块(block)边长为W时停止划分，基于每个block中点云的分布所形成的表面，得到该表面与block的十二条边所产生的至多十二个交点(vertex)，对vertex进行算术编码(基于交点进行表面拟合)，生成二进制的几何比特流(即几何码流)。vertex还用于在几何重建的过程的实现，而重建的几何信息在对点云的属性编码时使用。As shown in the flow chart of G-PCC encoding shown in Figure 1, it is applied to the point cloud encoder. For the point cloud data to be encoded, the point cloud data is first divided into multiple slices through slice division. In each slice, the geometric information and attribute information of the point cloud are encoded separately. In the process of geometric encoding, the geometric information is transformed into coordinates so that all point clouds are contained in a bounding box, and then quantized. The quantization mainly plays the role of scaling. Due to the quantization rounding, part of the point cloud The geometric information is the same, and it can be decided whether to remove duplicate points based on parameters. The process of quantifying and removing duplicate points is also called the voxelization process. Then divide the bounding box into an octree. In the geometric information encoding process based on the octree, the bounding box is divided into eight equal parts into eight sub-cubes, and the non-empty sub-cubes (containing points in the point cloud) continue to be divided into eight equal parts until the leaf structure is obtained. The division stops when the point is a 1x1x1 unit cube, and the points in the leaf nodes are arithmetic encoded to generate a binary geometric bit stream, that is, a geometric code stream. In the process of encoding geometric information based on triangle patch set (triangle soup, trisoup), octree division is also required first, but unlike octree-based geometric information encoding, this trisoup does not need to divide the point cloud step by step. It is divided into a unit cube with a side length of 1x1x1, but is divided into sub-blocks (blocks). The division stops when the side length is W. Based on the surface formed by the distribution of point clouds in each block, twelve links between the surface and the block are obtained. At most twelve intersection points (vertex) generated by the edges, the vertex is arithmetic encoded (surface fitting based on the intersection points), and a binary geometry bit stream (i.e., geometry code stream) is generated. Vertex is also used in the implementation of the geometric reconstruction process, and the reconstructed geometric information is used when encoding the attributes of the point cloud.

在属性编码过程中，进行颜色转换，将颜色信息(即属性信息)从RGB颜色空间转换到YUV颜色空间。然后，利用重建的几何信息对点云重新着色，使得未编码的属性信息与重建的几何信息对应起来。在颜色信息编码过程中，主要有两种变换方法，一是依赖于细节层次(Level of Detail，LOD)划分的基于距离的提升变换，二是直接进行区域自适应分层变换(Region Adaptive Hierarchal Transform，RAHT)的变换，这两种方法都会将颜色信息从空间域转换到频域，通过变换得到高频系数和低频系数，最后对系数进行量化(即量化系数)，最后，将经过八叉树划分及表面拟合的几何编码数据与量化系数处理属性编码数据进行slice合成后，依次编码每个block的vertex坐标(即算数编码)，生成二进制的属性比特流，即属性码流。During the attribute encoding process, color conversion is performed to convert color information (ie, attribute information) from RGB color space to YUV color space. Then, the point cloud is recolored using the reconstructed geometric information so that the unencoded attribute information corresponds to the reconstructed geometric information. In the process of color information encoding, there are two main transformation methods. One is the distance-based lifting transformation that relies on Level of Detail (LOD) division, and the other is the direct Region Adaptive Hierarchal Transform (Region Adaptive Hierarchal Transform). , RAHT) transformation, both methods will convert the color information from the spatial domain to the frequency domain, obtain high-frequency coefficients and low-frequency coefficients through transformation, and finally quantize the coefficients (i.e., quantization coefficients). Finally, the octree will be passed After the geometric encoding data of division and surface fitting and the attribute encoding data of quantization coefficient processing are slice-synthesized, the vertex coordinates of each block are sequentially encoded (i.e., arithmetic encoding) to generate a binary attribute bit stream, that is, an attribute code stream.

如图2所示的G-PCC解码的流程框图，应用于点云解码器中。解码器获取二进制码流，针对二进制码流中的几何比特流(即几何码流)和属性比特流分别进行独立解码。在对几何比特流的解码时，通过算术解码、八叉树合成、表面拟合、重建几何和反坐标变换，得到点云的几何信息；在对属性比特流的解码时，通过算术解码、反量化、基于LOD的反提升或者基于RAHT的反变换、及反颜色转换，得到点云的属性信息，基于几何信息和属性信息还原点云数据的三维图像模型。The flow chart of G-PCC decoding shown in Figure 2 is applied to the point cloud decoder. The decoder obtains the binary code stream and independently decodes the geometric bit stream (i.e., the geometric code stream) and the attribute bit stream in the binary code stream. When decoding the geometry bit stream, the geometric information of the point cloud is obtained through arithmetic decoding, octree synthesis, surface fitting, reconstructed geometry and inverse coordinate transformation; when decoding the attribute bit stream, through arithmetic decoding, inverse coordinate transformation Quantization, LOD-based inverse lifting or RAHT-based inverse transformation, and inverse color conversion are used to obtain the attribute information of the point cloud, and the three-dimensional image model of the point cloud data is restored based on the geometric information and attribute information.

神经网络和深度学习技术也可应用于点云几何压缩技术，例如，基于三维卷积神经网络(3D Convolution Neural Network,3D CNN)的体积模型压缩技术，直接对点坐标集合利用基于多层感知机(Multi-Layer Perceptron,MLP)的神经网络的压缩技术，对八叉树的节点符号利用MLP或3D CNN进行概率估计和熵编码的压缩技术，以及基于三维稀疏卷积神经网络的压缩技术，等等。点云按照点的密度可以分为稀疏点云和稠密点云，其中稀疏点云在三维空间有表示范围大，分布稀疏的特点，可以表示一个场景；而稠密点云则具有表示范围小，分布密集的特点，可以表示一个物体。以上压缩技术在这两种点云上的压缩性能往往具有较大差异，在稠密点云上表现较好，在稀疏点云上则表现较差。Neural network and deep learning technology can also be applied to point cloud geometry compression technology. For example, volume model compression technology based on 3D Convolution Neural Network (3D CNN) directly uses multi-layer perceptron based on point coordinate collection. (Multi-Layer Perceptron, MLP) neural network compression technology, compression technology that uses MLP or 3D CNN for probability estimation and entropy coding of octree node symbols, and compression technology based on three-dimensional sparse convolutional neural networks, etc. wait. Point clouds can be divided into sparse point clouds and dense point clouds according to the density of points. Sparse point clouds have the characteristics of large representation range and sparse distribution in three-dimensional space, and can represent a scene; while dense point clouds have the characteristics of small representation range and sparse distribution. Dense features can represent an object. The compression performance of the above compression technologies on these two point clouds is often quite different, with better performance on dense point clouds and worse performance on sparse point clouds.

为了提高基于神经网络的编解码方法在稀疏点云上的性能，本公开实施例提供了一种基于自编码器模型的点云几何编解码方法，可以实现对点云的有损压缩。In order to improve the performance of the neural network-based encoding and decoding method on sparse point clouds, embodiments of the present disclosure provide a point cloud geometric encoding and decoding method based on an autoencoder model, which can achieve lossy compression of point clouds.

本公开实施例点云几何数据的编码方法可以应用于如图1所示的G-PCC的几何信息编码流程中，替代体素化完成之后的编码处理(如八叉树划分、表面拟合等)，得到几何码流。本公开实施例点云几何数据的解码方法可以应用于如图2所示的G-PCC的几何信息解码流程中，替代反坐标变换之前的对几何码流的解码处理(如八叉树合成、表面拟合等)，得到点云的重建几何数据。本公开实施例的编码方法中的熵编码可以使用图1中的算术编码方法，本公开实施例的解码方法中的熵解码可以使用图2中的算术解码方法。但本公开实施例点云几何数据的编解码方法也可以用于G-PCC之外的其他点云编码和解码流程中。The encoding method of point cloud geometric data in the embodiment of the present disclosure can be applied to the geometric information encoding process of G-PCC as shown in Figure 1, replacing the encoding process after voxelization is completed (such as octree division, surface fitting, etc. ), get the geometric code stream. The decoding method of point cloud geometric data in the embodiment of the present disclosure can be applied to the geometric information decoding process of G-PCC as shown in Figure 2, replacing the decoding processing of the geometric code stream before inverse coordinate transformation (such as octree synthesis, Surface fitting, etc.) to obtain reconstructed geometric data of the point cloud. The entropy coding in the encoding method of the embodiment of the present disclosure can use the arithmetic coding method in Figure 1, and the entropy decoding in the decoding method of the embodiment of the present disclosure can use the arithmetic decoding method in Figure 2. However, the encoding and decoding method of point cloud geometric data in the embodiment of the present disclosure can also be used in other point cloud encoding and decoding processes besides G-PCC.

本公开一实施例点云几何数据的编解码方法的示意图如图3所示，在编码端，对第一尺度点云的几何数据进行两次体素下采样。该第一尺度点云可以是待编码的原始尺度点云，对第一尺度点云的几何数据进行一次体素下采样后，得到第二尺度点云的几何数据；对第二尺度点云的几何数据再进行一次体素下采样后，得到第三尺度点云的几何数据。第三尺度点云的几何数据经熵编码后生成几何码流。解码端经熵解码可得到第三尺度点云无损的几何数据，且需要基于第三尺度点云的几何数据得到更高尺度(如第二尺度点云、第一尺度点云)的重建几何数据。A schematic diagram of a method for encoding and decoding point cloud geometric data according to an embodiment of the present disclosure is shown in Figure 3. At the encoding end, the geometric data of the first scale point cloud is subjected to voxel downsampling twice. The first-scale point cloud can be the original-scale point cloud to be encoded. After performing voxel downsampling on the geometric data of the first-scale point cloud, the geometric data of the second-scale point cloud is obtained; After another voxel downsampling of the geometric data, the geometric data of the third-scale point cloud is obtained. The geometric data of the third-scale point cloud is entropy-encoded to generate a geometric code stream. The decoding end can obtain lossless geometric data of the third-scale point cloud through entropy decoding, and it is necessary to obtain reconstructed geometric data of higher scales (such as second-scale point cloud, first-scale point cloud) based on the geometric data of the third-scale point cloud. .

为了提高更高尺度的重建几何数据的准确性，本公开实施例通过自编码器模型增强低尺度点云的几何数据。具体地，本实施例在编码端，通过编码器网络对第二尺度点云的几何数据进行至少两次的体素下采样和特征提取，得到用于增强第三尺度点云几何数据的特征数据，图中是通过两个编码器分别进行一次体素下采样(步长2×2×2)和特征提取，以提取出真正对重建有帮助的特征数据，且减少要传输的数据量。文中将通过神经网络提取的特征数据称为隐式特征(latent feature)数据。编码器网络输出的特征数据经量化、熵编码写入码流，也可以直接经熵编码写入码流。In order to improve the accuracy of reconstructed geometric data at higher scales, embodiments of the present disclosure enhance the geometric data of low-scale point clouds through an autoencoder model. Specifically, on the encoding side, this embodiment performs at least two voxel downsampling and feature extraction on the geometric data of the second-scale point cloud through the encoder network to obtain feature data used to enhance the geometric data of the third-scale point cloud. , in the figure, voxel downsampling (step size 2×2×2) and feature extraction are performed through two encoders respectively to extract feature data that is truly helpful for reconstruction and reduce the amount of data to be transmitted. In this article, the feature data extracted through the neural network is called latent feature data. The feature data output by the encoder network is quantized and entropy-encoded and written into the code stream, or it can also be directly entropy-encoded and written into the code stream.

在解码端，经熵解码得到第三尺度点云无损的几何数据和用于增强第三尺度点云几何数据的特征数据。该无损的几何数据即第三尺度点云待增强的几何数据。通过解码器网络的一个解码器对所述特征数据进行一次体素上采样和特征推理后，输出的特征数据与第三尺度点云待增强的几何数据拼接，得到第三尺度点云增强后的几何数据，图中表示为第三尺度点云的几何数据+特征数据。At the decoding end, lossless geometric data of the third-scale point cloud and feature data used to enhance the geometric data of the third-scale point cloud are obtained through entropy decoding. This lossless geometric data is the geometric data to be enhanced in the third scale point cloud. After a decoder of the decoder network performs voxel upsampling and feature inference on the feature data, the output feature data is spliced with the geometric data to be enhanced in the third scale point cloud to obtain the enhanced third scale point cloud. Geometric data, represented in the figure as geometric data + feature data of the third-scale point cloud.

如图3所示，得到第三尺度点云增强后的几何数据后，再通过解码器网络的另一解码器对第三尺度点云增强后的几何数据进行一次体素上采样和特征推理，该解码器输出的数据再经过概率预测和点云裁剪，得到第二尺度点云的重建几何数据。第三尺度点云的几何数据经特征增强后，再解码得到的第二尺度点云的重建几何数据更接近于第二尺度点云原始的几何数据，可以明显提升解码性能。上述编码器网络和解码器网络属于同一自编码器模型，两者的网络参数通过共同训练得到。As shown in Figure 3, after obtaining the enhanced geometric data of the third-scale point cloud, voxel upsampling and feature inference are performed on the enhanced geometric data of the third-scale point cloud through another decoder of the decoder network. The data output by the decoder is then subjected to probability prediction and point cloud clipping to obtain reconstructed geometric data of the second-scale point cloud. After the geometric data of the third-scale point cloud is enhanced with features, the reconstructed geometric data of the second-scale point cloud obtained by decoding is closer to the original geometric data of the second-scale point cloud, which can significantly improve the decoding performance. The above-mentioned encoder network and decoder network belong to the same autoencoder model, and the network parameters of the two are obtained through joint training.

第二尺度点云的重建几何数据可以继续送入概率预测模型，进行一次体素上采样和特征推理，以及概率预测和点云裁剪，得到第一尺度点云的重建几何数据。这里用于进行一次体素上采样和特征推理的解码器可以与解码器网络中的解码器采用相同的结构，也可以另行设计。The reconstructed geometric data of the second-scale point cloud can continue to be fed into the probabilistic prediction model, which performs voxel upsampling and feature inference, as well as probabilistic prediction and point cloud clipping, to obtain the reconstructed geometric data of the first-scale point cloud. The decoder used here for one-time voxel upsampling and feature inference can adopt the same structure as the decoder in the decoder network, or it can be designed separately.

图示的实施例没有对第二尺度点云的重建几何数据进行增强，但在其他实施例中，也可以用类似的方式对第二尺度点云的重建几何数据进行增强，得到第二尺度点云增强后的几何数据，再送入概率预测模型得到第一尺度点云的重建几何数据。是否需要增强，可以根据所需的开销和性能提升的幅度来确定，本公开对此不做局限。The illustrated embodiment does not enhance the reconstructed geometric data of the second-scale point cloud. However, in other embodiments, the reconstructed geometric data of the second-scale point cloud can also be enhanced in a similar manner to obtain the second-scale point. The cloud-enhanced geometric data is then fed into the probabilistic prediction model to obtain the reconstructed geometric data of the first-scale point cloud. Whether enhancement is needed can be determined based on the required overhead and the extent of performance improvement, and this disclosure is not limited in this regard.

本公开实施例在点云几何数据的编解码过程中，可以灵活地对不同尺度点云的几何数据进行增强，自编码器模型的训练可以基于两个相邻尺度点云的几何数据来实现，不必如其他方法那样为所有尺度的点云设计好编码网络和解码网络后再一起训练，简单方便，可移植性好。In the encoding and decoding process of point cloud geometric data, the embodiments of the present disclosure can flexibly enhance the geometric data of point clouds of different scales. The training of the autoencoder model can be implemented based on the geometric data of two adjacent scale point clouds. It is not necessary to design the encoding network and the decoding network for point clouds of all scales and then train them together like other methods. It is simple, convenient and has good portability.

本公开一实施例点云几何数据的编解码方法的示意图如图4所示，本实施例对第一尺度点云的几何数据进行体素下采样的次数i大于等于3次，比图3所示的实施例更多。图中所示的第i+1尺度点云是最小尺度的点云，该点云的几何数据通过熵编码无损压缩。本实施例除了对该最小尺度即第i+1尺度点云的几何数据进行增强外，还对次小尺度即第i尺度点云的重建几何数据进行增强。其中增强第i+1尺度点云几何数据的方法可以参见图3所示实施例中增强第三尺度点云几何数据的方法，两者是相同的，仅点云、编解码网络的编号不同。其中使用的第i编码器网络(包括两个编码器)和第i解码器网络(包括两个解码器)同属于第i自编码器模型。A schematic diagram of a method for encoding and decoding point cloud geometric data according to an embodiment of the present disclosure is shown in Figure 4. In this embodiment, the number of times i of voxel downsampling for the geometric data of the first scale point cloud is greater than or equal to 3 times, which is better than that shown in Figure 3. More examples are shown. The i+1th scale point cloud shown in the figure is the smallest scale point cloud, and the geometric data of this point cloud is losslessly compressed through entropy coding. In addition to enhancing the geometric data of the point cloud at the smallest scale, that is, the i+1th scale, this embodiment also enhances the reconstructed geometric data of the point cloud at the next smallest scale, that is, the i-th scale. The method of enhancing the i+1th scale point cloud geometric data can be found in the method of enhancing the third scale point cloud geometric data in the embodiment shown in Figure 3. The two are the same, only the numbers of the point cloud and the encoding and decoding network are different. The i-th encoder network (including two encoders) and i-th decoder network (including two decoders) used in it both belong to the i-th autoencoder model.

如图所示，本实施例为了增强第i尺度点云的重建几何数据，在编码端使用第i-1编码器网络(包括两个编码器)对第i-1尺度点云的几何数据进行M _i-1次体素下采样和特征提取，得到用于增强第i尺度点云几何数据的特征数据。这里使用的第i-1编码器网络和第i编码器网络的结构可以相同或不同，可以分别进行训练。对第i-1尺度点云的几何数据进行体素下采样和特征提取的次数M _i-1和对第i尺度点云的几何数据进行体素下采样和特征提取的次数M _i均大于2，也可以相同或不同。用于增强第i尺度点云几何数据的特征数据经量化和熵编码写入几何码流，或者经熵编码写入几何码流。 As shown in the figure, in this embodiment, in order to enhance the reconstructed geometric data of the i-th scale point cloud, the i-1th encoder network (including two encoders) is used at the encoding end to perform the processing on the geometric data of the i-1th scale point cloud. Mi _-1 times of voxel downsampling and feature extraction are performed to obtain feature data used to enhance the i-th scale point cloud geometric data. The structures of the i-1th encoder network and the i-th encoder network used here can be the same or different, and can be trained separately. The number of voxel down-sampling and feature extraction operations Mi _-1 for the geometric data of the i-1th scale point cloud and the number of voxel down-sampling and feature extraction operations for the geometric data of the i-th scale point cloud Mi _i are both greater than 2. , can also be the same or different. The feature data used to enhance the i-th scale point cloud geometric data is quantized and entropy-encoded and written into the geometry code stream, or is entropy-encoded and written into the geometry code stream.

在解码端，经熵解码得到用于增强第i尺度点云几何数据的特征数据，同时通过对第i+1尺度点云经特征增强的几何数据进行一次体素上采样和特征推理，以及概率预测和点云裁剪，可以得到第i尺度点云的重建几何数据即待增强的几何数据。通过第i-1解码器网络的一个解码器对用于增强第i尺度点云几何数据的特征数据进行一次体素上采样和特征推理后，输出的特征数据与第i尺度点云待增强的几何数据拼接，得到第i尺度点云增强后的几何数据。At the decoding end, the feature data used to enhance the geometric data of the i-th scale point cloud is obtained through entropy decoding. At the same time, a voxel upsampling and feature inference are performed on the feature-enhanced geometric data of the i+1-th scale point cloud, as well as probability Through prediction and point cloud clipping, the reconstructed geometric data of the i-th scale point cloud can be obtained, which is the geometric data to be enhanced. After a voxel upsampling and feature inference is performed on the feature data used to enhance the i-th scale point cloud geometric data through a decoder of the i-1 decoder network, the output feature data is consistent with the i-th scale point cloud to be enhanced. The geometric data is spliced to obtain the enhanced geometric data of the i-th scale point cloud.

如图4所示，得到第i尺度点云增强后的几何数据后，再通过概率预测模型对第i尺度点云增强后的几何数据进行一次体素上采样和特征推理，以及概率预测和点云裁剪，输出第i-1尺度点云的重建几何数据。在该概率预测模型中，是使用第i-1解码器网络的另一解码器对第i尺度点云增强后的几何数据进行一次体素上采样和特征推理。第i-1解码器网络和第i-1编码器网络同属于第i-1自编码器模型。图示示例中的第i-1解码器网络包括两个编码器，第i-1编码器网络包括两个解码器。但在其他实施例中可以采用更多的编码器来实现更多次数的体素下采样和特征提取，以及采用更多的解码器来实现更多次数的体素上采样和特征推理。As shown in Figure 4, after obtaining the enhanced geometric data of the i-th scale point cloud, a voxel upsampling and feature inference are performed on the enhanced geometric data of the i-th scale point cloud through the probabilistic prediction model, as well as probability prediction and point Cloud clipping, outputs the reconstructed geometric data of the i-1th scale point cloud. In this probabilistic prediction model, another decoder of the i-1 decoder network is used to perform voxel upsampling and feature inference on the geometric data after the i-th scale point cloud enhancement. The i-1 decoder network and the i-1 encoder network both belong to the i-1 autoencoder model. The i-1th decoder network in the illustrated example includes two encoders, and the i-1th encoder network includes two decoders. However, in other embodiments, more encoders may be used to implement more times of voxel downsampling and feature extraction, and more decoders may be used to implement more times of voxel upsampling and feature inference.

本实施例是对最小尺度点云和次小尺度点云的几何数据进行增强，但这仅仅是示例性的，在其他实施例中，也可以对更多尺度点云的几何数据进行增强，或者对最小尺度点云和除次小尺度之外的其他尺度点云的几何数据进行增强，实现方式都是类似的，这里不再赘述。对哪些尺度点云的几何数据进行增强，可以根据所需的开销和性能提升的幅度来确定。This embodiment enhances the geometric data of the smallest scale point cloud and the sub-small scale point cloud, but this is only exemplary. In other embodiments, the geometric data of more scale point clouds can also be enhanced, or The implementation methods for enhancing the geometric data of the smallest scale point cloud and other scale point clouds except the sub-small scale are similar and will not be described again here. Which scale point cloud geometric data should be enhanced can be determined based on the required overhead and the extent of performance improvement.

以图3为例，对自编码器模型在训练时，按照图示连接好编码器网络和解码器网络，但图中熵编码和熵解码可以取消，可以采用常用的对深度神经网络训练时的点云样本，训练损失函数可以设定为BCE(Binary Cross Entropy)loss，即通过概率预测得到的第三尺度点云中体素的占据概率与第三尺度点云中体素的实际占据符号的交叉熵。Take Figure 3 as an example. When training the autoencoder model, connect the encoder network and decoder network as shown in the figure. However, the entropy encoding and entropy decoding in the figure can be cancelled. You can use the commonly used methods for training deep neural networks. For point cloud samples, the training loss function can be set to BCE (Binary Cross Entropy) loss, which is the difference between the occupation probability of voxels in the third-scale point cloud obtained through probability prediction and the actual occupation symbol of the voxel in the third-scale point cloud. Cross entropy.

本公开实施例还提供了一种点云几何数据的编码方法，如图5所示，包括：Embodiments of the present disclosure also provide a method for encoding point cloud geometric data, as shown in Figure 5, including:

步骤110，对第一尺度点云的几何数据进行N次体素下采样，得到第二尺度点云至第N+1尺度点云的几何数据，N≥1；Step 110: Perform N times of voxel downsampling on the geometric data of the first scale point cloud to obtain the geometric data of the second scale point cloud to the N+1th scale point cloud, N≥1;

步骤120，将第N尺度点云的几何数据输入第N自编码器模型的第N编码器网络进行M _N次体素下采样和特征提取，输出用于增强第N+1尺度点云几何数据的特征数据，M _N≥2； Step 120: Input the geometric data of the Nth scale point cloud into the Nth encoder network of the Nth autoencoder model for M _N times of voxel downsampling and feature extraction, and the output is used to enhance the N+1th scale point cloud geometric data. Characteristic data, M _N ≥ 2;

步骤130，对所述第N+1尺度点云的几何数据和所述第N编码器网络输出的所述特征数据进行熵编码。Step 130: Entropy encoding is performed on the geometric data of the N+1 scale point cloud and the feature data output by the Nth encoder network.

在上述步骤110中，对第一尺度点云的几何数据进行体素下采样之前，需要完成对点云几何信息的体素化。体素化之后，点云呈现为体素网格的形式。体素是体素网格中的最小单元，点云中的一个点对应一个被占据的体素(即非空体素)，而未被占据的体素(即空体素)表示该位置没有点。点云的几何数据可以有不同的表示方式。例如，点云的几何数据可以用点云中体素的占据符号(也可称为占位符号、占位符等)来表示，将被占据的体素标记为1，未被占据的体素标记为0，得到一个二进制的符号序列。又如，点云的几何数据也可以用以稀疏张量的形式表示，将点云中所有点的坐标数据按照约定的顺序排列。不同的表示方式之间可以相互转换。In the above step 110, before performing voxel downsampling on the geometric data of the first-scale point cloud, the voxelization of the point cloud geometric information needs to be completed. After voxelization, the point cloud is presented as a voxel grid. A voxel is the smallest unit in the voxel grid. A point in the point cloud corresponds to an occupied voxel (i.e., a non-empty voxel), while an unoccupied voxel (i.e., an empty voxel) indicates that there is no point. The geometric data of point clouds can be represented in different ways. For example, the geometric data of the point cloud can be represented by the occupation symbols of the voxels in the point cloud (also called placeholders, placeholders, etc.). The occupied voxels are marked as 1, and the unoccupied voxels are marked as 1. Marked as 0, a binary symbol sequence is obtained. For another example, the geometric data of a point cloud can also be expressed in the form of a sparse tensor, and the coordinate data of all points in the point cloud are arranged in an agreed order. Different representations can be converted to each other.

本公开实施例点云几何数据的编码方法在编码端，将第N尺度点云的几何数据输入第N自编码器模型的第N编码器网络进行M _N次体素下采样和特征提取，输出用于增强第N+1尺度点云几何数据的特征数据。该特征数据经熵编码后，随几何码流传输到解码端。该特征数据是以第N尺度点云的几何数据为输入提取到的特征数据，其中包含了第N+1尺度点云的几何数据没有覆盖的更高尺度点云的隐式的几何信息，可以帮助解码端增强第N+1尺度点云的几何数据，从而得到更为准确的第N尺度点云的重建几何数据。提高重建点云的质量，而该特征数据经过多次体素下采样，需要传输的数据量少，可以提升点云压缩的效率。 In the encoding method of point cloud geometric data in the embodiment of the present disclosure, at the encoding end, the geometric data of the Nth scale point cloud is input into the Nth encoder network of the Nth autoencoder model to perform M _N times of voxel downsampling and feature extraction, and output Feature data used to enhance N+1 scale point cloud geometric data. After the feature data is entropy encoded, it is transmitted to the decoder along with the geometric code stream. This feature data is the feature data extracted by taking the geometric data of the Nth scale point cloud as input, which contains the implicit geometric information of the higher scale point cloud that is not covered by the geometric data of the N+1th scale point cloud. Helps the decoder to enhance the geometric data of the N+1 scale point cloud, thereby obtaining more accurate reconstructed geometric data of the Nth scale point cloud. Improve the quality of the reconstructed point cloud, and the feature data has been down-sampled multiple times, requiring less data to be transmitted, which can improve the efficiency of point cloud compression.

在本公开一示例性的实施例中，对第一尺度点云的几何数据进行体素下采样时，可以通过简单的池化方式实现。如采用步长为2×2×2的最大池化层，将第一尺度点云的8个体素合并为第二尺度点云中的1个体素，从而实现一次体素下采样，每次下采样将点云在三个维度上的尺寸均缩小为原来的一半。两个尺度点云之间，尺寸较大的可称为高尺度点云，尺寸较小的可称为低尺度点云。通过N次体素下采样得到的点云中，第N+1尺度点云是尺度最小的点云，其数据量最少，可以经熵编码写入码流。In an exemplary embodiment of the present disclosure, when performing voxel downsampling on the geometric data of the first-scale point cloud, it can be implemented through a simple pooling method. For example, a maximum pooling layer with a step size of 2×2×2 is used to merge 8 voxels of the first-scale point cloud into 1 voxel of the second-scale point cloud, thus achieving one voxel downsampling. Sampling reduces the size of the point cloud to half its original size in all three dimensions. Between two scale point clouds, the larger one can be called a high-scale point cloud, and the smaller one can be called a low-scale point cloud. Among the point clouds obtained by N times of voxel downsampling, the N+1th scale point cloud is the point cloud with the smallest scale, has the smallest amount of data, and can be written into the code stream through entropy encoding.

请参见图3，图中第三尺度点云包括2×2×1个体素，而第二尺度点云包括4×4×2个体素，第一尺度点云包括8×8×4个体素。图中仅用实的立方块示出了各尺度点云中被占据的体素。图3所示的点云仅仅是示例性的，实际的点云通常包括更多的体素。低尺度点云的几何数据和高尺度点云的几何数据存在一定程度的相关性，例如，低尺度点云中一个被占据的体素的周围均是被占据的体素(如该体素为位于一个物体的中部时)，则该体素分解为高尺度点云中的多个体素后，分解得到的该多个体素有较大的概率也是被占据的体素。这些相关性可以通过神经网络提取的特征来体现。Please refer to Figure 3. The third-scale point cloud in the figure includes 2×2×1 voxels, while the second-scale point cloud includes 4×4×2 voxels, and the first-scale point cloud includes 8×8×4 voxels. Only solid cubes are used in the figure to show the occupied voxels in the point cloud at each scale. The point cloud shown in Figure 3 is only exemplary, and actual point clouds usually include many more voxels. There is a certain degree of correlation between the geometric data of low-scale point clouds and the geometric data of high-scale point clouds. For example, an occupied voxel in a low-scale point cloud is surrounded by occupied voxels (for example, the voxel is (located in the middle of an object), after the voxel is decomposed into multiple voxels in the high-scale point cloud, the multiple voxels obtained by decomposition have a greater probability of being occupied voxels. These correlations can be represented by features extracted by neural networks.

在本公开一示例性的实施例中，对所述特征数据进行熵编码之前，所述方法还包括：对所述特征数据进行量化。量化可以减少传输特征数据所需的码字，也会带来一定的损失。In an exemplary embodiment of the present disclosure, before entropy encoding the feature data, the method further includes: quantizing the feature data. Quantization can reduce the codewords required to transmit feature data, but will also bring certain losses.

在本公开一示例性的实施例中，所述方法还包括：对所述第N尺度点云中被占据的体素的数量K _N进行熵编码。数量K _N经熵编码写入几何码流后，可以用于解码端的点云裁剪，提高点云裁剪的精准度。 In an exemplary embodiment of the present disclosure, the method further includes entropy encoding the number K _N of occupied voxels in the Nth scale point cloud. After the number K _N is written into the geometric code stream through entropy encoding, it can be used for point cloud clipping at the decoder to improve the accuracy of point cloud clipping.

在本公开一示例性的实施例中，所述方法还包括：In an exemplary embodiment of the present disclosure, the method further includes:

当N≥2时，将第j尺度点云的几何数据输入第j自编码器模型的第j编码器网络进行M _j次体素下采样和特征提取，输出用于增强第j+1尺度点云几何数据点云的特征数据； When N≥2, the geometric data of the j-th scale point cloud is input into the j-th encoder network of the j-th autoencoder model for M _j voxel downsampling and feature extraction, and the output is used to enhance the j+1-th scale point. Cloud geometry data point cloud feature data;

对所述第j编码器网络输出的所述特征数据进行量化和熵编码，M _j≥2，j的取值为{1,2,…,N-1}中的任意一个或更多个。 The feature data output by the jth encoder network is quantized and entropy encoded, M _j ≥ 2, and the value of j is any one or more of {1, 2,...,N-1}.

也即，本实施例在存在3个以上尺度的点云时，不仅对最小尺度点云的几何数据进行增强，还对除第一尺度点云外的其他一个或多个尺度点云的几何数据进行增强，可以参见图4所示的编码过程。例如，在N＝4的情况下，共存在第一尺度到第五尺度共5个尺度的点云。除对最小尺度即第五尺度点云的几何数据进行增强外，j的取值可以为3时，表示还对第四尺度点云的几何数据进行增强，j的取值为2时，表示还对第三尺度点云的几何数据进行增强，j的取值为2和3时，表示还对第三尺度点云和第四尺度点云的几何数据进行增强，以此类推。j值不同时，M _j的值可以相同或不同。 That is, when there are point clouds of more than three scales, this embodiment not only enhances the geometric data of the smallest scale point cloud, but also enhances the geometric data of one or more other scale point clouds except the first scale point cloud. For enhancement, please refer to the encoding process shown in Figure 4. For example, in the case of N=4, there are point clouds in 5 scales from the first scale to the fifth scale. In addition to enhancing the geometric data of the smallest scale, that is, the fifth-scale point cloud, when the value of j can be 3, it means that the geometric data of the fourth-scale point cloud is also enhanced. When the value of j is 2, it means that the geometric data of the fourth-scale point cloud is also enhanced. To enhance the geometric data of the third-scale point cloud, when the value of j is 2 and 3, it means that the geometric data of the third-scale point cloud and the fourth-scale point cloud are also enhanced, and so on. When j values are different, the values of M _j can be the same or different.

本文中所记载的编码器网络进行体素下采样和特征提取，并不表示编码器网络先进行体素下采样再进行特征提取，体素下采样可以在特征提取之前进行，也可以在特征提取之后进行，也可以在多次特征提取之间进行，本公开对此不做任何局限。同样地，本文记载的解码器网络进行体素上采样和特征推理，也不表示解码器网络先进行体素上采样再时行特征推理，体素上采样可以在特征推理之前进行，也可以在特征推理之后进行，也可以在多次特征推理之间进行，本公开对此也不做任何局限。The encoder network recorded in this article performs voxel downsampling and feature extraction. It does not mean that the encoder network performs voxel downsampling first and then performs feature extraction. Voxel downsampling can be performed before feature extraction or during feature extraction. It can be performed later or between multiple feature extractions, and this disclosure does not impose any limitations on this. Similarly, the decoder network recorded in this article performs voxel upsampling and feature inference, which does not mean that the decoder network first performs voxel upsampling and then performs feature inference. Voxel upsampling can be performed before feature inference, or it can be performed before feature inference. It can be performed after feature inference or between multiple feature inferences, and this disclosure does not impose any limitations on this.

在本公开一示例性的实施例中，每一次所述体素下采样和特征提取包括：In an exemplary embodiment of the present disclosure, each time of voxel downsampling and feature extraction includes:

通过基于稀疏卷积的第一残差网络和第一自注意力网络中的至少一种对输入数据进行特征提取；Perform feature extraction on the input data through at least one of a first residual network and a first self-attention network based on sparse convolution;

通过步长为2×2×2的稀疏卷积层对第一残差网络或第一自注意力网络输出的数据进行一次体素下采样；Perform voxel downsampling on the data output by the first residual network or the first self-attention network through a sparse convolution layer with a stride of 2×2×2;

通过基于稀疏卷积的第二残差网络和第二自注意力网络中的至少一种对所述稀疏卷积层输出的数据进行特征提取。Feature extraction is performed on the data output by the sparse convolution layer through at least one of a second residual network based on sparse convolution and a second self-attention network.

本实施例编码器网络每次进行体素下采样和特征提取时，是按特征提取、体素下采样、特征提取的方式进行的。Each time the encoder network in this embodiment performs voxel downsampling and feature extraction, it does so in the manner of feature extraction, voxel downsampling, and feature extraction.

在本实施例的一个示例中，所述第一残差网络和第二残差网络包括一个或多个基于稀疏卷积的残差层，每一残差层如图6所示，包括三个以上的分支，分支一将输入数据直接输出，其他分支通过不同数量的稀疏卷积层对输入数据进行特征推理，所述其他分支的输出拼接后再与分支一的输出相加，得到该残差层的输出。图6中示出了三个分支，分支二包括2个稀疏卷积层，分支三包括3个稀疏卷积层，相邻稀疏卷积层之间设有激活函数。In an example of this embodiment, the first residual network and the second residual network include one or more residual layers based on sparse convolution. Each residual layer is shown in Figure 6 and includes three For the above branches, branch one directly outputs the input data, and other branches perform feature inference on the input data through different numbers of sparse convolution layers. The outputs of the other branches are spliced and then added to the output of branch one to obtain the residual. The output of the layer. Three branches are shown in Figure 6. Branch two includes two sparse convolution layers, and branch three includes three sparse convolution layers. There are activation functions between adjacent sparse convolution layers.

在本实施例的一个示例中，每一次所述体素下采样和特征提取通过基于神经网络的一编码器实现，如图7所示，该编码器依次包括：第一稀疏卷积网络、第一自注意力网络、第一残差网络、步长为2×2×2的稀疏卷积层、第二残差网络、第二自注意力网络、及第二稀疏卷积网络；在所述第一稀疏卷积网络和所述第一自注意力网络之间，以及所述第一残差网络和所述稀疏卷积层之间设有激活函数，所述第一稀疏卷积网络和第二稀疏卷积网络包括一个或多个稀疏卷积层。In an example of this embodiment, each time the voxel downsampling and feature extraction are implemented through an encoder based on a neural network, as shown in Figure 7, the encoder includes: a first sparse convolutional network, a third A self-attention network, a first residual network, a sparse convolution layer with a step size of 2×2×2, a second residual network, a second self-attention network, and a second sparse convolution network; in the There is an activation function between the first sparse convolution network and the first self-attention network, and between the first residual network and the sparse convolution layer. The first sparse convolution network and the Two sparse convolutional networks include one or more sparse convolutional layers.

在本公开一示例性的实施例中，所述第一自注意力网络和/或第二自注意力网络包括一个或多个自注意力层，每一自注意力层执行的处理包括：In an exemplary embodiment of the present disclosure, the first self-attention network and/or the second self-attention network includes one or more self-attention layers, and the processing performed by each self-attention layer includes:

对点云中的每一个点，基于该点的坐标数据查找该点的邻居点，并对该点到所述邻居点的距离信息进行线性变换得到位置特征，将所述位置特征与所述邻居点的特征相加，得到位置编码后的聚合特征；For each point in the point cloud, search for the neighbor points of the point based on the coordinate data of the point, linearly transform the distance information from the point to the neighbor point to obtain the location feature, and compare the location feature with the neighbor point. The features of the points are added to obtain the aggregated features after position encoding;

对输入的特征数据进行第一线性变换得到第一向量，将所述第一向量与对所述聚合特征进行第二线性变换得到的第二向量作矩阵乘法，所得结果经激活后，得到点云中每一个点相对于该点的邻居点的注意力权重；Perform a first linear transformation on the input feature data to obtain a first vector, perform matrix multiplication of the first vector and a second vector obtained by performing a second linear transformation on the aggregated features, and after activation, a point cloud is obtained. The attention weight of each point relative to its neighbor points;

将所述注意力权重和第三向量相乘，得到包含所述邻域上下文特征的数据，所述第三向量通过对所述聚合特征进行第三线性变换得到。Data containing the neighborhood context features are obtained by multiplying the attention weight and a third vector, and the third vector is obtained by performing a third linear transformation on the aggregated features.

在本实施例的一个示例中，如图8所示，所述自注意力层包括依次连接的点云邻域自注意力层、第一归一化层、线性层和第二归一化层，点云邻域自注意层用于实现从输入数据中得到点云空间中的邻域上下文特征，点云邻域自注意层的输出数据和输入数据相加后输入到第一归一化层进行批量归一化，结果再输入到线性层进行线性变换，线性层的输出数据和输入数据相加后输入到第二归一化层进行批量归一化后，得到所述自注意力层的输出。In an example of this embodiment, as shown in Figure 8, the self-attention layer includes a point cloud neighborhood self-attention layer, a first normalization layer, a linear layer and a second normalization layer connected in sequence. , the point cloud neighborhood self-attention layer is used to obtain neighborhood context features in the point cloud space from the input data. The output data of the point cloud neighborhood self-attention layer is added to the input data and then input to the first normalization layer. Batch normalization is performed, and the results are then input to the linear layer for linear transformation. The output data of the linear layer and the input data are added and then input to the second normalization layer for batch normalization, and the self-attention layer is obtained. output.

本实施例的一个示例中，如图9所示。输入点云邻域自注意力层的点云数据(input)包括特征数据

和坐标C _in∈R ^n×3，坐标C _in∈R ^n×3用于邻居点的查找。n为点云的总点数，d _in为输入的特征数据的维度。 An example of this embodiment is shown in Figure 9. The point cloud data (input) of the input point cloud neighborhood self-attention layer includes feature data

and the coordinates C _in ∈R ^n×3 , the coordinates C _in ∈R ^n×3 are used to find neighbor points. n is the total number of points in the point cloud, and _din is the dimension of the input feature data.

如图所示，点云邻域自注意力层执行的处理包括：As shown in the figure, the processing performed by the point cloud neighborhood self-attention layer includes:

K近邻(KNN)搜索：K nearest neighbor (KNN) search:

对点云中的每个点p _i，用K近邻搜索算法(k nearest neighbor,KNN)找到距离该点最近的k个邻居点{p _i1,p _i2,…p _ik}，并聚集得到k个邻居点的坐标C _knn∈R ^n×k×3和特征

在一些算法中，可以将该点也算成是该点的k个邻居点中的一个，在另一些算法中，该点不作为该点的邻居点，对此本公开不做局限。 For each point p _i in the point cloud, use the K nearest neighbor search algorithm (k nearest neighbor, KNN) to find the k nearest neighbor points {p _i1 , p _i2 ,...p _ik } to the point, and aggregate them to obtain k The coordinates of neighbor points C _knn ∈R ^n×k×3 and features

In some algorithms, this point can also be counted as one of the k neighbor points of this point. In other algorithms, this point is not regarded as a neighbor point of this point, and this disclosure is not limited.

位置编码：Location code:

将每个点p _i作为中心点，求p _i的k个邻居点{p _i1,p _i2,…p _ik}与中心点p _i的相对距离{dist _i1,dist _i2,…dist _ik}，得到相对距离信息dist _knn∈R ^n×k×1。再通过线性层W _dist将dist _knn的维度从1映射到d _in维，得到的相对位置特征与特征F _knn相加(即附加在特征F _knn之后)，实现位置编码： Taking each point p _i as the center point, find the relative distance {dist _i1 , dist _i2 ,...dist _ik } between p _i 's k neighbor points {p _i1 , p _i2 ,...p _ik } and the center point p _i , and get Relative distance information dist _knn ∈R ^n×k×1 . Then the dimension of dist _knn is mapped from 1 to _din dimension through the linear layer W _dist , and the obtained relative position feature is added to the feature F _knn (that is, appended to the feature F _knn ) to achieve position encoding:

F′ _knn＝dist _knn·W _dist+F _knn F′ _knn ＝dist _knn ·W _dist +F _knn

其中，

F′ _knn是位置编码后的聚合特征，

通过位置编码，为特征赋予了对应点之间相对位置的感知信息，每个邻居点的特征都具有了空间位置信息。 in,

F′ _knn is the aggregated feature after position encoding,

Through position coding, features are given perceptual information of relative positions between corresponding points, and the features of each neighbor point have spatial position information.

QKV向量生成：QKV vector generation:

将输入的特征数据F _in通过线性层W _Q变换，得到Q向量，将经过位置编码的聚合特征F′ _knn分别通过线性层W _K和线性层W _V变换，得到K向量和V向量，即： The input feature data F _in is transformed through the linear layer W _Q to obtain the Q vector. The position-encoded aggregate feature F′ _knn is transformed through the linear layer W _K and the linear layer W _V respectively to obtain the K vector and V vector, namely:

Q＝F _in·W _q,(K,V)＝F′ _knn·(W _k,W _v) Q＝F _in ·W _q ,(K,V)＝F′ _knn ·(W _k ,W _v )

其中，

和

表示3个不同的线性变换。Q向量代表查询向量(Query)，K向量代表被查询信息与其他信息的相关性的向量(Key)，V向量代表被查询信息的向量(Value)。上述维度参数d _a和d _out可以等于d _in，如均设置为32。d _a和d _out也可以不等于d _in即可以进行维度变换。 in,

and

Represents 3 different linear transformations. The Q vector represents the query vector (Query), the K vector represents the vector (Key) of the correlation between the queried information and other information, and the V vector represents the vector (Value) of the queried information. The above dimension parameters d _a and d _out can be equal to d _in , for example, both are set to 32. D _a and d _out can also be different from d _in , that is, dimensional transformation can be performed.

注意力权重生成及基于注意力的特征聚合：Attention weight generation and attention-based feature aggregation:

得到Q向量、K向量和V向量后，将Q向量与K向量作矩阵乘法，结果经过Softmax激活函数激活，输出每一个点作为中心点时相对于其邻居点的注意力权重A，最后将注意力权重A和V向量相乘，得到输出的点云的特征数据F _out。即： After obtaining the Q vector, K vector and V vector, perform matrix multiplication of the Q vector and the K vector. The result is activated by the Softmax activation function, and the attention weight A relative to its neighbor point when each point is used as the center point is output. Finally, the attention The force weights A and V vectors are multiplied to obtain the feature data F _out of the output point cloud. Right now:

本示例在激活前，还可以对Q向量与K向量作矩阵乘法的结果乘以比例因子

In this example, before activation, the result of matrix multiplication of Q vector and K vector can also be multiplied by the scaling factor.

本公开上述实施例相比单纯基于稀疏卷积的神经网络，通过引入注意力机制网络，能够增强在稀疏点云上的空间建模能力。因为卷积核尺寸固定的卷积网络难以在分布稀疏的点云上提取有效的邻居特征(即邻域上下文的特征)，而本公开上述实施例引入基于注意力机制的网络，直接在点集合上基于k近邻算法得到中心点周围的k个点，然后通过注意力机制得到中心点对于其他点的注意力权重，能更加有效地提取邻域上下文的特征信息，提高在稀疏点云上的压缩性能。Compared with neural networks based solely on sparse convolutions, the above embodiments of the present disclosure can enhance spatial modeling capabilities on sparse point clouds by introducing attention mechanism networks. Because it is difficult for a convolutional network with a fixed convolution kernel size to extract effective neighbor features (i.e., features of neighborhood context) on sparsely distributed point clouds, the above-mentioned embodiments of the present disclosure introduce a network based on an attention mechanism to directly collect points on a point cloud. Based on the k nearest neighbor algorithm, k points around the center point are obtained, and then the attention weight of the center point to other points is obtained through the attention mechanism, which can more effectively extract the feature information of the neighborhood context and improve compression on sparse point clouds. performance.

本公开一实施例提供了一种点云几何数据增强方法，如图10所示，所述方法包括：An embodiment of the present disclosure provides a point cloud geometric data enhancement method, as shown in Figure 10. The method includes:

步骤210，解析码流，得到用于增强第i+1尺度点云几何数据的特征数据；所述特征数据是通过第i编码器网络对第i尺度点云的几何数据进行M _i次体素下采样和特征提取而得到，i≥1，M _i≥2； Step 210, parse the code stream to obtain feature data used to enhance the geometric data of the i+1-th scale point cloud; the feature data is obtained by performing M _i voxels on the geometric data of the i-th scale point cloud through the i-th encoder network Obtained by downsampling and feature extraction, i≥1, M _i≥2 ;

本实施例的第i编码器网络可以设置级联的M _i个编码器，每个编码器对输入数据进行一次体素下采样和特征提取。但在其他实施例中，编码器的个数是可变的，单个编码器也可以实现多次体素下采样和特征提取。 The i-th encoder network in this embodiment can be configured with M _i encoders in cascade, and each encoder performs voxel downsampling and feature extraction on the input data. However, in other embodiments, the number of encoders is variable, and a single encoder can also implement multiple voxel downsampling and feature extraction.

步骤220，通过第i解码器网络的部分解码器对所述特征数据进行M _i-1次体素上采样和特征推理，输出的特征数据与第i+1尺度点云待增强的几何数据拼接，得到第i+1尺度点云增强后的几何数据； Step 220: Perform Mi _-1 voxel upsampling and feature inference on the feature data through the partial decoder of the i-th decoder network, and the output feature data is spliced with the geometric data to be enhanced in the i+1-th scale point cloud. , obtain the enhanced geometric data of the i+1th scale point cloud;

其中，所述第i编码器网络和第i解码器网络同属于第i自编码器模型。Wherein, the i-th encoder network and i-th decoder network both belong to the i-th autoencoder model.

在本实施例的一示例中，所述输出的特征数据包括L _i+1个特征数据，所述第i+1尺度点云的重建几何数据包括L _i+1个点的坐标数据；所述拼接是将所述L _i+1个特征数据和L _i+1个点的坐标数据一一对应拼接，得到L _i+1个点的坐标及特征数据，L _i+1为第i+1尺度点云中点的数量。在编码端对第i尺度点云的几何数据进行体素下采样和特征提取时，得到的特征数据(如特征值)与几何数据(如点的坐标)是按照顺序一一对应的。在解码端再将两者拼接起来，就可以得到点云中每一个点的坐标和特征数据。或者说，可以得到点云中每一个被占据体素的特征值。 In an example of this embodiment, the output feature data includes Li ₊₁ feature data, and the reconstructed geometric data of the i+1th scale point cloud includes coordinate data of Li ₊₁ points; Splicing is to splice the Li ₊₁ feature data and the coordinate data of Li ₊₁ points in one-to-one correspondence to obtain the coordinates and feature data of Li ₊₁ points. Li ₊₁ is the i+1th scale. The number of points in the point cloud. When the encoding end performs voxel downsampling and feature extraction on the geometric data of the i-th scale point cloud, the obtained feature data (such as eigenvalues) and geometric data (such as point coordinates) are in one-to-one correspondence in order. At the decoding end, by splicing the two together, the coordinates and feature data of each point in the point cloud can be obtained. In other words, the characteristic value of each occupied voxel in the point cloud can be obtained.

在本实施例的一示例中，所述第i编码器网络通过步长为2×2×2的稀疏卷积实现所述体素下采样；所述第i解码器网络通过步长为2×2×2的转置稀疏卷积实现所述体素上采样。在编码器网络通过稀疏卷积实现体素下采样，在解码器网络中通过转置稀疏卷积实现体素上采样，稀疏卷积和转置稀疏卷积的参数都是可学习的，有利于提升压缩编码的性能。In an example of this embodiment, the i-th encoder network implements the voxel downsampling through sparse convolution with a step size of 2×2×2; the i-th decoder network implements the voxel downsampling through a step size of 2×2 A 2×2 transposed sparse convolution implements the voxel upsampling. Voxel downsampling is implemented through sparse convolution in the encoder network, and voxel upsampling is implemented through transposed sparse convolution in the decoder network. The parameters of sparse convolution and transposed sparse convolution are both learnable, which is beneficial to Improve compression encoding performance.

本公开一实施例还提供了一种点云几何数据的解码方法，如图11所示，包括：An embodiment of the present disclosure also provides a method for decoding point cloud geometric data, as shown in Figure 11, including:

步骤310，解析码流，得到的第N+1尺度点云的几何数据作为待增强的几何数据，按照本公开任一实施例所述的点云几何数据增强方法进增强，得到第N+1尺度点云增强后的几何数据，N≥1；Step 310: Parse the code stream and obtain the geometric data of the N+1th scale point cloud as the geometric data to be enhanced. Enhance it according to the point cloud geometric data enhancement method described in any embodiment of the present disclosure to obtain the N+1th scale point cloud. Scaled point cloud enhanced geometric data, N≥1;

步骤320，通过第N解码器网络其余的解码器对第N+1尺度点云增强后的几何数据进行一次体素上采样和特征推理，输出的数据再经过概率预测和点云裁剪，得到第N尺度点云的重建几何数据。Step 320: Use the remaining decoders of the N-th decoder network to perform voxel upsampling and feature inference on the enhanced geometric data of the N+1-th scale point cloud. The output data is then subjected to probability prediction and point cloud clipping to obtain the N-th scale point cloud. Reconstructed geometric data from N-scale point clouds.

上述第N解码器网络其余的解码器、以及用于执行概率预测和点云裁剪的网络构成了第N概率预测模型，第N概率预测模型的输出即第N尺度点云的重建几何数据。概率预测可以通过概率预测器实现，点云裁剪可以通过用于点云的裁剪器实现。The remaining decoders of the above-mentioned Nth decoder network and the network used to perform probability prediction and point cloud clipping constitute the Nth probability prediction model. The output of the Nth probability prediction model is the reconstructed geometric data of the Nth scale point cloud. Probabilistic prediction can be achieved through probabilistic predictors, and point cloud clipping can be achieved through clippers for point clouds.

本公开实施例点云几何数据的解码方法在解码端，利用解码得到的特征数据对第N+1尺度点云的几何数据进行增强，再基于第N+1尺度点云增强后的几何数据进行体素上采样和特征推理，以及概率预测和点云裁剪，得到第N尺度点云的重建几何数据。该特征数据是对第N尺度点云的几何数据进行M _N次体素下采样和特征提取而得到，包含了第N尺度点云的隐式特征信息，可以帮助解码端更为准确的第N尺度点云的重建几何数据。提高重建点云的质量，而该特征数据经过多次体素下采样，需要传输的数据量少，可以提升点云压缩的效率。 The decoding method of point cloud geometric data in the embodiment of the present disclosure uses the decoded feature data to enhance the geometric data of the N+1th scale point cloud at the decoding end, and then performs the processing based on the enhanced geometric data of the N+1th scale point cloud. Voxel upsampling and feature inference, as well as probability prediction and point cloud clipping, are used to obtain the reconstructed geometric data of the Nth scale point cloud. This feature data is obtained by performing M _N times of voxel downsampling and feature extraction on the geometric data of the Nth scale point cloud. It contains the implicit feature information of the Nth scale point cloud, which can help the decoder to obtain a more accurate Nth Reconstructed geometric data from scaled point clouds. Improve the quality of the reconstructed point cloud, and the feature data has been down-sampled multiple times, requiring less data to be transmitted, which can improve the efficiency of point cloud compression.

当N≥2时，将第N尺度点云的重建几何数据输入级联的N-1个概率预测模型，在每一所述概率预测模型中进行一次体素上采样和特征推理，及概率预测和点云裁剪，输出相应尺度点云的重建几何数据；When N≥2, the reconstructed geometric data of the Nth scale point cloud is input into the cascaded N-1 probabilistic prediction models, and voxel upsampling, feature inference, and probability prediction are performed in each of the probabilistic prediction models. And point cloud clipping, output the reconstructed geometric data of the corresponding scale point cloud;

从最后一个概率预测模型的输出得到第一尺度点云的重建几何数据。The reconstructed geometric data of the first-scale point cloud is obtained from the output of the last probabilistic prediction model.

本实施例在对第一尺度点云进行2次以上的体素下采样的情况下，只对最小尺度的几何数据进行特征增强，不对其他尺度的重建几何数据进行增强。通过将第N尺度的重建几何数据输入一个概率预测模型得到第N-1尺度点云的重建几何数据，再将第N-1尺度的重建几何数据输入一个概率预测模型得到第N-2尺度点云的重建几何数据，直到得到第一尺度点云的重建几何数据。该过程可以参见图3及其相关说明。In this embodiment, when voxel downsampling is performed on the first-scale point cloud more than twice, feature enhancement is only performed on the geometric data at the smallest scale, and the reconstructed geometric data at other scales is not enhanced. By inputting the reconstructed geometric data of the N-th scale into a probabilistic prediction model, the reconstructed geometric data of the N-1 scale point cloud is obtained, and then inputting the reconstructed geometric data of the N-1 scale into a probabilistic prediction model to obtain the N-2-th scale point. The reconstructed geometric data of the cloud is obtained until the reconstructed geometric data of the first-scale point cloud is obtained. This process can be seen in Figure 3 and its related description.

当N≥2时，将第j尺度点云的重建几何数据或者第j尺度点云增强后的几何数据输入第j-1概率预测模型，在所述第j-1概率预测模型中进行一次体素上采样和特征推理，以及进行概率预测和点云裁剪后，输出第j-1尺度点云的重建几何数据，j＝2,3,…,N；When N ≥ 2, the reconstructed geometric data of the j-th scale point cloud or the enhanced geometric data of the j-th scale point cloud is input into the j-1th probability prediction model, and a single volume is performed in the j-1th probability prediction model. After pixel upsampling and feature inference, as well as probability prediction and point cloud clipping, the reconstructed geometric data of the j-1th scale point cloud is output, j=2,3,...,N;

其中，第j尺度点云增强后的几何数据是将第j尺度点云的重建几何数据作为第j尺度点云待增强的几何数据，按照本公开任一实施例所述的点云几何数据增强方法进行增强后得到的。Wherein, the enhanced geometric data of the j-th scale point cloud is to use the reconstructed geometric data of the j-th scale point cloud as the geometric data to be enhanced for the j-th scale point cloud, according to the point cloud geometric data enhancement described in any embodiment of the present disclosure. obtained after enhancing the method.

本实施例除对最小尺度点云的几何数据进行增强外，还可以对除第一尺度点云之外的一个或多个尺度点云的重建几何数据进行增强。图4所示的编解码过程是本实施例的一个示例，可参见图4及相关说明。在输入的数据是经特征增强的几何数据时，第j-1概率预测模型中应使用第j-1解码器网络中余下的解码器进行一次体素上采样和特征推理，而与该第j-1解码器网络同属于一个自编码器模型的第j-1编码器网络，则用于对第j-1尺度点云进行多次体素下采样和特征提取，得到用于增强第j尺度点云几何数据的特征数据。而在输入的数据是未经增强的重建几何数据时，第j-1概率预测模型中进行一次体素上采样和特征推理的解码器可以单独设计。In addition to enhancing the geometric data of the smallest scale point cloud, this embodiment can also enhance the reconstructed geometric data of one or more scale point clouds other than the first scale point cloud. The encoding and decoding process shown in Figure 4 is an example of this embodiment. Please refer to Figure 4 and related descriptions. When the input data is feature-enhanced geometric data, the j-1th probabilistic prediction model should use the remaining decoders in the j-1th decoder network to perform voxel upsampling and feature inference, and the j-th The -1 decoder network belongs to the j-1th encoder network of the same autoencoder model, and is used to perform multiple voxel downsampling and feature extraction on the j-1th scale point cloud to obtain the j-th scale point cloud. Feature data of point cloud geometric data. When the input data is reconstructed geometric data without enhancement, the decoder that performs voxel upsampling and feature inference in the j-1th probabilistic prediction model can be designed separately.

在本公开一示例性的实施例中，所述概率预测通过多个稀疏卷积层和sigmod函数实现。在本实施例的一个示例中，可以采用如图12所示的概率预测器来实现概率预测。该概率预测器包括3个稀疏卷积层、设在相邻稀疏卷积层之间的2个激活函数(如ReLU函数)，以及设置在最后一层的Sigmod函数，Sigmod函数输出推理得到的点云中体素的占据概率。可以将占据概率的数值范围限制到0到1之间。稀疏卷积层可以使用SConv K1 ³,S1 ³,C32，其三个维度上的卷积核大小为1，步长为1，其通道数为32。 In an exemplary embodiment of the present disclosure, the probability prediction is implemented through multiple sparse convolution layers and a sigmod function. In an example of this embodiment, a probability predictor as shown in Figure 12 can be used to implement probability prediction. The probability predictor includes 3 sparse convolution layers, 2 activation functions (such as ReLU functions) set between adjacent sparse convolution layers, and a Sigmod function set in the last layer. The Sigmod function outputs the points obtained by inference Occupancy probability of voxels in the cloud. The numerical range of occupancy probability can be limited to between 0 and 1. The sparse convolution layer can use SConv K1 ³ , S1 ³ , C32. The convolution kernel size in three dimensions is 1, the stride is 1, and the number of channels is 32.

在得到某一尺度点云中点的占据概率后，可以采用简单的二分类法来确定点云中被占据的体素，参见图3，图13A示出了图3中第二尺度点云中体素被占据的情况，而图13B示出了图3中第三尺度点云中体素被占据的情况，图13C是经概率预测得到的第二尺度点云中体素的占据概率(表示被占据的概率)。使用二分类时，可以将占据概率不小于设定阈值(如0.5)的体素作为被占据的体素，将占据概率小于设定阈值(如0.5)的体素作为末被占据的体素，从而得到点云的重建几何数据。但是，使用二分类法进行点云裁剪有时不够准确。After obtaining the occupation probability of a point in a certain scale point cloud, a simple binary classification method can be used to determine the occupied voxels in the point cloud. See Figure 3. Figure 13A shows the second scale point cloud in Figure 3. The occupied situation of voxels, and Figure 13B shows the occupied situation of voxels in the third-scale point cloud in Figure 3, and Figure 13C is the occupation probability of voxels in the second-scale point cloud obtained by probability prediction (representing probability of being occupied). When using binary classification, voxels with an occupation probability not less than a set threshold (such as 0.5) can be regarded as occupied voxels, and voxels with an occupation probability less than a set threshold (such as 0.5) can be regarded as unoccupied voxels. Thus the reconstructed geometric data of the point cloud is obtained. However, using the binary classification method for point cloud cropping is sometimes not accurate enough.

为了提高点云裁剪的准确度。本公开一示例性的实施例提供了一种基于点云中点的数量辅助裁剪的方法。在编码端将要裁剪的一种或多种尺度的点云中点的数量熵编码，解码端根据该数量来辅助确定被占据的体素。In order to improve the accuracy of point cloud cropping. An exemplary embodiment of the present disclosure provides a method of assisting cropping based on the number of points in a point cloud. At the encoding end, the number of points in the point cloud of one or more scales to be cropped is entropy-encoded, and the decoding end uses this number to assist in determining the occupied voxels.

本实施例的解码方法还包括：解析码流，得到第N尺度点云中被占据的体素的数量K _N，K _N也是第N尺度点云中点的数量；以及，通过以下方式实现点云裁剪：将概率预测后得到的第N尺度点云中由同一体素分解得到的M个体素为一组，将每一组体素中占据概率最高的m个体素的占据概率置为1，然后对第N尺度点云中所有体素的占据概率排序，将占据概率最高的K _N个体素确定为第N尺度点云的被占据体素，1≤m<M。 The decoding method in this embodiment also includes: parsing the code stream to obtain the number K _N of occupied voxels in the N-th scale point cloud, where K _N is also the number of points in the N-th scale point cloud; and, realizing the points in the following manner Cloud clipping: Group M voxels decomposed from the same voxel in the Nth scale point cloud obtained after probability prediction into a group, and set the occupation probability of the m voxels with the highest occupancy probability in each group of voxels to 1. Then the occupancy probabilities of all voxels in the Nth scale point cloud are sorted, and the K _N voxels with the highest occupancy probability are determined as occupied voxels of the Nth scale point cloud, 1≤m<M.

在一示例中，可以将同一体素分解得到的8个体素为一组，将每一组体素中占据概率最高的1个或2个或3个体素的占据概率置为1。对低尺度点云的体素进行分解时，未被占据的体素是不需要分解的，因此分解出来的8个体素中至少有1个为1。在其他示例中，M也可以等于64等其他值，在M 较大时，m的值也可以相应增大。In an example, 8 voxels obtained by decomposing the same voxel can be grouped into a group, and the occupancy probability of the 1, 2, or 3 voxels with the highest occupancy probability in each group of voxels is set to 1. When decomposing voxels of low-scale point clouds, unoccupied voxels do not need to be decomposed, so at least one of the eight decomposed voxels is 1. In other examples, M can also be equal to other values such as 64. When M is larger, the value of m can also increase accordingly.

本实施例在统一排序前先将每一组体素中占据概率最高的至少一个体素的占据概率置为1，再以点云中点的数量为约束条件选出占据概率最高的K _N个体素为被占据的体素，可明显提高点云裁剪的准确度。 In this embodiment, before unified sorting, the occupancy probability of at least one voxel with the highest occupancy probability in each group of voxels is set to 1, and then the K _N individuals with the highest occupancy probability are selected using the number of points in the point cloud as a constraint. Voxels are occupied voxels, which can significantly improve the accuracy of point cloud clipping.

在本公开一示例性的实施例中，每一次所述体素上采样和特征推理包括：In an exemplary embodiment of the present disclosure, each of the voxel upsampling and feature inference includes:

通过基于稀疏卷积的第一残差网络和第一自注意力网络中的至少一种对输入数据进行特征推理；perform feature inference on the input data through at least one of a first residual network and a first self-attention network based on sparse convolution;

通过步长为2×2×2的转置稀疏卷积层对第一残差网络或第一自注意力网络输出的数据进行一次体素上采样；Perform voxel upsampling on the data output by the first residual network or the first self-attention network through a transposed sparse convolution layer with a stride of 2×2×2;

通过基于稀疏卷积的第二残差网络和第二自注意力网络中的至少一种对所述转置稀疏卷积层输出的数据进行特征推理。Feature inference is performed on the data output by the transposed sparse convolution layer through at least one of a second residual network based on sparse convolution and a second self-attention network.

本实施例解码器网络每次进行体素上采样和特征提取时，是按特征提取、体素上采样、特征提取的方式进行的。Each time the decoder network in this embodiment performs voxel upsampling and feature extraction, it does so in the manner of feature extraction, voxel upsampling, and feature extraction.

在本实施例的一个示例中，所述第一残差网络和第二残差网络包括一个或多个基于稀疏卷积的残差层，每一残差层可参见图6，包括三个以上的分支，分支一将输入数据直接输出，其他分支通过不同数量的稀疏卷积层对输入数据进行特征推理，所述其他分支的输出拼接后再与分支一的输出相加，得到该残差层的输出。图6中示出了三个分支，分支二包括2个稀疏卷积层，分支三包括3个稀疏卷积层，相邻稀疏卷积层之间设有激活函数。In an example of this embodiment, the first residual network and the second residual network include one or more residual layers based on sparse convolution. Each residual layer can be seen in Figure 6 and includes more than three branch, branch one directly outputs the input data, and other branches perform feature inference on the input data through different numbers of sparse convolution layers. The outputs of the other branches are spliced and then added to the output of branch one to obtain the residual layer. Output. Three branches are shown in Figure 6. Branch two includes two sparse convolution layers, and branch three includes three sparse convolution layers. There are activation functions between adjacent sparse convolution layers.

在本实施例的一个示例中，每一次所述体素上采样和特征提取通过基于神经网络的一解码器实现，如图14所示，该编码器依次包括：第一稀疏卷积网络、第一自注意力网络、第一残差网络、步长为2×2×2的转置稀疏卷积层、第二残差网络、第二自注意力网络、及第二稀疏卷积网络；所述第一稀疏卷积网络和所述第一自注意力网络之间，以及所述第一残差网络和所述转置稀疏卷积层之间设有激活函数，所述第一稀疏卷积网络和第二稀疏卷积网络包括一个或多个稀疏卷积层。In an example of this embodiment, each time the voxel upsampling and feature extraction are implemented through a decoder based on a neural network, as shown in Figure 14, the encoder includes: a first sparse convolutional network, a third A self-attention network, a first residual network, a transposed sparse convolution layer with a stride of 2×2×2, a second residual network, a second self-attention network, and a second sparse convolution network; so An activation function is provided between the first sparse convolution network and the first self-attention network, and between the first residual network and the transposed sparse convolution layer. The first sparse convolution The network and the second sparse convolutional network include one or more sparse convolutional layers.

虽然本公开实施例以及上述实施例给出了一种解码器和编码器的结构，但可以实现特征提取和特征推理的神经网络是各种各样的，在本公开中均可能使用。因此本公开并不局限于本文公开的某种特定的网络结构，能够基于稀疏卷积实现特征提取或特征推理的神经网络均可以使用。特别地，实现特征提取和特征推理的网络结构可以是相同的，在编码网络中称为特征提取，而在解码网络中的称为特征推理。Although the embodiments of the present disclosure and the above-mentioned embodiments provide a structure of a decoder and an encoder, there are various neural networks that can realize feature extraction and feature reasoning, and all of them may be used in the present disclosure. Therefore, the present disclosure is not limited to a specific network structure disclosed herein, and any neural network that can implement feature extraction or feature inference based on sparse convolution can be used. In particular, the network structure that implements feature extraction and feature inference can be the same, which is called feature extraction in the encoding network and feature inference in the decoding network.

在本实施例的一个示例中，所述第一自注意力网络和/或第二自注意力网络包括一个或多个自注意力层，每一自注意力层执行的处理包括：通过以下方式从输入数据中得到点云空间中的邻域上下文特征：In an example of this embodiment, the first self-attention network and/or the second self-attention network includes one or more self-attention layers, and the processing performed by each self-attention layer includes: in the following manner Obtain neighborhood context features in point cloud space from input data:

在一个示例中，请参见图8，自注意力层包括依次连接的点云邻域自注意力层、第一归一化层、线性层和第二归一化层，所述点云邻域自注意层用于从输入数据中得到点云空间中的邻域上下文特征，所述点云邻域自注意层的输出数据和输入数据相加后输入到所述第一归一化层进行批量归一化，结果再输入到所述线性层进行线性变换，所述线性层的输出数据和输入数据相加后输入到所述第二归一化层进行批量归一化后，得到所述自注意力层的输出。其中，点云邻域自注意层从输入数据中得到点云空间中的邻域上下文特征的过程可参见图9及相关说明，这里不再赘述。In one example, please refer to Figure 8. The self-attention layer includes a point cloud neighborhood self-attention layer, a first normalization layer, a linear layer and a second normalization layer connected in sequence. The point cloud neighborhood The self-attention layer is used to obtain the neighborhood context features in the point cloud space from the input data. The output data of the point cloud neighborhood self-attention layer and the input data are added and then input to the first normalization layer for batch processing. Normalization, the results are then input to the linear layer for linear transformation, the output data of the linear layer and the input data are added and then input to the second normalization layer for batch normalization, and the self- The output of the attention layer. Among them, the process of obtaining the neighborhood context features in the point cloud space from the input data by the point cloud neighborhood self-attention layer can be seen in Figure 9 and related explanations, and will not be described again here.

本公开一些实施例提供的点云编解码方法可以实现点云几何有损压缩。通过将注意力机制和卷积神经网络相结合，构建自编码器模型和概率预测模型，注意力机制相比现有基于卷积的结构提升了模型提取特征的能力，提高了模型的压缩性能。The point cloud encoding and decoding methods provided by some embodiments of the present disclosure can realize point cloud geometric lossy compression. By combining the attention mechanism with the convolutional neural network to build an autoencoder model and a probabilistic prediction model, the attention mechanism improves the model's ability to extract features and improves the model's compression performance compared to the existing convolution-based structure.

本公开一些实施例针对点云的局部密度，提出了一种基于概率的点云裁剪方法，可以提高模型恢复点云局部密度的能力。Some embodiments of the present disclosure propose a probability-based point cloud clipping method for the local density of point clouds, which can improve the model's ability to restore the local density of point clouds.

本公开上述实施例的编解码方法可用于多个尺度点云之间，且每一个尺度的压缩互相独立，可以实现尺度可伸缩的编码，灵活性强。The encoding and decoding methods of the above embodiments of the present disclosure can be used between point clouds of multiple scales, and the compression of each scale is independent of each other. Scale-scalable encoding can be achieved with high flexibility.

本公开实施例实现点云几何有损压缩的点云编解码方法和G-PCC点云压缩方案进行了对比，对比指标为BD-rate。结果如下：The embodiment of the present disclosure compares the point cloud encoding and decoding method that implements point cloud geometric lossy compression with the G-PCC point cloud compression scheme, and the comparison index is BD-rate. The result is as follows:

相对于GPCC的BD-rate增益BD-rate gain relative to GPCC 本公开实施例Embodiments of the present disclosure Arco_Valentino_Dense_vox12Arco_Valentino_Dense_vox12 -25％-25% Egyptian_mask_vox12Egyptian_mask_vox12 -16％-16% Facade_00009_vox12Facade_00009_vox12 -61％-61% House_without_roof_00057_vox12House_without_roof_00057_vox12 -56％-56% Shiva_00035_vox12Shiva_00035_vox12 -44％-44% Staue_Klimt_vox12Staue_Klimt_vox12 -43％-43% AverageAverage -41％-41%

表中的“Arco_Valentino_Dense_vox12”是GPCC公开测试条件中提供的12bit点云数据。从上表可以看出，本公开实施例方法对比G-PCC点云压缩方案，在各个码率点都体现出一定的优势，对比MPEG G-PCC平均BD-rate提高41％。相比于MPEG G-PCC方法实现了更好的压缩性能。"Arco_Valentino_Dense_vox12" in the table is the 12bit point cloud data provided in the GPCC public test conditions. As can be seen from the above table, compared with the G-PCC point cloud compression scheme, the method of the embodiment of the present disclosure shows certain advantages at each code rate point. Compared with MPEG G-PCC, the average BD-rate is increased by 41%. Compared with MPEG G-PCC method achieves better compression performance.

本公开一实施例还提供了一种点云几何数据增强装置，如图15所示，包括处理器5以及存储有计算机程序的存储器6，其中，所述处理器5执行所述计算机程序时能够实现如本公开任一实施例所述的点云几何数据增强方法。An embodiment of the present disclosure also provides a point cloud geometric data enhancement device. As shown in Figure 15, it includes a processor 5 and a memory 6 storing a computer program. When the processor 5 executes the computer program, it can Implement the point cloud geometric data enhancement method described in any embodiment of the present disclosure.

本公开一实施例还提供了一种点云解码器，参见图15，包括处理器以及存储有计算机程序的存储器，其中，所述处理器执行所述计算机程序时能够实现如本公开任一实施例所述的点云几何数据的解码方法。An embodiment of the present disclosure also provides a point cloud decoder, see Figure 15, which includes a processor and a memory storing a computer program. When the processor executes the computer program, it can implement any implementation of the present disclosure. The decoding method of point cloud geometric data described in the example.

本公开一实施例还提供了一种点云编码器，参见图15，包括处理器以及存储有计算机程序的存储器，其中，所述处理器执行所述计算机程序时能够实现如本公开任一实施例所述的点云几何数据的编码方法。An embodiment of the present disclosure also provides a point cloud encoder, see Figure 15, which includes a processor and a memory storing a computer program, wherein when the processor executes the computer program, it can implement any implementation of the present disclosure. The encoding method of point cloud geometric data described in the example.

本公开一实施例还提供了一种点云编解码系统，其中，包括如本公开任一实施例所述的点云编码器，及如本公开任一实施例所述的点云解码器。An embodiment of the present disclosure also provides a point cloud encoding and decoding system, which includes a point cloud encoder as described in any embodiment of the present disclosure, and a point cloud decoder as described in any embodiment of the present disclosure.

本公开上述实施例的处理器可以是通用处理器，包括中央处理器(Central Processing Unit，简称CPU)、网络处理器(Network Processor，简称NP)、微处理器等等，也可以是其他常规的处理器等；所述处理器还可以是数字信号处理器(DSP)、专用集成电路(ASIC)、现成可编程门阵列(FPGA)、离散逻辑或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件；也可以是上述器件的组合。即上述实施例的处理器可以是实现本发明实施例中公开的各方法、步骤及逻辑框图的任何处理器件或器件组合。如果部分地以软件来实施本公开实施例，那么可将用于软件的指令存储在合适的非易失性计算机可读存储媒体中，且可使用一个或多个处理器在硬件中执行所述指令从而实施本公开实施例的方法。The processor in the above embodiments of the present disclosure may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), a microprocessor, etc., or it may be other conventional processors. Processor, etc.; the processor may also be a digital signal processor (DSP), application specific integrated circuit (ASIC), off-the-shelf programmable gate array (FPGA), discrete logic or other programmable logic devices, discrete gates or transistor logic devices , discrete hardware components; it can also be a combination of the above devices. That is, the processor in the above embodiments can be any processing device or device combination that implements the methods, steps and logical block diagrams disclosed in the embodiments of the present invention. If embodiments of the present disclosure are implemented in part in software, instructions for the software may be stored in a suitable non-volatile computer-readable storage medium and may be executed in hardware using one or more processors. Instructions are provided to perform the methods of embodiments of the present disclosure.

本公开上述实施例的装置和系统可基于终端或服务器等计算设备来实现。其中终端可以包括诸如手机、平板电脑、笔记本电脑、掌上电脑、个人数字助理(Personal Digital Assistant，PDA)、便捷式媒体播放器(Portable Media Player，PMP)、导航装置、可穿戴设备、智能手环、计步器等移动终端，以及诸如数字TV、台式计算机等固定终端。The devices and systems of the above embodiments of the present disclosure can be implemented based on computing devices such as terminals or servers. Terminals may include mobile phones, tablets, laptops, PDAs, Personal Digital Assistants (PDAs), Portable Media Players (PMPs), navigation devices, wearable devices, and smart bracelets. , mobile terminals such as pedometers, and fixed terminals such as digital TVs and desktop computers.

本公开一实施例还提供了一种非瞬态计算机可读存储介质，所述计算机可读存储介质存储有计算机程序，其中，所述计算机程序时被处理器执行时能够实现如本公开任一实施例所述的点云几何数据增强方法，或能够实现如本公开任一实施例所述的点云几何信息的解码方法，或能够实现如本公开任一实施例所述的点云几何信息的编码方法。An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium. The computer-readable storage medium stores a computer program, wherein when the computer program is executed by a processor, it can implement any of the aspects of the present disclosure. The point cloud geometric data enhancement method described in the embodiment may be able to implement the decoding method of point cloud geometric information as described in any embodiment of the present disclosure, or may be able to implement the point cloud geometric information as described in any embodiment of the present disclosure. encoding method.

本实施例的一示例中，m＝1或2或3，M＝8。但本公开不局限于此，例如M也可以为64，M越大，m也可以设置的较大。In an example of this embodiment, m=1 or 2 or 3, M=8. However, the present disclosure is not limited to this. For example, M may also be 64. The larger M is, the larger m may also be set.

本实施例不仅通过解码得到待裁剪点云中的被占据体素的准确数量K,而且在对概率排序时，将同一体素分解得到的M个体素分为一组，将每一组中占据概率最高的m个体素的占据概率置为1。因为未被占据的体素不进行概率预测，而分解得到的体素至少有一个是被占据的，因此本实施的方法利用了点云分解的规律，可以明显提高点云裁剪(即确定点云中的被占据体素)的准确性。This embodiment not only obtains the accurate number K of occupied voxels in the point cloud to be cropped through decoding, but also divides the M voxels obtained by decomposing the same voxel into one group when sorting the probability, and divides the occupied voxels in each group into The occupation probability of the m voxels with the highest probability is set to 1. Because unoccupied voxels are not subject to probability prediction, and at least one of the decomposed voxels is occupied, the method implemented in this implementation takes advantage of the law of point cloud decomposition and can significantly improve point cloud clipping (i.e., determine point cloud the accuracy of occupied voxels in .

在一个或多个示例性实施例中，所描述的功能可以硬件、软件、固件或其任一组合来实施。如果以软件实施，那么功能可作为一个或多个指令或代码存储在计算机可读介质上或经由计算机可读介质传输，且由基于硬件的处理单元执行。计算机可读介质可包含对应于例如数据存储介质等有形介质的计算机可读存储介质，或包含促进计算机程序例如根据通信协议从一处传送到另一处的任何介质的通信介质。以此方式，计算机可读介质通常可对应于非暂时性的有形计算机可读存储介质或例如信号或载波等通信介质。数据存储介质可为可由一个或多个计算机或者一个或多个处理器存取以检索用于实施本公开中描述的技术的指令、代码和/或数据结构的任何可用介质。计算机程序产品可包含计算机可读介质。In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media that corresponds to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, such as according to a communications protocol. In this manner, computer-readable media generally may correspond to non-transitory, tangible computer-readable storage media or communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this disclosure. A computer program product may include computer-readable media.

举例来说且并非限制，此类计算机可读存储介质可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来以指令或数据结构的形式存储所要程序代码且可由计算机存取的任何其它介质。而且，还可以将任何连接称作计算机可读介质举例来说，如果使用同轴电缆、光纤电缆、双绞线、数字订户线(DSL)或例如红外线、无线电及微波等无线技术从网站、服务器或其它远程源传输指令，则同轴电缆、光纤电缆、双纹线、DSL或例如红外线、无线电及微波等无线技术包含于介质的定义中。然而应了解，计算机可读存储介质和数据存储介质不包含连接、载波、信号或其它瞬时(瞬态)介质，而是针对非瞬时有形存储介质。如本文中所使用，磁盘及光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)、软磁盘或蓝光光盘等，其中磁盘通常以磁性方式再生数据，而光盘使用激光以光学方式再生数据。上文的组合也应包含在计算机可读介质的范围内。By way of example, and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage devices, magnetic disk storage devices or other magnetic storage devices, flash memory or may be used to store instructions or data. Any other medium that stores the desired program code in the form of a structure and that can be accessed by a computer. Furthermore, any connection is also termed a computer-readable medium if, for example, a connection is sent from a website, server, or using any of the following: coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technology such as infrared, radio, and microwave or other remote source transmits instructions, then coaxial cable, fiber optic cable, twin-wire, DSL or wireless technologies such as infrared, radio and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient (transitory) media, but are directed to non-transitory tangible storage media. As used herein, disks and optical discs include compact discs (CDs), laser discs, optical discs, digital versatile discs (DVDs), floppy disks, or Blu-ray discs. Disks usually reproduce data magnetically, while optical discs use lasers to reproduce data. Regenerate data optically. Combinations of the above should also be included within the scope of computer-readable media.

可由例如一个或多个数字信号理器(DSP)、通用微处理器、专用集成电路(ASIC)现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一个或多个处理器来执行指令。因此，如本文中所使用的术语“处理器”可指上述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外，在一些方面中，本文描述的功能性可提供于经配置以用于编码和解码的专用硬件和/或软件模块内，或并入在组合式编解码器中。并且，可将所述技术完全实施于一个或多个电路或逻辑元件中。May be performed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Execute instructions. Accordingly, the term "processor" as used herein may refer to any of the structures described above or any other structure suitable for implementing the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Furthermore, the techniques may be implemented entirely in one or more circuits or logic elements.

本公开实施例的技术方案可在广泛多种装置或设备中实施，包含无线手机、集成电路(IC)或一组IC(例如，芯片组)。本公开实施例中描各种组件、模块或单元以强调经配置以执行所描述的技术的装置的功能方面，但不一定需要通过不同硬件单元来实现。而是，如上所述，各种单元可在编解码器硬件单元中组合或由互操作硬件单元(包含如上所述的一个或多个处理器)的集合结合合适软件和/或固件来提供。Technical solutions of embodiments of the present disclosure may be implemented in a wide variety of devices or equipment, including wireless handsets, integrated circuits (ICs), or a set of ICs (eg, chipsets). Various components, modules or units are depicted in embodiments of the present disclosure to emphasize functional aspects of devices configured to perform the described techniques, but do not necessarily require implementation by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit or provided by a collection of interoperating hardware units (including one or more processors as described above) in conjunction with suitable software and/or firmware.

Claims

A point cloud geometric data enhancement method applied to point cloud decoders, including:

Parse the code stream to obtain feature data used to enhance the i+1th scale point cloud geometric data;

The feature data is subjected to M _i -1 times of voxel upsampling and feature inference through the partial decoder of the i-th decoder network, and the output feature data is spliced with the geometric data to be enhanced in the i+1-th scale point cloud to obtain the i-th i+1 scale point cloud enhanced geometric data;

Among them, i is an integer greater than or equal to 1, and M _i is an integer greater than or equal to 2.

The method of claim 1, wherein:

The feature data is obtained by performing M _i times of voxel downsampling and feature extraction on the geometric data of the i-th scale point cloud through the i-th encoder network;

The i-th encoder network and the i-th decoder network both belong to the i-th autoencoder model.

The method of claim 1, wherein:

The output feature data includes Li ₊₁ feature data, and the reconstructed geometric data of the i+1-th scale point cloud includes coordinate data of Li ₊₁ points;

The splicing is to splice the Li ₊₁ feature data and the coordinate data of Li ₊₁ points in one-to-one correspondence to obtain the coordinates and feature data of Li ₊₁ points, where Li ₊₁ is the i+th The number of points in the 1-scale point cloud.

A decoding method for point cloud geometric data, applied to point cloud decoders, including:

The code stream is parsed, and the geometric data of the N+1th scale point cloud obtained is used as the geometric data to be enhanced. The point cloud geometric data is enhanced according to the method as described in any one of claims 1 to 3 to obtain the N+1th scale. Point cloud enhanced geometric data, N is an integer greater than or equal to 1;

The remaining decoders in the Nth decoder network perform voxel upsampling and feature inference on the enhanced geometric data of the N+1 scale point cloud. The output data is then subjected to probability prediction and point cloud clipping to obtain the Nth Reconstructed geometric data from scaled point clouds.

The method of claim 4, further comprising:

When N≥2, input the reconstructed geometric data of the N-th scale point cloud into a cascade of N-1 probabilistic prediction models, perform voxel upsampling and feature inference in each of the probabilistic prediction models, and Probabilistic prediction and point cloud clipping, output reconstructed geometric data of corresponding scale point cloud;

The reconstructed geometric data of the first-scale point cloud is obtained from the output of the last probabilistic prediction model.

The method of claim 4, further comprising:

When N≥2, the reconstructed geometric data of the j-th scale point cloud or the enhanced geometric data of the j-th scale point cloud is input into the j-1th probability prediction model, and a voxel calculation is performed in the j-th probability prediction model. After sampling and feature inference, as well as probability prediction and point cloud clipping, the reconstructed geometric data of the j-1th scale point cloud is output, j=2,3,...,N;

Wherein, the enhanced geometric data of the j-th scale point cloud is to use the reconstructed geometric data of the j-th scale point cloud as the geometric data to be enhanced for the j-th scale point cloud, and is performed according to the method as described in any one of claims 1 to 3. Point cloud geometric data obtained after enhancement.

The method of claim 4, wherein:

The method further includes: parsing the code stream to obtain the number K _N of occupied voxels in the Nth scale point cloud;

The point cloud clipping is implemented in the following way: M voxels decomposed from the same voxel in the Nth scale point cloud obtained after probability prediction are divided into a group, and the m voxels with the highest probability in each group of voxels are grouped. Set the occupancy probability to 1, then sort the occupancy probabilities of all voxels in the Nth scale point cloud, and determine the K _N voxels with the highest occupancy probability as the occupied voxels of the Nth scale point cloud, 1≤m<M .

The method of any one of claims 4 to 6, wherein:

Each time the voxel upsampling and feature inference includes:

perform feature inference on the input data through at least one of a first residual network based on sparse convolution and a first self-attention network;

Perform voxel upsampling on the data output by the first residual network or the first self-attention network through a transposed sparse convolution layer with a stride of 2×2×2;

Feature inference is performed on the data output by the transposed sparse convolution layer through at least one of a second residual network based on sparse convolution and a second self-attention network.

The method of claim 8, wherein:

The first residual network and the second residual network include one or more residual layers based on sparse convolution. Each residual layer includes more than three branches. Branch one directly outputs the input data, and the other branches pass through Different numbers of sparse convolution layers perform feature inference on the input data. The outputs of the other branches are spliced and then added to the output of branch one to obtain the output of the residual layer.

The method of claim 8, wherein:

The first self-attention network and/or the second self-attention network includes one or more self-attention layers, and the processing performed by each self-attention layer includes: obtaining the point cloud space from the input data in the following manner. Neighborhood context features:

For each point in the point cloud, search for the neighbor points of the point based on the coordinate data of the point, linearly transform the distance information from the point to the neighbor point to obtain the location feature, and compare the location feature with the neighbor point. The features of the points are added to obtain the aggregated features after position encoding;

Perform a first linear transformation on the input feature data to obtain a first vector, perform matrix multiplication of the first vector and a second vector obtained by performing a second linear transformation on the aggregated features, and after activation, a point cloud is obtained. The attention weight of each point relative to its neighbor points;

Data containing the neighborhood context features are obtained by multiplying the attention weight and a third vector, and the third vector is obtained by performing a third linear transformation on the aggregated features.

The method of claim 10, wherein:

The self-attention layer includes a point cloud neighborhood self-attention layer, a first normalization layer, a linear layer and a second normalization layer connected in sequence. The point cloud neighborhood self-attention layer is used to extract data from the input data. The neighborhood context features in the point cloud space are obtained. The output data and input data of the point cloud neighborhood self-attention layer are added together and then input to the first normalization layer for batch normalization, and the results are then input to The linear layer performs linear transformation. The output data of the linear layer and the input data are added and then input to the second normalization layer for batch normalization to obtain the output of the self-attention layer.

The method of claim 8, wherein:

Each time the voxel upsampling and feature inference are implemented through a decoder based on a neural network, the decoder sequentially includes: a first sparse convolution network, a first self-attention network, a first residual network, a step size It is a 2×2×2 transposed sparse convolution layer, a second residual network, a second self-attention network, and a second sparse convolution network; the first sparse convolution network and the first self-attention An activation function is provided between the force network and between the first residual network and the transposed sparse convolution layer. The first sparse convolution network and the second sparse convolution network include one or more sparse convolutional layers. Convolutional layer.

The method of claim 4, wherein:

The probability prediction is implemented through multiple sparse convolution layers and sigmod function.

An encoding method for point cloud geometric data, applied to point cloud encoders, including:

Perform N times of voxel downsampling on the geometric data of the first-scale point cloud to obtain the geometric data of the second-scale point cloud to the N+1th scale point cloud, N≥1;

Input the geometric data of the Nth scale point cloud into the Nth encoder network of the Nth autoencoder model for M _N times of voxel downsampling and feature extraction, and output the feature data used to enhance the N+1th scale point cloud geometric data. , M _N ≥ 2;

Entropy encoding is performed on the geometric data of the N+1th scale point cloud and the feature data output by the Nth encoder network.

The method of claim 14, wherein:

Before performing entropy encoding on the feature data, the method further includes: quantizing the feature data.

The method of claim 14, wherein:

The method further includes entropy encoding the number K _N of occupied voxels in the Nth scale point cloud.

The method of claim 14, wherein the method further includes:

When N≥2, the geometric data of the j-th scale point cloud is input into the j-th encoder network of the j-th autoencoder model for M _j voxel downsampling and feature extraction, and the output is used to enhance the j+1-th scale point. Cloud geometry data point cloud feature data;

The feature data output by the jth encoder network is quantized and entropy encoded, M _j ≥ 2, and the value of j is any one or more of {1, 2,...,N-1}.

The method of claim 14 or 17, wherein:

Each time the voxel downsampling and feature extraction includes:

Perform feature extraction on the input data through at least one of a first residual network and a first self-attention network based on sparse convolution;

Perform voxel downsampling on the data output by the first residual network or the first self-attention network through a sparse convolution layer with a stride of 2×2×2;

Feature extraction is performed on the data output by the sparse convolution layer through at least one of a second residual network and a second self-attention network based on sparse convolution.

The method of claim 18, wherein:

The method of claim 20, wherein:

The method of claim 18, wherein:

Each time the voxel downsampling and feature extraction are implemented through an encoder based on a neural network, the encoder includes in turn: a first sparse convolutional network, a first self-attention network, a first residual network, a step size is a 2×2×2 sparse convolution layer, a second residual network, a second self-attention network, and a second sparse convolution network; in the first sparse convolution network and the first self-attention An activation function is provided between the networks and between the first residual network and the sparse convolution layer. The first sparse convolution network and the second sparse convolution network include one or more sparse convolution layers. .

A point cloud geometric code stream, wherein the geometric code stream is obtained according to the encoding method of point cloud geometric data as described in any one of claims 14 to 22, including the geometric data of the N+1th scale point cloud and the feature data output by the Nth encoder network.

A device for enhancing point cloud geometric data, including a processor and a memory storing a computer program, wherein when the processor executes the computer program, it can realize the point cloud geometric data as claimed in any one of claims 1 to 3 Enhancement method.

A point cloud decoder, including a processor and a memory storing a computer program, wherein when the processor executes the computer program, it can realize the decoding of the point cloud geometric data according to any one of claims 4 to 13 method.

A point cloud encoder, including a processor and a memory storing a computer program, wherein when the processor executes the computer program, it can implement the encoding of point cloud geometric data as claimed in any one of claims 14 to 22 method.

A point cloud encoding and decoding system, which includes the point cloud encoder as claimed in claim 26 and the point cloud decoder as claimed in claim 25.

A non-transitory computer-readable storage medium, the computer-readable storage medium stores a computer program, wherein when the computer program is executed by a processor, it can achieve the points described in any one of claims 1 to 3 Cloud geometric data enhancement method, or can realize the decoding method of point cloud geometric data as described in any one of claims 4 to 13, or can realize the encoding of point cloud geometric data as described in any one of claims 14 to 22 method.

A point cloud cropping method applied to point cloud decoders, including:

Analyze the code stream to obtain the number K of occupied voxels in the point cloud to be cropped;

Determine the occupancy probability of the voxels in the point cloud to be clipped;

Divide the M voxels decomposed from the same voxel in the point cloud to be clipped into one group, set the occupation probability of the m voxels with the highest occupancy probability in each group to 1, and then classify the point cloud to be clipped The occupancy probabilities of all voxels in are sorted, and the K voxels with the highest occupancy probabilities are determined as occupied voxels in the point cloud to be clipped, 1≤m<M<K.

The point cloud clipping method according to claim 29, wherein m=1 or 2 or 3, and M=8.