CN117036873A

CN117036873A - Multisource three-dimensional data fusion system

Info

Publication number: CN117036873A
Application number: CN202211091029.6A
Authority: CN
Inventors: 王雨禾; 陈卓荣; 吴劲; 曾宪光; 段志奎
Original assignee: Guangzhou Bochuang Information Technology Co ltd
Current assignee: Guangzhou Bochuang Information Technology Co ltd
Priority date: 2022-09-07
Filing date: 2022-09-07
Publication date: 2023-11-10

Abstract

The invention relates to a multi-source three-dimensional data fusion method, which includes a discrete grid input module, a point cloud input module, and a data fusion module; the discrete grid input module and the point cloud input module are parallel, and the outputs of the two modules are jointly input to the data fusion module. ;The output of the data fusion module is the point cloud data and discrete grid data after data fusion. The entire system realizes the alignment and fusion of point cloud data and grid data through the calculation cooperation between various modules, so that downstream tasks can obtain Higher quality fused data. Through the fusion, alignment and optimization of multi-source data, the present invention achieves accurate description of three-dimensional entities, improves model calculation accuracy and saves computing resources.

Description

A multi-source three-dimensional data fusion system

技术领域Technical field

本发明涉及大数据技术领域，尤其涉及一种多源三维数据融合系统。The invention relates to the field of big data technology, and in particular to a multi-source three-dimensional data fusion system.

背景技术Background technique

随着分析对象结构日益复杂、规模日益扩大，CAE在模拟仿真上的作用日趋重要，单一数据源往往只能片面的描述建模目标某些特征，从源头上使三维建模与分析形成了瓶颈，然而目前产业界对CAE仿真的要求日渐增加，仿真在整个设计流程中地位越来越重要，单一数据源的仿真在精度或功能上难以满足要求。As the structure of the analysis object becomes increasingly complex and the scale expands, the role of CAE in simulation becomes increasingly important. A single data source can often only describe certain characteristics of the modeling target in a one-sided manner, which creates a bottleneck in 3D modeling and analysis from the source. , however, the industry's current requirements for CAE simulation are increasing day by day, and simulation plays an increasingly important role in the entire design process. Simulation with a single data source is difficult to meet the requirements in terms of accuracy or functionality.

数据融合的概念是由美国学者在1970年左右提出，它是在“数据融合”的背景下诞生的，常被用于军事应用之中。最初是指对多传感器、多时相、多数据源的图像信息进行分析、提取和综合；现在进一步发展为通过数据的有机融合手段，将不同特性的数据源结合起来，取长补短，达到更全面更客观地反映目标物特性的目的。研究人员在对数据融合及其相关算法进行研究时发现：通过对融合算法进行改进，不仅可以使得信息充分融合，得到更加有价值的三维模型，还可以通过并行计算针对同一模型进行不同功能的仿真，提高仿真分析效率。The concept of data fusion was proposed by American scholars around 1970. It was born under the background of "data fusion" and is often used in military applications. Initially, it refers to the analysis, extraction and synthesis of image information from multiple sensors, multiple phases, and multiple data sources; now it has further developed to combine data sources with different characteristics through the organic fusion of data, learn from each other's strengths, and achieve a more comprehensive and objective reflect the characteristics of the target object. When researchers studied data fusion and related algorithms, they found that by improving the fusion algorithm, not only can information be fully fused to obtain a more valuable three-dimensional model, but also the same model can be simulated with different functions through parallel computing. , improve the efficiency of simulation analysis.

现有技术中，点云是三维建模中常用的数据格式，通过三维激光扫描仪采集到的数据多为点云数据；点云数据具体数据包括位置，法向量，颜色等重要信息；其中距离信息与三维坐标信息是主要信息，目标的反射率、反射强度、点到扫描仪的距离、水平角、垂直角等信息都是激光点云的衍生信息，在处理过程中常被忽视。虽然点云数据中包含大量信息，但也包含大量噪声与误差。需要经过特征匹配与去噪等多重操作才能有效使用。离散网格也是一种常见的三维几何数据，网格由几何拓扑学和力学融合而成衍生而来，是一门复杂的交叉学科，网格的思想源于离散化求解的思想，通过把需要连续求解的区域离散化，分解成若干连续的子区域，分别求解各个子区域的物理变量，各个子区域相邻连续与协调，从而达到整个变量场的协调与连续。但网格的划分对工程师的几何拓扑规划能力是一个严重的挑战，往往进行区域分块会花掉工程师整个分析工作过程近80％的时间。同时，由于结构化要求，导致有时候网格质量难以控制。同时网格的质量又直接影响了最后的仿真精度。In the existing technology, point cloud is a commonly used data format in three-dimensional modeling. Most of the data collected by a three-dimensional laser scanner is point cloud data. The specific data of point cloud data includes important information such as position, normal vector, and color. Among them, distance Information and three-dimensional coordinate information are the main information. Information such as the target's reflectivity, reflection intensity, distance from the point to the scanner, horizontal angle, and vertical angle are all derived information from the laser point cloud and are often ignored during the processing. Although point cloud data contains a lot of information, it also contains a lot of noise and errors. It requires multiple operations such as feature matching and denoising to be used effectively. Discrete grid is also a common three-dimensional geometric data. The grid is derived from the fusion of geometric topology and mechanics. It is a complex interdisciplinary subject. The idea of grid originates from the idea of discretization solution. By combining the needs The continuous solution area is discretized and decomposed into several continuous sub-areas, and the physical variables of each sub-area are solved respectively. Each sub-area is adjacent to each other, continuous and coordinated, thereby achieving coordination and continuity of the entire variable field. However, mesh division is a serious challenge to engineers' geometric topology planning capabilities. Often, area division will take nearly 80% of engineers' entire analysis work process. At the same time, due to structural requirements, the quality of the mesh is sometimes difficult to control. At the same time, the quality of the grid directly affects the final simulation accuracy.

发明内容Contents of the invention

本发明的目的是为了至少解决现有技术的不足之一，提供一种多源三维数据融合系统。The purpose of the present invention is to provide a multi-source three-dimensional data fusion system to at least solve one of the shortcomings of the existing technology.

为了实现上述目的，本发明采用以下的技术方案：In order to achieve the above objects, the present invention adopts the following technical solutions:

具体的，提出一种多源三维数据融合系统，包括：Specifically, a multi-source 3D data fusion system is proposed, including:

点云输入模块，用于对点云数据进行特征提取，得到点云数据的特征向量V，基于所述特征向量V得到生成点云，基于所述生成点云输出二值分类作为点云特征P；The point cloud input module is used to extract features from the point cloud data, obtain the feature vector V of the point cloud data, obtain a generated point cloud based on the feature vector V, and output a binary classification based on the generated point cloud as the point cloud feature P ;

网格输入模块，用于将输入的网格数据进行重网格化，构造细分结构，得到一般网格的多分辨率表示，进行之字形的扩张卷积输出网格特征M；The grid input module is used to re-grid the input grid data, construct a subdivision structure, obtain a multi-resolution representation of the general grid, and perform zigzag expansion convolution to output grid features M;

数据融合模块，输入端并行连接所述点云输入模块的输出端以及所述网格输入模块的输出端，根据所述点云特征P以及网格特征M进行编码，之后进行归一化融合；Data fusion module, the input end is connected in parallel to the output end of the point cloud input module and the output end of the grid input module, encoding is performed according to the point cloud feature P and the grid feature M, and then normalized fusion is performed;

训练模块，用于对所述系统进行训练，采用的策略为，在训练时采用点云数据与网格数据轮流输入的形式，即某次单独输入点云数据，但同时对数据融合模块输出的点云与网格进行损失计算，更新数据融合模块参数；然后再单独使用网格数据作为输入，同时对数据融合模块输出的点云和网格进行损失计算，更新数据融合模块参数，多次重复以上步骤最终实现点云与网格数据的对齐与补齐。The training module is used to train the system. The strategy adopted is that point cloud data and grid data are input in turn during training, that is, the point cloud data is input separately at a certain time, but at the same time the point cloud data output by the data fusion module is input. Perform loss calculations on point clouds and grids, update the data fusion module parameters; then use the grid data alone as input, and simultaneously perform loss calculations on the point clouds and grids output by the data fusion module, update the data fusion module parameters, and repeat multiple times. The above steps finally achieve the alignment and completion of point cloud and grid data.

进一步，具体的，所述点云输入模块包括，Further, specifically, the point cloud input module includes,

多尺度特征提取器E，包括特征嵌入层FEL、注意力层以及多层串联的多层感知机MLP，其中特征嵌入层FEL的作用过程为，The multi-scale feature extractor E includes the feature embedding layer FEL, the attention layer and the multi-layer perceptron MLP in series. The function process of the feature embedding layer FEL is,

点云数据[B,Pnum,3]通过两次多层感知机MLP将逐点特征从3维升至64维,即[B,Pnum,64]，随后网络依次进行采样和K最近邻聚合操作,采样点数和邻域点数的设置与该尺度的点云数据的点数相适应，同时将邻域聚合之后的点云特征与邻域中心点特征的差值作为初始特征,随后通过多层感知机并与邻域聚合之前的点云特征进行串联操作,最终经过最大值池化后,输出聚合特征VT＝[B,pnum/4,128]，具体计算如公式所示，The point cloud data [B, Pnum, 3] passes through two multi-layer perceptron MLPs to increase the point-by-point features from 3 dimensions to 64 dimensions, that is, [B, Pnum, 64], and then the network sequentially performs sampling and K nearest neighbor aggregation operations. , The setting of the number of sampling points and the number of neighborhood points is adapted to the number of points of the point cloud data of this scale. At the same time, the difference between the point cloud features after neighborhood aggregation and the neighborhood center point features is used as the initial feature, and then through the multi-layer perceptron And perform a series operation with the point cloud features before neighborhood aggregation. Finally, after maximum pooling, the aggregate feature VT=[B,pnum/4,128] is output. The specific calculation is as shown in the formula,

式中,V_P表示输入点云的特征，表示经过MLP升维的点云特征,V_E表示FEL输出的点云聚合特征,SG()表示采样聚合操作,RP(x,k)表示将x复制k倍的操作,C表示串联操作,MP表示最大值池化操作，In the formula, V _P represents the characteristics of the input point cloud, Represents the point cloud features that have been dimensionally increased by MLP, V _E represents the point cloud aggregation features output by FEL, SG() represents the sampling aggregation operation, RP(x,k) represents the operation of copying x k times, C represents the concatenation operation, MP Represents the maximum pooling operation,

注意力层的作用过程为，The function process of the attention layer is,

优化了偏移注意力方法,首先对输入VE进行线性变换,求取Que-ry(Q)、Key(K)和Value(V)矩阵,即The offset attention method is optimized. First, linear transformation is performed on the input VE to obtain the Que-ry (Q), Key (K) and Value (V) matrices, that is

式中，W_q，W_k，W_v表示可学习的线性变换权重，d_a是Que-ry(Q)和Key(K)矩阵的维度，d_e是Value(V)矩阵的维度，然后使用Que-ry(Q矩阵和Key(K)矩阵通过矩阵点积计算注意力权重并将其中元素/>进行softmax和L1范数归一化,中间特征V_sa为使用相应注意力权重的Value向量的加权和，计算式如下，In the formula, W _q , W _k , W _v represent the learnable linear transformation weight, d _a is the dimension of Que-ry (Q) and Key (K) matrix, d _e is the dimension of Value (V) matrix, and then use Que-ry (Q matrix and Key (K) matrix calculate attention weight through matrix dot product And add the elements/> Perform softmax and L1 norm normalization, and the intermediate feature V _sa is the weighted sum of the Value vector using the corresponding attention weight. The calculation formula is as follows,

V_sa＝A·V；V _sa =A·V;

金字塔生成器G，用于以最终的特征向量V作为输入,输出为3种尺度与之前对应的生成点云；The pyramid generator G is used to take the final feature vector V as input, and the output is the generated point cloud corresponding to the previous one in 3 scales;

注意力鉴别器D，用于提高点云补全的性能；注意力鉴别器D同样是自编码器的结构，编码器端采用了由特征嵌入层和注意力层组成的单尺度特征提取器，将生成点云P1和其对应的真实点云送入鉴别器D，经特征提取器计算后输出640维度的特征向量，随后通过连续的全连接层，最终为假或者真的二值输出。The attention discriminator D is used to improve the performance of point cloud completion; the attention discriminator D is also an autoencoder structure. The encoder side uses a single-scale feature extractor composed of a feature embedding layer and an attention layer. The generated point cloud P1 and its corresponding real point cloud are sent to the discriminator D. After calculation by the feature extractor, a 640-dimensional feature vector is output, and then passed through continuous fully connected layers, and finally a binary output of false or true.

进一步，具体的，所述网格输入模块的具体作用过程包括，Further, specifically, the specific action process of the grid input module includes:

对给定的三角形网格，学习三维形状的全局表示，或每个面上的特征向量，以获得局部几何特征，对For a given triangular mesh, learn a global representation of the three-dimensional shape, or feature vectors on each face, to obtain local geometric features, for

进一步，具体的，所述数据融合模块包括，Further, specifically, the data fusion module includes:

点云编码器块P_enc和网格编码器块M_enc，用于对点云输入模块输入的点云特征P以及网格输入模块输入的网格特征M进行编码，公式如下，The point cloud encoder block P _enc and the mesh encoder block M _enc are used to encode the point cloud features P input by the point cloud input module and the mesh features M input by the mesh input module. The formula is as follows,

P_enc(P)＝φ_P and M_enc(M)＝φ_M P _enc (P)＝φ _P and M _enc (M)＝φ _M

式中φ_P，编码器块分别由全连接层f₁ ^m、f₂ ^m组成，其中m∈{P，M}，和/>f₁ ^m、f₂ ^m之后分别是批量归一化、ReLU以及dropout层；In the formula φ _P , The encoder block consists of fully connected layers f ₁ ^m and f ₂ ^m respectively, where m∈{P, M}, and/> f ₁ ^m and f ₂ ^m are followed by batch normalization, ReLU and dropout layers respectively;

交叉注意力块，用于分享点云表征φ_P和网格的表征φ_M之间的信息，包括一个多头自注意力层，然后是一个全连接层，对这两个层使用残差链接，然后进行归一化，The cross-attention block, used to share information between the point cloud representation φ _P and the mesh’s representation φ _M , consists of a multi-head self-attention layer, followed by a fully connected layer, using residual links for both layers, Then normalize,

交叉注意区块P_proj和随后的输出模块M_proj的残差链接计算方法如公式所示：The residual link calculation method of the cross attention block P _proj and the subsequent output module M _proj is as shown in the formula:

其中输出模块由两个线性层/>和/>的序列组成，，其中/> 之后分别是批量归一化，ReLU激活函数以及dropout层。in The output module consists of two linear layers/> and/> The sequence consists of , where/> After that are batch normalization, ReLU activation function and dropout layer.

本发明的有益效果为：The beneficial effects of the present invention are:

本发明提出一种多源三维数据融合方法，包括离散网格输入模块、点云输入模块、数据融合模块；离散网格输入模块与点云输入模块并列，两模块的输出共同输入到数据融合模块；数据融合模块的输出即为数据融合后的点云数据与离散网格数据，整个系统通过各个模块之间的计算配合实现了点云数据与网格数据的对齐与融合，使得下游任务可以获得质量更高的融合数据。The present invention proposes a multi-source three-dimensional data fusion method, which includes a discrete grid input module, a point cloud input module, and a data fusion module; the discrete grid input module and the point cloud input module are parallel, and the outputs of the two modules are jointly input to the data fusion module. ;The output of the data fusion module is the point cloud data and discrete grid data after data fusion. The entire system realizes the alignment and fusion of point cloud data and grid data through the calculation cooperation between various modules, so that downstream tasks can obtain Higher quality fused data.

附图说明Description of the drawings

通过对结合附图所示出的实施方式进行详细说明，本公开的上述以及其他特征将更加明显，本公开附图中相同的参考标号表示相同或相似的元素，显而易见地，下面描述中的附图仅仅是本公开的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图，在附图中：The above and other features of the present disclosure will be more apparent from the detailed description of the embodiments illustrated in the accompanying drawings, in which the same reference numerals designate the same or similar elements. It will be apparent that the appended drawings in the following description The drawings are only some embodiments of the present disclosure. For those of ordinary skill in the art, other drawings can also be obtained based on these drawings without exerting creative efforts. In the drawings:

图1所示为本发明一种多源三维数据融合系统的结构示意图；Figure 1 shows a schematic structural diagram of a multi-source three-dimensional data fusion system of the present invention;

图2所示为本发明一种多源三维数据融合系统的特征嵌入层FEL的结构图；Figure 2 shows the structure diagram of the feature embedding layer FEL of a multi-source three-dimensional data fusion system of the present invention;

图3所示为本发明一种多源三维数据融合系统的注意力层的结构图；Figure 3 shows the structure diagram of the attention layer of a multi-source three-dimensional data fusion system of the present invention;

图4所示为本发明一种多源三维数据融合系统的网格近似化融合的示意图；Figure 4 shows a schematic diagram of grid approximation fusion of a multi-source three-dimensional data fusion system of the present invention;

图5所示为本发明一种多源三维数据融合系统的网格卷积的示意图；Figure 5 shows a schematic diagram of grid convolution of a multi-source three-dimensional data fusion system of the present invention;

图6所示为本发明一种多源三维数据融合系统的交叉注意力块的结构图。Figure 6 shows the structure diagram of the cross-attention block of a multi-source three-dimensional data fusion system of the present invention.

具体实施方式Detailed ways

以下将结合实施例和附图对本发明的构思、具体结构及产生的技术效果进行清楚、完整的描述，以充分地理解本发明的目的、方案和效果。需要说明的是，在不冲突的情况下，本申请中的实施例及实施例中的特征可以相互组合。附图中各处使用的相同的附图标记指示相同或相似的部分。The following will give a clear and complete description of the concept, specific structure and technical effects of the present invention in conjunction with the embodiments and drawings, so as to fully understand the purpose, solutions and effects of the present invention. It should be noted that, as long as there is no conflict, the embodiments and features in the embodiments of this application can be combined with each other. The same reference numbers used throughout the drawings indicate the same or similar parts.

点云是三维建模中常用的数据格式，通过三维激光扫描仪采集到的数据多为点云数据；点云数据具体数据包括位置，法向量，颜色等重要信息；其中距离信息与三维坐标信息是主要信息，目标的反射率、反射强度、点到扫描仪的距离、水平角、垂直角等信息都是激光点云的衍生信息，在处理过程中常被忽视。虽然点云数据中包含大量信息，但也包含大量噪声与误差。需要经过特征匹配与去噪等多重操作才能有效使用。离散网格也是一种常见的三维几何数据，网格由几何拓扑学和力学融合而成衍生而来，是一门复杂的交叉学科，网格的思想源于离散化求解的思想，通过把需要连续求解的区域离散化，分解成若干连续的子区域，分别求解各个子区域的物理变量，各个子区域相邻连续与协调，从而达到整个变量场的协调与连续。但网格的划分对工程师的几何拓扑规划能力是一个严重的挑战，往往进行区域分块会花掉工程师整个分析工作过程近80％的时间。同时，由于结构化要求，导致有时候网格质量难以控制。同时网格的质量又直接影响了最后的仿真精度。基于以上问题提出了点云与网格融合的方法，通过对多源数据融合、对齐、优化，实现对三维实体的精准描述，对模型计算精度的提高，对计算资源的节省。Point cloud is a commonly used data format in 3D modeling. Most of the data collected by 3D laser scanners is point cloud data. The specific data of point cloud data includes important information such as position, normal vector, color, etc. Among them, distance information and 3D coordinate information is the main information. Information such as the target's reflectivity, reflection intensity, distance from the point to the scanner, horizontal angle, and vertical angle are all derived information from the laser point cloud and are often ignored during the processing. Although point cloud data contains a lot of information, it also contains a lot of noise and errors. It requires multiple operations such as feature matching and denoising to be used effectively. Discrete grid is also a common three-dimensional geometric data. The grid is derived from the fusion of geometric topology and mechanics. It is a complex interdisciplinary subject. The idea of grid originates from the idea of discretization solution. By combining the needs The continuous solution area is discretized and decomposed into several continuous sub-areas, and the physical variables of each sub-area are solved respectively. Each sub-area is adjacent to each other, continuous and coordinated, thereby achieving coordination and continuity of the entire variable field. However, mesh division is a serious challenge to engineers' geometric topology planning capabilities. Often, area division will take nearly 80% of engineers' entire analysis work process. At the same time, due to structural requirements, the quality of the mesh is sometimes difficult to control. At the same time, the quality of the grid directly affects the final simulation accuracy. Based on the above problems, a point cloud and grid fusion method is proposed. Through the fusion, alignment, and optimization of multi-source data, it can achieve accurate description of three-dimensional entities, improve model calculation accuracy, and save computing resources.

参照图1，实施例1，本发明提出一种多源三维数据融合方法，包括：Referring to Figure 1, Embodiment 1, the present invention proposes a multi-source three-dimensional data fusion method, including:

作为本发明的优选实施方式，As a preferred embodiment of the present invention,

具体的，所述点云输入模块是基于Tranformer的多尺度点云补全网络模型，网络整体采用编码器—解码器结构,由多尺度特征提取器E、金字塔点生成器G和注意力鉴别器D组成。模型的输入为3个不同尺度的原始点云,然后分别送入到对应尺度的特征提取器E,生成该原始点云所对应的特征向量V。随后将特征向量V作为金字塔点生成器G的输入,输出3个尺度的生成点云。损失函数主要由生成损失和对抗损失两部分组成,生成损失由3个尺度的生成点云和与之对应的真实点云之间计算CD值(Chamfer distance)得到,对抗损失则由注意力鉴别器D计算得到。注意力鉴别器D借鉴了GAN的思想,其输入为第1尺度生成点云,输出为真实或虚假的二分类值。引入注意力鉴别器D是为了和金字塔点生成器G联合训练,从而间接地提高点生成器的点云补全性能。Specifically, the point cloud input module is a multi-scale point cloud completion network model based on Transformer. The entire network adopts an encoder-decoder structure and consists of a multi-scale feature extractor E, a pyramid point generator G and an attention discriminator. D composition. The input of the model is three original point clouds of different scales, which are then sent to the feature extractor E of the corresponding scale to generate the feature vector V corresponding to the original point cloud. Then the feature vector V is used as the input of the pyramid point generator G to output the generated point cloud in 3 scales. The loss function mainly consists of two parts: generation loss and adversarial loss. The generation loss is obtained by calculating the CD value (Chamfer distance) between the generated point clouds of 3 scales and the corresponding real point clouds. The adversarial loss is obtained by the attention discriminator. D is calculated. The attention discriminator D draws on the idea of GAN. Its input is the first scale generated point cloud, and the output is a binary classification value of true or false. The attention discriminator D is introduced to jointly train with the pyramid point generator G, thereby indirectly improving the point cloud completion performance of the point generator.

结合图2，多尺度特征提取器E，包括特征嵌入层FEL、注意力层以及多层串联的多层感知机MLP，其中特征嵌入层FEL的作用过程为，Combined with Figure 2, the multi-scale feature extractor E includes the feature embedding layer FEL, the attention layer and the multi-layer perceptron MLP in series. The function process of the feature embedding layer FEL is,

点云数据[B,Pnum,3]通过两次多层感知机MLP将逐点特征从3维升至64维,即[B,Pnum,64]，随后网络依次进行采样和K最近邻聚合操作,采样点数和邻域点数的设置与该尺度的点云数据的点数相适应，目的是使得网络更加具备提取物体局部特征信息的能力。同时为了避免点的绝对坐标信息给邻域聚合操作带来不利影响，将邻域聚合之后的点云特征与邻域中心点特征的差值作为初始特征,随后通过多层感知机并与邻域聚合之前的点云特征进行串联操作,最终经过最大值池化后，输出聚合特征VT＝[B,pnum/4,128]，具体计算如公式所示，The point cloud data [B, Pnum, 3] passes through two multi-layer perceptron MLPs to increase the point-by-point features from 3 dimensions to 64 dimensions, that is, [B, Pnum, 64], and then the network sequentially performs sampling and K nearest neighbor aggregation operations. , The setting of the number of sampling points and the number of neighborhood points is adapted to the number of points of the point cloud data of this scale. The purpose is to make the network more capable of extracting local feature information of the object. At the same time, in order to avoid the negative impact of the absolute coordinate information of points on the neighborhood aggregation operation, the difference between the point cloud features after neighborhood aggregation and the neighborhood center point features is used as the initial feature, and then combined with the neighborhood through the multi-layer perceptron. The point cloud features before aggregation are operated in series, and finally after maximum pooling, the aggregated feature VT=[B,pnum/4,128] is output. The specific calculation is as shown in the formula,

式中,V_P表示输入点云的特征,V_P表示经过MLP升维的点云特征,V_E表示FEL输出的点云聚合特征,SG()表示采样聚合操作,RP(x,k)表示将x复制k倍的操作,C表示串联操作,MP表示最大值池化操作，In the formula, V _P represents the characteristics of the input point cloud, V _P represents the point cloud features that have been dimensionally increased by MLP, V _E represents the point cloud aggregation features output by FEL, SG() represents the sampling aggregation operation, and RP(x,k) represents The operation of copying x k times, C represents the concatenation operation, MP represents the maximum pooling operation,

结合图3，注意力层的作用过程为，Combined with Figure 3, the function process of the attention layer is,

V_sa＝A·V；V _sa =A·V;

金字塔生成器G，用于以最终的特征向量V作为输入,输出为3种尺度与之前对应的生成点云，The pyramid generator G is used to take the final feature vector V as input, and the output is the generated point cloud corresponding to the previous three scales.

金字塔点生成器G以最终的特征向量V作为输入,输出为3种尺度与之前对应的生成点云,其网络架构主要由全连接层和Reshape操作组成。在常用算法种全连接解码器善于预测点云的整体几何结构,但因为只是用最后一层特征来进行补全,很难提取到点云的局部结构特征。因此,借鉴特征金字塔网络的金字塔逐层特征提取的思想,按照从粗略到精细的过程,逐步完成点云补全操作；The pyramid point generator G takes the final feature vector V as input, and the output is the generated point cloud corresponding to the previous three scales. Its network architecture mainly consists of a fully connected layer and a Reshape operation. In commonly used algorithms, the fully connected decoder is good at predicting the overall geometric structure of the point cloud, but because it only uses the last layer of features for completion, it is difficult to extract the local structural features of the point cloud. Therefore, we draw on the idea of layer-by-layer feature extraction from the feature pyramid network, and gradually complete the point cloud completion operation according to the process from rough to fine;

注意力鉴别器D，用于提高点云补全的性能；点云补全任务隶属于生成式任务，借鉴生成对抗网络中生成网络和鉴别网络相互促进训练的思想，本发明也引入了注意力鉴别器D来提高点云补全的性能，注意力鉴别器D同样是自编码器的结构，编码器端采用了由特征嵌入层和注意力层组成的单尺度特征提取器，将生成点云P1和其对应的真实点云送入鉴别器D，经特征提取器计算后输出640维度的特征向量，随后通过连续的全连接层，最终为假或者真的二值输出。Attention discriminator D is used to improve the performance of point cloud completion; the point cloud completion task belongs to the generative task. Drawing on the idea of mutually promoting training of the generative network and the discriminant network in the generative adversarial network, the present invention also introduces attention The discriminator D is used to improve the performance of point cloud completion. The attention discriminator D is also the structure of an autoencoder. The encoder uses a single-scale feature extractor composed of a feature embedding layer and an attention layer to generate point clouds. P1 and its corresponding real point cloud are sent to the discriminator D. After calculation by the feature extractor, a 640-dimensional feature vector is output, and then passes through continuous fully connected layers, and finally is a binary output of false or true.

结合图4以及图5，作为本发明的优选实施方式，具体的，所述网格输入模块的具体作用过程包括，In conjunction with Figure 4 and Figure 5, as a preferred embodiment of the present invention, specifically, the specific action process of the grid input module includes:

通过对三维网格卷积运算，不同于不同二维卷积运算，只在同一平面进行卷积，三维网格卷积对整个三维网格数据，进行三维卷积运算，得到精细化即更高分辨率的三维网格，通过正向或反向运算得到类似2维卷积中下采样和上采样操作类似的效果，使得网格数据也可以套用成熟的神经网络结构进行运算。Through the three-dimensional grid convolution operation, unlike different two-dimensional convolution operations, the convolution is only performed on the same plane. The three-dimensional grid convolution performs a three-dimensional convolution operation on the entire three-dimensional grid data, resulting in refinement and higher The high-resolution three-dimensional grid can achieve similar effects to the down-sampling and up-sampling operations in 2-dimensional convolution through forward or reverse operations, so that grid data can also be operated using mature neural network structures.

以往的网格深度学习方法将特征存储在点或者边上，但是点的度数不固定，边的卷积不灵活。通过在面片上的网格卷积方法，充分利用了每个面片与三个面片相邻的规则性质。通过扩展这一性质，依据面片之间的距离，设计了多种不同的卷积模式。Previous grid deep learning methods store features at points or edges, but the degrees of points are not fixed and the convolution of edges is inflexible. Through the grid convolution method on the patches, the regular nature of each patch being adjacent to three patches is fully utilized. By extending this property, a variety of different convolution modes are designed based on the distance between patches.

从图5可以看到，这种面片上的网格卷积方法，直观且灵活，有规律，很类似于图像的情形。图中其中a)为三角面片卷积，b)为卷积中可能出现的重复访问，c)为更复杂的卷积示例。As can be seen from Figure 5, this grid convolution method on the patch is intuitive, flexible, and regular, very similar to the image situation. In the figure, a) is the triangular patch convolution, b) is the repeated access that may occur in the convolution, and c) is a more complex convolution example.

由于三维数据格式中的面片顺序不固定，在计算卷积结果时，通过取邻域均值、差分均值等方式，使得计算结果与面片顺序无关，满足排列不变性。下式给出了卷积的定义及其每项的含义。Since the order of patches in the three-dimensional data format is not fixed, when calculating the convolution results, the calculation results are independent of the patch order by taking the neighborhood mean, difference mean, etc., and satisfy the arrangement invariance. The following formula gives the definition of convolution and the meaning of each term.

w₀e_i代表中心特征，代表邻域特征的和，/>代表邻域差分的和，/>代表中心与领域的差的和。w ₀ e _i represents the central feature, Represents the sum of neighborhood features,/> Represents the sum of neighborhood differences,/> Represents the sum of the differences between center and field.

结合图6，作为本发明的优选实施方式，具体的，所述数据融合模块包括，With reference to Figure 6, as a preferred embodiment of the present invention, specifically, the data fusion module includes:

P_enc(P)＝φ_P and M_enc(M)＝φ_M P _enc (P)＝φ _P and M _enc (M)＝φ _M

交叉注意力块，用于分享点云和网格的表征φ_P和φ_M之间的信息，包括一个多头自注意力层，然后是一个全连接层，对这两个层使用残差链接，然后进行归一化，Cross-attention block, used to share information between point cloud and mesh representations φ _P and φ _M , consists of a multi-head self-attention layer, followed by a fully connected layer, using residual links for both layers, Then normalize,

点云编码器和网格编码器分支的前馈块分别由一个全连接层 m∈{P，M}，然后是GELU激活函数，dropout层，另一个全连接层/> m∈{P，M}，最后连接是dropout层。交叉注意块的输出是/> The feedforward blocks of the point cloud encoder and mesh encoder branches each consist of a fully connected layer m∈{P,M}, then GELU activation function, dropout layer, and another fully connected layer/> m∈{P,M}, the last connection is the dropout layer. The output of the cross attention block is/>

交叉注意区块和随后的输出模块P_proj和M_proj的残差链接计算方法如公式所示：The residual link calculation method of the cross attention block and subsequent output modules P _proj and M _proj is as shown in the formula:

其中输出模块由两个线性层/>和/>的序列组成，，其中/> 之后分别是批量归一化，ReLU激活函数，和丢弃率为r_proj的dropout层。in The output module consists of two linear layers/> and/> The sequence consists of , where/> This is followed by batch normalization, ReLU activation function, and dropout layer with dropout rate r _proj .

所述作为分离部件说明的模块可以是或者也可以不是物理上分开的，作为模块显示的部件可以是或者也可以不是物理模块，即可以位于一个地方，或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例中的方案的目的。The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed to multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this embodiment.

另外，在本发明各个实施例中的各功能模块可以集成在一个处理模块中，也可以是各个模块单独物理存在，也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现，也可以采用软件功能模块的形式实现。In addition, each functional module in various embodiments of the present invention can be integrated into one processing module, or each module can exist physically alone, or two or more modules can be integrated into one module. The above integrated modules can be implemented in the form of hardware or software function modules.

所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本发明实现上述实施例方法中的全部或部分流程，也可以通过计算机程序来指令相关的硬件来完成，所述的计算机程序可存储于一计算机可读存储的介质中，该计算机程序在被处理器执行时，可实现上述各个方法实施例的步骤。其中，所述计算机程序包括计算机程序代码，所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括：能够携带所述计算机程序代码的任何实体或系统、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，RandomAccess Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是，所述计算机可读介质包括的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减，例如在某些司法管辖区，根据立法和专利实践，计算机可读介质不包括是电载波信号和电信信号。If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present invention can implement all or part of the processes in the methods of the above embodiments, and can also be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium. When the computer program is executed by the processor, the steps of each of the above method embodiments can be implemented. Wherein, the computer program includes computer program code, which may be in the form of source code, object code, executable file or some intermediate form. The computer-readable medium may include: any entity or system capable of carrying the computer program code, recording medium, USB flash drive, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory) , random access memory (RAM, RandomAccess Memory), electrical carrier signals, telecommunications signals, and software distribution media, etc. It should be noted that the content included in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium Excluded are electrical carrier signals and telecommunications signals.

尽管本发明的描述已经相当详尽且特别对几个所述实施例进行了描述，但其并非旨在局限于任何这些细节或实施例或任何特殊实施例，而是应当将其视作是通过参考所附权利要求考虑到现有技术为这些权利要求提供广义的可能性解释，从而有效地涵盖本发明的预定范围。此外，上文以发明人可预见的实施例对本发明进行描述，其目的是为了提供有用的描述，而那些目前尚未预见的对本发明的非实质性改动仍可代表本发明的等效改动。Although the present invention has been described in considerable detail and in particular to several of the described embodiments, it is not intended to be limited to any such details or embodiments or to any particular embodiment, but rather is to be considered by reference The appended claims are intended to provide the broadest possible interpretation of these claims, taking into account the prior art, to effectively cover the intended scope of the invention. In addition, the above description of the present invention is based on embodiments foreseeable by the inventor for the purpose of providing a useful description, and those non-substantive changes to the present invention that are not yet foreseen can still represent equivalent changes of the present invention.

以上所述，只是本发明的较佳实施例而已，本发明并不局限于上述实施方式，只要其以相同的手段达到本发明的技术效果，都应属于本发明的保护范围。在本发明的保护范围内其技术方案和/或实施方式可以有各种不同的修改和变化。The above are only preferred embodiments of the present invention. The present invention is not limited to the above-mentioned embodiments. As long as the technical effects of the present invention are achieved by the same means, they shall fall within the protection scope of the present invention. Various modifications and changes may be made to the technical solutions and/or implementations within the scope of the present invention.

Claims

1. A multi-source three-dimensional data fusion system, which is characterized by including:

The point cloud input module is used to extract features from the point cloud data, obtain the feature vector V of the point cloud data, obtain a generated point cloud based on the feature vector V, and output a binary classification based on the generated point cloud as the point cloud feature P ;

The grid input module is used to re-grid the input grid data, construct a subdivision structure, obtain a multi-resolution representation of the general grid, and perform zigzag expansion convolution to output grid features M;

Data fusion module, the input end is connected in parallel to the output end of the point cloud input module and the output end of the grid input module, encoding is performed according to the point cloud feature P and the grid feature M, and then normalized fusion is performed;

The training module is used to train the system. The strategy adopted is that point cloud data and grid data are input in turn during training, that is, the point cloud data is input separately at a certain time, but at the same time the point cloud data output by the data fusion module is input. Perform loss calculations on point clouds and grids, update the data fusion module parameters; then use the grid data alone as input, and simultaneously perform loss calculations on the point clouds and grids output by the data fusion module, update the data fusion module parameters, and repeat multiple times. The above steps finally achieve the alignment and completion of point cloud and grid data.

2. A multi-source three-dimensional data fusion system according to claim 1, characterized in that, specifically, the point cloud input module includes:

Multi-scale feature extractor E, including feature embedding layer FEL, attention layer and multi-layer concatenated multi-layer perceptron MLP,

The function process of the feature embedding layer FEL is,

The point cloud data [B, Pnum, 3] passes through two multi-layer perceptron MLPs to increase the point-by-point features from 3 dimensions to 64 dimensions, that is, [B, Pnum, 64], and then the network sequentially performs sampling and K nearest neighbor aggregation operations. , The setting of the number of sampling points and the number of neighborhood points is adapted to the number of points of the point cloud data of this scale. At the same time, the difference between the point cloud features after neighborhood aggregation and the neighborhood center point features is used as the initial feature, and then through the multi-layer perceptron And perform a series operation with the point cloud features before neighborhood aggregation. Finally, after maximum pooling, the aggregate feature VT=[B,pnum/4,128] is output. The specific calculation is as shown in the formula,

In the formula, V _P represents the characteristics of the input point cloud, Represents the point cloud features that have been dimensionally increased by MLP, V _E represents the point cloud aggregation features output by FEL, SG() represents the sampling aggregation operation, RP(x,k) represents the operation of copying x k times, C represents the concatenation operation, MP Represents the maximum pooling operation,

The function process of the attention layer is,

The offset attention method is optimized. First, linear transformation is performed on the input VE to obtain the Que-ry (Q), Key (K) and Value (V) matrices, that is

In the formula, W _q , W _k , W _v represent the learnable linear transformation weight, d _a is the dimension of Que-ry (Q) and Key (K) matrix, d _e is the dimension of Value (V) matrix, and then use Que-ry (Q matrix and Key (K) matrix calculate attention weight through matrix dot product And add the elements/> Perform softmax and L1 norm normalization, and the intermediate feature V _sa is the weighted sum of the Value vector using the corresponding attention weight. The calculation formula is as follows,

V _sa =A·V;

The pyramid generator G is used to take the final feature vector V as input, and the output is the generated point cloud corresponding to the previous one in 3 scales;

The attention discriminator D is used to improve the performance of point cloud completion; the attention discriminator D is also an autoencoder structure. The encoder side uses a single-scale feature extractor composed of a feature embedding layer and an attention layer. The generated point cloud P1 and its corresponding real point cloud are sent to the discriminator D. After calculation by the feature extractor, a 640-dimensional feature vector is output, and then passed through continuous fully connected layers, and finally a binary output of false or true.

3. A multi-source three-dimensional data fusion system according to claim 1, characterized in that, specifically, the specific action process of the grid input module includes:

The input data of the grid input module is a three-dimensional grid, and a three-dimensional convolution operation is performed on the three-dimensional grid to obtain a refined, higher-resolution three-dimensional grid.

Since the order of patches in the three-dimensional data format is not fixed, when calculating the convolution results, the neighborhood mean and difference mean are used to make the calculation results independent of the patch order and satisfy the arrangement invariance. The convolution process is as follows: ,

w ₀ e _i represents the central feature, Represents the sum of neighborhood features,/> represents the sum of neighborhood differences, Represents the sum of the differences between center and field.

4. A multi-source three-dimensional data fusion system according to claim 1, characterized in that, specifically, the data fusion module includes:

The point cloud encoder block P _enc and the mesh encoder block M _enc are used to encode the point cloud features P input by the point cloud input module and the mesh features M input by the mesh input module. The formula is as follows,

P _enc (P)＝φ _P and M _enc (M)＝φ _M

In the formula φ _P , The encoder blocks are respectively composed of fully connected layers f ₁ ^m ,/> Composition, where m∈{P, M}, and/> f ₁ ^m and f ₂ ^m are followed by batch normalization, ReLU and dropout layers respectively;

The cross-attention block, used to share information between the point cloud representation φ _P and the mesh’s representation φ _M , consists of a multi-head self-attention layer, followed by a fully connected layer, using residual links for both layers, Then normalize,

The residual link calculation method of the cross attention block P _proj and the subsequent output module M _proj is as shown in the formula:

where θ _P , The output module consists of two linear layers/> and/> The sequence consists of , where/> After that are batch normalization, ReLU activation function and dropout layer.