
TWI846598B - 3d surface reconstruction method - Google Patents


Info

Publication number
TWI846598B
TWI846598B
Authority
TW
Taiwan
Prior art keywords
feature
attention
surface reconstruction
reconstruction method
information
Prior art date
Application number
TW112135412A
Other languages
Chinese (zh)
Other versions
TW202514540A (en)
Inventor
李光庭
許富雄
廖書巧
簡佑如
金耘志
洪瑋胤
Original Assignee
華碩電腦股份有限公司
Priority date
Filing date
Publication date
Application filed by 華碩電腦股份有限公司 filed Critical 華碩電腦股份有限公司
Priority to TW112135412A priority Critical patent/TWI846598B/en
Application granted granted Critical
Publication of TWI846598B publication Critical patent/TWI846598B/en
Publication of TW202514540A publication Critical patent/TW202514540A/en

Landscapes

  • Image Processing (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The application provides a 3D surface reconstruction method, including: extracting a plurality of visual features from multiple multi-view images of an object and obtaining a plurality of pieces of camera pose information; converting the camera pose information into pose embeddings; adding the visual features to the pose embeddings to generate input information; feeding the input information into an encoder of a Transformer model, which sequentially performs several layers of sequence-to-sequence first attention operations to output volume features; feeding the volume features into a decoder of the Transformer model, which sequentially performs several layers of second attention operations to generate a feature prediction result; mapping an operation dimension of the feature prediction result into a one-dimensional feature dimension; and reconstructing a three-dimensional model of the object according to the one-dimensional feature dimension.

Description

Three-dimensional surface reconstruction method

The present application relates to a three-dimensional surface reconstruction method built on a Transformer neural network architecture.

Existing 3D surface reconstruction methods usually first capture multi-view images with a camera and use a simultaneous localization and mapping (SLAM) system to obtain camera pose information. Depth maps are then generated from the different viewpoints by depth map estimation, feature pixels across the images are matched by a stereo matching algorithm, and finally the depth maps, the matched feature pixels, and the camera poses are fused into a three-dimensional model representation such as a point cloud, a polygon mesh, or a truncated signed distance function (TSDF).
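
To make the conventional fusion step concrete, the following is a minimal sketch of integrating one estimated depth map into a TSDF volume by weighted running average. It is illustrative only and describes the prior-art pipeline, not the claimed method; the grid layout, intrinsics convention, and truncation value are assumptions.

```python
import numpy as np

def integrate_depth(tsdf, weights, depth, K, T_wc, origin, voxel_size, trunc=0.05):
    # Fuse one depth map into a TSDF volume by weighted running average
    # (conventional pipeline sketch; all conventions below are assumptions).
    # tsdf, weights: (X, Y, Z) volumes; K: 3x3 intrinsics; T_wc: 4x4 world->camera.
    X, Y, Z = tsdf.shape
    ix, iy, iz = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
    pts_w = np.stack([ix, iy, iz], axis=-1).reshape(-1, 3) * voxel_size + origin
    pts_c = (T_wc[:3, :3] @ pts_w.T + T_wc[:3, 3:4]).T           # voxel centers in camera frame
    z = pts_c[:, 2]
    valid = z > 1e-6
    uv = (K @ pts_c.T).T                                          # pinhole projection
    u = np.round(uv[:, 0] / np.where(valid, z, 1.0)).astype(int)
    v = np.round(uv[:, 1] / np.where(valid, z, 1.0)).astype(int)
    h, w = depth.shape
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    sdf = d - z                                                   # signed distance along the ray
    valid &= (d > 0) & (sdf > -trunc)
    new_t = np.clip(sdf / trunc, -1.0, 1.0)                       # truncate to [-1, 1]
    t, wgt = tsdf.reshape(-1), weights.reshape(-1)
    t[valid] = (t[valid] * wgt[valid] + new_t[valid]) / (wgt[valid] + 1.0)
    wgt[valid] += 1.0
    return t.reshape(X, Y, Z), wgt.reshape(X, Y, Z)
```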

However, these methods face two major difficulties. First, the depth maps to be fused into the three-dimensional model come from different viewpoints with different camera poses and inherently cannot be stitched together smoothly. Second, when the image texture is too smooth or repetitive, the stereo matching algorithm struggles to find the correct matching feature pixels. Both difficulties introduce numerous reconstruction defects on the surface of the reconstructed three-dimensional model.

The present application provides a three-dimensional surface reconstruction method, comprising: extracting a plurality of visual features from multiple multi-view images of an object and obtaining a plurality of pieces of camera pose information; converting the camera pose information into pose embeddings; adding the visual features to the pose embeddings to generate input information; feeding the input information into an encoder of a Transformer model, which sequentially performs several layers of sequence-to-sequence first attention operations to output volume features; feeding the volume features into a decoder of the Transformer model, which sequentially performs several layers of second attention operations to generate a feature prediction result; mapping the operation dimension of the feature prediction result into a one-dimensional feature dimension; and finally reconstructing a three-dimensional model of the object according to that feature dimension.

In summary, the present application is a three-dimensional surface reconstruction method that exploits the Transformer network architecture's strength at learning global long-range dependencies. It replaces the conventional approach of estimating a separate depth map for each viewpoint, and instead consults the images from different viewpoints jointly to update a volumetric representation, thereby improving the reconstruction of surface detail. The method can therefore reconstruct object surfaces across viewpoints and effectively reduce surface reconstruction defects.

The embodiments of the present application are described below with reference to the accompanying drawings. Some components or structures are omitted from the drawings to show the technical features of the present application clearly. In the drawings, the same reference numerals denote the same or similar components or circuits. Although the terms "first", "second", and so on may be used herein to describe various components, parts, regions, or functions, these components, parts, regions, and/or functions should not be limited by such terms; the terms serve only to distinguish one component, part, region, or function from another.

Referring to FIG. 1, the three-dimensional surface reconstruction method mainly comprises steps S10 to S22. First, in step S10, a convolutional neural network (CNN) feature extractor extracts a plurality of visual features from multiple multi-view images of an object, and a plurality of pieces of camera pose information is obtained; each piece of camera pose information comprises a camera position coordinate and a camera shooting angle. In one embodiment, the feature extractor may be any deep-learning-based extractor such as VGGNet, ResNet, EfficientNet, MobileNet, a Vision Transformer, or a Swin Transformer.
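
As an illustration of step S10, the following PyTorch sketch extracts a token sequence of visual features per view with a small ResNet backbone (one of the extractor options listed above); the output dimension and tensor shapes are assumptions, not specified by this application.

```python
import torch
import torchvision

class FeatureExtractor(torch.nn.Module):
    # Per-view visual feature extraction (step S10); ResNet-18 is one of the
    # backbone options named above, and out_dim is an assumed model dimension N.
    def __init__(self, out_dim: int = 256):
        super().__init__()
        resnet = torchvision.models.resnet18(weights=None)
        self.backbone = torch.nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool/fc
        self.proj = torch.nn.Conv2d(512, out_dim, kernel_size=1)            # map to dimension N

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (V, 3, H, W) multi-view images -> (V, tokens, N) feature sequences
        fmap = self.proj(self.backbone(views))
        return fmap.flatten(2).transpose(1, 2)
```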

In step S12, the acquired camera pose information is transformed by a multi-layer perceptron (MLP) that maps each camera pose to dimension N, producing a plurality of pose embeddings whose dimension N matches that of the visual features of the multi-view images.

In step S14, the visual features and the pose embeddings are added element-wise to perform feature fusion, producing a plurality of pieces of input information that serve as the input of a Transformer model.
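
A minimal sketch of steps S12 and S14 follows, assuming a 6-dimensional pose parameterization (a 3-D position coordinate plus shooting angles); the MLP depth and the broadcasting scheme are illustrative assumptions.

```python
import torch

class PoseEmbedding(torch.nn.Module):
    # Maps camera pose information to dimension N (step S12) so it can be
    # added to the visual features for feature fusion (step S14).
    def __init__(self, pose_dim: int = 6, n: int = 256):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(pose_dim, n),
            torch.nn.ReLU(),
            torch.nn.Linear(n, n),
        )

    def forward(self, poses: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # poses: (V, pose_dim) position coordinates + shooting angles (assumed layout)
        # visual_feats: (V, tokens, n) from the feature extractor
        emb = self.mlp(poses).unsqueeze(1)   # (V, 1, n), broadcast over tokens
        return visual_feats + emb            # element-wise addition = feature fusion
```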

Referring to FIG. 1 and FIG. 2, in step S16 the input information is fed into an encoder 12 of the Transformer model 10, which sequentially performs several layers of sequence-to-sequence first attention operations to output a plurality of volume features. In one embodiment, the encoder 12 is composed of a plurality of encoder blocks 14, each comprising a first multi-head self-attention (MSA) module 16 and a first feed-forward network (FFN) module 18. The first MSA module 16 computes with the attention mechanism, while the first FFN module 18 continually strengthens the detail-expression capability of the features. Each piece of input information is converted by the first MSA module 16 into three feature vectors (a Q vector, a K vector, and a V vector), on which the first attention operation is performed; the first FFN module 18 then reinforces the details to output the volume features, which carry the depth, texture, and spatial information of the object. With N encoder blocks 14, the first MSA module 16 of each block generates the three feature vectors from the output of the previous layer and performs the first attention operation. The attention mechanism refines the object's image features and improves feature quality, so the object's stereoscopic features at every angle are extracted layer by layer under the attention mechanism, and the first FFN module 18 strengthens their detail expression.
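
The encoder block just described can be sketched as follows; the residual connections and layer normalization are standard Transformer practice assumed here, not spelled out in this application.

```python
import torch

class EncoderBlock(torch.nn.Module):
    # One encoder block 14: first MSA module 16 + first FFN module 18.
    def __init__(self, dim: int = 256, heads: int = 8, ffn_dim: int = 1024):
        super().__init__()
        self.msa = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = torch.nn.Sequential(
            torch.nn.Linear(dim, ffn_dim), torch.nn.GELU(), torch.nn.Linear(ffn_dim, dim))
        self.norm1 = torch.nn.LayerNorm(dim)
        self.norm2 = torch.nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The MSA module derives Q, K, V from the same input sequence and
        # performs the first attention operation.
        attn, _ = self.msa(x, x, x)
        x = self.norm1(x + attn)            # residual + norm (assumed)
        return self.norm2(x + self.ffn(x))  # FFN strengthens feature detail
```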

Referring to FIG. 1 and FIG. 2, in step S18 the volume features are fed into a decoder 20 of the Transformer model 10, which sequentially performs several layers of second attention operations to generate a feature prediction result. In one embodiment, the decoder 20 is composed of a plurality of decoder blocks 22, each comprising a second multi-head self-attention (MSA) module 24, an encoder-decoder attention (EDA) module 26, and a second feed-forward network (FFN) module 28. The second MSA module 24 and the EDA module 26 both compute with the attention mechanism, while the second FFN module 28 continually strengthens the detail-expression capability of the features. The second MSA module 24 converts the feature functions output by the previous layer into three feature vectors, namely a Q (query) vector, a K (key) vector, and a V (value) vector, and performs the second attention operation; its output serves as a first feature vector (V vector). The EDA module 26 takes the volume features output by the encoder blocks 14 as input to generate a second feature vector (Q vector) and a third feature vector (K vector), and performs the second attention operation over the first feature vector (V vector), the second feature vector (Q vector), and the third feature vector (K vector); the second FFN module 28 then reinforces the details to output the feature prediction result. With N decoder blocks 22, the second MSA module 24 of each block generates three feature vectors from the output of the previous layer and performs the first pass of the second attention operation; the difference is that the EDA module 26 generates the second feature vector (Q vector) and the third feature vector (K vector) from the encoder's volume features and uses the output of the second MSA module 24 as the first feature vector (V vector) for the second attention operation, after which the second FFN module 28 strengthens the detail expression.
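
A sketch of one decoder block under the role assignment described above: the second MSA output serves as the V vector, while the EDA module draws its Q and K vectors from the encoder's volume features (note this differs from the usual Transformer decoder, which takes Q from the queries and K, V from the encoder). The sketch assumes the query sequence and the volume features have the same length; the residuals and normalization are again assumed.

```python
import torch

class DecoderBlock(torch.nn.Module):
    # One decoder block 22: second MSA module 24, EDA module 26, second FFN module 28.
    def __init__(self, dim: int = 256, heads: int = 8, ffn_dim: int = 1024):
        super().__init__()
        self.msa = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.eda = torch.nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = torch.nn.Sequential(
            torch.nn.Linear(dim, ffn_dim), torch.nn.GELU(), torch.nn.Linear(ffn_dim, dim))
        self.norm1 = torch.nn.LayerNorm(dim)
        self.norm2 = torch.nn.LayerNorm(dim)
        self.norm3 = torch.nn.LayerNorm(dim)

    def forward(self, queries: torch.Tensor, volume: torch.Tensor) -> torch.Tensor:
        # Second MSA refines the TSDF queries; its output is the first
        # feature vector (V) for the encoder-decoder attention.
        q, _ = self.msa(queries, queries, queries)
        q = self.norm1(queries + q)
        # EDA: Q and K from the encoder's volume features, V from the MSA
        # output (per the description above; sequence lengths assumed equal).
        cross, _ = self.eda(volume, volume, q)
        q = self.norm2(q + cross)
        return self.norm3(q + self.ffn(q))  # FFN strengthens feature detail
```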

Within the decoder block 22, the second MSA module 24 likewise refines the quality of the feature functions (TSDF queries), while the EDA module 26, whose attention operation spans the encoder and the decoder, uses the expressive power of the object's volume features to guide, layer by layer, the generation of the feature prediction result (for example, TSDF features), and at the same time maps the operation dimension from the high-order feature dimension down to the spatial dimension of the object reconstruction model used in the present application.

In one embodiment, the first attention operation used in the encoder 12 and the second attention operation used in the decoder 20 of the Transformer model 10 both follow formula (1) below, where $Q$ denotes the Q vector (second feature vector), $K$ the K vector (third feature vector), $V$ the V vector (first feature vector), the superscript $T$ the transpose of a matrix, and $d_k$ the dimension of the K vector:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}$$
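
Formula (1) corresponds directly to the following few lines (a minimal sketch; the batching convention is an assumption):

```python
import torch

def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # Scaled dot-product attention, formula (1):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = k.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    return torch.softmax(scores, dim=-1) @ v
```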

In step S20, the operation dimension of the feature prediction result is mapped by a truncated signed distance function projector (TSDF projector) into the one-dimensional feature dimension of the original space. Finally, in step S22, a three-dimensional model of the object is reconstructed according to that feature dimension; the three-dimensional model is a reconstruction result expressed in the truncated signed distance function (TSDF) format.
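
A minimal sketch of the TSDF projector in step S20: a learned linear map squeezes the high-dimensional features to one TSDF value per voxel, which is then reshaped into a volume. The tanh bounding and the 64³ grid are assumptions; a surface mesh could afterwards be extracted from such a volume with, for example, marching cubes.

```python
import torch

class TSDFProjector(torch.nn.Module):
    # Maps the feature prediction result to a one-dimensional feature
    # dimension (step S20): one truncated signed distance per voxel.
    def __init__(self, dim: int = 256):
        super().__init__()
        self.proj = torch.nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor, grid=(64, 64, 64)) -> torch.Tensor:
        # feats: (B, X*Y*Z, dim) decoder output -> (B, X, Y, Z) TSDF volume
        tsdf = torch.tanh(self.proj(feats))   # bound values to [-1, 1] (assumed)
        return tsdf.view(feats.size(0), *grid)
```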

Referring to FIG. 1 and FIG. 3, the present application carries out the entire three-dimensional surface reconstruction method on a three-dimensional surface reconstruction system 30. The system 30 comprises an image capture device 32, a central processing unit (CPU) 34, and a graphics processing unit (GPU) 36. The image capture device 32 is electrically connected to the CPU 34, and the CPU 34 is electrically connected to the GPU 36. The image capture device 32 photographs the object from each view angle to obtain multiple multi-view images and transmits them to the CPU 34. The CPU 34 has a built-in machine learning algorithm that can execute steps S10 to S22 of FIG. 1, so the CPU 34 can carry out the three-dimensional surface reconstruction method upon receiving the multi-view images. During the process, the image-processing portions can be dispatched by the CPU 34 to the GPU 36 to shorten the processing time; once the GPU 36 finishes, the results are returned to the CPU 34, which finally outputs the reconstructed three-dimensional model.
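
The CPU/GPU division of labor can be sketched as follows, assuming a PyTorch implementation of the pipeline; the wrapper function is hypothetical.

```python
import torch

def reconstruct(model: torch.nn.Module, views: torch.Tensor) -> torch.Tensor:
    # The CPU orchestrates the method; the heavy image and attention work is
    # dispatched to the GPU when one is available (hypothetical wrapper).
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device).eval()
    with torch.no_grad():
        tsdf = model(views.to(device))  # steps S10-S20 on the accelerator
    return tsdf.cpu()                   # returned to the CPU, which outputs the model
```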

In the above embodiment, the CPU 34 and the GPU 36 are built into an electronic device 38, which may be a personal computer, a notebook computer, or a tablet computer, although the present application is not limited thereto. In one embodiment, the electronic device 38 performs the computation with the CPU 34; in other embodiments, an embedded controller (EC), a microprocessor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a system on a chip (SoC), or other similar components or combinations may be used instead, and the present application is not limited thereto.

In one embodiment, the image capture device 32 may be a monochrome or color camera, a stereo camera, a digital still camera, a digital video camera, a depth camera, or any other electronic device capable of capturing images.

In summary, the present application is a three-dimensional surface reconstruction method that exploits the Transformer network architecture's strength at learning global long-range dependencies. It replaces the conventional approach of estimating a separate depth map for each viewpoint and instead consults the images from different viewpoints jointly to update a volumetric representation, so as to improve the reconstruction of surface detail. The method can therefore reconstruct object surfaces across viewpoints and effectively remedy the surface reconstruction defects of the conventional approaches.

The embodiments described above merely illustrate the technical ideas and features of the present application; their purpose is to enable those skilled in the art to understand and practice the content of the present application, and they shall not be taken to limit its patent scope. Any equivalent change or modification made according to the spirit disclosed herein shall remain within the scope of the claims of the present application.

10: Transformer model

12: Encoder

14: Encoder block

16: First multi-head self-attention module

18: First feed-forward network module

20: Decoder

22: Decoder block

24: Second multi-head self-attention module

26: Encoder-decoder attention module

28: Second feed-forward network module

30: Three-dimensional surface reconstruction system

32: Image capture device

34: Central processing unit (CPU)

36: Graphics processing unit (GPU)

38: Electronic device

S10~S22: Steps

FIG. 1 is a flowchart of a three-dimensional surface reconstruction method according to an embodiment of the present application. FIG. 2 is a schematic diagram of the architecture of the Transformer model used by the three-dimensional surface reconstruction method according to an embodiment of the present application. FIG. 3 is a block diagram of a three-dimensional surface reconstruction system according to an embodiment of the present application.

S10~S22: Steps

Claims (8)

1. A three-dimensional surface reconstruction method, comprising: extracting a plurality of visual features from multiple multi-view images of an object using a convolutional neural network feature extractor, and obtaining a plurality of pieces of camera pose information; converting the camera pose information into a plurality of pose embeddings through a multi-layer perceptron; adding the visual features to the pose embeddings through a central processing unit to generate a plurality of pieces of input information; inputting the input information into an encoder of a Transformer model and sequentially performing several layers of sequence-to-sequence first attention operations to output a plurality of volume features; inputting the volume features into a decoder of the Transformer model and sequentially performing several layers of second attention operations to generate a feature prediction result; mapping an operation dimension of the feature prediction result into a one-dimensional feature dimension; and reconstructing a three-dimensional model of the object according to the feature dimension.

2. The three-dimensional surface reconstruction method of claim 1, wherein the multi-view images are obtained by photographing the object from each view angle with an image capture device.

3. The three-dimensional surface reconstruction method of claim 1, wherein the camera pose information comprises a camera position coordinate and a camera shooting angle.

4. The three-dimensional surface reconstruction method of claim 1, wherein the encoder comprises a plurality of encoder blocks, each encoder block further comprising a first multi-head self-attention module and a first feed-forward network module, so that each piece of input information is converted by the first multi-head self-attention module into three feature vectors on which the first attention operation is performed, and the first feed-forward network module then reinforces the details to output the volume features.

5. The three-dimensional surface reconstruction method of claim 1, wherein the decoder comprises a plurality of decoder blocks, each decoder block further comprising a second multi-head self-attention module, an encoder-decoder attention module, and a second feed-forward network module, so that the feature functions output by the previous layer are converted by the second multi-head self-attention module into three feature vectors on which the second attention operation is performed to output a first feature vector, the encoder-decoder attention module generates a second feature vector and a third feature vector from the volume features and performs the second attention operation over the first feature vector, the second feature vector, and the third feature vector, and the second feed-forward network module then reinforces the details to output the feature prediction result.

6. The three-dimensional surface reconstruction method of claim 1, wherein the feature prediction result is mapped into the feature dimension by a truncated signed distance function projector.

7. The three-dimensional surface reconstruction method of claim 6, wherein the three-dimensional model is expressed in a truncated signed distance function format.

8. The three-dimensional surface reconstruction method of claim 1, wherein the volume features comprise a depth, a texture, and spatial information of the object.
TW112135412A 2023-09-15 2023-09-15 3d surface reconstruction method TWI846598B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW112135412A TWI846598B (en) 2023-09-15 2023-09-15 3d surface reconstruction method


Publications (2)

Publication Number Publication Date
TWI846598B true TWI846598B (en) 2024-06-21
TW202514540A TW202514540A (en) 2025-04-01

Family

ID=92541863

Family Applications (1)

Application Number Title Priority Date Filing Date
TW112135412A TWI846598B (en) 2023-09-15 2023-09-15 3d surface reconstruction method

Country Status (1)

Country Link
TW (1) TWI846598B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822969A (en) * 2021-09-15 2021-12-21 宿迁硅基智能科技有限公司 Training neural radiation field model and face generation method, device and server
TW202213997A (en) * 2020-09-16 2022-04-01 美商高通公司 End-to-end neural network based video coding
TWI773458B (en) * 2020-11-25 2022-08-01 大陸商北京市商湯科技開發有限公司 Method, device, computer equipment and storage medium for reconstruction of human face
US11521379B1 (en) * 2021-09-16 2022-12-06 Nanjing University Of Information Sci. & Tech. Method for flood disaster monitoring and disaster analysis based on vision transformer
CN116563856A (en) * 2023-06-08 2023-08-08 浙江大学 A named entity recognition method for image text, electronic equipment, and medium
CN116681859A (en) * 2023-03-16 2023-09-01 珠海剑心互动娱乐有限公司 Multi-view shape reconstruction method, system, electronic device and storage medium


Also Published As

Publication number Publication date
TW202514540A (en) 2025-04-01
