TWI748426B - Method, system and computer program product for generating depth maps of monocular video frames - Google Patents
Method, system and computer program product for generating depth maps of monocular video frames
- Publication number
- TWI748426B (application number TW109114069A)
- Authority
- TW
- Taiwan
- Prior art keywords
- feature
- depth map
- converter
- maps
- pictures
- Prior art date
Landscapes
- Image Analysis (AREA)
Abstract
Description
The present invention relates to a technique for converting two-dimensional (2D) images into three-dimensional (3D) images.
With the rapid development of 3D display technology in recent years, generating rich 3D image content has become an important issue. Abundant 2D video already exists, yet 3D content remains scarce, because producing 3D video requires shooting with multiple cameras: the rigs need complex calibration, the shoots are difficult and tiring, and the cameras themselves are very expensive. Converting existing 2D video into 3D video is therefore the most practical solution. In the existing 2D-to-3D workflow, a human operator selects key frames wherever the RGB pixels change substantially between frames, uses segmentation-based post-production tools to create a depth map for each key frame, and then interpolates the depth maps of all in-between frames from the preceding and following key frames based on block motion direction and block matching. However, key frames selected from pixel differences, together with block-motion or block-matching prediction, are vulnerable to large motion between frames: when the motion is too large, an excessive number of key frames is selected and serious prediction errors occur. To avoid this problem, the prior art cannot allow the selected key frames to be too far apart, so roughly 30% to 50% of all frames must be labeled manually (at 30 frames per second, 9 to 15 frames must be labeled every second), which is an enormous labor cost. In addition, pixel-level matching and motion prediction ignore surrounding context and perform poorly, while block-level prediction tends to fail in regions with gradient changes or deforming objects.
An embodiment of the present invention provides a single-view image depth map sequence generation method, including: obtaining multiple frames and converting each frame into a feature map with a feature converter; performing an unsupervised clustering algorithm on the feature maps to divide them into multiple groups, and taking the frames corresponding to the feature map centroids of the groups as key frames; providing a user interface to obtain depth maps of the key frames; initializing a depth map generation network according to the feature converter and training the depth map generation network with the key frames and their depth maps; and inputting the remaining frames, other than the key frames, into the depth map generation network to compute their corresponding depth maps.
In some embodiments, the single-view image depth map sequence generation method further includes training an autoencoder on the frames. The autoencoder includes the feature converter and a feature inverse converter, each of which includes a corresponding neural network.

In some embodiments, the method further includes: setting multiple candidate group counts, and for each candidate group count, performing the unsupervised clustering algorithm on the feature maps and computing a silhouette coefficient; and setting the candidate group count corresponding to the largest silhouette coefficient as the group count of the unsupervised clustering algorithm, this group count being equal to the number of key frames.

In some embodiments, the unsupervised clustering algorithm is the k-means algorithm.

In some embodiments, the depth map generation network includes a spatio-temporal similarity operation that receives a main feature map and multiple auxiliary feature maps temporally adjacent to the main feature map. For a main feature point of the main feature map, the spatio-temporal similarity operation obtains multiple auxiliary feature points within a preset range of the auxiliary feature maps and computes the similarity between the main feature point and the auxiliary feature points to compensate the main feature point.
From another perspective, an embodiment of the present invention further provides a computer program product that is loaded and executed by a computer system to carry out the above single-view image depth map sequence generation method.

From another perspective, an embodiment of the present invention further provides a single-view image depth map sequence generation system, including a memory and a processor. The memory stores multiple instructions, and the processor executes these instructions to carry out the above single-view image depth map sequence generation method.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.
100: single-view image depth map sequence generation system
110: processor
120: memory
200: image sequence
201, 202: selected frames
310: key frame extraction system
320, 330: steps
311: key frames
312: frames
321: depth map
340: depth map generation network
341: depth map
410: feature converter
420: feature inverse converter
421: image sequence
510: feature map
511: feature map centroid
520: step
610: feature converter
620: depth generator
701~703: input frames
710: feature extraction convolutional neural network
711~713: feature maps
714: feature point
715: preset range
720: spatio-temporal similarity operation
730: convolutional neural network
[FIG. 1] is a schematic diagram of a single-view image depth map sequence generation system according to an embodiment.
[FIG. 2] is a schematic diagram of multiple frames in an image sequence according to an embodiment.
[FIG. 3] is a schematic flowchart of a single-view image depth map sequence generation method according to an embodiment.
[FIG. 4] is a schematic diagram of an autoencoder according to an embodiment.
[FIG. 5] is a schematic diagram of an unsupervised clustering algorithm according to an embodiment.
[FIG. 6] is a schematic diagram of a depth map generation network according to an embodiment.
[FIG. 7] is a schematic diagram of a spatio-temporal similarity operation according to an embodiment.
[FIG. 8] is a schematic diagram of depth map generation results according to an embodiment.
FIG. 1 is a schematic diagram of a single-view image depth map sequence generation system according to an embodiment. Referring to FIG. 1, the single-view image depth map sequence generation system 100 may be a smartphone, a tablet computer, a personal computer, a notebook computer, a server, an industrial computer, or any electronic device with computing capability; the present disclosure is not limited in this respect. The single-view image depth map sequence generation system 100 includes a processor 110 and a memory 120, and the processor 110 is electrically connected to the memory 120. The processor 110 may be a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor, a microcontroller, a digital signal processor, an image processing chip, an application-specific integrated circuit, and so on, and the memory 120 may be a volatile or non-volatile memory. The memory 120 stores multiple instructions, and the processor 110 executes these instructions to carry out a single-view image depth map sequence generation method that produces a depth map sequence for the frames of an image sequence. For example, referring to FIG. 2, which is a schematic diagram of multiple frames in an image sequence according to an embodiment, the content of the image sequence 200 in this embodiment is an animation, but in other embodiments it may also be a natural scene; the present disclosure does not limit the content of the image sequence 200. In this embodiment every frame is in color, that is, it includes red, green, and blue channels, but in other embodiments every frame may also be grayscale.
FIG. 3 is a schematic flowchart of a single-view image depth map sequence generation method according to an embodiment. Referring to FIG. 3, the key frame extraction system 310 is a software module that obtains a small number of key frames 311 from the image sequence 200; these key frames 311 are the most characteristic frames of the image sequence 200, and how the key frame extraction system 310 obtains them is explained in the following paragraphs. In step 320, the user labels the depth values of these key frames 311 through the provided user interface, thereby producing a depth map 321 for each key frame. In some embodiments, the user interface lets the user decide which objects are in front and which are behind, or lets the user assign depth values directly; the present disclosure does not limit the content of the user interface. In step 330, a depth map generation network 340 is trained on the key frames 311 and their depth maps 321. The frames 312 of the image sequence 200 other than the key frames 311 can then be input to the depth map generation network 340 to produce the corresponding depth maps 341. The key frame extraction system 310 is described first below.
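As a non-limiting illustration, the flow of FIG. 3 may be sketched as follows; every argument of the function is an injected placeholder component, and the names used here are assumptions for the example rather than elements defined by this disclosure:

```python
def generate_depth_sequence(frames, feature_converter, cluster_key_frames,
                            ask_user_for_depth, build_and_train_depth_net):
    """High-level flow of FIG. 3: extract key frames, label them, train, then infer the rest.
    cluster_key_frames is assumed to return a list of frame indices."""
    features = [feature_converter(f) for f in frames]                    # feature maps 510
    key_idx = cluster_key_frames(features)                               # key frames 311
    key_depths = ask_user_for_depth([frames[i] for i in key_idx])        # step 320, depth maps 321
    depth_net = build_and_train_depth_net(frames, key_idx, key_depths)   # step 330, network 340
    return [key_depths[key_idx.index(i)] if i in key_idx else depth_net(frames[i])
            for i in range(len(frames))]                                 # depth maps for all frames
```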
Referring to FIG. 4, the key frame extraction system 310 includes a feature converter 410 and a corresponding feature inverse converter 420. The feature converter 410 compresses each frame of the image sequence 200 into a lower-dimensional feature map 510, and this feature map 510 can be decompressed by the feature inverse converter 420 to reconstruct an image sequence 421. The feature converter 410 and the feature inverse converter 420 each include their own neural network, and together they may be referred to as an autoencoder. In this embodiment, every frame of the image sequence 200 has a height of H and a width of W, where H and W are positive integers, and every frame includes three channels: red, green, and blue. The feature converter 410 includes seven convolution layers and seven pooling layers; for example, "Conv.Block_1" in FIG. 4 represents one convolution layer followed by one pooling layer, and so on. FIG. 4 also indicates the size of each feature map, namely its height, width, and number of channels (for example, 32 channels). After the computations of these convolution and pooling layers, a feature map 510 with a height of H/128, a width of W/128, and 512 channels is obtained and output to the feature inverse converter 420. The feature inverse converter 420 reconstructs, from these feature maps 510, a reconstructed image sequence 421 with a height of H and a width of W. The frames of the image sequence 421 should approximate the original frames of the image sequence 200, so the loss computed between each pair of corresponding frames can be used to update the weights of the feature converter 410 and the feature inverse converter 420 of the autoencoder; training is complete when the frames of the image sequence 421 are very close or identical to the frames of the image sequence 200. Autoencoders are well understood by persons of ordinary skill in the art and are not described in further detail here. It is worth noting that the autoencoder is trained on the frames of the image sequence 200, but this training is unsupervised and requires no user intervention. In addition, the seven-layer convolutional architecture of FIG. 4 is only an example; convolutional neural networks with other numbers of layers and other architectures may be used in other embodiments.
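By way of a non-limiting example, an autoencoder of the kind described above may be sketched in PyTorch as follows; the channel widths, activation functions, optimizer, and training hyper-parameters are assumptions chosen for the example and are not taken from FIG. 4:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One 'Conv.Block' as described for FIG. 4: a convolution layer followed by a pooling layer."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(2)          # halves the height and width

    def forward(self, x):
        return self.pool(self.act(self.conv(x)))

class FeatureConverter(nn.Module):
    """Encoder 410: seven conv+pool blocks, (3, H, W) -> (512, H/128, W/128)."""
    def __init__(self, channels=(32, 64, 128, 256, 512, 512, 512)):  # assumed widths
        super().__init__()
        blocks, in_ch = [], 3
        for out_ch in channels:
            blocks.append(ConvBlock(in_ch, out_ch))
            in_ch = out_ch
        self.blocks = nn.Sequential(*blocks)

    def forward(self, x):
        return self.blocks(x)

class FeatureInverseConverter(nn.Module):
    """Decoder 420: seven upsampling blocks that reconstruct the (3, H, W) frame."""
    def __init__(self, channels=(512, 512, 256, 128, 64, 32, 3)):
        super().__init__()
        layers, in_ch = [], 512
        for out_ch in channels:
            layers += [nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False),
                       nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = out_ch
        layers[-1] = nn.Sigmoid()            # final activation maps to [0, 1] pixel values
        self.layers = nn.Sequential(*layers)

    def forward(self, z):
        return self.layers(z)

def train_autoencoder(frames, epochs=10, lr=1e-4):
    """Unsupervised training on the frames of image sequence 200; no user labels are needed.
    frames: iterable of (1, 3, H, W) tensors with H and W multiples of 128."""
    encoder, decoder = FeatureConverter(), FeatureInverseConverter()
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    loss_fn = nn.MSELoss()                   # reconstruction loss between a frame and its rebuild
    for _ in range(epochs):
        for frame in frames:
            recon = decoder(encoder(frame))
            loss = loss_fn(recon, frame)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder, decoder
```

The encoder halves the height and width seven times (a factor of 128 in total) and ends at 512 channels, which corresponds to the H/128 × W/128 × 512 feature map 510 described above.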
FIG. 5 is a schematic diagram of an unsupervised clustering algorithm according to an embodiment. Referring to FIG. 4 and FIG. 5, after the autoencoder has been trained, the feature converter 410 is taken from the autoencoder and used to convert each frame of the image sequence 200 into a corresponding feature map 510. Next, in step 520, an unsupervised clustering algorithm is performed on the feature maps 510 to obtain the feature map centroids 511, and the frames corresponding to the centroids are selected as the key frames 311. For example, the unsupervised clustering algorithm may be the k-means algorithm, which divides the feature maps 510 into multiple groups; every particle in FIG. 5 corresponds to the feature map of one frame, every group has one feature map centroid 511, and the frames corresponding to these feature map centroids 511 are taken as the key frames 311.
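For illustration only, the selection of key frames from the feature maps may be sketched with scikit-learn's k-means as follows; flattening each feature map 510 into a vector and the particular k-means settings are assumptions made for the example:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_key_frames(feature_maps, k):
    """feature_maps: array of shape (N, D), one flattened feature map 510 per frame.
    Returns the indices of the frames whose feature maps lie closest to each centroid 511."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feature_maps)
    key_indices = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(feature_maps[members] - km.cluster_centers_[c], axis=1)
        key_indices.append(int(members[np.argmin(dists)]))   # frame nearest to the centroid
    return sorted(key_indices)
```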
In some embodiments, the group count of the unsupervised clustering algorithm (that is, the positive integer k of the k-means algorithm) can also be computed automatically. Specifically, multiple candidate group counts may first be set, for example from 2 to N, where N is any suitable positive integer. Each candidate group count is used as the positive integer k for the k-means algorithm, and a silhouette coefficient is computed from the clustering result. The silhouette coefficient is computed as in the following Equations 1 to 3.

[Equation 1] s(i) = (b(i) - a(i)) / max(a(i), b(i))

[Equation 2] a(i) = (1 / (|C_i| - 1)) Σ_{j∈C_i, j≠i} d(i, j)

[Equation 3] b(i) = min_{k≠i} (1 / |C_k|) Σ_{j∈C_k} d(i, j)

where i and j are positive integers and d(i, j) denotes the distance between the i-th feature map and the j-th feature map, for example the Euclidean distance. C_i denotes the group to which the i-th feature map belongs, and C_k denotes the set of all feature points belonging to a group different from that of i. Equation 2 expresses the intra-cluster similarity, that is, the average distance between the feature points within a cluster; Equation 3 expresses the inter-cluster similarity, that is, the average distance to the feature points of other clusters. Equations 1 to 3 are evaluated for every feature map, and the average of all s(i) is then taken as the silhouette coefficient. A larger silhouette coefficient indicates a better clustering, so the candidate group count corresponding to the largest silhouette coefficient is set as the value of k, also called the group count of the unsupervised clustering algorithm; this group count equals the number of key frames 311.
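As a non-limiting illustration, the automatic choice of the group count may be sketched as follows; the candidate range and the k-means settings are assumptions, and scikit-learn's silhouette_score returns the mean of s(i) using the Euclidean distance by default:

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_group_count(feature_maps, max_k):
    """Try candidate group counts 2..max_k and keep the one with the largest silhouette coefficient."""
    best_k, best_score = 2, -1.0
    for k in range(2, max_k + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(feature_maps)
        score = silhouette_score(feature_maps, labels)    # average of s(i) over all feature maps
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```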
Referring to FIG. 2, the selected frames 201 and 202 are the key frames. The image sequence 200 mainly contains two characters: the selected frame 201 is a scene with one character, and the selected frame 202 is a scene in which both characters are present, so the selected frames 201 and 202 are sufficient to describe the depth relationships of the scene.
Referring back to FIG. 3, after the key frames 311 have been obtained, the depth maps 321 of the key frames are obtained through the user interface. The depth map generation network 340 is described in detail below. FIG. 6 is a schematic diagram of a depth map generation network according to an embodiment. Referring to FIG. 6, the depth map generation network 340 includes a feature converter 610 and a depth generator 620. It is worth noting that the feature converter 610 is a part of the feature converter 410 of FIG. 4: the first five feature maps of the feature converter 410 have the same width, height, and number of channels as the five feature maps of the feature converter 610. Therefore, in some embodiments the network parameters of the feature converter 610 can be initialized from the network parameters of the feature converter 410, that is, both are initially set to the same network parameters. After initialization, the depth map generation network 340 is trained on the key frames 311 and their depth maps 321.
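As an illustrative sketch, this initialization may be written as a direct copy of the corresponding block weights; the assumption here is that both modules expose their convolution blocks through an indexable attribute named blocks with identical shapes:

```python
import torch

def init_from_feature_converter(depth_encoder, feature_converter, n_blocks=5):
    """Copy the weights of the first n_blocks blocks of feature converter 410 into
    feature converter 610 of the depth map generation network 340."""
    with torch.no_grad():
        for i in range(n_blocks):
            depth_encoder.blocks[i].load_state_dict(feature_converter.blocks[i].state_dict())
```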
It is worth noting that in FIG. 6, blocks such as "Conv_Block_1" represent a convolution layer and a pooling layer, whereas "Conv_Block_3 Spatial-Temporal Similarity Block" indicates that, in addition to a convolution layer and a pooling layer, a spatio-temporal similarity operation is used. The spatio-temporal similarity operation compensates the current frame using the similarity of temporally adjacent frames, so a total of T frames are input to the depth map generation network 340, where T is a positive integer, for example 3. These T frames all pass through the convolution and pooling layers to obtain T feature maps. In the spatio-temporal similarity operation, when a main feature point of a feature map is processed, auxiliary feature points within a preset range of the main feature map and the adjacent feature maps are obtained to compensate the current feature point. In this way, temporal information from the preceding and following frames is taken into account, which makes the final depth maps more temporally consistent, increases accuracy, and removes flicker between frames. Specifically, FIG. 7 is a schematic diagram of the spatio-temporal similarity operation according to an embodiment. Whether in the training stage or the inference stage, the T frames 701 to 703 are temporally adjacent frames: the middle frame 702 is the main frame currently being trained (or predicted), and the frames 701 and 703 are the adjacent auxiliary frames. These frames 701 to 703 pass through a feature extraction convolutional neural network 710, for example "Conv._Block_1" and "Conv._Block_2" in FIG. 6, to obtain the corresponding feature maps 711 to 713, where the feature map 712 is the main feature map currently being processed and the feature maps 711 and 713 are called auxiliary feature maps. The spatio-temporal similarity operation 720 receives the feature maps 711 to 713; for a feature point 714 of the main feature map 712, it obtains multiple auxiliary feature points within a preset range 715 of the main feature map 712 and the auxiliary feature maps 711 and 713, and computes the similarity between the feature point 714 and these feature points to compensate the feature point 714. The present disclosure does not limit the size of the preset range 715. The compensation described above can be expressed as the following Equations 4 to 7.
[Equation 4] z_i = h(y_i) + x_i

[Equation 5] y_i = (1 / C(x)) Σ_{∀j∈A} f(x_i, x_j) g(x_j)

[Equation 6] C(x) = Σ_{∀j∈A} f(x_i, x_j)

[Equation 7] f(x_i, x_j) = e^(x_i · x_j)

where x_i denotes the feature point 714, z_i is the feature map after compensation, and y_i is the similarity-weighted feature map. h(y_i) and g(x_j) may be regarded as neural-network non-linear transformations that respectively reduce and restore the dimensions of the similarity-weighted feature maps y_i and x_j, or they may be general linear transformations of the feature maps, such as h(y_i) = W_z·y_i and g(x_j) = W_g·x_j with weight matrices W_z and W_g; the present invention does not restrict them to being non-linear neural networks or linear transformations. f(x_i, x_j) is the similarity weighting value of the feature maps, and C(x) is the normalization value of the weighting. Equation 7 computes the similarity between two feature points, but it is only an example; other similarity measures may be used in other embodiments.

A denotes the set of all feature points within the preset range 715 of the feature maps 711, 712, and 713, and x_j denotes a feature point within the preset range 715. Here h(y_i) and g(x_j) are non-linear or linear dimension-reduction and dimension-restoration transformations that can be used to reduce the complexity of the feature-point similarity computation; in some embodiments they may also be omitted. For a continuous image sequence, whose frames share similar scene backgrounds, this embodiment computes the similarity only for the feature points within the preset range 715, which effectively reduces the interference of redundant background features; compared with the prior art, this not only reduces the amount of computation but also improves prediction accuracy and temporal consistency.
After every feature point in the main feature map 712 has been compensated, the compensated feature points are combined with the original uncompensated features and passed to the subsequent convolutional neural network 730, for example the convolution and pooling layers of "Conv._Block_5" in FIG. 6. The way the features are combined is not limited here; any combination such as addition, concatenation, or multiplication may be used. The spatio-temporal similarity operation can be added to any convolution layer of the feature converter 610; in the embodiment of FIG. 6 two convolution layers use the spatio-temporal similarity operation, but in other embodiments the operation may be used in more or fewer convolution layers.
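As a non-limiting illustration, a local spatio-temporal similarity block of this kind may be sketched in PyTorch as follows; the dot-product similarity with softmax normalization standing in for Equations 5 to 7, the 1×1 convolutions used for g and h, the window radius, and the batch size of one are all assumptions made for the example:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatioTemporalSimilarity(nn.Module):
    """Sketch of the spatio-temporal similarity operation 720: every feature point x_i of the
    main feature map 712 is compared with the feature points x_j inside a (2r+1)x(2r+1)
    window (preset range 715) of the main and auxiliary feature maps, and the similarity-
    weighted features are added back as z_i = h(y_i) + x_i."""

    def __init__(self, channels, reduced=None, radius=3):
        super().__init__()
        reduced = reduced or channels // 2
        self.radius = radius
        self.g = nn.Conv2d(channels, reduced, kernel_size=1)   # g(x_j): dimension reduction
        self.h = nn.Conv2d(reduced, channels, kernel_size=1)   # h(y_i): dimension restoration

    def forward(self, main, aux):
        # main: (1, C, H, W) feature map 712; aux: list of auxiliary feature maps 711, 713
        k = 2 * self.radius + 1
        sims, values = [], []
        for m in [main] + list(aux):
            C = m.shape[1]
            patches_x = F.unfold(m, k, padding=self.radius).view(1, C, k * k, -1)
            patches_g = F.unfold(self.g(m), k, padding=self.radius).view(
                1, -1, k * k, m.shape[2] * m.shape[3])
            # f(x_i, x_j): dot product between x_i and every x_j in the window (Equation 7)
            sims.append((main.flatten(2).unsqueeze(2) * patches_x).sum(dim=1))  # (1, k*k, H*W)
            values.append(patches_g)                                            # (1, C', k*k, H*W)
        sims = torch.cat(sims, dim=1)                       # candidates from all T feature maps
        values = torch.cat(values, dim=2)
        weights = torch.softmax(sims, dim=1)                # f / C(x): Equations 5 and 6
        y = (values * weights.unsqueeze(1)).sum(dim=2)      # y_i: similarity-weighted features
        y = y.view(1, -1, main.shape[2], main.shape[3])
        return self.h(y) + main                             # Equation 4: z_i = h(y_i) + x_i

# Example use: block = SpatioTemporalSimilarity(channels=128)
#              z = block(main_map, [prev_map, next_map])    # all maps of shape (1, 128, H, W)
```

The last line of the forward pass corresponds to Equation 4: the transformed similarity-weighted feature h(y_i) is added back to the original feature x_i before the result is passed to the next convolution block.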
To speed up this approach, the feature maps of the auxiliary frames can be released after the spatio-temporal similarity operation; this operation is labeled "Discard Auxiliary Frames" in FIG. 6. As a result, the convolution layer of "Conv.Block_5" and the subsequent convolution layers no longer need the feature maps of the auxiliary frames. The feature maps compensated by the spatio-temporal similarity operation are then transformed by the depth generator 620 to produce the depth map of the main frame.
With the method described above, the depth map generation network 340 can be trained and used to produce the depth maps of the frames other than the key frames; these results are shown in FIG. 8. In this embodiment the user only needs to label the depth maps of the key frames, which in practical tests amounts to only about 1% to 3% of the frames, so the labor cost is greatly reduced compared with the prior art, and the quality is also better than that of other known methods. The generated depth maps can be used to synthesize stereoscopic or multi-view 3D video, for example with a conventional depth image based rendering (DIBR) algorithm; the present disclosure does not limit how the stereoscopic or multi-view 3D video is synthesized.
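As a non-limiting illustration, the simplest form of such a DIBR step shifts each pixel horizontally by a disparity derived from its depth value; the depth-to-disparity scaling below is an assumption, and occlusion handling and hole filling, which a practical DIBR pipeline would add, are omitted:

```python
import numpy as np

def dibr_shift_view(rgb, depth, max_disparity=32):
    """rgb: (H, W, 3) image; depth: (H, W) depth map, larger values assumed to be nearer.
    Returns a naively warped second view for a stereoscopic pair."""
    h, w, _ = rgb.shape
    out = np.zeros_like(rgb)
    disparity = (depth / max(float(depth.max()), 1e-6) * max_disparity).astype(int)
    for y in range(h):
        for x in range(w):
            nx = x - disparity[y, x]          # shift the pixel according to its disparity
            if 0 <= nx < w:
                out[y, nx] = rgb[y, x]        # no z-buffering: overlapping pixels simply overwrite
    return out
```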
One contribution of the present disclosure is the use of a convolutional neural network to search for key frames, so that the most representative frames, capable of covering all the objects that appear, are obtained from the image sequence, while the other frames can be regarded as slight deformations of the key frames. To improve the robustness of the clustering, the present technique first trains the feature converter in advance so that each frame is projected into a lower spatial dimension with a larger number of channel features, which makes the key frame selection more efficient.
From yet another perspective, the present invention also provides a computer program product, which may be written in any programming language and/or for any platform; when the computer program product is loaded into a computer system and executed, it carries out the above single-view image depth map sequence generation method.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the technical field may make some changes and refinements without departing from the spirit and scope of the present invention, so the scope of protection of the present invention shall be defined by the appended claims.
200: image sequence
310: key frame extraction system
320, 330: steps
311: key frames
312: frames
321, 341: depth maps
340: depth map generation network
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW109114069A TWI748426B (en) | 2020-04-27 | 2020-04-27 | Method, system and computer program product for generating depth maps of monocular video frames |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202141973A TW202141973A (en) | 2021-11-01 |
| TWI748426B true TWI748426B (en) | 2021-12-01 |
Family
ID=80680922
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW109114069A TWI748426B (en) | 2020-04-27 | 2020-04-27 | Method, system and computer program product for generating depth maps of monocular video frames |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI748426B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101400001A (en) * | 2008-11-03 | 2009-04-01 | 清华大学 | Generation method and system for video frame depth chart |
| CN102196292A (en) * | 2011-06-24 | 2011-09-21 | 清华大学 | Human-computer-interaction-based video depth map sequence generation method and system |
| US20190332942A1 (en) * | 2016-12-29 | 2019-10-31 | Zhejiang Gongshang University | Method for generating spatial-temporally consistent depth map sequences based on convolution neural networks |
| WO2020032354A1 (en) * | 2018-08-06 | 2020-02-13 | Samsung Electronics Co., Ltd. | Method, storage medium and apparatus for converting 2d picture set to 3d model |
- 2020-04-27: TW application TW109114069A filed; granted as patent TWI748426B (en), status active
Also Published As
| Publication number | Publication date |
|---|---|
| TW202141973A (en) | 2021-11-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN109377530B (en) | A Binocular Depth Estimation Method Based on Deep Neural Network | |
| CN109671023B (en) | A Super-resolution Reconstruction Method of Face Image | |
| US10542249B2 (en) | Stereoscopic video generation method based on 3D convolution neural network | |
| CN112543317B (en) | Method for converting high-resolution monocular 2D video into binocular 3D video | |
| CN106504190B (en) | A Stereoscopic Video Generation Method Based on 3D Convolutional Neural Network | |
| CN111835983B (en) | A method and system for multi-exposure high dynamic range imaging based on generative adversarial network | |
| CN111798400A (en) | Reference-free low-light image enhancement method and system based on generative adversarial network | |
| CN109509248B (en) | A Neural Network-Based Photon Mapping Rendering Method and System | |
| CN112750201B (en) | Three-dimensional reconstruction method, related device and equipment | |
| Chang et al. | Vornet: Spatio-temporally consistent video inpainting for object removal | |
| CN110689599A (en) | 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement | |
| CN114170286A (en) | Monocular depth estimation method based on unsupervised depth learning | |
| CN113298821B (en) | A superpixel matting method based on Nystrom spectral clustering | |
| CN109218706B (en) | A method for generating stereoscopic vision images from a single image | |
| TWI836972B (en) | Underwater image enhancement method and image processing system using the same | |
| Li et al. | A real-time high-quality complete system for depth image-based rendering on FPGA | |
| CN116912114B (en) | Non-reference low-illumination image enhancement method based on high-order curve iteration | |
| CN107067452A (en) | A kind of film 2D based on full convolutional neural networks turns 3D methods | |
| WO2021057091A1 (en) | Viewpoint image processing method and related device | |
| CN116721018B (en) | Image super-resolution reconstruction method for generating countermeasure network based on intensive residual error connection | |
| CN112907641B (en) | Multi-view depth estimation method based on detail information retention | |
| CN116723305B (en) | Virtual viewpoint quality enhancement method based on generation type countermeasure network | |
| CN102223545A (en) | Rapid multi-view video color correction method | |
| CN110211090B (en) | A method for evaluating the quality of perspective composite images | |
| TWI748426B (en) | Method, system and computer program product for generating depth maps of monocular video frames |