
TWI887115B - Temporal assistant module - Google Patents

Temporal assistant module

Info

Publication number
TWI887115B
Authority
TW
Taiwan
Prior art keywords
layer, module, output, information, temporal
Application number
TW113134609A
Other languages
Chinese (zh)
Inventor
陳修志
陳彥霖
邱奕凱
黃志勝
Original Assignee
國立臺北科技大學
Application filed by 國立臺北科技大學
Priority to TW113134609A
Application granted
Publication of TWI887115B

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention is a temporal feature assistant module for monocular 3D object detection. Via the temporal assistant module, a hidden state (H_t) at the current time point and an output state (Y_t) at the current time point of a Recurrent Neural Networks (RNN) module, a Long Short-Term Memory (LSTM) module, and a Gated Recurrent Unit (GRU) module are adjusted respectively, thereby improving the average precision (AP) for objects that are occluded, objects that move out of the detection frame, and small objects.

Description

Temporal feature assistant module

A module for object detection, in particular a temporal feature assistant module for monocular 3D object detection.

Prior art, as shown in Figure 1. The operation flow of a recurrent neural network is as follows: first, the hidden state is initialized so that every sequence starts from the same value. As the input sequence is fed in, the stored hidden state is combined with the current input and passed through the hidden layer 11, which emits a new hidden state together with an element of the output sequence. This loop repeats, so the hidden layer learns new feature information in every round while outputting the corresponding result. Reference numerals: output state Y_T0, input state X_T0, and hidden state H_T0 at time T0; output state Y_T1, input state X_T1, and hidden state H_T1 at time T1; output state Y_T2, input state X_T2, and hidden state H_T2 at time T2.

Prior art, as shown in Figure 2. In the prior art Recurrent Neural Networks module 31, the hidden layer plays a central role in the architecture: it lets the model combine the input information with the hidden state of the previous step. Figure 2 shows the basic unit of the hidden layer, which merges the previous hidden state h_{t−1} with the input information X_t at the current time point and, through an activation function, generates the output information Y_t and the new hidden state h_t.
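
As a minimal sketch of this recurrence (illustrative only; a plain tanh cell with hypothetical weights W and b, not the module claimed below):

```python
import numpy as np

def rnn_step(h_prev, x_t, W, b):
    """One prior-art RNN hidden-layer step: merge h_{t-1} with X_t,
    apply the activation function, and emit Y_t and the new h_t."""
    z = np.concatenate([h_prev, x_t])   # integrate hidden state and input
    h_t = np.tanh(W @ z + b)            # activation function
    y_t = h_t                           # output-sequence element Y_t
    return h_t, y_t

# Initialize the hidden state so every sequence starts from the same value,
# then unroll over the input sequence X_T0, X_T1, X_T2.
hidden, inputs = 8, 4
rng = np.random.default_rng(0)
W, b = rng.normal(size=(hidden, hidden + inputs)), np.zeros(hidden)
h = np.zeros(hidden)
for x in rng.normal(size=(3, inputs)):
    h, y = rnn_step(h, x, W, b)
```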

Prior art, as shown in Figure 3. The prior art Long Short-Term Memory 41 (LSTM) is an improved version of the recurrent neural network; like the traditional recurrent neural network, it takes sequence data as input. To mitigate the exploding and vanishing gradients that arise when sequences are long, the LSTM adds to the original hidden state a cell state and three gate units built from the sigmoid function, namely the forget gate, the input gate, and the output gate, so that the LSTM can better learn feature information over longer time series, as shown in Figure 3.

In the prior art Long Short-Term Memory (LSTM), at time step t the hidden state of the previous time point is first merged with the current feature information and distributed to the three gate units. The forget gate, shown in (2.1), maps the previous hidden state and current features to values F_t between 0 and 1; F_t indicates whether information in the cell state should be forgotten, which keeps the retained information relevant to the current state and removes data held for too long. The input gate, shown in (2.2) and (2.3), produces the proportion I_t by which the cell state is updated and the candidate information S_t to write into it. The output gate, shown in (2.4), decides which part of the cell state to emit, with the output proportion O_t computed by the sigmoid function. Finally, combining the gate outputs with the previous cell state and hidden state yields the LSTM output, as in (2.5) and (2.6):

F_t = σ(W_f · [h_{t−1}, X_t] + b_f)   (2.1)
I_t = σ(W_i · [h_{t−1}, X_t] + b_i)   (2.2)
S_t = tanh(W_s · [h_{t−1}, X_t] + b_s)   (2.3)
O_t = σ(W_o · [h_{t−1}, X_t] + b_o)   (2.4)
C_t = F_t ∗ C_{t−1} + I_t ∗ S_t   (2.5)
h_t = O_t ∗ tanh(C_t)   (2.6)
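
A minimal numeric sketch of equations (2.1)–(2.6) (illustrative only; the weight shapes and names such as W_f and b_f are assumptions, not the patent's parameters):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(h_prev, c_prev, x_t, W, b):
    """One LSTM step following (2.1)-(2.6). W and b hold one set of
    parameters per branch: forget f, input i, candidate s, output o."""
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, X_t]
    f_t = sigmoid(W["f"] @ z + b["f"])     # (2.1) forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])     # (2.2) input-gate proportion
    s_t = np.tanh(W["s"] @ z + b["s"])     # (2.3) candidate cell information
    o_t = sigmoid(W["o"] @ z + b["o"])     # (2.4) output gate
    c_t = f_t * c_prev + i_t * s_t         # (2.5) new cell state
    h_t = o_t * np.tanh(c_t)               # (2.6) new hidden state
    return h_t, c_t

hidden, inputs = 8, 4
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(hidden, hidden + inputs)) for k in "fiso"}
b = {k: np.zeros(hidden) for k in "fiso"}
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(h, c, rng.normal(size=inputs), W, b)
```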

Prior art, as shown in Figure 4. The prior art Gated Recurrent Unit 51 (GRU) is a representative improved version of the LSTM. The GRU reworks the LSTM's input gate and forget gate into an update gate and a reset gate respectively, merges the cell state into the hidden state, and omits the output gate, so its architecture is considerably simpler than that of a traditional LSTM, as shown in Figure 4.

Prior art, as shown in Figure 4. In terms of data flow, the previous hidden state is first combined with the current input and sent to the update gate, where the sigmoid function yields values Z_t between 0 and 1 that determine what proportion of the data is passed on, as in (2.7). The reset gate determines how much past information should be forgotten; like the update gate, it uses the sigmoid function to obtain values R_t between 0 and 1, as in (2.8). For the current data, as in (2.9), the information in the previous hidden state h_{t−1} is combined with the reset values R_t, removing the part to be forgotten and yielding the information O_t to pass on. Finally, the hidden state h_{t−1} and the current data O_t are weighted by the update proportion Z_t to obtain the new hidden state h_t, as in (2.10):

Z_t = σ(W_z · [h_{t−1}, X_t])   (2.7)
R_t = σ(W_r · [h_{t−1}, X_t])   (2.8)
O_t = tanh(W · [R_t ∗ h_{t−1}, X_t])   (2.9)
h_t = (1 − Z_t) ∗ h_{t−1} + Z_t ∗ O_t   (2.10)
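
A matching numeric sketch of equations (2.7)–(2.10) (again illustrative; weight names and shapes are assumptions):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(h_prev, x_t, Wz, Wr, W):
    """One GRU step following (2.7)-(2.10)."""
    z_in = np.concatenate([h_prev, x_t])
    z_t = sigmoid(Wz @ z_in)                                 # (2.7) update gate
    r_t = sigmoid(Wr @ z_in)                                 # (2.8) reset gate
    o_t = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))   # (2.9) current data
    h_t = (1.0 - z_t) * h_prev + z_t * o_t                   # (2.10) new hidden state
    return h_t

hidden, inputs = 8, 4
rng = np.random.default_rng(2)
Wz, Wr, W = (rng.normal(size=(hidden, hidden + inputs)) for _ in range(3))
h = gru_step(np.zeros(hidden), rng.normal(size=inputs), Wz, Wr, W)
```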

The present invention is a temporal feature assistant module for monocular 3D object detection, wherein the temporal assistant module is connected to at least one of a Recurrent Neural Networks (RNN) module, a Long Short-Term Memory (LSTM) module, and a Gated Recurrent Unit (GRU) module, and video frames of a spatio-temporal feature map are processed by the temporal assistant module. The temporal assistant module comprises: a first 2D convolution layer, which receives the hidden state (H_{t−1}) of the previous time point; a second 2D convolution layer, which receives the input state (X_t) of the current time point; a first concatenation layer, which receives the outputs of the first and second 2D convolution layers; and a third 2D convolution layer, which receives the output of the first concatenation layer. Via the temporal assistant module, the hidden state (H_t) at the current time point and the output state (Y_t) at the current time point of the RNN module, the LSTM module, and the GRU module are adjusted respectively, thereby improving the average precision (AP) for objects that are occluded, objects that move out of the detection frame, and small objects.
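
A minimal PyTorch sketch of the claimed layer layout above (a sketch only; the channel counts and kernel sizes are assumptions, as the claim does not fix them):

```python
import torch
import torch.nn as nn

class TemporalAssistantStem(nn.Module):
    """Conv stem described in the claim: one conv on H_{t-1}, one conv
    on X_t, a concatenation, then a third conv. Hyperparameters assumed."""
    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_h = nn.Conv2d(channels, channels, kernel_size, padding=pad)        # 1st 2D conv
        self.conv_x = nn.Conv2d(channels, channels, kernel_size, padding=pad)        # 2nd 2D conv
        self.conv_out = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)  # 3rd 2D conv

    def forward(self, h_prev, x_t):
        merged = torch.cat([self.conv_h(h_prev), self.conv_x(x_t)], dim=1)  # 1st concat layer
        return self.conv_out(merged)

stem = TemporalAssistantStem()
h_prev = torch.zeros(1, 64, 32, 32)   # hidden-state feature map H_{t-1}
x_t = torch.randn(1, 64, 32, 32)      # input-state feature map X_t
features = stem(h_prev, x_t)          # fed into the RNN/LSTM/GRU variant below
```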

The present invention is a temporal feature assistant module, comprising: a backbone layer, whose input receives the input data features in order to extract them; the temporal assistant module, whose input is connected to the output of the backbone layer; a neck layer, whose input is connected to the output of the temporal assistant module in order to fuse data features; and a detection head layer, whose input is connected to the output of the neck layer.

The present invention is a temporal feature assistant module, comprising: a backbone layer, whose input receives the input data features in order to extract them; a neck layer, whose input is connected to the output of the backbone layer in order to fuse data features, with the temporal assistant module placed inside the neck layer to integrate data features at different scales; and a detection head layer, whose input is connected to the output of the neck layer.

The present invention is a temporal feature assistant module, comprising: a backbone layer, whose input receives the input data features in order to extract them; a neck layer, whose input is connected to the backbone layer in order to fuse data features; the temporal assistant module, whose input is connected to the output of the backbone layer; and a detection head layer, whose input is connected to the output of the temporal assistant module.
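
The three arrangements above place the module after the backbone, inside the neck, or in front of the detection head. A schematic wiring sketch under stated assumptions (every component is a stand-in placeholder, not the patent's implementation):

```python
import torch
import torch.nn as nn

backbone = nn.Conv2d(3, 64, 3, padding=1)   # stand-in backbone layer
neck = nn.Conv2d(64, 64, 3, padding=1)      # stand-in neck layer (feature fusion)
head = nn.Conv2d(64, 16, 1)                 # stand-in detection head layer
tam = lambda f, h_prev: f + h_prev          # stand-in temporal assistant module

frame = torch.randn(1, 3, 64, 64)
h_prev = torch.zeros(1, 64, 64, 64)

feat = backbone(frame)
# Placement A: temporal assistant module right after the backbone
out_a = head(neck(tam(feat, h_prev)))
# Placement B: temporal assistant module inside the neck (between neck stages)
out_b = head(neck(tam(neck(feat), h_prev)))
# Placement C: temporal assistant module just before the detection head
out_c = head(tam(neck(feat), h_prev))
```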

The present invention is a temporal feature assistant module, wherein, for the Recurrent Neural Networks module, the third 2D convolution layer outputs to a first activation function layer, and the first activation function layer outputs both the hidden state (H_t) at the current time point and the output state (Y_t) at the current time point.

The present invention is a temporal feature assistant module, wherein the Long Short-Term Memory (LSTM) module comprises: the third 2D convolution layer outputs to a forget gate, an input gate, a second activation function layer, and an output gate, the forget gate, the input gate, and the output gate being sigmoid functions; the output of the forget gate is multiplied by the cell state of the previous time point to obtain first information (matching equation (2.5)); the output of the input gate is multiplied by the output of the second activation function layer to obtain second information; the first information and the second information are added and output both to a third activation function layer and as the cell state (C_t) at the current time point; and the output of the third activation function layer is multiplied by the output gate and then output as the hidden state (H_t) at the current time point and the output state (Y_t) at the current time point.
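
Read together with equations (2.1)–(2.6), this variant behaves like a convolutional LSTM fed by the module's third 2D convolution. A hedged sketch (the channel count and the single fused gate convolution are assumptions):

```python
import torch
import torch.nn as nn

class ConvLSTMVariant(nn.Module):
    """Sketch of the LSTM variant: the stem's third conv feeds four
    branches (forget/input/output sigmoid gates plus a tanh candidate)."""
    def __init__(self, channels=64):
        super().__init__()
        # One conv producing all four branch pre-activations from the stem output.
        self.gates = nn.Conv2d(channels, 4 * channels, 3, padding=1)

    def forward(self, stem_out, c_prev):
        f, i, s, o = torch.chunk(self.gates(stem_out), 4, dim=1)
        f, i, o = torch.sigmoid(f), torch.sigmoid(i), torch.sigmoid(o)  # sigmoid gates
        s = torch.tanh(s)                 # 2nd activation layer (candidate)
        c_t = f * c_prev + i * s          # 1st info + 2nd info -> cell state C_t
        h_t = o * torch.tanh(c_t)         # 3rd activation layer, then output gate
        return h_t, c_t                   # h_t doubles as the output state Y_t

cell = ConvLSTMVariant()
stem_out = torch.randn(1, 64, 32, 32)     # output of the 3rd 2D convolution layer
h_t, c_t = cell(stem_out, torch.zeros(1, 64, 32, 32))
```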

The present invention is a temporal feature assistant module, wherein the Gated Recurrent Unit (GRU) module comprises: the third 2D convolution layer outputs to a reset gate and an update gate, the reset gate and the update gate being sigmoid functions; the output of the reset gate is multiplied by the output of the first 2D convolution layer and fed to a second concatenation layer, whose output is fed to a fourth 2D convolution layer, whose output is fed to a fourth activation function layer; and the output of the first 2D convolution layer is multiplied by the complemented update-gate output (the (1 − Z_t) branch of equation (2.10)) to produce third information, the update-gate output is multiplied by the output of the fourth activation function layer to produce fourth information, and the third information and the fourth information are added and output as the hidden state (H_t) at the current time point and the output state (Y_t) at the current time point.

The present invention is a temporal feature assistant module, wherein video frames of a spatio-temporal feature map are processed for object detection, comprising: at least one anchor-based module, which divides the feature map into multiple grids of different scales, places the preset anchors in each grid, selects the anchor with the highest overlap, and performs object detection by adjusting the offsets.

The present invention is a temporal feature assistant module, wherein video frames of a spatio-temporal feature map are processed for object detection, comprising: at least one anchor-free module, which performs object detection by locating the center-point coordinates of an object on the feature map and predicting the distances from the center point to the top, bottom, left, and right boundaries.

The present invention is a temporal feature assistant module, wherein the hidden state (H_t) at the current time point and the output state (Y_t) at the current time point of the Recurrent Neural Networks module, the Long Short-Term Memory (LSTM) module, and the Gated Recurrent Unit (GRU) module are adjusted respectively, achieving the effect and purpose of improving the average precision (AP) for objects that are occluded, objects that move out of the detection frame, and small objects.

As shown in Figures 5 to 7, the present invention is a temporal feature assistant module 10 for monocular 3D object detection, wherein the temporal assistant module 10 is connected to at least one of a Recurrent Neural Networks module 501, a Long Short-Term Memory module 601 (LSTM module), and a Gated Recurrent Unit module 701 (GRU module), and video frames of a spatio-temporal feature map are processed by the temporal assistant module 10. The temporal assistant module 10 comprises: a first 2D convolution layer, which receives the hidden state (H_{t−1}) of the previous time point; a second 2D convolution layer, which receives the input state (X_t) of the current time point; a first concatenation layer, which receives the outputs of the first and second 2D convolution layers; and a third 2D convolution layer 56, which receives the output of the first concatenation layer. Via the temporal assistant module 10, the hidden state (H_t) at the current time point and the output state (Y_t) at the current time point of the RNN module 501, the LSTM module 601, and the GRU module 701 are adjusted respectively, thereby improving the average precision (AP) for objects that are occluded, objects that move out of the detection frame, and small objects.

As shown in Figure 5, the present invention is a temporal feature assistant module 10, wherein, for the Recurrent Neural Networks module 501, the third 2D convolution layer 56 outputs to a first activation function layer, and the first activation function layer outputs both the hidden state (H_t) at the current time point and the output state (Y_t) at the current time point.

As shown in Figure 6, the present invention is a temporal feature assistant module 10, wherein the Long Short-Term Memory module 601 (LSTM module) comprises: the third 2D convolution layer 56 outputs to a forget gate 61, an input gate 62, a second activation function layer, and an output gate 63, the forget gate 61, the input gate 62, and the output gate 63 being sigmoid functions; the output of the forget gate 61 is multiplied by the cell state of the previous time point to obtain first information; the output of the input gate 62 is multiplied by the output of the second activation function layer to obtain second information; the first information and the second information are added and output both to a third activation function layer and as the cell state (C_t) at the current time point; and the output of the third activation function layer is multiplied by the output gate 63 and then output as the hidden state (H_t) at the current time point and the output state (Y_t) at the current time point.

As shown in Figure 7, the present invention is a temporal feature assistant module 10, wherein the Gated Recurrent Unit module 701 (GRU module) comprises: the third 2D convolution layer 56 outputs to a reset gate 71 and an update gate 72, the reset gate 71 and the update gate 72 being sigmoid functions; the output of the reset gate 71 is multiplied by the output of the first 2D convolution layer and fed to a second concatenation layer 57, whose output is fed to a fourth 2D convolution layer 58, whose output is fed to a fourth activation function layer; and the output of the first 2D convolution layer is multiplied by the complemented update-gate output (the (1 − Z_t) branch of equation (2.10)) to produce third information, the output of the update gate 72 is multiplied by the output of the fourth activation function layer to produce fourth information, and the third information and the fourth information are added and output as the hidden state (H_t) at the current time point and the output state (Y_t) at the current time point.

As shown in Table 1, the present invention is a temporal feature assistant module 10. To test the improved temporal assistant module 10, a preliminary experiment was run on the VisualDet3D model: the architecture without a temporal module is defined as the Baseline, and the improved RNN, LSTM, and GRU modules are inserted at the same position. The average precision (AP) in 2D, bird's-eye view (BEV), and 3D, i.e., the 2D AP, BEV AP, and 3D AP values, serves as a preliminary measure of module effectiveness. The preliminary data are given in Table 1. Following the KITTI classification of object occlusion rates, results are split into three difficulties, E (easy), M (moderate), and H (hard), where H has the highest occlusion rate. In the numbers, red indicates the highest accuracy in each column, and bold indicates values above the Baseline.

As shown in Table 1, the present invention is a temporal feature assistant module 10. Table 1, KITTI Car:

           2D AP70↑               3D AP70↑               3D AP50↑
           E      M      H        E      M      H        E      M      H
Baseline   97.30  84.54  64.65    19.43  13.60  10.82    55.49  39.03  30.86
RNN        97.28  84.55  64.66    21.77  15.41  11.85    56.21  39.59  31.36
LSTM       97.22  84.49  67.00    21.24  15.78  12.07    59.13  41.71  32.02
GRU        97.27  86.92  67.06    20.89  14.66  11.74    57.32  41.44  31.82

From the data in Table 1, adding any of the temporal modules, RNN, LSTM, or GRU, improves the BEV and 3D accuracy. In 2D the results are not better than the Baseline, but LSTM and GRU remain on par with it. Compared with the LSTM, the RNN lacks the forget gate 61, so the relative weighting of the temporal data is neither differentiated nor traded off, which causes the object bounding box to drift, as shown in Figure 13. This preliminary experiment verifies that adding the temporal feature assistant module 10 of the present invention benefits 3D object detection, with the LSTM showing the most pronounced average improvement.

As shown in Figure 8, the present invention is a temporal feature assistant module 10. In the architecture of Figure 8, the image is first fed into the backbone layer 81 (Backbone) for feature extraction; the feature map obtained at this point contains only the feature information of the image. This feature information covers only the raw image, but it is also the richest in content. Therefore, one placement of the module is immediately after the backbone layer 81, so that feature data from different time points are integrated as fully as possible, and the integrated result is then passed to the neck layer 82 for feature processing.

As shown in Figure 8, the present invention is a temporal feature assistant module 10, comprising: a backbone layer 81, whose input receives the input data features in order to extract them; the temporal assistant module 10, whose input is connected to the output of the backbone layer 81; a neck layer 82, whose input is connected to the output of the temporal assistant module 10 in order to fuse data features; and a detection head layer 83, whose input is connected to the output of the neck layer 82.

As shown in Figure 9, the present invention is a temporal feature assistant module 10. Alternatively, the features of the backbone layer 81 are passed on into the neck layer 82, which mainly integrates features across scales: the features of different sizes produced by the backbone layer 81 are extracted, processed, and merged, yielding a feature map with multi-scale information. Since the neck layer 82 operates on feature maps of various scales, whose sizes and feature information all differ, this placement puts the temporal assistant module 10 inside the neck layer 82, as shown in Figure 9, integrating the feature maps of the different scales while preserving the original-scale features for temporal integration. After the neck layer 82 has assembled the feature map, it is sent to the detection head layer 83 for model prediction; because this feature map carries multi-scale feature information, the detection head layer 83 can handle both large and small objects better.

As shown in Figure 9, the present invention is a temporal feature assistant module 10, comprising: a backbone layer 81, whose input receives the input data features in order to extract them; a neck layer 82, whose input is connected to the output of the backbone layer 81 in order to fuse data features, with the temporal assistant module 10 placed inside the neck layer 82 to integrate data features at different scales; and a detection head layer 83, whose input is connected to the output of the neck layer 82.

As shown in Figure 10, the present invention is a temporal feature assistant module 10, here placed in front of the detection head layer 83. As shown in Figure 10, the temporal assistant module 10 integrates the fused multi-scale features, so the feature information entering the detection head layer 83 contains not only the object features of each scale but also the multi-scale object features of adjacent time points.

As shown in Figure 10, the present invention is a temporal feature assistant module 10, comprising: a backbone layer 81, whose input receives the input data features in order to extract them; a neck layer 82, whose input is connected to the backbone layer 81 in order to fuse data features; the temporal assistant module 10, whose input is connected to the output of the backbone layer 81; and a detection head layer 83, whose input is connected to the output of the temporal assistant module 10.

As shown in Table 2, the present invention is a temporal feature assistant module 10. Table 2, placement test of the assistant module (KITTI Car):

                 2D AP70↑               3D AP70↑               3D AP50↑
                 E      M      H        E      M      H        E      M      H
Baseline         97.30  84.54  64.65    19.43  13.60  10.82    55.49  39.03  30.86
After Backbone   87.39  72.21  54.78    17.09  11.25  8.58     51.78  35.05  27.01
In Neck          94.50  76.99  59.58    18.58  12.56  9.81     52.75  36.42  28.53
Before Head      97.33  82.19  64.70    21.24  15.78  12.07    59.13  41.71  32.02

As shown in Table 2, the present invention is a temporal feature assistant module 10. For the placement test, the VisualDet3D model architecture is again used and, based on the module feasibility results above, the LSTM is chosen as the module. It is placed after the backbone layer 81, inside the neck layer 82, and before the detection head layer 83, with 2D AP and 3D AP as the evaluation metrics. The results are given in Table 2, again grouped by the KITTI occlusion rates. Although a temporal module can be inserted at any of these positions, only inserting it before the detection head layer 83 actually assists the output; inserting it after the backbone layer 81 or inside the neck layer 82 brings no improvement and instead degrades the output. The present invention therefore proposes that inserting the temporal module before the detection head layer 83 is the best position found in these tests.

As shown in Figure 11, the present invention is a temporal feature assistant module 10, where Anchor Based refers to object detection methods that use preset anchors. The method divides the feature map into multiple grids of different scales, places the preset anchors in each grid, finds the anchor with the highest overlap, and performs object detection by adjusting the offsets. For anchor-based approaches the anchor design matters a great deal: if the designed anchors differ too much in size from the actual objects, the training burden grows and convergence suffers. Common anchor design methods fall into rules of thumb and data clustering: the former sets anchor sizes and parameters from the designer's past experience, while the latter sets the corresponding anchor parameters by clustering the statistics of the labeled data.
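
A toy sketch of the anchor matching just described (IoU selection plus offset regression targets; the box format and anchor sizes are assumptions):

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b) - inter)

# Preset anchors placed at one grid cell (the w, h pairs are illustrative).
cell_cx, cell_cy = 32.0, 32.0
anchors = [(cell_cx - w / 2, cell_cy - h / 2, cell_cx + w / 2, cell_cy + h / 2)
           for w, h in [(16, 16), (32, 16), (16, 32)]]

target = (20.0, 18.0, 50.0, 44.0)                  # ground-truth box
best = max(anchors, key=lambda a: iou(a, target))  # anchor with highest overlap
# Detection then regresses the offsets from `best` to the target box.
offsets = tuple(t - b for t, b in zip(target, best))
```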

As shown in Table 3, the present invention is a temporal feature assistant module 10. To verify that the proposed temporal feature assistant module can be applied to an anchor-based model, experiments use the model architecture proposed by VisualDet3D: the temporal assistant module is added before the model's detection head, so the model can integrate the image features in the observed data and feed the feature map integrated by the assistant module into the detection head for the detection task.

As shown in Table 3, the present invention is a temporal feature assistant module 10, wherein video frames of a spatio-temporal feature map are processed for object detection, comprising: at least one anchor-based module, which divides the feature map into multiple grids of different scales, places the preset anchors in each grid, selects the anchor with the highest overlap, and performs object detection by adjusting the offsets.

As shown in Table 3, the present invention is a temporal feature assistant module 10. Table 3, VisualDet3D with the temporal assistant module:

Car        2D AP70↑             BEV AP70↑            3D AP70↑             BEV AP50↑            3D AP50↑
           E     M     H        E     M     H        E     M     H        E     M     H        E     M     H
Baseline   96.75 84.07 64.66    26.66 19.35 15.06    18.96 13.73 10.72    61.64 43.95 34.17    55.85 40.14 25.40
LSTM       96.75 84.07 66.06    28.48 20.55 16.12    20.90 15.27 11.77    63.87 45.44 35.13    59.12 41.86 32.06
Diff.      0.00  0.00  +1.40    +1.82 +1.21 +1.06    +1.94 +1.54 +1.05    +2.23 +1.50 +0.97    +3.27 +1.72 +6.66

Pedestrian 2D AP50↑             BEV AP50↑            3D AP50↑             BEV AP25↑            3D AP25↑
           E     M     H        E     M     H        E     M     H        E     M     H        E     M     H
Baseline   55.98 46.22 39.29    8.39  6.71  5.09     7.44  5.83  4.64     27.13 22.07 18.48    26.34 21.35 17.56
LSTM       58.43 47.05 40.14    9.46  7.52  5.69     8.31  6.49  5.14     28.87 23.66 19.64    28.20 22.81 19.10
Diff.      +2.45 +0.83 +0.84    +1.07 +0.81 +0.60    +0.87 +0.66 +0.50    +1.74 +1.59 +1.16    +1.86 +1.47 +1.54

Cyclist    2D AP50↑             BEV AP50↑            3D AP50↑             BEV AP25↑            3D AP25↑
           E     M     H        E     M     H        E     M     H        E     M     H        E     M     H
Baseline   53.09 32.25 30.43    3.59  1.98  2.00     3.04  1.72  1.65     14.54 8.22  7.75     13.47 7.50  7.47
LSTM       54.61 33.81 31.67    4.46  2.77  2.00     3.95  2.32  2.36     16.70 9.57  9.50     15.68 9.03  8.76
Diff.      +1.52 +1.56 +1.24    +0.87 +0.79 +0.70    +0.91 +0.60 +0.71    +2.16 +1.35 +1.75    +2.21 +1.53 +1.29

mAP        2D↑                  BEV Hard↑            3D Hard↑             BEV Easy↑            3D Easy↑
           E     M     H        E     M     H        E     M     H        E     M     H        E     M     H
Baseline   68.61 54.18 44.79    12.88 9.35  7.38     9.81  7.09  5.67     34.44 24.75 20.13    31.88 23.00 16.81
LSTM       69.93 54.98 45.96    14.13 10.28 8.17     11.05 8.03  6.42     36.48 26.23 21.42    34.33 24.57 19.97
Diff.      +1.33 +0.80 +1.16    +1.25 +0.94 +0.79    +1.24 +0.93 +0.75    +2.04 +1.48 +1.29    +2.45 +1.57 +3.16

In the mAP block, "Hard" and "Easy" denote the stricter and looser IoU thresholds, averaged over the three classes.

As shown in Table 3, the present invention is a temporal feature assistant module 10. The experimental data verify that adding the temporal assistant module to an anchor-based model, although the gains vary across individual classes, improves the average prediction accuracy by about 1.4 points. Beyond the numerical validation of the assistant module against the original model, visualized results are used to verify the intended improvements for occluded objects, objects partly moved out of the frame, and small-object detection.

As shown in Figure 14, the present invention is a temporal feature assistant module 10. In the observed data (T−1), the vehicle in the middle of the frame is slightly occluded by the vehicle in front, yet a car can still roughly be seen behind it. In the current data (T), the vehicle's motion enlarges the occluded area, so the Baseline model fails to detect it; with the temporal feature assistant module 10 of the present invention added, the occluded vehicle can still be detected.

As shown in Figure 15, the present invention is a temporal feature assistant module 10. In the observed data (T−1) there is a car on the right, but by the current data (T) its motion has carried it out of the frame. With the Baseline model, which considers only the current data, the car is not detected; with the temporal feature assistant module 10 of the present invention referring back to the observed data, the car is detected thanks to the integration of temporal features.

As shown in Figure 16, the present invention is a temporal feature assistant module 10. In this embodiment, no occlusion or exit from the frame occurs between the observed data (T−1) and the current data (T), but the frame contains many small objects. In the Baseline model, which uses only the current data, detection is poor because small objects carry few features; in the model assisted by the temporal module, integrating the feature information of the observed data improves small-object detection.

As shown in Figure 17, the present invention is a temporal feature assistant module 10. Finally, in a scene without occlusion, objects leaving the frame, or small objects, none of the above special situations occur: the three objects in the frame appear both at the current time point and at past time points, and adding the temporal assistant module does not change the model's judgment. The detection results are therefore identical to those of the model without the temporal assistant module, demonstrating that adding the temporal feature assistant module 10 of the present invention does not degrade the original detection performance.

As shown in Table 3, the present invention is a temporal feature assistant module 10. The comparison of the temporal feature assistant module 10 on the VisualDet3D model, given in Table 3, verifies through the experimental data that adding the temporal feature assistant module 10 to an anchor-based model, although the gains vary across individual classes, improves the average prediction accuracy by about 1.4 points. Beyond the numerical validation of the module's effect on the original model, visualized results verify the improvements of the temporal feature assistant module 10 for occluded objects, objects partly moved out of the frame, and small-object detection.

As shown in Figure 12, the present invention is a temporal feature assistant module 10, where Anchor Free is loosely defined as any method that does not use preset anchors. Since anchor-free approaches use no anchors, no prior setup is needed: object detection proceeds by locating the center-point coordinates of an object on the feature map and predicting the distances from the center point to the top, bottom, left, and right boundaries. Anchor-free methods need no preset anchors and avoid the computational cost of filtering large numbers of them; however, without anchor information, the model converges with more difficulty when regressing the center-point and boundary-distance information.
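
A toy sketch of the anchor-free decoding just described (a center-point heatmap peak plus four boundary distances; the output layout is an assumption):

```python
import numpy as np

# Assumed model outputs on an H x W feature map: a center-point heatmap and
# per-pixel distances to the left/top/right/bottom object boundaries.
H, W = 16, 16
rng = np.random.default_rng(3)
heatmap = rng.random((H, W))
dists = rng.uniform(1, 8, size=(4, H, W))   # l, t, r, b at each location

# Find the object's center-point coordinates on the feature map.
cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
l, t, r, b = dists[:, cy, cx]

# Decode the box from the center and the four predicted distances.
box = (cx - l, cy - t, cx + r, cy + b)      # x1, y1, x2, y2
```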

As shown in Table 4, the present invention is a temporal feature assistant module 10, wherein video frames of a spatio-temporal feature map are processed for object detection, comprising: at least one anchor-free module, which performs object detection by locating the center-point coordinates of an object on the feature map and predicting the distances from the center point to the top, bottom, left, and right boundaries.

As shown in Table 4, the present invention is a temporal feature assistant module 10. Table 4, Monodle with the temporal assistant module:

Car        2D AP70↑             BEV AP70↑            3D AP70↑             BEV AP50↑            3D AP50↑
           E     M     H        E     M     H        E     M     H        E     M     H        E     M     H
Baseline   95.54 87.09 78.87    23.74 23.03 21.43    17.26 19.16 16.71    58.70 48.78 43.36    53.25 42.59 40.60
LSTM       95.92 87.37 79.10    28.19 23.49 21.82    21.20 19.77 16.99    60.99 49.71 43.92    56.71 43.65 41.47
Diff.      +0.38 +0.28 +0.23    +4.45 +0.46 +0.39    +3.94 +0.61 +0.28    +2.29 +0.93 +0.56    +3.46 +1.06 +0.87

Pedestrian 2D AP50↑             BEV AP50↑            3D AP50↑             BEV AP25↑            3D AP25↑
           E     M     H        E     M     H        E     M     H        E     M     H        E     M     H
Baseline   74.38 59.74 51.27    8.94  7.70  6.99     6.90  7.13  5.44     28.26 24.44 19.39    27.09 23.22 18.62
LSTM       66.21 64.13 56.02    8.32  6.52  6.34     6.70  6.03  5.62     29.17 25.19 23.53    28.84 24.76 20.55
Diff.      -8.17 +4.39 +4.75    -0.62 -1.18 -0.65    -0.20 -1.10 +0.18    +0.91 +0.75 +4.14    +1.75 +1.54 +1.93

Cyclist    2D AP50↑             BEV AP50↑            3D AP50↑             BEV AP25↑            3D AP25↑
           E     M     H        E     M     H        E     M     H        E     M     H        E     M     H
Baseline   67.55 45.55 45.09    8.79  5.48  5.49     7.20  5.40  5.40     23.67 15.25 14.01    23.43 15.05 13.80
LSTM       70.25 46.32 45.85    7.96  5.65  5.65     6.51  5.50  5.51     23.15 14.17 13.43    23.15 14.17 13.43
Diff.      +2.70 +0.77 +0.76    -0.83 +0.17 +0.16    -0.69 +0.10 +0.11    -0.52 -1.08 -0.58    -0.28 -0.88 -0.37

mAP        2D↑                  BEV Hard↑            3D Hard↑             BEV Easy↑            3D Easy↑
           E     M     H        E     M     H        E     M     H        E     M     H        E     M     H
Baseline   79.16 64.13 58.41    13.82 12.07 11.30    10.45 10.56 9.18     36.88 29.49 25.59    34.59 26.95 24.34
LSTM       77.46 65.94 60.32    14.82 11.89 11.27    11.47 10.43 9.37     37.77 29.69 26.96    36.23 27.53 25.15
Diff.      -1.70 +1.81 +1.91    +1.00 -0.18 -0.03    +1.02 -0.13 +0.19    +0.89 +0.20 +1.37    +1.64 +0.58 +0.81

In the mAP block, "Hard" and "Easy" denote the stricter and looser IoU thresholds, averaged over the three classes.

As shown in Table 4, the present invention is a temporal feature assistant module 10. Analysis of the experimental data shows that adding the temporal assistant module under the anchor-free model architecture improves the prediction accuracy by 0.62 points on average, with the gain on vehicles being the most stable and pronounced among the individual classes. Besides the numerical comparison, visualized data are presented for occluded objects, objects moved out of the frame by their motion, and small-object detection, showing that adding the proposed temporal assistant module improves the anchor-free model's detection in these situations.

As shown in FIG. 18, the present invention is a temporal assistant module 10, wherein, in the object-occlusion case, the predictions of the Baseline model without temporal-feature assistance do not include the occluded vehicle, whereas the model with the temporal assistant module detects the occluded vehicle because it additionally references the feature information in the observed past frames.

As shown in FIG. 19, the present invention is a temporal assistant module 10, wherein, in the case of an object moving out of the frame, the Baseline model without temporal-feature assistance detects objects only from the image features of the current frame, so when an object moves out of the frame the incomplete feature information prevents it from being detected accurately; the model with the temporal assistant module references the feature information of the object recorded before it left the frame, so the object can still be detected even after it has partially moved out of the frame.

As shown in FIG. 20, the present invention is a temporal assistant module 10, wherein, for small-object detection, this case involves little occlusion or movement out of the frame, but the object appears small because it is far from the capture device; in the model assisted by temporal features, the observed data reinforce the features of the current small object, so the detection of small objects is improved.

As shown in FIG. 21, the present invention is a temporal assistant module 10, wherein, in scenes without occlusion, without objects moving out of the frame, and without small objects, none of the above special situations occur and the objects in the frame appear stably, so the detection results are essentially unchanged before and after adding the temporal assistant module.

As shown in Table 5, the present invention is a temporal assistant module 10; Table 5 compares monocular 3D object detection models on 3D AP70 (E/M/H = Easy/Moderate/Hard; the Depth and Temporal columns indicate the extra data each model uses):

| Method | Depth | Temporal | Car (E / M / H) | Pedestrian (E / M / H) | Cyclist (E / M / H) |
|---|---|---|---|---|---|
| CaDDN | V | | 24.87 / 15.63 / 14.47 | 16.51 / 13.37 / 12.21 | 9.68 / 9.09 / 9.09 |
| Kinematic3D | | Result | 13.01 / 9.43 / 7.38 | 1.19 / 0.57 / 0.57 | 0.00 / 0.00 / 0.00 |
| VisualDet3D | | | 19.43 / 13.60 / 10.82 | 6.94 / 5.11 / 4.31 | 2.44 / 1.41 / 1.43 |
| Monodle | | | 17.26 / 19.16 / 16.71 | 6.90 / 7.13 / 5.44 | 7.20 / 5.40 / 5.40 |
| VisualDet3D | | LSTM | 21.24 / 15.78 / 12.07 | 7.94 / 6.08 / 4.92 | 4.55 / 2.15 / 2.27 |
| Monodle | | LSTM | 21.20 / 19.77 / 16.99 | 6.70 / 6.03 / 5.62 | 6.51 / 5.50 / 5.51 |

As shown in Table 5, having verified that the present invention generalizes across model architectures, this section compares models equipped with the proposed assistant module against current state-of-the-art 3D object detection models. Monocular 3D object detection methods were chosen as comparison targets, preferring models that use depth information only during training or that do not use depth information at all: among the depth-using models, CaDDN serves as the comparison target, while Kinematic3D, Monodle, and VisualDet3D represent the models that use no depth information, and the temporal module was added to the two depth-free models for comparison. The experimental results are shown in Table 5, which is divided into two parts: the upper half shows the original model architectures, and the lower half shows the effect of adding the proposed temporal feature assistant module.

The above description and explanation cover only preferred embodiments of the present invention. A person of ordinary skill in the art may make other modifications in accordance with the scope of the claims defined below and the above description; such modifications nevertheless remain within the spirit of the invention and within the scope of its rights.

Y T0: output state information at time point T0
X T0: input state information at time point T0
H T0: hidden state information at time point T0
Y T1: output state information at time point T1
X T1: input state information at time point T1
H T1: hidden state information at time point T1
Y T2: output state information at time point T2
X T2: input state information at time point T2
H T2: hidden state information at time point T2
X t: input state information at the current time point
Y t: output state information at the current time point
H t-1: hidden state information at the previous time point
H t: hidden state information at the current time point
C t-1: cell state at the previous time point
C t: cell state at the current time point
21: prior-art recurrent neural network module
31: prior-art long short-term memory module
41: prior-art gated recurrent unit module
501: Recurrent Neural Networks module (RNN module)
601: Long Short-Term Memory module (LSTM module)
701: Gated Recurrent Unit module (GRU module)
11: hidden layer
51: 1st activation function layer
64: 2nd activation function layer
65: 3rd activation function layer
73: 4th activation function layer
54: 1st 2D convolution layer
53: 2nd 2D convolution layer
56: 3rd 2D convolution layer
58: 4th 2D convolution layer
55: 1st connection layer
57: 2nd connection layer
61: forget gate
62: input gate
63: output gate
71: reset gate
72: update gate
81: backbone network layer
82: neck layer
10: temporal assistant module
83: detection head layer

FIG. 1 is a flow diagram of a prior-art Recurrent Neural Network.
FIG. 2 is a schematic diagram of a prior-art Recurrent Neural Networks cell.
FIG. 3 is a schematic diagram of a prior-art Long Short-Term Memory cell (LSTM cell).
FIG. 4 is a schematic diagram of a prior-art Gated Recurrent Unit cell (GRU cell).
FIG. 5 is a schematic diagram of the Recurrent Neural Network of the present invention.
FIG. 6 is a schematic diagram of the Recurrent Neural Networks cell of the present invention.
FIG. 7 is a schematic diagram of the Long Short-Term Memory cell (LSTM cell) of the present invention.
FIG. 8 is a schematic diagram of the present invention with the temporal module inserted after the backbone layer.
FIG. 9 is a schematic diagram of the present invention with the temporal module inserted in the neck layer.
FIG. 10 is a schematic diagram of the present invention with the temporal module inserted before the detection head layer.
FIG. 11 is a schematic diagram of anchor-based object detection in the present invention.
FIG. 12 is a schematic diagram of anchor-free object detection in the present invention.
FIG. 13 is a schematic diagram of the problem that arises when the temporal module omits the forget gate.
FIG. 14 is a schematic diagram of the assistance effect for occluded objects after adding the temporal module to VisualDet3D.
FIG. 15 is a schematic diagram of the assistance effect for objects moving out of the frame after adding the temporal module to VisualDet3D.
FIG. 16 is a schematic diagram of the assistance effect for small-object detection after adding the temporal module to VisualDet3D.
FIG. 17 is a schematic diagram of the detection results in the general case after adding the temporal module to VisualDet3D.
FIG. 18 is a schematic diagram of the assistance effect for occluded objects after adding the temporal module to Monodle.
FIG. 19 is a schematic diagram of the assistance effect for objects moving out of the frame after adding the temporal module to Monodle.
FIG. 20 is a schematic diagram of the assistance effect for small-object detection after adding the temporal module to Monodle.
FIG. 21 is a schematic diagram of the detection results in the general case after adding the temporal module to Monodle.

Xt: input state information at the current time point
Yt: output state information at the current time point
Ht-1: hidden state information at the previous time point
Ht: hidden state information at the current time point
Ct-1: cell state at the previous time point
Ct: cell state at the current time point
64: 2nd activation function layer
65: 3rd activation function layer
54: 1st 2D convolution layer
53: 2nd 2D convolution layer
55: 1st connection layer
56: 3rd 2D convolution layer
57: 2nd connection layer
61: forget gate
62: input gate
63: output gate
71: reset gate
72: update gate
73: 4th activation function layer

Claims (6)

1. A temporal assistant module for monocular 3D object detection, wherein the temporal assistant module is connected to at least one of a Recurrent Neural Networks module (RNN module), a Long Short-Term Memory module (LSTM module), and a Gated Recurrent Unit module (GRU module), and wherein video frames carrying a spatio-temporal feature map are processed by the temporal assistant module, the temporal assistant module comprising:
a 1st 2D convolution layer, to which hidden state information of the previous time point (Ht-1) is input;
a 2nd 2D convolution layer, to which input state information of the current time point (Xt) is input;
a 1st connection layer, to which the 1st 2D convolution layer and the 2nd 2D convolution layer each output; and
a 3rd 2D convolution layer, to which the 1st connection layer outputs;
wherein, via the temporal assistant module, hidden state information of the current time point (Ht) and output state information of the current time point (Yt) of the RNN module, the LSTM module, and the GRU module are respectively adjusted, so as to raise the average precision (AP) of the assistance effect for objects that are occluded, objects that move out of the detection frame, and small objects;
wherein, in the RNN module, the 3rd 2D convolution layer outputs to a 1st activation function layer, which outputs the hidden state information of the current time point (Ht) and the output state information of the current time point (Yt);
wherein the LSTM module comprises: the 3rd 2D convolution layer outputs respectively to a forget gate, an input gate, a 2nd activation function layer, and an output gate, the forget gate, the input gate, and the output gate being sigmoid functions; the cell state of the previous time point (Ct-1) is multiplied by the forget-gate output to obtain a 1st information, the input-gate output is multiplied by the 2nd activation function layer output to obtain a 2nd information, and the sum of the 1st information and the 2nd information is output to a 3rd activation function layer and as the cell state of the current time point (Ct); and the 3rd activation function layer output is multiplied by the output-gate output to yield the hidden state information of the current time point (Ht) and the output state information of the current time point (Yt);
wherein the GRU module comprises: the 3rd 2D convolution layer outputs respectively to a reset gate and an update gate, the reset gate and the update gate being sigmoid functions; the reset-gate output is multiplied by the output of the 1st 2D convolution layer and fed to a 2nd connection layer, the output of the 2nd connection layer is fed to a 4th 2D convolution layer, and the output of the 4th 2D convolution layer is fed to a 4th activation function layer; and the output of the 1st 2D convolution layer is multiplied by the complemented update-gate output to yield a 3rd information, the update-gate output is multiplied by the 4th activation function layer output to yield a 4th information, and the sum of the 3rd information and the 4th information yields the hidden state information of the current time point (Ht) and the output state information of the current time point (Yt).
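For illustration only (not part of the claims), the following is a minimal PyTorch sketch of the shared front end of claim 1 with the RNN variant as the recurrent head. The channel count, the kernel size, and the choice of tanh for the 1st activation function layer are assumptions not fixed by the claim; the LSTM and GRU variants would replace the head with the gate logic described above.

```python
import torch
import torch.nn as nn

class TemporalAssistantRNN(nn.Module):
    """Sketch of the claim-1 front end plus the RNN head.

    Front end: H_{t-1} -> 1st 2D conv, X_t -> 2nd 2D conv, channel
    concatenation (1st connection layer), 3rd 2D conv. Head: a single
    activation layer emitting H_t and Y_t. Hyperparameters are assumed.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, 3, padding=1)  # 1st 2D conv (on H_{t-1})
        self.conv_x = nn.Conv2d(channels, channels, 3, padding=1)  # 2nd 2D conv (on X_t)
        self.conv_fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)  # 3rd 2D conv
        self.act = nn.Tanh()  # 1st activation function layer (tanh is an assumption)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor):
        fused = torch.cat([self.conv_h(h_prev), self.conv_x(x_t)], dim=1)  # 1st connection layer
        h_t = self.act(self.conv_fuse(fused))  # H_t for the current time point
        return h_t, h_t                        # (H_t, Y_t); the RNN head shares one value
```

A typical call would be `h_t, y_t = TemporalAssistantRNN(64)(x_t, h_prev)`, with `x_t` and `h_prev` both (B, 64, H, W) feature maps.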
2. The temporal assistant module of claim 1, further comprising: a backbone layer, the input of which receives input data features so as to extract the input data features; the input of the temporal assistant module being connected to the output of the backbone layer; a neck layer, the input of which is connected to the output of the temporal assistant module so as to fuse data features; and a detection head layer, the output of the neck layer being connected to the input of the detection head layer.
3. The temporal assistant module of claim 1, further comprising: a backbone layer, the input of which receives input data features so as to extract the input data features; a neck layer, the input of which is connected to the output of the backbone layer so as to fuse data features; the temporal assistant module being placed inside the neck layer so as to integrate data features of different scales; and a detection head layer, the output of the neck layer being connected to the input of the detection head layer.
4. The temporal assistant module of claim 1, further comprising: a backbone layer, the input of which receives input data features so as to extract the input data features; a neck layer, the input of which is connected to the backbone layer so as to fuse data features; the input of the temporal assistant module being connected to the output of the neck layer; and a detection head layer, the output of the temporal assistant module being connected to the input of the detection head layer.
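For illustration only, a minimal sketch of the claim-2 placement, where the temporal module sits between the backbone and the neck and its hidden state is carried across video frames. Here `backbone`, `temporal_module`, `neck`, and `head` are stand-in callables for an arbitrary detector's stages, and zero-initializing the hidden state with the backbone's feature shape is an assumption; the claim-3 and claim-4 placements would move the `temporal_module` call inside the neck or just before the head, respectively.

```python
import torch

def detect_sequence(frames, backbone, temporal_module, neck, head):
    """Run a detector over a video clip with the temporal assistant
    module inserted after the backbone (the claim-2 placement)."""
    h_t = None
    results = []
    for frame in frames:                         # frame: (B, 3, H, W) image tensor
        feat = backbone(frame)                   # extract input data features
        if h_t is None:
            h_t = torch.zeros_like(feat)         # assumed initialization of H_0
        h_t, y_t = temporal_module(feat, h_t)    # inject temporal features
        results.append(head(neck(y_t)))          # fuse features, then predict boxes
    return results
```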
5. The temporal assistant module of claim 1, wherein processing the video frames of the spatio-temporal feature map for object detection comprises: at least one anchor-based module, which divides the feature map into a plurality of grids of different scales, places the preset anchors in each grid, selects the anchor with the highest overlap ratio, and performs object detection by regressing offsets.
6. The temporal assistant module of claim 1, wherein processing the video frames of the spatio-temporal feature map for object detection comprises: at least one anchor-free module, which performs object detection by locating the center-point coordinates of an object on the feature map and predicting the individual distances from the center point to the top, bottom, left, and right boundaries.
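For illustration only, a minimal sketch of the claim-6 anchor-free decoding: peaks in a center-point heatmap give object centers, and a 4-channel map gives the distances to the left, top, right, and bottom boundaries. The tensor layout, the score threshold, and the use of a simple threshold test in place of a proper peak-suppression step are assumptions.

```python
import torch

def decode_anchor_free(heatmap: torch.Tensor, ltrb: torch.Tensor,
                       score_thresh: float = 0.3):
    """Decode boxes from a center heatmap (H, W) and a boundary-distance
    map (4, H, W), following the claim-6 anchor-free scheme."""
    ys, xs = torch.where(heatmap > score_thresh)   # candidate center points
    boxes = []
    for y, x in zip(ys.tolist(), xs.tolist()):
        l, t, r, b = ltrb[:, y, x].tolist()        # distances to each boundary
        boxes.append((x - l, y - t, x + r, y + b,  # (x1, y1, x2, y2)
                      heatmap[y, x].item()))       # confidence score
    return boxes  # coordinates are in feature-map scale
```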
TW113134609A 2024-09-12 2024-09-12 Temporal assistant module TWI887115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW113134609A TWI887115B (en) 2024-09-12 2024-09-12 Temporal assistant module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW113134609A TWI887115B (en) 2024-09-12 2024-09-12 Temporal assistant module

Publications (1)

Publication Number Publication Date
TWI887115B true TWI887115B (en) 2025-06-11

Family

ID=97227696

Family Applications (1)

Application Number Title Priority Date Filing Date
TW113134609A TWI887115B (en) 2024-09-12 2024-09-12 Temporal assistant module

Country Status (1)

Country Link
TW (1) TWI887115B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230244981A1 (en) * 2021-04-16 2023-08-03 Strong Force Vcn Portfolio 2019, Llc Ion-Trapping Quantum Computing Task Execution
US20230252776A1 (en) * 2020-12-18 2023-08-10 Strong Force Vcn Portfolio 2019, Llc Variable-Focus Dynamic Vision for Robotic System
US20240144011A1 (en) * 2019-11-05 2024-05-02 Strong Force Vcn Portfolio 2019, Llc Systems, methods, kits, and apparatuses for using artificial intelligence for instructing smart machines in value chain networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240144011A1 (en) * 2019-11-05 2024-05-02 Strong Force Vcn Portfolio 2019, Llc Systems, methods, kits, and apparatuses for using artificial intelligence for instructing smart machines in value chain networks
US20230252776A1 (en) * 2020-12-18 2023-08-10 Strong Force Vcn Portfolio 2019, Llc Variable-Focus Dynamic Vision for Robotic System
US20230244981A1 (en) * 2021-04-16 2023-08-03 Strong Force Vcn Portfolio 2019, Llc Ion-Trapping Quantum Computing Task Execution
US12039559B2 (en) * 2021-04-16 2024-07-16 Strong Force Vcn Portfolio 2019, Llc Control tower encoding of cross-product data structure

Similar Documents

Publication Publication Date Title
CN109934115B (en) Face recognition model construction method, face recognition method and electronic equipment
CN108257158B (en) Target prediction and tracking method based on recurrent neural network
CN106960195A (en) A kind of people counting method and device based on deep learning
KR102138680B1 (en) Apparatus for Video Recognition and Method thereof
CN110956069A (en) Pedestrian 3D position detection method and device and vehicle-mounted terminal
CN112507862A (en) Vehicle orientation detection method and system based on multitask convolutional neural network
CN111461221A (en) A multi-source sensor fusion target detection method and system for autonomous driving
CN108846328A (en) Lane detection method based on geometry regularization constraint
CN114494395B (en) Depth map generation method, device, equipment and storage medium based on plane prior
CN115880658B (en) Early warning method and system for lane departure of automobile in night scene
CN106971178A (en) Pedestrian detection and the method and device recognized again
CN111079739A (en) Multi-scale attention feature detection method
US20180173982A1 (en) System and method for 1d root association providing sparsity guarantee in image data
CN112249021B (en) A road pedestrian collision risk prediction method and system
KR101869266B1 (en) Lane detection system based on extream learning convolutional neural network and method thereof
CN115731530A (en) Model training method and device
CN114842439A (en) Cross-perception-device vehicle identification method and device, electronic device and storage medium
CN110738076A (en) People counting method and system in images
CN114708562A (en) A Bicycle Helmet Detection Method Based on Improved FCOS and Embedding Grouping
CN110795975B (en) Face false detection optimization method and device
CN112613427B (en) Road obstacle detection method based on visual information flow partition projection coding model
CN113139567B (en) Information processing device and control method thereof, vehicle, recording medium, information processing server, information processing method
CN117830354A (en) Track acquisition method, track acquisition device, computer equipment and storage medium
TWI887115B (en) Temporal assistant module
CN116452828A (en) Target detection method, electronic device and storage medium