CN117011337A - A lightweight target tracking method - Google Patents
- Publication number
- CN117011337A (application CN202310968693.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- template
- feature
- inputting
- search image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/809—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a lightweight target tracking method. First, a video for target tracking is obtained, an initial template and a dynamic template are defined, and a template set is constructed. Next, a search image is read, the template set and the search image are input into a preprocessing module to obtain an output result a, and a is input into a target prediction module to predict the center point and bounding box of the target. The search feature is then input into an online target predictor to obtain a prediction result b, which is linearly fused with the predicted center point to obtain the final center-point prediction, completing target tracking. Finally, the search image and the prediction result b are input into a template updater, which decides according to an update strategy whether to adopt the search image as the new dynamic template. The invention reduces the computational cost and parameter count of the model, strengthens the ability to handle scenes with complex environments and large target changes, and improves tracking success rate and accuracy.
Description
Technical Field
The invention belongs to the field of target tracking in computer vision, and particularly relates to a lightweight target tracking method.
Background
Target tracking is an important research direction in computer vision and is widely applied in video surveillance, autonomous driving, human-computer interaction and other fields. A target tracking algorithm aims to automatically track a specified target, typically an object or a pedestrian, in a video sequence. In recent years, the development of deep learning has significantly improved the accuracy of neural-network-based target tracking algorithms, which have become the mainstream approach.
The fully convolutional classification-regression network SiamCAR is a deep-learning-based target tracking algorithm. It first adopts a Siamese (twin) network structure: a weight-sharing feature extraction network processes the template image and the search image separately, multiple intermediate feature layers are fused into two feature maps, and a similarity feature map between the two is computed. Next, the similarity feature map is enhanced by a feature enhancement network stacked from several layers of ordinary convolution. Finally, the center point and the scale of the target are predicted from the enhanced similarity feature map to realize tracking. However, the SiamCAR model has a complex structure with a large computational cost and parameter count, which limits its use in scenarios with constrained computing resources, and it does not update the template image during tracking, so its accuracy is low in scenes with complex environments and large target changes.
Disclosure of Invention
Aiming at these problems, the invention provides a lightweight target tracking method based on a feature enhancement network that fuses depthwise separable convolution with CBAM attention, together with a run-time template updating method; it optimizes the SiamCAR algorithm and reduces the computational cost and parameter count of the model.
A lightweight target tracking method comprises the following steps:
Step 1: obtain the video on which target tracking is to be performed, define the first frame of the video as the initial template, copy the initial template as the initial dynamic template, and put the initial template and the dynamic template into a set to construct the template set. Each frame except the first is used in turn as the search image for target tracking.
Step 2: read the search image and input the template set and the search image into the preprocessing module to obtain an output result a. The preprocessing module comprises a feature extractor and a feature similarity calculator.
Step 2.1: in the preprocessing module, extract features from the template set with a feature extractor t1 to obtain the initial feature and the dynamic feature;
extract features from the search image with a feature extractor t2 identical to t1 to obtain the search feature.
The two feature extractors t1 and t2 are based on ResNet50 with the last network layer removed.
Step 2.2: input the three features extracted in step 2.1 into the feature similarity calculator, which computes the similarity between each template feature (initial and dynamic) and the search feature and selects the result with the highest similarity as the output result a.
Step 3: input the output result a into the target prediction module to predict the center point and bounding box of the target.
Step 3.1: in the target prediction module, input a into a feature enhancer b1, and then input the output feature of b1 into the target box predictor to obtain the bounding box of the target.
Step 3.2: input a into a feature enhancer b2 identical to b1, and then input the output feature of b2 into the target center predictor to obtain the center point of the target.
Step 4: input the search feature into the online target predictor to obtain a prediction result b.
Then linearly fuse b with the center-point prediction of step 3.2 to obtain the final center-point prediction, and complete target tracking using the bounding box from step 3.1 together with the final center-point prediction.
Step 5: input the search image and the prediction result b from step 4 into the template updater, which decides according to an update strategy whether to adopt the search image as the new dynamic template, thereby updating the dynamic template.
The invention makes the following contributions:
1. The feature enhancement network in the feature enhancer, built from depthwise separable convolution fused with CBAM attention, reduces the computational cost and parameter count of the model.
2. A template updating operation is added in the tracking stage, which strengthens the algorithm's ability to handle scenes with complex environments and large target changes and improves the success rate and accuracy of tracking.
3. The invention reduces the model's computational cost by 68.3% and its parameter count by 71%, while improving tracking success rate and precision on the public dataset OTB100.
Drawings
In order to more intuitively illustrate the details of the invention, a brief description of the drawings will follow.
FIG. 1 is an overall network architecture diagram of the present invention;
FIG. 2 is a flow chart of a specific operation of the present invention;
FIG. 3 is a graph of the success rate (intersection-over-union) curves of the present invention and SiamCAR on the OTB100 dataset;
FIG. 4 is a graph of the precision curves of the present invention and SiamCAR on the OTB100 dataset.
Detailed Description
The following describes the lightweight target tracking method with reference to the accompanying drawings. The overall network structure of the invention is shown in FIG. 1 and the operation flow chart in FIG. 2; the specific steps are as follows:
Step 1: obtain the video on which target tracking is to be performed, define the first frame of the video as the initial template, copy the initial template as the initial dynamic template, and place the two templates in a set defined as the template set. Each frame except the first is used in turn as the search image for target tracking.
Step 2: as shown in FIG. 1, read the search image and input the template set and the search image into the preprocessing module. The preprocessing module consists of a feature extractor and a feature similarity calculator, which respectively extract the image features and compute the similarity of the input features for subsequent prediction.
First, the template set and the search image are each passed through the same feature extractor: the template set yields the initial feature and the dynamic feature, and the search image yields the search feature. The feature extractor is based on ResNet50 with the last network layer removed, because the tracking task does not need a classification output during feature extraction and coarser-grained features better meet its requirements. Moreover, unlike SiamCAR, which fuses multiple intermediate features, the present invention uses only the final output of the truncated ResNet50 for subsequent operations. This reduces the parameter count and computation of the feature extractor.
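A minimal sketch of such a truncated backbone, assuming a PyTorch/torchvision environment (the patent does not name a framework, and the exact truncation point of ResNet50 is an assumption read from the text):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50  # assumes torchvision >= 0.13

class BackboneFeatureExtractor(nn.Module):
    """ResNet50 trunk with the final pooling/classification stage removed,
    so the output is a spatial feature map rather than a class vector."""
    def __init__(self):
        super().__init__()
        base = resnet50(weights=None)
        # keep conv1 ... layer4, drop the average pooling and fc layers
        self.trunk = nn.Sequential(*list(base.children())[:-2])

    def forward(self, x):
        return self.trunk(x)

# The same weight-shared extractor is applied to the initial template,
# the dynamic template and the search image.
extractor = BackboneFeatureExtractor().eval()
with torch.no_grad():
    search_feat = extractor(torch.randn(1, 3, 255, 255))
print(search_feat.shape)  # roughly (1, 2048, 8, 8) for a 255x255 crop
```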
Then the feature similarity calculator computes the similarity between the search feature and the features extracted from the initial template and the dynamic template, and selects the result with the highest similarity as the output result a. The calculator is a single convolution layer: the initial-template feature and the dynamic-template feature are each used as a convolution kernel and convolved over the search feature, which realizes the similarity computation and yields two similarity maps; the map with the larger similarity peak is taken as the final output result a.
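The similarity step can be sketched as follows, assuming the template feature acts directly as a convolution kernel over the search feature; the channel counts and spatial sizes are illustrative, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def cross_correlation(search_feat, template_feat):
    """Convolve the search feature map with the template feature map used
    as the kernel (the single-layer similarity computation described above).
    search_feat: (1, C, Hs, Ws); template_feat: (1, C, Ht, Wt)."""
    return F.conv2d(search_feat, template_feat)  # -> (1, 1, Hs-Ht+1, Ws-Wt+1)

def select_output_a(search_feat, init_feat, dyn_feat):
    """Compute both similarity maps and keep the one with the larger peak."""
    r_init = cross_correlation(search_feat, init_feat)
    r_dyn = cross_correlation(search_feat, dyn_feat)
    return r_init if r_init.max() >= r_dyn.max() else r_dyn

# Illustrative shapes only; a depthwise variant (groups=C) would keep the
# channel dimension if the later feature enhancer expects multi-channel input.
a = select_output_a(torch.randn(1, 256, 31, 31),
                    torch.randn(1, 256, 7, 7),
                    torch.randn(1, 256, 7, 7))
```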
Step 3: input a into the target prediction module to predict the center point and bounding box of the target. The target prediction module consists of a feature enhancer, a target box predictor and a target center predictor. The feature enhancer enhances the output result a from step 2 to improve the accuracy and precision of the subsequent predictions. Its network structure is built from depthwise separable convolution and CBAM attention; the complete structure is, in order, depthwise separable convolution, group normalization, ReLU activation, CBAM attention, group normalization and ReLU activation, where the depthwise separable convolution uses a 3×3 kernel. The computation of the depthwise separable convolution is reduced to 1/9 of that of the ordinary convolution used in SiamCAR. Group normalization reduces the influence of the batch-size parameter on the accuracy of the invention. CBAM attention improves accuracy at the cost of only a small additional computation; its spatial-attention convolution uses a 7×7 kernel.
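A sketch of such an enhancer is shown below; the CBAM block is a compact re-implementation of the cited attention module, and the channel count and number of normalization groups are assumptions not stated in the patent:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Compact CBAM-style attention: channel attention followed by a 7x7
    spatial attention, as assumed from the description above."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)       # channel attention
        sp = torch.cat([x.mean(dim=1, keepdim=True),
                        x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(sp))              # spatial attention

class FeatureEnhancer(nn.Module):
    """Depthwise-separable 3x3 conv -> GroupNorm -> ReLU -> CBAM ->
    GroupNorm -> ReLU (a sketch; channel count and group number are
    illustrative assumptions)."""
    def __init__(self, channels=256, groups=32):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.gn1 = nn.GroupNorm(groups, channels)
        self.cbam = CBAM(channels)
        self.gn2 = nn.GroupNorm(groups, channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.act(self.gn1(self.pointwise(self.depthwise(x))))
        x = self.cbam(x)
        return self.act(self.gn2(x))
```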
First, a is input into the feature enhancer b1, and the output feature of b1 is input into the target center predictor to obtain the center point of the target. The target center predictor is a single convolution layer with a 3×3 kernel; it takes the enhanced feature as input and outputs the predicted target center point, which determines the center position of the target.
In parallel, a is input into a feature enhancer b2 identical to b1, and the output feature of b2 is input into the target box predictor to obtain the bounding box of the target, which represents the scale of the target. The target box predictor is likewise a single convolution layer with a 3×3 kernel; it takes the enhanced feature as input and outputs the predicted bounding box, which frames the scale of the target.
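The two prediction heads could look like the following sketch; the output channel counts (1 for the center map, 4 box offsets per location, as in typical anchor-free trackers) are assumptions, since the patent only specifies a single 3×3 convolution for each head:

```python
import torch.nn as nn

class CenterHead(nn.Module):
    """Single 3x3 convolution producing a center-point response map."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

class BoxHead(nn.Module):
    """Single 3x3 convolution regressing four box offsets at each location."""
    def __init__(self, channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels, 4, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)
```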
Step 4: input the search image into the template updating module and output a prediction result b, which is linearly fused with the prediction result of step 3.2 to obtain the final center-point prediction. As shown in FIG. 1, the template updating module consists of a feature extractor, an online target predictor and a template updater. The feature extractor is the same as the one used in step 2. The online target predictor is the Classifier used in the DROL tracking algorithm, which consists of attention and ordinary convolutions in the order: ordinary convolution (1×1 kernel), CBAM attention module, ordinary convolution (4×4 kernel). The Classifier provides an additional prediction at run time to assist the target box predictor and target center predictor of step 3 and improve accuracy. The template updater combines the confidence score given by the online target predictor with an update strategy set inside it to decide whether to adopt the search image as the new dynamic template.
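A rough sketch of this 1×1 convolution - attention - 4×4 convolution layout follows; the channel counts are assumptions, and the DROL Classifier itself is not reproduced here:

```python
import torch.nn as nn

class OnlinePredictor(nn.Module):
    """Ordinary 1x1 convolution -> attention -> ordinary 4x4 convolution,
    producing a confidence map over candidate target-center positions.
    The attention stage should be the CBAM block sketched earlier; an
    nn.Identity placeholder keeps this snippet self-contained."""
    def __init__(self, in_channels=2048, mid_channels=256, attention=None):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        self.attn = attention if attention is not None else nn.Identity()
        self.head = nn.Conv2d(mid_channels, 1, kernel_size=4)

    def forward(self, x):
        return self.head(self.attn(self.reduce(x)))

# e.g. OnlinePredictor(attention=CBAM(256)) with the CBAM class sketched above
```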
Further, the template updating module takes the search image as input. The search feature is first input into the online target predictor to obtain a confidence score map for the target center, which has the same size as the output of the center predictor in step 3; the two are then linearly fused to obtain the final center prediction. The linear fusion formula is as follows:
O = (1 − λ)·O_cen + λ·O_C
where O is the final center prediction, O_cen is the output of the target center predictor, O_C is the output of the online target predictor, and λ ∈ [0, 1] is the fusion weight.
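In code the fusion is a single weighted sum; the value of λ below is only a placeholder, since the patent does not state the weight actually used:

```python
def fuse_center_predictions(o_cen, o_c, lam=0.5):
    """O = (1 - lam) * O_cen + lam * O_C, with lam in [0, 1]."""
    return (1 - lam) * o_cen + lam * o_c
```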
Step 5: the template updating module decides according to the update strategy whether to adopt the search image as the new dynamic template, and the procedure returns to step 2. The update strategy is as follows. First, counting from the start of the tracking method, for every T_u frames, select the search image with the highest online-target-predictor confidence score within those T_u frames, denote it c, and record the corresponding prediction result d from step 3. T_u is a positive integer denoting the dynamic-template update interval; the invention sets T_u = 5. Then compute the IOU score, i.e. the intersection-over-union between the rectangular prediction box assembled from the prediction results and the actual bounding box of the target. If the IOU is greater than the set threshold TAU_R (the invention sets TAU_R = 0.6), the search image c is adopted as the new dynamic template; otherwise the dynamic template is not updated. This update strategy distinguishes the present invention from SiamCAR, which uses only the initial template: a dynamic template is also incorporated and updated in real time during operation, and the IOU result is used as a gate to prevent samples that are harmful to tracking from polluting the subsequent tracking process. In this way the dynamic template provides the algorithm with the latest state (center position and size) of the tracked target, so that the similarity computation with the search image in step 2 can output a response with a higher peak, which strengthens the algorithm's ability to handle scenes with complex environments and large target changes and improves the success rate and accuracy of tracking.
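A sketch of this update rule is given below; how the "actual bounding box" of the target is obtained at run time is not spelled out in the text, so the reference box passed in is an assumption:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-6)

def maybe_update_template(candidates, current_template, tau_r=0.6):
    """`candidates` holds (confidence, search_image, predicted_box, reference_box)
    tuples collected over the last T_u (= 5) frames. The highest-confidence frame
    becomes the new dynamic template only if its predicted box overlaps the
    reference box with IoU greater than tau_r; otherwise the template is kept."""
    conf, image, pred_box, ref_box = max(candidates, key=lambda c: c[0])
    if iou(pred_box, ref_box) > tau_r:
        return image
    return current_template
```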
FIG. 3 compares the tracking success rate of the present invention with that of the SiamCAR method. The success-rate metric is the intersection-over-union (IOU) between the predicted bounding box and the ground-truth bounding box; the higher the IOU, the more accurate the predicted box. The horizontal axis is the IOU threshold above which tracking is considered successful, and the vertical axis is the tracking success rate over the whole video at that threshold. The value in the lower-left corner is the average tracking success rate of each method. Compared with SiamCAR, the present invention improves the average tracking success rate by 0.7%.
FIG. 4 compares the tracking precision of the present invention with that of the SiamCAR method. The precision metric is the error, in pixels, between the predicted center point and the ground-truth center point; the smaller the error, the more accurate the predicted center. The horizontal axis is the error threshold below which tracking is considered precise, and the vertical axis is the tracking precision over the whole video at that threshold. The value in the lower-left corner is the precision of each method at an error threshold of 20 pixels. Compared with SiamCAR, the present invention improves the average tracking precision by 3.1%.
TABLE 1
Table 1 compares the present invention with SiamCAR in terms of precision, success rate, computational cost and parameter count: the present invention improves tracking success rate and precision on the public dataset OTB100 while reducing the model's computation by 68.3% and its parameter count by 71%.
Claims (6)
1. A lightweight target tracking method, characterized by comprising the following steps:
step 1: acquiring a video for target tracking, defining the first frame image of the video as an initial template, copying the initial template as the initial dynamic template, and putting the initial template and the dynamic template into a set to construct a template set;
step 2: reading a search image, and inputting the template set and the search image into a preprocessing module to obtain an output result a;
step 3: inputting the output result a into a target prediction module to predict a center point and a bounding box of the target;
step 4: inputting the search feature into an online target predictor to obtain a prediction result b, and linearly fusing the prediction result b with the center-point prediction of step 3 to obtain a final center-point prediction, thereby completing target tracking;
step 5: inputting the search image and the prediction result b from step 4 into a template updater, the template updater determining according to an update strategy whether to adopt the search image as a new dynamic template, thereby updating the dynamic template.
2. The lightweight target tracking method according to claim 1, wherein in step 1, each frame image except the first frame is used in turn as a search image for target tracking.
3. The lightweight target tracking method according to claim 1, wherein the specific process by which the preprocessing module obtains the output result a in step 2 is as follows:
step 2.1: in the preprocessing module, extracting features from the template set with a feature extractor t1 to obtain an initial feature and a dynamic feature, and extracting features from the search image with a feature extractor t2 identical to t1 to obtain a search feature;
step 2.2: a feature similarity calculator computing the similarity between each of the initial feature and the dynamic feature and the search feature, and selecting the result with the highest similarity as the output result a.
4. The lightweight target tracking method according to claim 3, wherein in step 2.1, the two feature extractors t1 and t2 are based on ResNet50 with the last network layer of ResNet50 removed.
5. The lightweight target tracking method according to claim 4, wherein the specific process of step 3 is as follows:
step 3.1: in the target prediction module, inputting a into a feature enhancer b1, and then inputting the output feature of b1 into a target box predictor to obtain the bounding box of the target;
step 3.2: inputting a into a feature enhancer b2 identical to b1, and then inputting the output feature of b2 into a target center predictor to obtain the center point of the target.
6. The lightweight target tracking method according to any one of claims 1 to 5, wherein in step 5 the update strategy is specifically as follows: first, counting from the start of the target tracking method, for every T_u frames, the search image with the highest confidence score from the online target predictor within those T_u frames is selected and denoted c, and the prediction result d of the corresponding search image c in step 3 is recorded, T_u being a positive integer representing the dynamic-template update interval; secondly, the intersection-over-union between the rectangular prediction box assembled from the prediction results and the actual bounding box of the target is computed, and if the intersection-over-union is greater than a set threshold TAU_R, the search image c is adopted as the new dynamic template; otherwise the dynamic template is not updated.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310968693.2A CN117011337A (en) | 2023-08-03 | 2023-08-03 | A lightweight target tracking method |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310968693.2A CN117011337A (en) | 2023-08-03 | 2023-08-03 | A lightweight target tracking method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117011337A true CN117011337A (en) | 2023-11-07 |
Family
ID=88563165
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310968693.2A Pending CN117011337A (en) | 2023-08-03 | 2023-08-03 | A lightweight target tracking method |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117011337A (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190320205A1 (en) * | 2016-12-26 | 2019-10-17 | Huawei Technologies Co., Ltd. | Template matching-based prediction method and apparatus |
| CN113129335A (en) * | 2021-03-25 | 2021-07-16 | 西安电子科技大学 | Visual tracking algorithm and multi-template updating strategy based on twin network |
| CN114387459A (en) * | 2022-01-26 | 2022-04-22 | 桂林电子科技大学 | Single target tracking method for dynamic double-template updating and storage medium |
| US20220309686A1 (en) * | 2021-03-25 | 2022-09-29 | Samsung Electronics Co., Ltd. | Method and apparatus with object tracking |
| CN115908492A (en) * | 2022-11-16 | 2023-04-04 | 厦门美图之家科技有限公司 | Target tracking method, system, terminal and storage medium |
- 2023-08-03: application CN202310968693.2A filed in CN, published as CN117011337A; status: Pending
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20190320205A1 (en) * | 2016-12-26 | 2019-10-17 | Huawei Technologies Co., Ltd. | Template matching-based prediction method and apparatus |
| CN113129335A (en) * | 2021-03-25 | 2021-07-16 | 西安电子科技大学 | Visual tracking algorithm and multi-template updating strategy based on twin network |
| US20220309686A1 (en) * | 2021-03-25 | 2022-09-29 | Samsung Electronics Co., Ltd. | Method and apparatus with object tracking |
| CN114387459A (en) * | 2022-01-26 | 2022-04-22 | 桂林电子科技大学 | Single target tracking method for dynamic double-template updating and storage medium |
| CN115908492A (en) * | 2022-11-16 | 2023-04-04 | 厦门美图之家科技有限公司 | Target tracking method, system, terminal and storage medium |
Non-Patent Citations (2)
| Title |
|---|
| 董吉富; 刘畅; 曹方伟; 凌源; 高翔: "Online adaptive Siamese network tracking algorithm based on attention mechanism", Laser & Optoelectronics Progress, no. 02, 25 January 2020 (2020-01-25), pages 320 - 328 * |
| 黄寅佐: "Optimization and research of a target tracking algorithm based on SiamCAR", Wanfang dissertation database, 28 February 2025 (2025-02-28), pages 12 - 64 * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111814889B (en) | A single-stage object detection method using anchor-free module and enhanced classifier | |
| CN113256677A (en) | Method for tracking visual target with attention | |
| CN112884742A (en) | Multi-algorithm fusion-based multi-target real-time detection, identification and tracking method | |
| CN114926498B (en) | Rapid target tracking method based on space-time constraint and leachable feature matching | |
| CN114120202B (en) | Multi-scale target model and feature fusion-based semi-supervised video target segmentation method | |
| CN113902991A (en) | Twin network target tracking method based on cascade characteristic fusion | |
| CN116091979A (en) | An Object Tracking Method Based on Feature Fusion and Channel Attention | |
| CN113963204B (en) | A twin network target tracking system and method | |
| CN113642498A (en) | Video target detection system and method based on multilevel space-time feature fusion | |
| CN115439722B (en) | A 3D CAD Model Retrieval Method Based on Image and Attribute Graph Fusion Network | |
| CN116311522B (en) | A continuous sign language recognition method integrating cross-modal alignment auxiliary tasks | |
| Liu et al. | Traffic sign recognition algorithm based on improved YOLOv5s | |
| CN119152193B (en) | A YOLO target detection method and system based on differentiable architecture search | |
| CN114973202B (en) | A traffic scene obstacle detection method based on semantic segmentation | |
| CN113989655A (en) | Radar or sonar image target detection and classification method based on automatic deep learning | |
| CN113129332B (en) | Method and device for performing target object tracking | |
| CN118196484B (en) | Visual place recognition method and system based on image enhancement and scene semantic optimization | |
| CN117115474A (en) | An end-to-end single target tracking method based on multi-stage feature extraction | |
| CN117809306A (en) | Industrial character recognition method, device and storage medium based on small sample target detection | |
| CN119068016A (en) | A RGBT target tracking method based on modality-aware feature learning | |
| CN117011655A (en) | Feature fusion method, target tracking method and system based on adaptive region selection | |
| CN119169056B (en) | Satellite video target tracking method, device and equipment based on refined positioning | |
| CN113963021A (en) | A single target tracking method and system based on spatiotemporal features and position changes | |
| CN117011337A (en) | A lightweight target tracking method | |
| CN118332406A (en) | A high-accuracy single-stage object detection method that integrates sample localization quality scores and classification confidence scores |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination |