
CN116884039A - A cross-modal person re-identification method based on graph structure - Google Patents


Info

Publication number: CN116884039A
Application number: CN202310913967.8A
Authority: CN (China)
Legal status: Pending
Prior art keywords: graph, attention, learning, node, training
Other languages: Chinese (zh)
Inventors: 季一木, 刘尚东, 张驰
Current assignee / original assignee: Jiangsu Tuoyou Information Intelligent Technology Research Institute Co., Ltd.
Application filed by Jiangsu Tuoyou Information Intelligent Technology Research Institute Co., Ltd.
Priority to CN202310913967.8A; publication of CN116884039A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a cross-modal pedestrian re-identification method based on a graph structure, belonging to the technical field of pedestrian re-identification methods. First, a training feature data set is constructed and an identity-balanced sampling strategy is used for training, extracting pedestrian data of different identities from the two modalities. An adjacency matrix is then generated from the training feature data set to construct an undirected graph structure, with the graph construction computed by matrix multiplication. Graph attention learning is carried out in the two modalities simultaneously, and a multi-head attention technique is used to improve the accuracy of graph attention learning. The invention uses multi-head attention to assign adaptive weights to the intra-modality and inter-modality graph structures, eliminating the negative impact of samples with large variations, reducing the modality gap, and making the training process stable and efficient.

Description

Cross-modal pedestrian re-identification method based on graph structure
Technical Field
The invention relates to the technical field of pedestrian re-identification algorithms, and in particular to a cross-modal pedestrian re-identification method based on a graph structure.
Background
With the spread of construction initiatives such as the "safe city" and the "smart city", public safety has received unprecedented attention, and safety awareness continues to rise. Pedestrian re-identification technology in computer vision can process video data by machine across multiple cameras, replacing the manual inspection and analysis of surveillance video; it thereby effectively remedies the shortcomings of manual inspection, helps safeguard society, and has wide application in daily life. Pedestrian re-identification is an image retrieval technology that determines whether a given pedestrian appears in a network of surveillance cameras. Because it can capture pedestrian images quickly and accurately and performs well in practical applications, it has attracted wide attention in the field of computer vision and has gradually become a hot research direction.
Dai P et al. ("Dai P, Ji R, Wang H, et al. Cross-modality person re-identification with generative adversarial training [C]// IJCAI. 2018, 1(3): 6") disclose learning discriminative feature representations for the different modalities based on generative adversarial training: a deep convolutional neural network serves as the generator that learns the image representation, and a modality classifier serves as the discriminator that attempts to distinguish the RGB from the infrared image modality. The drawback is that when the appearance difference between the two modalities is large, learning is easily contaminated by noisy samples and becomes unstable. These challenges result in poorly discriminative cross-modal features and unstable training.
Chinese patent publication No. CN116311384A (published 2023-06-23), titled "Cross-modal pedestrian re-identification method based on joint intermediate modality and representation learning", discloses using an intermediate-modality generator to map the original images of the two modalities into a unified feature space so as to generate intermediate-modality images. The drawback is that this global feature learning method is sensitive to background clutter and cannot explicitly handle modality differences.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a cross-modal pedestrian re-identification method based on a graph structure, which considers both intra-modality information and inter-modality discriminant analysis while enhancing robustness to noisy samples.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a cross-mode pedestrian re-identification method based on a graph structure comprises the following steps:
step S1, acquiring a training feature data set and training with an identity-balanced sampling strategy: randomly select n pedestrians with different identities from the training feature data set, extract m infrared images and m visible-light images for each, and generate K = 2mn images in each training batch;
s2, generating an adjacency matrix for the training characteristic data set acquired in the step S1 so as to construct an undirected graph G; the specific expression is as follows:
wherein the method comprises the steps ofl i And l j Shan Re labels corresponding to two graph nodes, II k The unit matrix is used for representing that each node is connected with the unit matrix, and the calculation of the graph construction is carried out through matrix multiplication among single thermal labels of training characteristic data;
step S3, performing graph attention learning, learning the importance of a node i to another node j in the graph, spanning the two modalities; the specific formula is as follows:

e_ij = a(W_i, W_j)

where a is the shared attention mechanism, W_i and W_j represent the weight matrices of nodes i and j, and e_ij represents the importance of node i in a graph feature to another node j in the graph feature; every graph feature is allowed to attend to every other graph feature.
Step S4, improving the accuracy and stability of graph attention learning by adopting a multi-head attention technique in the intra-modality graph structure: a plurality of attention heads h_l with the same structure and attention weights w_l are learned, where l = 1, 2, ..., L, L is the total number of attention heads, and each head is optimized separately.
Further, the specific steps of the graph attention learning in step S3 are as follows:

Using x_i ∈ R^C to represent the input node features, which are the outputs of the pooling layer, the graph attention coefficient α_ij is then calculated:

α_ij = exp( Γ( a^T [ h x_i ∥ h x_j ] ) ) / Σ_{k ∈ N_i} exp( Γ( a^T [ h x_i ∥ h x_k ] ) )

where Γ is the LeakyReLU operation, ∥ is the concatenation operation, h is the transformation matrix that reduces the input node feature dimension from the original dimension C to the new dimension d (here d is set to 256), and a ∈ R^{2d} is a learnable weight vector that measures the importance between different feature dimensions. By fully exploiting the relationships between all images in the two modalities, the context information of the same identity is used to enhance the representation.
Further, the specific steps of the multi-head attention technique in step S4 are as follows:

Step S41, learning a plurality of attention heads h_l with the same structure and attention weights w_l, where l = 1, 2, ..., L, L is the total number of attention heads, and each head is optimized separately; after concatenating the outputs of the plurality of heads, the attention-enhanced features of the graph structure are represented by the following formula:

ẑ_i = φ( ∥_{l=1}^{L} Σ_{j ∈ N_i} α_ij^l h_l x_j )

wherein ẑ_i represents the attention-enhanced feature of the graph structure and φ is the ELU activation function; a graph attention network layer of single-head structure is introduced to better guide inter-modality graph structure learning, and the final output node features are denoted z̃_i;
in step S42, to learn Xi Tu attention more effectively, we use a negative log likelihood loss function whose formulation is as follows:
compared with the prior art, the invention has the following beneficial effects:
(1) The invention considers the information in the modes and the discriminant analysis among the modes, effectively reduces the difference among the modes and also enhances the robustness to noise samples.
(2) The invention distributes self-adaptive weights for the intra-mode and inter-mode graph structures by using the multi-head attention technology, eliminates the negative influence of a large variation sample, reduces the mode difference, and ensures that the training process is stable and efficient.
Drawings
FIG. 1 is a network structure diagram of the cross-modal pedestrian re-identification method based on a graph structure of the invention;
FIG. 2 shows the process of calculating the attention coefficients;
FIG. 3 is a graph showing the effect on Rank-1 and mAP on the SYSU-MM01 dataset for different values of K and L.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
As shown in FIG. 1, a cross-modal pedestrian re-identification method based on a graph structure comprises the following steps:
step S1, acquiring a training feature data set and training with an identity-balanced sampling strategy: for n pedestrians with different identities randomly selected from the training feature data set, extract m infrared images and m visible-light images each, generating K = 2mn images in each training batch;
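The identity-balanced sampling of step S1 can be sketched as follows. This is an illustrative Python sketch, not code from the patent; the helper name identity_balanced_batch and the label-dictionary inputs (image index -> identity) are assumptions:

```python
import random
from collections import defaultdict

def identity_balanced_batch(labels_ir, labels_vis, n, m, seed=0):
    """Pick n identities present in both modalities, then sample m
    infrared and m visible-light images per identity, giving a batch
    of K = 2 * m * n images (step S1)."""
    rng = random.Random(seed)
    ir, vis = defaultdict(list), defaultdict(list)
    for idx, pid in labels_ir.items():
        ir[pid].append(idx)
    for idx, pid in labels_vis.items():
        vis[pid].append(idx)
    # identities that have at least m images in each modality
    eligible = [p for p in ir
                if p in vis and len(ir[p]) >= m and len(vis[p]) >= m]
    chosen = rng.sample(eligible, n)
    batch = []
    for pid in chosen:
        batch.extend(rng.sample(ir[pid], m))   # m infrared images
        batch.extend(rng.sample(vis[pid], m))  # m visible-light images
    return chosen, batch
```

Sampling without replacement within each identity keeps the batch balanced across both identities and modalities, which is what makes the later graph construction meaningful.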
s2, generating an adjacency matrix by adopting the following formula for the training characteristic data set acquired in the step S1 so as to construct an undirected graph G;
wherein the method comprises the steps ofl i And l j Shan Re labels corresponding to two graph nodes, II k The unit matrix is used for representing that each node is connected with the unit matrix, and the calculation of the graph construction is carried out through matrix multiplication among the single thermal labels of the training characteristic data set;
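The matrix-multiplication graph construction of step S2 can be sketched in a few lines of NumPy. The function name build_graph and the clipping of the doubled self-connections back to 1 are illustrative assumptions, not details from the patent:

```python
import numpy as np

def build_graph(labels, num_ids):
    """Adjacency matrix A = L L^T + I_K (step S2): one matrix multiply
    between the one-hot label rows connects same-identity nodes, and
    the identity matrix I_K adds the self-connections.
    labels: length-K list of integer identity labels."""
    K = len(labels)
    L = np.zeros((K, num_ids))
    L[np.arange(K), labels] = 1.0     # one-hot label matrix, one row per node
    A = L @ L.T + np.eye(K)           # graph construction by matrix multiplication
    return np.minimum(A, 1.0)         # undirected, {0, 1}-valued adjacency
```

Because L L^T is symmetric and I_K only touches the diagonal, the resulting graph is undirected by construction.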
s3, drawing meaning force learning is carried out, wherein the importance of the node i to another node j in the drawing is learned, and the drawing meaning force spans between two modes;
s4, improving accuracy and stability of drawing attention learning by adopting a multi-head attention technology in a modal internal diagram structure, and learning a plurality of attention heads h with the same structure l And attention weight w l Where l=1, 2··, L is the total number of deliberations and they are optimized separately.
The specific steps of the graph attention learning in step S3 are as follows:

Using x_i ∈ R^C to represent the input node features, which are the outputs of the pooling layer, and then, as shown in FIG. 2, calculating the graph attention coefficient:

α_ij = exp( Γ( a^T [ h x_i ∥ h x_j ] ) ) / Σ_{k ∈ N_i} exp( Γ( a^T [ h x_i ∥ h x_k ] ) )

where Γ is the LeakyReLU operation, ∥ is the concatenation operation, h is the transformation matrix that reduces the input node feature dimension from the original dimension C to the new dimension d (here d is set to 256), and a ∈ R^{2d} is a learnable weight vector that measures the importance between different feature dimensions.
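The attention-coefficient computation above (LeakyReLU scoring of concatenated, dimension-reduced node features, followed by a row-wise softmax restricted to graph edges) can be sketched as follows. The function name and the dense double loop are illustrative choices, not from the patent:

```python
import numpy as np

def attention_coeffs(X, A, h, a, slope=0.2):
    """Graph attention coefficients of step S3:
        alpha_ij = softmax_j( LeakyReLU( a^T [h x_i || h x_j] ) )
    restricted to the edges of the adjacency matrix A.
    X: (K, C) pooled node features, h: (d, C), a: (2d,)."""
    Z = X @ h.T                                       # reduce dimension C -> d
    K = Z.shape[0]
    s = np.array([[a @ np.concatenate([Z[i], Z[j]])   # a^T [h x_i || h x_j]
                   for j in range(K)] for i in range(K)])
    s = np.where(s > 0, s, slope * s)                 # LeakyReLU (Gamma)
    s = np.where(A > 0, s, -np.inf)                   # attend only along edges
    s -= s.max(axis=1, keepdims=True)                 # numerical stability
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)           # row-wise softmax
```

Masking non-edges with -inf before the softmax makes their coefficients exactly zero, so each node distributes its attention only over its same-identity neighbours and itself.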
The specific steps of the multi-head attention technique in step S4 are as follows:

Step S41, learning a plurality of attention heads h_l with the same structure and attention weights w_l, where l = 1, 2, ..., L, L is the total number of attention heads, and each head is optimized separately; after concatenating the outputs of the plurality of heads, the attention-enhanced features of the graph structure are represented by the following formula:

ẑ_i = φ( ∥_{l=1}^{L} Σ_{j ∈ N_i} α_ij^l h_l x_j )

wherein ẑ_i represents the attention-enhanced feature of the graph structure and φ is the ELU activation function; a graph attention network layer of single-head structure is introduced to better guide inter-modality graph structure learning, and the final output node features are denoted z̃_i;
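A minimal sketch of step S41, assuming each head follows the single-head attention of step S3 and that "connecting" the head outputs means feature-wise concatenation before the ELU activation φ (both assumptions read from the text, not confirmed by the patent figures):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def head_output(X, A, h, a):
    """One attention head: coefficients as in step S3, then the
    attention-weighted sum of the dimension-reduced node features."""
    Z = X @ h.T
    K = Z.shape[0]
    s = np.array([[a @ np.concatenate([Z[i], Z[j]])
                   for j in range(K)] for i in range(K)])
    s = np.where(A > 0, leaky_relu(s), -np.inf)    # mask non-edges
    s -= s.max(axis=1, keepdims=True)
    e = np.exp(s)
    alpha = e / e.sum(axis=1, keepdims=True)
    return alpha @ Z                               # sum_j alpha_ij * h x_j

def multi_head_features(X, A, heads):
    """Step S41 sketch: concatenate the outputs of the L heads,
    then apply the ELU activation phi.
    heads: list of per-head (h_l, a_l) parameter pairs."""
    Z = np.concatenate([head_output(X, A, h, a) for h, a in heads], axis=1)
    return np.where(Z > 0, Z, np.expm1(np.minimum(Z, 0.0)))   # ELU
```

Each head keeps its own (h_l, a_l) parameters, so the heads can be optimized separately as the text requires, while concatenation preserves what every head learned.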
Step S42, deepening the learning of graph attention by adopting a negative log-likelihood loss function, formulated as follows:

L_G = - (1/K) Σ_{i=1}^{K} log p(y_i | ẑ_i)

where p(y_i | ẑ_i) denotes the predicted probability that node i belongs to its ground-truth identity y_i.
finally, the effect graph of Rank-1 and mAP on the SYSU-MM01 data set shown in the figure 3 can be obtained when the K and the L are different in value.
The foregoing is only a preferred embodiment of the invention. It should be noted that various modifications and adaptations can be made by those skilled in the art without departing from the principles of the invention, and such modifications and adaptations are intended to fall within the scope of the invention.

Claims (3)

1. A cross-modal pedestrian re-identification method based on a graph structure, characterized by comprising the following steps:
step S1, acquiring a training feature data set and training with an identity-balanced sampling strategy: randomly select n pedestrians with different identities from the training feature data set, extract m infrared images and m visible-light images for each, and generate K = 2mn images in each training batch;
s2, processing the training characteristic data set acquired in the step S1 to generate an adjacency matrix so as to construct an undirected graph G; the specific formula is as follows:
wherein the method comprises the steps ofl i And l j Shan Re labels corresponding to two graph nodes, II k The unit matrix is used for representing that each node is connected with the unit matrix, and the calculation of the graph construction is carried out through matrix multiplication among the single thermal labels of the training characteristic data set;
step S3, performing graph attention learning, learning the importance of a node i to another node j in the graph, spanning the two modalities; the specific formula is as follows:

e_ij = a(W_i, W_j)

wherein a is a shared attention mechanism, W_i and W_j represent the weight matrices of nodes i and j, and e_ij represents the importance of a node i in a graph feature to another node j in the graph feature, allowing every graph feature to attend to every other graph feature;
step S4, improving the data accuracy of graph attention learning by adopting the multi-head attention technique in the intra-modality graph structure: a plurality of attention heads h_l with the same structure and attention weights w_l are learned, where l = 1, 2, ..., L, L is the total number of attention heads, and each head is optimized separately.
2. The cross-modal pedestrian re-identification method based on the graph structure of claim 1, wherein the specific steps of the graph attention learning in step S3 are as follows:

using x_i ∈ R^C to represent the input node features, which are the outputs of the pooling layer, the graph attention coefficient is then calculated:

α_ij = exp( Γ( a^T [ h x_i ∥ h x_j ] ) ) / Σ_{k ∈ N_i} exp( Γ( a^T [ h x_i ∥ h x_k ] ) )

where Γ is the LeakyReLU operation, ∥ is the concatenation operation, h is the transformation matrix that reduces the input node feature dimension from the original dimension C to the new dimension d (here d is set to 256), and a ∈ R^{2d} represents a learnable weight vector.
3. The cross-modal pedestrian re-identification method based on the graph structure of claim 1, wherein the specific steps of the multi-head attention technique in step S4 are as follows:

step S41, learning a plurality of attention heads h_l with the same structure and attention weights w_l, where l = 1, 2, ..., L, L is the total number of attention heads, and each head is optimized separately; after concatenating the outputs of the plurality of heads, the attention-enhanced features of the graph structure are represented by the following formula:

ẑ_i = φ( ∥_{l=1}^{L} Σ_{j ∈ N_i} α_ij^l h_l x_j )

wherein ẑ_i represents the attention-enhanced feature of the graph structure, φ is the ELU activation function, and a graph attention network layer of single-head structure is introduced to guide inter-modality graph structure learning; the final output node features are denoted z̃_i;

step S42, deepening the learning of graph attention by adopting a negative log-likelihood loss function, as shown in the following formula:

L_G = - (1/K) Σ_{i=1}^{K} log p(y_i | ẑ_i)
CN202310913967.8A 2023-07-25 2023-07-25 A cross-modal person re-identification method based on graph structure Pending CN116884039A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310913967.8A CN116884039A (en) 2023-07-25 2023-07-25 A cross-modal person re-identification method based on graph structure


Publications (1)

Publication Number Publication Date
CN116884039A true CN116884039A (en) 2023-10-13

Family

ID=88261950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310913967.8A Pending CN116884039A (en) 2023-07-25 2023-07-25 A cross-modal person re-identification method based on graph structure

Country Status (1)

Country Link
CN (1) CN116884039A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120031699A (en) * 2025-04-22 2025-05-23 中国民航大学 A method for determining the node sequence of flight strings

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220124A (en) * 2021-12-16 2022-03-22 华南农业大学 Near-infrared-visible light cross-modal double-flow pedestrian re-identification method and system
CN115100678A (en) * 2022-06-10 2022-09-23 河南大学 Cross-modal pedestrian re-identification method based on channel recombination and attention mechanism
CN116052212A (en) * 2023-01-09 2023-05-02 河南大学 Semi-supervised cross-mode pedestrian re-recognition method based on dual self-supervised learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination