CN116884039A

CN116884039A - A cross-modal person re-identification method based on graph structure

Info

Publication number: CN116884039A
Application number: CN202310913967.8A
Authority: CN
Inventors: 季一木; 刘尚东; 张驰
Original assignee: Jiangsu Tuoyou Information Intelligent Technology Research Institute Co ltd
Current assignee: Jiangsu Tuoyou Information Intelligent Technology Research Institute Co ltd
Priority date: 2023-07-25
Filing date: 2023-07-25
Publication date: 2023-10-13

Abstract

The invention discloses a cross-modal pedestrian re-identification method based on a graph structure, belonging to the technical field of pedestrian re-identification. First, a training feature data set is constructed, and an identity-balanced sampling strategy is used during training to draw pedestrian images of different identities from the two modalities. An adjacency matrix is then generated from the training feature data set to construct an undirected graph, with the graph construction computed by matrix multiplication. Graph attention learning is carried out across both modalities simultaneously, and multi-head attention is used to improve the accuracy of graph attention learning. By using multi-head attention to assign adaptive weights to the intra-modal and inter-modal graph structures, the invention eliminates the negative impact of samples with large variations, reduces the modality discrepancy, and makes training stable and efficient.

Description

Cross-modal pedestrian re-identification method based on a graph structure

Technical Field

The invention relates to the technical field of pedestrian re-identification algorithms, and in particular to a cross-modal pedestrian re-identification method based on a graph structure.

Background

With the spread of initiatives such as 'safe city' and 'smart city', public safety has received unprecedented attention and public safety awareness continues to rise. Pedestrian re-identification in computer vision lets machines process video data from multiple cameras in place of manual review and analysis of surveillance footage, which overcomes the shortcomings of manual inspection, helps safeguard society, and has wide application in daily life. Pedestrian re-identification is an image retrieval technique that determines whether a given pedestrian appears within a network of surveillance cameras. Because it can capture pedestrian images quickly and accurately and performs well in practice, it has attracted broad attention in computer vision and has gradually become an active research direction.

Dai et al. (Dai P, Ji R, Wang H, et al. Cross-modality person re-identification with generative adversarial training [C]// IJCAI. 2018, 1(3): 6) disclose learning discriminative feature representations for the different modalities through adversarial training between a generator and a discriminator: a deep convolutional neural network serves as the generator that learns image representations, and a modality classifier serves as the discriminator that tries to distinguish RGB images from infrared images. Its drawback is that, when the appearance difference between the two modalities is large, learning is easily contaminated by noisy samples and becomes unstable. These challenges lead to poorly discriminative cross-modal features and unstable training.

Chinese patent publication No. CN116311384A (published 2023-06-23) discloses a cross-modal pedestrian re-identification method based on joint intermediate-modality and representation learning, in which an intermediate-modality generator maps the original images of the two modalities into a unified feature space to generate intermediate-modality images. Its drawback is that the global feature learning approach is sensitive to background clutter and cannot explicitly handle modality differences.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a cross-modal pedestrian re-identification method based on a graph structure that considers both intra-modal information and inter-modal discriminant analysis while enhancing robustness to noisy samples.

In order to achieve the above purpose, the invention adopts the following technical scheme:

A cross-modal pedestrian re-identification method based on a graph structure comprises the following steps:

Step S1: a training feature data set is acquired and training uses an identity-balanced sampling strategy: n pedestrians with different identities are randomly drawn from the training feature data set, and m infrared images and m visible-light images are drawn for each of them, giving K = 2mn images in each training batch;
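As a rough illustration of the identity-balanced sampling in step S1, the Python sketch below draws n identities and then m visible and m infrared images for each, so one batch holds K = 2mn images. The in-memory index format (dictionaries `visible` and `infrared` mapping an identity to its image ids), the function name `identity_balanced_batch`, and the fallback to sampling with replacement when an identity has fewer than m images are assumptions for illustration, not details given by the method.

```python
import random


def identity_balanced_batch(visible, infrared, n, m, rng=random):
    """Draw one batch of K = 2*m*n images (hypothetical helper).

    visible / infrared: assumed dicts mapping identity -> list of image ids.
    Returns a list of (image_id, identity, modality) tuples.
    """
    # Only identities present in both modalities can supply m + m images.
    shared_ids = sorted(set(visible) & set(infrared))
    chosen = rng.sample(shared_ids, n)
    batch = []
    for pid in chosen:
        vis_pool, ir_pool = visible[pid], infrared[pid]
        # Assumption: sample with replacement if an identity has fewer than m images.
        vis = rng.sample(vis_pool, m) if len(vis_pool) >= m else rng.choices(vis_pool, k=m)
        ir = rng.sample(ir_pool, m) if len(ir_pool) >= m else rng.choices(ir_pool, k=m)
        batch += [(img, pid, "visible") for img in vis]
        batch += [(img, pid, "infrared") for img in ir]
    return batch
```

Passing a seeded `random.Random` instance as `rng` makes the batches reproducible.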

Step S2: an adjacency matrix is generated from the training feature data set acquired in step S1 to construct an undirected graph G; the specific expression is as follows:

$$A_0(i,j) = l_i^{\top} l_j, \qquad A = A_0 + I_K$$

wherein $l_i$ and $l_j$ are the one-hot labels corresponding to two graph nodes, $I_K$ is the identity matrix indicating that each node is connected to itself, and the graph construction is computed through matrix multiplication between the one-hot labels of the training feature data;
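The adjacency construction of step S2 reduces to a single matrix product. The stdlib-only Python sketch below assumes the adjacency takes the form $A = A_0 + I_K$ with $A_0(i,j) = l_i^{\top} l_j$: stacking the one-hot labels into a K × n matrix L gives $A_0 = L L^{\top}$, so two nodes are linked exactly when they share an identity, whether or not they come from the same modality. The helper names `one_hot` and `build_adjacency` are hypothetical.

```python
def one_hot(labels, num_ids):
    """One-hot label vectors l_i, one row per graph node."""
    return [[1 if c == y else 0 for c in range(num_ids)] for y in labels]


def build_adjacency(labels, num_ids):
    """A = A0 + I_K with A0(i, j) = l_i^T l_j, i.e. A0 = L L^T."""
    L = one_hot(labels, num_ids)
    K = len(L)
    # Entry (i, j) of L L^T is 1 exactly when nodes i and j share an identity.
    A = [[sum(a * b for a, b in zip(L[i], L[j])) for j in range(K)] for i in range(K)]
    for i in range(K):
        A[i][i] += 1  # add I_K; since l_i^T l_i = 1 already, the diagonal becomes 2
    return A


# n = 2 identities, m = 1 image per modality (K = 4), ordered as
# [visible id0, visible id1, infrared id0, infrared id1]
for row in build_adjacency([0, 1, 0, 1], num_ids=2):
    print(row)
# [2, 0, 1, 0]
# [0, 2, 0, 1]
# [1, 0, 2, 0]
# [0, 1, 0, 2]
```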

Step S3: graph attention learning is performed, learning the importance of node i to another node j in the graph, with the graph attention spanning both modalities; the specific formula is as follows:

$$e_{ij} = a(W_i, W_j)$$

where a is the shared attention mechanism, $W_i$ and $W_j$ are the weight matrices of nodes i and j, and $e_{ij}$ represents the importance of node i to another node j in the graph; every graph node is allowed to attend to every other graph node.

Step S4: within the intra-modal graph structure, multi-head attention is adopted to improve the accuracy and stability of graph attention learning. A plurality of attention heads $h^{l}$ with identical structure and their attention weights $w^{l}$ are learned, where $l = 1, 2, \ldots, L$ and L is the total number of heads, and each head is optimized separately.

Further, the graph attention learning in step S3 proceeds as follows:

Let $X = \{x_i\}_{i=1}^{K}$, $x_i \in \mathbb{R}^{C}$, denote the input node features, which are the outputs of the pooling layer. The graph attention coefficient $\alpha_{ij}$ is then calculated as:

$$\alpha_{ij} = \frac{\exp\big(\Gamma([x_i h \,\Vert\, x_j h]\, w)\big)}{\sum_{k:\,A(i,k)>0} \exp\big(\Gamma([x_i h \,\Vert\, x_k h]\, w)\big)}$$

where Γ is the LeakyReLU operation, $[\cdot \,\Vert\, \cdot]$ denotes concatenation, $h \in \mathbb{R}^{C \times d}$ is the transformation matrix that reduces the input node feature dimension from the original dimension C to a new dimension d, with d set to 256, and $w \in \mathbb{R}^{2d}$ is a learnable weight vector that measures the importance of the different feature dimensions. By fully exploiting the relationships among all images of both modalities, the contextual information of the same identity is used to enhance the feature representation.
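The graph attention coefficient described above can be sketched in plain Python as a GAT-style computation: project every node with h, score each neighbour pair with LeakyReLU applied to w · [x_i h ‖ x_j h], and normalise the scores with a softmax over the neighbours j for which A(i, j) > 0. The neighbour mask, the LeakyReLU negative slope of 0.2, and the function names are assumptions made for this sketch; the tiny feature sizes it is meant to be run with stand in for C and d = 256.

```python
import math


def project(x, h):
    """Row vector x (length C) times matrix h (C x d): the C -> d projection."""
    return [sum(x[c] * h[c][k] for c in range(len(x))) for k in range(len(h[0]))]


def leaky_relu(v, slope=0.2):
    """LeakyReLU; the 0.2 negative slope is an assumed GAT-style default."""
    return v if v > 0 else slope * v


def graph_attention(X, A, h, w, slope=0.2):
    """alpha[i][j] = softmax over neighbours j (A[i][j] > 0) of
    LeakyReLU(w . [x_i h || x_j h]); entries for non-neighbours stay 0."""
    Z = [project(x, h) for x in X]
    K = len(X)
    alpha = [[0.0] * K for _ in range(K)]
    for i in range(K):
        # A contains self-loops, so every node has at least one neighbour.
        nbrs = [j for j in range(K) if A[i][j] > 0]
        # Z[i] + Z[j] concatenates the two projected vectors (length 2d).
        scores = [leaky_relu(sum(a * b for a, b in zip(w, Z[i] + Z[j])), slope) for j in nbrs]
        top = max(scores)  # subtract the max before exp for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for j, e in zip(nbrs, exps):
            alpha[i][j] = e / total
    return alpha
```

Each row of the result sums to 1 over the same-identity neighbours, which is what lets the context of one identity, gathered across both modalities, reweight a node's features.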

Further, the multi-head attention in step S4 proceeds as follows:

Step S41: a plurality of $h^{l}$ and $w^{l}$ with identical structure are learned, where $l = 1, 2, \ldots, L$ and L is the total number of heads, and each is optimized separately. After the outputs of the heads are concatenated, the attention-enhanced feature of the graph structure is expressed as:

$$x'_i = \phi\Big(\big\Vert_{l=1}^{L} \sum_{j:\,A(i,j)>0} \alpha^{l}_{ij}\, x_j h^{l}\Big)$$

where $x'_i$ is the attention-enhanced feature of the graph structure, $\Vert$ denotes concatenation over the L heads, and φ is the ELU activation function. A graph attention network layer with a single-head structure is further introduced to better guide inter-modal graph structure learning, and the final output node features are denoted $X^{g} = \{x^{g}_i\}_{i=1}^{K}$;
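Given one attention matrix per head, the multi-head aggregation of step S41 can be sketched as follows: each head applies its own projection $h^{l}$, sums the projected neighbour features weighted by that head's attention, the L head outputs are concatenated, and ELU is applied element-wise. The function names and the choice to pass precomputed per-head attention matrices as input are assumptions; the additional single-head layer that produces the final output features is not shown.

```python
import math


def elu(v):
    """ELU activation with alpha = 1."""
    return v if v > 0 else math.exp(v) - 1.0


def multi_head_aggregate(X, alphas, hs):
    """x'_i = ELU( ||_{l=1..L} sum_j alpha^l[i][j] * (x_j h^l) ).

    X: K node features of length C; alphas: L attention matrices (K x K),
    one per head; hs: L projection matrices (C x d). Returns K vectors of length L*d.
    """
    K = len(X)
    out = []
    for i in range(K):
        concat = []
        for alpha, h in zip(alphas, hs):
            d = len(h[0])
            head = [0.0] * d
            for j in range(K):
                if alpha[i][j] == 0.0:
                    continue  # non-neighbours contribute nothing
                xj_h = [sum(X[j][c] * h[c][k] for c in range(len(X[j]))) for k in range(d)]
                head = [a + alpha[i][j] * b for a, b in zip(head, xj_h)]
            concat.extend(head)  # concatenate head outputs
        out.append([elu(v) for v in concat])
    return out
```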

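Step S42: to learn the graph attention more effectively, a negative log-likelihood loss function is adopted, formulated as follows:

$$\mathcal{L}_g = -\frac{1}{K}\sum_{i=1}^{K} \log\big(\mathrm{softmax}(x^{g}_i)_{y_i}\big)$$

where $y_i$ is the identity label of node i.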
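A minimal sketch of the negative log-likelihood loss on the final node features, assuming each $x^{g}_i$ is a vector of identity logits and that the loss is averaged over the K nodes of a batch, i.e. $\mathcal{L}_g = -\frac{1}{K}\sum_i \log \mathrm{softmax}(x^{g}_i)_{y_i}$ (the averaging and the function names are assumptions). A numerically stable log-softmax is used rather than taking the logarithm of a softmax directly.

```python
import math


def log_softmax(z):
    """Stable log-softmax: z_k - logsumexp(z)."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def graph_nll_loss(logits, labels):
    """Mean negative log-likelihood of each node's true identity label."""
    K = len(logits)
    return -sum(log_softmax(z)[y] for z, y in zip(logits, labels)) / K
```

With uniform logits over two identities the loss equals log 2, and it approaches 0 as the logit of the true identity dominates.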

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention takes into account both intra-modal information and inter-modal discriminant analysis, effectively reducing the modality discrepancy while enhancing robustness to noisy samples.

(2) By using multi-head attention, the invention assigns adaptive weights to the intra-modal and inter-modal graph structures, eliminates the negative influence of samples with large variations, reduces the modality discrepancy, and makes the training process stable and efficient.

Drawings

FIG. 1 is a network structure diagram of the cross-modal pedestrian re-identification method based on a graph structure of the invention;

FIG. 2 illustrates the process of calculating the graph attention coefficients;

FIG. 3 shows the Rank-1 accuracy and mAP on the SYSU-MM01 dataset for different values of K and L.

Detailed Description

The invention will be further described with reference to the accompanying drawings.

As shown in FIG. 1, a cross-modal pedestrian re-identification method based on a graph structure comprises the following steps:

Step S1: a training feature data set is acquired and training uses an identity-balanced sampling strategy: for n pedestrians with different identities randomly drawn from the training feature data set, m infrared images and m visible-light images are drawn for each, giving K = 2mn images in each training batch;

Step S2: for the training feature data set acquired in step S1, an adjacency matrix is generated with the following formula to construct an undirected graph G:

$$A_0(i,j) = l_i^{\top} l_j, \qquad A = A_0 + I_K$$

wherein $l_i$ and $l_j$ are the one-hot labels corresponding to two graph nodes, $I_K$ is the identity matrix indicating that each node is connected to itself, and the graph construction is computed through matrix multiplication between the one-hot labels of the training feature data set;

Step S3: graph attention learning is performed, learning the importance of node i to another node j in the graph, with the graph attention spanning both modalities;

Step S4: within the intra-modal graph structure, multi-head attention is adopted to improve the accuracy and stability of graph attention learning: a plurality of attention heads $h^{l}$ with identical structure and attention weights $w^{l}$ are learned, where $l = 1, 2, \ldots, L$ and L is the total number of heads, and each is optimized separately.

The graph attention learning in step S3 proceeds as follows:

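Let $X = \{x_i\}_{i=1}^{K}$, $x_i \in \mathbb{R}^{C}$, denote the input node features, which are the outputs of the pooling layer. Then, as shown in FIG. 2, the graph attention coefficient $\alpha_{ij}$ is calculated as:

$$\alpha_{ij} = \frac{\exp\big(\Gamma([x_i h \,\Vert\, x_j h]\, w)\big)}{\sum_{k:\,A(i,k)>0} \exp\big(\Gamma([x_i h \,\Vert\, x_k h]\, w)\big)}$$

where Γ is the LeakyReLU operation, $[\cdot \,\Vert\, \cdot]$ denotes concatenation, $h \in \mathbb{R}^{C \times d}$ is the transformation matrix that reduces the input node feature dimension from the original dimension C to a new dimension d, with d set to 256, and $w \in \mathbb{R}^{2d}$ is a learnable weight vector that measures the importance of the different feature dimensions.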

The multi-head attention in step S4 proceeds as follows:

Step S41: a plurality of $h^{l}$ and $w^{l}$ with identical structure are learned, where $l = 1, 2, \ldots, L$ and L is the total number of heads, and each is optimized separately. After the outputs of the heads are concatenated, the attention-enhanced feature of the graph structure is expressed as:

$$x'_i = \phi\Big(\big\Vert_{l=1}^{L} \sum_{j:\,A(i,j)>0} \alpha^{l}_{ij}\, x_j h^{l}\Big)$$

where φ is the ELU activation function;

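Step S42: a negative log-likelihood loss function is adopted to strengthen graph attention learning, formulated as follows:

$$\mathcal{L}_g = -\frac{1}{K}\sum_{i=1}^{K} \log\big(\mathrm{softmax}(x^{g}_i)_{y_i}\big)$$

where $x^{g}_i$ is the final output feature of node i and $y_i$ is its identity label.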

Finally, FIG. 3 shows the Rank-1 accuracy and mAP obtained on the SYSU-MM01 dataset for different values of K and L.
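For readers less familiar with the two metrics reported in FIG. 3, the sketch below computes Rank-1 accuracy and mean average precision (mAP) from a query-to-gallery distance matrix. It is a simplified illustration: the official SYSU-MM01 protocol's camera-specific gallery rules and repeated random gallery trials are not reproduced, and the function name `rank1_and_map` is hypothetical.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """dist[q][g]: distance between query q and gallery image g.

    Returns (Rank-1 accuracy, mAP) over queries that have at least one
    true match in the gallery (simplified, single-gallery evaluation).
    """
    hits, ap_sum, valid = 0, 0.0, 0
    for q, row in enumerate(dist):
        # Rank gallery images by ascending distance to the query.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # queries without a true match are skipped
        valid += 1
        hits += int(matches[0])  # Rank-1: is the nearest gallery image correct?
        # Average precision: mean of precision@k at each correct position k.
        found, precisions = 0, []
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / k)
        ap_sum += sum(precisions) / len(precisions)
    return hits / valid, ap_sum / valid
```

In a cross-modal setting the queries come from one modality (for example infrared) and the gallery from the other, so both metrics directly reflect how well the learned features bridge the modality gap.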

The foregoing is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also fall within the scope of protection of the invention.

Claims

1. A cross-modal pedestrian re-identification method based on a graph structure, characterized by comprising the following steps:

Step S1: acquiring a training feature data set and training with an identity-balanced sampling strategy: randomly drawing n pedestrians with different identities from the training feature data set, drawing m infrared images and m visible-light images for each, and generating K = 2mn images in each training batch;

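Step S2: processing the training feature data set acquired in step S1 to generate an adjacency matrix, thereby constructing an undirected graph G, according to the following formula:

$$A_0(i,j) = l_i^{\top} l_j, \qquad A = A_0 + I_K$$

wherein $l_i$ and $l_j$ are the one-hot labels corresponding to two graph nodes, $I_K$ is the identity matrix indicating that each node is connected to itself, and the graph construction is computed through matrix multiplication between the one-hot labels of the training feature data set;

Step S3: performing graph attention learning, in which the importance of node i to another node j in the graph is learned, the graph attention spanning both modalities, according to the following formula:

$$e_{ij} = a(W_i, W_j)$$

wherein a is a shared attention mechanism, $W_i$ and $W_j$ are the weight matrices of nodes i and j, and $e_{ij}$ represents the importance of node i to another node j in the graph, every graph node being allowed to attend to every other graph node;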

Step S4: adopting multi-head attention in the intra-modal graph structure to improve the accuracy of graph attention learning, by learning a plurality of attention heads $h^{l}$ with identical structure and attention weights $w^{l}$, where $l = 1, 2, \ldots, L$ and L is the total number of heads, each being optimized separately.

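2. The cross-modal pedestrian re-identification method based on a graph structure according to claim 1, wherein the graph attention learning in step S3 specifically comprises: taking the outputs of the pooling layer as the input node features $X = \{x_i\}_{i=1}^{K}$ and calculating the graph attention coefficient $\alpha_{ij}$:

$$\alpha_{ij} = \frac{\exp\big(\Gamma([x_i h \,\Vert\, x_j h]\, w)\big)}{\sum_{k:\,A(i,k)>0} \exp\big(\Gamma([x_i h \,\Vert\, x_k h]\, w)\big)}$$

where Γ is the LeakyReLU operation, $[\cdot \,\Vert\, \cdot]$ denotes concatenation, $h \in \mathbb{R}^{C \times d}$ is the transformation matrix that reduces the input node feature dimension from the original dimension C to a new dimension d, with d set to 256, and $w \in \mathbb{R}^{2d}$ is a learnable weight vector.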

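3. The cross-modal pedestrian re-identification method based on a graph structure according to claim 1, wherein the multi-head attention in step S4 specifically comprises:

Step S41: learning a plurality of $h^{l}$ and $w^{l}$ with identical structure, where $l = 1, 2, \ldots, L$ and L is the total number of heads, each being optimized separately, and, after concatenating the outputs of the heads, expressing the attention-enhanced feature of the graph structure as:

$$x'_i = \phi\Big(\big\Vert_{l=1}^{L} \sum_{j:\,A(i,j)>0} \alpha^{l}_{ij}\, x_j h^{l}\Big)$$

wherein $x'_i$ is the attention-enhanced feature of the graph structure and φ is the ELU activation function; a graph attention network layer with a single-head structure is introduced to guide inter-modal graph structure learning, and the final output node features are denoted $X^{g} = \{x^{g}_i\}_{i=1}^{K}$;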

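Step S42: adopting a negative log-likelihood loss function to strengthen graph attention learning, as shown in the following formula:

$$\mathcal{L}_g = -\frac{1}{K}\sum_{i=1}^{K} \log\big(\mathrm{softmax}(x^{g}_i)_{y_i}\big)$$

where $y_i$ is the identity label of node i.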