Cross-modal pedestrian re-identification method based on a graph structure
Technical Field
The invention relates to the technical field of pedestrian re-identification algorithms, and in particular to a cross-modal pedestrian re-identification method based on a graph structure.
Background
With the spread of construction programmes such as the 'safe city' and the 'smart city', public safety has received unprecedented attention, and people's safety awareness continues to grow. Pedestrian re-identification in computer vision uses machines to process video data from multiple cameras, replacing the manual processing and analysis of surveillance video. It thereby overcomes the shortcomings of manual inspection, helps safeguard society, and is widely applied in daily life. Pedestrian re-identification is an image retrieval technique that determines whether a given pedestrian appears in a network of surveillance cameras. Because it can capture pedestrian images quickly and accurately and performs well in practical applications, it has attracted wide attention in computer vision and has gradually become a popular research direction.
Dai P, Ji R, Wang H, et al. (Cross-modality person re-identification with generative adversarial training [C]// IJCAI. 2018, 1(3): 6) disclose learning discriminative feature representations for the different modalities through generative adversarial training. In their architecture, a deep convolutional neural network serves as a generator that learns image representations, and a modality classifier serves as a discriminator that tries to distinguish RGB images from infrared images. The drawback is that, when the appearance difference between the two modalities is large, learning is easily contaminated by noisy samples and becomes unstable. These problems lead to poorly discriminative cross-modal features and unstable training.
Chinese patent publication No. CN116311384A (publication date 2023-06-23) discloses a cross-modal pedestrian re-identification method based on joint intermediate modality and representation learning, in which an intermediate modality generator maps the original images of the two modalities into a unified feature space to generate intermediate-modality images. The drawback is that its global feature learning is sensitive to background clutter and cannot explicitly handle the differences between modalities.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a cross-modal pedestrian re-identification method based on a graph structure that takes into account both intra-modality information and inter-modality discriminative analysis while enhancing robustness to noisy samples.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A cross-modal pedestrian re-identification method based on a graph structure comprises the following steps:
Step S1: acquire a training feature dataset and train with an identity-balanced sampling strategy: n pedestrians with different identities are randomly selected from the training feature dataset, m infrared images and m visible-light images are drawn for each selected identity, and each training batch therefore contains K = 2mn images;
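The identity-balanced sampling of step S1 can be sketched as follows. This is a minimal illustration only: the record layout (image_id, person_id, modality), the modality tags "ir"/"rgb", the function name identity_balanced_batch, and the fallback to sampling with replacement when an identity has fewer than m images are all assumptions, not details taken from the method.

```python
import random
from collections import defaultdict

def identity_balanced_batch(samples, n, m, seed=None):
    """Draw one batch of K = 2*m*n images under identity-balanced sampling.

    samples: list of (image_id, person_id, modality) tuples, modality in {"ir", "rgb"}
             (hypothetical record layout, for illustration only).
    n: number of distinct identities per batch.
    m: images drawn per identity for each modality.
    """
    rng = random.Random(seed)
    # Group the images by identity and modality.
    pools = defaultdict(lambda: {"ir": [], "rgb": []})
    for image_id, pid, modality in samples:
        pools[pid][modality].append((image_id, pid, modality))
    # Only identities seen in both modalities can contribute m IR + m RGB images.
    eligible = sorted(pid for pid, p in pools.items() if p["ir"] and p["rgb"])
    chosen = rng.sample(eligible, n)  # ValueError if fewer than n identities qualify
    batch = []
    for pid in chosen:
        for modality in ("ir", "rgb"):
            pool = pools[pid][modality]
            if len(pool) >= m:
                batch.extend(rng.sample(pool, m))     # without replacement
            else:
                batch.extend(rng.choices(pool, k=m))  # assumed fallback: with replacement
    return batch

# Toy data: 10 identities, 5 images per modality each.
data = [(f"p{p}_{mod}_{k}", p, mod)
        for p in range(10) for mod in ("ir", "rgb") for k in range(5)]
batch = identity_balanced_batch(data, n=4, m=2, seed=0)
print(len(batch))  # 16 = 2 * m * n
```

Sampling every chosen identity in both modalities guarantees that each batch contains same-identity pairs across modalities, which the graph of step S2 needs in order to link infrared and visible-light nodes.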
Step S2: generate an adjacency matrix for the training feature dataset acquired in step S1 to construct an undirected graph G, as follows:
$$A_{ij} = l_i^{\top} l_j, \qquad \hat{A} = A + I_K$$
where l_i and l_j are the one-hot labels of two graph nodes and I_K is the K×K identity matrix, which connects each node to itself. The graph is thus computed by matrix multiplication between the one-hot labels of the training feature data, so that two distinct nodes are connected exactly when they share an identity;
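A minimal sketch of the step S2 graph construction, assuming integer identity labels and reading the adjacency matrix as A = LLᵀ + I_K, where the rows of L are the one-hot labels l_i; build_adjacency is a hypothetical helper name. Because l_iᵀl_i is already 1, adding I_K makes the diagonal 2. This is harmless if, as assumed in the attention sketches below, the matrix only marks which node pairs are connected.

```python
def build_adjacency(labels, num_classes):
    """Return A_hat = L L^T + I_K, where row i of L is the one-hot label l_i."""
    K = len(labels)
    onehot = [[1 if c == y else 0 for c in range(num_classes)] for y in labels]
    # A_ij = l_i^T l_j: equals 1 when nodes i and j share an identity, else 0.
    A = [[sum(a * b for a, b in zip(onehot[i], onehot[j])) for j in range(K)]
         for i in range(K)]
    # Add I_K so that every node is connected to itself.
    for i in range(K):
        A[i][i] += 1
    return A

# n = 2 identities, m = 1 image per modality -> K = 4 nodes:
# node 0 = IR of id 0, node 1 = IR of id 1, node 2 = RGB of id 0, node 3 = RGB of id 1.
A_hat = build_adjacency([0, 1, 0, 1], num_classes=2)
for row in A_hat:
    print(row)
# [2, 0, 1, 0]
# [0, 2, 0, 1]
# [1, 0, 2, 0]
# [0, 1, 0, 2]
```

The off-diagonal 1s link the infrared and visible-light images of the same person. These links are what allow the subsequent graph attention to span the two modalities.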
Step S3: perform graph attention learning, which learns the importance of node i to another node j in the graph; the graph attention spans the two modalities. The specific formula is:
$$e_{ij} = a(W_i, W_j)$$
where a is the shared attention mechanism, W_i and W_j denote the weight-matrix-transformed features of nodes i and j, and e_ij represents the importance of node i to node j. In this general form, every node is allowed to attend to every other node in the graph.
Step S4: in the intra-modality graph structure, a multi-head attention technique is adopted to improve the accuracy and stability of graph attention learning. The transformation matrices h_l and attention weights w_l of L attention heads with the same structure, l = 1, 2, ..., L, where L is the total number of heads, are learned and optimized separately.
Further, the specific steps of the graph attention learning in step S3 are as follows:
Let x_i denote the input node features, which are the outputs of the pooling layer. The graph attention coefficient α_ij is then calculated as
$$\alpha_{ij} = \frac{\exp\left(\Gamma\left([x_i h,\, x_j h]\, w\right)\right)}{\sum_{k:\,\hat{A}_{ik} > 0} \exp\left(\Gamma\left([x_i h,\, x_k h]\, w\right)\right)}$$
where Γ is the LeakyReLU operation; [·,·] is the concatenation operation; h is a C×d transformation matrix that reduces the input node feature dimension from the original dimension C to a new dimension d, with d set to 256; w is a learnable weight vector of length 2d that measures the importance of the different feature dimensions; and the sum runs over the nodes k connected to node i in the graph G of step S2 (\hat{A}_{ik} > 0). By fully exploiting the relationships among all images in both modalities, the contextual information of the same identity is used to enhance the representation.
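The coefficient computation above can be sketched in pure Python. This sketch rests on several assumptions. The softmax normalisation runs over each node's neighbours (entries with Â_ik > 0), as in standard graph attention. The LeakyReLU negative slope is 0.2, a common default that the method does not state. d is kept tiny for readability instead of 256. All function names are hypothetical.

```python
import math

def leaky_relu(z, slope=0.2):
    # Γ: LeakyReLU; slope 0.2 is an assumed value.
    return z if z > 0 else slope * z

def project(X, h):
    # x_i h for every node: (K x C) times (C x d) -> K x d.
    return [[sum(x[c] * h[c][k] for c in range(len(x))) for k in range(len(h[0]))]
            for x in X]

def attention_coefficients(X, h, w, A_hat):
    """alpha[i][j] = softmax over {k : A_hat[i][k] > 0} of Γ([x_i h, x_k h] w)."""
    Z = project(X, h)
    K = len(Z)
    alpha = [[0.0] * K for _ in range(K)]
    for i in range(K):
        # Score only connected pairs; Z[i] + Z[j] is the concatenation [x_i h, x_j h].
        scores = {j: leaky_relu(sum(a * b for a, b in zip(Z[i] + Z[j], w)))
                  for j in range(K) if A_hat[i][j] > 0}
        top = max(scores.values())  # shift by the max for numerical stability
        denom = sum(math.exp(s - top) for s in scores.values())
        for j, s in scores.items():
            alpha[i][j] = math.exp(s - top) / denom
    return alpha

# Toy sizes: K = 4 nodes, C = 3, d = 2 (the method uses d = 256), so len(w) = 2d = 4.
X = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 2.0, 1.0]]
h = [[0.1, 0.2], [0.0, 0.3], [0.4, -0.1]]
w = [0.5, -0.2, 0.3, 0.1]
A_hat = [[2, 0, 1, 0], [0, 2, 0, 1], [1, 0, 2, 0], [0, 1, 0, 2]]
alpha = attention_coefficients(X, h, w, A_hat)
print([round(sum(r), 6) for r in alpha])  # [1.0, 1.0, 1.0, 1.0]
print(alpha[0][1], alpha[0][3])           # 0.0 0.0 (different identities)
```

Masking with the adjacency matrix means each node distributes its attention only over images of the same identity, including the images from the other modality.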
Further, the specific steps of the multi-head attention technique in step S4 are as follows:
Step S41: learn the transformation matrices h_l and attention weights w_l of L heads with the same structure, l = 1, 2, ..., L, where L is the total number of heads, and optimize them separately. After the outputs of the heads are concatenated, the attention-enhanced feature of the graph structure is expressed as
$$x_i^{g} = \phi\left(\left[\sum_{j=1}^{K} \alpha_{ij}^{l}\, x_j h_l\right]_{l=1}^{L}\right)$$
where x_i^g is the attention-enhanced feature of node i, α_ij^l is the attention coefficient of head l computed as in step S3 with h_l and w_l, [·]_{l=1}^{L} denotes concatenation over the L heads, and φ is the ELU activation function. A further graph attention network layer with a single-head structure is introduced to better guide the inter-modality graph structure learning, and its final output node features are denoted x'_i;
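The concatenation of heads in step S41 can be sketched as follows. The sketch takes the per-head coefficients α^l and transformation matrices h_l as given; in practice they would come from the step S3 computation. The toy numbers, the ELU parameter of 1, and the function names are illustrative assumptions.

```python
import math

def elu(z):
    # φ: ELU activation (alpha = 1 assumed).
    return z if z > 0 else math.exp(z) - 1.0

def project(X, h):
    # x_j h for every node: (K x C) times (C x d) -> K x d.
    return [[sum(x[c] * h[c][k] for c in range(len(x))) for k in range(len(h[0]))]
            for x in X]

def multi_head_features(X, alphas, hs):
    """x_i^g = φ(concat over heads l of Σ_j alpha^l_ij x_j h_l)."""
    K = len(X)
    out = [[] for _ in range(K)]
    for alpha, h in zip(alphas, hs):  # one (alpha^l, h_l) pair per head
        Z = project(X, h)
        d = len(h[0])
        for i in range(K):
            # Attention-weighted sum of the projected neighbour features.
            out[i].extend(sum(alpha[i][j] * Z[j][k] for j in range(K)) for k in range(d))
    # Apply φ elementwise after the heads have been concatenated.
    return [[elu(v) for v in row] for row in out]

# Toy example: K = 2 nodes, C = 2, L = 2 heads, each projecting to d = 1.
X = [[1.0, -2.0], [0.5, 0.5]]
alphas = [[[0.5, 0.5], [0.5, 0.5]],   # head 1 averages both nodes
          [[1.0, 0.0], [0.0, 1.0]]]   # head 2 attends only to itself
hs = [[[1.0], [1.0]], [[1.0], [0.0]]]
xg = multi_head_features(X, alphas, hs)
print(xg)  # [[0.0, 1.0], [0.0, 0.5]]
```

Each output row has length L·d. Because every head has its own h_l and w_l, the heads can weight the same neighbours differently, which is the averaging effect that multi-head attention relies on for stability.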
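Step S42: to learn the graph attention more effectively, a negative log-likelihood loss function is used, formulated as
$$\mathcal{L}_g = -\frac{1}{K}\sum_{i=1}^{K} l_i^{\top} \log \mathrm{softmax}(x'_i)$$
where l_i is the one-hot label of node i and the logarithm is applied elementwise.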
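A minimal sketch of the negative log-likelihood loss on the outputs of the single-head layer. It assumes that each x'_i is a vector of identity logits with one entry per training identity, that labels are integers, and that the loss is averaged over the K nodes. The reduction and the helper names are assumptions. Indexing by the integer label is equivalent to taking the inner product with the one-hot label l_i.

```python
import math

def log_softmax(z):
    # Numerically stable log-softmax of a list of logits.
    top = max(z)
    lse = top + math.log(sum(math.exp(v - top) for v in z))
    return [v - lse for v in z]

def graph_nll_loss(logits, labels):
    """-(1/K) * sum_i log softmax(x'_i)[y_i]; the mean over nodes is an assumed reduction."""
    K = len(logits)
    return -sum(log_softmax(x)[y] for x, y in zip(logits, labels)) / K

# Two nodes with uninformative logits over 2 identities: loss = log 2.
print(graph_nll_loss([[0.0, 0.0], [0.0, 0.0]], [0, 1]))  # ≈ 0.6931
```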
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention takes into account both intra-modality information and inter-modality discriminative analysis, which effectively reduces the discrepancy between modalities and enhances robustness to noisy samples.
(2) By using the multi-head attention technique, the invention assigns adaptive weights to the intra-modality and inter-modality graph structures. This eliminates the negative influence of samples with large variations, reduces the modality discrepancy, and keeps the training process stable and efficient.
Drawings
FIG. 1 is a network structure diagram of the graph-structure-based cross-modal pedestrian re-identification method of the invention;
FIG. 2 illustrates the process of calculating the attention coefficient;
FIG. 3 shows the Rank-1 accuracy and mAP on the SYSU-MM01 dataset for different values of K and L.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
As shown in FIG. 1, a cross-modal pedestrian re-identification method based on a graph structure comprises the following steps:
Step S1: acquire a training feature dataset and train with an identity-balanced sampling strategy: n pedestrians with different identities are randomly selected from the training feature dataset, m infrared images and m visible-light images are drawn for each of them, and each training batch therefore contains K = 2mn images;
Step S2: generate an adjacency matrix for the training feature dataset acquired in step S1 using the following formula to construct an undirected graph G:
$$A_{ij} = l_i^{\top} l_j, \qquad \hat{A} = A + I_K$$
where l_i and l_j are the one-hot labels of two graph nodes and I_K is the K×K identity matrix, which connects each node to itself. The graph is thus computed by matrix multiplication between the one-hot labels of the training feature dataset;
Step S3: perform graph attention learning, which learns the importance of node i to another node j in the graph; the graph attention spans the two modalities;
Step S4: in the intra-modality graph structure, adopt a multi-head attention technique to improve the accuracy and stability of graph attention learning, learning the transformation matrices h_l and attention weights w_l of L attention heads with the same structure, l = 1, 2, ..., L, where L is the total number of heads, and optimizing them separately.
The specific steps of the graph attention learning in step S3 are as follows:
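Let x_i denote the input node features, which are the outputs of the pooling layer. Then, as shown in FIG. 2, the graph attention coefficient α_ij is calculated as
$$\alpha_{ij} = \frac{\exp\left(\Gamma\left([x_i h,\, x_j h]\, w\right)\right)}{\sum_{k:\,\hat{A}_{ik} > 0} \exp\left(\Gamma\left([x_i h,\, x_k h]\, w\right)\right)}$$
where Γ is the LeakyReLU operation; [·,·] is the concatenation operation; h is a C×d transformation matrix that reduces the input node feature dimension from the original dimension C to a new dimension d, with d set to 256; w is a learnable weight vector of length 2d that measures the importance of the different feature dimensions; and the sum runs over the nodes k connected to node i in the graph G of step S2 (\hat{A}_{ik} > 0).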
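Because w multiplies a concatenation, the pairwise scores separate into per-node terms. The identity below follows directly from splitting w into two halves w^{(1)} and w^{(2)} of length d each; this notation is introduced here for illustration. It gives one convenient way to evaluate all K² scores, not necessarily the way the method implements them.

```latex
w = \begin{bmatrix} w^{(1)} \\ w^{(2)} \end{bmatrix}
\;\Longrightarrow\;
[x_i h,\; x_j h]\, w = x_i h\, w^{(1)} + x_j h\, w^{(2)} = s_i + t_j,
\qquad s_i = x_i h\, w^{(1)},\quad t_j = x_j h\, w^{(2)},
\\[4pt]
\Gamma\left([x_i h,\; x_j h]\, w\right) = \Gamma\left(s_i + t_j\right).
```

Once the K projected features x_i h are available, the score matrix therefore needs only the two length-K vectors s and t and K² additions, instead of one 2d-dimensional inner product per node pair.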
The specific steps of the multi-head attention technique in step S4 are as follows:
Step S41: learn the transformation matrices h_l and attention weights w_l of L heads with the same structure, l = 1, 2, ..., L, where L is the total number of heads, and optimize them separately. After the outputs of the heads are concatenated, the attention-enhanced feature of the graph structure is expressed as
$$x_i^{g} = \phi\left(\left[\sum_{j=1}^{K} \alpha_{ij}^{l}\, x_j h_l\right]_{l=1}^{L}\right)$$
where x_i^g is the attention-enhanced feature of node i, α_ij^l is the attention coefficient of head l computed as in step S3 with h_l and w_l, [·]_{l=1}^{L} denotes concatenation over the L heads, and φ is the ELU activation function. A further graph attention network layer with a single-head structure is introduced to better guide the inter-modality graph structure learning, and its final output node features are denoted x'_i;
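Step S42: a negative log-likelihood loss function is used to strengthen the graph attention learning, formulated as
$$\mathcal{L}_g = -\frac{1}{K}\sum_{i=1}^{K} l_i^{\top} \log \mathrm{softmax}(x'_i)$$
where l_i is the one-hot label of node i and the logarithm is applied elementwise.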
Finally, the Rank-1 accuracy and mAP on the SYSU-MM01 dataset for different values of K and L are obtained, as shown in FIG. 3.
The foregoing is only a preferred embodiment of the invention. It should be noted that those skilled in the art may make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations shall also fall within the scope of protection of the invention.