CN112800836A

CN112800836A - Pedestrian re-identification method, system, server and storage medium

Info

Publication number: CN112800836A
Application number: CN202011563480.4A
Authority: CN
Inventors: 王咏涛
Original assignee: Fullsee Technology Co ltd
Current assignee: Fullsee Technology Co ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2021-05-14

Abstract

The present application relates to a pedestrian re-identification method, system, server and storage medium. The method includes: identifying a human-shaped frame in a monitoring image according to a preset first neural network; identifying first skeleton information in the human-shaped frame according to a preset second neural network, and obtaining occluded skeleton information from the first skeleton information; identifying second skeleton information in an original image, and taking the skeleton information in the second skeleton information other than the occluded skeleton information as valid skeleton information; intercepting the area corresponding to the valid skeleton information in the original image as a preprocessing area; obtaining a core frame from the human-shaped frame by a preset interception algorithm, and obtaining a preprocessed original image from the preprocessing area by the same interception algorithm; judging the correlation between a carried-object frame and the core frame according to a preset carried-object algorithm, and, if the value of the correlation is less than or equal to a reference threshold, combining the core frame and the carried-object frame as a preprocessed monitoring image; and inputting the preprocessed monitoring image and the preprocessed original image into a preset Re-ID neural network. The present application has the effect of improving the anti-interference and anti-occlusion capabilities of the pedestrian re-identification method.

Description

Pedestrian re-identification method, system, server and storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular, to a pedestrian re-identification method, system, server, and storage medium.

Background

Person re-identification (Re-ID) is, besides face recognition, the most important technique for person search and person tracking in video images, and is widely used in fields such as robotics and intelligent security monitoring. It refers to the problem of identifying the same person across time in images captured by different cameras or by a single camera. Traditional Re-ID mainly relies on deep learning and target image segmentation: different parts of the identification target are handled by independent neural network sub-models, so that the Re-ID method learns both local information (via segmentation into parts) and global information.

However, when applied to real scenes with occlusion, the robustness of the overall algorithm drops sharply because the segmentation method becomes unsuitable and inaccurate. The anti-interference performance of related-art person re-identification methods, such as resistance to occlusion and handling of conditions such as lighting changes, therefore needs to be improved.

Disclosure of Invention

In order to improve the anti-interference and anti-blocking capacity of the pedestrian re-identification method, the application provides a pedestrian re-identification method, a system, a server and a storage medium.

In a first aspect, the present application provides a pedestrian re-identification method, which adopts the following technical scheme:

a pedestrian re-identification method, comprising:

identifying whether a human-shaped frame exists in the obtained monitoring image according to a preset first neural network;

if the human-shaped frame exists, identifying first skeleton information of the human body in the human-shaped frame according to a preset second neural network, and comparing the first skeleton information with reference skeleton information preset by the second neural network to obtain shielded skeleton information;

identifying second skeleton information of the human body in the original image according to the second neural network, wherein the first skeleton information corresponds to the second skeleton information;

comparing the second skeleton information with the shielded skeleton information, and taking the skeleton information of the second skeleton information except the shielded skeleton information as effective skeleton information;

intercepting an area corresponding to effective bone information in an original image as a preprocessing area;

obtaining a core frame according to the human-shaped frame by a preset interception algorithm, and obtaining a preprocessed original image according to the preprocessing region by the interception algorithm;

identifying whether an object carrying frame exists in the obtained monitoring image according to the first neural network;

if the object carrying frame exists, judging the correlation between the object carrying frame and the core frame according to a preset object carrying algorithm, and if the numerical value of the correlation is less than or equal to a reference threshold value, combining the core frame and the object carrying frame to be used as a preprocessing monitoring graph;

inputting the preprocessed monitoring graph and the preprocessed original graph into a preset Re-ID neural network;

and acquiring the output value of the Re-ID neural network.

By adopting this technical scheme, the first neural network first marks a human-shaped frame and/or a carried-object frame in the monitoring image by classification or target detection, which removes background interference. The human-shaped frame is the area in which the human body is not covered by an occluder, so it may contain the whole human body or only a part of it.

If an occluder exists, the part of the skeleton that the second neural network does not find framed in the human-shaped frame is covered by the occluder. The occluded area provides no valid features for the Re-ID neural network, so it is discarded and only the unoccluded area of the human body is kept. Because the first skeleton information corresponds to the second skeleton information, the skeleton information covered by the occluder can be matched in the second skeleton information, and the remaining skeleton information in the second skeleton information is the effective skeleton information. The area of the original image corresponding to the effective skeleton information is intercepted as the preprocessing area. The preprocessing area and the human-shaped frame then represent the same part of the human body (namely, the part that is not occluded in the monitoring image), and removing the interference of the occluder effectively improves the recognition accuracy of the Re-ID neural network. The preset interception algorithm keeps the core area that represents the identity of the human body in the human-shaped frame, and the original image is processed in the same way, which effectively filters out the remaining background inside the human-shaped frame.

In addition, because a person may carry the same object under different cameras, if a carried-object frame exists in the monitoring image and belongs to the core frame, the core frame and the carried-object frame are combined as the preprocessed monitoring image. This retains the features of the carried object in the monitoring image, helps the Re-ID neural network identify the person accurately, and further improves the anti-interference and anti-occlusion capabilities of the pedestrian re-identification method.

Optionally, the preset intercept algorithm includes:

intercepting a region corresponding to head skeleton information and/or trunk skeleton information in the first skeleton information in the human-shaped frame as a core frame;

and intercepting a region corresponding to the head skeleton information and/or the trunk skeleton information in the second skeleton information in the preprocessing region as the preprocessed original image.

By adopting this technical scheme, the trunk and the head contain the main features of the human body, whereas a person with both arms spread occupies a much larger detection area; unnecessary areas such as the arms, legs and feet can therefore be removed, which reduces the noise that is input.

Optionally, if there is no object carrying frame, the core frame is used as the preprocessing monitoring graph.

By adopting this technical scheme, if no carried-object frame exists, the human-shaped frame is not occluded, and only the trunk and head features of the human body are extracted. These features contain the main features of the human body, whereas a person with both arms spread occupies a larger detection area, so removing the unnecessary areas reduces the interference of background information.

Optionally, if the value of the correlation is greater than the reference threshold, the core box is used as the pre-processing monitoring graph.

By adopting this technical scheme, if the carried-object frame does not belong to the core frame, the carried object may belong to another person, or may be a small occluder that does not interfere with the first skeleton information in the monitoring image. The interference can then be ignored, and the core frame is used as the preprocessed monitoring image.

Optionally, the preset carried object algorithm includes:

the correlation between the two objects is calculated by the normalized minimum distance,

in the formula: IN represents the sampled coordinate points of the carried-object frame on the two-dimensional plane over the time series t;

OUT represents the sampled coordinate points of the core frame on the two-dimensional plane over the time series t.

By adopting this technical scheme, the standardized Euclidean distance is used to measure the correlation between the carried-object frame and the core frame. The correlation captures both how close the two frames are and how similar their movement speeds are; by summing the successive minimum distances between the sampling points of the two frames, whether the carried object belongs to the same person can be judged quickly and efficiently.

Optionally, the preset carried object algorithm includes:

and continuously measuring the correlation between the object carrying frame and the core frame in the N frames of monitoring images, wherein N is more than or equal to 2.

By adopting the technical scheme, the reliability of the correlation detection of the carrying object frame and the core frame is improved in a continuous multi-frame detection mode.

In a second aspect, the present application provides a pedestrian re-identification system, which adopts the following technical solutions:

a pedestrian re-identification system comprising:

the first neural network module is used for identifying whether a human-shaped frame and/or a carrying object frame exist in the obtained monitoring image;

the second neural network module is used for identifying first skeleton information of the human body in the human-shaped frame and/or second skeleton information of the human body in the original image;

the processing module is used for comparing the first bone information with reference bone information preset by the second neural network to obtain shielded bone information, comparing the second bone information with the shielded bone information to obtain effective bone information, intercepting an area corresponding to the effective bone information in the original image as a preprocessing area, and acquiring the preprocessed original image by an intercepting algorithm according to the preprocessing area;

the processing module is further used for obtaining a core frame according to the human-shaped frame by a preset interception algorithm, judging the correlation between the carried object frame and the core frame according to a preset carried object algorithm module, and combining the core frame and the carried object frame to be used as a preprocessing monitoring graph if the numerical value of the correlation is smaller than or equal to a reference threshold value;

and the Re-ID neural network module is used for analyzing the preprocessed monitoring graph and the preprocessed original graph.

Optionally, the preset carried object algorithm module is specifically configured to:

the correlation between the two objects is calculated by the normalized minimum distance.

In a third aspect, the present application provides a server, which adopts the following technical solutions:

a server comprising a memory and a processor, the memory having stored thereon a computer program that can be loaded by the processor to perform any of the pedestrian re-identification methods described above.

In a fourth aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:

a computer readable storage medium, storing a computer program that can be loaded by a processor and executed to perform any of the above pedestrian re-identification methods.

Drawings

Fig. 1 is a block flow diagram illustrating steps 100-500 according to an embodiment of the present application.

Fig. 2 is a block flow diagram illustrating steps 600-700 according to an embodiment of the present application.

Fig. 3 is a schematic structural diagram of a first neural network.

FIG. 4 is a schematic diagram of the structure of a second neural network.

Fig. 5 is a schematic structural diagram of preset reference bone information.

FIG. 6 is a schematic diagram of the structure of a Re-ID neural network.

Fig. 7 is a system block diagram of a pedestrian re-identification system according to an embodiment of the present application.

Description of reference numerals: 2000. pedestrian re-identification system; 2001. first neural network module; 2002. second neural network module; 2003. processing module; 2004. Re-ID neural network module.

Detailed Description

The present application is described in further detail below with reference to figures 1-7.

The present embodiment merely explains the present application and does not limit it. After reading this specification, those skilled in the art may modify the embodiment as needed without making an inventive contribution, and all such modifications are protected by patent law within the scope of the claims of the present application.

The embodiment of the application discloses a pedestrian re-identification method. Referring to Figs. 1 and 2, the method includes the following steps.

Step 100: identify whether a human-shaped frame exists in the obtained monitoring image according to a preset first neural network.

The first neural network may be a target detection neural network. A target detection algorithm extracts targets from an image; current target detection algorithms include R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD, FPN and Mask R-CNN.

The embodiment of the application uses an SSD model based on the VGG-16 network as the training basis. The classic SSD model is as follows. SSD is a single-shot detection deep neural network that combines the regression idea of YOLO with the anchor mechanism of Faster R-CNN. The regression idea reduces the computational complexity of the network and improves the real-time performance of the algorithm. The anchor mechanism extracts features for boxes of different aspect ratios and sizes, and extracting local features in this way is more reasonable and effective for recognition than YOLO's approach of extracting global features at a given position. In addition, because feature maps of different scales express different features, SSD extracts target features at multiple scales, which improves the robustness of detecting targets of different sizes.

Referring to Fig. 3, the architecture of the SSD is divided into two main parts. The front end is a deep convolutional neural network, namely the image classification network VGG with its classification layer removed, which extracts the primary features of the target. The back end is a multi-scale feature detection network, a group of cascaded convolutional neural networks that extract features from the feature layer produced by the front-end network at different scales.

Given an input image and a set of ground-truth labels, the SSD performs the following operations:

1) the image is passed through a series of convolutional layers, producing feature maps of different sizes (e.g., 10x10, 6x6, 3x3, etc.);

2) for each position in each of these feature maps, a 3x3 convolution filter is used to evaluate a small set of default bounding boxes;

3) these default bounding boxes are essentially equivalent to the anchor boxes of Faster R-CNN;

4) for each bounding box, two quantities are predicted simultaneously: a) the offset of the bounding box; b) the classification probabilities.

During training, the predicted bounding boxes are matched to the ground-truth bounding boxes based on the IoU (intersection over union) coefficient. The best-matching predicted bounding box is labeled "positive", as are all other bounding boxes whose IoU with the ground truth is greater than 0.5.
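The matching rule described above can be sketched in Python. This is an illustrative sketch only: the `(x1, y1, x2, y2)` box format and the function names `iou` and `match_positives` are assumptions made here, not part of the application or of any SSD library.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_positives(default_boxes, truth_box, threshold=0.5):
    """Indices of default boxes labeled positive for one ground-truth box.

    The best-overlapping box is always positive; every other box whose
    IoU exceeds `threshold` is positive as well.
    """
    scores = [iou(box, truth_box) for box in default_boxes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return sorted({best} | {i for i, s in enumerate(scores) if s > threshold})
```

Keeping the best match even when its IoU is below the threshold ensures that every ground-truth box has at least one positive default box.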

SSD training regresses the position and the target class simultaneously. The target loss function is the weighted sum of the confidence loss and the position loss:

$$L(z, c, l, g) = \frac{1}{N}\left(L_{conf}(z, c) + \alpha L_{loc}(z, l, g)\right)$$

in the formula: N is the number of default frames matched with ground-truth object frames; L_conf(z, c) is the confidence loss; L_loc(z, l, g) is the position loss; z is the matching result between the default frames and the ground-truth object frames of the different classes; c is the confidence of the predicted object frame; l is the position information of the predicted object frame; g is the position information of the ground-truth object frame; α is a parameter that trades off the confidence loss against the position loss and is typically set to 1.

The human-shaped frame is the area in which the human body is not covered by an occluder, so it may contain the whole human body or only a part of it. The first neural network automatically sets the size (range) of the human-shaped frame according to the unoccluded area of the human body, and, because it crops away the background by target detection, background interference is effectively filtered out.

Referring to Fig. 1, Step 200: if the human-shaped frame exists, identify first skeleton information of the human body in the human-shaped frame according to a preset second neural network.

In the embodiment of the application, the second neural network is an OpenPose network. OpenPose is a bottom-up human pose estimation algorithm that efficiently detects the two-dimensional poses of multiple people in an image.

The overall flow of the OpenPose algorithm is as follows. The algorithm takes a color image of size w × h as input. A feedforward network first predicts a set S of part confidence maps from the input image and, at the same time, a set L of part affinity vector fields, which encode the relations between the skeletal joint points of the human body and are used for the subsequent association of joint points. The set $S = (S_1, S_2, \ldots, S_J)$ contains J confidence maps, one per skeletal joint point, where $S_j \in \mathbb{R}^{w \times h}$, $j \in \{1, \ldots, J\}$. The set $L = (L_1, L_2, \ldots, L_C)$ contains C part affinity vector fields, one per limb, where $L_c \in \mathbb{R}^{w \times h \times 2}$, $c \in \{1, \ldots, C\}$; each image position in $L_c$ encodes a 2D vector. Finally, the part confidence maps and the part affinity vector fields are parsed by a greedy algorithm to output the two-dimensional skeletal joint points of every person in the image.

Referring to Fig. 4, the overall OpenPose network architecture is divided into an upper branch and a lower branch: the upper branch predicts the confidence maps S of the skeletal joint points, and the lower branch predicts the part affinity vector fields L.

When OpenPose is used as the human skeletal joint point detector, it is not trained on the golf swing database or the PoseTrack database; instead, a pre-trained OpenPose model is selected. Fig. 5 shows the 18 human skeletal joint points provided by this model.

Step 201: if the human-shaped frame does not exist, acquire the monitoring image again and input it into the first neural network.

That is, if no human-shaped frame is identified in the monitoring image, a new monitoring image is acquired and input into the first neural network.

Referring to Fig. 1, Step 300: compare the first skeleton information with the reference skeleton information preset by the second neural network to obtain the occluded skeleton information, and identify second skeleton information of the human body in the original image according to the second neural network.

In the OpenPose network, the preset reference skeleton information corresponds one-to-one to the first skeleton information and also one-to-one to the second skeleton information, so the first skeleton information corresponds one-to-one to the second skeleton information. If an occluder exists, the part of the skeleton that is not framed in the human-shaped frame is covered by the occluder. The occluded area cannot provide valid features for the Re-ID neural network, so it is discarded and the unoccluded area of the human body is retained.

Referring to Fig. 1, Step 400: compare the second skeleton information with the occluded skeleton information, and take the skeleton information in the second skeleton information other than the occluded skeleton information as the effective skeleton information.

The first skeleton information covered by the occluder is the occluded skeleton information. It can be matched in the second skeleton information, and the skeleton information in the second skeleton information that is not matched is the effective skeleton information.
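Steps 300 and 400 can be read as two set operations on named joint points. The following Python sketch is illustrative only: the joint names, the dictionary representation of skeleton information, and the rule that a reference joint missing from the first skeleton information counts as occluded are assumptions made here, not details given in this application.

```python
# Hypothetical names for an 18-point skeleton; the layout in Fig. 5 may differ.
REFERENCE_JOINTS = (
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye", "right_ear", "left_ear",
)


def occluded_joints(first_skeleton, reference=REFERENCE_JOINTS):
    """Step 300: reference joints not detected inside the human-shaped frame."""
    return {name for name in reference if name not in first_skeleton}


def effective_joints(second_skeleton, occluded):
    """Step 400: joints of the original image that are not in the occluded set."""
    return {name: xy for name, xy in second_skeleton.items() if name not in occluded}
```

For example, if only the nose and neck are detected in the monitoring image, every other reference joint is treated as occluded, and only the nose and neck of the original image remain as effective skeleton information.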

Referring to Fig. 1, Step 500: intercept the area corresponding to the effective skeleton information in the original image as the preprocessing area.

The preprocessing area is the area of the original image that corresponds to the effective skeleton information. The preprocessing area and the human-shaped frame both represent the same part of the human body (namely, the part that is not occluded in the monitoring image).

The intercepted area may be an irregular frame or a regular circle. To reduce the amount of computation needed to intercept the area in the original image, the embodiment of the application intercepts a horizontal rectangular frame.
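A horizontal rectangular interception can be sketched as the axis-aligned bounding rectangle of the effective joint points. The sketch below is an assumption-laden illustration: the image is modeled as a plain list of pixel rows, and the `margin` parameter and the name `crop_rectangle` are hypothetical additions.

```python
import math


def crop_rectangle(image, points, margin=0):
    """Crop the axis-aligned rectangle that bounds `points` from `image`.

    `image` is a list of rows, each row a list of pixels; `points` is a
    sequence of (x, y) coordinates. The rectangle is clipped to the image.
    """
    height, width = len(image), len(image[0])
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x1 = max(math.floor(min(xs)) - margin, 0)
    y1 = max(math.floor(min(ys)) - margin, 0)
    x2 = min(math.ceil(max(xs)) + margin + 1, width)
    y2 = min(math.ceil(max(ys)) + margin + 1, height)
    return [row[x1:x2] for row in image[y1:y2]]
```

A rectangle needs only four comparisons per point to compute, which is consistent with the stated goal of keeping the interception cheap.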

Referring to Fig. 2, Step 600: obtain a core frame from the human-shaped frame by a preset interception algorithm, and obtain a preprocessed original image from the preprocessing area by the same interception algorithm.

The trunk and the head contain the main features of the human body, whereas a person with both arms spread occupies a larger detection area; unnecessary areas such as the arms, legs and feet can therefore be removed to reduce the input noise. Accordingly, the preset interception algorithm in the embodiment of the application includes: intercepting the region corresponding to the head skeleton information and/or the trunk skeleton information in the first skeleton information in the human-shaped frame as the core frame; and intercepting the region corresponding to the head skeleton information and/or the trunk skeleton information in the second skeleton information in the preprocessing area as the preprocessed original image.

If both the head skeleton information and the trunk skeleton information are selected in the human-shaped frame, the preprocessed original image contains both; if the human-shaped frame contains only one of the two, the preprocessed original image contains only that one. The core frame is obtained in the same way.
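One possible reading of the interception algorithm is to keep only head and trunk joints and take their bounding box. In this Python sketch the grouping of joints into `HEAD_JOINTS` and `TORSO_JOINTS`, the joint names, and the function `core_box` are hypothetical; the application does not specify which joint points belong to the head or trunk.

```python
# Assumed grouping of joint names; not taken from the application.
HEAD_JOINTS = {"nose", "left_eye", "right_eye", "left_ear", "right_ear"}
TORSO_JOINTS = {"neck", "left_shoulder", "right_shoulder", "left_hip", "right_hip"}


def core_box(skeleton, use_head=True, use_torso=True):
    """Bounding box (x1, y1, x2, y2) of the available head and/or trunk joints.

    `skeleton` maps joint names to (x, y). Returns None if none of the
    requested joints is present.
    """
    wanted = set()
    if use_head:
        wanted |= HEAD_JOINTS
    if use_torso:
        wanted |= TORSO_JOINTS
    points = [xy for name, xy in skeleton.items() if name in wanted]
    if not points:
        return None
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))
```

Because arm and leg joints are ignored, an outstretched wrist or a far ankle does not enlarge the box, which reflects the noise-reduction argument above.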

Referring to Fig. 2, Step 700: identify whether a carried-object frame exists in the obtained monitoring image according to the first neural network, and obtain the corresponding preprocessed monitoring image depending on whether the carried-object frame exists.

Step 7001: if the carried-object frame exists, judge the correlation between the carried-object frame and the core frame according to a preset carried-object algorithm.

A person may carry the same object under different cameras, so retaining the carried-object information helps the Re-ID neural network identify the person accurately and further improves the anti-interference and anti-occlusion capabilities of the pedestrian re-identification method.

The preset carried-object algorithm in the embodiment of the application is as follows:

the correlation between the two objects is calculated by the normalized minimum distance.

The embodiment of the application uses the standardized Euclidean distance to measure the correlation between the carried-object frame and the core frame. The correlation captures both how close the two frames are and how similar their movement speeds are. Normalization is a means of making a quantity dimensionless, converting the absolute value of a physical quantity into a relative value.

In addition, the correlation between the carried-object frame and the core frame is measured continuously over N frames of monitoring images, where N is 5. Summing the successive minimum distances between the sampling points of the carried-object frame and the core frame allows a quick and efficient judgment of whether the carried object belongs to the same person, and detecting over multiple consecutive frames improves the reliability of the correlation detection.
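The exact formula is not reproduced in this text, so the following Python sketch shows only one plausible reading of "sum of successive minimum distances" with normalization. The per-frame scale used for normalization (for example the diagonal of the core frame), the function names, and the data layout are all assumptions, not the formula of the application.

```python
import math


def min_distance(points_a, points_b):
    """Smallest Euclidean distance between any point of A and any point of B."""
    return min(math.dist(p, q) for p in points_a for q in points_b)


def carried_object_correlation(in_frames, out_frames, scales):
    """Sum over frames of the scale-normalized minimum point-to-point distance.

    in_frames[t]  : sampled (x, y) points of the carried-object frame at frame t
    out_frames[t] : sampled (x, y) points of the core frame at frame t
    scales[t]     : positive normalizing length for frame t (assumed here)
    A smaller value means the two frames stay closer together over time.
    """
    if not (len(in_frames) == len(out_frames) == len(scales)):
        raise ValueError("all inputs must cover the same number of frames")
    return sum(
        min_distance(a, b) / s for a, b, s in zip(in_frames, out_frames, scales)
    )
```

Under this reading, an object that moves together with the person keeps a small distance in every one of the N frames, so the sum stays below the reference threshold, while an object that merely passes by produces large terms in at least some frames.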

Step 7002: if the value of the correlation is less than or equal to the reference threshold, combine the core frame and the carried-object frame as the preprocessed monitoring image.

If a carried-object frame exists in the monitoring image and belongs to the core frame, the core frame and the carried-object frame are combined as the preprocessed monitoring image. This retains the features of the carried object in the monitoring image and helps the Re-ID neural network identify the person accurately.

Step 7003: if the value of the correlation is greater than the reference threshold, use the core frame as the preprocessed monitoring image.

If the carried-object frame does not belong to the core frame, the carried object may belong to another person, or may be a small occluder (such as a mask or a small backpack) that does not interfere with the first skeleton information in the monitoring image. The interference can then be ignored, and the core frame is used as the preprocessed monitoring image. The labels of such occluders are input to the first neural network in advance.

Step 7004: if no carried-object frame exists, use the core frame as the preprocessed monitoring image.

If no carried-object frame exists, the human-shaped frame is not occluded, and only the trunk and head features of the human body are extracted. These features contain the main features of the human body, whereas a person with both arms spread occupies a larger detection area, so removing the unnecessary areas reduces the interference of background information.
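The branching in Steps 7001 to 7004 can be summarized in a short Python sketch. Interpreting "combining" the two frames as taking their union bounding box is an assumption made here for illustration, as are the function names and the default threshold value.

```python
def union_box(box_a, box_b):
    """Smallest (x1, y1, x2, y2) box containing both input boxes."""
    return (
        min(box_a[0], box_b[0]), min(box_a[1], box_b[1]),
        max(box_a[2], box_b[2]), max(box_a[3], box_b[3]),
    )


def preprocessed_monitoring_box(core, carried=None, correlation=None, threshold=1.0):
    """Select the region used as the preprocessed monitoring image."""
    if carried is None:
        # Step 7004: no carried-object frame was detected.
        return core
    if correlation <= threshold:
        # Step 7002: the carried object belongs to the person.
        return union_box(core, carried)
    # Step 7003: the object belongs to someone else or is a negligible occluder.
    return core
```

Note that a smaller correlation value means a closer relation, which is why the carried-object frame is kept when the value is at or below the threshold.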

Referring to Fig. 2, Step 800: input the preprocessed monitoring image and the preprocessed original image into a preset Re-ID neural network, and obtain the output value of the Re-ID neural network.

Currently, deep learning based Re-ID methods fall into the following categories:

First, Re-ID methods based on representation learning. This is a very common class of pedestrian re-identification methods that benefits from the rapid development of deep learning, and of convolutional neural networks (CNNs) in particular. Because a CNN can automatically extract representation features from raw image data according to the task requirements, the pedestrian re-identification problem can be treated as a classification problem or a verification problem: (1) in the classification problem, the ID or attributes of the pedestrian are used as training labels to train the model; (2) in the verification problem, a pair of pedestrian images is input and the network learns whether the two images belong to the same pedestrian.

Second, Re-ID methods based on metric learning. Unlike representation learning, metric learning aims to learn the similarity of two images through the network. In pedestrian re-identification, this means that the similarity between different images of the same pedestrian should be greater than that between images of different pedestrians. The loss function of the network therefore makes the distance between images of the same pedestrian (positive sample pairs) as small as possible and the distance between images of different pedestrians (negative sample pairs) as large as possible.

Third, Re-ID methods based on local features. Common ways of extracting local features include cutting the image into blocks, locating parts by skeleton key points, and pose correction.

Fourth, Re-ID methods based on video sequences. Single-frame Re-ID is still the mainstream of current research because the data sets involved are relatively small, but the information in a single frame is generally limited, so much work focuses on pedestrian re-identification using video sequences. The main difference of video-sequence methods is that they consider not only the content of each image but also the motion information between frames.

Referring to Fig. 6, the Re-ID neural network employed in the embodiments of the present application is a deep neural network architecture that formulates the pedestrian re-identification problem as binary classification. The Re-ID neural network architecture comprises: two layers of tied convolution with max pooling, cross-input neighborhood differences, patch summary features, across-patch features, higher-order relationships, and finally a softmax function that estimates whether the two input images show the same person.

(1) Tied Convolution:

To make the features of the two images (the input image pair) comparable, the first two layers of the two branches share weights during feature extraction. The input image size is 60×160×3. The first convolutional layer has 20 filters of size 5×5×3, giving an output of (60−5+1)/1 × (160−5+1)/1 × 20 = 56×156×20; a 2×2 max pooling layer reduces this to 28×78×20. The second convolutional layer (25 filters of size 5×5×20) followed by a 2×2 max pooling layer outputs 12×37×25.
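The layer sizes quoted above can be checked with a few lines of arithmetic. The sketch below assumes unpadded ("valid") convolutions with stride 1 and non-overlapping 2×2 pooling, which is what the stated sizes imply; the function names are illustrative.

```python
def conv_valid(size, kernel, stride=1):
    """Output length of an unpadded convolution along one axis."""
    return (size - kernel) // stride + 1


def pool(size, window=2):
    """Output length of non-overlapping pooling along one axis."""
    return size // window


def tied_branch_shapes(width=60, height=160):
    """Return the (width, height, channels) shape after each layer of one branch."""
    shapes = []
    w, h = conv_valid(width, 5), conv_valid(height, 5)
    shapes.append((w, h, 20))   # first convolution, 20 filters of 5x5x3
    w, h = pool(w), pool(h)
    shapes.append((w, h, 20))   # first 2x2 max pooling
    w, h = conv_valid(w, 5), conv_valid(h, 5)
    shapes.append((w, h, 25))   # second convolution, 25 filters of 5x5x20
    w, h = pool(w), pool(h)
    shapes.append((w, h, 25))   # second 2x2 max pooling
    return shapes


shapes = tied_branch_shapes()
```

Because the weights are tied, both images of the pair pass through the same filters and produce feature maps of identical shape.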

(2) Cross-Input Neighborhood Difference:

The input of this layer is the pair of feature maps obtained by convolving the image pair, denoted $f_i, g_i \in \mathbb{R}^{12 \times 37}$. The output of the layer is $K_i \in \mathbb{R}^{12 \times 37 \times 5 \times 5}$ ($1 \le i \le 25$), where each $K_i(x, y) \in \mathbb{R}^{5 \times 5}$ is a 5×5 matrix ($1 \le x \le 12$, $1 \le y \le 37$). The specific formula is as follows:

$$K_i(x, y) = f_i(x, y)\,\mathbb{1}(5, 5) - \mathcal{N}[g_i(x, y)]$$

where $f_i(x, y)\,\mathbb{1}(5, 5)$ is the pixel value $f_i(x, y)$ replicated into a 5×5 matrix, and $\mathcal{N}[g_i(x, y)]$ is the 5×5 matrix of neighborhood values of $g_i$ centered on $(x, y)$. Exchanging $f_i$ and $g_i$ gives $K'_i$. Thus $\{K_i\}_{i=1}^{25}$ and $\{K'_i\}_{i=1}^{25}$ together have 50 dimensions, each of size 12×37.
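The neighborhood difference for a single pair of feature maps can be sketched in plain Python as follows. The text does not say how positions outside the map are handled, so zero padding at the border is an assumption here, as is the function name.

```python
def neighborhood_difference(f, g, size=5):
    """For every position (x, y), return f[x][y] minus the size x size neighborhood of g.

    f and g are 2-D lists of equal shape. Positions of g that fall outside
    the map are treated as 0 (zero padding, an assumption of this sketch).
    """
    rows, cols = len(f), len(f[0])
    half = size // 2

    def g_at(r, c):
        return g[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0

    return [
        [
            [
                [f[x][y] - g_at(x + dx, y + dy) for dy in range(-half, half + 1)]
                for dx in range(-half, half + 1)
            ]
            for y in range(cols)
        ]
        for x in range(rows)
    ]


f = [[2.0]]
g = [[5.0]]
K = neighborhood_difference(f, g)        # f compared with neighborhoods of g
K_prime = neighborhood_difference(g, f)  # roles exchanged
```

The operation is asymmetric, which is why both K and K' are computed: the first compares each value of f with the neighborhood in g, and the second does the reverse.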

(3) Patch Summary Features:

This layer summarizes each 5×5 block of the neighborhood difference maps. Specifically, K is convolved with 25 filters of size 5×5×25 using a stride of 5. Laid out spatially, K has size (12×5)×(37×5)×25 and K' likewise has size (12×5)×(37×5)×25, that is, (12×5)×(37×5)×50 in total. With a convolution kernel of size 5×5×25, convolving K yields L of size 12×37×25 and convolving K' yields L' of size 12×37×25 (12×37×50 in total). The convolution is applied separately to the first 25 dimensions (K) and the last 25 dimensions (K'), rather than directly to all 50 dimensions.
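Because the stride equals the filter width, each 5×5 block is reduced to exactly one value per filter. The sketch below shows this for a single channel and a single filter; the real layer additionally sums over the 25 input channels, and the function name and toy values are assumptions.

```python
def summarize_patches(diff_map, kernel, stride=5):
    """Strided 2-D convolution (single channel, single filter, no padding).

    With a 5x5 kernel and stride 5, every non-overlapping 5x5 block of
    diff_map is reduced to one weighted sum.
    """
    k = len(kernel)
    out_rows = (len(diff_map) - k) // stride + 1
    out_cols = (len(diff_map[0]) - k) // stride + 1
    out = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * diff_map[r * stride + i][c * stride + j]
            row.append(total)
        out.append(row)
    return out


# A (12*5) x (37*5) map of ones and an all-ones 5x5 filter.
diff_map = [[1.0] * (37 * 5) for _ in range(12 * 5)]
kernel = [[1.0] * 5 for _ in range(5)]
summary = summarize_patches(diff_map, kernel)
```

The output has one value per original position, so the 60×185 layout collapses back to 12×37.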

(4) Across-Patch Features:

This layer learns the spatial relationships across neighborhood differences. L is convolved with 25 convolution kernels of size 3×3×25 (stride 1) and then passed through 2×2 max pooling to obtain a 25-dimensional feature layer M of size 5×18, that is, M ∈ R^(5×18×25). L' is processed in the same way to obtain M'. The filters used for L → M and L' → M' are not tied.
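The stated size of M can be reproduced with simple arithmetic. A 3×3 convolution without padding turns 12×37 into 10×35, and the stated width of 18 is obtained if the 2×2 pooling rounds up at the border; this ceil-mode pooling is an assumption that is consistent with the numbers in the text, not something the text states.

```python
import math


def across_patch_shape(rows=12, cols=37, kernel=3, pool=2, ceil_mode=True):
    """Spatial size after an unpadded stride-1 convolution and 2x2 max pooling."""
    conv_rows = rows - kernel + 1
    conv_cols = cols - kernel + 1
    rounding = math.ceil if ceil_mode else math.floor
    return (rounding(conv_rows / pool), rounding(conv_cols / pool))


shape_ceil = across_patch_shape()                  # pooling rounds up
shape_floor = across_patch_shape(ceil_mode=False)  # pooling rounds down
```

With floor rounding the width would be 17 instead of 18, so the pooling convention matters when reimplementing the layer.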

(5) Higher-Order Relationships (Fully Connected Layer):

A fully connected layer is applied after M and M'. It captures higher-order relationships by combining information from patches that are far from each other and by combining the information from M with the information from M'. The resulting feature vector, of size 500, is passed through a ReLU nonlinearity. These 500 outputs are then passed to another fully connected layer containing 2 softmax units, which represent the probabilities that the two images in the pair show the same person or different persons.
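The final two softmax units turn two raw scores into probabilities that sum to 1. The following is a minimal sketch of that last step only; which unit corresponds to "same person" is an assumption, and the scores are made-up values.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores from the last fully connected layer:
# index 0 is taken to mean "same person", index 1 "different persons".
p_same, p_different = softmax([2.0, 0.0])
```

A decision can then be made by comparing `p_same` with `p_different` or with a chosen threshold.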

Referring to Fig. 7, an embodiment of the present application further provides a pedestrian re-identification system 2000, which includes:

a first neural network module 2001, configured to identify whether a human-shaped frame and/or a carried-object frame exists in the acquired monitoring image;

a second neural network module 2002, configured to identify first skeleton information of the human body in the human-shaped frame and/or second skeleton information of the human body in the original image;

a processing module 2003, configured to compare the first skeleton information with reference skeleton information preset in the second neural network to obtain occluded skeleton information, compare the second skeleton information with the occluded skeleton information to obtain valid skeleton information, intercept the area corresponding to the valid skeleton information in the original image as a preprocessing area, and obtain the preprocessed original image from the preprocessing area by using an interception algorithm;

the processing module 2003 being further configured to obtain a core frame from the human-shaped frame by using a preset interception algorithm, judge the correlation between the carried-object frame and the core frame by using a preset carried-object algorithm module, and, if the value of the correlation is less than or equal to a reference threshold, merge the core frame and the carried-object frame to serve as the preprocessed monitoring image;

and a Re-ID neural network module 2004, configured to analyze the preprocessed monitoring image and the preprocessed original image.
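The skeleton comparison performed by the processing module can be sketched with simple set operations. Everything concrete below is an assumption made for illustration: skeleton information is represented as named key points with (x, y) coordinates, a reference key point that is missing from the first skeleton information is treated as occluded, and the preprocessing area is taken to be the bounding box of the remaining valid (effective) key points.

```python
def occluded_keypoints(reference_names, first_skeleton):
    """Reference key points that were not found in the human-shaped frame."""
    return set(reference_names) - set(first_skeleton)


def valid_keypoints(second_skeleton, occluded):
    """Key points of the original image, excluding the occluded ones."""
    return {name: xy for name, xy in second_skeleton.items() if name not in occluded}


def bounding_region(keypoints):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the key points."""
    xs = [x for x, _ in keypoints.values()]
    ys = [y for _, y in keypoints.values()]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical key point names and coordinates.
reference = ["head", "left_arm", "right_arm", "torso", "legs"]
first = {"head": (50, 10), "torso": (50, 60), "right_arm": (70, 55)}
second = {"head": (40, 12), "left_arm": (20, 50), "right_arm": (60, 50),
          "torso": (40, 60), "legs": (40, 120)}

occluded = occluded_keypoints(reference, first)
valid = valid_keypoints(second, occluded)
region = bounding_region(valid)
```

Cropping the original image to `region` keeps only the body parts that are also visible in the monitoring image, so the two inputs of the Re-ID network cover comparable areas.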

The preset carried-object algorithm module calculates the correlation between the two objects from the normalized minimum distance.
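The decision rule applied to the core frame and the carried-object frame can be sketched as follows. The correlation value is taken as an input because its formula is not reproduced here; representing frames as (x_min, y_min, x_max, y_max) boxes and merging them into their enclosing box are assumptions made for illustration.

```python
def merge_boxes(a, b):
    """Smallest box (x_min, y_min, x_max, y_max) that encloses both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def preprocessed_monitoring_box(core_box, carried_box, correlation, threshold):
    """Choose the region used as the preprocessed monitoring image.

    - no carried-object frame: use the core frame;
    - correlation <= threshold: merge the core frame and the carried-object frame;
    - correlation > threshold: use the core frame alone.
    """
    if carried_box is None:
        return core_box
    if correlation <= threshold:
        return merge_boxes(core_box, carried_box)
    return core_box


# Hypothetical boxes and values.
core = (10, 10, 50, 100)
bag = (45, 60, 70, 90)
merged = preprocessed_monitoring_box(core, bag, correlation=0.1, threshold=0.3)
alone = preprocessed_monitoring_box(core, bag, correlation=0.5, threshold=0.3)
no_bag = preprocessed_monitoring_box(core, None, correlation=0.0, threshold=0.3)
```

Since the correlation is distance-based, a small value indicates that the carried object is close to the pedestrian, which is why the merge happens at or below the threshold.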

An embodiment of the present application further provides a server comprising a memory and a processor, the memory storing a computer program that can be loaded by the processor to execute any of the above pedestrian re-identification methods.

It will be appreciated that the memory in the embodiments of the present application can be either volatile memory or non-volatile memory, or can include both volatile and non-volatile memory.

The non-volatile memory may be ROM, Programmable Read Only Memory (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory.

Volatile memory can be RAM, which acts as external cache memory. There are many different types of RAM, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).

The processor mentioned above may be a CPU, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the above pedestrian re-identification method. The processing unit and the storage unit may be decoupled, disposed on different physical devices, and connected in a wired or wireless manner to implement their respective functions, so as to support the system chip in implementing the functions of the foregoing embodiments. Alternatively, the processing unit and the memory may be coupled on the same device.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program that can be loaded by a processor to execute any of the above pedestrian re-identification methods.

The computer-readable storage media of the embodiments of the present application may take any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The above embodiments are preferred embodiments of the present application and do not limit its scope of protection. Therefore, all equivalent changes made according to the structure, shape, and principle of the present application shall fall within the protection scope of the present application.

Claims

1. A pedestrian re-identification method, comprising:

and acquiring the output value of the Re-ID neural network.

2. The pedestrian re-identification method according to claim 1, wherein the preset intercept algorithm comprises:

3. The pedestrian re-identification method according to claim 1, wherein if there is no object-carrying frame, the core frame is used as the pre-processing monitoring map.

4. The pedestrian re-identification method according to claim 3, wherein if the correlation value is greater than the reference threshold, the core frame is used as the pre-processing monitoring map.

5. The pedestrian re-identification method according to claim 1, wherein the preset carrier algorithm comprises:

calculating the correlation between the two objects from the normalized minimum distance.

6. The pedestrian re-identification method according to claim 5, wherein the preset carrier algorithm comprises:

7. A pedestrian re-identification system, comprising:

a first neural network module (2001), configured to identify whether a human-shaped frame and/or a carried-object frame exists in the acquired monitoring image;

a second neural network module (2002), configured to identify first skeleton information of the human body in the human-shaped frame and/or second skeleton information of the human body in the original image;

a processing module (2003), configured to compare the first skeleton information with reference skeleton information preset in the second neural network to obtain occluded skeleton information, compare the second skeleton information with the occluded skeleton information to obtain valid skeleton information, intercept the area corresponding to the valid skeleton information in the original image as a preprocessing area, and obtain the preprocessed original image from the preprocessing area by using an interception algorithm;

the processing module (2003) being further configured to obtain a core frame from the human-shaped frame by using a preset interception algorithm, judge the correlation between the carried-object frame and the core frame by using a preset carried-object algorithm module, and, if the value of the correlation is less than or equal to a reference threshold, merge the core frame and the carried-object frame to serve as the preprocessed monitoring image;

and a Re-ID neural network module (2004), configured to analyze the preprocessed monitoring image and the preprocessed original image.

8. The pedestrian re-identification system according to claim 7, wherein the preset carried object algorithm module is specifically configured to:

calculate the correlation between the two objects from the normalized minimum distance.

9. A server, comprising a memory and a processor, the memory having stored thereon a computer program which is loadable by the processor and adapted to perform the method of any of claims 1 to 6.

10. A computer-readable storage medium, storing a computer program which can be loaded by a processor to execute the method of any one of claims 1 to 6.