
CN117036846A - A helmet wearing detection method based on hybrid connection improved YOLOv5 - Google Patents


Info

Publication number
CN117036846A
Authority
CN
China
Prior art keywords
safety helmet
wearing
hybrid connection
attention
gate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310785410.0A
Other languages
Chinese (zh)
Inventor
曹振涛
颉志杰
李洁
王希刚
李跃杰
程倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Petroleum and Chemical Corp
Sinopec Southwest Oil and Gas Co
Original Assignee
China Petroleum and Chemical Corp
Sinopec Southwest Oil and Gas Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Petroleum and Chemical Corp, Sinopec Southwest Oil and Gas Co filed Critical China Petroleum and Chemical Corp
Priority to CN202310785410.0A priority Critical patent/CN117036846A/en
Publication of CN117036846A publication Critical patent/CN117036846A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Helmets And Other Head Coverings (AREA)

Abstract

The invention discloses a safety helmet wearing detection method based on a hybrid-connection-improved YOLOv5. The method collects field operation images, completes format conversion of the annotation files, and establishes a safety helmet wearing detection data set; constructs a YOLOv5 v6.0 network, improves its structure, and builds a safety helmet wearing detection network model based on the hybrid-connection-improved YOLOv5; inputs the data set into the YOLOv5 network model for training to obtain the training parameters; completes model deployment and system interface design, and performs functional tests with test samples; and, if a person in the sample is detected not wearing a safety helmet, issues an alarm prompt. The method is an intelligent, automatic helmet wearing detection method usable in any construction operation environment; it offers high detection accuracy and speed, helps improve the efficiency of helmet wearing inspection, reduces the workload of safety managers, and provides a new perspective for implementing intelligent construction sites.

Description

Safety helmet wearing detection method based on hybrid connection improved YOLOv5
Technical Field
The invention relates to the field of artificial intelligence target detection, and in particular to a safety helmet wearing detection method based on a hybrid-connection-improved YOLOv5 algorithm.
Background
Safety helmets provide effective buffering, shock absorption and stress dispersion in fields such as oilfield drilling and mining, and wearing one correctly is an effective measure for protecting the head. Supervising and inspecting helmet wearing is of great significance for protecting workers' safety, and it is also a necessary responsibility and obligation of production and management units; compliant production units assign safety managers to check and supervise helmet wearing.
With the development of computer vision technology, computer-aided supervision and inspection has become one of the common approaches. Intelligent algorithms and equipment continue to advance, bringing promising development possibilities, research value and significance to computer vision technology.
At present, helmet supervision and inspection can be divided, according to the form of the inspected data, into on-site supervision, photo inspection, video playback inspection, and real-time camera inspection. Each mode requires a safety manager to make manual judgments, which is time-consuming and labor-intensive, and fatigue and other subjective factors can lead to missed observations and misjudgments. Helmet wearing detection can be regarded as target detection of the helmet wearing situation. YOLOv5 is a one-stage target detection algorithm developed in recent years with excellent properties such as high accuracy and high speed, but its accuracy on this detection target remains limited.
Disclosure of Invention
The invention aims to solve the problem that the accuracy of YOLOv5-based target detection algorithms in the prior art is not high, and provides a safety helmet wearing detection method based on a hybrid-connection-improved YOLOv5.
In order to achieve the above object, the present invention provides the following technical solution:
A method for detecting safety helmet wearing based on the hybrid-connection-improved YOLOv5, characterized by comprising the following steps:
S1, collecting on-site operation images;
S2, performing data annotation on the collected images, completing format conversion of the annotation files, and establishing a safety helmet wearing detection data set;
S3, constructing a YOLOv5 v6.0 network, improving its structure, and building a safety helmet wearing detection network model based on the hybrid-connection-improved YOLOv5;
S4, inputting the data set established in S2 into the improved YOLOv5 network model built in S3 for training, obtaining the training parameters, and saving the resulting weight files;
S5, completing model deployment and system interface design, and performing functional tests with test samples, including images, videos, and camera video streams;
and S6, if a person in the sample is detected not wearing a safety helmet, issuing an alarm prompt; otherwise, returning to step S4.
The method improves the YOLOv5 algorithm structure: short-circuit connection structures combining local residual connections and macroscopic dense connections are built; group convolution, depthwise separable convolution and similar structures are used in the compression excitation gate unit; the update gate uses Softmax to effectively fuse newly extracted feature information; and the forget gate uses a Sigmoid function to attenuate reused features, further improving the accuracy of model detection.
Preferably, in step S1, construction operation images related to safety helmets are acquired on the basis of the MS COCO data set combined with a web crawler tool, and operation video recordings are acquired on site in a targeted way according to the usage environment, including operation sites such as chemical plants and construction sites; video frames are intercepted to complete image data collection.
Using a web crawler tool to acquire construction images related to safety helmets yields more numerous and more comprehensive picture data, making the data set richer.
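The video-frame interception in step S1 can be sketched as a sampling schedule. The helper below is hypothetical (not from the patent): it only computes which frame indices to keep, e.g. one frame per second, before decoding them with any video library:

```python
def frame_indices(total_frames: int, fps: float, every_sec: float = 1.0) -> list[int]:
    """Indices of the frames to keep when sampling one frame every `every_sec` seconds."""
    step = max(1, round(fps * every_sec))
    return list(range(0, total_frames, step))

# e.g. a 10-second clip at 25 fps, sampled once per second -> 10 frames
idx = frame_indices(total_frames=250, fps=25, every_sec=1.0)
```

The sampled indices can then be passed to any frame reader to write out the intercepted images.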
Preferably, step S2 includes the following steps:
S21, annotate the images collected in step S1 using Labelme, a QT-based annotation interface in Python; mark content wearing a safety helmet as 1 and content not wearing one as 0, and save the annotation results as xml files;
S22, convert the annotation files into YOLO data set label files in txt format;
S23, divide the data set into training, validation and test sets in an 8:1:1 ratio according to the number of images.
By using the method, a targeted, comprehensive and accurate safety helmet wearing detection data set can be established.
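A minimal sketch of the S22 format conversion and the S23 split, assuming Pascal-VOC-style corner coordinates in the xml files from step S21 (the function names and the six-decimal txt formatting are illustrative assumptions, not the patent's code):

```python
import random

def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a corner-format box to YOLO's normalized center format."""
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return xc, yc, w, h

def split_dataset(names, seed=0):
    """Shuffle deterministically and split file names 8:1:1 into train/val/test."""
    rng = random.Random(seed)
    names = sorted(names)
    rng.shuffle(names)
    n_train = int(0.8 * len(names))
    n_val = int(0.1 * len(names))
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])

# a person wearing a helmet (class 1) at corners (100, 50)-(300, 250) in a 640x480 image
label_line = "1 " + " ".join(f"{v:.6f}" for v in voc_to_yolo(100, 50, 300, 250, 640, 480))
train, val, test = split_dataset([f"img_{i:03d}.jpg" for i in range(100)])
```

Each txt line follows the YOLO convention `class x_center y_center width height`, all normalized to [0, 1].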
Preferably, step S3 includes the following steps:
S31, construct a YOLOv5 v6.0 network comprising P1-P5, eight C3 structures, an SPPF structure and two Upsample structures, where the first four C3 structures use BottleNeck1×n structures for feature extraction and the last four use BottleNeck2×n structures;
S32, replace the BottleNeck1×n module with hybrid connection module 1, which introduces an attention mechanism; add a dense connection structure between the input and output of hybrid connection module 1, and add a 1×1 convolution layer before the final output, strengthening the model's ability to propagate and preserve feature information while keeping the original number of channels unchanged;
S33, replace the BottleNeck2×n module with hybrid connection module 2, adding a dense connection structure between its input and output; hybrid connection module 2 continues the characteristics of the original BottleNeck2×n structure and, relative to hybrid connection module 1, removes the skip connection between every two unit structures;
S34, in hybrid connection module 1 and hybrid connection module 2, replace the original ConvBNSiLU structure with a compression excitation gate control (SEG) structure;
S35, in the compression excitation gate structure, form the compression unit from 1×1 convolution and 3×3 group convolution; form the information activation unit from parallel 3×3 and 5×5 depthwise separable convolutions; after the depthwise separable convolution layers with the two different kernels, connect a gated update gate with an attention mechanism, which fuses the two activation outputs of the parallel structure; connect a forget gate in parallel outside the activation unit to filter repeatedly extracted feature information; and embed a residual connection structure between the compression unit and the output of the forget gate, realizing the hybrid connection inside all C3 structures.
The outputs obtained from the 3×3 and 5×5 convolutions are of matching size, and the parallel combination of 3×3 and 5×5 depthwise separable convolutions is the lightest, giving high speed and a small model footprint.
Replacing the original BottleNeck structure with the hybrid connection structure accelerates the propagation of feature information, retains more feature information, makes effective use of it, keeps model complexity from increasing, and effectively improves the accuracy of the safety helmet wearing detection algorithm.
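As a back-of-the-envelope check on the lightweight claim, the weight counts of a standard 3×3 convolution and a 3×3 depthwise separable convolution can be compared (bias terms omitted; the 64-channel layer size is a hypothetical example, not a layer from the patent):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) pair."""
    return k * k * c_in + c_in * c_out

# hypothetical 64-in / 64-out layer, purely as a size comparison
std_params = conv_params(3, 64, 64)             # 3*3*64*64 weights
dw_sep_params = dw_separable_params(3, 64, 64)  # 3*3*64 + 64*64 weights
```

At these sizes the depthwise separable form needs roughly an eighth of the weights, which is why it helps keep the SEG unit light.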
Preferably, the compression excitation gate unit in step S35 operates through the following steps:
S351, the feature image is input into the compression excitation gate unit and passes in turn through a 1×1 convolution layer and a 3×3 group convolution layer; both layers can use a width multiplier or a stride to downsample, yielding feature images of reduced dimension and lowering model complexity;
S352, feature information activation and data filling are carried out with the parallel 3×3 and 5×5 depthwise separable convolution structure; with the two parallel output dimensions kept the same, more activated feature information is obtained, so the model learns information more effectively;
S353, an update gate is connected after the two depthwise separable convolution structures to fuse the feature information of the different branches effectively; the feature images obtained after the 3×3 and 5×5 depthwise separable convolutions each enter the update gate;
S354, a forget gate and a residual connection, aggregated in parallel with weights, are embedded between the compression structures of S351 and the output of the update gate and added to the update gate output, so that repeatedly used features can be screened and attenuated with a channel attention mechanism and a Sigmoid function; before the global attention pooling layer the forget gate has the same Softmax gating structure as the update gate, after which its output is obtained through a first fully connected layer, a batch normalization and activation layer, a second fully connected layer, and a Sigmoid layer;
the Sigmoid function constrains the channel weights to (0, 1), effectively attenuating repeatedly used features.
S355, the final output feature maps of the update gate and the forget gate are added at corresponding positions to obtain the final output of the compression excitation gate unit.
Through the above steps, efficient propagation of the feature information is realized.
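The claim that the Sigmoid function constrains channel weights to (0, 1) is easy to verify numerically; a minimal sketch:

```python
import math

def sigmoid(x):
    """Logistic function: maps any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# channel weights produced from strongly negative, zero, and strongly positive logits
weights = [sigmoid(x) for x in (-5.0, 0.0, 5.0)]
```

Because the weight never reaches 0 or 1 exactly, reused features are damped rather than deleted.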
Preferably, in step S353, for the 3×3 branch, the input feature image has its channel dimension reduced to 1 by a 1×1 convolution followed by a Softmax function, realizing attention-map normalization and yielding the final attention descriptor, each element of which is expressed as:

$$\alpha_j^{3\times 3}=\frac{\exp\!\left(a_j^{3\times 3}\right)}{\sum_{m=1}^{H\times W}\exp\!\left(a_m^{3\times 3}\right)}$$

where $\alpha_j^{3\times 3}$ denotes the final attention map obtained after Softmax normalization; H, W and C denote the length, width and number of channels of the feature map; and $a^{3\times 3}$ denotes the attention map obtained from the input feature map $X^{3\times 3}$ (or $X^{5\times 5}$) through the 1×1 convolution kernel.

Next, image contraction is carried out through the global attention pooling layer, generating a global context feature map whose elements can be expressed as:

$$z_c^{3\times 3}=\sum_{j=1}^{H\times W}\alpha_j^{3\times 3}\,x_{j,c}^{3\times 3}$$

where $z^{3\times 3}$ denotes the global context feature map obtained by weighting the final attention map through global attention pooling, and $z_c^{3\times 3}$ denotes its element for the c-th channel.

For the 5×5 branch, the global context feature map $z^{5\times 5}$ is obtained through the same parallel structure.

Then the global context feature maps of the two parallel branches are concatenated along the channel dimension and converted into a hidden process feature map by a fully connected layer with a reduction coefficient:

$$h=W\left(z^{3\times 3}\,\|\,z^{5\times 5}\right)+b$$

where $\mu$ denotes the reduction coefficient, and $W\in\mathbb{R}^{\frac{2C}{\mu}\times 2C}$ and $b$ denote the weight and bias of the fully connected layer.

The hidden process feature map is then activated by batch normalization and a nonlinear activation function to obtain the hidden feature map:

$$h'=\operatorname{Tanh}\!\left\{\operatorname{BN}\!\left[W\left(z^{3\times 3}\,\|\,z^{5\times 5}\right)+b\right]\right\}$$

where BN denotes batch normalization (BatchNorm) and Tanh denotes the Tanh() activation function.

By setting the reduction coefficient, this process improves the efficiency of information fusion and propagation.

Then two parallel fully connected layers act on the fused hidden feature map to generate channel attention feature maps, and channel-wise normalization through a Softmax function produces the final channel attention maps. The channel attention maps can be expressed as:

$$u^{3\times 3}=W_{3\times 3}h'+b_{3\times 3},\qquad u^{5\times 5}=W_{5\times 5}h'+b_{5\times 5}$$

where $W_{3\times 3},W_{5\times 5}$ and $b_{3\times 3},b_{5\times 5}$ denote the weights and biases of the two fully connected layers.

The final channel attention maps can be expressed as:

$$s_c^{3\times 3}=\frac{\exp\!\left(u_c^{3\times 3}\right)}{\exp\!\left(u_c^{3\times 3}\right)+\exp\!\left(u_c^{5\times 5}\right)},\qquad s_c^{5\times 5}=\frac{\exp\!\left(u_c^{5\times 5}\right)}{\exp\!\left(u_c^{3\times 3}\right)+\exp\!\left(u_c^{5\times 5}\right)}$$

where $u_c^{3\times 3}$ and $u_c^{5\times 5}$ are the channel attention map elements of the two branches, and $s_c^{3\times 3}+s_c^{5\times 5}=1$.

Finally, the final channel attention maps are weighted and aggregated with the global context feature maps of the corresponding branches, giving the final output of the update gate:

$$U=s^{3\times 3}\odot z^{3\times 3}+s^{5\times 5}\odot z^{5\times 5}$$

Preferably, in step S354, the output of the forget gate is expressed as:

$$f=\sigma\!\left\{W_2\operatorname{Tanh}\!\left[\operatorname{BN}\!\left(W_1 z^{f}+b_1\right)\right]+b_2\right\}$$

where $z^{f}$ denotes the global context feature map obtained from the input feature map through the same global attention pooling structure as the update gate; $W_1\in\mathbb{R}^{\frac{C}{r_f}\times C}$ and $b_1$ denote the weight and bias of the first fully connected layer; $W_2$ and $b_2$ denote the weight and bias of the second fully connected layer; $r_f$ denotes the bottleneck coefficient; $\sigma$ denotes the Sigmoid function; and $f$ denotes the final channel attention feature map output by the forget gate.
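The update-gate and forget-gate computations described above can be sketched numerically. This is a simplified illustration only: the update gate's FC bottleneck, BN and Tanh stages are collapsed to the identity so the context vectors act directly as branch logits, BatchNorm is omitted in the forget gate (single sample), and all weights are random placeholders rather than the patent's trained parameters:

```python
import numpy as np

def softmax(u, axis=0):
    e = np.exp(u - u.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def update_gate(z33, z55):
    """Cross-branch Softmax fusion of two per-channel global-context vectors."""
    s = softmax(np.stack([z33, z55]), axis=0)  # s[0] + s[1] == 1 per channel
    return s[0] * z33 + s[1] * z55             # weighted aggregation

def forget_gate(z, w1, b1, w2, b2):
    """FC -> Tanh -> FC -> Sigmoid, yielding channel weights strictly in (0, 1)."""
    h = np.tanh(w1 @ z + b1)                   # bottleneck hidden feature
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))

# toy context vectors for a 2-channel example
fused = update_gate(np.array([1.0, 0.0]), np.array([0.0, 1.0]))

# forget gate with bottleneck coefficient r_f = 2 on C = 8 channels
rng = np.random.default_rng(0)
C, r_f = 8, 2
z = rng.normal(size=C)
gate = forget_gate(z,
                   rng.normal(size=(C // r_f, C)), np.zeros(C // r_f),
                   rng.normal(size=(C, C // r_f)), np.zeros(C))
```

Multiplying a feature channel by `gate` damps reused information without ever zeroing it, matching the Sigmoid attenuation described in S354.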
preferably, in the step S4, the method includes the following steps:
s41, training a safety helmet wearing detection network based on the hybrid connection improved YOLOv5 in the step S3 based on the safety helmet wearing detection data set established in the step S2, wherein an Adam or SGD optimizer can be used, a training period (epoch) is 300, and different batch sizes (batch_size) are set to be respectively trained for 4, 8, 16, 32 and 64;
s42, testing training results based on weight files and test set data obtained by training in different batch sizes;
s43, fine-tuning the super parameters of the network on the verification set;
s44, storing optimal training parameters and weight files thereof, drawing a confusion matrix, an F1 curve, a P curve, an R curve and a P-R curve, calculating a training set prediction frame position loss train/box_loss, a target loss train/obj_loss, a confidence coefficient loss train/cls_loss, a verification set prediction frame position loss val/box_loss, a target loss val/obj_loss, a confidence coefficient loss val/cls_loss, a Precision, a Recall ratio Recall, an average correct ratio mean mAP@0.5 and mAP@0.5:0.95, and drawing a function image.
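As a minimal sketch of how the Precision, Recall and F1 values in step S44 follow from detection counts (the counts below are hypothetical examples, not the patent's experimental results):

```python
def detection_metrics(tp, fp, fn):
    """Precision, Recall and F1 from counts at a fixed IoU/confidence threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# hypothetical counts for one class on a test set
p, r, f1 = detection_metrics(tp=90, fp=10, fn=30)
```

Averaging the precision over recall levels per class, then over classes, gives the mAP values quoted in S44.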
Preferably, step S5 includes the following steps:
S51, add a main interface and set parameters such as the main window class, the picture/video reading processes and threads, and model and interface initialization;
S52, set reading interfaces for pictures, videos, and the camera real-time data stream respectively;
S53, set the open, close and reset events for pictures, videos, and so on.
Preferably, in step S6, the alarm prompt is implemented through the Playsound library for Python; when the model detects a person not wearing a safety helmet, a real-time alarm prompt is issued to remind them, reducing the workload of safety managers.
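A minimal sketch of the alarm decision and the Playsound call; the class convention follows step S21's 1/0 labels, while `alarm.wav` and the function names are hypothetical:

```python
def should_alarm(detections):
    """True if any detected person is class 0 ('not wearing helmet'),
    following the 1/0 labeling convention from step S21."""
    return any(cls == 0 for cls, conf in detections)

def send_alarm(sound_path="alarm.wav"):
    """Play the alert sound; the path is a hypothetical sound file."""
    from playsound import playsound  # imported lazily: only needed when alarming
    playsound(sound_path)

detections = [(1, 0.92), (0, 0.88)]  # one helmeted person, one without
```

In deployment, `should_alarm` would be evaluated on each detected frame and `send_alarm` called only when it returns True.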
Compared with the prior art, the invention has the following beneficial effects:
1. the method adds a hybrid connection structure to the YOLOv5 network; through a short-circuit connection structure of local residual connections and macroscopic dense connections, the propagation speed and utilization efficiency of feature information are increased and the model is protected from excessive redundancy;
2. the method embeds a compression excitation gate unit into the hybrid connection module of the YOLOv5 network; the unit adopts group convolution, depthwise separable convolution and similar structures, saving model parameters and computation while maintaining performance and keeping the model lightweight;
3. in the improved YOLOv5 network, the update gate uses Softmax to effectively fuse newly extracted feature information and the forget gate uses a Sigmoid function to attenuate reused features; both introduce a lightweight channel attention mechanism, so the model learns more feature information and detection accuracy improves without excessive model complexity, a large memory footprint, or reduced detection speed;
4. the method is an intelligent, automatic helmet wearing detection method usable in any construction operation environment; it helps improve the efficiency of helmet wearing inspection, reduces the workload of safety managers, has broad significance for popularization and research, and provides a new perspective for implementing intelligent construction sites.
Drawings
Fig. 1 is a flowchart of the helmet wearing detection method based on the hybrid-connection-improved YOLOv5 of embodiment 1;
Fig. 2 is a block diagram of hybrid connection module 1, which replaces BottleNeck1×n, in embodiment 1;
Fig. 3 is a block diagram of hybrid connection module 2, which replaces BottleNeck2×n, in embodiment 1;
Fig. 4 is a structural diagram of the compression excitation gate control (SEG) structure of embodiment 1;
Fig. 5 is a structural diagram of the update gate of embodiment 1;
Fig. 6 is a structural diagram of the forget gate of embodiment 1;
Fig. 7 shows the helmet wearing detection effect of embodiment 1;
Fig. 8 compares the experimental results of the method of the invention in embodiment 1 with other prior-art methods.
Detailed Description
The present invention is described in further detail below with reference to test examples and specific embodiments. The scope of the subject matter of the invention should not be construed as limited to the following embodiments; all techniques realized on the basis of the invention fall within its scope.
Example 1
The invention provides a method for detecting safety helmet wearing based on the hybrid-connection-improved YOLOv5, shown in fig. 1, comprising the following steps:
S1: collect targeted field operation images;
S2: perform data annotation on the collected images, complete format conversion of the annotation files, and establish a targeted safety helmet wearing detection data set;
S3: construct a YOLOv5 v6.0 network, improve its structure, and build a safety helmet wearing detection network model based on the hybrid-connection-improved YOLOv5;
S4: input the data set established in step S2 into the improved YOLOv5 network model built in step S3 for training, obtain the training parameters, and save the resulting weight files;
S5: complete model deployment and system interface design, and perform functional tests with test samples, including images, videos, and camera video streams;
S6: if a person in the sample is detected not wearing a safety helmet, issue an alarm prompt; otherwise, return to step S4.
In step S1, construction operation images related to safety helmets are obtained on the basis of the MS COCO data set combined with a web crawler tool; operation video recordings are acquired in a targeted way at operation sites such as chemical plants and construction sites according to the usage environment, and effective video frames are intercepted to complete image data collection.
The implementation process for constructing the safety helmet wearing detection data set in step S2 specifically includes:
S21: annotate the images collected in step S1 using Labelme, a QT-based annotation interface in Python; mark content wearing a safety helmet as 1 and content not wearing one as 0, and save the annotation results as xml files;
S22: convert the annotation files into YOLO data set label files (txt format);
S23: divide the data set into training, validation and test sets in an 8:1:1 ratio according to the number of images, finally establishing a targeted safety helmet wearing detection data set.
The construction of the safety helmet wearing detection network model based on the hybrid-connection-improved YOLOv5 in step S3 comprises the following steps:
Step S31: construct a YOLOv5 v6.0 network, mainly comprising P1-P5, eight C3 structures, an SPPF structure and two Upsample structures, where the first four C3 structures use BottleNeck1×n structures for feature extraction and the last four use BottleNeck2×n structures;
Step S32: as shown in fig. 2, replace the BottleNeck1×n module in the network with hybrid connection module 1, which introduces an attention mechanism; add a dense connection structure between the input and output of hybrid module 1, and add a 1×1 convolution layer before the final output, strengthening the model's ability to propagate and preserve feature information while keeping the original number of channels unchanged;
Step S33: as shown in fig. 3, replace the BottleNeck2×n module with hybrid connection module 2, whose structure is similar to that of hybrid connection module 1 with dense connection structures added, except that hybrid module 2 continues the characteristics of the original BottleNeck2×n structure and removes the skip connection between every two unit structures present in hybrid module 1;
Step S34: in hybrid connection module 1 and hybrid connection module 2, replace the original ConvBNSiLU structure with the compression excitation gate control (SEG) structure shown in fig. 4;
Step S35: in the compression excitation gate structure, form the compression unit from 1×1 convolution and 3×3 group convolution; form the information activation unit from parallel 3×3 and 5×5 depthwise separable convolutions; after the depthwise separable convolution layers with the two different kernels, connect the gated update gate with an attention mechanism shown in fig. 5, which fuses the two activation outputs of the parallel structure; connect the forget gate shown in fig. 6 in parallel outside the activation unit to filter repeatedly extracted feature information; and embed a residual connection structure between the compression unit and the output of the forget gate, realizing the hybrid connection inside all C3 structures.
The network training in step S4 specifically includes the following steps:
Step S41: train the hybrid-connection-improved YOLOv5 helmet wearing detection network of step S3 on the data set established in step S2, using an Adam or SGD optimizer, setting the training period (epoch) to 300, and training separately with batch sizes (batch_size) of 4, 8, 16, 32 and 64;
Step S42: test the training results using the weight files obtained under the different batch sizes and the test set data;
Step S43: fine-tune the hyperparameters of the network on the validation set.
Step S44: save the optimal training parameters and their weight files; draw the confusion matrix, F1 curve, P curve, R curve and P-R curve; compute the training set prediction-box position loss train/box_loss, objectness loss train/obj_loss and classification loss train/cls_loss, the validation set losses val/box_loss, val/obj_loss and val/cls_loss, Precision, Recall, and the mean average precision mAP@0.5 and mAP@0.5:0.95; and plot the corresponding curves.
The step S5 specifically includes the following steps:
step S51: adding a main interface and setting the window main class, the picture and video reading process and thread, and the model and interface initialization parameters;
step S52: respectively setting a picture, a video and a camera real-time data stream reading interface;
step S53: setting picture and video open, close and reset events.
The practical application effect of the embodiment of the invention is shown in fig. 7.
The alarm prompt in step S6 is implemented through the Python Playsound library; when the model detects a person not wearing a safety helmet, a real-time alarm prompt is issued.
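A sketch of this alarm step follows; the class-id convention assumes the labelling of step S21 (0 marks "not wearing a safety helmet"), and the sound file path is a placeholder:

```python
def should_alarm(class_ids, no_helmet_id=0):
    # Step S21 labels "not wearing a safety helmet" as class 0, so an alarm
    # fires as soon as any detection carries that class id.
    return no_helmet_id in class_ids

def play_alarm(sound_path="alarm.wav"):
    # playsound is a third-party package (pip install playsound), imported
    # lazily so the detection logic stays testable without audio support.
    from playsound import playsound
    playsound(sound_path)

detections = [1, 1, 0]           # class ids from one detected frame
if should_alarm(detections):
    print("no-helmet detected")  # play_alarm() would be called here
```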
The implementation mechanism of adding the hybrid connection structure in the YOLOv5 network is as follows:
firstly, hybrid connection module 1 and hybrid connection module 2, which aggregate local residual connections with external dense connections, are built as shown in figs. 2 and 3 and respectively replace the BottleNeck1×n and BottleNeck2×n structures in YOLOv5; secondly, the compression excitation gating unit shown in fig. 4 is embedded into the hybrid connection modules, where standard convolution and group convolution compress the feature image, depthwise separable convolutions of different scales activate information, and a gating structure introducing an attention mechanism fuses information; finally, an update gate that facilitates the transfer of feature information and a forget gate that attenuates repeated feature information are constructed as shown in figs. 5 and 6. Replacing the original BottleNeck structures with this hybrid connection structure accelerates the transfer of feature information, retains more feature information, realizes its effective utilization, keeps the model complexity from increasing, and effectively improves the accuracy of the safety helmet wearing detection algorithm.
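The hybrid connection idea — local residual connections inside each unit, aggregated by an external dense connection and a trailing 1×1 convolution — can be sketched in PyTorch as follows; the module is a schematic under assumed shapes, not the patent's exact structure:

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Sketch: local residual connections inside units, dense aggregation outside."""
    def __init__(self, ch, n_units=2):
        super().__init__()
        self.units = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.SiLU())
            for _ in range(n_units))
        # 1x1 convolution after the dense concatenation restores the channel count.
        self.fuse = nn.Conv2d(ch * (n_units + 1), ch, 1, bias=False)

    def forward(self, x):
        feats = [x]
        for unit in self.units:
            x = x + unit(x)        # local residual connection
            feats.append(x)
        return self.fuse(torch.cat(feats, dim=1))  # external dense aggregation

y = HybridBlock(16)(torch.randn(1, 16, 8, 8))
print(y.shape)  # torch.Size([1, 16, 8, 8])
```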
Further, in the compression excitation gating unit, efficient propagation of feature information is realized by means of group convolution, depthwise separable convolution, an attention-combined gating mechanism and residual connection, specifically comprising the following steps:
s351, the feature map is input into the compression excitation gating unit and passes in turn through a 1×1 convolution layer and a 3×3 group convolution layer; both convolution layers can use a width multiplier or stride to realize the downsampling operation of the model, yielding a feature image of reduced feature dimension and thereby reducing model complexity;
s352, feature information activation and data filling are performed using a structure in which 3×3 and 5×5 depthwise separable convolutions are connected in parallel, so that more activated feature information is obtained while the two parallel outputs keep the same dimensions, allowing the model to learn information more effectively;
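Step S352's parallel branches can be sketched in PyTorch as follows; padding is chosen so both branches keep the same output dimensions, and the names are illustrative:

```python
import torch
import torch.nn as nn

def dw_separable(ch, k):
    # Depthwise separable convolution: a per-channel k x k depthwise convolution
    # followed by a 1x1 pointwise convolution; padding k // 2 preserves H and W.
    return nn.Sequential(
        nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch, bias=False),
        nn.Conv2d(ch, ch, 1, bias=False),
    )

class ActivationUnit(nn.Module):
    """Parallel 3x3 and 5x5 depthwise separable convolutions (step S352)."""
    def __init__(self, ch):
        super().__init__()
        self.branch3 = dw_separable(ch, 3)
        self.branch5 = dw_separable(ch, 5)

    def forward(self, x):
        return self.branch3(x), self.branch5(x)

y3, y5 = ActivationUnit(32)(torch.randn(1, 32, 16, 16))
print(y3.shape == y5.shape)  # True
```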
s353, an update gate is connected after the two depthwise separable convolution structures to effectively fuse the feature information of the different branches; the feature images obtained after the 3×3 and 5×5 depthwise separable convolutions respectively enter the update gate. This process can be seen as being accomplished in the following five steps:
(1) for the 3×3 branch, the input feature image passes through a 1×1 convolution and a Softmax function, which reduce the channel dimension to 1 and normalize the attention feature map, obtaining the final attention feature descriptor, each element of which can be expressed by the following formula:
a_{i,j} = exp(s_{i,j}) / Σ_{x=1}^{H} Σ_{y=1}^{W} exp(s_{x,y})
wherein a ∈ R^{H×W×1} denotes the final attention feature map obtained after Softmax normalization, and H, W and C respectively denote the length, width and number of channels of the feature map; s denotes the attention feature map obtained by applying the 1×1 convolution kernel to the input feature map X_{3×3} or X_{5×5}; i and j respectively denote the row and column indices of the feature map.
(2) Image contraction is then performed through the global attention pooling layer to generate a global context feature map, each element of which can be expressed as:
z_c = Σ_{i=1}^{H} Σ_{j=1}^{W} a_{i,j} · x_{i,j,c}
wherein z_{3×3} ∈ R^{C} denotes the global context feature map obtained by weighting the feature map with the final attention map through global attention pooling, x_{i,j,c} denotes an element of the input feature map, and z_c denotes the global context feature map element of the c-th channel.
Similarly, another global context feature map z_{5×5} can be obtained through the same parallel structure.
(3) The global context feature maps obtained by the two parallel branches are concatenated along the channel and converted into a hidden process feature map through a fully connected layer with a reduction coefficient:
h=W(z 3×3 ||z 5×5 )+b
wherein h ∈ R^{2C/μ}, and μ denotes the reduction coefficient; W ∈ R^{(2C/μ)×2C} and b respectively denote the weight and bias of the fully connected layer;
the hidden process feature map is activated by batch normalization and nonlinear activation functions to obtain a hidden feature map, which can be expressed as:
h′=Tanh{BN[W(z 3×3 ||z 5×5 )+b]}
wherein h′ ∈ R^{2C/μ}; BN denotes batch normalization (BatchNorm); Tanh denotes the Tanh() activation function;
this process can improve the efficiency of information fusion and propagation by setting the reduction factor.
(4) Two parallel fully connected layers are applied to the fused hidden feature map to generate channel attention feature maps, and channel feature normalization is then realized through a Softmax function to generate the final channel attention feature maps.
The channel attention feature maps can be expressed as:
u_{3×3} = W_{3×3} h′ + b_{3×3},  u_{5×5} = W_{5×5} h′ + b_{5×5}
wherein u_{3×3}, u_{5×5} ∈ R^{C}; W_{3×3}, W_{5×5} and b_{3×3}, b_{5×5} respectively denote the weights and biases of the two fully connected layers.
The final channel attention feature maps can be expressed as:
ũ_{3×3,c} = exp(u_{3×3,c}) / (exp(u_{3×3,c}) + exp(u_{5×5,c})),  ũ_{5×5,c} = exp(u_{5×5,c}) / (exp(u_{3×3,c}) + exp(u_{5×5,c}))
wherein u_{3×3,c} and u_{5×5,c} denote the c-th channel elements of the channel attention feature maps, so that ũ_{3×3,c} + ũ_{5×5,c} = 1 for each channel.
(5) The final channel attention feature maps are then respectively weighted and aggregated with the global context feature maps of the corresponding branches, giving the final output of the update gate:
o = ũ_{3×3} ⊙ z_{3×3} + ũ_{5×5} ⊙ z_{5×5}
wherein o ∈ R^{C} denotes the final output of the update gate, and ⊙ denotes channel-wise multiplication.
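Steps (1)-(5) of the update gate can be sketched end-to-end in NumPy; in this sketch BN is omitted, a shared 1×1 kernel is assumed for both branches, and all weight shapes are illustrative assumptions:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def global_attention_pool(x, w_k):
    # x: (H, W, C) feature map; w_k: (C,) 1x1-conv kernel giving a 1-channel map.
    s = x @ w_k                                 # (H, W) attention logits
    a = softmax(s.ravel()).reshape(s.shape)     # Softmax-normalised attention map
    return np.einsum('hw,hwc->c', a, x)         # global context vector z, shape (C,)

def update_gate(x3, x5, w_k, W, b, W3, b3, W5, b5):
    z3 = global_attention_pool(x3, w_k)         # steps (1)-(2), 3x3 branch
    z5 = global_attention_pool(x5, w_k)         # same structure, 5x5 branch
    h = np.tanh(W @ np.concatenate([z3, z5]) + b)   # step (3): FC with reduction
    u3, u5 = W3 @ h + b3, W5 @ h + b5               # step (4): channel attention
    e3, e5 = np.exp(u3), np.exp(u5)
    a3, a5 = e3 / (e3 + e5), e5 / (e3 + e5)         # per-channel Softmax over branches
    return a3 * z3 + a5 * z5                        # step (5): weighted aggregation

rng = np.random.default_rng(0)
H, W_, C, mu = 4, 4, 8, 2
out = update_gate(rng.normal(size=(H, W_, C)), rng.normal(size=(H, W_, C)),
                  rng.normal(size=C),
                  rng.normal(size=(2 * C // mu, 2 * C)), rng.normal(size=2 * C // mu),
                  rng.normal(size=(C, 2 * C // mu)), rng.normal(size=C),
                  rng.normal(size=(C, 2 * C // mu)), rng.normal(size=C))
print(out.shape)  # (8,)
```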
s354, a structure in which a forgetting gate and a residual connection are aggregated in parallel with weights is embedded between the two compression structures of S351 and the output of the update gate, and added to the output of the update gate. This uses the channel attention mechanism and a Sigmoid function to filter out repeatedly utilized features. Before the global attention pooling layer, the forgetting gate has the same Softmax gating structure as the update gate; afterwards, the output of the forgetting gate is obtained through the first fully connected layer, the batch normalization and activation function layer, the second fully connected layer and the Sigmoid function layer in turn. Similar to the implementation mechanism of the update gate, the output of the forgetting gate can be expressed as follows:
f = σ{W_2^f [Tanh(BN(W_1^f z^f + b_1^f))] + b_2^f}
wherein z^f ∈ R^{C} denotes the global context feature map obtained from the input feature map through the same global attention pooling structure as the update gate; W_1^f ∈ R^{(C/r_f)×C} and b_1^f respectively denote the weight and bias of the first fully connected layer; W_2^f ∈ R^{C×(C/r_f)} and b_2^f respectively denote the weight and bias of the second fully connected layer; r_f denotes the bottleneck coefficient; σ denotes the Sigmoid function; and f denotes the final channel attention feature map output by the forgetting gate.
The Sigmoid function constrains the channel weights to (0, 1), effectively attenuating repeatedly utilized features.
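The forgetting gate described above can be sketched in NumPy; BatchNorm is replaced by a simple standardisation for illustration, and the shapes are assumptions:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forget_gate(z_f, W1, b1, W2, b2):
    # f = sigmoid(W2[Tanh(BN(W1 z_f + b1))] + b2), with BN approximated
    # here by a simple standardisation of the hidden vector.
    h = W1 @ z_f + b1
    h = (h - h.mean()) / (h.std() + 1e-5)   # stand-in for BatchNorm
    return sigmoid(W2 @ np.tanh(h) + b2)

rng = np.random.default_rng(1)
C, r_f = 8, 2                               # channels and bottleneck coefficient
f = forget_gate(rng.normal(size=C),
                rng.normal(size=(C // r_f, C)), rng.normal(size=C // r_f),
                rng.normal(size=(C, C // r_f)), rng.normal(size=C))
assert ((f > 0) & (f < 1)).all()            # channel weights constrained to (0, 1)
```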
S355, the final output feature maps of the update gate and the forgetting gate are added element-wise at corresponding positions to obtain the final output of the compression excitation gating unit.
In this embodiment, the several important parameters above are finally set as follows: batch size (Batch size) of 16, training period (epochs) of 300, learning rate (Learning rate) of 0.1, group number g = 4 for the 3×3 group convolution in the SEG block, width multiplier α = 4, reduction ratio parameters (Reduction) r_u = r_f of the update gate and the forgetting gate, and stride s = 1.
Finally, the proposed method and other methods are trained on the data set used in this embodiment and the results are compared; it can be seen that the proposed method improves the detection precision of the model without loss of detection performance, proving its effectiveness.
As shown in fig. 8, on the data set of this embodiment the proposed method achieves the highest detection precision, the smallest weight-file memory footprint, and a shorter time to detect the same picture. The detection effect shown in fig. 8 demonstrates that the method realizes automatic computer detection of safety helmet wearing and facilitates further research and application in this direction.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 is characterized by comprising the following steps of:
s1, collecting an on-site operation image;
s2, carrying out data annotation on the collected images, completing format conversion of annotation files, and establishing a safety helmet wearing detection data set;
s3, constructing a YOLOv5v6.0 network, improving a network structure, and constructing a safety helmet wearing detection network model based on the hybrid connection improved YOLOv 5;
s4, inputting the data set established in the S2 into the improved YOLOv5 network model established in the S3 for training, obtaining training parameters, and storing the obtained weight files;
s5, completing model deployment and system interface design, and performing functional tests with test samples, including: images, videos and camera video streams;
and S6, if it is detected that a person in the sample is not wearing a safety helmet, an alarm prompt is issued; if the safety helmet is worn, return to step S4.
2. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein in the step S1, construction operation images related to safety helmets are acquired from the MS COCO data set in combination with a web crawler tool, operation video recordings are acquired on the operation site in a targeted manner according to the use environment, video frames are intercepted, and image data collection is completed.
3. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein the step S2 comprises the following steps:
s21, marking the images collected in the step S1 by using Labelme, a Qt-based annotation tool written in Python; content wearing the safety helmet is marked as 1 and content not wearing the safety helmet is marked as 0, and the marking results are output and stored as xml format files;
s22, converting the annotation file into a YOLO data set tag file in txt format;
s23, dividing the training set, the verification set and the test set according to the number of images in the data set in proportion, and finally establishing a targeted safety helmet wearing detection data set.
4. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein the step S3 comprises the following steps:
s31, constructing a YOLOv5 v6.0 network, which comprises P1-P5, eight C3 structures, an SPPF structure and two Upsample structures, wherein the first four C3 structures use BottleNeck1×n structures for feature extraction, and the last four C3 structures use BottleNeck2×n structures for feature extraction;
s32, replacing the BottleNeck1×n module with a hybrid connection module 1 that introduces an attention mechanism, adding a dense connection structure between the input and the output of the hybrid connection module 1, and adding a 1×1 convolution layer before the final output;
s33, replacing the BottleNeck2×n module with a hybrid connection module 2, adding a dense connection structure between the input and the output of the hybrid connection module 2, which retains the original BottleNeck2×n structural characteristics and, relative to the hybrid connection module 1, removes the jump connection structure between every two unit structures;
s34, in the hybrid connection module 1 and the hybrid connection module 2, a compression excitation gating structure is used to replace the original ConvBNSiLU structure;
s35, in the compression excitation gating structure, a compression unit is formed from a 1×1 convolution and a 3×3 group convolution; an information activation unit is formed from parallel 3×3 and 5×5 depthwise separable convolutions, and a gating-structure update gate with an attention mechanism is connected after the two depthwise separable convolution layers of different convolution kernels to fuse the two activated outputs of the parallel structure; a forgetting gate is connected in parallel outside the activation unit to filter repeatedly extracted feature information, and a residual connection structure is embedded between the compression unit and the output of the forgetting gate.
5. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1 or 4, wherein the compression excitation gating unit of step S35 realizes efficient propagation of feature information, comprising the following steps:
s351, inputting the feature image into the compression excitation gating unit, passing in turn through a 1×1 convolution layer and a 3×3 group convolution layer, and obtaining a feature image of reduced feature dimension through a downsampling operation;
s352, performing feature information activation and data filling using a structure in which 3×3 and 5×5 depthwise separable convolutions are connected in parallel;
s353, connecting an update gate after the two depthwise separable convolution structures to effectively fuse the feature information of the different branches, the feature images obtained after the 3×3 and 5×5 depthwise separable convolutions respectively entering the update gate;
s354, embedding a structure of forgetting gate and residual connection parallel weighted aggregation between the two compression structures of S351 and the output of the update gate, and adding the structure with the output of the update gate; then the output of the forgetting gate is obtained through the first full-connection layer, the batch standardization and activation function layer, the second full-connection layer and the Sigmoid function layer respectively;
s355, adding the corresponding positions of the final output feature maps of the update gate and the forgetting gate to obtain the final output of the compression excitation gating unit.
6. The method for detecting the wearing of a helmet based on the hybrid connection improved YOLOv5 of claim 5, wherein in the step S353, for the 3×3 branch, the input feature image reduces the channel dimension to 1 by a 1×1 convolution and a Softmax function, so as to normalize the attention profile, and obtain a final attention profile descriptor, and each element is expressed as:
in the method, in the process of the invention,the final attention characteristic diagram obtained after the normalization of the Softmax function is represented, and H, W and C respectively represent the length, the width and the channel number of the characteristic diagram; x is E [1, H],y∈[1,W];/>Representing input feature graphsOr->Attention feature map obtained through 1×1 convolution kernel;
secondly, image contraction is performed through the global attention pooling layer to generate a global context feature map, each element of which can be expressed as:
z_c = Σ_{i=1}^{H} Σ_{j=1}^{W} a_{i,j} · x_{i,j,c}
wherein z_{3×3} denotes the global context feature map obtained by weighting the feature map with the final attention map through global attention pooling, and z_c denotes the global context feature map element of the c-th channel;
for the 5×5 branch, a global context feature map z_{5×5} can be obtained through the same parallel structure; then, the global context feature maps obtained by the two parallel branches are concatenated along the channel and converted into a hidden process feature map through a fully connected layer with a reduction coefficient:
h=W(z 3×3 ||z 5×5 )+b
wherein h ∈ R^{2C/μ}, and μ denotes the reduction coefficient; W ∈ R^{(2C/μ)×2C} and b respectively denote the weight and bias of the fully connected layer;
then, the hidden process feature map is activated by batch normalization and nonlinear activation functions to obtain a hidden feature map, which can be expressed as:
h′=Tanh{BN[W(z 3×3 ||z 5×5 )+b]}
wherein h′ ∈ R^{2C/μ}; BN denotes batch normalization (BatchNorm); Tanh denotes the Tanh() activation function;
then, two parallel fully connected layers are applied to the fused hidden feature map to generate channel attention feature maps, and channel feature normalization is realized through a Softmax function to generate the final channel attention feature maps; the channel attention feature maps can be expressed as:
u_{3×3} = W_{3×3} h′ + b_{3×3},  u_{5×5} = W_{5×5} h′ + b_{5×5}
wherein W_{3×3}, W_{5×5} and b_{3×3}, b_{5×5} respectively denote the weights and biases of the two fully connected layers;
the final channel attention feature maps can be expressed as:
ũ_{3×3,c} = exp(u_{3×3,c}) / (exp(u_{3×3,c}) + exp(u_{5×5,c})),  ũ_{5×5,c} = exp(u_{5×5,c}) / (exp(u_{3×3,c}) + exp(u_{5×5,c}))
wherein u_{3×3,c} and u_{5×5,c} denote the c-th channel elements of the channel attention feature maps;
finally, the final channel attention feature maps are respectively weighted and aggregated with the global context feature maps of the corresponding branches, so that the final output of the update gate can be obtained:
o = ũ_{3×3} ⊙ z_{3×3} + ũ_{5×5} ⊙ z_{5×5}
7. the method for detecting the wearing of a helmet based on the hybrid connection improved YOLOv5 of claim 1, wherein in the step S354, the output of the forgetting gate is expressed as:
f = σ{W_2^f [Tanh(BN(W_1^f z^f + b_1^f))] + b_2^f}
wherein z^f denotes the global context feature map obtained from the input feature map through the same global attention pooling structure as the update gate; W_1^f and b_1^f respectively denote the weight and bias of the first fully connected layer; W_2^f and b_2^f respectively denote the weight and bias of the second fully connected layer; r_f denotes the bottleneck coefficient; and f denotes the final channel attention feature map output by the forgetting gate.
8. the method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein the step S4 comprises the following steps:
s41, training the safety helmet wearing detection network based on the hybrid connection improved YOLOv5 in step S3 on the safety helmet wearing detection data set established in step S2, using an optimizer, setting the training period (epoch) to 300, and training separately with batch sizes (batch_size) of 4, 8, 16, 32 and 64;
s42, testing training results based on weight files and test set data obtained by training in different batch sizes;
s43, fine-tuning the super parameters of the network on the verification set;
s44, saving the optimal training parameters and their weight files, drawing a confusion matrix, an F1 curve, a P curve, an R curve and a P-R curve, calculating the prediction-box position loss, objectness loss and classification loss on the training set and the prediction-box position loss, objectness loss, classification loss, precision, recall, and mean average precision mAP@0.5 and mAP@0.5:0.95 on the validation set, and drawing the function images.
9. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein the step S5 comprises the following steps:
s51, adding a main interface, and setting a window main class, a picture video reading process and a thread, and model and interface initialization parameters;
s52, respectively setting a picture, a video and a camera real-time data stream reading interface;
s53, setting picture and video open, close and reset events.
10. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein in the step S6, the alarm prompt is implemented through the Python Playsound library, and when the model detects that a person is not wearing a safety helmet, a real-time alarm prompt is issued.
CN202310785410.0A 2023-06-29 2023-06-29 A helmet wearing detection method based on hybrid connection improved YOLOv5 Pending CN117036846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310785410.0A CN117036846A (en) 2023-06-29 2023-06-29 A helmet wearing detection method based on hybrid connection improved YOLOv5

Publications (1)

Publication Number Publication Date
CN117036846A true CN117036846A (en) 2023-11-10

Family

ID=88636106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310785410.0A Pending CN117036846A (en) 2023-06-29 2023-06-29 A helmet wearing detection method based on hybrid connection improved YOLOv5

Country Status (1)

Country Link
CN (1) CN117036846A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445706A (en) * 2022-01-25 2022-05-06 南京工程学院 Power transmission line target detection and identification method based on feature fusion
CN115082922A (en) * 2022-08-24 2022-09-20 济南瑞泉电子有限公司 Water meter digital picture processing method and system based on deep learning
CN115423743A (en) * 2022-07-21 2022-12-02 韶关学院 A detection method for hot-rolled strip surface defects based on improved YOLOv5
CN115861758A (en) * 2022-12-26 2023-03-28 西安邮电大学 A target detection method and device based on a dual-mode fusion network
US20230106961A1 (en) * 2021-10-04 2023-04-06 Motive Technologies, Inc. Camera initialization for lane detection and distance estimation using single-view geometry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination