
CN117036846A - A helmet wearing detection method based on hybrid connection improved YOLOv5 - Google Patents


Info

Publication number
CN117036846A
Authority
CN
China
Prior art keywords
safety helmet
wearing
hybrid connection
attention
gate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310785410.0A
Other languages
Chinese (zh)
Inventor
曹振涛
颉志杰
李洁
王希刚
李跃杰
程倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Petroleum and Chemical Corp
Sinopec Southwest Oil and Gas Co
Original Assignee
China Petroleum and Chemical Corp
Sinopec Southwest Oil and Gas Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Petroleum and Chemical Corp, Sinopec Southwest Oil and Gas Co filed Critical China Petroleum and Chemical Corp
Priority to CN202310785410.0A priority Critical patent/CN117036846A/en
Publication of CN117036846A publication Critical patent/CN117036846A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Helmets And Other Head Coverings (AREA)

Abstract

The invention discloses a safety helmet wearing detection method based on a hybrid-connection-improved YOLOv5. The method collects field operation images, completes format conversion of the annotation files, and establishes a safety helmet wearing detection data set; constructs a YOLOv5 v6.0 network, improves its structure, and builds a safety helmet wearing detection network model based on the hybrid-connection-improved YOLOv5; inputs the data set into the YOLOv5 network model for training to obtain the training parameters; completes model deployment and system interface design, and performs functional tests with test samples; and, if a person in the sample is detected not wearing a safety helmet, issues an alarm prompt. The method is an intelligent, automatic helmet wearing detection method usable in any construction operation environment; it offers high detection accuracy and speed, helps improve the efficiency of helmet wearing inspection, reduces the workload of safety managers, and provides a new perspective for implementing intelligent construction sites.

Description

Safety helmet wearing detection method based on hybrid connection improved YOLOv5
Technical Field
The invention relates to the field of artificial intelligence target detection, and in particular to a safety helmet wearing detection method based on a hybrid-connection-improved YOLOv5 algorithm.
Background
Safety helmets provide effective buffering, shock absorption and stress dispersion in fields such as oilfield drilling and mining, and wearing one correctly is an effective measure for protecting the head. Supervising and inspecting helmet wearing is of great significance for protecting workers' safety, and it is also a necessary responsibility and obligation of production and management units; compliant production units assign safety managers to check and supervise helmet wearing.
With the development of computer vision technology, computer-aided supervision and inspection has become one of the common approaches. Intelligent algorithms and equipment continue to advance, bringing promising development possibilities, research value and significance to computer vision technology.
At present, helmet supervision and inspection can be divided, according to the form of the inspected data, into on-site supervision, photo inspection, video playback inspection, and real-time camera inspection. Each mode requires a safety manager to make manual judgments, which is time-consuming and labor-intensive, and fatigue and other subjective factors can lead to missed observations and misjudgments. Helmet wearing detection can be regarded as target detection of the helmet wearing situation. YOLOv5 is a one-stage target detection algorithm developed in recent years with excellent properties such as high accuracy and high speed, but its accuracy on this detection target remains limited.
Disclosure of Invention
The invention aims to solve the problem that the accuracy of YOLOv5-based target detection algorithms in the prior art is not high, and provides a safety helmet wearing detection method based on a hybrid-connection-improved YOLOv5.
In order to achieve the above object, the present invention provides the following technical solution:
A method for detecting safety helmet wearing based on the hybrid-connection-improved YOLOv5, characterized by comprising the following steps:
S1, collecting on-site operation images;
S2, performing data annotation on the collected images, completing format conversion of the annotation files, and establishing a safety helmet wearing detection data set;
S3, constructing a YOLOv5 v6.0 network, improving its structure, and building a safety helmet wearing detection network model based on the hybrid-connection-improved YOLOv5;
S4, inputting the data set established in S2 into the improved YOLOv5 network model built in S3 for training, obtaining the training parameters, and saving the resulting weight files;
S5, completing model deployment and system interface design, and performing functional tests with test samples, including images, videos, and camera video streams;
and S6, if a person in the sample is detected not wearing a safety helmet, issuing an alarm prompt; otherwise, returning to step S4.
The method improves the YOLOv5 algorithm structure: short-circuit connection structures combining local residual connections and macroscopic dense connections are built; group convolution, depthwise separable convolution and similar structures are used in the compression excitation gate unit; the update gate uses Softmax to effectively fuse newly extracted feature information; and the forget gate uses a Sigmoid function to attenuate reused features, further improving the accuracy of model detection.
Preferably, in step S1, construction operation images related to safety helmets are acquired on the basis of the MS COCO data set combined with a web crawler tool, and operation video recordings are acquired on site in a targeted way according to the usage environment, including operation sites such as chemical plants and construction sites; video frames are intercepted to complete image data collection.
Using a web crawler tool to acquire construction images related to safety helmets yields more numerous and more comprehensive picture data, making the data set richer.
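The video-frame interception in step S1 can be sketched as a sampling schedule. The helper below is hypothetical (not from the patent): it only computes which frame indices to keep, e.g. one frame per second, before decoding them with any video library:

```python
def frame_indices(total_frames: int, fps: float, every_sec: float = 1.0) -> list[int]:
    """Indices of the frames to keep when sampling one frame every `every_sec` seconds."""
    step = max(1, round(fps * every_sec))
    return list(range(0, total_frames, step))

# e.g. a 10-second clip at 25 fps, sampled once per second -> 10 frames
idx = frame_indices(total_frames=250, fps=25, every_sec=1.0)
```

The sampled indices can then be passed to any frame reader to write out the intercepted images.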
Preferably, step S2 includes the following steps:
S21, annotate the images collected in step S1 using Labelme, a QT-based annotation interface in Python; mark content wearing a safety helmet as 1 and content not wearing one as 0, and save the annotation results as xml files;
S22, convert the annotation files into YOLO data set label files in txt format;
S23, divide the data set into training, validation and test sets in an 8:1:1 ratio according to the number of images.
By using the method, a targeted, comprehensive and accurate safety helmet wearing detection data set can be established.
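A minimal sketch of the S22 format conversion and the S23 split, assuming Pascal-VOC-style corner coordinates in the xml files from step S21 (the function names and the six-decimal txt formatting are illustrative assumptions, not the patent's code):

```python
import random

def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a corner-format box to YOLO's normalized center format."""
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return xc, yc, w, h

def split_dataset(names, seed=0):
    """Shuffle deterministically and split file names 8:1:1 into train/val/test."""
    rng = random.Random(seed)
    names = sorted(names)
    rng.shuffle(names)
    n_train = int(0.8 * len(names))
    n_val = int(0.1 * len(names))
    return (names[:n_train],
            names[n_train:n_train + n_val],
            names[n_train + n_val:])

# a person wearing a helmet (class 1) at corners (100, 50)-(300, 250) in a 640x480 image
label_line = "1 " + " ".join(f"{v:.6f}" for v in voc_to_yolo(100, 50, 300, 250, 640, 480))
train, val, test = split_dataset([f"img_{i:03d}.jpg" for i in range(100)])
```

Each txt line follows the YOLO convention `class x_center y_center width height`, all normalized to [0, 1].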
Preferably, step S3 includes the following steps:
S31, construct a YOLOv5 v6.0 network comprising P1-P5, eight C3 structures, an SPPF structure and two Upsample structures, where the first four C3 structures use BottleNeck1×n structures for feature extraction and the last four use BottleNeck2×n structures;
S32, replace the BottleNeck1×n module with hybrid connection module 1, which introduces an attention mechanism; add a dense connection structure between the input and output of hybrid connection module 1, and add a 1×1 convolution layer before the final output, strengthening the model's ability to propagate and preserve feature information while keeping the original number of channels unchanged;
S33, replace the BottleNeck2×n module with hybrid connection module 2, adding a dense connection structure between its input and output; hybrid connection module 2 continues the characteristics of the original BottleNeck2×n structure and, relative to hybrid connection module 1, removes the skip connection between every two unit structures;
S34, in hybrid connection module 1 and hybrid connection module 2, replace the original ConvBNSiLU structure with a compression excitation gate control (SEG) structure;
S35, in the compression excitation gate structure, form the compression unit from 1×1 convolution and 3×3 group convolution; form the information activation unit from parallel 3×3 and 5×5 depthwise separable convolutions; after the depthwise separable convolution layers with the two different kernels, connect a gated update gate with an attention mechanism, which fuses the two activation outputs of the parallel structure; connect a forget gate in parallel outside the activation unit to filter repeatedly extracted feature information; and embed a residual connection structure between the compression unit and the output of the forget gate, realizing the hybrid connection inside all C3 structures.
The outputs obtained from the 3×3 and 5×5 convolutions are of matching size, and the parallel combination of 3×3 and 5×5 depthwise separable convolutions is the lightest, giving high speed and a small model footprint.
Replacing the original BottleNeck structure with the hybrid connection structure accelerates the propagation of feature information, retains more feature information, makes effective use of it, keeps model complexity from increasing, and effectively improves the accuracy of the safety helmet wearing detection algorithm.
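As a back-of-the-envelope check on the lightweight claim, the weight counts of a standard 3×3 convolution and a 3×3 depthwise separable convolution can be compared (bias terms omitted; the 64-channel layer size is a hypothetical example, not a layer from the patent):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dw_separable_params(k, c_in, c_out):
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) pair."""
    return k * k * c_in + c_in * c_out

# hypothetical 64-in / 64-out layer, purely as a size comparison
std_params = conv_params(3, 64, 64)             # 3*3*64*64 weights
dw_sep_params = dw_separable_params(3, 64, 64)  # 3*3*64 + 64*64 weights
```

At these sizes the depthwise separable form needs roughly an eighth of the weights, which is why it helps keep the SEG unit light.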
Preferably, the compression excitation gate unit in step S35 operates through the following steps:
S351, the feature image is input into the compression excitation gate unit and passes in turn through a 1×1 convolution layer and a 3×3 group convolution layer; both layers can use a width multiplier or a stride to downsample, yielding feature images of reduced dimension and lowering model complexity;
S352, feature information activation and data filling are carried out with the parallel 3×3 and 5×5 depthwise separable convolution structure; with the two parallel output dimensions kept the same, more activated feature information is obtained, so the model learns information more effectively;
S353, an update gate is connected after the two depthwise separable convolution structures to fuse the feature information of the different branches effectively; the feature images obtained after the 3×3 and 5×5 depthwise separable convolutions each enter the update gate;
S354, a forget gate and a residual connection, aggregated in parallel with weights, are embedded between the compression structures of S351 and the output of the update gate and added to the update gate output, so that repeatedly used features can be screened and attenuated with a channel attention mechanism and a Sigmoid function; before the global attention pooling layer the forget gate has the same Softmax gating structure as the update gate, after which its output is obtained through a first fully connected layer, a batch normalization and activation layer, a second fully connected layer, and a Sigmoid layer;
the Sigmoid function constrains the channel weights to (0, 1), effectively attenuating repeatedly used features.
S355, the final output feature maps of the update gate and the forget gate are added at corresponding positions to obtain the final output of the compression excitation gate unit.
Through the above steps, efficient propagation of the feature information is realized.
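The claim that the Sigmoid function constrains channel weights to (0, 1) is easy to verify numerically; a minimal sketch:

```python
import math

def sigmoid(x):
    """Logistic function: maps any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# channel weights produced from strongly negative, zero, and strongly positive logits
weights = [sigmoid(x) for x in (-5.0, 0.0, 5.0)]
```

Because the weight never reaches 0 or 1 exactly, reused features are damped rather than deleted.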
Preferably, in step S353, for the 3×3 branch, the input feature image has its channel dimension reduced to 1 by a 1×1 convolution followed by a Softmax function, realizing attention-map normalization and yielding the final attention descriptor, each element of which is expressed as:

$$\alpha_j^{3\times 3}=\frac{\exp\!\left(a_j^{3\times 3}\right)}{\sum_{m=1}^{H\times W}\exp\!\left(a_m^{3\times 3}\right)}$$

where $\alpha_j^{3\times 3}$ denotes the final attention map obtained after Softmax normalization; H, W and C denote the length, width and number of channels of the feature map; and $a^{3\times 3}$ denotes the attention map obtained from the input feature map $X^{3\times 3}$ (or $X^{5\times 5}$) through the 1×1 convolution kernel.

Next, image contraction is carried out through the global attention pooling layer, generating a global context feature map whose elements can be expressed as:

$$z_c^{3\times 3}=\sum_{j=1}^{H\times W}\alpha_j^{3\times 3}\,x_{j,c}^{3\times 3}$$

where $z^{3\times 3}$ denotes the global context feature map obtained by weighting the final attention map through global attention pooling, and $z_c^{3\times 3}$ denotes its element for the c-th channel.

For the 5×5 branch, the global context feature map $z^{5\times 5}$ is obtained through the same parallel structure.

Then the global context feature maps of the two parallel branches are concatenated along the channel dimension and converted into a hidden process feature map by a fully connected layer with a reduction coefficient:

$$h=W\left(z^{3\times 3}\,\|\,z^{5\times 5}\right)+b$$

where $\mu$ denotes the reduction coefficient, and $W\in\mathbb{R}^{\frac{2C}{\mu}\times 2C}$ and $b$ denote the weight and bias of the fully connected layer.

The hidden process feature map is then activated by batch normalization and a nonlinear activation function to obtain the hidden feature map:

$$h'=\operatorname{Tanh}\!\left\{\operatorname{BN}\!\left[W\left(z^{3\times 3}\,\|\,z^{5\times 5}\right)+b\right]\right\}$$

where BN denotes batch normalization (BatchNorm) and Tanh denotes the Tanh() activation function.

By setting the reduction coefficient, this process improves the efficiency of information fusion and propagation.

Then two parallel fully connected layers act on the fused hidden feature map to generate channel attention feature maps, and channel-wise normalization through a Softmax function produces the final channel attention maps. The channel attention maps can be expressed as:

$$u^{3\times 3}=W_{3\times 3}h'+b_{3\times 3},\qquad u^{5\times 5}=W_{5\times 5}h'+b_{5\times 5}$$

where $W_{3\times 3},W_{5\times 5}$ and $b_{3\times 3},b_{5\times 5}$ denote the weights and biases of the two fully connected layers.

The final channel attention maps can be expressed as:

$$s_c^{3\times 3}=\frac{\exp\!\left(u_c^{3\times 3}\right)}{\exp\!\left(u_c^{3\times 3}\right)+\exp\!\left(u_c^{5\times 5}\right)},\qquad s_c^{5\times 5}=\frac{\exp\!\left(u_c^{5\times 5}\right)}{\exp\!\left(u_c^{3\times 3}\right)+\exp\!\left(u_c^{5\times 5}\right)}$$

where $u_c^{3\times 3}$ and $u_c^{5\times 5}$ are the channel attention map elements of the two branches, and $s_c^{3\times 3}+s_c^{5\times 5}=1$.

Finally, the final channel attention maps are weighted and aggregated with the global context feature maps of the corresponding branches, giving the final output of the update gate:

$$U=s^{3\times 3}\odot z^{3\times 3}+s^{5\times 5}\odot z^{5\times 5}$$

Preferably, in step S354, the output of the forget gate is expressed as:

$$f=\sigma\!\left\{W_2\operatorname{Tanh}\!\left[\operatorname{BN}\!\left(W_1 z^{f}+b_1\right)\right]+b_2\right\}$$

where $z^{f}$ denotes the global context feature map obtained from the input feature map through the same global attention pooling structure as the update gate; $W_1\in\mathbb{R}^{\frac{C}{r_f}\times C}$ and $b_1$ denote the weight and bias of the first fully connected layer; $W_2$ and $b_2$ denote the weight and bias of the second fully connected layer; $r_f$ denotes the bottleneck coefficient; $\sigma$ denotes the Sigmoid function; and $f$ denotes the final channel attention feature map output by the forget gate.
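The update-gate and forget-gate computations described above can be sketched numerically. This is a simplified illustration only: the update gate's FC bottleneck, BN and Tanh stages are collapsed to the identity so the context vectors act directly as branch logits, BatchNorm is omitted in the forget gate (single sample), and all weights are random placeholders rather than the patent's trained parameters:

```python
import numpy as np

def softmax(u, axis=0):
    e = np.exp(u - u.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def update_gate(z33, z55):
    """Cross-branch Softmax fusion of two per-channel global-context vectors."""
    s = softmax(np.stack([z33, z55]), axis=0)  # s[0] + s[1] == 1 per channel
    return s[0] * z33 + s[1] * z55             # weighted aggregation

def forget_gate(z, w1, b1, w2, b2):
    """FC -> Tanh -> FC -> Sigmoid, yielding channel weights strictly in (0, 1)."""
    h = np.tanh(w1 @ z + b1)                   # bottleneck hidden feature
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))

# toy context vectors for a 2-channel example
fused = update_gate(np.array([1.0, 0.0]), np.array([0.0, 1.0]))

# forget gate with bottleneck coefficient r_f = 2 on C = 8 channels
rng = np.random.default_rng(0)
C, r_f = 8, 2
z = rng.normal(size=C)
gate = forget_gate(z,
                   rng.normal(size=(C // r_f, C)), np.zeros(C // r_f),
                   rng.normal(size=(C, C // r_f)), np.zeros(C))
```

Multiplying a feature channel by `gate` damps reused information without ever zeroing it, matching the Sigmoid attenuation described in S354.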
preferably, in the step S4, the method includes the following steps:
s41, training a safety helmet wearing detection network based on the hybrid connection improved YOLOv5 in the step S3 based on the safety helmet wearing detection data set established in the step S2, wherein an Adam or SGD optimizer can be used, a training period (epoch) is 300, and different batch sizes (batch_size) are set to be respectively trained for 4, 8, 16, 32 and 64;
s42, testing training results based on weight files and test set data obtained by training in different batch sizes;
s43, fine-tuning the super parameters of the network on the verification set;
s44, storing optimal training parameters and weight files thereof, drawing a confusion matrix, an F1 curve, a P curve, an R curve and a P-R curve, calculating a training set prediction frame position loss train/box_loss, a target loss train/obj_loss, a confidence coefficient loss train/cls_loss, a verification set prediction frame position loss val/box_loss, a target loss val/obj_loss, a confidence coefficient loss val/cls_loss, a Precision, a Recall ratio Recall, an average correct ratio mean mAP@0.5 and mAP@0.5:0.95, and drawing a function image.
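As a minimal sketch of how the Precision, Recall and F1 values in step S44 follow from detection counts (the counts below are hypothetical examples, not the patent's experimental results):

```python
def detection_metrics(tp, fp, fn):
    """Precision, Recall and F1 from counts at a fixed IoU/confidence threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# hypothetical counts for one class on a test set
p, r, f1 = detection_metrics(tp=90, fp=10, fn=30)
```

Averaging the precision over recall levels per class, then over classes, gives the mAP values quoted in S44.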
Preferably, step S5 includes the following steps:
S51, add a main interface and set parameters such as the main window class, the picture/video reading processes and threads, and model and interface initialization;
S52, set reading interfaces for pictures, videos, and the camera real-time data stream respectively;
S53, set the open, close and reset events for pictures, videos, and so on.
Preferably, in step S6, the alarm prompt is implemented through the Playsound library for Python; when the model detects a person not wearing a safety helmet, a real-time alarm prompt is issued to remind them, reducing the workload of safety managers.
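A minimal sketch of the alarm decision and the Playsound call; the class convention follows step S21's 1/0 labels, while `alarm.wav` and the function names are hypothetical:

```python
def should_alarm(detections):
    """True if any detected person is class 0 ('not wearing helmet'),
    following the 1/0 labeling convention from step S21."""
    return any(cls == 0 for cls, conf in detections)

def send_alarm(sound_path="alarm.wav"):
    """Play the alert sound; the path is a hypothetical sound file."""
    from playsound import playsound  # imported lazily: only needed when alarming
    playsound(sound_path)

detections = [(1, 0.92), (0, 0.88)]  # one helmeted person, one without
```

In deployment, `should_alarm` would be evaluated on each detected frame and `send_alarm` called only when it returns True.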
Compared with the prior art, the invention has the following beneficial effects:
1. the method adds a hybrid connection structure to the YOLOv5 network; through a short-circuit connection structure of local residual connections and macroscopic dense connections, the propagation speed and utilization efficiency of feature information are increased and the model is protected from excessive redundancy;
2. the method embeds a compression excitation gate unit into the hybrid connection module of the YOLOv5 network; the unit adopts group convolution, depthwise separable convolution and similar structures, saving model parameters and computation while maintaining performance and keeping the model lightweight;
3. in the improved YOLOv5 network, the update gate uses Softmax to effectively fuse newly extracted feature information and the forget gate uses a Sigmoid function to attenuate reused features; both introduce a lightweight channel attention mechanism, so the model learns more feature information and detection accuracy improves without excessive model complexity, a large memory footprint, or reduced detection speed;
4. the method is an intelligent, automatic helmet wearing detection method usable in any construction operation environment; it helps improve the efficiency of helmet wearing inspection, reduces the workload of safety managers, has broad significance for popularization and research, and provides a new perspective for implementing intelligent construction sites.
Drawings
Fig. 1 is a flowchart of the helmet wearing detection method based on the hybrid-connection-improved YOLOv5 of embodiment 1;
Fig. 2 is a block diagram of hybrid connection module 1, which replaces BottleNeck1×n, in embodiment 1;
Fig. 3 is a block diagram of hybrid connection module 2, which replaces BottleNeck2×n, in embodiment 1;
Fig. 4 is a structural diagram of the compression excitation gate control (SEG) structure of embodiment 1;
Fig. 5 is a structural diagram of the update gate of embodiment 1;
Fig. 6 is a structural diagram of the forget gate of embodiment 1;
Fig. 7 shows the helmet wearing detection effect of embodiment 1;
Fig. 8 compares the experimental results of the method of the invention in embodiment 1 with other prior-art methods.
Detailed Description
The present invention is described in further detail below with reference to test examples and specific embodiments. The scope of the subject matter of the invention should not be construed as limited to the following embodiments; all techniques realized on the basis of the invention fall within its scope.
Example 1
The invention provides a method for detecting safety helmet wearing based on the hybrid-connection-improved YOLOv5, shown in fig. 1, comprising the following steps:
S1: collect targeted field operation images;
S2: perform data annotation on the collected images, complete format conversion of the annotation files, and establish a targeted safety helmet wearing detection data set;
S3: construct a YOLOv5 v6.0 network, improve its structure, and build a safety helmet wearing detection network model based on the hybrid-connection-improved YOLOv5;
S4: input the data set established in step S2 into the improved YOLOv5 network model built in step S3 for training, obtain the training parameters, and save the resulting weight files;
S5: complete model deployment and system interface design, and perform functional tests with test samples, including images, videos, and camera video streams;
S6: if a person in the sample is detected not wearing a safety helmet, issue an alarm prompt; otherwise, return to step S4.
In step S1, construction operation images related to safety helmets are obtained on the basis of the MS COCO data set combined with a web crawler tool; operation video recordings are acquired in a targeted way at operation sites such as chemical plants and construction sites according to the usage environment, and effective video frames are intercepted to complete image data collection.
The implementation process for constructing the safety helmet wearing detection data set in step S2 specifically includes:
S21: annotate the images collected in step S1 using Labelme, a QT-based annotation interface in Python; mark content wearing a safety helmet as 1 and content not wearing one as 0, and save the annotation results as xml files;
S22: convert the annotation files into YOLO data set label files (txt format);
S23: divide the data set into training, validation and test sets in an 8:1:1 ratio according to the number of images, finally establishing a targeted safety helmet wearing detection data set.
The construction of the safety helmet wearing detection network model based on the hybrid-connection-improved YOLOv5 in step S3 comprises the following steps:
Step S31: construct a YOLOv5 v6.0 network, mainly comprising P1-P5, eight C3 structures, an SPPF structure and two Upsample structures, where the first four C3 structures use BottleNeck1×n structures for feature extraction and the last four use BottleNeck2×n structures;
Step S32: as shown in fig. 2, replace the BottleNeck1×n module in the network with hybrid connection module 1, which introduces an attention mechanism; add a dense connection structure between the input and output of hybrid module 1, and add a 1×1 convolution layer before the final output, strengthening the model's ability to propagate and preserve feature information while keeping the original number of channels unchanged;
Step S33: as shown in fig. 3, replace the BottleNeck2×n module with hybrid connection module 2, whose structure is similar to that of hybrid connection module 1 with dense connection structures added, except that hybrid module 2 continues the characteristics of the original BottleNeck2×n structure and removes the skip connection between every two unit structures present in hybrid module 1;
Step S34: in hybrid connection module 1 and hybrid connection module 2, replace the original ConvBNSiLU structure with the compression excitation gate control (SEG) structure shown in fig. 4;
Step S35: in the compression excitation gate structure, form the compression unit from 1×1 convolution and 3×3 group convolution; form the information activation unit from parallel 3×3 and 5×5 depthwise separable convolutions; after the depthwise separable convolution layers with the two different kernels, connect the gated update gate with an attention mechanism shown in fig. 5, which fuses the two activation outputs of the parallel structure; connect the forget gate shown in fig. 6 in parallel outside the activation unit to filter repeatedly extracted feature information; and embed a residual connection structure between the compression unit and the output of the forget gate, realizing the hybrid connection inside all C3 structures.
The network training in step S4 specifically includes the following steps:
Step S41: train the hybrid-connection-improved YOLOv5 helmet wearing detection network of step S3 on the data set established in step S2, using an Adam or SGD optimizer, setting the training period (epoch) to 300, and training separately with batch sizes (batch_size) of 4, 8, 16, 32 and 64;
Step S42: test the training results using the weight files obtained under the different batch sizes and the test set data;
Step S43: fine-tune the hyperparameters of the network on the validation set.
Step S44: save the optimal training parameters and their weight files; draw the confusion matrix, F1 curve, P curve, R curve and P-R curve; compute the training set prediction-box position loss train/box_loss, objectness loss train/obj_loss and classification loss train/cls_loss, the validation set losses val/box_loss, val/obj_loss and val/cls_loss, Precision, Recall, and the mean average precision mAP@0.5 and mAP@0.5:0.95; and plot the corresponding curves.
The step S5 specifically includes the following steps:
step S51: adding a main interface and setting the window main class, the picture and video reading process and thread, and the model and interface initialization parameters;
step S52: respectively setting a picture, a video and a camera real-time data stream reading interface;
step S53: setting picture and video open, close and reset events.
The practical application effect of the embodiment of the invention is shown in fig. 7.
The alarm prompt in step S6 is implemented through the Python Playsound library; when the model detects a person not wearing a safety helmet, a real-time alarm prompt is issued.
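A sketch of this alarm step follows; the class-id convention assumes the labelling of step S21 (0 marks "not wearing a safety helmet"), and the sound file path is a placeholder:

```python
def should_alarm(class_ids, no_helmet_id=0):
    # Step S21 labels "not wearing a safety helmet" as class 0, so an alarm
    # fires as soon as any detection carries that class id.
    return no_helmet_id in class_ids

def play_alarm(sound_path="alarm.wav"):
    # playsound is a third-party package (pip install playsound), imported
    # lazily so the detection logic stays testable without audio support.
    from playsound import playsound
    playsound(sound_path)

detections = [1, 1, 0]           # class ids from one detected frame
if should_alarm(detections):
    print("no-helmet detected")  # play_alarm() would be called here
```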
The implementation mechanism of adding the hybrid connection structure in the YOLOv5 network is as follows:
firstly, hybrid connection module 1 and hybrid connection module 2, which aggregate local residual connections with external dense connections, are built as shown in figs. 2 and 3 and respectively replace the BottleNeck1×n and BottleNeck2×n structures in YOLOv5; secondly, the compression excitation gating unit shown in fig. 4 is embedded into the hybrid connection modules, where standard convolution and group convolution compress the feature image, depthwise separable convolutions of different scales activate information, and a gating structure introducing an attention mechanism fuses information; finally, an update gate that facilitates the transfer of feature information and a forget gate that attenuates repeated feature information are constructed as shown in figs. 5 and 6. Replacing the original BottleNeck structures with this hybrid connection structure accelerates the transfer of feature information, retains more feature information, realizes its effective utilization, keeps the model complexity from increasing, and effectively improves the accuracy of the safety helmet wearing detection algorithm.
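The hybrid connection idea — local residual connections inside each unit, aggregated by an external dense connection and a trailing 1×1 convolution — can be sketched in PyTorch as follows; the module is a schematic under assumed shapes, not the patent's exact structure:

```python
import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Sketch: local residual connections inside units, dense aggregation outside."""
    def __init__(self, ch, n_units=2):
        super().__init__()
        self.units = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.SiLU())
            for _ in range(n_units))
        # 1x1 convolution after the dense concatenation restores the channel count.
        self.fuse = nn.Conv2d(ch * (n_units + 1), ch, 1, bias=False)

    def forward(self, x):
        feats = [x]
        for unit in self.units:
            x = x + unit(x)        # local residual connection
            feats.append(x)
        return self.fuse(torch.cat(feats, dim=1))  # external dense aggregation

y = HybridBlock(16)(torch.randn(1, 16, 8, 8))
print(y.shape)  # torch.Size([1, 16, 8, 8])
```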
Further, in the compression excitation gating unit, efficient propagation of feature information is realized by means of group convolution, depthwise separable convolution, an attention-combined gating mechanism and residual connection, specifically comprising the following steps:
s351, the feature map is input into the compression excitation gating unit and passes in turn through a 1×1 convolution layer and a 3×3 group convolution layer; both convolution layers can use a width multiplier or stride to realize the downsampling operation of the model, yielding a feature image of reduced feature dimension and thereby reducing model complexity;
s352, feature information activation and data filling are performed using a structure in which 3×3 and 5×5 depthwise separable convolutions are connected in parallel, so that more activated feature information is obtained while the two parallel outputs keep the same dimensions, allowing the model to learn information more effectively;
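Step S352's parallel branches can be sketched in PyTorch as follows; padding is chosen so both branches keep the same output dimensions, and the names are illustrative:

```python
import torch
import torch.nn as nn

def dw_separable(ch, k):
    # Depthwise separable convolution: a per-channel k x k depthwise convolution
    # followed by a 1x1 pointwise convolution; padding k // 2 preserves H and W.
    return nn.Sequential(
        nn.Conv2d(ch, ch, k, padding=k // 2, groups=ch, bias=False),
        nn.Conv2d(ch, ch, 1, bias=False),
    )

class ActivationUnit(nn.Module):
    """Parallel 3x3 and 5x5 depthwise separable convolutions (step S352)."""
    def __init__(self, ch):
        super().__init__()
        self.branch3 = dw_separable(ch, 3)
        self.branch5 = dw_separable(ch, 5)

    def forward(self, x):
        return self.branch3(x), self.branch5(x)

y3, y5 = ActivationUnit(32)(torch.randn(1, 32, 16, 16))
print(y3.shape == y5.shape)  # True
```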
s353, an update gate is connected after the two depthwise separable convolution structures to effectively fuse the feature information of the different branches; the feature images obtained after the 3×3 and 5×5 depthwise separable convolutions respectively enter the update gate. This process can be seen as being accomplished in the following five steps:
(1) for the 3×3 branch, the input feature image passes through a 1×1 convolution and a Softmax function, which reduce the channel dimension to 1 and normalize the attention feature map, obtaining the final attention feature descriptor, each element of which can be expressed by the following formula:
a_{i,j} = exp(s_{i,j}) / Σ_{x=1}^{H} Σ_{y=1}^{W} exp(s_{x,y})
wherein a ∈ R^{H×W×1} denotes the final attention feature map obtained after Softmax normalization, and H, W and C respectively denote the length, width and number of channels of the feature map; s denotes the attention feature map obtained by applying the 1×1 convolution kernel to the input feature map X_{3×3} or X_{5×5}; i and j respectively denote the row and column indices of the feature map.
(2) Image contraction is then performed through the global attention pooling layer to generate a global context feature map, each element of which can be expressed as:
z_c = Σ_{i=1}^{H} Σ_{j=1}^{W} a_{i,j} · x_{i,j,c}
wherein z_{3×3} ∈ R^{C} denotes the global context feature map obtained by weighting the feature map with the final attention map through global attention pooling, x_{i,j,c} denotes an element of the input feature map, and z_c denotes the global context feature map element of the c-th channel.
Similarly, another global context feature map z_{5×5} can be obtained through the same parallel structure.
(3) The global context feature maps obtained by the two parallel branches are concatenated along the channel and converted into a hidden process feature map through a fully connected layer with a reduction coefficient:
h=W(z 3×3 ||z 5×5 )+b
wherein h ∈ R^{2C/μ}, and μ denotes the reduction coefficient; W ∈ R^{(2C/μ)×2C} and b respectively denote the weight and bias of the fully connected layer;
the hidden process feature map is activated by batch normalization and nonlinear activation functions to obtain a hidden feature map, which can be expressed as:
h′=Tanh{BN[W(z 3×3 ||z 5×5 )+b]}
wherein h′ ∈ R^{2C/μ}; BN denotes batch normalization (BatchNorm); Tanh denotes the Tanh() activation function;
this process can improve the efficiency of information fusion and propagation by setting the reduction factor.
(4) Two parallel fully connected layers are applied to the fused hidden feature map to generate channel attention feature maps, and channel feature normalization is then realized through a Softmax function to generate the final channel attention feature maps.
The channel attention feature maps can be expressed as:
u_{3×3} = W_{3×3} h′ + b_{3×3},  u_{5×5} = W_{5×5} h′ + b_{5×5}
wherein u_{3×3}, u_{5×5} ∈ R^{C}; W_{3×3}, W_{5×5} and b_{3×3}, b_{5×5} respectively denote the weights and biases of the two fully connected layers.
The final channel attention feature maps can be expressed as:
ũ_{3×3,c} = exp(u_{3×3,c}) / (exp(u_{3×3,c}) + exp(u_{5×5,c})),  ũ_{5×5,c} = exp(u_{5×5,c}) / (exp(u_{3×3,c}) + exp(u_{5×5,c}))
wherein u_{3×3,c} and u_{5×5,c} denote the c-th channel elements of the channel attention feature maps, so that ũ_{3×3,c} + ũ_{5×5,c} = 1 for each channel.
(5) The final channel attention feature maps are then respectively weighted and aggregated with the global context feature maps of the corresponding branches, giving the final output of the update gate:
o = ũ_{3×3} ⊙ z_{3×3} + ũ_{5×5} ⊙ z_{5×5}
wherein o ∈ R^{C} denotes the final output of the update gate, and ⊙ denotes channel-wise multiplication.
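Steps (1)-(5) of the update gate can be sketched end-to-end in NumPy; in this sketch BN is omitted, a shared 1×1 kernel is assumed for both branches, and all weight shapes are illustrative assumptions:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def global_attention_pool(x, w_k):
    # x: (H, W, C) feature map; w_k: (C,) 1x1-conv kernel giving a 1-channel map.
    s = x @ w_k                                 # (H, W) attention logits
    a = softmax(s.ravel()).reshape(s.shape)     # Softmax-normalised attention map
    return np.einsum('hw,hwc->c', a, x)         # global context vector z, shape (C,)

def update_gate(x3, x5, w_k, W, b, W3, b3, W5, b5):
    z3 = global_attention_pool(x3, w_k)         # steps (1)-(2), 3x3 branch
    z5 = global_attention_pool(x5, w_k)         # same structure, 5x5 branch
    h = np.tanh(W @ np.concatenate([z3, z5]) + b)   # step (3): FC with reduction
    u3, u5 = W3 @ h + b3, W5 @ h + b5               # step (4): channel attention
    e3, e5 = np.exp(u3), np.exp(u5)
    a3, a5 = e3 / (e3 + e5), e5 / (e3 + e5)         # per-channel Softmax over branches
    return a3 * z3 + a5 * z5                        # step (5): weighted aggregation

rng = np.random.default_rng(0)
H, W_, C, mu = 4, 4, 8, 2
out = update_gate(rng.normal(size=(H, W_, C)), rng.normal(size=(H, W_, C)),
                  rng.normal(size=C),
                  rng.normal(size=(2 * C // mu, 2 * C)), rng.normal(size=2 * C // mu),
                  rng.normal(size=(C, 2 * C // mu)), rng.normal(size=C),
                  rng.normal(size=(C, 2 * C // mu)), rng.normal(size=C))
print(out.shape)  # (8,)
```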
s354, a structure in which a forgetting gate and a residual connection are aggregated in parallel with weights is embedded between the two compression structures of S351 and the output of the update gate, and added to the output of the update gate. This uses the channel attention mechanism and a Sigmoid function to filter out repeatedly utilized features. Before the global attention pooling layer, the forgetting gate has the same Softmax gating structure as the update gate; afterwards, the output of the forgetting gate is obtained through the first fully connected layer, the batch normalization and activation function layer, the second fully connected layer and the Sigmoid function layer in turn. Similar to the implementation mechanism of the update gate, the output of the forgetting gate can be expressed as follows:
f = σ{W_2^f [Tanh(BN(W_1^f z^f + b_1^f))] + b_2^f}
wherein z^f ∈ R^{C} denotes the global context feature map obtained from the input feature map through the same global attention pooling structure as the update gate; W_1^f ∈ R^{(C/r_f)×C} and b_1^f respectively denote the weight and bias of the first fully connected layer; W_2^f ∈ R^{C×(C/r_f)} and b_2^f respectively denote the weight and bias of the second fully connected layer; r_f denotes the bottleneck coefficient; σ denotes the Sigmoid function; and f denotes the final channel attention feature map output by the forgetting gate.
The Sigmoid function constrains the channel weights to (0, 1), effectively attenuating repeatedly utilized features.
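The forgetting gate described above can be sketched in NumPy; BatchNorm is replaced by a simple standardisation for illustration, and the shapes are assumptions:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forget_gate(z_f, W1, b1, W2, b2):
    # f = sigmoid(W2[Tanh(BN(W1 z_f + b1))] + b2), with BN approximated
    # here by a simple standardisation of the hidden vector.
    h = W1 @ z_f + b1
    h = (h - h.mean()) / (h.std() + 1e-5)   # stand-in for BatchNorm
    return sigmoid(W2 @ np.tanh(h) + b2)

rng = np.random.default_rng(1)
C, r_f = 8, 2                               # channels and bottleneck coefficient
f = forget_gate(rng.normal(size=C),
                rng.normal(size=(C // r_f, C)), rng.normal(size=C // r_f),
                rng.normal(size=(C, C // r_f)), rng.normal(size=C))
assert ((f > 0) & (f < 1)).all()            # channel weights constrained to (0, 1)
```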
S355, the final output feature maps of the update gate and the forgetting gate are added element-wise at corresponding positions to obtain the final output of the compression excitation gating unit.
In this embodiment, the several important parameters above are finally set as follows: batch size (Batch size) of 16, training period (epochs) of 300, learning rate (Learning rate) of 0.1, group number g = 4 for the 3×3 group convolution in the SEG block, width multiplier α = 4, reduction ratio parameters (Reduction) r_u = r_f of the update gate and the forgetting gate, and stride s = 1.
Finally, the proposed method and other methods are trained on the data set used in this embodiment and the results are compared; it can be seen that the proposed method improves the detection precision of the model without loss of detection performance, proving its effectiveness.
As shown in fig. 8, on the data set of this embodiment the proposed method achieves the highest detection precision, the smallest weight-file memory footprint, and a shorter time to detect the same picture. The detection effect shown in fig. 8 demonstrates that the method realizes automatic computer detection of safety helmet wearing and facilitates further research and application in this direction.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (10)

1. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 is characterized by comprising the following steps of:
s1, collecting an on-site operation image;
s2, carrying out data annotation on the collected images, completing format conversion of annotation files, and establishing a safety helmet wearing detection data set;
s3, constructing a YOLOv5v6.0 network, improving a network structure, and constructing a safety helmet wearing detection network model based on the hybrid connection improved YOLOv 5;
s4, inputting the data set established in the S2 into the improved YOLOv5 network model established in the S3 for training, obtaining training parameters, and storing the obtained weight files;
s5, completing model deployment and system interface design, and performing functional tests with test samples, including: images, videos and camera video streams;
and S6, if it is detected that a person in the sample is not wearing a safety helmet, an alarm prompt is issued; if the safety helmet is worn, return to step S4.
2. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein in the step S1, construction operation images related to safety helmets are acquired from the MS COCO data set in combination with a web crawler tool, operation video recordings are acquired on the operation site in a targeted manner according to the use environment, video frames are intercepted, and image data collection is completed.
3. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein the step S2 comprises the following steps:
s21, marking the images collected in the step S1 by using Labelme, a Qt-based annotation tool written in Python; content wearing the safety helmet is marked as 1 and content not wearing the safety helmet is marked as 0, and the marking results are output and stored as xml format files;
s22, converting the annotation file into a YOLO data set tag file in txt format;
s23, dividing the training set, the verification set and the test set according to the number of images in the data set in proportion, and finally establishing a targeted safety helmet wearing detection data set.
4. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein the step S3 comprises the following steps:
s31, constructing a YOLOv5 v6.0 network, which comprises P1-P5, eight C3 structures, an SPPF structure and two Upsample structures, wherein the first four C3 structures use BottleNeck1×n structures for feature extraction, and the last four C3 structures use BottleNeck2×n structures for feature extraction;
s32, replacing the BottleNeck1×n module with a hybrid connection module 1 that introduces an attention mechanism, adding a dense connection structure between the input and the output of the hybrid connection module 1, and adding a 1×1 convolution layer before the final output;
s33, replacing the BottleNeck2×n module with a hybrid connection module 2, adding a dense connection structure between the input and the output of the hybrid connection module 2, which retains the original BottleNeck2×n structural characteristics and, relative to the hybrid connection module 1, removes the jump connection structure between every two unit structures;
s34, in the hybrid connection module 1 and the hybrid connection module 2, a compression excitation gating structure is used to replace the original ConvBNSiLU structure;
s35, in the compression excitation gating structure, a compression unit is formed from a 1×1 convolution and a 3×3 group convolution; an information activation unit is formed from parallel 3×3 and 5×5 depthwise separable convolutions, and a gating-structure update gate with an attention mechanism is connected after the two depthwise separable convolution layers of different convolution kernels to fuse the two activated outputs of the parallel structure; a forgetting gate is connected in parallel outside the activation unit to filter repeatedly extracted feature information, and a residual connection structure is embedded between the compression unit and the output of the forgetting gate.
5. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1 or 4, wherein the compression excitation gating unit of step S35 realizes efficient propagation of feature information, comprising the following steps:
s351, inputting the feature image into the compression excitation gating unit, passing in turn through a 1×1 convolution layer and a 3×3 group convolution layer, and obtaining a feature image of reduced feature dimension through a downsampling operation;
s352, performing feature information activation and data filling using a structure in which 3×3 and 5×5 depthwise separable convolutions are connected in parallel;
s353, connecting an update gate after the two depthwise separable convolution structures to effectively fuse the feature information of the different branches, the feature images obtained after the 3×3 and 5×5 depthwise separable convolutions respectively entering the update gate;
s354, embedding a structure of forgetting gate and residual connection parallel weighted aggregation between the two compression structures of S351 and the output of the update gate, and adding the structure with the output of the update gate; then the output of the forgetting gate is obtained through the first full-connection layer, the batch standardization and activation function layer, the second full-connection layer and the Sigmoid function layer respectively;
s355, adding the corresponding positions of the final output feature maps of the update gate and the forgetting gate to obtain the final output of the compression excitation gating unit.
6. The method for detecting the wearing of a helmet based on the hybrid connection improved YOLOv5 of claim 5, wherein in the step S353, for the 3×3 branch, the input feature image reduces the channel dimension to 1 by a 1×1 convolution and a Softmax function, so as to normalize the attention profile, and obtain a final attention profile descriptor, and each element is expressed as:
in the method, in the process of the invention,the final attention characteristic diagram obtained after the normalization of the Softmax function is represented, and H, W and C respectively represent the length, the width and the channel number of the characteristic diagram; x is E [1, H],y∈[1,W];/>Representing input feature graphsOr->Attention feature map obtained through 1×1 convolution kernel;
secondly, image contraction is performed through the global attention pooling layer to generate a global context feature map, each element of which can be expressed as:
z_c = Σ_{i=1}^{H} Σ_{j=1}^{W} a_{i,j} · x_{i,j,c}
wherein z_{3×3} denotes the global context feature map obtained by weighting the feature map with the final attention map through global attention pooling, and z_c denotes the global context feature map element of the c-th channel;
for the 5×5 branch, a global context feature map z_{5×5} can be obtained through the same parallel structure; then, the global context feature maps obtained by the two parallel branches are concatenated along the channel and converted into a hidden process feature map through a fully connected layer with a reduction coefficient:
h=W(z 3×3 ||z 5×5 )+b
wherein h ∈ R^{2C/μ}, and μ denotes the reduction coefficient; W ∈ R^{(2C/μ)×2C} and b respectively denote the weight and bias of the fully connected layer;
then, the hidden process feature map is activated by batch normalization and nonlinear activation functions to obtain a hidden feature map, which can be expressed as:
h′=Tanh{BN[W(z 3×3 ||z 5×5 )+b]}
wherein h′ ∈ R^{2C/μ}; BN denotes batch normalization (BatchNorm); Tanh denotes the Tanh() activation function;
then, two parallel fully connected layers are applied to the fused hidden feature map to generate channel attention feature maps, and channel feature normalization is realized through a Softmax function to generate the final channel attention feature maps; the channel attention feature maps can be expressed as:
u_{3×3} = W_{3×3} h′ + b_{3×3},  u_{5×5} = W_{5×5} h′ + b_{5×5}
wherein W_{3×3}, W_{5×5} and b_{3×3}, b_{5×5} respectively denote the weights and biases of the two fully connected layers;
the final channel attention feature maps can be expressed as:
ũ_{3×3,c} = exp(u_{3×3,c}) / (exp(u_{3×3,c}) + exp(u_{5×5,c})),  ũ_{5×5,c} = exp(u_{5×5,c}) / (exp(u_{3×3,c}) + exp(u_{5×5,c}))
wherein u_{3×3,c} and u_{5×5,c} denote the c-th channel elements of the channel attention feature maps;
finally, the final channel attention feature maps are respectively weighted and aggregated with the global context feature maps of the corresponding branches, so that the final output of the update gate can be obtained:
o = ũ_{3×3} ⊙ z_{3×3} + ũ_{5×5} ⊙ z_{5×5}
7. the method for detecting the wearing of a helmet based on the hybrid connection improved YOLOv5 of claim 1, wherein in the step S354, the output of the forgetting gate is expressed as:
f = σ{W_2^f [Tanh(BN(W_1^f z^f + b_1^f))] + b_2^f}
wherein z^f denotes the global context feature map obtained from the input feature map through the same global attention pooling structure as the update gate; W_1^f and b_1^f respectively denote the weight and bias of the first fully connected layer; W_2^f and b_2^f respectively denote the weight and bias of the second fully connected layer; r_f denotes the bottleneck coefficient; and f denotes the final channel attention feature map output by the forgetting gate.
8. the method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein the step S4 comprises the following steps:
s41, training the safety helmet wearing detection network based on the hybrid connection improved YOLOv5 in step S3 on the safety helmet wearing detection data set established in step S2, using an optimizer, setting the training period (epoch) to 300, and training separately with batch sizes (batch_size) of 4, 8, 16, 32 and 64;
s42, testing training results based on weight files and test set data obtained by training in different batch sizes;
s43, fine-tuning the super parameters of the network on the verification set;
s44, saving the optimal training parameters and their weight files, drawing a confusion matrix, an F1 curve, a P curve, an R curve and a P-R curve, calculating the prediction-box position loss, objectness loss and classification loss on the training set and the prediction-box position loss, objectness loss, classification loss, precision, recall, and mean average precision mAP@0.5 and mAP@0.5:0.95 on the validation set, and drawing the function images.
9. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein the step S5 comprises the following steps:
s51, adding a main interface, and setting a window main class, a picture video reading process and a thread, and model and interface initialization parameters;
s52, respectively setting a picture, a video and a camera real-time data stream reading interface;
s53, setting picture and video open, close and reset events.
10. The method for detecting the wearing of the safety helmet based on the hybrid connection improved YOLOv5 according to claim 1, wherein in the step S6, the alarm prompt is implemented through the Python Playsound library, and when the model detects that a person is not wearing a safety helmet, a real-time alarm prompt is issued.
CN202310785410.0A 2023-06-29 2023-06-29 A helmet wearing detection method based on hybrid connection improved YOLOv5 Pending CN117036846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310785410.0A CN117036846A (en) 2023-06-29 2023-06-29 A helmet wearing detection method based on hybrid connection improved YOLOv5

Publications (1)

Publication Number Publication Date
CN117036846A true CN117036846A (en) 2023-11-10

Family

ID=88636106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310785410.0A Pending CN117036846A (en) 2023-06-29 2023-06-29 A helmet wearing detection method based on hybrid connection improved YOLOv5

Country Status (1)

Country Link
CN (1) CN117036846A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445706A (en) * 2022-01-25 2022-05-06 南京工程学院 Power transmission line target detection and identification method based on feature fusion
CN115082922A (en) * 2022-08-24 2022-09-20 济南瑞泉电子有限公司 Water meter digital picture processing method and system based on deep learning
CN115423743A (en) * 2022-07-21 2022-12-02 韶关学院 A detection method for hot-rolled strip surface defects based on improved YOLOv5
CN115861758A (en) * 2022-12-26 2023-03-28 西安邮电大学 A target detection method and device based on a dual-mode fusion network
US20230106961A1 (en) * 2021-10-04 2023-04-06 Motive Technologies, Inc. Camera initialization for lane detection and distance estimation using single-view geometry

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination