CN120086699B

CN120086699B - Smart construction site safety management and control method and system based on multi-source data analysis

Info

Publication number: CN120086699B
Application number: CN202510580593.1A
Authority: CN
Inventors: 赵小永; 刘立广; 王鹏; 杨武顺; 宋伟青; 王硕; 宋奕霖
Original assignee: China Railway Construction Group Co Ltd; Xian Engineering Co Ltd of China Railway Construction Group Co Ltd
Current assignee: China Railway Construction Group Co Ltd; Xian Engineering Co Ltd of China Railway Construction Group Co Ltd
Priority date: 2025-05-07
Filing date: 2025-05-07
Publication date: 2025-07-25
Anticipated expiration: 2045-05-07
Also published as: CN120086699A

Abstract

The invention is suitable for the field of intelligent construction sites, and provides an intelligent construction site safety control method and system based on multi-source data analysis, wherein in the control method, a BERT model is adopted to respectively extract features of a site information text and a weather information text so as to obtain text features; the method comprises the steps of adopting an improved SlowFast model to extract visual features of image information, constructing a cross-modal relation matrix between text and the visual features, weighting and fusing the text features and the visual features, taking the fused features as input of an MLP network to obtain preliminary behavior categories of workers in the image, comparing the preliminary behavior category results with site features and weather features, adjusting the preliminary behavior categories according to a preset correction rule to obtain final behavior categories, and carrying out safety control on behaviors of the workers in the construction site based on the final behavior categories. The control method can realize comprehensive, accurate and real-time monitoring and management of the behaviors of workers on the construction site by acquiring and fusing the multisource information and combining the safety control measures.

Description

Intelligent building site safety control method and system based on multi-source data analysis

Technical Field

The invention belongs to the technical field of intelligent building site management and control, and particularly relates to an intelligent building site safety management and control method and system based on multi-source data analysis.

Background

The internal environment of the construction site is complex and changeable, and construction workers need to operate according to relevant regulations in the construction site, and meanwhile, the safety state of the area where the construction workers are located needs to be paid attention to.

At present, the safety management of the construction site can reduce the probability of accident occurrence to a certain extent, however, the traditional construction site safety monitoring means have a plurality of defects, manual inspection is difficult to cover the whole area, personnel quality and fatigue are easily affected, and the fixed safety notification and warning effect is limited. The monitoring to construction workers is incomplete and discontinuous, security supervision loopholes exist, potential safety hazards cannot be found and early warned in time, and the accuracy and reliability of site security monitoring are reduced.

Disclosure of Invention

The embodiment of the invention aims to provide an intelligent building site safety control method and system based on multi-source data analysis, which aim to solve the technical problems in the background technology.

In order to achieve the above purpose, the present invention provides the following technical solutions.

The embodiment of the invention provides an intelligent building site safety control method based on multi-source data analysis, which comprises the following steps:

S1, acquiring multi-source information of a construction site, wherein the multi-source information comprises site information, weather information and image information;

s2, respectively carrying out text preprocessing on the site information and the weather information, respectively carrying out feature extraction on the preprocessed site information text and the preprocessed weather information text by adopting a BERT model to obtain site features and weather features, and carrying out splicing processing on the site features and the weather features to obtain text features;

S3, extracting visual features of image information by adopting an improved SlowFast model, wherein the improved SlowFast model comprises a fast branch and a slow branch, a double-layer attention module is introduced into the fast branch, a feature enhancement module is introduced into the slow branch, weights of the fast branch and the slow branch are adjusted based on a dynamic weight mechanism, and the outputs of the two branches are subjected to feature fusion according to the adjusted weights to obtain visual features;

s4, constructing a cross-modal relation matrix between the text and the visual features by calculating the Pearson correlation coefficient, and weighting and fusing the text features and the visual features according to the cross-modal relation matrix to obtain final fused features;

S5, taking the fusion features as the input of an MLP network, wherein multi-stage layered fusion residual errors are introduced into the MLP network, features are extracted through residual error MLP blocks of a plurality of stages, shallow layer features and deep layer features are fused in each stage, and the fusion features are mapped to corresponding behavior categories to obtain preliminary behavior categories of workers in the images;

S6, comparing the preliminary behavior category result with site characteristics and weather characteristics, and adjusting the preliminary behavior category according to a preset correction rule to obtain a final behavior category;

s7, safety control is conducted on the behaviors of the workers on the construction site based on the final behavior categories.

Further, in step S3, the step of introducing a dual-layer attention module in the fast branch includes:

s311, linearly changing the input characteristics to generate a query matrix, a key matrix and a value matrix in an attention mechanism;

S312, constructing a directed graph, establishing attention relations among different areas, calculating average values of query matrixes and key matrixes in each area, generating an area query matrix and an area key matrix, generating an adjacency matrix by dot product area query matrixes and area key matrixes, and measuring correlation among different areas by the adjacency matrix;

s313, pruning is carried out on the adjacent matrixes, wherein the pruning comprises the first k adjacent matrixes of the high-correlation area, and a route index matrix is obtained;

s314, aggregating key matrix and value matrix tensors of all routing areas based on the attention mechanisms focused on k routing areas to generate an aggregated key matrix and value matrix;

s315, performing attention operation on the aggregated key matrix and the value matrix, and introducing a local context enhancement item LEC to derive a result tensor so as to obtain the output characteristics of the fast branch.

In step S3, the feature enhancement module is of a multi-branch structure, and the output feature graphs of the branches are spliced, and the spliced feature graphs and the input feature graphs are added through residual connection, and output features of the slow branches are obtained through the ReLU activation function output.

Further, in step S3, in the step of adjusting weights of the fast branch and the slow branch based on the dynamic weight mechanism, a weight calculation formula is expressed as:;

Wherein, the The characteristic sequence representing the slow branch is subjected to global average pooling; The feature sequences representing the fast branches are subjected to global average pooling, conv1, conv2 and conv3 represent convolution operation, sigma represents a sigmoid activation function for limiting an output result to be in a range of 0 to 1, and the output result is used for representing the size of the weight.

Further, in step S3, the step of performing feature fusion on the outputs of the two branches according to the adjusted weights to obtain visual features includes:

the output characteristics of the fast branch and the output characteristics of the slow branch are respectively recorded as And, wherein,Representing the output characteristics of the slow-leg,Representing the output characteristics of the fast branch;

global average pooling is carried out on the characteristic sequences of the slow branch and the fast branch respectively to obtain pooling results And;

And respectively carrying out convolution operation and difference on the two pooling results, and calculating the motion characteristic difference F of the fast and slow branches, wherein the motion characteristic difference F is expressed as:;

Wherein conv1 and conv2 both represent convolution operations, σ represents an activation function;

Convolving the motion characteristic difference F and generating characteristic weights by adopting sigmoid activation function Expressed as:;

wherein conv3 represents a convolution operation and σ represents a sigmoid activation function;

the characteristic weight is related to the characteristic of the slow branch Performing point multiplication operation to generate enhanced feature images, wherein,;

Feature map to be enhancedAnd fusing the characteristics of the fast branch as the subsequent input of the slow branch, so as to realize the characteristic fusion of the fast and slow branches and obtain visual characteristics.

Further, in step S4, the pearson correlation coefficientExpressed as:;

In the formula, Is thatAndIs a covariance of (2); Is that Standard deviation of (2); Is that Standard deviation of (2); Is that Is used for the average value of (a),Is thatE represents the expected value.

Further, in step S4, the step of constructing a cross-modal relation matrix between the text and the visual feature includes:

respectively substituting the text features and the visual features into the variables of the Pearson correlation coefficient, and constructing a relation fusion matrix to obtain a relation fusion matrix of the text features and the visual features , wherein,The values of (a) are the number of text features and visual features, and the dimension of the matrix isMatrix of (a)Reflecting the interrelationship between text features and visual features;

Substituting the visual features and the text features into the variables of the pearson correlation coefficients, constructing a relation fusion matrix, and obtaining the relation matrix of the visual features and the text features by using the pearson correlation coefficients 。

Further, in step S4, the step of weighting and fusing the text feature and the visual feature according to the cross-modal relation matrix to obtain a final fused feature includes:

In the process of obtaining the relation matrix And (3) withAnd then weighting the visual features and the text features respectively, wherein:

Through text-to-vision weighted relation matrix To weight visual features, expressed as:, representing visual features generated by weighting the text-to-visual relationship matrix;

vision to text weighted relation matrix To weight text features, expressed as:, Representing text features generated by weighting the text by a visual-to-text weighting relationship matrix;

And Respectively enhanced text features and visual features;

Finally, the enhanced text features and visual features are fused together by a weighted average method, and are expressed as follows: Wherein, the method comprises the steps of, In order to finally fuse the feature vectors,AndIs a hyper-parameter that controls how much text features and visual features contribute in the final fusion feature.

Further, in step S5, introducing a multi-stage hierarchical fusion residual in the MLP network, including:

Dividing the MLP network into a plurality of phases, each phase comprising a number of residual MLP blocks, the structure of each residual MLP block being expressed as: ;

Wherein, the Represent the firstOutput of layers, MLP (·) represents a multi-layer perceptron;

At the end of each phase, the output features of the current phase are fused with the output features of the previous phase, expressed as: ;

Wherein, the Representing the fusion characteristics of the s-th stage, wherein concat (·, ·) represents characteristic splicing operation;

jump connection is added between different stages of the network, and shallow layer characteristics and deep layer characteristics are fused, which are expressed as follows: ;

Wherein, the Representing the output of the j-th hop connection,The representation transforms the feature;

the fusion features of the last stage are input to the classification layer, expressed as: ;

where O represents the final classification output and W and b represent the weight and bias of the classification layer, respectively.

Another embodiment of the present invention provides an intelligent worksite safety control system based on multi-source data analysis, the control system comprising the following modules:

The data acquisition module is used for acquiring multi-source information of the construction site, wherein the multi-source information comprises site information, weather information and image information;

The text feature extraction module is used for respectively preprocessing the text of the field information and the weather information, respectively extracting features of the preprocessed field information text and the preprocessed weather information text by adopting the BERT model to obtain field features and weather features, and performing splicing processing on the field features and the weather features to obtain text features;

the improved SlowFast model comprises a fast branch and a slow branch, a double-layer attention module is introduced into the fast branch, a feature enhancement module is introduced into the slow branch, weights of the fast branch and the slow branch are adjusted based on a dynamic weight mechanism, and the outputs of the two branches are subjected to feature fusion according to the adjusted weights to obtain visual features;

The feature fusion module is used for constructing a cross-modal relation matrix between the text and the visual features by calculating the Pearson correlation coefficient, and weighting and fusing the text features and the visual features according to the cross-modal relation matrix to obtain final fusion features;

The behavior classification module is used for taking the fusion characteristics as the input of the MLP network, wherein multi-stage layered fusion residual errors are introduced into the MLP network, characteristics are extracted through residual error MLP blocks of a plurality of stages, shallow layer characteristics and deep layer characteristics are fused in each stage, and the fusion characteristics are mapped to corresponding behavior categories to obtain preliminary behavior categories of workers in the image;

The behavior modification module is used for comparing the preliminary behavior category result with the site characteristics and the weather characteristics, and adjusting the preliminary behavior category according to a preset modification rule to obtain a final behavior category;

and the behavior control module is used for safely controlling the behaviors of the construction site workers based on the final behavior category.

Compared with the prior art, the intelligent building site safety control method and system based on multi-source data analysis have the beneficial effects that:

Firstly, the invention can more comprehensively understand the actual situation of a construction site by fusing site information, weather information and image information, adopts a BERT model to extract text characteristics, so that the influence of the site and weather can be considered in the subsequent behavior recognition, combines an improved SlowFast model, can better capture motion information in an image sequence by introducing a double-layer attention module into a fast branch, improves the recognition precision of the behavior of a worker, and can enhance the extraction capability of key characteristics of safety equipment for the worker by introducing a characteristic enhancement module into a slow branch;

Secondly, the cross-modal relation matrix is constructed by calculating the pearson correlation coefficient, so that the correlation between text features and visual features can be quantized, a basis is provided for cross-modal feature fusion, the text features and the visual features are fused by weighting according to the relation matrix, the deep fusion of multi-modal information can be realized, the site and weather information are fully utilized to assist behavior recognition, and the accuracy and the robustness of the behavior recognition are improved;

Thirdly, the fusion characteristics are used as the input of the MLP network, the improved MLP network is utilized, the multi-stage layered fusion residual MLP is utilized, the adaptability of the network model to different characteristic modes is enhanced, the preliminary classification of the behaviors of workers is realized, the preliminary behavior classification result is compared with the site characteristics and the weather characteristics, and the accuracy of behavior recognition can be further improved by adjusting according to a preset correction rule. The site features and the weather features provide contextual information of behavior occurrence, which is helpful for verifying and correcting the preliminary classification result and reducing misjudgment;

In summary, the control method of the invention can realize comprehensive, accurate and real-time monitoring and management of the actions of workers on the construction site by acquiring and fusing the multisource information and combining the safety control measures.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present invention.

FIG. 1 is a flow chart of an implementation of the intelligent site safety control method based on multi-source data analysis of the present invention;

FIG. 2 is a sub-flowchart of the intelligent site security management and control method based on multi-source data analysis of the present invention;

FIG. 3 is another sub-flowchart of the intelligent worksite security management and control method based on multi-source data analysis of the present invention;

FIG. 4 is a further sub-flowchart of the intelligent worksite security management and control method based on multi-source data analysis of the present invention;

FIG. 5 is a block diagram of an intelligent worksite safety control system based on multi-source data analysis according to the present invention;

Fig. 6 is a block diagram of a computer device according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Specific implementations of the invention are described in detail below in connection with specific embodiments.

Referring to fig. 1, in one embodiment of the present invention, an intelligent site security control method based on multi-source data analysis is provided, the control method includes the following steps:

The invention can more comprehensively understand the actual situation of the construction site by fusing the site information, the weather information and the image information, wherein the site information mainly comprises the layout of the construction site, and comprises the specific positions and the specific shapes of buildings, construction areas, material stacking areas, channels and the like. The functions of different areas, such as a high-altitude operation area, a foundation pit operation area, a welding operation area, a common construction area and the like, can be clearly divided through the site information, and the safety level and the special requirements of each area are marked at the same time; the collection of the weather information is realized by two main ways, namely, firstly, a high-precision weather forecast API is called to acquire weather forecast data of a region where a construction site is located within the future hours to days, wherein the weather forecast data comprises weather elements such as temperature, humidity, rainfall, wind power, wind direction, air pressure and the like, and secondly, reliable weather monitoring equipment such as a weather station is installed on the construction site to monitor the actual weather condition of the construction site in real time.

Furthermore, the image information is a key data source for behavior recognition, high-definition and high-frame-rate cameras are installed in each key area and key parts of the construction site, so that reasonable layout of the cameras is ensured, main operation areas, channels and dangerous areas of the construction site can be covered comprehensively, and monitoring blind areas are avoided.

The method comprises the steps of extracting text features by adopting a BERT model, enabling the influence of places and weather to be considered in subsequent behavior recognition, firstly, respectively preprocessing the places and the weather to ensure the normalization and consistency of data, removing noise, unified text formats, word segmentation and the like in the text in the preprocessing process, then respectively extracting features of the preprocessed places and the weather, capturing rich semantic information and context relations by adopting the BERT model, specifically, respectively inputting the preprocessed places and the weather into the pre-trained BERT model, encoding each word or sentence by the BERT model according to a multi-layer bidirectional transducer structure in the BERT model, generating corresponding feature vectors, thereby obtaining the places and the weather, and finally, splicing the places and the weather to form a comprehensive text feature vector.

The invention combines the improved SlowFast model, can better capture the motion information in the image sequence and improve the recognition accuracy of the worker behavior by introducing a double-layer attention module into the fast branch, and can enhance the extraction capability of the worker for using the key features of the safety equipment by introducing a feature enhancement module into the slow branch;

According to the invention, the cross-modal relation matrix is constructed by calculating the Pearson correlation coefficient, so that the correlation between text features and visual features can be quantized, a basis is provided for cross-modal feature fusion, the text features and the visual features are weighted and fused according to the relation matrix, the deep fusion of multi-modal information can be realized, the site and weather information are fully utilized to assist behavior recognition, and the accuracy and the robustness of the behavior recognition are improved;

s7, safety control is carried out on the behaviors of the workers on the construction site based on the final behavior category;

According to the invention, the fusion characteristics are used as the input of the MLP network, and the improved MLP network is utilized to utilize multi-stage layered fusion residual MLP to realize the preliminary classification of the behaviors of workers through characteristic fusion and residual connection, so that the adaptability of the network model to different characteristic modes is enhanced;

Further, the preliminary behavior category results are compared with the site characteristics and the weather characteristics, and are adjusted according to the preset correction rules, so that the accuracy of behavior identification can be further improved;

The site features and the weather features provide contextual information of behavior occurrence, which is helpful for verifying and correcting the preliminary classification result and reducing erroneous judgment.

In the embodiment of the invention, the position of the worker is combined with the site information, so that the specific area where the worker is currently located can be determined when the worker is subjected to behavior control. For example, whether the worker is in an overhead working area, near a foundation pit, in a tower crane operation area, or in a general construction passage, etc. Different areas correspond to different types of possible behaviors and different safety requirements, which provide key context information for space perception and behavior positioning, and in addition, the behaviors of workers in certain areas can be reasonably expected according to the functional division of places. For example, in a material deposit area, it is normal for workers to take the action of handling material, but if the workers take welding or cutting or the like in that area, there may be some safety hazard.

Therefore, in the embodiment of the invention, after the site information is fused with the multi-source information such as the weather information, the image information and the like, a more comprehensive context can be provided. For example, in an aerial work area with strong wind weather, the worker may have physical inclination, etc. and combine the site information (aerial work area) and weather information (strong wind), it may be more accurately judged that this is a normal reaction in which the worker tries to keep balance in the strong wind environment, rather than a dangerous behavior against rule. The multisource information fusion enables behavior recognition to fully consider site factors, so that recognition accuracy is improved.

In the step S6 of the invention, the preliminary behavior category result is compared with the site characteristics and the weather characteristics, and the behavior category result is revised based on the comparison result, wherein the fact that the site is a complex environment is considered, various uncertainties exist, weather can be changed suddenly, the behavior mode of a worker can be changed greatly, such as the walking speed is slow, the operation action can be cautious, and the like, so that the subsequent behavior correction step of the invention can further correct the behavior recognition result by considering more factors such as the trend, the change speed and the like of the weather change.

Referring to fig. 2, in step S3 of the embodiment of the disclosure, the step of introducing a dual-layer attention module in the fast branch includes:

Further, in step S3, the feature enhancement module is a multi-branch structure, and splices the output feature graphs of the multiple branches, and the spliced feature graphs and the input feature graphs are added through residual connection, and output through a ReLU activation function to obtain the output feature of the slow branch, and in one implementation manner of the multi-branch structure, three branch structures are adopted, and specifically include:

the first branch with two convolution layers, one convolution layer uses a convolution kernel of 1 multiplied by 1, the stride is stride, the channel number of the input feature map is adjusted to be 2 multiplied by inter_ planes, the other convolution layer uses a convolution kernel of 3 multiplied by 3, the stride is 1, the filling is 1, and the first branch is used for extracting local features, and can increase the channel number of the features and extract richer local features under the condition that the size of the feature map is not changed;

A second branch having four convolution layers, one using a1×1 convolution kernel with a stride of 1, adjusting the number of channels of the input feature map to inter_ planes, another using a1×3 convolution kernel with a stride of stride, filling (0, 1) for expanding the receptive field in the height direction, a last using a 3×1 convolution kernel with a stride of stride, (1, 0) for expanding the receptive field in the width direction, and a fourth using a 3×3 convolution kernel with a stride of 1, filling 5, a void ratio of 5 for further extracting void convolution features, expanding the receptive field and capturing more context information;

The third branch with three convolution layers, one convolution layer uses a 1X 1 convolution kernel, the stride is stride, the channel number of the input feature graph is adjusted to be 2X inter_ planes, the other convolution layer uses a 3X 1 convolution kernel, the stride is stride, the filling is (1, 0), the last convolution layer uses a 1X 3 convolution kernel, the stride is stride, the filling is (0, 1), the feature is extracted from different convolution kernel dimension sequences, and the feature diversity is enriched;

and finally, adding the channel number with the input feature map through residual connection, and outputting through a ReLU activation function to obtain the output feature of the slow branch.

In the embodiment of the present invention, in the step of adjusting the weights of the fast branch and the slow branch based on the dynamic weight mechanism provided in step S3, the weight calculation formula is expressed as follows: Wherein, the method comprises the steps of, The characteristic sequence representing the slow branch is subjected to global average pooling; The feature sequences representing the fast branches are subjected to global average pooling, conv1, conv2 and conv3 represent convolution operation, sigma represents a sigmoid activation function for limiting an output result to be in a range of 0 to 1, and the output result is used for representing the size of the weight.

As shown in fig. 3, in step S3, the step of performing feature fusion on the outputs of the two branches according to the adjusted weights to obtain visual features includes:

s321, acquiring the output characteristics of the fast branch and the slow branch, which are respectively recorded as And, wherein,Representing the output characteristics of the slow-leg,Representing the output characteristics of the fast branch;

S323, carrying out global average pooling on the characteristic sequences of the slow branch and the fast branch respectively to obtain pooling results And;

S323, respectively carrying out convolution operation and difference on the two pooling results, and calculating the motion characteristic difference F of the fast and slow branches, wherein the motion characteristic difference F is expressed as: wherein conv1 and conv2 each represent a convolution operation and σ represents an activation function;

s324, performing convolution operation on the motion characteristic difference F and generating characteristic weights by adopting sigmoid activation function Expressed as: wherein conv3 represents a convolution operation and σ represents a sigmoid activation function;

s325, combining the characteristic weight with the characteristic of the slow branch Performing point multiplication operation to generate enhanced feature images, wherein,;

S326, feature map to be enhancedAnd fusing the characteristics of the fast branch as the subsequent input of the slow branch, so as to realize the characteristic fusion of the fast and slow branches and obtain visual characteristics.

Further, in step S4, the pearson correlation coefficientExpressed as:;

Further, in step S4, the step of constructing a cross-modal relation matrix between the text and the visual features comprises the steps of substituting the text features and the visual features into the variables of the Pearson correlation coefficient respectively, constructing a relation fusion matrix, and obtaining the relation fusion matrix of the text features and the visual features, wherein,The values of (a) are the number of text features and visual features, and the dimension of the matrix isMatrix of (a)Substituting the visual features and the text features into the variables of the pearson correlation coefficients, constructing a relationship fusion matrix, and obtaining the relationship matrix of the visual features and the text features by using the pearson correlation coefficients。

In step S4 of the invention, the step of weighting and fusing text features and visual features according to the cross-modal relation matrix to obtain final fused features comprises the steps of obtaining the relation matrixSum-of-relations matrixThen weighting the visual characteristic and the text characteristic respectively, wherein the visual characteristic and the text characteristic are weighted by a text-to-visual weighted relation matrixTo weight visual features, expressed as:, representing visual features generated by weighting a text-to-visual relationship matrix To weight text features, expressed as:, representing text features generated by weighting the text by a visual-to-text weighting relationship matrix, wherein, AndAnd finally, fusing the enhanced text features and the visual features together by a weighted average method, wherein the enhanced text features and the visual features are expressed as follows: Wherein, the method comprises the steps of, In order to finally fuse the feature vectors,AndIs a hyper-parameter that controls how much text features and visual features contribute in the final fusion feature.

Further, referring to fig. 4, in step S5, a multi-stage hierarchical fusion residual is introduced into the MLP network, including:

S411, dividing the MLP network into a plurality of stages, wherein each stage comprises a plurality of residual MLP blocks, and the structure of each residual MLP block is expressed as follows: Wherein, the method comprises the steps of, Represent the firstOutput of layers, MLP (·) represents a multi-layer perceptron;

S412, at the end of each stage, fusing the output characteristics of the current stage with the output characteristics of the previous stage, and representing as: Wherein, the method comprises the steps of, Representing the fusion characteristics of the s-th stage, wherein concat (·, ·) represents characteristic splicing operation;

s413, adding jump connection between different stages of the network, and fusing shallow layer characteristics and deep layer characteristics, wherein the jump connection is expressed as follows: Wherein, the method comprises the steps of, Representing the output of the j-th hop connection,The representation transforms the feature;

s414, inputting the fusion characteristic of the last stage into a classification layer, wherein the fusion characteristic is expressed as: Wherein O represents the final classification output and W and b represent the weight and bias of the classification layer, respectively.

Referring to fig. 5, in another embodiment of the present invention, an intelligent site safety control system based on multi-source data analysis is provided, the control system includes the following modules:

a data acquisition module 81, configured to acquire multi-source information of a worksite, where the multi-source information includes site information, weather information, and image information;

the text feature extraction module 82 is configured to perform text preprocessing on the site information and the weather information, perform feature extraction on the preprocessed site information text and weather information text by using a BERT model, obtain site features and weather features, and perform splicing processing on the site features and the weather features, so as to obtain text features;

The improved SlowFast model comprises a fast branch and a slow branch, wherein a double-layer attention module is introduced into the fast branch, a feature enhancement module is introduced into the slow branch, weights of the fast branch and the slow branch are adjusted based on a dynamic weight mechanism, and the outputs of the two branches are subjected to feature fusion according to the adjusted weights to obtain visual features;

the feature fusion module 84 is configured to construct a cross-modal relation matrix between the text and the visual feature by calculating the pearson correlation coefficient, and weight-fuse the text feature and the visual feature according to the cross-modal relation matrix to obtain a final fusion feature;

The behavior classification module 85 is configured to take the fusion feature as an input of an MLP network, wherein a multi-stage layered fusion residual is introduced into the MLP network, the feature is extracted through residual MLP blocks of multiple stages, shallow and deep features are fused in each stage, and the fusion feature is mapped to a corresponding behavior class to obtain a preliminary behavior class of a worker in the image;

the behavior modification module 86 is configured to compare the preliminary behavior category result with the site feature and the weather feature, and adjust the preliminary behavior category according to a predetermined modification rule to obtain a final behavior category;

The behavior control module 87 is configured to safely control the behavior of the construction site worker based on the final behavior category.

As shown in fig. 6, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected by a system bus. The memory includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by a processor, causes the processor to implement a smart site security management method based on multi-source data analysis.

The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform a smart worksite security management method based on multi-source data analysis.

It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, a computer readable storage medium is provided, and a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the processor is caused to execute the intelligent building site safety management method based on multi-source data analysis provided in the above embodiment.

It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in various embodiments may include multiple sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, nor do the order in which the sub-steps or stages are performed necessarily performed in sequence, but may be performed alternately or alternately with at least a portion of the sub-steps or stages of other steps or other steps.

Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a non-volatile computer readable storage medium, and where the program, when executed, may include processes in the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link (SYNCHLINK) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.

The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims

1. A smart construction site safety management and control method based on multi-source data analysis, characterized in that the management and control method comprises the following steps:

S1. Acquire multi-source information of the construction site, including site information, weather information and image information;

S2. Preprocess the venue information and weather information respectively, and use the BERT model to extract features from the preprocessed venue information text and weather information text respectively to obtain venue features and weather features, and concatenate the venue features and weather features to obtain text features;

S3. Use the improved SlowFast model to extract the visual features of image information; the improved SlowFast model includes a fast branch and a slow branch, a double-layer attention module is introduced in the fast branch, and a feature enhancement module is introduced in the slow branch. The feature enhancement module is a multi-branch structure, and the output feature maps of multiple branches are spliced; the spliced feature map is added to the input feature map through a residual connection, and the output feature of the slow branch is obtained through the ReLU activation function output, and the weights of the fast branch and the slow branch are adjusted based on the dynamic weight mechanism. The outputs of the two branches are feature fused according to the adjusted weights to obtain visual features;

S4. By calculating the Pearson correlation coefficient, a cross-modal relationship matrix between text and visual features is constructed, and the text features and visual features are weightedly fused according to the cross-modal relationship matrix to obtain the final fusion features;

S5. The fused features are used as the input of the MLP network, wherein a multi-stage hierarchical fusion residual is introduced into the MLP network, features are extracted through residual MLP blocks of multiple stages, shallow and deep features are fused at each stage, and the fused features are mapped to the corresponding behavior categories to obtain the preliminary behavior categories of the workers in the image;

In the step of introducing multi-stage hierarchical fusion residuals in the MLP network, the MLP network is divided into multiple stages, each stage contains a number of residual MLP blocks, at the end of each stage, the output features of the current stage are fused with the output features of the previous stage, skip connections are added between different stages of the network, shallow features and deep features are fused, and the fused features of the last stage are input into the classification layer;

S6, comparing the preliminary behavior category results with the site characteristics and weather characteristics, and adjusting the preliminary behavior category according to a predetermined correction rule to obtain a final behavior category;

S7. Based on the final behavior category, safety management and control of the behavior of construction site workers is carried out.

2. The smart construction site safety management and control method based on multi-source data analysis according to claim 1 is characterized in that, in step S3, the step of introducing a double-layer attention module in the fast branch includes:

S311, linearly transform the input features to generate a query matrix, a key matrix, and a value matrix in the attention mechanism;

S312, construct a directed graph and establish attention relationships between different regions, calculate the average values of the query matrix and the key matrix in each region, generate a region query matrix and a region key matrix, and generate an adjacency matrix by performing a dot product of the region query matrix and the region key matrix. The adjacency matrix is used to measure the correlation between different regions;

S313, pruning the adjacency matrix, including the first k high-correlation region adjacency matrices, to obtain a routing index matrix;

S314, based on the attention mechanism focused on the k routing areas, aggregating the key matrix and value matrix tensors of all routing areas to generate an aggregated key matrix and value matrix;

S315. Perform an attention operation on the aggregated key matrix and value matrix, introduce a local context enhancement term LEC to derive the result tensor, and obtain the output features of the fast branch.

3. The smart construction site safety management and control method based on multi-source data analysis according to claim 2 is characterized in that, in step S3, in the step of adjusting the weights of the fast branch and the slow branch based on the dynamic weight mechanism, the weight calculation formula is expressed as:

;

in, Indicates the result of global average pooling of the feature sequence of the slow branch; It represents the result of global average pooling of the feature sequence of the fast branch; conv1, conv2, and conv3 all represent convolution operations; σ represents the sigmoid activation function, which is used to limit the output result to the range of 0 to 1. Used to indicate the size of the weight.

4. The method for intelligent construction site safety management and control based on multi-source data analysis according to claim 3 is characterized in that, in step S3, the outputs of the two branches are subjected to feature fusion according to the adjusted weights to obtain visual features, comprising:

Get the output characteristics of the fast branch and the output characteristics of the slow branch, which are recorded as and ,in, represents the output characteristics of the slow branch, Indicates the output characteristics of the fast branch;

Perform global average pooling on the feature sequences of the slow branch and the fast branch respectively to obtain the pooling results and ;

The two pooling results are convolved and subtracted to calculate the motion feature difference F between the fast and slow branches, which is expressed as: ; Where conv1 and conv2 both represent convolution operations, and σ represents the activation function;

Perform convolution operation on the motion feature difference F and use sigmoid activation function to generate feature weights , expressed as: ; Where conv3 represents the convolution operation and σ represents the sigmoid activation function;

The feature weights are combined with the features of the slow branch Perform point multiplication to generate enhanced feature maps ,in, ;

The enhanced feature map The features of the fast branch are fused together as the subsequent input of the slow branch to achieve feature fusion of the fast and slow branches and obtain visual features.

5. The smart construction site safety management and control method based on multi-source data analysis according to claim 4 is characterized in that, in step S4, the Pearson correlation coefficient It is expressed as:

;

In the formula, yes and The covariance of yes The standard deviation of yes The standard deviation of yes The mean of yes E represents the expected value.

6. The method for intelligent construction site safety management and control based on multi-source data analysis according to claim 5 is characterized in that, in step S4, the step of constructing a cross-modal relationship matrix between text and visual features comprises:

Substitute the text features and visual features into the variables of the Pearson correlation coefficient respectively, construct the relationship fusion matrix, and obtain the relationship fusion matrix between text features and visual features ,in, The value of is the number of text features and visual features; the dimension of the matrix is ;matrix It reflects the relationship between text features and visual features;

Substitute the visual features and text features into the variables of the Pearson correlation coefficient to construct a relationship fusion matrix. Using the Pearson correlation coefficient, the relationship matrix between visual features and text features can be obtained. .

7. The method for intelligent construction site safety management and control based on multi-source data analysis according to claim 6 is characterized in that, in step S4, the step of obtaining the final fusion feature by weighted fusion of text features and visual features according to the cross-modal relationship matrix comprises:

In the relationship matrix and the relationship matrix After that, the visual features and text features are weighted respectively; where:

Weighted Relationship Matrix via Text-to-Vision To weight the visual features, it is expressed as: , Represents the visual features generated by weighting the text-to-visual relationship matrix;

Vision-to-text weighted relationship matrix To weight the text features, it is expressed as: , Represents the text features generated by weighting the visual-to-text weighted relationship matrix;

and They are text features and visual features respectively;

Finally, the weighted text features and visual features are fused together by weighted averaging, expressed as: ;in, is the final fused feature vector, and It is a hyperparameter that controls the contribution of text features and visual features in the final fusion features.

8. The smart construction site safety management and control method based on multi-source data analysis according to claim 7 is characterized in that, in step S5, a multi-stage hierarchical fusion residual is introduced into the MLP network, comprising:

The structure of each residual MLP block is expressed as:

;

in, Indicates The output of the layer, represents a multi-layer perceptron;

At the end of each stage, the output features of the current stage are fused with the output features of the previous stage, expressed as: ;in, represents the fusion features of the sth stage, Represents feature concatenation operation;

The skip connection is added between different stages of the network to fuse shallow features and deep features, which can be expressed as: ;in, represents the output of the j-th skip connection, Indicates the transformation of features;

The fusion features of the last stage are input into the classification layer, which is expressed as: ; Where O represents the final classification output, W and b represent the weight and bias of the classification layer respectively.

9. A control system for implementing the smart construction site safety control method based on multi-source data analysis as claimed in any one of claims 1 to 8, characterized in that the control system comprises the following modules:

A data acquisition module is used to acquire multi-source information of the construction site, including site information, weather information and image information;

A text feature extraction module is used to perform text preprocessing on the venue information and weather information respectively, and use the BERT model to extract features from the preprocessed venue information text and weather information text respectively to obtain venue features and weather features, and then perform splicing processing on the venue features and weather features to obtain text features;

A visual feature extraction module is used to extract visual features of image information using an improved SlowFast model; the improved SlowFast model includes a fast branch and a slow branch, a double-layer attention module is introduced in the fast branch, and a feature enhancement module is introduced in the slow branch. The weights of the fast branch and the slow branch are adjusted based on a dynamic weight mechanism, and the outputs of the two branches are feature-fused according to the adjusted weights to obtain visual features;

The feature fusion module is used to construct a cross-modal relationship matrix between text and visual features by calculating the Pearson correlation coefficient, and to weight the text features and visual features according to the cross-modal relationship matrix to obtain the final fusion features;

The behavior classification module is used to use the fused features as the input of the MLP network. In the MLP network, multi-stage hierarchical fusion residuals are introduced to extract features through residual MLP blocks in multiple stages, and shallow and deep features are fused at each stage. The fused features are mapped to the corresponding behavior categories to obtain the preliminary behavior categories of the workers in the image;

A behavior correction module is used to compare the preliminary behavior category results with the site characteristics and weather characteristics, and adjust the preliminary behavior category according to a predetermined correction rule to obtain a final behavior category;

The behavior control module is used to conduct safety control on the behavior of construction site workers based on the final behavior category.