CN120086699B - Smart construction site safety management and control method and system based on multi-source data analysis - Google Patents
Smart construction site safety management and control method and system based on multi-source data analysis
- Publication number
- CN120086699B (application CN202510580593.1A)
- Authority
- CN
- China
- Prior art keywords
- features
- text
- matrix
- feature
- branch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/10—Pre-processing; Data cleansing
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/253—Fusion techniques of extracted features
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06Q50/08—Construction
- G06Q50/265—Personal security, identity or safety
Abstract
The invention is applicable to the field of intelligent construction sites and provides an intelligent construction site safety control method and system based on multi-source data analysis. In the control method, a BERT model is adopted to extract features from a site information text and a weather information text respectively, so as to obtain text features. An improved SlowFast model is adopted to extract visual features from image information; a cross-modal relation matrix between the text features and the visual features is constructed; the text features and the visual features are weighted and fused; the fused features are used as the input of an MLP network to obtain the preliminary behavior categories of the workers in the image; the preliminary behavior category results are compared with the site features and weather features, and the preliminary behavior categories are adjusted according to preset correction rules to obtain the final behavior categories; and safety control is performed on the behaviors of workers on the construction site based on the final behavior categories. By acquiring and fusing the multi-source information and combining safety control measures, the control method can realize comprehensive, accurate and real-time monitoring and management of worker behaviors on the construction site.
Description
Technical Field
The invention belongs to the technical field of intelligent building site management and control, and particularly relates to an intelligent building site safety management and control method and system based on multi-source data analysis.
Background
The internal environment of a construction site is complex and changeable. Construction workers need to operate according to the relevant regulations within the site and, at the same time, pay attention to the safety state of the area where they are located.
At present, construction site safety management can reduce the probability of accidents to a certain extent. However, traditional site safety monitoring means have many shortcomings: manual inspection can hardly cover the whole area and is easily affected by personnel quality and fatigue, and fixed safety notices and warnings have limited effect. Monitoring of construction workers is therefore incomplete and discontinuous, safety supervision loopholes exist, and potential safety hazards cannot be discovered and warned of in time, reducing the accuracy and reliability of site safety monitoring.
Disclosure of Invention
The embodiment of the invention aims to provide an intelligent building site safety control method and system based on multi-source data analysis, which aim to solve the technical problems in the background technology.
In order to achieve the above purpose, the present invention provides the following technical solutions.
The embodiment of the invention provides an intelligent building site safety control method based on multi-source data analysis, which comprises the following steps:
S1, acquiring multi-source information of a construction site, wherein the multi-source information comprises site information, weather information and image information;
S2, respectively performing text preprocessing on the site information and the weather information, respectively performing feature extraction on the preprocessed site information text and the preprocessed weather information text by adopting a BERT model to obtain site features and weather features, and splicing the site features and the weather features to obtain text features;
S3, extracting visual features of the image information by adopting an improved SlowFast model, wherein the improved SlowFast model comprises a fast branch and a slow branch, a double-layer attention module is introduced into the fast branch, a feature enhancement module is introduced into the slow branch, the weights of the fast branch and the slow branch are adjusted based on a dynamic weight mechanism, and the outputs of the two branches are feature-fused according to the adjusted weights to obtain visual features;
S4, constructing a cross-modal relation matrix between the text features and the visual features by calculating the Pearson correlation coefficient, and weighting and fusing the text features and the visual features according to the cross-modal relation matrix to obtain the final fused features;
S5, taking the fused features as the input of an MLP network, wherein multi-stage hierarchical fusion residuals are introduced into the MLP network, features are extracted through residual MLP blocks in a plurality of stages, shallow features and deep features are fused in each stage, and the fused features are mapped to corresponding behavior categories to obtain the preliminary behavior categories of the workers in the image;
S6, comparing the preliminary behavior category results with the site features and the weather features, and adjusting the preliminary behavior categories according to preset correction rules to obtain the final behavior categories;
S7, performing safety control on the behaviors of workers on the construction site based on the final behavior categories.
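The claimed steps S1 to S7 form a single processing pipeline. The sketch below is a minimal illustration of that flow only; the encoder, classifier and correction callables are hypothetical placeholders, not the patent's implementation, and the cross-modal fusion of S4 is simplified to concatenation:

```python
import numpy as np

def pipeline(site_text, weather_text, frames, encode_text, encode_video,
             classify, correct):
    # S2: encode the two text streams and splice them into one text feature
    text_feat = np.concatenate([encode_text(site_text),
                                encode_text(weather_text)])
    # S3: visual features from the video frames (improved SlowFast in the patent)
    vis_feat = encode_video(frames)
    # S4: cross-modal fusion (simplified here to plain concatenation)
    fused = np.concatenate([text_feat, vis_feat])
    # S5: preliminary behavior category from the MLP head
    prelim = classify(fused)
    # S6/S7: rule-based correction against the site and weather context
    return correct(prelim, site_text, weather_text)
```

Any real system would replace each callable with the corresponding trained model; the skeleton only fixes the order and data flow of the seven steps.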
Further, in step S3, the step of introducing a dual-layer attention module in the fast branch includes:
S311, linearly transforming the input features to generate the query matrix, key matrix and value matrix of the attention mechanism;
S312, constructing a directed graph to establish attention relations among different regions: the mean values of the query matrix and the key matrix are calculated within each region to generate a region query matrix and a region key matrix, and the dot product of the region query matrix and the region key matrix generates an adjacency matrix, which measures the correlation among the different regions;
S313, pruning the adjacency matrix by retaining, for each region, the top k most strongly correlated regions, obtaining a routing index matrix;
S314, with the attention mechanism focused on the k routed regions, aggregating the key matrix and value matrix tensors of all routed regions to generate an aggregated key matrix and value matrix;
S315, performing the attention operation on the aggregated key matrix and value matrix, and introducing a local context enhancement term (LCE) when deriving the result tensor, so as to obtain the output features of the fast branch.
In step S3, the feature enhancement module has a multi-branch structure: the output feature maps of the branches are spliced, the spliced feature map is added to the input feature map through a residual connection, and the sum is passed through a ReLU activation function to obtain the output features of the slow branch.
Further, in step S3, in the step of adjusting the weights of the fast branch and the slow branch based on the dynamic weight mechanism, the weight calculation formula is expressed as:
W = σ(conv3(σ(conv1(P_s) - conv2(P_f))))
where P_s denotes the globally average-pooled feature sequence of the slow branch; P_f denotes the globally average-pooled feature sequence of the fast branch; conv1, conv2 and conv3 denote convolution operations; and σ denotes the sigmoid activation function, which limits the output to the range 0 to 1 so that the result represents the magnitude of the weight.
Further, in step S3, the step of performing feature fusion on the outputs of the two branches according to the adjusted weights to obtain visual features includes:
the output features of the fast branch and the slow branch are denoted X_f and X_s respectively, where X_s represents the output features of the slow branch and X_f represents the output features of the fast branch;
global average pooling is performed on the feature sequences of the slow branch and the fast branch respectively, obtaining the pooled results P_s = GAP(X_s) and P_f = GAP(X_f);
convolution operations are applied to the two pooled results and their difference is taken, calculating the motion feature difference F of the fast and slow branches, expressed as:
F = σ(conv1(P_s) - conv2(P_f))
where conv1 and conv2 both denote convolution operations and σ denotes an activation function;
the motion feature difference F is convolved and a sigmoid activation function is adopted to generate the feature weight W, expressed as:
W = σ(conv3(F))
where conv3 denotes a convolution operation and σ denotes the sigmoid activation function;
the feature weight W is point-multiplied with the slow-branch features to generate the enhanced feature map X_enh, where X_enh = W ⊙ X_s;
the enhanced feature map X_enh is fused with the features of the fast branch and used as the subsequent input of the slow branch, thereby realizing feature fusion of the fast and slow branches and obtaining the visual features.
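The fast/slow dynamic weighting described in this step can be sketched numerically as follows. The (C, C) matrices stand in for conv1/conv2/conv3 (a 1x1 convolution applied to a pooled vector reduces to a matrix product), and treating both activations as sigmoids is an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dynamic_weight_fuse(x_slow, x_fast, w1, w2, w3):
    """Sketch of the dynamic fast/slow weighting in step S3.

    x_slow: (T_s, C) slow-branch feature sequence; x_fast: (T_f, C)
    fast-branch sequence. w1, w2, w3 are (C, C) stand-ins for the
    convolutions; the sigmoid placement is an assumption.
    """
    p_s = x_slow.mean(axis=0)             # global average pooling (slow)
    p_f = x_fast.mean(axis=0)             # global average pooling (fast)
    f = sigmoid(p_s @ w1 - p_f @ w2)      # motion feature difference F
    w = sigmoid(f @ w3)                   # feature weight W in (0, 1)
    x_enhanced = x_slow * w               # element-wise re-weighting of X_s
    return x_enhanced, w

rng = np.random.default_rng(0)
C = 8
x_s, x_f = rng.standard_normal((4, C)), rng.standard_normal((16, C))
w1, w2, w3 = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
x_enh, w = dynamic_weight_fuse(x_s, x_f, w1, w2, w3)
```

The slow branch keeps 4 frames while the fast branch keeps 16, mirroring the two temporal resolutions of a SlowFast backbone.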
Further, in step S4, the Pearson correlation coefficient ρ(X, Y) is expressed as:
ρ(X, Y) = cov(X, Y) / (σ_X σ_Y), with cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]
where cov(X, Y) is the covariance of X and Y; σ_X is the standard deviation of X; σ_Y is the standard deviation of Y; μ_X is the mean of X, μ_Y is the mean of Y, and E denotes the expected value.
Further, in step S4, the step of constructing a cross-modal relation matrix between the text and the visual features includes:
substituting the text features and the visual features into the variables of the Pearson correlation coefficient and constructing a relation fusion matrix, obtaining the text-to-visual relation matrix R_tv, where the numbers of text features and visual features are m and n respectively, so that the matrix has dimension m × n; the matrix R_tv reflects the interrelationship between the text features and the visual features;
substituting the visual features and the text features into the variables of the Pearson correlation coefficient and constructing a relation fusion matrix, obtaining the visual-to-text relation matrix R_vt.
Further, in step S4, the step of weighting and fusing the text features and the visual features according to the cross-modal relation matrix to obtain the final fused features includes:
after obtaining the relation matrices R_tv and R_vt, the visual features and the text features are weighted respectively, wherein:
the visual features are weighted through the text-to-visual relation matrix R_tv, expressed as V' = R_tv^T · T, where V' denotes the visual features generated by weighting with the text-to-visual relation matrix;
the text features are weighted through the visual-to-text relation matrix R_vt, expressed as T' = R_vt^T · V, where T' denotes the text features generated by weighting with the visual-to-text relation matrix;
T' and V' are the enhanced text features and visual features respectively;
finally, the enhanced text features and visual features are fused by weighted averaging, expressed as F_fuse = α·T' + β·V', where F_fuse is the final fused feature vector, and α and β are hyperparameters controlling how much the text features and visual features contribute to the final fused features.
Further, in step S5, introducing multi-stage hierarchical fusion residuals into the MLP network includes:
dividing the MLP network into a plurality of stages, each stage comprising a number of residual MLP blocks, the structure of each residual MLP block being expressed as:
x_{l+1} = x_l + MLP(x_l)
where x_{l+1} denotes the output of the (l+1)-th layer and MLP(·) denotes a multi-layer perceptron;
at the end of each stage, the output features of the current stage are fused with the output features of the previous stage, expressed as:
F_s = concat(F_{s-1}, x_s)
where F_s denotes the fusion features of the s-th stage and concat(·, ·) denotes the feature splicing operation;
skip connections are added between different stages of the network to fuse shallow features with deep features, expressed as:
H_j = F_s + T(F_{s-j})
where H_j denotes the output of the j-th skip connection and T(·) denotes a feature transformation;
the fusion features of the last stage are input to the classification layer, expressed as:
O = W·F_S + b
where O denotes the final classification output and W and b denote the weight and bias of the classification layer respectively.
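The multi-stage hierarchical fusion residual MLP of step S5 can be sketched (forward pass only) as below. The learned projections that map the spliced shallow/deep features back to the working width, and the omission of the inter-stage skip connections, are simplifications made for this sketch:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_mlp_block(x, w1, w2):
    """One residual MLP block: x_{l+1} = x_l + MLP(x_l)."""
    return x + relu(x @ w1) @ w2

def staged_forward(x, stages, fuse_proj, w_cls, b_cls):
    """Multi-stage hierarchical fusion sketch of step S5.

    stages: per-stage lists of (w1, w2) block weights; fuse_proj: per-stage
    matrices projecting concat(shallow, deep) back to the working width
    (these learned fusion projections are an assumption of the sketch).
    """
    fused = x
    for blocks, proj in zip(stages, fuse_proj):
        h = fused
        for w1, w2 in blocks:
            h = residual_mlp_block(h, w1, w2)
        # end of stage: splice shallow features with deep features
        fused = np.concatenate([fused, h]) @ proj
    return fused @ w_cls + b_cls      # classification layer: O = W.F + b
```

The residual form keeps gradients flowing through deep stacks, while the per-stage splice is what lets later stages see both shallow and deep representations.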
Another embodiment of the present invention provides an intelligent worksite safety control system based on multi-source data analysis, the control system comprising the following modules:
The data acquisition module is used for acquiring multi-source information of the construction site, wherein the multi-source information comprises site information, weather information and image information;
The text feature extraction module is used for respectively preprocessing the text of the field information and the weather information, respectively extracting features of the preprocessed field information text and the preprocessed weather information text by adopting the BERT model to obtain field features and weather features, and performing splicing processing on the field features and the weather features to obtain text features;
The visual feature extraction module is used for extracting visual features of the image information by adopting an improved SlowFast model, wherein the improved SlowFast model comprises a fast branch and a slow branch, a double-layer attention module is introduced into the fast branch, a feature enhancement module is introduced into the slow branch, the weights of the fast branch and the slow branch are adjusted based on a dynamic weight mechanism, and the outputs of the two branches are feature-fused according to the adjusted weights to obtain visual features;
The feature fusion module is used for constructing a cross-modal relation matrix between the text and the visual features by calculating the Pearson correlation coefficient, and weighting and fusing the text features and the visual features according to the cross-modal relation matrix to obtain final fusion features;
The behavior classification module is used for taking the fusion characteristics as the input of the MLP network, wherein multi-stage layered fusion residual errors are introduced into the MLP network, characteristics are extracted through residual error MLP blocks of a plurality of stages, shallow layer characteristics and deep layer characteristics are fused in each stage, and the fusion characteristics are mapped to corresponding behavior categories to obtain preliminary behavior categories of workers in the image;
The behavior modification module is used for comparing the preliminary behavior category result with the site characteristics and the weather characteristics, and adjusting the preliminary behavior category according to a preset modification rule to obtain a final behavior category;
and the behavior control module is used for safely controlling the behaviors of the construction site workers based on the final behavior category.
Compared with the prior art, the intelligent building site safety control method and system based on multi-source data analysis have the beneficial effects that:
Firstly, by fusing site information, weather information and image information, the invention can understand the actual situation of a construction site more comprehensively. A BERT model is adopted to extract text features, so that the influence of the site and the weather can be considered in subsequent behavior recognition. Combined with the improved SlowFast model, introducing a double-layer attention module into the fast branch allows motion information in the image sequence to be captured better, improving the recognition accuracy of worker behaviors, while introducing a feature enhancement module into the slow branch enhances the extraction of key features concerning workers' use of safety equipment;
Secondly, constructing the cross-modal relation matrix by calculating the Pearson correlation coefficient quantifies the correlation between text features and visual features and provides a basis for cross-modal feature fusion. Weighting and fusing the text features and the visual features according to the relation matrix realizes deep fusion of the multi-modal information, makes full use of the site and weather information to assist behavior recognition, and improves the accuracy and robustness of behavior recognition;
Thirdly, the fused features are used as the input of the MLP network. The improved MLP network, with its multi-stage hierarchical fusion residual MLP, enhances the adaptability of the network model to different feature patterns and realizes preliminary classification of worker behaviors. Comparing the preliminary behavior classification results with the site features and weather features, and adjusting them according to preset correction rules, can further improve the accuracy of behavior recognition. The site features and weather features provide contextual information about the occurrence of the behavior, which helps verify and correct the preliminary classification results and reduce misjudgments;
In summary, the control method of the invention can realize comprehensive, accurate and real-time monitoring and management of the actions of workers on the construction site by acquiring and fusing the multisource information and combining the safety control measures.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the present invention.
FIG. 1 is a flow chart of an implementation of the intelligent site safety control method based on multi-source data analysis of the present invention;
FIG. 2 is a sub-flowchart of the intelligent site security management and control method based on multi-source data analysis of the present invention;
FIG. 3 is another sub-flowchart of the intelligent worksite security management and control method based on multi-source data analysis of the present invention;
FIG. 4 is a further sub-flowchart of the intelligent worksite security management and control method based on multi-source data analysis of the present invention;
FIG. 5 is a block diagram of an intelligent worksite safety control system based on multi-source data analysis according to the present invention;
Fig. 6 is a block diagram of a computer device according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Specific implementations of the invention are described in detail below in connection with specific embodiments.
Referring to fig. 1, in one embodiment of the present invention, an intelligent site security control method based on multi-source data analysis is provided, the control method includes the following steps:
S1, acquiring multi-source information of a construction site, wherein the multi-source information comprises site information, weather information and image information;
By fusing site information, weather information and image information, the invention can understand the actual situation of the construction site more comprehensively. The site information mainly comprises the layout of the construction site, including the specific positions and shapes of buildings, construction areas, material stacking areas, passages and the like. Through the site information, the functions of different areas can be clearly divided, such as high-altitude operation areas, foundation pit operation areas, welding operation areas and common construction areas, while the safety level and special requirements of each area are marked. The collection of weather information is realized in two main ways: first, a high-precision weather forecast API is called to acquire weather forecast data for the region where the construction site is located over the coming hours to days, including weather elements such as temperature, humidity, rainfall, wind force, wind direction and air pressure; second, reliable weather monitoring equipment, such as a weather station, is installed on the construction site to monitor its actual weather conditions in real time.
Furthermore, the image information is a key data source for behavior recognition, high-definition and high-frame-rate cameras are installed in each key area and key parts of the construction site, so that reasonable layout of the cameras is ensured, main operation areas, channels and dangerous areas of the construction site can be covered comprehensively, and monitoring blind areas are avoided.
S2, respectively carrying out text preprocessing on the site information and the weather information, respectively carrying out feature extraction on the preprocessed site information text and the preprocessed weather information text by adopting a BERT model to obtain site features and weather features, and carrying out splicing processing on the site features and the weather features to obtain text features;
A BERT model is adopted to extract text features so that the influence of the site and the weather can be considered in subsequent behavior recognition. First, the site information text and the weather information text are preprocessed respectively to ensure the normalization and consistency of the data; the preprocessing includes removing noise from the text, unifying text formats, word segmentation and the like. Then, feature extraction is performed on the preprocessed site information text and weather information text respectively, using the BERT model to capture rich semantic information and contextual relations. Specifically, the preprocessed site information text and weather information text are each input into a pretrained BERT model, which encodes each word or sentence through its multi-layer bidirectional Transformer structure and generates the corresponding feature vectors, thereby obtaining the site features and weather features. Finally, the site features and the weather features are spliced to form a comprehensive text feature vector.
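The preprocessing and splicing described above can be illustrated with a self-contained sketch. The hash-seeded `encode_stub` merely stands in for a pretrained BERT encoder (whose [CLS] vector a real system would use) and is not part of the patent:

```python
import re
import numpy as np

def preprocess(text):
    """Minimal text normalisation: strip stray symbols, collapse whitespace."""
    text = re.sub(r"[^\w\s.,%-]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def encode_stub(text, dim=8):
    """Stand-in for BERT: a real system would feed the preprocessed text to
    a pretrained model and take the [CLS] vector; here a deterministic
    hash-seeded embedding keeps the sketch self-contained."""
    rng = np.random.default_rng(sum(map(ord, text)) % (2 ** 32))
    return rng.standard_normal(dim)

site_feat = encode_stub(preprocess("Zone A: high-altitude work area"))
weather_feat = encode_stub(preprocess("Wind 12 m/s, light rain"))
text_feat = np.concatenate([site_feat, weather_feat])   # spliced text feature
```

The spliced vector has twice the per-text embedding width, so downstream layers receive site and weather semantics side by side.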
S3, extracting visual features of image information by adopting an improved SlowFast model, wherein the improved SlowFast model comprises a fast branch and a slow branch, a double-layer attention module is introduced into the fast branch, a feature enhancement module is introduced into the slow branch, weights of the fast branch and the slow branch are adjusted based on a dynamic weight mechanism, and the outputs of the two branches are subjected to feature fusion according to the adjusted weights to obtain visual features;
The invention combines the improved SlowFast model, can better capture the motion information in the image sequence and improve the recognition accuracy of the worker behavior by introducing a double-layer attention module into the fast branch, and can enhance the extraction capability of the worker for using the key features of the safety equipment by introducing a feature enhancement module into the slow branch;
s4, constructing a cross-modal relation matrix between the text and the visual features by calculating the Pearson correlation coefficient, and weighting and fusing the text features and the visual features according to the cross-modal relation matrix to obtain final fused features;
According to the invention, the cross-modal relation matrix is constructed by calculating the Pearson correlation coefficient, so that the correlation between text features and visual features can be quantized, a basis is provided for cross-modal feature fusion, the text features and the visual features are weighted and fused according to the relation matrix, the deep fusion of multi-modal information can be realized, the site and weather information are fully utilized to assist behavior recognition, and the accuracy and the robustness of the behavior recognition are improved;
S5, taking the fusion features as the input of an MLP network, wherein multi-stage layered fusion residual errors are introduced into the MLP network, features are extracted through residual error MLP blocks of a plurality of stages, shallow layer features and deep layer features are fused in each stage, and the fusion features are mapped to corresponding behavior categories to obtain preliminary behavior categories of workers in the images;
S6, comparing the preliminary behavior category result with site characteristics and weather characteristics, and adjusting the preliminary behavior category according to a preset correction rule to obtain a final behavior category;
s7, safety control is carried out on the behaviors of the workers on the construction site based on the final behavior category;
According to the invention, the fusion characteristics are used as the input of the MLP network, and the improved MLP network is utilized to utilize multi-stage layered fusion residual MLP to realize the preliminary classification of the behaviors of workers through characteristic fusion and residual connection, so that the adaptability of the network model to different characteristic modes is enhanced;
Further, the preliminary behavior category results are compared with the site characteristics and the weather characteristics, and are adjusted according to the preset correction rules, so that the accuracy of behavior identification can be further improved;
The site features and the weather features provide contextual information of behavior occurrence, which is helpful for verifying and correcting the preliminary classification result and reducing erroneous judgment.
In the embodiment of the invention, combining the position of a worker with the site information makes it possible to determine the specific area where the worker is currently located when controlling the worker's behavior, for example whether the worker is in a high-altitude operation area, near a foundation pit, in a tower crane operation area or in an ordinary construction passage. Different areas correspond to different possible behavior types and different safety requirements, which provides key contextual information for spatial perception and behavior localization. In addition, the behaviors of workers in certain areas can be reasonably anticipated according to the functional division of the site. For example, in a material storage area it is normal for workers to handle material, but if workers perform welding, cutting or similar operations in that area, there may be a safety hazard.
Therefore, in the embodiment of the invention, fusing the site information with multi-source information such as the weather information and the image information can provide a more comprehensive context. For example, in a high-altitude operation area in strong wind, a worker's body may lean; by combining the site information (high-altitude operation area) and the weather information (strong wind), it can be judged more accurately that this is a normal reaction of a worker trying to keep balance in a strong wind environment rather than a dangerous rule-violating behavior. Such multi-source information fusion enables behavior recognition to fully consider site factors, thereby improving recognition accuracy.
In step S6 of the invention, the preliminary behavior category result is compared with the site features and the weather features, and the behavior category result is revised based on the comparison. This accounts for the fact that a construction site is a complex environment with many uncertainties: the weather can change suddenly, and the behavior patterns of workers can change substantially as a result, for example walking more slowly or operating more cautiously. The subsequent behavior correction step of the invention can therefore further refine the recognition result by considering additional factors such as the trend and rate of the weather change.
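As a hedged illustration only, the comparison-and-correction logic of step S6 can be sketched as a small rule table. Every zone name, weather label, and behavior label below is a hypothetical placeholder, since the patent does not enumerate its predetermined correction rules:

```python
# Hypothetical sketch of the predetermined correction rules of step S6.
# All rule entries, labels, and thresholds are illustrative assumptions,
# not values taken from the patent.

CORRECTION_RULES = [
    # (site zone, weather condition, preliminary label) -> corrected label
    {"zone": "aerial_work_area", "weather": "strong_wind",
     "preliminary": "body_inclination", "corrected": "balancing_normal"},
    {"zone": "material_storage_area", "weather": None,  # None = any weather
     "preliminary": "welding", "corrected": "unauthorized_hot_work"},
]

def revise_behavior(preliminary, zone, weather):
    """Compare the preliminary class with site/weather context and
    apply the first matching correction rule (step S6)."""
    for rule in CORRECTION_RULES:
        if (rule["zone"] == zone
                and rule["weather"] in (None, weather)
                and rule["preliminary"] == preliminary):
            return rule["corrected"]
    return preliminary  # no rule fires: keep the preliminary class

print(revise_behavior("body_inclination", "aerial_work_area", "strong_wind"))
```

A real system would load such rules from configuration and could also key them on the weather trend and rate of change mentioned above.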
Referring to fig. 2, in step S3 of the embodiment of the disclosure, the step of introducing a dual-layer attention module in the fast branch includes:
S311, linearly transforming the input features to generate the query matrix, key matrix, and value matrix of the attention mechanism;
S312, constructing a directed graph to establish attention relations among different regions: computing the per-region averages of the query matrix and key matrix to generate a region query matrix and a region key matrix, and generating an adjacency matrix as the dot product of the region query matrix and the region key matrix, where the adjacency matrix measures the correlation between different regions;
S313, pruning the adjacency matrix by retaining, for each region, the top-k most highly correlated regions, thereby obtaining a routing index matrix;
S314, based on an attention mechanism focused on the k routed regions, gathering and aggregating the key and value tensors of all routed regions to generate an aggregated key matrix and an aggregated value matrix;
S315, performing the attention operation with the aggregated key matrix and value matrix, and introducing a local context enhancement (LCE) term into the resulting tensor, to obtain the output features of the fast branch.
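The routing steps S311 to S315 can be sketched in NumPy as follows. This is a minimal illustration under assumed shapes (4 regions, 8 tokens per region, 16 channels, k = 2); in particular, the local context enhancement term is approximated here by a simple per-region mean of the value tensor, which is an assumption rather than the patent's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

R, T, C, k = 4, 8, 16, 2          # regions, tokens per region, channels, top-k
X = rng.normal(size=(R, T, C))    # input features, already split into regions

# S311: linear projections -> query / key / value matrices
Wq, Wk, Wv = (rng.normal(size=(C, C)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# S312: region-level means and adjacency matrix (region-to-region relevance)
Qr, Kr = Q.mean(axis=1), K.mean(axis=1)          # (R, C) each
A = Qr @ Kr.T                                    # (R, R) adjacency matrix

# S313: prune to the top-k most correlated regions -> routing index matrix
I = np.argsort(-A, axis=1)[:, :k]                # (R, k)

# S314: gather and aggregate key/value tensors of the routed regions
Kg = K[I].reshape(R, k * T, C)                   # (R, k*T, C)
Vg = V[I].reshape(R, k * T, C)

# S315: token-level attention over the gathered keys/values, plus an
# LCE-like local term (here: per-region mean of V, an assumption)
attn = softmax(Q @ Kg.transpose(0, 2, 1) / np.sqrt(C))   # (R, T, k*T)
lce = V.mean(axis=1, keepdims=True)
out = attn @ Vg + lce                            # fast-branch output features
print(out.shape)
```

The design point is that each token attends only to tokens inside its k routed regions, so the attention cost scales with k rather than with the total number of regions.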
Further, in step S3, the feature enhancement module has a multi-branch structure: the output feature maps of the multiple branches are spliced, the spliced feature map is added to the input feature map through a residual connection, and the sum is passed through a ReLU activation function to obtain the output features of the slow branch. In one implementation, the multi-branch structure adopts three branches, specifically:
a first branch with two convolution layers: a 1×1 convolution with stride = stride that adjusts the channel number of the input feature map to 2×inter_planes, followed by a 3×3 convolution with stride 1 and padding 1. This branch extracts local features, increasing the channel number and extracting richer local features without changing the feature-map size;
a second branch with four convolution layers: a 1×1 convolution with stride 1 that adjusts the channel number of the input feature map to inter_planes; a 1×3 convolution with stride = stride and padding (0, 1) for expanding the receptive field in the height direction; a 3×1 convolution with stride = stride and padding (1, 0) for expanding the receptive field in the width direction; and a 3×3 dilated convolution with stride 1, padding 5, and dilation rate 5 that further extracts dilated-convolution features, expanding the receptive field and capturing more context information;
a third branch with three convolution layers: a 1×1 convolution with stride = stride that adjusts the channel number of the input feature map to 2×inter_planes, followed by a 3×1 convolution with stride = stride and padding (1, 0) and a 1×3 convolution with stride = stride and padding (0, 1). Extracting features with the kernel dimensions applied in a different order enriches feature diversity;
finally, the spliced feature map is added to the input feature map through a residual connection and passed through a ReLU activation function to obtain the output features of the slow branch.
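The kernel, stride, padding, and dilation combinations listed above can be sanity-checked with the standard convolution output-size formula. The 56×56 input below is an illustrative value, not one taken from the patent:

```python
def conv_out(size, k, stride=1, pad=0, dil=1):
    """Standard convolution output size:
    out = floor((size + 2*pad - dil*(k - 1) - 1) / stride) + 1."""
    return (size + 2 * pad - dil * (k - 1) - 1) // stride + 1

H = 56  # illustrative feature-map height

# Branch 1's 3x3 conv (stride 1, padding 1) preserves the map size,
# so it adds channels and locality without shrinking the feature map:
print(conv_out(H, 3, stride=1, pad=1))          # stays 56

# Branch 2's dilated 3x3 conv (padding 5, dilation 5) is also
# size-preserving while enlarging the receptive field:
print(conv_out(H, 3, stride=1, pad=5, dil=5))   # stays 56

# A strided 3x3 conv (stride 2, padding 1) halves the map, which is how
# the stride = stride layers downsample:
print(conv_out(H, 3, stride=2, pad=1))          # 28
```

This is why padding 1 pairs with a 3×3 kernel and padding 5 pairs with dilation rate 5: in each case 2·pad equals dil·(k − 1), so the spatial size is unchanged at stride 1.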
In the embodiment of the present invention, in the step of adjusting the weights of the fast branch and the slow branch based on the dynamic weight mechanism provided in step S3, the weight calculation formula is expressed as: W = σ(conv3(σ(conv1(P_slow) − conv2(P_fast)))), where P_slow denotes the globally average-pooled feature sequence of the slow branch, P_fast denotes the globally average-pooled feature sequence of the fast branch, conv1, conv2, and conv3 denote convolution operations, and σ denotes the sigmoid activation function that limits the output to the range 0 to 1; the output represents the magnitude of the weight.
As shown in fig. 3, in step S3, the step of performing feature fusion on the outputs of the two branches according to the adjusted weights to obtain visual features includes:
S321, acquiring the output features of the fast branch and the slow branch, denoted X_fast and X_slow respectively, where X_slow represents the output features of the slow branch and X_fast represents the output features of the fast branch;
S322, performing global average pooling on the feature sequences of the slow branch and the fast branch respectively to obtain the pooling results P_slow and P_fast;
S323, applying a convolution operation to each of the two pooling results and taking the difference, to compute the motion-feature difference F between the fast and slow branches, expressed as F = σ(conv1(P_slow) − conv2(P_fast)), where conv1 and conv2 each denote a convolution operation and σ denotes an activation function;
S324, applying a convolution operation to the motion-feature difference F and a sigmoid activation function to generate the feature weight W, expressed as W = σ(conv3(F)), where conv3 denotes a convolution operation and σ denotes the sigmoid activation function;
S325, performing a point-wise multiplication of the feature weight W with the slow-branch features X_slow to generate the enhanced feature map X_en, where X_en = W ⊙ X_slow;
S326, fusing the enhanced feature map X_en with the fast-branch features as the subsequent input of the slow branch, thereby realizing fast/slow feature fusion and obtaining the visual features.
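Steps S321 to S326 can be sketched with plain NumPy arrays, modeling each 1×1 convolution as a linear map. All shapes and the random weights below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

C_s, C_f, T = 16, 8, 32          # slow/fast channels and temporal length
X_slow = rng.normal(size=(C_s, T))   # S321: slow-branch output features
X_fast = rng.normal(size=(C_f, T))   #        fast-branch output features

# S322: global average pooling over the temporal axis
P_slow, P_fast = X_slow.mean(axis=1), X_fast.mean(axis=1)

# 1x1 "convolutions" modeled as linear maps into a shared dimension
conv1 = rng.normal(size=(C_s, C_s))
conv2 = rng.normal(size=(C_s, C_f))
conv3 = rng.normal(size=(C_s, C_s))

# S323: motion-feature difference between the two branches
F = sigmoid(conv1 @ P_slow - conv2 @ P_fast)

# S324: sigmoid-activated feature weight in (0, 1)
Wt = sigmoid(conv3 @ F)

# S325: point-wise (channel-wise) reweighting of the slow branch
X_en = Wt[:, None] * X_slow

# S326: X_en is then fused with the fast branch as the subsequent
# slow-branch input, yielding the visual features
print(X_en.shape)
```

The sigmoid bounds each channel weight in (0, 1), so the fast branch's motion cues modulate, but never erase, the slow branch's features.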
Further, in step S4, the Pearson correlation coefficient ρ(X, Y) is expressed as: ρ(X, Y) = cov(X, Y) / (σ_X · σ_Y);
where cov(X, Y) = E[(X − μ_X)(Y − μ_Y)] is the covariance of X and Y; σ_X is the standard deviation of X; σ_Y is the standard deviation of Y; μ_X is the mean of X, μ_Y is the mean of Y, and E denotes the expected value.
Further, in step S4, constructing the cross-modal relation matrix between the text and visual features comprises: substituting the text features and the visual features into the variables of the Pearson correlation coefficient to construct a relation fusion matrix, obtaining the relation matrix R_tv of the text features with respect to the visual features, where m and n are the numbers of text features and visual features respectively and the matrix has dimension m × n; and, symmetrically, substituting the visual features and the text features into the variables of the Pearson correlation coefficient to obtain the relation matrix R_vt of the visual features with respect to the text features.
In step S4 of the invention, weighting and fusing the text features and the visual features according to the cross-modal relation matrices to obtain the final fused features comprises: after obtaining the relation matrices R_tv and R_vt, weighting the visual features and the text features respectively. The text-to-visual relation matrix R_tv weights the visual features, expressed as V_w = R_tv · V, where V_w represents the visual features generated by weighting with the text-to-visual relation matrix; the visual-to-text relation matrix R_vt weights the text features, expressed as T_w = R_vt · T, where T_w represents the text features generated by weighting with the visual-to-text relation matrix. Finally, the enhanced text features and visual features are fused by a weighted average, expressed as F_fused = α · T_w + β · V_w, where F_fused is the final fused feature vector, and α and β are hyperparameters controlling how much the text features and visual features contribute to the final fused feature.
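The Pearson-based relation matrices and the weighted fusion of step S4 can be sketched as follows. The feature counts and dimensions are illustrative, and the final mean-pooling over each weighted feature set (needed here so that the two sets, which have different row counts, can be averaged into a single vector) is an added assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

def pearson(x, y):
    """rho(x, y) = cov(x, y) / (sigma_x * sigma_y), as in step S4."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (np.sqrt((xc @ xc) * (yc @ yc)) + 1e-12)

m, n, d = 3, 4, 16               # text features, visual features, dimension
T = rng.normal(size=(m, d))      # text feature vectors
V = rng.normal(size=(n, d))      # visual feature vectors

# Cross-modal relation matrices: text->visual (m x n) and visual->text (n x m)
R_tv = np.array([[pearson(t, v) for v in V] for t in T])
R_vt = R_tv.T                    # Pearson is symmetric in its arguments

# Weighting: each modality is reweighted by its relevance to the other
V_w = R_tv @ V                   # text-weighted visual features  (m x d)
T_w = R_vt @ T                   # visual-weighted text features  (n x d)

# Weighted-average fusion; alpha/beta are the hyperparameters, and the
# row-wise mean pooling is an assumption for the dimensions to match
alpha, beta = 0.5, 0.5
F_fused = alpha * T_w.mean(axis=0) + beta * V_w.mean(axis=0)
print(F_fused.shape)
```

Because |ρ| ≤ 1, the relation matrices act as bounded soft-attention weights between the two modalities.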
Further, referring to fig. 4, in step S5, a multi-stage hierarchical fusion residual is introduced into the MLP network, including:
S411, dividing the MLP network into a plurality of stages, each comprising a plurality of residual MLP blocks whose structure is expressed as x_{l+1} = x_l + MLP(x_l), where x_l denotes the output of the l-th layer and MLP(·) denotes a multi-layer perceptron;
S412, at the end of each stage, fusing the output features of the current stage with the output features of the previous stage, expressed as F_s = concat(F_{s−1}, x_s), where F_s denotes the fused features of the s-th stage and concat(·, ·) denotes the feature-splicing operation;
S413, adding skip connections between different stages of the network to fuse shallow and deep features, expressed as H_j = F_j + T(F_i), where H_j denotes the output of the j-th skip connection and T(·) denotes the transformation applied to the earlier-stage features;
S414, feeding the fused features of the final stage into the classification layer, expressed as O = W · F + b, where O denotes the final classification output and W and b denote the weight and bias of the classification layer, respectively.
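Steps S411 to S414 can be sketched as follows. The stage and block counts, the linear map that returns the spliced features to a fixed width, and the 0.1 weight scaling are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
relu = lambda x: np.maximum(x, 0.0)

d, n_blocks, n_stages = 16, 2, 3     # illustrative sizes

def mlp(x, W1, W2):
    """A minimal two-layer perceptron block."""
    return W2 @ relu(W1 @ x)

x = rng.normal(size=d)
fused = x
for s in range(n_stages):
    # S411: residual MLP blocks  x_{l+1} = x_l + MLP(x_l)
    for _ in range(n_blocks):
        W1 = rng.normal(size=(d, d)) * 0.1
        W2 = rng.normal(size=(d, d)) * 0.1
        x = x + mlp(x, W1, W2)
    # S412: stage-end fusion: splice with the previous stage's features,
    # then a (hypothetical) linear map back to width d
    Wf = rng.normal(size=(d, 2 * d)) * 0.1
    fused = Wf @ np.concatenate([fused, x])
    # S413: skip connection fusing shallow (fused) and deep (x) features
    fused = fused + x

# S414: classification layer  O = W . F + b
n_cls = 5
Wc, b = rng.normal(size=(n_cls, d)) * 0.1, np.zeros(n_cls)
O = Wc @ fused + b
print(O.shape)
```

The residual form keeps gradients flowing through many blocks, while the stage-end splicing lets the classifier see both shallow and deep representations.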
In summary, by acquiring and fusing multi-source information and combining it with safety control measures, the control method of the invention realizes comprehensive, accurate, and real-time monitoring and management of worker behavior on the construction site.
Referring to fig. 5, in another embodiment of the present invention, an intelligent site safety control system based on multi-source data analysis is provided, the control system includes the following modules:
a data acquisition module 81, configured to acquire multi-source information of a worksite, where the multi-source information includes site information, weather information, and image information;
the text feature extraction module 82 is configured to perform text preprocessing on the site information and the weather information, perform feature extraction on the preprocessed site information text and weather information text by using a BERT model, obtain site features and weather features, and perform splicing processing on the site features and the weather features, so as to obtain text features;
a visual feature extraction module 83, configured to extract visual features from the image information using an improved SlowFast model, the improved SlowFast model comprising a fast branch and a slow branch, wherein a dual-layer attention module is introduced into the fast branch and a feature enhancement module is introduced into the slow branch, the weights of the fast branch and the slow branch are adjusted based on a dynamic weight mechanism, and the outputs of the two branches are feature-fused according to the adjusted weights to obtain the visual features;
the feature fusion module 84 is configured to construct a cross-modal relation matrix between the text and the visual feature by calculating the pearson correlation coefficient, and weight-fuse the text feature and the visual feature according to the cross-modal relation matrix to obtain a final fusion feature;
The behavior classification module 85 is configured to take the fusion feature as an input of an MLP network, wherein a multi-stage layered fusion residual is introduced into the MLP network, the feature is extracted through residual MLP blocks of multiple stages, shallow and deep features are fused in each stage, and the fusion feature is mapped to a corresponding behavior class to obtain a preliminary behavior class of a worker in the image;
the behavior modification module 86 is configured to compare the preliminary behavior category result with the site feature and the weather feature, and adjust the preliminary behavior category according to a predetermined modification rule to obtain a final behavior category;
The behavior control module 87 is configured to safely control the behavior of the construction site worker based on the final behavior category.
As shown in fig. 6, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected by a system bus. The memory includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by a processor, causes the processor to implement a smart site security management method based on multi-source data analysis.
The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform a smart worksite security management method based on multi-source data analysis.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer readable storage medium is provided, and a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the processor is caused to execute the intelligent building site safety management method based on multi-source data analysis provided in the above embodiment.
It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in various embodiments may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed in sequence but may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware, where the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above-described embodiments are described; however, as long as there is no contradiction between the combinations of technical features, they should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the invention and are described in detail herein without thereby limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510580593.1A CN120086699B (en) | 2025-05-07 | 2025-05-07 | Smart construction site safety management and control method and system based on multi-source data analysis |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510580593.1A CN120086699B (en) | 2025-05-07 | 2025-05-07 | Smart construction site safety management and control method and system based on multi-source data analysis |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN120086699A CN120086699A (en) | 2025-06-03 |
| CN120086699B true CN120086699B (en) | 2025-07-25 |
Family
ID=95855795
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510580593.1A Active CN120086699B (en) | 2025-05-07 | 2025-05-07 | Smart construction site safety management and control method and system based on multi-source data analysis |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120086699B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116844236A (en) * | 2023-07-13 | 2023-10-03 | 重庆理工大学 | A behavior recognition method and system based on improved Slowfast |
| CN119919932A (en) * | 2025-04-03 | 2025-05-02 | 安徽农业大学 | Agricultural product classification method integrating dual-stream attention integration and cross-modal fusion |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112183313B (en) * | 2020-09-27 | 2022-03-11 | 武汉大学 | SlowFast-based power operation field action identification method |
| CN117542118A (en) * | 2023-11-24 | 2024-02-09 | 中国科学技术大学 | UAV aerial video action recognition method based on dynamic modeling of spatiotemporal information |
| CN119380106A (en) * | 2024-10-30 | 2025-01-28 | 电子科技大学(深圳)高等研究院 | Medical image analysis method and system based on residual MLP network with sparse attention mechanism |
| CN119474496B (en) * | 2024-11-08 | 2025-09-23 | 长安大学 | An intelligent traffic event recognition method based on large traffic model and cross-modal retrieval |
| CN119851348A (en) * | 2024-12-31 | 2025-04-18 | 西安理工大学 | Sports action recognition method based on 3D space-time attention and slowfast network |
- 2025-05-07: application CN202510580593.1A granted as CN120086699B (status: Active)
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116844236A (en) * | 2023-07-13 | 2023-10-03 | 重庆理工大学 | A behavior recognition method and system based on improved Slowfast |
| CN119919932A (en) * | 2025-04-03 | 2025-05-02 | 安徽农业大学 | Agricultural product classification method integrating dual-stream attention integration and cross-modal fusion |
Non-Patent Citations (1)
| Title |
|---|
| Recognition of abnormal behavior of elevator passengers based on an improved SlowFast algorithm; Wang Zhiheng et al.; Journal of China Jiliang University; 2024-09-15; Vol. 35, No. 3; pp. 407-413 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN120086699A (en) | 2025-06-03 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN119004322B (en) | A pipeline system fault diagnosis method and system based on hierarchical attention mechanism | |
| CN119251641B (en) | Method, system and equipment for predicting reliability of power transmission line based on SENet and EffNet | |
| CN109145743A (en) | A kind of image-recognizing method and device based on deep learning | |
| CN119646271B (en) | An emergency fire hazard detection method based on multimodal AI large model recognition technology | |
| CN119691419B (en) | Multi-scale extreme high wind event AI identification method, device and medium integrating physical constraints | |
| CN119272209A (en) | A foundation pit digital twin monitoring method, system and application thereof | |
| CN117851802A (en) | Water quality prediction method and device and computer readable storage medium | |
| CN119091307A (en) | Landslide hazard remote sensing detection method and system integrating spectral and terrain information | |
| Wang et al. | Multicategory fire damage detection of post‐fire reinforced concrete structural components | |
| CN117150383B (en) | A new energy vehicle power battery fault classification method based on ShuffleDarkNet37-SE | |
| CN120086699B (en) | Smart construction site safety management and control method and system based on multi-source data analysis | |
| CN119226805B (en) | Multi-mode data generalization learning method and system based on causal invariant transformation | |
| KR102784194B1 (en) | Method and electronic device for providing property prediction data of a composite based on artificial intelligence | |
| CN118865375B (en) | Cell state detection method, device and storage medium based on space-time feature fusion | |
| CN120542959A (en) | Emergency situation intelligent decision-making method, device and system based on deep learning | |
| CN118585772B (en) | Early warning methods and information release platforms applicable to emergencies | |
| CN120012974B (en) | A method, system, device and medium for predicting offshore wind power output | |
| CN119918716B (en) | ENSO long-term prediction methods, devices, and media based on multi-head spatiotemporal attention mechanisms | |
| CN119693620B (en) | Multi-scene fire detection method based on deep learning | |
| CN119152276B (en) | A local climate zone classification method based on multi-source data fusion | |
| Cheng et al. | The fusion strategy of multimodal learning in image and text recognition | |
| CN120747607A (en) | Image classification method, device, equipment and medium based on self-supervision attention | |
| CN119251809A (en) | An intelligent safety tool access detection method based on AI vision | |
| Ye | Attention-Based CNN-BiLSTM Model for La Niña Events | |
| CN117036846A (en) | A helmet wearing detection method based on hybrid connection improved YOLOv5 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |