
CN116503799A - Contact net dropper defect detection method based on CNN and Transformer fusion - Google Patents

Contact net dropper defect detection method based on CNN and Transformer fusion Download PDF

Info

Publication number
CN116503799A
CN116503799A
Authority
CN
China
Prior art keywords
dropper
network
image
self
defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310414019.XA
Other languages
Chinese (zh)
Other versions
CN116503799B (English)
Inventor
何进 (He Jin)
刘俊 (Liu Jun)
王伟 (Wang Wei)
罗德宁 (Luo Dening)
张葛祥 (Zhang Gexiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology filed Critical Chengdu University of Information Technology
Priority to CN202310414019.XA priority Critical patent/CN116503799B/en
Publication of CN116503799A publication Critical patent/CN116503799A/en
Application granted granted Critical
Publication of CN116503799B publication Critical patent/CN116503799B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a contact-net (catenary) dropper defect detection method based on the fusion of CNN and Transformer, in the technical field of defect detection. The method collects catenary dropper images and applies image enhancement to obtain a dropper defect sample set; builds a convolution module from a constraint-based deformable convolutional network and a self-attention module from an improved, high-efficiency multi-head self-attention mechanism; and deeply fuses the convolution modules with the self-attention modules according to an optimal module allocation ratio, generating a multi-block cross hybrid network that improves the Faster R-CNN network. The improved network is trained and validated on the dropper defect sample set, and the trained model is deployed on dropper inspection equipment, which captures catenary dropper images in real time and detects dropper defects, identifying dropper defects in the catenary. The application is suited to complex natural scenes and improves the precision and recall of dropper defect recognition under real, complex natural conditions.

Description

Contact net dropper defect detection method based on CNN and Transformer fusion

Technical Field

The present invention relates to the technical field of defect detection, and in particular to a method for detecting defects of high-speed railway catenary droppers based on deep cross-fusion of a CNN and a Transformer.

Background Art

The dropper is a key component of the high-speed rail catenary: it ensures that the EMU collects current smoothly and continuously, and it damps the vibration between the contact wire and the messenger wire. However, droppers are affected by temperature, climate, and high-frequency vibration, and frequently loosen, break, or fall off. At best this disturbs current collection by the pantograph; at worst it damages the pantograph or the contact wire and causes train failures. Real-time detection and early warning of dropper defects, ensuring the safety and reliability of the catenary, is therefore of great significance to the safe operation of high-speed railways.

Most existing catenary dropper defect detection methods rely on machine learning and neural network models, but they have the following shortcomings:

(1) They are sensitive to the actual scene: recognition of dropper defects is poor in natural environments such as rain, fog, strong sunlight, and night;

(2) When the dropper wire is occluded, defects cannot be identified accurately, so defects are missed.

Summary of the Invention

The purpose of the present invention is to overcome the deficiencies of the prior art by providing a catenary dropper defect detection method based on the fusion of CNN and Transformer, which helps to solve the problems of poor defect recognition and low recognition accuracy of current catenary dropper defect detection methods when the dropper is occluded or the scene is complex.

The purpose of the present invention is achieved through the following technical solution:

The invention provides a catenary dropper defect detection method based on the fusion of CNN and Transformer, comprising:

collecting catenary dropper images and performing image enhancement on the dropper images to obtain a dropper defect sample set;

constructing a convolution module from a constraint-based deformable convolutional network, constructing a self-attention module from an improved high-efficiency multi-head self-attention mechanism, and deeply fusing the convolution modules with the self-attention modules according to an optimal module allocation ratio to generate a multi-block cross hybrid network;

improving the Faster R-CNN network with the multi-block cross hybrid network, and training and validating the improved Faster R-CNN network on the dropper defect sample set to obtain a trained, improved Faster R-CNN model;

deploying the improved Faster R-CNN model on dropper inspection equipment, capturing catenary dropper images of the high-speed railway in real time, and feeding them to the inspection equipment for dropper defect detection, thereby identifying dropper defects in the catenary.

Further, collecting catenary dropper images and performing image enhancement on the dropper images to obtain the dropper defect sample set specifically comprises:

collecting catenary dropper images during high-speed railway operation;

performing image enhancement on the catenary dropper images with an improved image enhancement algorithm: randomly generating N mask regions in any dropper image and, according to the number N of mask regions, filtering out the mask regions that would occlude key dropper features, to obtain the dropper defect sample set;

dividing the dropper defect sample set into training samples and validation samples.

Further, filtering, according to the number N of mask regions, the mask regions that would occlude key dropper features specifically comprises:

if N=1, filtering out a single mask region that completely occludes the dropper in the dropper image;

if N=2, filtering out two mask regions that together occlude the upper and lower ends of the dropper in the dropper image;

if N≥3, filtering out any single mask region that completely occludes the dropper, and filtering out any two mask regions that together occlude the upper and lower ends of the dropper in the dropper image.

Further, constructing the convolution module from a constraint-based deformable convolutional network, constructing the self-attention module from the improved high-efficiency multi-head self-attention mechanism, and deeply fusing the convolution modules with the self-attention modules according to the optimal module allocation ratio to generate the multi-block cross hybrid network specifically comprises:

constraining, according to the height-to-width ratio of the dropper wire, the ratio between the height and width coordinates of the sampling point positions in the deformable convolutional network, while also limiting the height and width coordinates of the sampling points so that they do not exceed the height and width of the input feature map, thereby obtaining a constraint-based deformable convolutional network, and building the convolution module from that constraint-based deformable convolutional network;
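The two constraints above can be sketched as follows. This is an illustrative sketch only, not the patent's formulas (which are not given here): the `min_hw_ratio` bound and the clamping scheme are assumptions. A deformable convolution learns an offset (dy, dx) per sampling point; for the tall, thin dropper, the sketch (a) keeps the offset elongated vertically and (b) keeps the resulting point inside the input feature map.

```python
def constrain_offset(dy, dx, min_hw_ratio=3.0):
    """Shrink |dx| so the sampling pattern keeps a dropper-like aspect ratio.

    min_hw_ratio is an assumed lower bound on |dy|/|dx| reflecting the
    dropper's large height-to-width ratio.
    """
    if dx != 0 and abs(dy) / abs(dx) < min_hw_ratio:
        dx = abs(dy) / min_hw_ratio * (1 if dx > 0 else -1)
    return dy, dx

def constrain_sample_point(y, x, feat_h, feat_w):
    """Clamp a sampling point (y, x) to lie inside the input feature map."""
    y = min(max(y, 0.0), feat_h - 1.0)
    x = min(max(x, 0.0), feat_w - 1.0)
    return y, x
```

For example, an offset of (6, 4) would be narrowed to (6, 2) under the assumed ratio bound of 3, while an already elongated offset such as (6, 1) passes through unchanged.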

performing spatial dimensionality reduction on the key vectors K and the value vectors V of the original Transformer self-attention mechanism to obtain the improved high-efficiency multi-head self-attention mechanism, and building the self-attention module from the improved high-efficiency multi-head self-attention mechanism;
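A rough cost model shows why spatially reducing K and V makes the attention "high-efficiency". The reduction ratio `r` and the exact reduction operator are assumptions (designs of this kind typically shrink the K/V token grid with strided convolution or pooling before computing attention):

```python
def attention_cost(n, d, r=1):
    """Approximate multiply-accumulate count of one self-attention layer.

    n: number of tokens (H*W of the feature map), d: embedding dimension,
    r: spatial reduction ratio applied to K and V (r=1 means no reduction).
    QK^T costs n * (n / r**2) * d; the weighted sum over V costs the same.
    """
    n_kv = n // (r * r)  # tokens remaining in K and V after reduction
    return 2 * n * n_kv * d

# With r=2 the K/V grid shrinks 4x, so the attention cost drops by ~4x
# while Q keeps full spatial resolution:
full = attention_cost(64 * 64, 256, r=1)
reduced = attention_cost(64 * 64, 256, r=2)
```

The key point is that only K and V are reduced: the output still has one token per query position, so the feature map resolution needed for detection is preserved.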

based on the backbone architecture of the Faster R-CNN model, allocating the number of convolution modules and the number of self-attention modules in the backbone according to the optimal module allocation ratio, and fusing the convolution modules and self-attention modules in a new paradigm to generate the multi-block cross hybrid network.

Further, the optimal module allocation ratio is specifically: number of convolution modules : number of self-attention modules = 7:2.

Further, improving the Faster R-CNN network with the multi-block cross hybrid network, and training and validating the improved Faster R-CNN network on the dropper defect sample set to obtain the trained, improved Faster R-CNN model specifically comprises:

replacing the backbone architecture of the Faster R-CNN network with the architecture of the multi-block cross hybrid network to obtain the improved Faster R-CNN model;

training the improved Faster R-CNN network on the training samples of the dropper defect sample set and, after training, validating the model on the validation samples to obtain the trained and validated improved Faster R-CNN model.

Beneficial effects of the invention: the invention provides a catenary dropper defect detection method based on the fusion of CNN and Transformer. The method collects catenary dropper images and applies image enhancement to obtain a dropper defect sample set; builds a convolution module from a constraint-based deformable convolutional network and a self-attention module from an improved high-efficiency multi-head self-attention mechanism; deeply fuses the two according to an optimal module allocation ratio to generate a multi-block cross hybrid network; improves the Faster R-CNN network with this hybrid network; trains and validates the improved network on the dropper defect sample set; and deploys the trained model on dropper inspection equipment, which captures catenary dropper images in real time and detects dropper defects in the catenary.

By enhancing the dropper images to build the defect sample set, the application enlarges the pool of dropper defect samples and addresses both occlusion of the dropper and the scarcity of training samples. Building the convolution module from a constraint-based deformable convolutional network matches the geometric characteristics of the dropper and strengthens defect recognition. Constructing the self-attention module from the improved high-efficiency multi-head self-attention mechanism and deeply fusing it with the convolution modules at the optimal allocation ratio yields a multi-block cross hybrid network in which CNN and Transformer are cross-fused. This resolves the poor recognition of dropper defects in natural scenes such as rain, fog, strong sunlight, and night, and also enables accurate recognition of occluded droppers and distant small targets, thereby improving the recall and precision of dropper defect recognition.

Brief Description of the Drawings

Fig. 1 is a flowchart of the catenary dropper defect detection method based on CNN and Transformer fusion of the present invention;

Fig. 2 is a schematic diagram of existing types of dropper defects;

Fig. 3 is a schematic diagram of dropper defect recognition problems in real application environments;

Fig. 4 shows examples of droppers misidentified by existing models;

Fig. 5 is an architecture diagram of the improved Faster R-CNN model;

Fig. 6 shows sample dropper images after image enhancement;

Fig. 7 is a schematic diagram of convolutional sampling when different convolutional neural networks are used for defect recognition;

Fig. 8 is a schematic diagram of the principle of bilinear interpolation;

Fig. 9 compares the backbone architectures of a traditional hybrid network and the multi-block cross-fusion hybrid network;

Fig. 10 is a schematic diagram of the network structure formed by cross-fusing CB blocks and TB blocks within a single stage of the backbone of the present application;

Fig. 11 is a schematic diagram of the CB block structure of the present invention;

Fig. 12 is a schematic diagram of the TB block structure of the present invention.

Detailed Description of Embodiments

For a clearer understanding of the technical features, purposes, and effects of the present invention, specific embodiments of the present invention are now described with reference to the accompanying drawings.

The catenary is an important part of an electrified railway and the power source for high-speed train operation. The dropper transmits vibration, force, and current between the contact wire and the messenger wire; it is an important component for improving the current-collection and mechanical performance of the catenary and one of the key parts guaranteeing its safe use. A dropper usually consists of a dropper wire and dropper clamps at both ends. In a flexible catenary system, droppers add suspension points for the contact wire within each span without adding masts, improving the overall elasticity of the catenary and the sag of the contact wire.

During railway operation, if a broken dropper is not found and repaired in time, the contact wire at the break sinks, so the contact wire is no longer parallel to the track and pantograph current collection is disturbed. A completely broken dropper droops and, under wind, easily becomes entangled with the contact wire, which can damage the pantograph or the contact wire, cause train operation failures, and endanger the lives and property of passengers. Real-time detection and early warning of dropper defects, ensuring the safety and reliability of the catenary, is therefore of great significance to the safe operation of high-speed railways.

Therefore, this application cross-fuses a CNN with a Transformer so that dropper defects can be recognized in natural scenes such as rain, fog, strong sunlight, and night, and so that defects of occluded droppers and distant small targets can also be recognized accurately, improving the recall and precision of dropper defect recognition.

Referring to Fig. 1, a catenary dropper defect detection method based on the fusion of CNN and Transformer comprises:

S1: collecting catenary dropper images and performing image enhancement on the dropper images to obtain a dropper defect sample set;

S2: constructing a convolution module from a constraint-based deformable convolutional network, constructing a self-attention module from an improved high-efficiency multi-head self-attention mechanism, and deeply fusing the convolution modules with the self-attention modules according to an optimal module allocation ratio to generate a multi-block cross hybrid network;

S3: improving the Faster R-CNN network with the multi-block cross hybrid network, and training and validating the improved Faster R-CNN network on the dropper defect sample set to obtain a trained, improved Faster R-CNN model;

S4: deploying the improved Faster R-CNN model on dropper inspection equipment, capturing catenary dropper images of the high-speed railway in real time, and feeding them to the inspection equipment for dropper defect detection, identifying dropper defects in the catenary.

The existing Faster R-CNN model, when applied to complex real scenes, cannot meet the requirements of dropper defect detection: it suffers from widespread low recall and low-precision recognition. Moreover, real-time image capture under high-speed motion requires low inference latency, which the existing Faster R-CNN model also cannot provide. This application therefore realizes dropper defect detection mainly by improving the Faster R-CNN network.

Specifically, referring to Fig. 2: because droppers are exposed to the outside environment for years, they are strongly affected by climate and the operating environment; dropper components are also susceptible to high-frequency vibration, which causes stress fatigue, breakage, detachment, and similar problems, and the number of defects grows geometrically. There are many kinds of dropper defects; for convenience of classification and recognition, we divide them into five major defects: broken dropper wire, dropper detachment (the upper or lower dropper clamp falls off), bent dropper, unstranded (loose) dropper wire, and non-standard dropper installation. Part a1 of Fig. 2 shows a broken dropper wire; part d1 shows a bent dropper; parts e1 and f1 show a slack dropper wire; parts b1 and c1 show upper and lower detachment defects, respectively.

Because China's high-speed rail network has wide coverage and long mileage, its climate and terrain are complex and diverse. Affected by natural weather, seasonal changes, illumination, and occlusion of the droppers, dropper defect recognition is very difficult in practical application scenarios. Fig. 3 lists the problems encountered in dropper recognition in practice. Part a2 of Fig. 3 shows motion smearing under high-speed train movement that blurs the whole image; part b2 shows a dim image caused by low light in a tunnel; part c2 shows partial occlusion caused by a dirty lens; part d2 shows a complex background in which the target and background are interwoven and the dropper is hard to distinguish; parts e2 and f2 show droppers occluded by the pantograph or truncated by arcing; parts g2, h2, i2, and j2 show dropper images blurred and hard to recognize under heavy fog, night, rainstorm, and strong sunlight. Clearly, in real application environments the recognition process encounters many kinds of complex situations, which greatly increases the difficulty of dropper defect recognition.

Referring to Fig. 4, prior-art approaches that use Faster R-CNN and its optimized variants, or CNN-Transformer fusion models, for dropper defect recognition all have the following problems: 1. positioning tubes, electrical connector joints, electrical connection wires, and wind-resistant stay wires are misidentified as droppers, as shown in parts a3, b3, d3, and e3 of Fig. 4; 2. an occluded dropper is misidentified as a broken dropper, as shown in part c3 of Fig. 4; 3. distant small targets are missed, as shown in part f3 of Fig. 4. Given the widespread low recall and precision on droppers and their defects in real scenes, existing models cannot meet the requirements of dropper defect recognition. Extensive analysis of the internal characteristics of droppers is therefore needed, followed by the design of a corresponding model, based on the analysis results, that meets the defect recognition requirements and thereby ensures the safe operation of the high-speed rail catenary.

This application improves the Faster R-CNN model in three respects to detect dropper wire defects under occlusion and to raise the recall and precision of dropper defect detection, providing strong support for the safe and intelligent operation of the high-speed rail catenary.

The existing Faster R-CNN architecture mainly comprises an input layer, a backbone network, a neck, a detection head, and an output layer. The backbone is divided into four stages, each containing a certain number of convolution modules.

The improved model of this application is shown in Fig. 5 and makes three main improvements: 1) based on existing image enhancement methods and the inherent characteristics of the dropper wire, a Limited Cutout algorithm (L-cutout) is proposed to address the small number of dropper training samples; 2) to overcome the limitations of deformable convolution for dropper defect recognition and the large variation in dropper aspect ratio, a constraint-based deformable convolution (C-DCV) is proposed to improve dropper recognition performance; 3) to exploit both the local perception of CNNs and the long-range dependencies of ViTs in image recognition, a novel backbone is constructed for dropper defect detection. Inspired by residual networks, this application replaces the 3x3 convolution with C-DCV to build a new type of residual block, which we call a CNN block (CB for short). This application also optimizes the efficiency of HMSA and combines it with an FFN to build an efficient self-attention module called a transformer block (TB for short).
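One way to picture the resulting backbone is as stages of interleaved CB and TB blocks at the 7:2 ratio stated above. The stage depths and the even-interleaving pattern below are assumptions made for illustration; the patent fixes only the overall CB:TB ratio, with the cross-fusion within a stage shown in Fig. 10.

```python
def build_stage(num_cb, num_tb):
    """Spread TB blocks as evenly as possible among CB blocks in one stage."""
    blocks = ["CB"] * num_cb
    if num_tb:
        step = (num_cb + num_tb) // (num_tb + 1)
        for i in range(num_tb):
            # Offset by i because earlier insertions shift later positions.
            blocks.insert((i + 1) * step + i, "TB")
    return blocks

# A hypothetical four-stage backbone with 14 CB and 4 TB blocks overall,
# which keeps the claimed 7:2 allocation ratio:
stages = [build_stage(2, 0), build_stage(2, 1), build_stage(7, 2), build_stage(3, 1)]
total_cb = sum(s.count("CB") for s in stages)
total_tb = sum(s.count("TB") for s in stages)
```

The design intent is that convolution blocks dominate early stages (local texture), while attention blocks appear inside later stages to inject long-range context, rather than being appended as a separate trailing stack as in traditional serial hybrids (Fig. 9).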

Further, in one embodiment, collecting catenary dropper images and performing image-enhancement processing on them to obtain a dropper defect sample set specifically includes:

collecting catenary dropper images during high-speed railway operation;

performing image enhancement on the catenary dropper images with the improved algorithm: randomly generating N mask regions in each dropper image and, according to the number N of mask regions, filtering out those that occlude the key features of the dropper, to obtain the dropper defect sample set;

dividing the dropper defect sample set into training samples and validation samples.

Here, filtering out the mask regions that occlude the key features of the dropper according to the number N of mask regions specifically includes:

if N = 1, filtering out any single mask region that completely occludes the dropper in the image;

if N = 2, filtering out any pair of mask regions that occlude the upper and lower ends of the dropper;

if N ≥ 3, filtering out any single mask region that completely occludes the dropper, and filtering out any two mask regions that together occlude the upper and lower ends of the dropper.

In practice, this application first applies image enhancement to the collected dropper images. Image enhancement benefits recognition tasks such as image classification, object detection, and semantic segmentation. Traditional enhancement algorithms (flip, rotation, scale, etc.) are commonly used, and in recent years new algorithms such as cutout, mixup, and cutmix have emerged that further improve recognition. However, applying these new algorithms directly to dropper defect detection severely degrades recall and precision. The reason is that they use random masking and image overlay, which can hide the dropper's key features: the dropper clamps at both ends are covered while only the dropper wire remains, so in subsequent recognition the model mistakes any line for a dropper.

Figure 6 shows enhancement and recognition results of existing techniques on dropper samples: part a4 is a standard dropper sample; parts b4 and c4 show enhancement results that retain the key dropper information; parts d4, e4, and f4 show enhancement results that occlude the key dropper information.

To address these problems, this application proposes an algorithm improved from the cutout method, called Limited cutout (L-cutout). It still generates mask regions (also called mask blocks) at random, but a mask region must not occlude the dropper's distinguishing features; that is, at least one dropper clamp is left unoccluded (parts b4 and c4 of Figure 6). L-cutout offers two advantages for dropper defect recognition: 1) it improves the recognition of occluded droppers; 2) it enlarges the dropper defect sample set, which benefits model training.

The overall idea of the L-cutout algorithm is to randomly generate mask regions in the dropper image while filtering out those that occlude the dropper's key features (parts d4, e4, and f4 of Figure 6). The algorithm proceeds as follows:

First, let the position and size of a randomly generated mask region be (X_mask, Y_mask, W_mask, H_mask), where (X_mask, Y_mask) is the starting-point coordinate and W_mask, H_mask are the mask's width and height in the image. The annotated ground-truth dropper box is (X_GT, Y_GT, W_GT, H_GT), where (X_GT, Y_GT) is the starting coordinate and W_GT, H_GT are its width and height. Projecting the physical dropper into the image, L_dw denotes the length of the dropper wire in the image and L_dc the length of a dropper clamp.

Next, the algorithm filters the mask regions according to the following logic:

The filtering logic is implemented as follows. Following line 2 of the pseudocode, N mask blocks are randomly generated in the input sample. When N = 1, a single mask block that completely occludes the dropper is discarded, and the dropper sample is obtained according to the logic of lines 3 to 17. When N = 2, a pair of mask blocks that occlude the upper and lower ends of the dropper is discarded, and the sample is obtained according to lines 18 to 29. When N ≥ 3, each mask block is passed through the N = 1 branch, and every pair of mask blocks is passed through the N = 2 branch, for filtering.
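The branch structure above can be sketched as follows. This is a minimal illustration, not the patent's exact pseudocode (lines 2-29): it assumes masks and clamps are axis-aligned (x, y, w, h) rectangles, treats "occluding an end" as fully covering that end's clamp box, and the function names and the choice of which end to re-expose are hypothetical.

```python
def covers_clamp(mask, clamp):
    """True if a mask rectangle fully covers a clamp rectangle.
    Rectangles are (x, y, w, h) with (x, y) the top-left corner."""
    mx, my, mw, mh = mask
    cx, cy, cw, ch = clamp
    return mx <= cx and my <= cy and mx + mw >= cx + cw and my + mh >= cy + ch

def l_cutout_filter(masks, top_clamp, bottom_clamp):
    """Drop masks that would hide the dropper's key features, so that
    at least one of the two dropper clamps stays visible."""
    kept = []
    for m in masks:
        # N = 1 rule: a single mask must not occlude the whole dropper,
        # i.e. it must not cover both clamps at once.
        if covers_clamp(m, top_clamp) and covers_clamp(m, bottom_clamp):
            continue
        kept.append(m)
    # N = 2 rule (applied pairwise): the kept masks as a whole must
    # still leave one clamp visible.
    top_hidden = any(covers_clamp(m, top_clamp) for m in kept)
    bot_hidden = any(covers_clamp(m, bottom_clamp) for m in kept)
    if top_hidden and bot_hidden:
        # re-expose one clamp (here, arbitrarily, the bottom one)
        kept = [m for m in kept if not covers_clamp(m, bottom_clamp)]
    return kept
```

The same two rules cover the N ≥ 3 case, since each mask is checked individually and the pairwise condition is checked over the whole kept set.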

Further, in one embodiment, constructing the convolution module with the constraint-based deformable convolution network, constructing the self-attention module with the improved high-efficiency multi-head self-attention mechanism, and deeply fusing the two according to the optimal module-allocation ratio to generate the multi-block cross-fusion hybrid network specifically includes:

according to the height-to-width ratio of the dropper wire, constraining the ratio between the height and width coordinates of the sampling points in the deformable convolution network, while limiting the sampling-point coordinates to within the height and width of the input feature map, thereby obtaining the constraint-based deformable convolution network and building the convolution module from it;

applying spatial dimension reduction to the key vector K and the value vector V of the original Transformer self-attention mechanism to obtain the improved high-efficiency multi-head self-attention mechanism, and building the self-attention module from it;

based on the backbone architecture of the FasterRCNN model, allocating the numbers of convolution modules and self-attention modules in the backbone according to the optimal allocation ratio, and fusing them in the new paradigm to generate the multi-block cross-fusion hybrid network.

The optimal module-allocation ratio is: number of convolution modules (CB) : number of self-attention modules (TB) = 7:2, which is the optimal ratio found in this application. Concretely, given the number of modules in each stage of the backbone, the modules of a stage are arranged in groups of 7 CB blocks and 2 TB blocks, and the groups are connected in series to form one stage of the backbone. This application mainly allocates and fuses the modules of Stage2, Stage3, and Stage4 of the backbone. In other embodiments, the allocation ratio may be adjusted according to the network architecture of the model, which is not repeated here.

In practice, deformable convolution can fit the shape and size of an object more closely during sampling, whereas the sampling grid of a standard convolutional network cannot follow changes in object shape and is limited by its fixed geometric structure. Standard CNNs favor objects of similar height and width, while deformable convolutions are better suited to objects of diverse shapes. Since a dropper's height is typically much larger than its width, standard convolution is ill-suited to dropper-wire recognition, so deformable convolution is chosen for identifying dropper-wire defects.

Referring to Figure 7, three convolution sampling patterns are illustrated on a dropper image, where Dropper denotes the dropper sample, SC standard convolution, DCV2 deformable convolution, and C-DCV the constraint-based deformable convolution. Part a5 of Figure 7 shows the actual dropper image, part b5 the sampling pattern of standard convolution, part c5 that of deformable convolution, and part d5 that of constraint-based deformable convolution. It can be seen that C-DCV adapts its sampling to the height-to-width ratio of the dropper wire.

Applying DCV1 and DCV2 directly to dropper defect detection improves overall performance only slightly and does not fully exploit deformable convolution. The main reason is that although deformable convolution samples according to object shape, it is not tailored to the specific characteristic of droppers, namely that their height far exceeds their width.

After in-depth study of the objects to which deformable convolution applies, combined with analysis of the dropper's inherent characteristics, this application proposes a convolution network tailored to those characteristics, called constraint-based deformable ConvNets (C-DCV). It applies not only to dropper defect detection but also to recognizing other objects that are much taller than they are wide.

Specifically, the constraint-based deformable convolution algorithm (C-DCV) is given by formula (1):

where H and W are the height and width of the input feature map; p_k is an original sampling point in the input feature map; h_pk and w_pk are the height and width coordinates of p_k; Δw_k and Δh_k are the offsets corresponding to p_k; x denotes the feature at position p_k; and α is a scaling parameter whose purpose is to enforce the proportionality constraint between Δw_k and Δh_k. The sampling coordinate of p_k is p + p_k + offset_k(Δw_k, Δh_k). Because DCV2 generates Δw_k and Δh_k by convolution without restriction, the coordinates can exceed the boundary. The C-DCV of this application therefore imposes a constraint on the ratio between h_pk and w_pk, pushing it toward the dropper's height-to-width ratio, while limiting h_pk and w_pk to the range of the input feature map.
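A minimal numpy sketch of the constraint step just described. Since formula (1) itself is not reproduced in the text, the particular ratio bound |Δh| ≤ α(|Δw| + 1) and the value of α below are assumptions for illustration; only the two stated ideas are kept: bias the learned offsets toward a tall-and-narrow aspect ratio, and clamp the sampled coordinates to the feature map.

```python
import numpy as np

def constrain_offsets(dh, dw, H, W, hp, wp, alpha=4.0):
    """Apply C-DCV-style constraints to raw learned offsets.

    dh, dw : raw offsets from the offset-prediction convolution
    H, W   : height/width of the input feature map
    hp, wp : integer grid coordinates of the sampling points
    alpha  : scaling parameter enforcing the tall-and-narrow prior;
             vertical offsets may be ~alpha times the horizontal ones
             (both the bound's form and alpha's value are illustrative).
    """
    dh = np.asarray(dh, dtype=float)
    dw = np.asarray(dw, dtype=float)
    # proportionality constraint between the offsets: |dh| <= alpha*(|dw| + 1)
    bound = alpha * (np.abs(dw) + 1.0)
    dh = np.clip(dh, -bound, bound)
    # sampled positions must stay inside the feature map
    h = np.clip(hp + dh, 0.0, H - 1.0)
    w = np.clip(wp + dw, 0.0, W - 1.0)
    return h, w
```

For example, a raw vertical offset of 50 at grid point (10, 10) on a 32x32 map is first reduced by the ratio bound and then kept inside the map.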

Meanwhile, offset_k(Δw_k, Δh_k) is generated by convolution and Δw_k, Δh_k may be fractional, so the coordinate p + p_k + offset_k(Δw_k, Δh_k) is generally not an integer. Bilinear interpolation is therefore needed: the fractional coordinate is decomposed onto the four neighboring integer coordinate points in the image, the feature values at those integer coordinates are fetched, and the result is computed from them. The procedure is as follows:

The principle of bilinear interpolation is shown in Figure 8. The trained coordinate P_k(X, Y) is not an integer; its four neighboring integer points in the figure are Q11(x1, y1), Q12(x1, y2), Q21(x2, y1), and Q22(x2, y2), with corresponding feature values f(Q11), f(Q12), f(Q21), and f(Q22). The value at P is computed by bilinear interpolation, whose principle follows the prior art (http://www.cnblogs.com/yssongest/p/5303151.html).

First, linear interpolation is performed in the x direction, giving formula (2):

Then linear interpolation is performed in the y direction, giving formula (3):

Combining the two yields the final bilinear interpolation result, formula (4):

Since image bilinear interpolation uses only the four adjacent points, the denominators in formula (4) are all 1. The computation of P_k(X, Y) is then given by formula (5):
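The interpolation steps of formulas (2) through (5) can be written out directly. A sketch over a single-channel feature map, with the adjacent-pixel denominators of 1 reflected in the weights:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Sample feature map `feat` (H x W) at a fractional coordinate (y, x)
    using its four neighbouring integer points Q11..Q22."""
    H, W = feat.shape
    x1, y1 = int(np.floor(x)), int(np.floor(y))
    x2, y2 = min(x1 + 1, W - 1), min(y1 + 1, H - 1)
    # interpolation weights; x2 - x1 = y2 - y1 = 1 for adjacent pixels,
    # so no division is needed
    wx2, wy2 = x - x1, y - y1
    wx1, wy1 = 1.0 - wx2, 1.0 - wy2
    # x-direction interpolation (formula 2), then y-direction (formula 3),
    # combined into one weighted sum (formulas 4-5)
    return (feat[y1, x1] * wx1 * wy1 + feat[y1, x2] * wx2 * wy1
            + feat[y2, x1] * wx1 * wy2 + feat[y2, x2] * wx2 * wy2)
```

For instance, sampling the 2x2 map [[0, 1], [2, 3]] at its center (0.5, 0.5) returns the mean 1.5.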

Further, in one embodiment, improving the FasterRCNN network with the multi-block cross-fusion hybrid network, and training and validating the improved network on the dropper defect sample set to obtain the trained improved FasterRCNN model, specifically includes:

replacing the backbone architecture of the FasterRCNN network with that of the multi-block cross-fusion hybrid network, to obtain the improved FasterRCNN model;

training the improved FasterRCNN model on the training samples of the dropper defect sample set, and, after training, validating the model on the validation samples to obtain the trained and validated improved FasterRCNN model.

In practice, dropper defect recognition demands not only high recall and high precision but also real-time image inference of no less than 12 fps. Many existing algorithms meet the real-time requirement but fall short in recall and precision, seriously affecting safe high-speed rail operation. To address this, this application proposes a new model combining CNN and transformer for dropper defect recognition. It first proposes a high-performance, high-efficiency multi-head attention that improves the transformer's computational efficiency with fewer weight parameters, and then constructs a new paradigm for fusing CNN and transformer.

The processing is as follows:

1. High-performance and high-efficiency multi-head self-attention (HE-MHSA):

Self-attention is an attention mechanism based on scaled dot products. It projects the input vectors of the original sequence into three different spaces, as query, key, and value, corresponding to the query vector Q, key vector K, and value vector V; every input in the sequence attends over the entire sequence, including itself.

With the original transformer's self-attention, computational cost and memory consumption grow quadratically as input image resolution increases. Many works reduce the cost of self-attention by lowering the spatial resolution of the input K and V vectors, but these operations lose important feature information and may even introduce new noise, weakening the representational power of the subsequent MHSA.

To address these problems, we propose HE-MHSA, which reduces computation and memory overhead without degrading the model's representational power. The implementation is as follows. First, we construct a new spatial dimension-reduction operation, shown in formula (6), whose purpose is to reduce the dimensionality of the vectors K and V; the proposed operation neither loses feature information nor introduces new noise.

Here SR(.), AVG(.), and DW(.) denote the conventional spatial reduction operation, average pooling, and deformable convolution, respectively. These three operations are applied to the key vector K, yielding the reduced vectors K_sr, K_avg, and K_dw. Applying the same three reductions to the value vector V yields V_sr, V_avg, and V_dw. The reduced key vectors are then summed into a low-resolution vector K*, i.e. K_sr + K_avg + K_dw → K*, and likewise the reduced value vectors are summed into a low-resolution vector V*, i.e. V_sr + V_avg + V_dw → V*.

Next, z, K*, and V* are fed into the MHSA to obtain the feature Z, as shown in formula (7):

where z is the feature image serving as the query input of the transformer block, the low-resolution vectors K* and V* serve as the key and value inputs respectively, and the output feature Z is produced by the transformer block.
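The reduction-and-attention flow of formulas (6)-(7) can be sketched for a single head in numpy. This is only an illustration of the data flow: SR(.) is stood in for by plain strided subsampling and DW(.) by max pooling, since a real deformable convolution needs learned offsets; both stand-ins, and the single-head simplification, are assumptions rather than the patent's implementation.

```python
import numpy as np

def pool(x, r, op):
    """Reduce the sequence length of x (n, d) by factor r with pooling op."""
    n, d = x.shape
    x = x[: (n // r) * r].reshape(n // r, r, d)
    return op(x, axis=1)

def he_mhsa_single_head(z, K, V, r=2):
    """Single-head sketch of HE-MHSA: three parallel reductions of K and V
    are summed into low-resolution K*, V*, then attended against query z."""
    d = z.shape[1]
    K_sr, V_sr = K[::r], V[::r]                              # SR(.) stand-in
    K_avg, V_avg = pool(K, r, np.mean), pool(V, r, np.mean)  # AVG(.)
    K_dw, V_dw = pool(K, r, np.max), pool(V, r, np.max)      # DW(.) stand-in
    K_star = K_sr + K_avg + K_dw                             # formula (6): K*
    V_star = V_sr + V_avg + V_dw                             # formula (6): V*
    scores = z @ K_star.T / np.sqrt(d)                       # (n, n/r) scores
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn = scores / scores.sum(axis=1, keepdims=True)        # row-wise softmax
    return attn @ V_star                                     # formula (7): Z
```

Note the attention matrix is n x (n/r) rather than n x n, which is where the computation and memory savings come from.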

2. A new paradigm for CNN-ViT fusion

The backbone of FasterRCNN consists of four stages, as shown in part a6 of Figure 9, each made up of multiple stacked CNN blocks. Traditional CNN-ViT fusion networks are shown in parts b6, c6, d6, e6, and f6 of Figure 9. In the networks of parts b6, c6, and d6, the last one or several stages of the FasterRCNN backbone are replaced with multiple stacked transformer blocks to build a new detector. Although these approaches markedly improve image recognition, they increase computational complexity and weight parameters, causing inference latency too high to meet the real-time recognition requirements on high-speed trains. The networks of parts e6 and f6 make a trade-off between latency and accuracy, but cannot meet the high-recall, high-precision requirements of dropper recognition.

As shown in part g6 of Figure 9, to address these problems this application proposes a new high-precision, high-efficiency CNN-ViT fusion paradigm, called the multi-block cross-fusion hybrid network (MCHN), which replaces the (CB x N + TB x 1) x L scheme of the existing Next-ViT with (CB x N_C + TB x N_T) x L.

The main differences between the MCHN of this application and Next-ViT are threefold:

1) The number of transformer blocks in each stage is variable rather than fixed at 1, which further helps capture global features and long-range dependencies;

2) In the CNN block of MCHN, the standard deformable convolution is replaced with the constraint-based deformable convolution;

3) The transformer block adopts the high-performance, high-efficiency multi-head self-attention mechanism in place of the original transformer's MHSA.

With these improvements, the MCHN model built in this application has two advantages. 1) It satisfies high precision and high recall while meeting real-time inference requirements at high speed; moreover, according to the requirements of dropper detection, it strikes a better balance between accuracy and efficiency for the actual scenario. 2) MCHN is compatible with all traditional schemes, each of which is a special case of the structure shown in part g6 of Figure 9. For example, when N_T = 1, MCHN degenerates to the network of part e6 of Figure 9. Furthermore, in stage2 through stage4, the allocation ratio of CNN blocks to transformer blocks in MCHN is flexible and can be configured according to the needs of the actual scenario.

By fixing the total number of modules in stage1, stage2, and stage4 of Figure 9, this application studies the appropriate allocation ratio of CNN blocks to transformer blocks, so as to further optimize model performance. For a fair comparison, all models use approximately equal block counts in the same stage. Extensive experiments show that MCHN outperforms Next-ViT in both dropper recognition and inference speed. In the MCHN model, the optimal ratio of CNN blocks to transformer blocks is 7:2, giving better performance and lower latency. The network structure formed by cross-fusing CB and TB blocks at this optimal ratio is shown in Figure 10: the modules are arranged in groups of 7 CB blocks followed by 2 TB blocks, and the groups are connected in series to form one stage.
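The (CB x N_C + TB x N_T) x L arrangement above reduces to a simple repetition pattern; a sketch with a hypothetical helper name, defaulting to the 7:2 ratio:

```python
def build_stage(num_groups, n_c=7, n_t=2):
    """Arrange one backbone stage as `num_groups` serial groups of
    n_c CNN blocks (CB) followed by n_t transformer blocks (TB),
    i.e. the (CB x N_C + TB x N_T) x L pattern with the 7:2 default."""
    blocks = []
    for _ in range(num_groups):
        blocks += ["CB"] * n_c + ["TB"] * n_t
    return blocks
```

Setting n_t = 1 recovers the (CB x N + TB x 1) x L layout of Next-ViT, illustrating that the traditional schemes are special cases of this pattern.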

Specifically, as shown in Figure 11, the CB block provided by this application consists of a 3×3 C-DCV layer, a Batch Norm layer, and a Dy-ReLU layer connected in sequence. As shown in Figure 12, the TB block consists of an HE-MHSA layer and an FFN layer connected in sequence.

To further validate the proposed method, its performance is verified experimentally in a corresponding experimental environment, as follows:

(1) Experimental environment

About 14,300 sample images were accumulated, covering seven categories including normal droppers, broken droppers, detached droppers, bent droppers, loose-strand droppers, and improperly installed droppers. The number of samples per category is shown in Table 1:

Table 1. Number of dropper samples per category

The training set accounts for 80% and the test set for 20%. The model uses focal loss to handle hard examples and class imbalance.

Because the samples are imbalanced, the model tends to favor the categories with more samples. Different augmentation strategies are therefore used per category: categories with many samples receive 20% additional samples generated with L-cutout, to improve occlusion recognition; categories with few samples combine traditional augmentation (e.g., rotation, flip) at a 1:1 ratio with 20% L-cutout augmentation.

(2) Comparison of image-enhancement methods (cutout, mixup, cutmix, and L-cutout);

Experimental setup: the dropper dataset is used for training and validation, and FasterRCNN is the experimental network. The L-cutout augmentation is replaced in turn by cutout, mixup, and cutmix, and the samples augmented by each method are trained on the same model. The evaluation results are shown in Table 2:

Table 2. Comparison of the effects of the augmentation algorithms

Table 2 shows that different augmentation methods yield different results: L-cutout is clearly better than cutout, mixup, and cutmix. The main reason is that cutout, mixup, and cutmix augment at random and may occlude or overwrite the dropper's key features, reducing recognition accuracy, whereas L-cutout avoids losing the key dropper information during augmentation.

(3) Comparison of DCV2 and C-DCV

We compare the effect of DCV2 and C-DCV on dropper defect recognition. DCV2 and C-DCV are each substituted for the 3x3 convolutional networks in stage2, stage3, and stage4 of the backbone; the two resulting networks are trained, and their recall and precision are evaluated. The comparison of DCV2 and C-DCV is shown in Table 3:

Table 3. Comparison of the sampling effects of DCV2 and C-DCV

Table 3 shows that although C-DCV and DCV2 are equal in inference efficiency, C-DCV achieves higher precision and recall for dropper recognition. This is mainly because C-DCV leans toward the inherent characteristics of the dropper during sampling: the learned offsets obey the height-width ratio constraint, which improves the model's ability to identify droppers.
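The two constraints claim 4 describes for C-DCV — keeping sampled coordinates inside the feature map and constraining the height-width spread of the learned offsets toward the dropper's tall, thin shape — might be sketched as a post-processing step on the offset tensor. The `hw_ratio` value and the rescaling rule here are assumptions for illustration; the patent does not give the exact constraint formula.

```python
import numpy as np

def constrain_offsets(offsets, feat_h, feat_w, hw_ratio=4.0):
    """Sketch of the C-DCV offset constraint.

    offsets: array of shape (K, 2) holding (dy, dx) per sampling point.
    (1) Rescale horizontal offsets so the vertical/horizontal spread
        respects the assumed dropper height:width ratio.
    (2) Clip coordinates so samples stay within the feature map.
    """
    dy, dx = offsets[:, 0], offsets[:, 1]
    spread_y = np.abs(dy).max() + 1e-6
    spread_x = np.abs(dx).max() + 1e-6
    if spread_y / spread_x < hw_ratio:
        # shrink horizontal offsets to enforce the height:width constraint
        dx = dx * (spread_y / (hw_ratio * spread_x))
    return np.stack([np.clip(dy, -feat_h + 1, feat_h - 1),
                     np.clip(dx, -feat_w + 1, feat_w - 1)], axis=1)
```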

(4) Effect of the new fusion paradigm on the model:

For a fair comparison, all models use approximately equal numbers of blocks in the same stages: the block counts of backbone stages 1, 2, and 4 are fixed at 3, 4, and 3 respectively, while we vary the number of stage-3 blocks and the allocation ratio of CNN blocks to Transformer blocks to compare MCHN with traditional hybrid networks. The models are pre-trained on ImageNet-22K for 300 epochs and then fine-tuned and evaluated on the dropper samples.

Under the same test environment, MCHN is compared with traditional hybrid networks, with the blocks of stages 1, 2, and 4 of the backbone network kept equal.

Table 4. Comparison of the recognition performance of MCHN and traditional hybrid networks

From the experimental results in Table 4, compared with traditional hybrid networks, it is easy to see that monotonically replacing all of the CNN blocks in stage 3 with Transformer blocks brings no significant improvement in recognition while increasing inference latency. Interleaving CNN blocks and Transformer blocks within the stage improves performance without a significant increase in inference latency, but when the allocation ratio of CNN to Transformer blocks in stage 3 is not optimal, the fusion effect cannot be fully realized. MCHN optimizes this allocation: a 7:2 ratio brings out the advantages of the fusion, greatly improving recognition while keeping inference latency small.
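The stage-3 layout with the 7:2 CNN:Transformer ratio can be sketched as below. The evenly-spaced interleaving pattern is an assumption for illustration; the excerpt reports the ratio but not the exact block ordering inside the stage.

```python
def build_stage3(cnn_block, transformer_block, cnn_n=7, trans_n=2):
    """Assemble a stage-3 block list with cnn_n CNN blocks and trans_n
    Transformer blocks interleaved at evenly spaced positions, giving
    the 7:2 cross-fusion allocation reported in Table 4."""
    blocks = [cnn_block() for _ in range(cnn_n)]
    step = (cnn_n + trans_n) // (trans_n + 1)
    for i in range(trans_n):
        # insert each transformer block part-way through the CNN sequence
        blocks.insert((i + 1) * step + i, transformer_block())
    return blocks
```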

This application deeply cross-fuses CNN and Transformer, and the resulting multi-block cross-hybrid network remains compatible with existing fusion modes. Targeting the inherent characteristics of droppers, this application further proposes a constraint-based variable convolution and a restriction-based L-cutout data augmentation algorithm. Extensive experiments show that the multi-block cross-hybrid network constructed in this application not only greatly improves the defect recall and precision of dropper recognition in complex application scenarios, but also keeps recognition latency small.

The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited by the above embodiments; the above embodiments and the description merely illustrate the principle of the invention. Without departing from the spirit and scope of the invention, various changes and improvements are possible, all of which fall within the scope of the claimed invention. The scope of protection is defined by the appended claims and their equivalents.

Claims (6)

1. A catenary dropper defect detection method based on CNN and Transformer fusion, characterized by comprising the following steps:
collecting a dropper image of the overhead line system and carrying out image enhancement processing on the dropper image to obtain a dropper defect sample set;
constructing a convolution module by using a constraint-based variable convolution network, constructing a self-attention module according to an improved high-efficiency multi-head self-attention mechanism, and performing deep fusion on the convolution module and the self-attention module based on an optimal module allocation proportion to generate a multi-block cross-mixed network;
improving the FasterRCNN network by utilizing the multi-block cross-mixed network, and training and verifying the improved FasterRCNN network based on the dropper defect sample set to obtain a trained FasterRCNN improved model;
and deploying the FasterRCNN improved model to a dropper detection device, capturing a dropper image of the overhead line system of the high-speed railway in real time, inputting the dropper image into the dropper detection device for dropper defect detection, and identifying the dropper defect in the overhead line system.
2. The method for detecting the catenary dropper defects based on the fusion of the CNN and the Transformer according to claim 1, wherein the steps of collecting the catenary dropper images and performing image enhancement processing on the dropper images to obtain a dropper defect sample set comprise:
acquiring a catenary dropper image in the running process of a high-speed railway;
performing image enhancement processing on the contact network dropper images based on the improved image enhancement algorithm, randomly generating N mask areas in any dropper image, and filtering mask areas for shielding the key features of the droppers according to the number N of the mask areas to obtain a dropper defect sample set;
the set of dropper defect samples is divided into training samples and verification samples.
3. The method for detecting the defects of the droppers of the overhead line system based on the fusion of the CNN and the Transformer according to claim 2, wherein the mask areas which shield the key features of the droppers are filtered according to the number N of the mask areas, specifically comprising the following steps:
if N=1, filtering out a single mask area which completely shields the dropper in the dropper image;
if N=2, filtering out two mask areas which respectively shield the upper end and the lower end of the dropper in the dropper image;
if N is more than or equal to 3, filtering out a single mask area which completely shields the dropper in the dropper image, and filtering out any two mask areas which shield the upper end and the lower end of the dropper in the dropper image.
4. The method for detecting the catenary dropper defect based on the fusion of the CNN and the Transformer according to claim 1, wherein the constructing the convolution module by using the constraint-based variable convolution network, constructing the self-attention module according to the improved high-efficiency multi-head self-attention mechanism, and performing deep fusion on the convolution module and the self-attention module based on the optimal module allocation proportion to generate the multi-block cross-mixed network specifically comprises:
according to the height-width ratio of the dropper, adopting a constraint relation to constrain the height-width coordinate ratio of the sampling point positions in the variable convolution network, while limiting the height-width coordinates of the sampling point positions so as not to exceed the height and width of the input feature map, obtaining a constraint-based variable convolution network, and constructing a convolution module through the constraint-based variable convolution network;
performing space dimension reduction operations on the key vector K and the value vector V in the original Transformer self-attention mechanism respectively to obtain an improved high-efficiency multi-head self-attention mechanism, and constructing a self-attention module according to the improved high-efficiency multi-head self-attention mechanism;
based on the backbone network architecture in the FasterRCNN model, distributing the number of convolution modules and the number of self-attention modules in the backbone network architecture according to the optimal module distribution proportion, and performing new-paradigm fusion on the convolution modules and the self-attention modules to generate the multi-block cross-mixed network.
5. The method for detecting the catenary dropper defect based on the fusion of the CNN and the Transformer according to claim 4, wherein the optimal module distribution ratio is specifically: number of convolution modules : number of self-attention modules = 7:2.
6. The method for detecting the dropper defects of the catenary based on the fusion of the CNN and the Transformer according to claim 1, wherein the improving of the FasterRCNN network by using the multi-block cross-mixed network, and the training and verifying of the improved FasterRCNN network based on the dropper defect sample set to obtain a trained FasterRCNN improved model, comprises the following steps:
replacing the backbone network architecture of the FasterRCNN network with the network architecture of the multi-block cross-hybrid network to obtain an improved FasterRCNN model;
training the improved FasterRCNN model based on the training samples in the dropper defect sample set, and verifying the model with the verification samples after training is completed, so as to obtain a trained and verified FasterRCNN improved model.
CN202310414019.XA 2023-04-18 2023-04-18 Catenary dropper string defect detection method based on CNN and Transformer fusion Active CN116503799B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310414019.XA CN116503799B (en) 2023-04-18 2023-04-18 Catenary dropper string defect detection method based on CNN and Transformer fusion


Publications (2)

Publication Number Publication Date
CN116503799A true CN116503799A (en) 2023-07-28
CN116503799B CN116503799B (en) 2025-08-12

Family

ID=87324077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310414019.XA Active CN116503799B (en) 2023-04-18 2023-04-18 Catenary dropper string defect detection method based on CNN and Transformer fusion

Country Status (1)

Country Link
CN (1) CN116503799B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118070107A (en) * 2024-04-18 2024-05-24 南京信息工程大学 Deep learning-oriented network anomaly detection method, device, storage medium and equipment
CN118135334A (en) * 2024-04-30 2024-06-04 华东交通大学 A method and system for identifying defects of overhead wire suspension strings

Citations (3)

Publication number Priority date Publication date Assignee Title
CN112818806A (en) * 2021-01-27 2021-05-18 国网四川省电力公司电力科学研究院 Transformer substation inspection robot auxiliary navigation method based on deep learning
CN114529839A (en) * 2022-02-09 2022-05-24 国网河南省电力公司电力科学研究院 Unmanned aerial vehicle routing inspection-oriented power transmission line hardware anomaly detection method and system
CN114821368A (en) * 2022-05-05 2022-07-29 合肥工业大学 Power defect detection method based on reinforcement learning and Transformer

Non-Patent Citations (2)

Title
JIN HE et al., "C2T-HR3D: Cross-fusion of CNN and Transformer for High-speed Railway Dropper Defect Detection", IEEE Transactions on Instrumentation and Measurement (early access), 10 February 2025, pages 1-17 *
BIAN Jianpeng et al., "Fault detection of catenary droppers based on EfficientDet and Vision Transformer", Journal of Railway Science and Engineering, 24 November 2022, pages 1-10 *

Also Published As

Publication number Publication date
CN116503799B (en) 2025-08-12

Similar Documents

Publication Publication Date Title
CN116883801B (en) YOLOv8 target detection method based on attention mechanism and multi-scale feature fusion
CN116503799A (en) Contact net dropper defect detection method based on CNN and Transformer fusion
CN112232240B (en) Road casting object detection and identification method based on optimized cross-over ratio function
CN112633149B (en) Domain-adaptive foggy-day image target detection method and device
CN110378221A (en) A kind of power grid wire clamp detects and defect identification method and device automatically
CN111598855B (en) 2C equipment high-speed rail contact net dropper defect detection method based on deep learning and transfer learning
CN102508110A (en) Texture-based insulator fault diagnostic method
CN117593707B (en) Vehicle identification method and device
CN114581422A (en) Catenary contact line anomaly detection method and system based on image processing
CN108664997A (en) High iron catenary equipotential line defective mode detection method based on cascade Faster R-CNN
CN118365864A (en) A lightweight low-light target detection method
CN110276747A (en) An Image Analysis Based Insulator Fault Detection and Fault Rating Method
CN118736403A (en) SAR target detection method based on fine-grained feature fusion attention mechanism
He et al. Cnn-transformer bridge mode for detecting arcing horn defects in railway sectional insulator
CN118735881A (en) A weld defect detection method based on MSFCNet model
CN112926556B (en) Semantic segmentation-based aerial photography transmission line broken strand identification method and system
CN118447233A (en) Infrared target detection method based on dynamic upsampling and multidimensional attention mechanism
Tao et al. F-PVNet: Frustum-level 3-D object detection on point–Voxel feature representation for autonomous driving
CN118172541A (en) LLC-YOLO algorithm-based vehicle target detection method
CN117853803A (en) Small sample motor car anomaly detection method and system based on feature enhancement and communication network
CN120431513B (en) Intelligent identification system for overhead wire suspension defects
CN110618129A (en) Automatic power grid wire clamp detection and defect identification method and device
CN111723793B (en) A real-time rigid catenary positioning point identification method
CN114882498A (en) Picking robot-oriented method for identifying shielded and overlapped fruits
CN114419587A (en) Method for identifying vehicles at night based on cycleGAN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant