CN117058367A - Method and device for semantic segmentation of buildings in high-resolution remote sensing images

Info

Publication number: CN117058367A
Application number: CN202310811719.2A
Authority: CN
Inventors: 陈俊; 陈经纬; 李宇; 夏玮; 张洪群; 吴业炜
Original assignee: Aerospace Information Research Institute of CAS
Current assignee: Aerospace Information Research Institute of CAS
Priority date: 2023-07-04
Filing date: 2023-07-04
Publication date: 2023-11-14

Abstract

The invention provides a method and device for semantic segmentation of buildings in high-resolution remote sensing images. The method comprises: preprocessing a remote sensing image to obtain a target image; and inputting the target image into a segmentation model to obtain the building segmentation result output by the model. A first feature extraction layer with a pyramid structure extracts a refined feature map and a mid-level feature map that fuses features of multiple scales; convolution operations of different scales are executed in parallel on the reduced-size refined feature map, and the resulting features, unified to the same scale, are fused into a deep feature map; the image obtained by fusing the feature maps of the three semantic levels is then restored into the building segmentation result. The convolutional receptive field can thus be enlarged layer by layer without loss of information, so that the output of each convolution contains semantic information over a larger range, which improves the fineness of the semantic segmentation and allows buildings to be identified accurately.

Description

Method and device for semantic segmentation of buildings in high-resolution remote sensing images

Technical field

The invention relates to the technical field of remote sensing processing, and in particular to a method and device for semantic segmentation of buildings in high-resolution remote sensing images.

Background art

As indispensable places for production and daily life, buildings are one of the basic elements of geographic information. Accurate building information has important application value in many fields, such as urban planning, map production and updating, environmental monitoring, and the construction of "digital cities" and "smart cities". Building extraction therefore plays a very important role in high-resolution remote sensing Earth observation applications and has long been a research focus in remote sensing information processing. Compared with natural scene images, high-resolution remote sensing images cover large areas and are acquired by different imaging methods, so they often have complex backgrounds and many types of ground objects, which makes building extraction difficult. In addition, compared with natural ground objects, buildings, as typical man-made objects, have more complex and diverse features and are affected by shadows and occlusion, so the results obtained with low-level feature extraction or image segmentation methods are often unsatisfactory. It is therefore very important to develop a high-precision, automated method for extracting buildings from high-resolution remote sensing images.

Methods for extracting building information from remote sensing images continue to emerge and, according to how their features are constructed, fall into two categories: traditional methods and deep-learning-based methods. Traditional methods extract buildings with manually designed features based on the spectral, edge, shape and shadow characteristics of the ground objects in the image. Traditional automatic building extraction methods perform well on remote sensing images of lower resolution and complexity, but on high-resolution remote sensing images they show poor robustness, low accuracy, susceptibility to human interference and difficulty in precise pixel-by-pixel localization. In recent years, with the development of artificial intelligence and machine learning, deep-learning-based methods for automatically extracting buildings from high-resolution remote sensing images have received wide attention. Compared with traditional methods, they offer strong extraction ability, little human intervention and high robustness. However, the repeated consecutive downsampling operations in existing deep-learning-based methods lose spatial information, so that the feature maps produced by the model cannot capture global information and the accuracy of building extraction is limited. Although residual connections between deep and shallow features have been proposed to enhance feature expressiveness, skip connections may introduce redundant information and thereby reduce extraction accuracy. It can be seen that neither traditional methods nor deep-learning-based methods achieve both accuracy and efficiency in building segmentation and extraction.

Summary of the invention

The present invention provides a method and device for semantic segmentation of buildings in high-resolution remote sensing images, to overcome the defect in the prior art that the accuracy and efficiency of building segmentation and extraction cannot both be achieved.

The present invention provides a method for semantic segmentation of buildings in high-resolution remote sensing images, including:

preprocessing a remote sensing image to obtain a target image, wherein the observation content of the remote sensing image includes at least buildings;

inputting the target image into a segmentation model to obtain the building segmentation result output by the segmentation model;

wherein the segmentation model is trained on sample remote sensing images and the building area labels annotated for those sample images, and the segmentation model includes:

a first feature extraction layer, configured to downsample the target image step by step to obtain a refined feature map, and then to upsample the refined feature map step by step and fuse the results to obtain a mid-level feature map;

a second feature extraction layer, configured to perform convolution operations of different scales on the refined feature map and to obtain a deep feature map fused from features unified to the same scale;

a feature fusion layer, configured to fuse a low-level feature map, the mid-level feature map and the high-level (deep) feature map to obtain the building segmentation result segmented from the target image;

wherein the first feature extraction layer includes a plurality of downsampling sublayers cascaded from bottom to top, and upsampling sublayers of multiple levels cascaded from top to bottom in correspondence with the downsampling sublayers; the low-level feature map is obtained by feature extraction on the target image in the first downsampling sublayer counted from the bottom.

According to the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention, downsampling the target image step by step to obtain the refined feature map, and then upsampling the refined feature map step by step and fusing the results to obtain the mid-level feature map, includes:

performing, by means of an attention mechanism, convolution and pooling operations of the corresponding scale in each downsampling sublayer in turn to obtain a target feature map;

upsampling the target feature map step by step to the original size corresponding to each downsampling sublayer and then performing feature fusion to obtain the mid-level feature map;

performing dilated convolution and pooling operations on the target feature map to obtain the refined feature map;

wherein the refined feature map and the target feature map have the same two-dimensional size, and the refined feature map has more channels than the target feature map.

According to the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention, each downsampling sublayer has a Bottleneck structure.

According to the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention, performing convolution operations of different scales on the refined feature map and obtaining the deep feature map fused from features of the same scale includes:

performing separately on the refined feature map a 1*1 convolution operation, convolution operations with n convolution kernels whose dilation rates increase in arithmetic progression, and a global average pooling operation, to obtain n+2 feature maps with the same two-dimensional size and the same number of channels;

fusing the n+2 feature maps with the same two-dimensional size and number of channels into the deep feature map.
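The parallel branches described above can be sketched as simple bookkeeping. The following is a minimal illustration, not the patented implementation: the 3x3 kernel size, the first dilation rate of 6 and the step of 6 are assumptions chosen for the example, since the text only requires that the n dilation rates form an arithmetic progression. The span formula k + (k - 1)(d - 1) is the standard one for a dilated kernel.

```python
def dilation_rates(n: int, first: int = 6, step: int = 6) -> list[int]:
    """Return n dilation rates in arithmetic progression (first and step are assumed)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [first + i * step for i in range(n)]


def effective_kernel_size(kernel: int, dilation: int) -> int:
    """Number of pixels spanned along one axis by a dilated kernel."""
    return kernel + (kernel - 1) * (dilation - 1)


def branch_summary(n: int, kernel: int = 3) -> list[tuple[str, int]]:
    """List the n + 2 parallel branches and the span each one covers.

    The global-average-pooling branch covers the whole map; it is marked with -1.
    """
    branches = [("1x1 conv", 1)]
    for rate in dilation_rates(n):
        name = f"{kernel}x{kernel} conv, dilation {rate}"
        branches.append((name, effective_kernel_size(kernel, rate)))
    branches.append(("global average pooling", -1))
    return branches


if __name__ == "__main__":
    for name, span in branch_summary(3):
        print(name, span)
```

With n = 3 this gives five branches, and the dilated branches see progressively wider context while keeping the same number of weights per kernel.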

According to the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention, each of the n convolution kernels whose dilation rates increase in arithmetic progression is formed by combining a depthwise convolution kernel with a standard convolution kernel.
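To illustrate why a depthwise kernel might be combined with a standard kernel, the sketch below compares weight counts. It assumes that the combination resembles a depthwise separable convolution (a k x k depthwise kernel followed by a 1x1 standard kernel); the text does not spell out the exact arrangement, so this is an interpretation, and the channel counts are hypothetical. Biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise kernel plus a 1x1 standard kernel (no bias)."""
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 256, 256  # hypothetical sizes
    print(standard_conv_params(k, c_in, c_out))
    print(separable_conv_params(k, c_in, c_out))
```

Under these assumptions the separable form needs roughly an order of magnitude fewer weights, which is consistent with the stated aim of balancing accuracy and efficiency.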

According to the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention, fusing the low-level feature map, the mid-level feature map and the high-level feature map to obtain the building segmentation result segmented from the target image includes:

upsampling the high-level feature map and then concatenating and fusing it with the mid-level feature map to obtain a first feature fusion map;

concatenating and fusing the first feature fusion map, after its two-dimensional size and number of channels have been made consistent with those of the low-level feature map, with the low-level feature map to obtain a second feature fusion map;

restoring the two-dimensional size and number of channels of the second feature fusion map to those of the target image to obtain the building segmentation result.
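The three fusion steps can be followed as shape bookkeeping on (channels, height, width) tuples. This is a sketch under stated assumptions: the helper names are hypothetical, channel changes are assumed to be done by a 1x1 convolution, and the example sizes are invented for illustration. No pixel data is processed.

```python
Shape = tuple[int, int, int]  # (channels, height, width)


def upsample_to(shape: Shape, height: int, width: int) -> Shape:
    """Resize the spatial dimensions; the channel count is unchanged."""
    return (shape[0], height, width)


def project(shape: Shape, channels: int) -> Shape:
    """Change only the channel count (assumed here to be a 1x1 convolution)."""
    return (channels, shape[1], shape[2])


def concat(a: Shape, b: Shape) -> Shape:
    """Concatenate along the channel axis; spatial sizes must already match."""
    if a[1:] != b[1:]:
        raise ValueError(f"spatial sizes differ: {a} vs {b}")
    return (a[0] + b[0], a[1], a[2])


def fuse(low: Shape, mid: Shape, high: Shape, target: Shape) -> list[Shape]:
    """Return the shapes of the first fusion map, the second fusion map and the result."""
    # Step 1: upsample the high-level map and concatenate it with the mid-level map.
    first = concat(upsample_to(high, mid[1], mid[2]), mid)
    # Step 2: align size and channels with the low-level map, then concatenate.
    aligned = project(upsample_to(first, low[1], low[2]), low[0])
    second = concat(aligned, low)
    # Step 3: restore the size and channel count of the target image.
    result = project(upsample_to(second, target[1], target[2]), target[0])
    return [first, second, result]


if __name__ == "__main__":
    shapes = fuse(low=(64, 128, 128), mid=(128, 64, 64),
                  high=(256, 16, 16), target=(4, 256, 256))
    print(shapes)
```

The concatenation check makes the ordering constraint explicit: spatial sizes have to be matched before each fusion.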

According to the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention, preprocessing the remote sensing image to obtain the target image includes:

cropping the remote sensing image into sub-images of equal size and applying preprocessing operations such as random rotation, perspective transformation, brightness transformation and color transformation to obtain the target image.
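The cropping step can be sketched as computing the top-left corners of equally sized tiles. The text does not say how image edges are handled when the image size is not a multiple of the tile size; the sketch below assumes the last tile in each direction is shifted back so that it still fits inside the image, which is only one possible choice. The function name and the `stride` parameter are hypothetical.

```python
from typing import Optional


def tile_origins(height: int, width: int, tile: int,
                 stride: Optional[int] = None) -> list[tuple[int, int]]:
    """Top-left (row, col) corners of tile x tile windows covering the image.

    Assumption: when the image size is not a multiple of the stride, the last
    window is shifted back to end exactly at the image border.
    """
    stride = stride or tile
    if tile > height or tile > width:
        raise ValueError("tile is larger than the image")

    def axis(length: int) -> list[int]:
        positions = list(range(0, length - tile + 1, stride))
        if positions[-1] != length - tile:
            positions.append(length - tile)  # shifted last window
        return positions

    return [(r, c) for r in axis(height) for c in axis(width)]


if __name__ == "__main__":
    print(tile_origins(1000, 1000, 512))
```

Every window has the same size, so the resulting sub-images can be stacked into batches without padding.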

The invention also provides a device for semantic segmentation of buildings in high-resolution remote sensing images, including:

a preprocessing module, configured to preprocess a remote sensing image to obtain a target image, wherein the observation content of the remote sensing image includes at least buildings;

a segmentation module, configured to input the target image into a segmentation model and obtain the building segmentation result output by the segmentation model.

The present invention also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements any of the methods for semantic segmentation of buildings in high-resolution remote sensing images described above.

The present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements any of the methods for semantic segmentation of buildings in high-resolution remote sensing images described above.

The present invention also provides a computer program product including a computer program. When executed by a processor, the computer program implements any of the methods for semantic segmentation of buildings in high-resolution remote sensing images described above.

In the method and device for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention, a target image is obtained from a multi-channel remote sensing image; a first feature extraction layer with a pyramid structure extracts a refined feature map and a mid-level feature map that fuses features of multiple scales; convolution operations of different scales are executed in parallel on the reduced-size refined feature map, and the features unified to the same scale are fused into a deep feature map; the image obtained by fusing the feature maps of the three semantic levels is then restored into the building segmentation result. The receptive field of the convolutions can be enlarged layer by layer without loss of information, so that the output of each convolution contains semantic information over a larger range, which improves the fineness of the image semantic segmentation and in turn the accuracy of building recognition.

Description of the drawings

In order to explain the technical solutions of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a schematic flow chart of the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention;

Figure 2 is the first schematic diagram of a substructure of the segmentation model provided by the present invention;

Figure 3 is the second schematic diagram of a substructure of the segmentation model provided by the present invention;

Figure 4 is the third schematic diagram of a substructure of the segmentation model provided by the present invention;

Figure 5 is the fourth schematic diagram of a substructure of the segmentation model provided by the present invention;

Figure 6 is a schematic diagram of the overall structure of the segmentation model provided by the present invention;

Figure 7 is the first schematic diagram of the simulation results of the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention;

Figure 8 is the second schematic diagram of the simulation results of the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention;

Figure 9 is the third schematic diagram of the simulation results of the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention;

Figure 10 is a schematic structural diagram of the device for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention;

Figure 11 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed description

In order to make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

The terms "first", "second" and so on in this application are used to distinguish similar objects and do not describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. The objects distinguished by "first", "second" and so on are usually of one type, and their number is not limited; for example, there may be one first object or several.

It should be understood that the terminology used in the description of the present invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this invention, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

Figure 1 is a schematic flow chart of the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention. As shown in Figure 1, the method provided by the embodiment of the present invention includes: Step 101, preprocessing a remote sensing image to obtain a target image.

The observation content of the remote sensing image includes at least buildings.

It should be noted that the method provided by the embodiment of the present invention is executed by a device for semantic segmentation of buildings in high-resolution remote sensing images. The object processed by the device is the remote sensing image.

A remote sensing image records the electromagnetic spectrum information of ground objects in different bands, acquired by remote sensing. It can be split by band into two-dimensional arrays, one per channel, which represent the ground-object information of the respective bands; the pixel values in these arrays are called brightness values (also gray values or DN values).

Specifically, in step 101, the device preprocesses the remote sensing image according to preset requirements to obtain the target image.

The target image is an image that, compared with the original remote sensing image, is as close to reality as possible geometrically and radiometrically. The target image has more than three channels, and the grayscale images of different channels differ in resolution and in width and height. The target image is input to the segmentation model to obtain the corresponding building segmentation result.

Preprocessing refers to the processing applied to the original input remote sensing image before the segmentation model is applied. It removes distorted information from the remote sensing image so that the resulting target image contains useful, true information, enhances the detectability of the relevant information and simplifies the data as far as possible. The embodiment of the present invention does not specifically limit the preprocessing.

For example, the preprocessing may be radiometric correction. For the three kinds of radiometric distortion in the remote sensing image, caused respectively by the characteristics of the sensor itself, the illumination conditions of the ground objects (terrain and solar elevation angle effects) and atmospheric effects, the device performs sensor calibration, atmospheric correction (or solar elevation correction) and terrain correction to obtain the target image.

For example, the preprocessing may also be geometric correction. The device selects control point pairs in the remote sensing image to be corrected and in a reference map with coordinate information, transforms the pixel coordinates, and resamples the pixel brightness values to obtain the target image.

Step 102: Input the target image into the segmentation model and obtain the building segmentation result output by the segmentation model.

The segmentation model is trained on sample remote sensing images and the building area labels annotated for those sample images. The segmentation model includes:

The first feature extraction layer, configured to downsample the target image step by step to obtain a refined feature map, and then to upsample the refined feature map step by step and fuse the results to obtain a mid-level feature map.

The second feature extraction layer, configured to perform convolution operations of different scales on the refined feature map and to obtain a deep feature map fused from features unified to the same scale.

The feature fusion layer, configured to fuse the low-level feature map, the mid-level feature map and the high-level (deep) feature map to obtain the building segmentation result segmented from the target image.

The first feature extraction layer includes a plurality of downsampling sublayers cascaded from bottom to top, and upsampling sublayers of multiple levels cascaded from top to bottom in correspondence with the downsampling sublayers. The low-level feature map is obtained by feature extraction on the target image in the first downsampling sublayer counted from the bottom.

It should be noted that the segmentation model is obtained by training on remote sensing image sample data and predetermined remote sensing image type labels.

The segmentation model may be an artificial intelligence model; the embodiment of the present invention does not specifically limit the model type.

For example, the segmentation model may be a convolutional neural network model. As a data-driven algorithm, a convolutional neural network relies on sampling layers and convolutional layers stacked layer by layer to extract hierarchical features from the data, and uses the back-propagation algorithm to optimize the model parameters iteratively. The structure and parameters of the convolutional neural network model include, but are not limited to, the number of layers and the weight parameters of each layer.

It should be noted that the sample data comprise the remote sensing images and the type labels marked on the pixels that belong to building areas in those images. The remote sensing image sample data are divided into a training set and a test set; the embodiment of the present invention does not specifically limit the ratio between them.

For example, the remote sensing image sample data can be randomly divided into a training set and a test set at a ratio of 8:2.
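A random 8:2 split can be sketched with the Python standard library. The function name, the fixed seed and the use of sample identifiers rather than image data are assumptions made for the example; the text only specifies the ratio and that the split is random.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_ratio: float = 0.8,
                  seed: int = 0) -> tuple[list[T], list[T]]:
    """Shuffle the samples and cut them into a training set and a test set."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = split_samples([f"tile_{i:03d}" for i in range(100)])
    print(len(train), len(test))
```

Fixing the seed keeps the test set stable across runs, so evaluation results stay comparable.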

The training set is used to train the network model and adjust the model parameters; the test set is used to evaluate model performance. Data augmentation strategies such as random flipping and random rotation are also applied to increase the number and complexity of the training samples.
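Random flipping and rotation have to be applied identically to an image and to its building label, otherwise the label no longer matches the pixels. The sketch below shows this on a single-channel grid stored as nested lists; the restriction to horizontal flips and 90-degree rotations, the probability of 0.5 and the function names are assumptions for illustration.

```python
import random

Grid = list[list[int]]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def rotate90(grid: Grid) -> Grid:
    """Rotate the grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def augment(image: Grid, mask: Grid, rng: random.Random) -> tuple[Grid, Grid]:
    """Apply the same random flip and rotation to an image and its label mask."""
    if rng.random() < 0.5:
        image, mask = flip_horizontal(image), flip_horizontal(mask)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        image, mask = rotate90(image), rotate90(mask)
    return image, mask


if __name__ == "__main__":
    image = [[10, 20], [30, 40]]
    mask = [[0, 1], [0, 0]]
    print(augment(image, mask, random.Random(42)))
```

Drawing the random decisions once and reusing them for both inputs is what keeps image and label aligned.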

It should be noted that, during training of the segmentation model, the device groups the input target images into batches with the layout batchsize*channel*height*width, where channel is the number of channels of the target image and height and width are its two-dimensional height and width. In each training iteration, batchsize training samples are fed into the model and their gradients are computed for back-propagation, which improves memory utilization.
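The batch layout can be made concrete by listing the shape of each batch for a given dataset size. This sketch only computes shapes; the function name is hypothetical, and keeping a smaller final batch when the dataset size is not a multiple of batchsize is an assumption, since the text does not say how the remainder is handled.

```python
def batch_shapes(num_samples: int, batch_size: int, channels: int,
                 height: int, width: int) -> list[tuple[int, int, int, int]]:
    """Shapes (batchsize, channel, height, width) of successive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    shapes = []
    for start in range(0, num_samples, batch_size):
        size = min(batch_size, num_samples - start)  # last batch may be smaller
        shapes.append((size, channels, height, width))
    return shapes


if __name__ == "__main__":
    print(batch_shapes(num_samples=10, batch_size=4,
                       channels=4, height=256, width=256))
```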

Specifically, in step 102, the device configures the segmentation model with the trained model parameters and then uses the model to identify and segment the building areas in any target image obtained as in step 101, yielding the building segmentation result corresponding to that target image.

The embodiment of the present invention does not specifically limit the form of the building segmentation result.

For example, the building segmentation result may be a two-dimensional array with the same width and height as the target image. Each value in the array lies in the range [0, 1], where 0 indicates that the pixel does not belong to a building area and 1 indicates that it does; the positions of the pixels in the two-dimensional array show which regions of the image are buildings.
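If the model outputs values between 0 and 1, they can be turned into the 0/1 array described above and the building pixel positions read off directly. The threshold of 0.5 is an assumption, as the text does not state how intermediate values are resolved; the function names are hypothetical.

```python
def to_binary_mask(scores: list[list[float]],
                   threshold: float = 0.5) -> list[list[int]]:
    """Map each score in [0, 1] to 1 (building) or 0 (not building)."""
    return [[1 if value >= threshold else 0 for value in row] for row in scores]


def building_pixels(mask: list[list[int]]) -> list[tuple[int, int]]:
    """Return the (row, col) positions of all pixels labelled as building."""
    return [(r, c)
            for r, row in enumerate(mask)
            for c, value in enumerate(row)
            if value == 1]


if __name__ == "__main__":
    scores = [[0.1, 0.9], [0.6, 0.2]]
    mask = to_binary_mask(scores)
    print(mask, building_pixels(mask))
```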

For example, the building segmentation result may be a statistic computed over all pixels. It may indicate that the target image "contains buildings" and give the number of buildings in it, or that it contains "no buildings", meaning that no complete or partial building is present in the target image.
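One way to derive such a statistic from a 0/1 mask is to count connected groups of building pixels. The text does not say how the number of buildings is computed; treating each 4-connected region as one building is an assumption of this sketch, and touching buildings would be counted as one.

```python
from collections import deque


def count_buildings(mask: list[list[int]]) -> int:
    """Count 4-connected regions of 1-valued pixels with a breadth-first search."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            count += 1  # found a pixel of a region not visited before
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def describe(mask: list[list[int]]) -> str:
    """Summarise the mask as 'no buildings' or as a building count."""
    n = count_buildings(mask)
    return "no buildings" if n == 0 else f"contains buildings: {n}"


if __name__ == "__main__":
    print(describe([[1, 1, 0], [0, 0, 0], [0, 1, 1]]))
```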

本发明实施例对分割模型不作具体限定。The embodiment of the present invention does not specifically limit the segmentation model.

示例性地，高分辨率遥感影像建筑物语义分割装置利用分割模型，对遥感图像中的建筑物区域进行识别，该模型至少由输入层、隐藏层和输出层构成。For example, the device for semantic segmentation of buildings in high-resolution remote sensing images uses a segmentation model to identify building areas in remote sensing images. The model is composed of at least an input layer, a hidden layer and an output layer.

输入层在整个网络的最前端部分，直接接收步骤101中生成的目标图像。The input layer is at the forefront of the entire network and directly receives the target image generated in step 101.

The hidden layer contains at least three layers: the first feature extraction layer, the second feature extraction layer and the feature fusion layer. Its role is to downsample the target image layer by layer to obtain feature maps carrying remote sensing semantic information at different levels. Running dilated convolution kernels of different scales in parallel strengthens the extraction of features and high-level semantic information for buildings of different sizes, and fusing feature maps of different sizes then yields broader and deeper features. The hidden layers of the segmentation model can include:

The first feature extraction layer uses a large number of convolution kernels to extract high-dimensional feature maps from the matrix corresponding to the target image. Multiple feature maps can be extracted; each one is a local perception of the image, and together they capture the parts of the image that are of interest. Multi-level feature fusion is then performed along the channel dimension, and the fused feature map is upsampled so that its width and height are doubled to restore the spatial information of the image, giving the mid-level feature map that fuses different channel dimensions.

The four downsampling sub-layers used in the embodiment of the present invention downsample by pooling with a window size of 2. Each sub-layer applies convolution first and pooling second, so that after each sub-layer the height and width of the feature map are halved and the number of channels is doubled. This effectively extracts the feature information of the image, reduces the feature dimensionality, compresses the amount of data and the number of parameters, and reduces overfitting during training.
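As a minimal sketch of the pooling step described above, the following applies a window of size 2 to a single-channel feature map and tracks how a (channels, height, width) shape changes across one sub-layer. Max pooling and the helper names are assumptions made for illustration; the paragraph above only states that pooling with a window size of 2 is used.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2D list (assumed pooling type)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, w - 1, 2)]
        for r in range(0, h - 1, 2)
    ]

def next_shape(shape):
    """Shape after one sub-layer: channels doubled, height and width halved."""
    c, h, w = shape
    return (2 * c, h // 2, w // 2)

fmap = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2x2(fmap))
print(next_shape((64, 128, 128)))
```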

The second feature extraction layer performs multi-scale feature extraction on remote sensing images in which building sizes vary. A convolution kernel with a larger dilation rate mainly helps extract information about large buildings, while one with a smaller dilation rate helps extract information about small buildings. Fusing the features extracted at multiple convolution scales gives a deep feature map that balances detail information and spatial information.

The feature fusion layer upsamples the low-level, mid-level and high-level feature maps, step by step restoring the width and height to those of the target image, and then maps the fused features to the building segmentation result.

The output layer is the last layer. It applies a single convolution kernel of size 1x1 for fully connected dimensionality reduction across channels, producing a feature map with the same width and height as the original image. This feature map is a grayscale image with a channel dimension of 1 and is the final building segmentation result.

In a multi-layer neural network, the ReLU activation function applies a nonlinear mapping to the feature maps produced by convolution. After many stacked convolutions, the values of the feature maps change little during training, which causes vanishing gradients and can make the convolutional network untrainable. Applying ReLU after each convolution maps the linear output of the convolution layer nonlinearly, so that the feature maps change sufficiently during training and the model can be trained.

The embodiment of the present invention obtains the target image from a multi-channel remote sensing image, extracts the refined feature map through the first feature extraction layer with a pyramid structure together with the mid-level feature map that fuses features at multiple scales, performs convolutions of different scales in parallel on the reduced-size refined feature map, fuses the features unified to the same scale into the deep feature map, and restores the building segmentation result from the image obtained by fusing the feature maps of the three semantic levels. This enlarges the convolutional receptive field layer by layer without losing information, so that each convolution output contains semantic information from a larger range, which improves the fineness of the semantic segmentation and in turn the accuracy of building recognition.

On the basis of any of the above embodiments, downsampling the target image stage by stage to obtain the refined feature map, and then upsampling and fusing stage by stage to obtain the mid-level feature map, includes: sequentially performing, with an attention mechanism, the convolution and pooling operations of the corresponding scale in each downsampling sub-layer to obtain the target feature map.

Specifically, in each downsampling sub-layer the high-resolution remote sensing image building semantic segmentation device performs the convolution and pooling operations of the corresponding scale on the input feature map to compress the spatial features, and then applies the attention mechanism to the compressed feature map to learn the relative importance of the different channels, until the last downsampling sub-layer outputs the target feature map.

对所述目标特征图逐级上采样至各下采样子层对应的原始尺寸后进行特征融合，得到所述中层特征图。The target feature map is gradually upsampled to the original size corresponding to each downsampled sub-layer and then feature fusion is performed to obtain the mid-level feature map.

Specifically, the high-resolution remote sensing image building semantic segmentation device fuses, from bottom to top, the feature maps of different sizes and levels produced by the downsampling sub-layers of the backbone network, effectively combining low-level and high-level feature information to obtain the mid-level feature map.

对所述目标特征图执行空洞卷积操作和池化操作，得到所述细化特征图。Perform dilated convolution operations and pooling operations on the target feature map to obtain the refined feature map.

Specifically, the high-resolution remote sensing image building semantic segmentation device does not downsample in this step. It replaces the convolution with a stride of 2 by a convolution with a dilation rate of 2 when performing the dilated convolution and pooling operations on the target feature map, which reduces information loss in the target feature map and extracts deeper features with stronger semantics while preserving the receptive field, giving the final refined feature map. The embodiment of the present invention improves on the channel attention mechanism in two respects, avoiding dimensionality reduction and enabling cross-channel information interaction. It effectively captures cross-channel interaction, strengthens the expression of key features, suppresses noise and unimportant features, and further improves the extraction accuracy of the model.

在上述任一实施例的基础上，每一下采样子层为Bottleneck结构。Based on any of the above embodiments, each downsampling sub-layer is a bottleneck structure.

Specifically, every large convolution used in a downsampling sub-layer of the high-resolution remote sensing image building semantic segmentation device is replaced with several small convolutions: the data is first reduced in dimension, then convolved with a kernel of conventional size, and finally raised in dimension again, forming an hourglass-like Bottleneck structure.

Illustratively, ECA channel attention is used to extract features with stronger semantics, and the embodiment of the present invention takes a backbone network built from three downsampling sub-layers as an example.

A good backbone network plays a vital role in both segmentation efficiency and precision, so ResNet50 can be used as the backbone network to extract features layer by layer. The residual structure of ResNet50 simplifies the learning process and strengthens gradient propagation. The ResNet50 network here consists mainly of four ECA_Blocks, namely ECA_Block_1, ECA_Block_2, ECA_Block_3 and ECA_Block_4, where:

The first layer of ECA_Block_1 has a relatively simple structure and can be regarded as preprocessing of the input. Figure 2 is the first schematic diagram of the substructure of the segmentation model provided by the present invention. As shown in Figure 2, ECA_Block_1 applies four operations in sequence (convolution, a BN layer, the ReLU activation function and a max pooling layer), which downsample the input directly to 1/4 of the original image size. The second layer of ECA_Block_1 contains 3 Bottleneck modules for shallow feature extraction. For ECA_Block_1, the input target image has size (3,512,512) and the final output has size (64,128,128).

ECA_Block_2可以由4个Bottleneck模块组成，输入尺寸为(64,128,128)，输出尺寸为(128,64,64)。ECA_Block_2 can be composed of 4 Bottleneck modules, the input size is (64,128,128), and the output size is (128,64,64).

ECA_Block_3可以由6个Bottleneck模块组成，输入尺寸为(128,64,64)，输出尺寸为(256,32,32)。ECA_Block_3 can be composed of 6 Bottleneck modules, the input size is (128,64,64), and the output size is (256,32,32).

ECA_Block_4 does not downsample. It uses convolution with a dilation rate of 2 in place of convolution with a stride of 2, which reduces information loss and extracts deeper features with stronger semantics while preserving the receptive field. The final output shape of ECA_Block_4 is (512,32,32).
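The shapes listed for the four blocks can be checked with a short sketch that computes the spatial downsampling factor of each block and the cumulative factor relative to the 512*512 input. The input shape of ECA_Block_4 is not stated above and is taken here to be the output of ECA_Block_3, which is an assumption; the variable names are illustrative.

```python
# (name, input shape, output shape) as (channels, height, width).
stages = [
    ("ECA_Block_1", (3, 512, 512), (64, 128, 128)),
    ("ECA_Block_2", (64, 128, 128), (128, 64, 64)),
    ("ECA_Block_3", (128, 64, 64), (256, 32, 32)),
    ("ECA_Block_4", (256, 32, 32), (512, 32, 32)),  # input shape assumed
]

def spatial_factor(in_shape, out_shape):
    """How many times smaller the output is along height (and width)."""
    return in_shape[1] // out_shape[1]

cumulative = 1
cumulative_factors = []
for name, in_shape, out_shape in stages:
    cumulative *= spatial_factor(in_shape, out_shape)
    cumulative_factors.append(cumulative)
    print(f"{name}: {in_shape} -> {out_shape}, 1/{cumulative} of input size")
```

ECA_Block_4 keeps the cumulative factor at 16 because the dilated convolution replaces the strided one.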

Figure 3 is the second schematic diagram of the substructure of the segmentation model provided by the present invention. As shown in Figure 3, in a Bottleneck module with ECA channel attention, the input first passes through convolutions with kernel sizes 1, 3 and 1 and then enters the ECA channel attention module. The module works as follows: first, global average pooling is applied to the input feature map; next, a one-dimensional convolution with kernel size K is applied and passed through the Sigmoid activation function to obtain the weight ω of each channel; finally, the weights are multiplied element-wise with the original input feature map to give the output feature map. After global average pooling without dimensionality reduction, the ECA attention mechanism learns features with a one-dimensional convolution with shared weights. By considering the correlation between each channel and its K neighboring channels it captures cross-channel information, so that more accurate features can be selected for expression and the extraction accuracy of the network model improves.
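The three steps of the ECA module described above can be sketched with NumPy as follows. This is an illustrative sketch rather than the patented implementation: the function name `eca`, the zero padding at the channel boundaries and the example kernel values are assumptions, and in a trained network the one-dimensional kernel would be learned.

```python
import numpy as np

def eca(x, kernel):
    """Apply ECA-style channel attention to a (C, H, W) feature map.

    kernel: 1D weights of odd length K, shared across all channels.
    Returns the reweighted feature map and the per-channel weights.
    """
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    pooled = x.mean(axis=(1, 2))            # global average pooling -> (C,)
    padded = np.pad(pooled, k // 2)         # zero padding (assumed) keeps length C
    conv = np.array([padded[i:i + k] @ kernel for i in range(len(pooled))])
    weights = 1.0 / (1.0 + np.exp(-conv))   # Sigmoid gives one weight per channel
    return x * weights[:, None, None], weights

# Four channels whose mean values are 0, 1, 2 and 3.
x = np.ones((4, 2, 2)) * np.array([0.0, 1.0, 2.0, 3.0])[:, None, None]
out, w = eca(x, kernel=[1.0, 1.0, 1.0])     # K = 3: each channel sees its neighbors
print(out.shape, w)
```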

Figure 4 is the third schematic diagram of the substructure of the segmentation model provided by the present invention. As shown in Figure 4, a feature pyramid network (FPN) can be used to connect large low-level features with small high-level features, enriching the feature information of feature maps at different scales. The procedure is as follows. On the one hand, the feature map produced by ECA_Block_3 in the backbone network, at 1/16 size with 1024 channels, is reduced by a 1×1 convolution to 256 channels and then upsampled by a factor of 2 to form one branch of the FPN.

On the other hand, the feature map produced by ECA_Block_2 in the backbone network, at 1/8 size with 512 channels, is reduced by a 1×1 convolution to 256 channels to form the other branch of the FPN.

最后，将这两个分支的特征图以相加的方式融合，得到中层特征图。Finally, the feature maps of the two branches are fused in an additive manner to obtain the mid-level feature map.
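The two-branch fusion described above can be sketched with NumPy. The channel counts and sizes follow the text (1024 channels at 1/16 size and 512 channels at 1/8 size of a 512*512 input); the random weights, nearest-neighbor upsampling and helper names are assumptions made for illustration.

```python
import numpy as np

def conv1x1(x, weight):
    """1x1 convolution on a (C, H, W) map; weight has shape (C_out, C)."""
    c, h, w = x.shape
    return (weight @ x.reshape(c, h * w)).reshape(weight.shape[0], h, w)

def upsample2x(x):
    """Nearest-neighbor 2x upsampling (interpolation type is an assumption)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
feat_16 = rng.standard_normal((1024, 32, 32))   # 1/16-size backbone feature map
feat_8 = rng.standard_normal((512, 64, 64))     # 1/8-size backbone feature map
w_16 = rng.standard_normal((256, 1024)) * 0.01  # placeholder 1x1 kernels
w_8 = rng.standard_normal((256, 512)) * 0.01

branch_a = upsample2x(conv1x1(feat_16, w_16))   # (256, 64, 64)
branch_b = conv1x1(feat_8, w_8)                 # (256, 64, 64)
mid_level = branch_a + branch_b                 # additive fusion
print(mid_level.shape)
```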

The embodiment of the present invention uses the first 1x1 convolution in the Bottleneck structure to reduce the number of channels, so that the intermediate convolution operates on 1/4 of the channels. The intermediate ordinary convolution keeps its number of output channels equal to its number of input channels, and the final 1x1 convolution restores the channel count so that the output of the Bottleneck has as many channels as its input. This effectively reduces the number of convolution parameters and the amount of computation.
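The saving can be made concrete by counting weights. The sketch below compares a single 3x3 convolution on 256 channels with the 1x1, 3x3, 1x1 Bottleneck described above; the channel count of 256, the 3x3 size of the intermediate kernel and the omission of bias terms are assumptions chosen for illustration.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k

def bottleneck_params(channels, k=3, reduction=4):
    """Weights in a 1x1 reduce, k x k, 1x1 restore Bottleneck."""
    mid = channels // reduction
    return (conv_params(channels, mid, 1)    # 1x1: reduce to 1/4 of the channels
            + conv_params(mid, mid, k)       # k x k on the reduced channels
            + conv_params(mid, channels, 1)) # 1x1: restore the channel count

plain = conv_params(256, 256, 3)
bottleneck = bottleneck_params(256)
print(plain, bottleneck, round(plain / bottleneck, 2))
```

Under these assumptions the Bottleneck needs 69,632 weights against 589,824 for the plain convolution, roughly 8.5 times fewer.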

On the basis of any of the above embodiments, performing convolution operations of different scales on the refined feature map to obtain the deep feature map fused from features of the same scale includes: applying to the refined feature map, separately, a 1*1 convolution, convolutions with n kernels whose dilation rates increase in arithmetic progression, and global average pooling, to obtain n+2 feature maps with the same two-dimensional size and number of channels.

Specifically, the high-resolution remote sensing image building semantic segmentation device takes the refined feature map, whose spatial features were compressed layer by layer in the first feature extraction layer, and applies in parallel a 1×1 convolution, n convolution kernels whose dilation rates start at 4 and increase in steps of 4, and global average pooling, giving n+2 feature maps with the same size and number of channels.

The convolution kernel here can be any convolution kernel variant, for example the Depthwise Over-parameterized Convolutional Layer (DO-Conv). Do-Conv combines ordinary convolution with depthwise convolution and improves the feature expression ability of the model by adding linear layers.

Then, dilated Do-Conv kernels with larger dilation rates are used to segment buildings with larger footprints, and dilated Do-Conv kernels with smaller dilation rates are used to segment buildings with smaller footprints.

具体地，高分辨率遥感影像建筑物语义分割装置将n+2个尺寸、通道数相同的特征图进行相加融合，得到深层特征图。Specifically, the high-resolution remote sensing image building semantic segmentation device adds and fuses n+2 feature maps with the same size and channel number to obtain a deep feature map.

Illustratively, Figure 5 is the fourth schematic diagram of the substructure of the segmentation model provided by the present invention. As shown in Figure 5, the embodiment of the present invention gives a specific procedure for extracting high-level features with the Do-ASPP module structure formed when the number n of dilated Do-Conv kernels is set to 4:

The Do-ASPP module performs multi-scale feature enhancement. The input refined feature map passes through a 1*1 ordinary convolution kernel, depthwise over-parameterized convolution kernels with dilation rates of 4, 8, 12 and 16, and a pooling branch, learning building features at six scales. The feature maps from the different branches are then concatenated and fused to give the final high-level feature map with multi-scale features.
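The branch layout above can be enumerated with a short sketch that generates the arithmetic sequence of dilation rates and the spatial extent each dilated kernel covers. The 3x3 base kernel size is an assumption, since the text does not state the size of the dilated kernels; the helper names are illustrative.

```python
def dilation_rates(n, start=4, step=4):
    """n dilation rates in arithmetic progression, e.g. 4, 8, 12, 16."""
    return [start + i * step for i in range(n)]

def effective_kernel(k, rate):
    """Side length covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (rate - 1)

rates = dilation_rates(4)
branches = (["1x1 convolution"]
            + [f"3x3 Do-Conv, dilation rate {r}" for r in rates]
            + ["global average pooling"])
spans = [effective_kernel(3, r) for r in rates]  # assumed 3x3 base kernel

for name in branches:
    print(name)
print(spans)
```

With n = 4 this gives n + 2 = 6 parallel branches, matching the six scales mentioned above.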

The embodiment of the present invention runs dilated convolution kernels of different scales in parallel, which strengthens the extraction of features and high-level semantic information for buildings of different sizes and effectively reduces holes in the segmentation result.

On the basis of any of the above embodiments, each of the n convolution kernels whose dilation rates increase in arithmetic progression is formed by merging a depthwise convolution kernel with a standard convolution kernel.

Specifically, the high-resolution remote sensing image building semantic segmentation device builds each convolution kernel, configured with its own dilation rate, as a Do-Conv kernel that combines ordinary convolution with depthwise convolution. The added linear layer improves the feature expression ability of the model, and because consecutive linear layers can be represented by a single linear layer, the complexity of the model does not increase significantly.

The combination of ordinary convolution and depthwise convolution takes two forms: feature composition and kernel composition. Feature composition first applies depthwise convolution to the input feature map and then applies standard convolution. Kernel composition first merges the depthwise convolution kernel with the standard convolution kernel and then uses the merged result to perform a standard convolution. The two forms are mathematically equivalent, but kernel composition is more efficient to train, and because the two kernels are merged it adds no computational overhead at inference and does not affect inference speed. Do-Conv therefore trains with kernel composition, and its calculation formula is as follows:

Here D and W denote the depthwise convolution kernel and the standard convolution kernel, O and P denote the output feature map and the input feature map, and the operators ∘ and * denote depthwise convolution and standard convolution, respectively.
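In the published DO-Conv formulation, which uses the same symbols, kernel composition is usually written O = (Dᵀ ∘ W) * P and feature composition O = W * (D ∘ P); taking these as the formulas referred to above is an assumption. The sketch below checks numerically, for a single input patch, that the two forms give the same output. The tensor sizes are illustrative and the layout (window position, depth multiplier, input channel) is an assumed convention.

```python
import numpy as np

rng = np.random.default_rng(0)
mn, d_mul, c_in, c_out = 9, 9, 4, 5              # 3x3 window flattened to 9 positions
P = rng.standard_normal((mn, c_in))              # one input patch
D = rng.standard_normal((mn, d_mul, c_in))       # depthwise kernel
W = rng.standard_normal((c_out, d_mul, c_in))    # standard kernel

# Feature composition: depthwise convolution first, then standard convolution.
P_prime = np.einsum("idc,ic->dc", D, P)
O_feature = np.einsum("odc,dc->o", W, P_prime)

# Kernel composition: fold D into W once, then one standard convolution.
W_prime = np.einsum("idc,odc->oic", D, W)
O_kernel = np.einsum("oic,ic->o", W_prime, P)

print(W_prime.shape)
print(np.allclose(O_feature, O_kernel))
```

The merged kernel `W_prime` has the shape of an ordinary convolution kernel over the window, which is why the merged form costs no more than a standard convolution at inference.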

The embodiment of the present invention merges the depthwise convolution kernel with the standard convolution kernel and uses the merged result to perform standard convolution, training convolution kernel variants with different dilation rates in this way. Compared with traditional convolution, the Do-Conv variant obtained by kernel composition trains with more parameters without increasing the inference cost of the network, so it converges faster and can also improve network performance.

On the basis of any of the above embodiments, fusing the low-level feature map, the mid-level feature map and the high-level feature map to obtain the building segmentation result segmented from the target image includes: upsampling the high-level feature map and concatenating it with the mid-level feature map to obtain the first feature fusion map.

Specifically, the high-resolution remote sensing image building semantic segmentation device upsamples the high-level feature map until its two-dimensional size matches that of the mid-level feature map and then concatenates the two, giving the first feature fusion map, which retains all the high-level semantic information.

The first feature fusion map, with its two-dimensional size and number of channels matched to the low-level feature map, is concatenated with the low-level feature map to obtain the second feature fusion map.

Specifically, the high-resolution remote sensing image building semantic segmentation device upsamples the first feature fusion map until its two-dimensional size equals that of the low-level feature map, and also uses a 1*1 convolution kernel to make the number of channels of the low-level feature map match that of the first feature fusion map. The first feature fusion map and the low-level feature map, now equal in two-dimensional size and number of channels, are then concatenated to give the second feature fusion map, which retains all the low-level and high-level semantic information.

Specifically, the high-resolution remote sensing image building semantic segmentation device first uses a 1*1 convolution to restore the number of channels to that of the target image and then upsamples until the two-dimensional size also equals that of the target image. The final rescaled feature map is decoded, mapping each feature vector to a binarized pixel value, to obtain the building segmentation result.

Illustratively, Figure 6 is a schematic diagram of the overall structure of the segmentation model provided by the present invention. The embodiment of the present invention gives a specific implementation of the high-resolution remote sensing image building semantic segmentation method:

(1)数据样本集构建：(1) Data sample set construction:

A data set with rich texture, shape and color features is used as training data, laying the foundation for the accuracy of the network. Because the processing capacity of existing hardware is limited, the present invention crops the building remote sensing images uniformly into sub-images of size 512*512 and randomly divides the sub-images into a training set and a test set at a ratio of 8:2. The training set is used to train the network model and adjust the model parameters, and the test set is used to evaluate model performance.
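The cropping and splitting step can be sketched as follows. The non-overlapping grid, the discarding of partial tiles at the image border, the fixed random seed and the 5000*5000 example scene are all assumptions made for illustration; the text above only fixes the 512*512 tile size and the 8:2 ratio.

```python
import random

def tile_origins(height, width, tile=512):
    """Top-left corners of non-overlapping full tiles (border remainder dropped)."""
    return [(top, left)
            for top in range(0, height - tile + 1, tile)
            for left in range(0, width - tile + 1, tile)]

def split_tiles(tiles, train_ratio=0.8, seed=0):
    """Randomly split tiles into training and test sets at the given ratio."""
    shuffled = tiles[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

tiles = tile_origins(5000, 5000)      # hypothetical 5000*5000 scene
train, test = split_tiles(tiles)
print(len(tiles), len(train), len(test))
```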

(2)分割模型构建：(2) Segmentation model construction:

The segmentation model uses an encoder-decoder structure. The encoder uses ResNet50 with ECA channel attention as the backbone network to extract the refined feature map from the input three-channel high-resolution remote sensing image, and a Do-ASPP module is added to strengthen the extraction of multi-scale features from the refined feature map. A feature pyramid network (FPN) is also used for feature fusion in the encoding layer. FPN is a way of fusing feature maps from different levels: it connects large low-level features with small high-level features from bottom to top, producing a mid-level feature map with enriched feature information at different scales.

Then, the decoding layer fuses features by layer-by-layer upsampling. The feature map output by the Do-ASPP module is first upsampled by a factor of 2 and fused with the feature map output by the FPN module, and the fused result is then upsampled by a factor of 2 a second time. Two 2x upsampling steps thus achieve a 4x upsampling, which makes the pixel values in the extracted building result more continuous and closer to those in the original image, effectively reducing holes in the result and improving accuracy. In detail: the high-level feature map from the Do-ASPP module is upsampled by a factor of 2 and concatenated with the mid-level feature map from the FPN, giving a feature map at 1/8 size with 512 channels; a 1×1 convolution reduces it to 256 channels; it is then upsampled by a factor of 2 again and concatenated with the low-level feature map obtained in the first stage of the backbone network.
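The decoder steps above can be traced as shape arithmetic for a 512*512 input. Several values are assumptions: the 256 channels of the Do-ASPP output are inferred from the stated 512 channels after concatenation with the 256-channel FPN map, the low-level map is taken to be the (64,128,128) output of ECA_Block_1, and the final 1x1 convolution followed by 4x upsampling follows the output-layer description earlier in the text. The helper names are illustrative.

```python
def upsample(shape, factor=2):
    c, h, w = shape
    return (c, h * factor, w * factor)

def concat(a, b):
    """Channel-wise concatenation; spatial sizes must already match."""
    assert a[1:] == b[1:], "spatial sizes differ"
    return (a[0] + b[0], a[1], a[2])

def conv1x1(shape, out_channels):
    return (out_channels, shape[1], shape[2])

high = (256, 32, 32)    # Do-ASPP output at 1/16 size (channel count inferred)
mid = (256, 64, 64)     # FPN output at 1/8 size
low = (64, 128, 128)    # first-stage backbone output at 1/4 size (assumed)

fused1 = concat(upsample(high), mid)                    # first fusion at 1/8 size
fused2 = concat(upsample(conv1x1(fused1, 256)), low)    # second fusion at 1/4 size
result = upsample(conv1x1(fused2, 1), factor=4)         # single-channel, full size
print(fused1, fused2, result)
```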

(3)分割模型训练：(3) Segmentation model training:

The data sample set is used to train the model parameters. The embodiment of the present invention trains the model with a stochastic gradient descent optimizer. The initial learning rate is 0.001 for the encoding layer and 0.01 for the decoding layer. To prevent overfitting, the weight decay rate is set to 0.0005. The input target image size is 512*512, and the batch size (batchsize) of images fed to the network each time is 16. One epoch is one complete forward computation and backpropagation pass over all the data, and the total number of epochs is set to 33.

(4)光学遥感影像建筑物检测：(4) Optical remote sensing image building detection:

将待检测的遥感影像分割为大小为512*512的子图(即目标图像)，然后将目标图像放入训练好的分割模型中得到建筑物提取的结果。The remote sensing image to be detected is divided into sub-images (i.e. target images) with a size of 512*512, and then the target image is put into the trained segmentation model to obtain the building extraction results.

To verify the feasibility and effectiveness of the present invention, the embodiment of the present invention trains and tests on the Inria Aerial Image Labeling Dataset, an urban aerial remote sensing building data set released in 2017 by the French National Institute for Research in Computer Science and Automation (Institut national de recherche en informatique et en automatique, Inria), and compares the method with current classic deep learning networks both qualitatively and quantitatively. The quantitative evaluation uses Pixel Accuracy (PA), Mean Pixel Accuracy (MPA), Mean Intersection over Union (MIoU) and Frequency Weighted Intersection over Union (FWIoU) as the main criteria for building extraction accuracy. The indicators are calculated as follows:

Here k is the number of pixel classes in the image, p_ij is the number of pixels whose actual class is i and whose predicted class is j, and p_ii is the number of pixels whose actual and predicted classes are both i.
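The four indicators can be computed from a k x k confusion matrix whose entry (i, j) is p_ij. The sketch below uses the standard definitions of PA, MPA, MIoU and FWIoU, which are assumed to be the ones intended; the function name and the example matrix are illustrative and are not results from the experiments.

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i][j] = p_ij, pixels of actual class i predicted as class j."""
    conf = np.asarray(conf, dtype=float)
    correct = np.diag(conf)              # p_ii
    actual = conf.sum(axis=1)            # sum over j of p_ij
    predicted = conf.sum(axis=0)         # sum over j of p_ji
    total = conf.sum()
    iou = correct / (actual + predicted - correct)
    return {
        "PA": correct.sum() / total,             # overall pixel accuracy
        "MPA": np.mean(correct / actual),        # per-class accuracy, averaged
        "MIoU": np.mean(iou),                    # per-class IoU, averaged
        "FWIoU": np.sum(actual / total * iou),   # IoU weighted by class frequency
    }

# Made-up two-class example (background, building).
metrics = segmentation_metrics([[50, 10], [5, 35]])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```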

Illustratively, Figure 7 is the first schematic diagram of the simulation results of the high-resolution remote sensing image building semantic segmentation method provided by the present invention. Figure 8 is the second schematic diagram of the simulation results of the method. Figure 9 is the third schematic diagram of the simulation results of the method. Figures 7 to 9 each show a set of remote sensing images and the corresponding building segmentation results.

In each figure, sub-figure (a) is the true-color remote sensing image; sub-figures (b), (c), (d) and (e) are the building segmentation results for the image in sub-figure (a) produced by the UNet, DeepLabv3, DeepLabv3+ and DeepResUnet networks, respectively; and sub-figure (f) is the building segmentation result for the image in sub-figure (a) produced by the segmentation model of the embodiment of the present invention.

Among the building segmentation results of all the networks, the solution of this embodiment performs best: it achieves high accuracy while maintaining a low missed-detection rate and a high intersection over union.

In the segmentation result images, buildings are white (pixel value 1) and the background is black (pixel value 0). The present solution markedly reduces holes in the segmentation results and also reduces misclassified and missed pixels.

Referring to FIG. 7 and FIG. 8, when faced with buildings of relatively complex shape, some classic networks misidentify roads of similar color as buildings. The present solution largely avoids this and obtains accurate building segmentation results.

Referring to FIG. 9, for densely distributed buildings, the embodiment of the present invention improves the segmentation results to some extent and also reduces the number of missed small targets. The relevant metrics obtained for the target images in the different networks are compared in Table 1.

Table 1 Metric comparison table, part 1

To demonstrate the effectiveness of the attention mechanism, the depthwise over-parameterized convolution and the feature fusion module, ablation experiments were carried out in the embodiment. As shown in Table 2, the three modules are added one at a time, and each of them improves building detection accuracy. The attention mechanism yields the largest gain and Do_Conv the smallest. A likely reason is that Do_Conv mainly accelerates convergence during model training, so its contribution to MIoU is small.

Table 2 Metric comparison table, part 2

The solution in the embodiment of the present invention detects buildings using the red, green and blue bands commonly available from optical sensors, and therefore has good general applicability. The Do-ASPP module applies dilated convolution kernels of different scales in parallel, which strengthens the extraction of features and high-level semantic information for buildings of different scales and effectively reduces the occurrence of holes.

By fusing feature maps of different sizes, the feature fusion module obtains broader and deeper features, effectively improving the accuracy of extracting buildings of different sizes.

The attention mechanism module effectively captures cross-channel interaction information, strengthens the expression of key features, suppresses noise and unimportant features, and further improves the extraction accuracy of the model, giving it good prospects for practical application.

By fusing feature maps of different semantic depths, the embodiment of the present invention obtains broader and deeper features, effectively improving the accuracy of extracting buildings of different sizes. It also effectively mitigates the loss of spatial and detail information caused by an excessively large upsampling factor.

On the basis of any of the above embodiments, preprocessing the remote sensing image to obtain the target image includes: cropping the remote sensing image into sub-images of the same size, and performing preprocessing operations such as random rotation, perspective transformation, brightness transformation and color transformation to obtain the target image.

Specifically, in step 101, the device for semantic segmentation of buildings in high-resolution remote sensing images crops the three-band remote sensing image into sub-images of the same size and applies data augmentation operations such as random rotation, perspective transformation, brightness transformation and color transformation to each sub-image, obtaining multiple target images.
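The cropping and augmentation of step 101 can be sketched as follows. This is only an illustrative sketch: the tile size of 512, the restriction to 90-degree rotations, the brightness gain range and the helper names `crop_tiles` and `augment` are all assumptions, and the perspective and color transformations are omitted because they would normally rely on an image-processing library.

```python
import numpy as np

def crop_tiles(image, tile):
    """Split an H x W x C image into non-overlapping tile x tile sub-images.

    Ragged borders that do not fill a whole tile are dropped.
    """
    h, w = image.shape[:2]
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]

def augment(tile, rng):
    """Apply a random 90-degree rotation and a random brightness gain."""
    out = np.rot90(tile, k=int(rng.integers(0, 4)), axes=(0, 1))
    gain = rng.uniform(0.8, 1.2)                      # assumed gain range
    out = np.clip(out.astype(np.float32) * gain, 0, 255)
    return out.astype(np.uint8)

rng = np.random.default_rng(0)
# Stand-in for a three-band (red, green, blue) remote sensing image.
image = rng.integers(0, 256, size=(1000, 1200, 3), dtype=np.uint8)
tiles = crop_tiles(image, 512)
targets = [augment(t, rng) for t in tiles]
```

When the model is trained, the same geometric transformation would also have to be applied to the building label mask of each sub-image so that image and label stay aligned.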

FIG. 10 is a schematic structural diagram of the device for semantic segmentation of buildings in high-resolution remote sensing images provided by the present invention. On the basis of any of the above embodiments, as shown in FIG. 10, the device includes a preprocessing module 100 and a segmentation module 200, where:

The preprocessing module 100 is configured to preprocess a remote sensing image to obtain a target image, where the observed content of the remote sensing image includes at least buildings.

The segmentation module 200 is configured to input the target image into a segmentation model and obtain the building segmentation result output by the segmentation model.

Specifically, the preprocessing module 100 and the segmentation module 200 are electrically connected in sequence.

The preprocessing module 100 preprocesses the remote sensing image according to preset requirements to obtain the target image.

After configuring the segmentation model with the trained model parameters, the segmentation module 200 uses the model to identify and segment the building regions in any target image produced by the preprocessing module 100, obtaining the building segmentation result corresponding to that target image.

Optionally, the first feature extraction unit in the segmentation module 200 includes a hierarchical downsampling subunit, a mid-level feature fusion unit and a feature refinement subunit, where:

The hierarchical downsampling subunit is configured to use an attention mechanism to perform, in each downsampling sub-layer in turn, convolution and pooling operations of the corresponding scale, to obtain a target feature map.

The mid-level feature fusion unit is configured to progressively upsample the target feature map to the original size corresponding to each downsampling sub-layer and then perform feature fusion, to obtain the mid-level feature map.

The feature refinement subunit is configured to perform dilated convolution and pooling operations on the target feature map to obtain the refined feature map.

Optionally, each downsampling sub-layer has a Bottleneck structure.

Optionally, the second feature extraction unit in the segmentation module 200 includes a parallel convolution subunit and a deep feature map fusion subunit, where:

The parallel convolution subunit is configured to apply to the refined feature map, in parallel, a 1*1 convolution, convolutions with n kernels whose dilation rates increase in arithmetic progression, and a global average pooling operation, obtaining n+2 feature maps with the same two-dimensional size and number of channels.

The deep feature map fusion subunit is configured to fuse the n+2 feature maps with the same two-dimensional size and number of channels into the deep feature map.
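The n+2 parallel branches can be illustrated with a small NumPy sketch. It is not the implementation of the embodiment: the 3*3 kernel size, the dilation rates 6, 12 and 18 (one possible arithmetic progression), the random weights, the 1*1 projection after global pooling and the fusion by channel concatenation are all assumptions made for the example, and the helper names are hypothetical.

```python
import numpy as np

def dilated_conv3x3(x, w, rate):
    """3*3 dilated convolution; x is (C_in, H, W), w is (C_out, C_in, 3, 3).

    Zero padding equal to the dilation rate keeps the H x W size unchanged.
    """
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (rate, rate), (rate, rate)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            # Kernel tap (i, j) samples the input at an offset of (i-1, j-1) * rate.
            patch = xp[:, i * rate:i * rate + h, j * rate:j * rate + wd]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out

def parallel_branches(x, c_out, rates, rng):
    """Return n+2 feature maps of identical shape (c_out, H, W)."""
    c_in, h, w = x.shape
    branches = []
    # Branch 1: 1*1 convolution.
    branches.append(np.einsum("oc,chw->ohw", rng.standard_normal((c_out, c_in)), x))
    # Branches 2 .. n+1: dilated convolutions, one per dilation rate.
    for r in rates:
        branches.append(dilated_conv3x3(x, rng.standard_normal((c_out, c_in, 3, 3)), r))
    # Branch n+2: global average pooling, projected and broadcast back to H x W.
    pooled = rng.standard_normal((c_out, c_in)) @ x.mean(axis=(1, 2))
    branches.append(np.broadcast_to(pooled[:, None, None], (c_out, h, w)).copy())
    return branches

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))        # stand-in for the refined feature map
rates = [6 * (i + 1) for i in range(3)]     # n = 3: rates 6, 12, 18
branches = parallel_branches(x, 4, rates, rng)
deep = np.concatenate(branches, axis=0)     # assumed fusion: channel concatenation
```

Because every branch is padded or broadcast back to the input height and width, the n+2 outputs share one two-dimensional size and one channel count, which is what allows them to be fused directly.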

Optionally, each of the n convolution kernels whose dilation rates increase in arithmetic progression is formed by merging a depthwise convolution kernel with a standard convolution kernel.
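One way to picture the merging of a depthwise kernel with a standard kernel is the kernel folding used in depthwise over-parameterized convolution. The sketch below is written under assumed tensor shapes (a depthwise kernel `D` of shape (k*k, d_mul, c_in) and a standard kernel `W` of shape (c_out, d_mul, c_in)); these shapes, the sizes chosen and the function name `fold_kernels` are illustrative assumptions and are not specified in the text above.

```python
import numpy as np

def fold_kernels(D, W):
    """Merge a depthwise kernel D with a standard kernel W into one kernel.

    D: (k*k, d_mul, c_in), W: (c_out, d_mul, c_in) -> (c_out, k*k, c_in).
    """
    return np.einsum("sdc,odc->osc", D, W)

rng = np.random.default_rng(0)
c_in, c_out, k, d_mul = 3, 4, 3, 9          # assumed sizes
D = rng.standard_normal((k * k, d_mul, c_in))
W = rng.standard_normal((c_out, d_mul, c_in))
W_folded = fold_kernels(D, W)

# One k x k input patch, flattened to (k*k, c_in).
patch = rng.standard_normal((k * k, c_in))

# Two-step path: depthwise kernel first, then the standard kernel.
depthwise_out = np.einsum("sdc,sc->dc", D, patch)
two_step = np.einsum("odc,dc->o", W, depthwise_out)

# One-step path: a single ordinary convolution with the folded kernel.
one_step = np.einsum("osc,sc->o", W_folded, patch)
```

Both paths give the same output for the patch, so the extra depthwise parameters can be folded away after training and the merged kernel used as an ordinary convolution kernel of the original size.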

Optionally, the feature fusion unit in the segmentation module 200 includes a first fusion subunit, a second fusion subunit and a decoding subunit, where:

The first fusion subunit is configured to upsample the high-level feature map and then concatenate and fuse it with the mid-level feature map, to obtain a first feature fusion map.

The second fusion subunit is configured to concatenate and fuse the first feature fusion map, whose two-dimensional size and number of channels are consistent with those of the low-level feature map, with the low-level feature map, to obtain a second feature fusion map.

The decoding subunit is configured to restore the two-dimensional size and number of channels of the second feature fusion map to those of the target image, to obtain the building segmentation result.
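The shape bookkeeping of the three subunits can be sketched as follows. All feature map shapes, the nearest-neighbour upsampling and the random 1*1 convolutions used to change the channel count are assumptions chosen only to make the example run; the helper names are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def conv1x1(x, c_out, rng):
    """1*1 convolution with random weights, used here only to change the channel count."""
    w = rng.standard_normal((c_out, x.shape[0]))
    return np.einsum("oc,chw->ohw", w, x)

rng = np.random.default_rng(0)
target_shape = (3, 256, 256)                 # assumed target image: 3 bands, 256 x 256
low = rng.standard_normal((16, 64, 64))      # low-level feature map (assumed shape)
mid = rng.standard_normal((32, 32, 32))      # mid-level feature map (assumed shape)
high = rng.standard_normal((64, 16, 16))     # high-level feature map (assumed shape)

# First fusion: upsample the high-level map, then concatenate with the mid-level map.
first = np.concatenate([upsample_nearest(high, 2), mid], axis=0)

# Bring the first fusion map to the low-level map's size and channel count.
first_matched = conv1x1(upsample_nearest(first, 2), low.shape[0], rng)

# Second fusion: concatenate with the low-level map.
second = np.concatenate([first_matched, low], axis=0)

# Decoding: restore the two-dimensional size and channel count of the target image.
restored = conv1x1(upsample_nearest(second, 4), target_shape[0], rng)
```

Concatenation along the channel axis requires equal height and width, which is why each fusion step is preceded by upsampling to the size of the shallower feature map.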

Optionally, the preprocessing module 100 is specifically configured to crop the remote sensing image into sub-images of the same size and to perform preprocessing operations such as random rotation, perspective transformation, brightness transformation and color transformation, to obtain the target image.

The device for semantic segmentation of buildings in high-resolution remote sensing images provided by the embodiment of the present invention is used to execute the above method of the present invention. Its implementation is consistent with that of the method provided by the present invention and achieves the same beneficial effects, so it is not described again here.

FIG. 11 is a schematic diagram of the physical structure of an electronic device. As shown in FIG. 11, the electronic device may include a processor 1110, a communications interface 1120, a memory 1130 and a communication bus 1140, where the processor 1110, the communications interface 1120 and the memory 1130 communicate with one another through the communication bus 1140. The processor 1110 can call logic instructions in the memory 1130 to execute the method for semantic segmentation of buildings in high-resolution remote sensing images. The method includes: preprocessing a remote sensing image to obtain a target image, where the observed content of the remote sensing image includes at least buildings; and inputting the target image into a segmentation model to obtain the building segmentation result output by the segmentation model. The segmentation model is trained on sample remote sensing images and the building region labels annotated for those images. The segmentation model includes: a first feature extraction layer, configured to progressively downsample the target image to obtain a refined feature map, and then progressively upsample and fuse the refined feature map to obtain a mid-level feature map; a second feature extraction layer, configured to perform convolution operations of different scales on the refined feature map to obtain a deep feature map formed by fusing features of the same scale; and a feature fusion layer, configured to fuse a low-level feature map, the mid-level feature map and the high-level feature map to obtain the building segmentation result segmented from the target image. The first feature extraction layer includes multiple downsampling sub-layers cascaded from bottom to top, and multiple levels of upsampling sub-layers cascaded from top to bottom in correspondence with the downsampling sub-layers. The low-level feature map is obtained by performing feature extraction on the target image in the first downsampling sub-layer, counting from the bottom.

In addition, when the above logic instructions in the memory 1130 are implemented in the form of software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks and other media that can store program code.

In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the above methods. The method includes: preprocessing a remote sensing image to obtain a target image, where the observed content of the remote sensing image includes at least buildings; and inputting the target image into a segmentation model to obtain the building segmentation result output by the segmentation model. The segmentation model is trained on sample remote sensing images and the building region labels annotated for those images. The segmentation model includes: a first feature extraction layer, configured to progressively downsample the target image to obtain a refined feature map, and then progressively upsample and fuse the refined feature map to obtain a mid-level feature map; a second feature extraction layer, configured to perform convolution operations of different scales on the refined feature map to obtain a deep feature map formed by fusing features of the same scale; and a feature fusion layer, configured to fuse a low-level feature map, the mid-level feature map and the high-level feature map to obtain the building segmentation result segmented from the target image. The first feature extraction layer includes multiple downsampling sub-layers cascaded from bottom to top, and multiple levels of upsampling sub-layers cascaded from top to bottom in correspondence with the downsampling sub-layers. The low-level feature map is obtained by performing feature extraction on the target image in the first downsampling sub-layer, counting from the bottom.

In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the method for semantic segmentation of buildings in high-resolution remote sensing images provided by the above methods. The method includes: preprocessing a remote sensing image to obtain a target image, where the observed content of the remote sensing image includes at least buildings; and inputting the target image into a segmentation model to obtain the building segmentation result output by the segmentation model. The segmentation model is trained on sample remote sensing images and the building region labels annotated for those images. The segmentation model includes: a first feature extraction layer, configured to progressively downsample the target image to obtain a refined feature map, and then progressively upsample and fuse the refined feature map to obtain a mid-level feature map; a second feature extraction layer, configured to perform convolution operations of different scales on the refined feature map to obtain a deep feature map formed by fusing features of the same scale; and a feature fusion layer, configured to fuse a low-level feature map, the mid-level feature map and the high-level feature map to obtain the building segmentation result segmented from the target image. The first feature extraction layer includes multiple downsampling sub-layers cascaded from bottom to top, and multiple levels of upsampling sub-layers cascaded from top to bottom in correspondence with the downsampling sub-layers. The low-level feature map is obtained by performing feature extraction on the target image in the first downsampling sub-layer, counting from the bottom.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement it without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solution in essence, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for semantic segmentation of buildings in high-resolution remote sensing images, characterized by comprising:

preprocessing a remote sensing image to obtain a target image, wherein the observed content of the remote sensing image includes at least buildings;

inputting the target image into a segmentation model to obtain a building segmentation result output by the segmentation model;

wherein the segmentation model is trained on sample remote sensing images and the building region labels annotated for the sample remote sensing images, and the segmentation model includes:

a first feature extraction layer, configured to progressively downsample the target image to obtain a refined feature map, and then progressively upsample and fuse the refined feature map to obtain a mid-level feature map;

a second feature extraction layer, configured to perform convolution operations of different scales on the refined feature map to obtain a deep feature map formed by fusing features of the same scale;

a feature fusion layer, configured to fuse a low-level feature map, the mid-level feature map and the high-level feature map to obtain the building segmentation result segmented from the target image;

wherein the first feature extraction layer includes a plurality of downsampling sub-layers cascaded from bottom to top, and a plurality of levels of upsampling sub-layers cascaded from top to bottom in correspondence with the downsampling sub-layers; and the low-level feature map is obtained by performing feature extraction on the target image in the first downsampling sub-layer, counting from the bottom.

2. The method for semantic segmentation of buildings in high-resolution remote sensing images according to claim 1, characterized in that progressively downsampling the target image to obtain the refined feature map, and then progressively upsampling and fusing the refined feature map to obtain the mid-level feature map, includes:

using an attention mechanism to perform, in each of the downsampling sub-layers in turn, convolution and pooling operations of the corresponding scale, to obtain a target feature map;

progressively upsampling the target feature map to the original size corresponding to each downsampling sub-layer and then performing feature fusion, to obtain the mid-level feature map;

performing dilated convolution and pooling operations on the target feature map to obtain the refined feature map;

wherein the refined feature map and the target feature map have the same two-dimensional size, and the number of channels of the refined feature map is greater than the number of channels of the target feature map.

3. The method for semantic segmentation of buildings in high-resolution remote sensing images according to claim 2, characterized in that each downsampling sub-layer has a Bottleneck structure.

4. The method for semantic segmentation of buildings in high-resolution remote sensing images according to claim 1, characterized in that performing convolution operations of different scales on the refined feature map to obtain the deep feature map formed by fusing features of the same scale includes:

applying to the refined feature map, in parallel, a 1*1 convolution, convolutions with n kernels whose dilation rates increase in arithmetic progression, and a global average pooling operation, to obtain n+2 feature maps with the same two-dimensional size and number of channels;

fusing the n+2 feature maps with the same two-dimensional size and number of channels into the deep feature map.

5. The method for semantic segmentation of buildings in high-resolution remote sensing images according to claim 4, characterized in that each of the n convolution kernels whose dilation rates increase in arithmetic progression is formed by merging a depthwise convolution kernel with a standard convolution kernel.

6. The method for semantic segmentation of buildings in high-resolution remote sensing images according to claim 1, characterized in that fusing the low-level feature map, the mid-level feature map and the high-level feature map to obtain the building segmentation result segmented from the target image includes:

upsampling the high-level feature map and then concatenating and fusing it with the mid-level feature map, to obtain a first feature fusion map;

concatenating and fusing the first feature fusion map, whose two-dimensional size and number of channels are consistent with those of the low-level feature map, with the low-level feature map, to obtain a second feature fusion map;

restoring the two-dimensional size and number of channels of the second feature fusion map to those of the target image, to obtain the building segmentation result.

7. The method for semantic segmentation of buildings in high-resolution remote sensing images according to claim 1, characterized in that preprocessing the remote sensing image to obtain the target image includes:

cropping the remote sensing image into sub-images of the same size, and performing preprocessing operations such as random rotation, perspective transformation, brightness transformation and color transformation, to obtain the target image.

8. A device for semantic segmentation of buildings in high-resolution remote sensing images, characterized by comprising:

a preprocessing module, configured to preprocess a remote sensing image to obtain a target image, wherein the observed content of the remote sensing image includes at least buildings;

a segmentation module, configured to input the target image into a segmentation model and obtain the building segmentation result output by the segmentation model;

9. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that when the processor executes the program, it implements the method for semantic segmentation of buildings in high-resolution remote sensing images according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium with a computer program stored thereon, characterized in that when the computer program is executed by a processor, it implements the method for semantic segmentation of buildings in high-resolution remote sensing images according to any one of claims 1 to 7.