CN110619059A

CN110619059A - Building marking method based on transfer learning

Info

Publication number: CN110619059A
Application number: CN201910745724.1A
Authority: CN
Inventors: 余林林; 毛家发; 胡亚红; 卢书芳; 王宁; 郎嘉瑾
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2019-12-27
Anticipated expiration: 2039-08-13
Also published as: CN110619059B

Abstract

A building calibration method based on transfer learning, comprising the following steps: 1) constructing a training set and a test set from a self-made dataset that comprises building design drawings from relevant design institutes and real-world building photographs from the Internet, used as test data for the recommendation system; 2) building a building calibration architecture model comprising image feature calibration, image feature extraction, database matching, Mahalanobis distance setting and final image authentication; 3) dividing the dataset into small batches that are fed sequentially into the building recognition neural network, updating the parameters by back-propagation with the Adam algorithm, and training the last, modified classifier for the set number of training rounds; 4) using the trained model to calibrate actual building pictures.

Description

A building calibration method based on transfer learning

Technical field

The invention relates to the field of image classification, and in particular to a building calibration method based on transfer learning.

Background

Building recognition is one of the research hotspots in computer vision and pattern recognition. It lets people quickly obtain a building's location, name, description and other related information from an image, and it has important application value in building localization, architectural design, building labeling and similar fields. How to calibrate buildings effectively is the key problem of building recognition.

The key technology of building recognition is feature extraction. The most common image features are color, texture and shape features. Most established image annotation and image extraction systems are based on these features, and their performance depends largely on how the extracted features are represented. For color, texture and shape features, the most widely used descriptors are the HOG, LBP, HSV and SIFT descriptors. Combining these descriptors with dimensionality reduction, pyramid construction, bag-of-words models and similar methods yields efficient feature extractors. Traditional features, however, are severely limited: manual feature engineering is time-consuming and labor-intensive and demands substantial domain expertise, the extracted features are relatively simple, and their effectiveness degrades as the dataset grows more complex. Traditional features alone therefore can no longer meet the feature extraction needs of computer vision. In recent years, with the development of deep learning, neural networks trained end to end have exploited big data and high-dimensional parameter spaces to abstract and compose high-level features step by step from the bottom layers to the top. This data-driven self-learning gives convolutional neural networks excellent feature extraction ability. Combining deep learning with building calibration therefore has practical application value.

Summary of the invention

The invention addresses the shortcomings of existing building recognition technology, namely low accuracy and the need for extensive manual feature engineering, and provides a building calibration method based on transfer learning.

The invention uses a transfer-learning convolutional neural network for feature extraction and a multi-feature calibration technique for feature division, so that the model better captures the latent correlations among features and its prediction accuracy improves. The network model is fine-tuned: the input to the last layer is obtained through the Inception network model and defined as the Bottlenecks, which are then used to train the last, modified softmax layer. Simulation experiments were carried out on the Spyder platform.
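
The bottleneck idea above can be illustrated with a minimal sketch. It assumes the pretrained Inception feature extractor has already produced one bottleneck vector per picture, so only the final softmax layer is trained. The function names, the toy 2-dimensional vectors, the learning rate and the step count are all hypothetical and chosen for illustration; real bottleneck vectors would be much longer.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_softmax_layer(bottlenecks, labels, num_classes, lr=0.5, steps=200):
    """Train only the final softmax layer on cached bottleneck vectors.

    The pretrained feature extractor stays frozen, so its outputs can be
    computed once and reused at every training step.
    """
    dim = len(bottlenecks[0])
    n = len(bottlenecks)
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(num_classes)]
        grad_b = [0.0] * num_classes
        for x, y in zip(bottlenecks, labels):
            logits = [sum(w * v for w, v in zip(weights[k], x)) + biases[k]
                      for k in range(num_classes)]
            probs = softmax(logits)
            for k in range(num_classes):
                # Gradient of the log loss with respect to logit k.
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err / n
                for j in range(dim):
                    grad_w[k][j] += err * x[j] / n
        for k in range(num_classes):
            biases[k] -= lr * grad_b[k]
            for j in range(dim):
                weights[k][j] -= lr * grad_w[k][j]
    return weights, biases

def predict(weights, biases, x):
    """Return the index of the class with the largest logit."""
    logits = [sum(w * v for w, v in zip(wk, x)) + bk
              for wk, bk in zip(weights, biases)]
    return max(range(len(logits)), key=lambda k: logits[k])

# Toy stand-ins for bottleneck vectors (hypothetical data, two classes).
toy_bottlenecks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]
w, b = train_softmax_layer(toy_bottlenecks, toy_labels, num_classes=2)
predictions = [predict(w, b, x) for x in toy_bottlenecks]
```

Because the feature extractor is never updated in this sketch, the cost of each training step depends only on the size of the softmax layer, which is the practical appeal of caching the bottlenecks.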

The technical scheme adopted by the invention to solve its technical problem is:

A building calibration method based on transfer learning, comprising the following steps:

Step 1. Build a self-made dataset. The building images come from Google Images and Baidu Images. Based on existing datasets, they are divided into seven classes of buildings common in everyday life: churches, residential buildings, hospitals, hotels, libraries, villas and shopping malls. The images are then processed uniformly.

Step 2. Build a multi-feature calibration neural network model based on transfer learning. The overall network architecture is divided into image feature calibration, image feature extraction, database matching and image authentication.

Image feature calibration: buildings with a distinct architectural style receive single-feature calibration and buildings with mixed styles receive multi-feature calibration, which yields the feature matrix of the picture.

Image feature extraction: the feature-calibrated picture is the network input. The multi-layer neural network learns the features of the original image through a series of cooperating convolutional and downsampling layers, and the classic BP (back-propagation) algorithm adjusts the parameters to complete the weight update. The BP weight update formula is:

$$\omega(t+1) = \omega(t) + \eta\,\delta(t)\,x(t) \qquad (1)$$

where ω(t) is the connection weight, x(t) is the output of the neuron, δ(t) is the error term of the neuron and η is the learning rate. The convolutional layers of the network use the discrete form of convolution, expressed as:

$$x_{\beta}^{\gamma} = f\Big(\sum_{\alpha \in M_{\beta}} x_{\alpha}^{\gamma-1} * k_{\alpha\beta}^{\gamma} + b_{\beta}^{\gamma}\Big) \qquad (2)$$

where M_β is a selection of input feature maps, k is the convolution kernel, γ is the layer index of the network and b is the additive bias. For a given output map, each input feature map may be convolved with a different kernel. f is the activation function of the convolutional neurons; every block here uses the ReLU activation function.
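
As a minimal sketch of the two operations described above, the following code applies the weight update of formula (1) to a single scalar weight and computes one output feature map as ReLU of the summed convolutions plus a bias. The function names are hypothetical, and the sliding-window product is computed without flipping the kernel, which is the convention most deep learning libraries use for "convolution"; this is an assumption, since the text does not specify it.

```python
def relu(v):
    """ReLU activation: max(0, v)."""
    return v if v > 0.0 else 0.0

def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode).

    The kernel is not flipped, so strictly this is cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def conv_layer_output(input_maps, kernels, bias):
    """One output map: f(sum over selected input maps of (x * k) + b), f = ReLU."""
    acc = None
    for x, k in zip(input_maps, kernels):
        out = conv2d_valid(x, k)
        if acc is None:
            acc = out
        else:
            acc = [[a + o for a, o in zip(ra, ro)] for ra, ro in zip(acc, out)]
    return [[relu(v + bias) for v in row] for row in acc]

def update_weight(w, eta, delta, x):
    """Formula (1): w(t+1) = w(t) + eta * delta(t) * x(t)."""
    return w + eta * delta * x

# Toy example: one 3x3 input map and one 2x2 kernel (hypothetical values).
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
activated = conv_layer_output([image], [kernel], bias=5.0)
suppressed = conv_layer_output([image], [kernel], bias=0.0)
new_w = update_weight(0.5, eta=0.1, delta=2.0, x=3.0)
```

Every window of this toy input gives a raw response of -4, so the output survives ReLU only when the bias lifts it above zero.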

Database matching: the database here consists of the feature vectors of the pictures in the training dataset. After network training, the feature information of each picture is saved in the label file as a 1024-dimensional data point. At test time, the feature information of each new picture is matched against the database by computing its Mahalanobis distance.

Image authentication: after the image feature extraction module finishes, it produces two outputs: the feature vector of the image and the label assigned by image classification. The image authentication module first places the image to be authenticated into the corresponding class of the image library according to its label, then computes the Mahalanobis distance between it and every other image of that class. The results are recorded as D_i (i = 1, 2, 3, ..., m) and the minimum D_min is taken. The result determines whether the image to be authenticated comes from the database or derives from the same source image as an image in the database. The decision uses a threshold method, with the threshold Th_m set from a large number of experimental results. If D_min > Th_m, the image is far from all images in that class and does not belong to the database. Otherwise the image belongs to the database, and it is either identical to the image corresponding to D_min or derived from the same source image.
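
The threshold decision described above can be sketched as follows. The function name and the return convention are hypothetical, and the distances are assumed to have been computed already against the images of the matching class; the threshold of 2.0 in the usage lines is the value this document reports elsewhere.

```python
def authenticate(distances, threshold):
    """Apply the D_min versus Th_m rule.

    distances: Mahalanobis distances D_i between the query image and every
               database image of the same class.
    Returns (belongs_to_database, index_of_closest_image_or_None).
    """
    if not distances:
        return False, None
    closest = min(range(len(distances)), key=lambda i: distances[i])
    d_min = distances[closest]
    if d_min > threshold:
        # Far from every image of the class: not part of the database.
        return False, None
    # Same image as, or derived from the same source as, the closest match.
    return True, closest

accepted = authenticate([3.4, 1.1, 4.0], threshold=2.0)
rejected = authenticate([3.4, 3.1], threshold=2.0)
```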

Definition of the Mahalanobis distance: given m sample vectors X_1 to X_m with covariance matrix S and mean vector μ, the Mahalanobis distance from a sample vector X_i to μ is:

$$D(X_i) = \sqrt{(X_i - \mu)^{T} S^{-1} (X_i - \mu)}$$

The Mahalanobis distance between the vectors X_i and X_j is defined as:

$$D(X_i, X_j) = \sqrt{(X_i - X_j)^{T} S^{-1} (X_i - X_j)}$$

If the covariance matrix is the identity matrix, that is, the components of the sample vectors are independent and identically distributed, the formula becomes:

$$D(X_i, X_j) = \sqrt{(X_i - X_j)^{T} (X_i - X_j)}$$

which is the Euclidean distance. If the covariance matrix is diagonal, the formula becomes the standardized Euclidean distance. The Mahalanobis distance is therefore an effective way to measure the similarity of two unknown samples. Unlike the Euclidean distance, it accounts for the correlations among features and is independent of the measurement scale. This procedure mainly tests the accuracy of the image authentication module: each picture in the self-made dataset undergoes small rotations, affine transformations, grayscale transformations and similar operations, and the results are combined into a new dataset that simulates tampered images. Authentication accuracy depends mainly on the earlier image classification results and on the choice of threshold, so the purpose of this experiment is to find a suitable threshold. The initial threshold is set by computing the distances between transformed images and their originals and analyzing them statistically. Extensive experiments show that the Mahalanobis distance between an original image and its transformed version is around 1.2, while the distance between two completely different images is generally above 3. The threshold can therefore be set fairly coarsely, and a value of about 2.0 is used in the experiments.
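
As an illustration of the definition, the sketch below computes the Mahalanobis distance between two vectors in plain Python. The helper names are hypothetical. Instead of inverting S explicitly, it solves S y = d by Gaussian elimination, which is an implementation choice made here and not something the text prescribes.

```python
import math

def solve_linear(a, b):
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = s / m[r][r]
    return y

def mahalanobis(x_i, x_j, cov):
    """sqrt((x_i - x_j)^T S^{-1} (x_i - x_j)) for covariance matrix S = cov."""
    d = [a - b for a, b in zip(x_i, x_j)]
    y = solve_linear(cov, d)  # y = S^{-1} d without forming the inverse
    return math.sqrt(sum(di * yi for di, yi in zip(d, y)))

identity = [[1.0, 0.0], [0.0, 1.0]]
diagonal = [[4.0, 0.0], [0.0, 1.0]]
d_euclid = mahalanobis([3.0, 4.0], [0.0, 0.0], identity)  # reduces to Euclidean
d_scaled = mahalanobis([2.0, 0.0], [0.0, 0.0], diagonal)  # standardized Euclidean
```

With the identity covariance the result equals the ordinary Euclidean length of the difference vector, and with a diagonal covariance each component is divided by its standard deviation, matching the two special cases stated above.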

Step 3. Network training. The process is as follows:

The number of training iterations is set to 10,000 with a batch size of 100, and a prediction is made every 10 iterations. 80% of the data serves as the main training set T, another 10% as the cross-validation set V used during training, and the remaining 10% as the test set K for estimating how the classifier performs in the real world. The training algorithm is SGD (stochastic gradient descent). The network input has size 229x229x3, and all convolutional layers in the blocks use small 3x3 kernels to reduce the number of parameters. GoogLeNet weights pretrained on ImageNet are transferred and fine-tuned so that the network starts training from a favorable point, and the classifier is retrained. The log loss on training set T is computed, and the weights and biases of the multi-layer neural network are updated by back-propagation.
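
The 80/10/10 split and the batches of 100 described above can be sketched as follows. The function names, the shuffling step and the fixed seed are assumptions made for illustration; the text does not say how samples are assigned to T, V and K.

```python
import random

def split_dataset(items, train_pct=80, val_pct=10, seed=0):
    """Shuffle and split into training set T, validation set V and test set K."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, about 10%
    return train, val, test

def iter_batches(samples, batch_size=100):
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# Hypothetical dataset of 1000 image indices.
train_t, val_v, test_k = split_dataset(range(1000))
batches = list(iter_batches(train_t))
```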

The log loss on validation set V is computed to determine whether the model has converged. If it has, training ends; otherwise training continues with the next batch from training set T until the log loss on V converges. Finally, test set K is used to verify the accuracy of the model.
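
A minimal sketch of this convergence check follows. The log loss is the standard mean negative log-likelihood of the true class. The convergence criterion (no improvement larger than `tol` over the last `patience` evaluations) is an assumption, since the text only says the validation loss should tend to converge, and both function names are hypothetical.

```python
import math

def log_loss(probabilities, labels, eps=1e-15):
    """Mean negative log-likelihood of the true class."""
    total = 0.0
    for probs, y in zip(probabilities, labels):
        p = min(max(probs[y], eps), 1.0 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(labels)

def has_converged(val_losses, patience=3, tol=1e-4):
    """True when the last `patience` validation losses improve on the
    earlier best by less than `tol` (assumed stopping rule)."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return best_before - recent_best < tol

uniform_loss = log_loss([[0.5, 0.5]], [0])
plateau = has_converged([1.0, 0.5, 0.4, 0.4, 0.4, 0.4])
still_improving = has_converged([1.0, 0.8, 0.6, 0.4])
```

In a training loop, the validation loss would be appended to the list after each evaluation and training would stop once `has_converged` returns true.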

Step 4. Use the trained model to calibrate actual building pictures.

The advantages of the invention are as follows. A "pre-training and fine-tuning" mode based on transfer learning is used to train a new classifier and network weights, which offers greater generality and stability than traditional machine learning methods. In addition, the multi-feature calibration technique proposed here applies manual feature calibration to multi-functional buildings, which makes the feature vectors passing through the network's bottleneck layer more representative and greatly improves classification accuracy.

Description of the drawings

Fig. 1 is a schematic diagram of the experimental Build-7 dataset of the invention.

Fig. 2 shows the building feature matrix of the invention.

Fig. 3 is a diagram of the building recognition neural network of the invention.

Fig. 4 is an architecture diagram of the building calibration technology of the invention.

Detailed description

To better explain the technical solution of the invention, the invention is further described below through an embodiment with reference to the accompanying drawings.

A building calibration method based on transfer learning, comprising the following steps:

Step 1. Construct the training set and the test set from the data of the self-made dataset Build-7, shown in Fig. 1. The dataset contains building design drawings from relevant design institutes and real-world building photographs from the Internet, and is used as test data for the recommendation system.

Step 2. The feature calibration part of the model is shown in Fig. 2. Multi-feature calibration is performed on the pictures to be trained, and the result is recorded as a feature matrix in which 1 means the building has the feature and -1 means it does not. The feature-labeled pictures are the input to the network (Fig. 3). The transferred Inception module group in the network connects several Inception modules in series. Each module stacks the convolutions commonly used in CNNs (1x1, 3x3, 5x5) and a pooling operation (3x3) side by side (the outputs of convolution and pooling have the same spatial size and are concatenated along the channel dimension), which increases both the width of the network and its adaptability to scale. The convolutional layers can extract fine detail from the input, and the 5x5 filters cover most of the input of the receiving layer. A convolutional layer typically uses hundreds or thousands of kernels to generate a large number of feature maps that capture diverse image features. The local activations of these feature maps can therefore be aggregated into feature vectors, giving a more discriminative representation than hand-crafted descriptors or features taken directly from the fully connected layers of a CNN.
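
The 1 / -1 feature matrix described at the start of this step can be sketched as follows. The feature names and the example annotations are hypothetical, since the actual feature list appears only in Fig. 2.

```python
# Hypothetical architectural features; the real list is the one in Fig. 2.
FEATURES = ["dome", "spire", "balcony", "glass_facade"]

def encode_features(present, feature_names=FEATURES):
    """Encode one picture: 1 if the building has the feature, -1 otherwise."""
    return [1 if name in present else -1 for name in feature_names]

def feature_matrix(annotations, feature_names=FEATURES):
    """Stack one row per picture to form the feature matrix."""
    return [encode_features(present, feature_names) for present in annotations]

# A single-feature building and a multi-feature building (illustrative only).
matrix = feature_matrix([{"spire"}, {"balcony", "glass_facade"}])
```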

Step 3. The calibration framework of the invention is shown in Fig. 4. The dataset is divided into small batches that are fed sequentially into the building recognition neural network, and the parameters are updated by back-propagation. The network input has size 229x229x3, and all convolutional layers in the blocks use small 3x3 kernels to reduce the number of parameters. After feature calibration and feature extraction, the Mahalanobis distance threshold is set to 2.0. The number of training iterations is set to 10,000 and each batch contains 100 samples. 80% of the dataset serves as the training set, 10% as the cross-validation set during training, and the remaining 10% as the test set for estimating how the classifier performs in the real world. A prediction is made every 10 iterations. The model is trained with the Adam algorithm at a learning rate of 0.01, and the validation error on the validation set is computed at the end of each training round.
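
For reference, a single Adam update with the learning rate of 0.01 mentioned above can be sketched as follows. The hyperparameters beta1, beta2 and eps are the commonly used defaults and are assumptions, since the text gives only the learning rate. The toy objective (w - 3)^2 is purely illustrative.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Minimize the toy objective (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
first_step = None
for t in range(1, 2001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
    if t == 1:
        first_step = w
```

The first update moves the parameter by almost exactly the learning rate regardless of the gradient's magnitude, which is the normalizing behavior that distinguishes Adam from plain gradient descent.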

Step 4. Run the trained model on the test set and compare the results.

The content described in the embodiments of this specification merely enumerates forms in which the inventive concept can be realized. The scope of protection of the invention should not be regarded as limited to the specific forms stated in the embodiments; it also extends to equivalent technical means that those skilled in the art can conceive on the basis of the inventive concept.

Claims

1. A building calibration method based on transfer learning, comprising the following steps:

Step 1. Build a self-made dataset: the building images come from Google Images and Baidu Images and, based on existing datasets, are divided into seven classes of buildings common in everyday life, namely churches, residential buildings, hospitals, hotels, libraries, villas and shopping malls, and are processed uniformly;

Step 2. Build a multi-feature calibration neural network model based on transfer learning, the overall network architecture being divided into image feature calibration, image feature extraction, database matching and image authentication;

Image feature calibration: buildings with a distinct architectural style receive single-feature calibration and buildings with mixed styles receive multi-feature calibration, which yields the feature matrix of the picture;

Image feature extraction: the feature-calibrated picture is the network input; the multi-layer neural network learns the features of the original image through a series of cooperating convolutional and downsampling layers, and the classic BP (back-propagation) algorithm adjusts the parameters to complete the weight update; the BP weight update formula is:

$$\omega(t+1) = \omega(t) + \eta\,\delta(t)\,x(t) \qquad (1)$$

where ω(t) is the connection weight, x(t) is the output of the neuron, δ(t) is the error term of the neuron and η is the learning rate; the convolutional layers of the network use the discrete form of convolution, expressed as:

$$x_{\beta}^{\gamma} = f\Big(\sum_{\alpha \in M_{\beta}} x_{\alpha}^{\gamma-1} * k_{\alpha\beta}^{\gamma} + b_{\beta}^{\gamma}\Big) \qquad (2)$$

where M_β is a selection of input feature maps, k is the convolution kernel, γ is the layer index of the network and b is the additive bias; for a given output map, each input feature map may be convolved with a different kernel; f is the activation function of the convolutional neurons, and every block here uses the ReLU activation function;

Database matching: the database here consists of the feature vectors of the pictures in the training dataset; after network training, the feature information of each picture is saved in the label file as a 1024-dimensional data point; at test time, the feature information of each new picture is matched against the database by computing its Mahalanobis distance;

Image authentication: after the image feature extraction module finishes, it produces two outputs, the feature vector of the image and the label assigned by image classification; the image authentication module first places the image to be authenticated into the corresponding class of the image library according to its label, then computes the Mahalanobis distance between it and every other image of that class; the results are recorded as D_i (i = 1, 2, 3, ..., m) and the minimum D_min is taken; the result determines whether the image to be authenticated comes from the database or derives from the same source image as an image in the database; the decision uses a threshold method, with the threshold Th_m set from a large number of experimental results; if D_min > Th_m, the image is far from all images in that class and does not belong to the database; otherwise the image belongs to the database, and it is either identical to the image corresponding to D_min or derived from the same source image;

Definition of the Mahalanobis distance: given m sample vectors X_1 to X_m with covariance matrix S and mean vector μ, the Mahalanobis distance from a sample vector X_i to μ is:

$$D(X_i) = \sqrt{(X_i - \mu)^{T} S^{-1} (X_i - \mu)}$$

The Mahalanobis distance between the vectors X_i and X_j is defined as:

$$D(X_i, X_j) = \sqrt{(X_i - X_j)^{T} S^{-1} (X_i - X_j)}$$

If the covariance matrix is the identity matrix, that is, the components of the sample vectors are independent and identically distributed, the formula becomes:

$$D(X_i, X_j) = \sqrt{(X_i - X_j)^{T} (X_i - X_j)}$$

which is the Euclidean distance; if the covariance matrix is diagonal, the formula becomes the standardized Euclidean distance; the Mahalanobis distance is therefore an effective way to measure the similarity of two unknown samples; unlike the Euclidean distance, it accounts for the correlations among features and is independent of the measurement scale; the accuracy of the image authentication module is tested; authentication accuracy depends mainly on the earlier image classification results and on the choice of threshold, so the purpose of the experiment is to find a suitable threshold; the initial threshold is set by computing the distances between transformed images and their originals and analyzing them statistically;

Step 3. Network training, the process being as follows:

The number of training iterations is set to 10,000 with a batch size of 100, and a prediction is made every 10 iterations; 80% of the data serves as the main training set T, another 10% as the cross-validation set V used during training, and the remaining 10% as the test set K for estimating how the classifier performs in the real world; the training algorithm is stochastic gradient descent (SGD); the network input has size 229x229x3, and all convolutional layers in the blocks use small 3x3 kernels to reduce the number of parameters; GoogLeNet weights pretrained on ImageNet are transferred and fine-tuned so that the network starts training from a favorable point, and the classifier is retrained; the log loss on training set T is computed, and the weights and biases of the multi-layer neural network are updated by back-propagation;

The log loss on validation set V is computed to determine whether the model has converged; if it has, training ends; otherwise training continues with the next batch from training set T until the log loss on V converges; finally, test set K is used to verify the accuracy of the model;

Step 4. Use the trained model to calibrate actual building pictures.

2. A building calibration method based on transfer learning, characterized in that the initial value of the threshold described in Step 2 is 2.0.