
CN110503636A - Parameter adjustment method, lesion prediction method, parameter adjustment device and electronic equipment

Info

Publication number
CN110503636A
Authority
CN
China
Prior art keywords
image
feature
distribution function
loss function
function value
Prior art date
Legal status
Granted
Application number
CN201910723272.7A
Other languages
Chinese (zh)
Other versions
CN110503636B (en)
Inventor
边成
郑冶枫
马锴
Current Assignee
Tencent Healthcare Shenzhen Co Ltd
Original Assignee
Tencent Healthcare Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Healthcare Shenzhen Co Ltd filed Critical Tencent Healthcare Shenzhen Co Ltd
Priority to CN201910723272.7A
Publication of CN110503636A
Application granted
Publication of CN110503636B
Legal status: Active


Classifications

    • G06T 7/0012 Image analysis; Inspection of images; Biomedical image inspection
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30041 Eye; Retina; Ophthalmic

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a parameter adjustment method for a single-modal detection network, a method for predicting lesions in eye images, a parameter adjustment device for a single-modal detection network, and an electronic device, and relates to the technical field of artificial intelligence. The method includes: performing feature encoding on a first image feature to obtain a first distribution function, and performing feature encoding on a second image feature to obtain a second distribution function; determining a first loss function value according to the first distribution function and the second distribution function; determining an image feature to be compared according to the fusion of a sampling result of the second distribution function with the first image; and determining a second loss function value according to the comparison between the image feature to be compared and a label to be compared, and adjusting the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges. The method of the present disclosure can, to a certain extent, overcome the problem of poor parameter adjustment and thereby improve the network model's processing of input images.

Description

Parameter adjustment method, lesion prediction method, parameter adjustment device and electronic equipment

Technical Field

The present disclosure relates to the technical field of artificial intelligence and to machine learning technology, and in particular to a parameter adjustment method for a single-modal detection network, a method for predicting lesions in eye images, a parameter adjustment device for a single-modal detection network, and an electronic device.

Background

With the continuous development of artificial intelligence, more and more network models for image processing have emerged to perform operations such as feature extraction, recognition, and classification on images.

Before an image is processed by an image-processing network model, the network model needs to be trained, and the usual training method is supervised training. For example, a sample image is input into the network model, and a corresponding loss function is determined by comparing the classification result output by the network model with a target classification result, so that the network parameters of the network model can be adjusted according to the loss function, thereby completing the training of the network model.

However, in the above training method the input sample is usually a single sample image, which leads to a poor parameter adjustment effect and in turn affects how well the network model processes input images.

It should be noted that the information disclosed in the Background section above is only intended to enhance the understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.

Summary

The purpose of the present disclosure is to provide a parameter adjustment method for a single-modal detection network, a method for predicting lesions in eye images, a parameter adjustment device for a single-modal detection network, and an electronic device, so as to overcome, to a certain extent, the problem of poor parameter adjustment and thereby improve the network model's processing of input images.

Other features and advantages of the present disclosure will become apparent from the following detailed description, or may be learned in part through practice of the present disclosure.

According to a first aspect of the present disclosure, a parameter adjustment method for a single-modal detection network is provided, including:

performing feature extraction on a first image, performing feature encoding on the extracted first image feature to obtain a first distribution function, and performing feature encoding on an extracted second image feature to obtain a second distribution function, where the first image feature corresponds to the first image and the second image feature is obtained by feature fusion of the first image and a second image;

determining a first loss function value according to the first distribution function and the second distribution function;

determining an image feature to be compared according to the fusion of a sampling result of the second distribution function with the first image;

determining a second loss function value according to the comparison between the image feature to be compared and a label to be compared, and adjusting the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges, where the loss function value includes the first loss function value and the second loss function value.
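To make the flow of the first aspect concrete, the following is a minimal PyTorch-style sketch of one parameter adjustment step. It assumes the distribution functions are diagonal Gaussians parameterized by a mean and a log-variance, that the first loss function value is a KL divergence between the two distributions, and that the second loss function value is a cross-entropy against the label; the module and method names (extract, encode, fuse, fuse_with_sample, head) are illustrative and not taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def kl_divergence(mu_q, logvar_q, mu_p, logvar_p):
    # KL(q || p) between two diagonal Gaussians, summed over latent dimensions
    # and averaged over the batch.
    return 0.5 * torch.sum(
        logvar_p - logvar_q
        + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
        - 1.0,
        dim=-1,
    ).mean()

def training_step(single_net, multi_net, head, optimizer, first_image, second_image, label):
    # First distribution function: encode the feature of the first image alone.
    feat1 = single_net.extract(first_image)
    mu1, logvar1 = single_net.encode(feat1)

    # Second distribution function: encode the feature fused from both images.
    feat2 = multi_net.fuse(first_image, second_image)
    mu2, logvar2 = multi_net.encode(feat2)

    # First loss function value: divergence between the two distribution functions.
    loss1 = kl_divergence(mu1, logvar1, mu2, logvar2)

    # Sample from the second distribution (reparameterization) and fuse the sample
    # with the first image feature to obtain the image feature to be compared.
    z = mu2 + torch.randn_like(mu2) * (0.5 * logvar2).exp()
    compared_feat = single_net.fuse_with_sample(feat1, z)

    # Second loss function value: compare the processed feature against the label.
    loss2 = F.cross_entropy(head(compared_feat), label)

    loss = loss1 + loss2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the multi-modal branch only guides training; at test time the single-modal branch and the head are used on their own, which matches the single-image prediction described later in the disclosure.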

In an exemplary embodiment of the present disclosure, performing feature encoding on the extracted first image feature to obtain the first distribution function includes:

extracting, through the single-modal detection network, the first image feature corresponding to the first image;

performing feature encoding on the first image feature to determine the first distribution function corresponding to the first image.

In an exemplary embodiment of the present disclosure, performing feature encoding on the extracted second image feature to obtain the second distribution function includes:

fusing the first image and the second image through a multi-modal detection network, and generating the second image feature according to the fusion result;

performing feature encoding on the second image feature to determine the second distribution function jointly corresponding to the first image and the second image.

In an exemplary embodiment of the present disclosure, determining the image feature to be compared according to the fusion of the sampling result of the second distribution function with the first image includes:

fusing the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fusing the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;

determining a third distribution function corresponding to the third image feature, and determining a fourth distribution function corresponding to the fourth image feature;

determining a third loss function value according to the third distribution function and the fourth distribution function;

fusing the sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fusing the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;

determining a fifth distribution function corresponding to the fifth image feature, and determining a sixth distribution function corresponding to the sixth image feature;

determining a fourth loss function value according to the fifth distribution function and the sixth distribution function;

fusing the sampling result of the sixth distribution function with the fifth image feature to obtain the image feature to be compared.
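The cascade just described can be organized as a small loop: at each level a sample drawn from the multi-modal branch's distribution is fused back into both branches, a new pair of distributions is fitted, and their divergence is accumulated. The sketch below assumes Gaussian distributions and reuses the hypothetical helpers (encode, fuse_with_sample) and the kl_divergence function from the earlier sketch.

```python
import torch

def cascaded_refinement(single_net, multi_net, feat_single, feat_multi, num_levels=2):
    # feat_single / feat_multi correspond to the first and second image features.
    kl_losses = []
    mu_m, logvar_m = multi_net.encode(feat_multi)          # second distribution function
    for _ in range(num_levels):
        # Sample from the multi-modal distribution and fuse the sample into both branches.
        z = mu_m + torch.randn_like(mu_m) * (0.5 * logvar_m).exp()
        feat_single = single_net.fuse_with_sample(feat_single, z)  # third / fifth feature
        feat_multi = multi_net.fuse_with_sample(feat_multi, z)     # fourth / sixth feature

        # Fit the next pair of distributions (third vs fourth, then fifth vs sixth)
        # and accumulate their divergence (the third and fourth loss function values).
        mu_s, logvar_s = single_net.encode(feat_single)
        mu_m, logvar_m = multi_net.encode(feat_multi)
        kl_losses.append(kl_divergence(mu_s, logvar_s, mu_m, logvar_m))

    # A final sample from the last multi-modal distribution, fused into the
    # single-modal branch, gives the image feature to be compared.
    z = mu_m + torch.randn_like(mu_m) * (0.5 * logvar_m).exp()
    compared_feat = single_net.fuse_with_sample(feat_single, z)
    return compared_feat, kl_losses
```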

In an exemplary embodiment of the present disclosure, determining the second loss function value according to the comparison between the image feature to be compared and the label to be compared includes:

performing feature processing on the image feature to be compared, comparing the processed image feature to be compared with the label to be compared, and determining the second loss function value according to the comparison result, where the feature processing includes convolution processing, pooling processing, and non-linear activation processing.
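A minimal sketch of such feature processing followed by the comparison is shown below; the layer sizes and the use of cross-entropy as the comparison are illustrative assumptions rather than the disclosure's exact design.

```python
import torch.nn as nn
import torch.nn.functional as F

class ComparisonHead(nn.Module):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 64, kernel_size=3, padding=1)  # convolution processing
        self.pool = nn.AdaptiveAvgPool2d(1)                               # pooling processing
        self.act = nn.ReLU()                                              # non-linear activation processing
        self.fc = nn.Linear(64, num_classes)

    def forward(self, feature):
        x = self.act(self.conv(feature))
        x = self.pool(x).flatten(1)
        return self.fc(x)

def second_loss_value(head, compared_feat, label):
    # Compare the processed feature with the label to obtain the second loss function value.
    return F.cross_entropy(head(compared_feat), label)
```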

In an exemplary embodiment of the present disclosure, the parameter adjustment method for the single-modal detection network further includes:

determining the sum of the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value as the loss function value.
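Written out, and assuming the equal (unit) weighting that a plain sum implies, the total loss function value driven to convergence is:

```latex
\mathcal{L} = \mathcal{L}_1 + \mathcal{L}_2 + \mathcal{L}_3 + \mathcal{L}_4
```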

In an exemplary embodiment of the present disclosure, adjusting the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges includes:

adjusting the network parameters of the single-modal detection network according to the loss function value until the loss function value converges.
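The outer adjustment loop can then be a simple repetition of the training step from the earlier sketch until the loss function value stops changing; the absolute-change threshold used to decide convergence below is an illustrative assumption.

```python
def adjust_until_convergence(step_fn, data_loader, tol=1e-4, max_epochs=100):
    # step_fn is the per-batch training step (e.g. training_step with the networks,
    # head and optimizer already bound); it returns the batch loss value.
    prev_loss = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for first_image, second_image, label in data_loader:
            epoch_loss += step_fn(first_image, second_image, label)
        epoch_loss /= len(data_loader)
        if abs(prev_loss - epoch_loss) < tol:  # loss function value considered converged
            return epoch_loss
        prev_loss = epoch_loss
    return prev_loss
```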

According to a second aspect of the present disclosure, a method for predicting lesions in eye images is provided, including:

inputting an eye image into a single-modal detection network, and fitting a first distribution function corresponding to the eye image according to the single-modal detection network;

fusing a sampling result of the first distribution function with a first image feature corresponding to the eye image to obtain a second image feature;

fitting a second distribution function corresponding to the second image feature according to the single-modal detection network, and fusing a sampling result of the second distribution function with the second image feature to obtain a third image feature;

fitting a third distribution function corresponding to the third image feature according to the single-modal detection network, fusing a sampling result of the third distribution function with the third image feature to obtain a fourth image feature, and predicting a lesion in the eye image according to the fourth image feature;

where the single-modal detection network is obtained by adjustment according to the parameter adjustment method for a single-modal detection network provided in the first aspect.
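At prediction time only a single eye image is required, and the trained single-modal branch repeatedly fits a distribution, samples from it, and fuses the sample back into the running feature. A hedged sketch, reusing the hypothetical method names from the training sketches:

```python
import torch

@torch.no_grad()
def predict_lesion(single_net, head, eye_image, num_levels=3):
    feat = single_net.extract(eye_image)                 # first image feature
    for _ in range(num_levels):                          # first, second, third distribution functions
        mu, logvar = single_net.encode(feat)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        feat = single_net.fuse_with_sample(feat, z)      # second, third, fourth image features
    return head(feat).softmax(dim=-1)                    # lesion prediction from the fourth feature
```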

According to a third aspect of the present disclosure, a parameter adjustment device for a single-modal detection network is provided, including a distribution function determination unit, a loss function value determination unit, a feature fusion unit, and a parameter adjustment unit, where:

the distribution function determination unit is configured to perform feature extraction on a first image, perform feature encoding on the extracted first image feature to obtain a first distribution function, and perform feature encoding on an extracted second image feature to obtain a second distribution function, where the first image feature corresponds to the first image and the second image feature is obtained by feature fusion of the first image and a second image;

the loss function value determination unit is configured to determine a first loss function value according to the first distribution function and the second distribution function;

the feature fusion unit is configured to determine an image feature to be compared according to the fusion of a sampling result of the second distribution function with the first image;

the parameter adjustment unit is configured to determine a second loss function value according to the comparison between the image feature to be compared and a label to be compared, and to adjust the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges, where the loss function value includes the first loss function value and the second loss function value.

In an exemplary embodiment of the present disclosure, the distribution function determination unit performs feature encoding on the extracted first image feature to obtain the first distribution function specifically as follows:

the distribution function determination unit extracts, through the single-modal detection network, the first image feature corresponding to the first image;

the distribution function determination unit performs feature encoding on the first image feature to determine the first distribution function corresponding to the first image.

In an exemplary embodiment of the present disclosure, the distribution function determination unit performs feature encoding on the extracted second image feature to obtain the second distribution function specifically as follows:

the distribution function determination unit fuses the first image and the second image through a multi-modal detection network, and generates the second image feature according to the fusion result;

the distribution function determination unit performs feature encoding on the second image feature to determine the second distribution function jointly corresponding to the first image and the second image.

In an exemplary embodiment of the present disclosure, the feature fusion unit determines the image feature to be compared according to the fusion of the sampling result of the second distribution function with the first image specifically as follows:

the feature fusion unit fuses the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fuses the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;

the feature fusion unit determines a third distribution function corresponding to the third image feature, and determines a fourth distribution function corresponding to the fourth image feature;

the feature fusion unit determines a third loss function value according to the third distribution function and the fourth distribution function;

the feature fusion unit fuses the sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fuses the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;

the feature fusion unit determines a fifth distribution function corresponding to the fifth image feature, and determines a sixth distribution function corresponding to the sixth image feature;

the feature fusion unit determines a fourth loss function value according to the fifth distribution function and the sixth distribution function;

the feature fusion unit fuses the sampling result of the sixth distribution function with the fifth image feature to obtain the image feature to be compared.

In an exemplary embodiment of the present disclosure, the parameter adjustment unit determines the second loss function value according to the comparison between the image feature to be compared and the label to be compared specifically as follows:

the parameter adjustment unit performs feature processing on the image feature to be compared, compares the processed image feature to be compared with the label to be compared, and determines the second loss function value according to the comparison result, where the feature processing includes convolution processing, pooling processing, and non-linear activation processing.

In an exemplary embodiment of the present disclosure, the loss function value determination unit is further configured to determine the sum of the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value as the loss function value.

In an exemplary embodiment of the present disclosure, the loss function value determination unit adjusts the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges specifically as follows:

the loss function value determination unit adjusts the network parameters of the single-modal detection network according to the loss function value until the loss function value converges.

According to a fourth aspect of the present disclosure, a device for predicting lesions in eye images is provided, including a function fitting unit, an image feature fusion unit, and an image feature acquisition unit, where:

the function fitting unit is configured to input an eye image into a single-modal detection network and to fit a first distribution function corresponding to the eye image according to the single-modal detection network;

the image feature fusion unit is configured to fuse a sampling result of the first distribution function with a first image feature corresponding to the eye image to obtain a second image feature;

the image feature acquisition unit is configured to fit a second distribution function corresponding to the second image feature according to the single-modal detection network, and to fuse a sampling result of the second distribution function with the second image feature to obtain a third image feature;

the image feature acquisition unit is further configured to fit a third distribution function corresponding to the third image feature according to the single-modal detection network, to fuse a sampling result of the third distribution function with the third image feature to obtain a fourth image feature, and to predict a lesion in the eye image according to the fourth image feature;

where the single-modal detection network is obtained by adjustment according to the parameter adjustment method for a single-modal detection network provided in the first aspect.

According to a fifth aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing executable instructions of the processor, where the processor is configured to perform any one of the methods described above by executing the executable instructions.

According to a sixth aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements any one of the methods described above.

Exemplary embodiments of the present disclosure may have some or all of the following beneficial effects:

In the parameter adjustment method for a single-modal detection network provided in an exemplary embodiment of the present disclosure, feature extraction may be performed on a first image (for example, an OCT image), feature encoding may be performed on the extracted first image feature to obtain a first distribution function, and feature encoding may be performed on an extracted second image feature to obtain a second distribution function, where the first image feature corresponds to the first image and the second image feature is obtained by feature fusion of the first image and a second image (for example, a fundus image); a first loss function value is determined according to the first distribution function and the second distribution function; an image feature to be compared is then determined according to the fusion of a sampling result of the second distribution function with the first image; a second loss function value is then determined according to the comparison between the image feature to be compared and a label to be compared, and the network parameters of the single-modal detection network are adjusted according to the first loss function value and the second loss function value until the loss function value converges, where the loss function value includes the first loss function value and the second loss function value. According to the above scheme, the present disclosure can, on the one hand, overcome the problem of poor parameter adjustment to a certain extent and thereby improve the network model's processing of input images; on the other hand, it can adjust the network parameters of a single-modal detection network (for example, a single-modal lesion detection network) through multi-modal fusion (a modality can be understood as an image), so as to train the single-modal detection network, improve the training effect, and in turn improve the classification and recognition performance of the single-modal detection network on input images. When the embodiments of the present disclosure are applied to lesion recognition, the accuracy of lesion recognition for fundus images can be improved.

It should be noted that the single-modal detection network trained according to the embodiments of the present disclosure can be applied to an eye disease recognition system. In a traditional eye disease recognition system, a patient's fundus image and OCT image need to be input into the system in pairs, and the system then determines the lesion by combining the features of the fundus image and the OCT image; such an eye disease recognition system corresponds to a multi-modal detection network, where a modality can be understood as an image. In contrast, the single-modal detection network trained according to the embodiments of the present disclosure can determine the lesion from a fundus image or an OCT image alone. Specifically, when a fundus image (or an OCT image) is input into the single-modal detection network, the image features extracted by the network not only contain the features of the fundus image (or OCT image) but also implicitly carry the features of the OCT image (or fundus image). This reduces the requirements on the input image while maintaining recognition accuracy: whereas previously a pair of images had to be input for lesion recognition, the single-modal detection network in the embodiments of the present disclosure only needs a single image, which to a certain extent improves the convenience and efficiency of lesion recognition.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 shows a schematic diagram of an exemplary system architecture to which a parameter adjustment method for a single-modal detection network and a parameter adjustment device for a single-modal detection network according to embodiments of the present disclosure can be applied;

Fig. 2 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present disclosure;

Fig. 3 schematically shows a flowchart of a parameter adjustment method for a single-modal detection network according to an embodiment of the present disclosure;

Fig. 4 schematically shows a flowchart of a method for predicting lesions in eye images according to an embodiment of the present disclosure;

Fig. 5 schematically shows a schematic diagram of an eye image according to an embodiment of the present disclosure;

Fig. 6 schematically shows an architecture diagram of a parameter adjustment method for a single-modal detection network according to an embodiment of the present disclosure;

Fig. 7 schematically shows an architecture diagram of a method for predicting lesions in eye images according to an embodiment of the present disclosure;

Fig. 8 schematically shows a structural block diagram of a parameter adjustment device for a single-modal detection network according to an embodiment of the present disclosure;

Fig. 9 schematically shows a structural block diagram of a device for predicting lesions in eye images according to an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey the concepts of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced while omitting one or more of these specific details, or other methods, components, devices, steps, and so on may be adopted. In other instances, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present disclosure.

In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions of them will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

Fig. 1 shows a schematic diagram of the system architecture of an exemplary application environment to which a parameter adjustment method for a single-modal detection network and a parameter adjustment device for a single-modal detection network according to embodiments of the present disclosure can be applied.

As shown in Fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables. The terminal devices 101, 102, 103 may be various electronic devices with display screens, including but not limited to desktop computers, portable computers, smartphones, and tablet computers. It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs. For example, the server 105 may be a server cluster composed of multiple servers.

The parameter adjustment method for a single-modal detection network and the method for predicting lesions in eye images provided by the embodiments of the present disclosure are generally executed by the server 105; accordingly, the parameter adjustment device for a single-modal detection network and the device for predicting lesions in eye images are generally arranged in the server 105. However, those skilled in the art will readily understand that the parameter adjustment method for a single-modal detection network and the method for predicting lesions in eye images provided by the embodiments of the present disclosure can also be executed by the terminal devices 101, 102, 103, and accordingly the parameter adjustment device for a single-modal detection network and the device for predicting lesions in eye images can also be arranged in the terminal devices 101, 102, 103; this is not specifically limited in this exemplary embodiment. For example, in an exemplary embodiment, the server 105 may perform feature extraction on a first image, perform feature encoding on the extracted first image feature to obtain a first distribution function, and perform feature encoding on an extracted second image feature to obtain a second distribution function; then determine a first loss function value according to the first distribution function and the second distribution function, and determine an image feature to be compared according to the fusion of a sampling result of the second distribution function with the first image; and then determine a second loss function value according to the comparison between the image feature to be compared and a label to be compared, and adjust the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges. The server 105 may also input an eye image into the single-modal detection network, fit a first distribution function corresponding to the eye image according to the single-modal detection network, fuse a sampling result of the first distribution function with a first image feature corresponding to the eye image to obtain a second image feature, fit a second distribution function corresponding to the second image feature according to the single-modal detection network, and fuse a sampling result of the second distribution function with the second image feature to obtain a third image feature; then, according to the single-modal detection network, fit a third distribution function corresponding to the third image feature, fuse a sampling result of the third distribution function with the third image feature to obtain a fourth image feature, and predict a lesion in the eye image according to the fourth image feature.

Fig. 2 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present disclosure.

It should be noted that the computer system 200 of the electronic device shown in Fig. 2 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.

As shown in Fig. 2, the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 202 or a program loaded from a storage section 208 into a random access memory (RAM) 203. Various programs and data required for system operation are also stored in the RAM 203. The CPU 201, the ROM 202, and the RAM 203 are connected to each other via a bus 204. An input/output (I/O) interface 205 is also connected to the bus 204.

The following components are connected to the I/O interface 205: an input section 206 including a keyboard, a mouse, and the like; an output section 207 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage section 208 including a hard disk and the like; and a communication section 209 including a network interface card such as a LAN card or a modem. The communication section 209 performs communication processing via a network such as the Internet. A drive 210 is also connected to the I/O interface 205 as needed. A removable medium 211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 210 as needed, so that a computer program read from it can be installed into the storage section 208 as needed.

In particular, according to the embodiments of the present disclosure, the processes described below with reference to the flowcharts can be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 209 and/or installed from the removable medium 211. When the computer program is executed by the central processing unit (CPU) 201, various functions defined in the methods and devices of the present application are executed.

In some embodiments, the computer system 200 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Computer vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to using cameras and computers instead of human eyes to recognize, track, and measure targets, and to further perform graphics processing so that the processed result becomes an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can obtain information from images or multi-dimensional data. Computer vision technologies usually include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition technologies such as face recognition and fingerprint recognition.

Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

In the early era of traditional machine learning, network parameters had to be carefully designed to narrow the gap between the results predicted by a neural network and the real results. In the current era of machine learning, a neural network can automatically optimize its network parameters according to the configured multi-layer loss functions, and in many scenarios it is no longer necessary to design network parameters by hand.

The technical solutions of the embodiments of the present disclosure are described in detail below.

With the continuous growth of China's population and the aggravation of population aging, the eye-health situation is becoming increasingly serious. According to statistics, more than 50% of people have never received a routine eye examination, and more than 90% of people only receive treatment after the disease has already developed. For example, there are about 110 million diabetic patients in China, among whom more than 40 million suffer from retinopathy caused by diabetes, a condition that can easily lead to blindness at a later stage if it is not treated early. If regular eye examinations are carried out in the early stage of the disease, the risk of blindness can be reduced by 94.4%.

Optical coherence tomography (OCT) is a new imaging technology that can image various aspects of biological tissue, such as structural information, blood flow, and elastic parameters. OCT usually observes fundus structures with higher clarity than other examination methods; when the fundus is observed through OCT, ocular tissues such as the retinal nerve fibers, the inner and outer plexiform layers, the nuclear layers, the rod-and-cone cell layer, and the pigment epithelium layer can be clearly distinguished. Therefore, diagnosing eye diseases such as macular hole, central serous chorioretinopathy, and cystoid edema through OCT can usually achieve good results. In addition, since an OCT device can obtain both a fundus image and an OCT image during imaging, lesion diagnosis can be performed on the two modalities of the fundus image and the OCT image at the same time, greatly reducing the risk of missed lesions.

At present, recognition of eye diseases from eye images is usually achieved by training network models capable of classifying natural images. A traditional network has only one fixed branch, which makes it difficult to handle targets of inconsistent sizes (for example, a dog may occupy a large part of one photo but only a small part of another). Usually, a convolution kernel of fixed size cannot handle this kind of information asymmetry. InceptionV4 widens the network so as to increase the information the network samples from targets of different sizes. In addition, DenseNet improves network performance by increasing the network depth. Because the training of traditional networks suffers from the vanishing-gradient problem, the gradient becomes zero during back-propagation and cannot be propagated back, causing training to fail. DenseNet therefore proposes dense connections between all earlier layers and later layers to strengthen the back-propagation of gradients during training.
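For reference, the dense-connection idea mentioned here can be sketched as follows: each layer receives the concatenation of all earlier feature maps, which keeps gradient paths short during back-propagation. This is the generic DenseNet pattern, not the network proposed in this disclosure.

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))
            channels += growth_rate  # each layer sees the outputs of all previous layers

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```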

In general, InceptionV4 and DenseNet both improve performance along different dimensions, but both of them consider only one modality (that is, one image) as input. A network with a single structure cannot effectively process the two modalities of OCT images and fundus images at the same time, and missing the information of one modality may cause poor parameter adjustment during training and incorrect classification of part of the data by the network during testing.

Based on one or more of the above problems, this example embodiment provides a parameter adjustment method for a single-modal detection network. The method may be applied to the server 105, or to one or more of the terminal devices 101, 102, 103; this is not specifically limited in this exemplary embodiment. Referring to Fig. 3, the parameter adjustment method for the single-modal detection network may include the following steps S310 to S340:

Step S310: perform feature extraction on a first image, perform feature encoding on the extracted first image feature to obtain a first distribution function, and perform feature encoding on an extracted second image feature to obtain a second distribution function, where the first image feature corresponds to the first image and the second image feature is obtained by feature fusion of the first image and a second image.

Step S320: determine a first loss function value according to the first distribution function and the second distribution function.

Step S330: determine an image feature to be compared according to the fusion of a sampling result of the second distribution function with the first image.

Step S340: determine a second loss function value according to the comparison between the image feature to be compared and a label to be compared, and adjust the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges, where the loss function value includes the first loss function value and the second loss function value.

在本公开的一示例实施方式所提供的单模态检测网络的参数调整方法中,可以对第一图像(如,OCT图像)进行特征提取,并对提取到的第一图像特征进行特征编码,以得到第一分布函数,以及,对提取到的第二图像特征进行特征编码,以得到第二分布函数;其中,第一图像特征与第一图像对应,第二图像特征由第一图像和第二图像(如,眼底图像)进行特征融合得到;根据第一分布函数和第二分布函数确定第一损失函数值;进而,可以根据第二分布函数的采样结果与第一图像的融合确定出待比对图像特征;进而,可以根据待比对图像特征与待比对标签的比对确定第二损失函数值,并根据第一损失函数值与第二损失函数值对单模态检测网络的网络参数进行调整,直到损失函数值收敛;其中,损失函数值包括第一损失函数值和第二损失函数值。依据上述方案描述,本公开一方面能够在一定程度上克服调参效果不佳的问题,进而提升网络模型对于输入图像的处理效果;另一方面能够通过多模态(模态可以理解为图像)融合的方式调整单模态检测网络的网络参数,以实现对单模态检测网络的训练,提升网络训练效果,进而改善单模态检测网络对于输入图像的分类效果和识别效果,在本公开实施例应用于病灶识别时,可以提升对于眼底图像的病灶识别准确率。In the parameter adjustment method of a single-mode detection network provided in an exemplary embodiment of the present disclosure, feature extraction may be performed on a first image (such as an OCT image), and feature encoding is performed on the extracted first image features, To obtain the first distribution function, and perform feature encoding on the extracted second image features to obtain the second distribution function; wherein, the first image feature corresponds to the first image, and the second image feature is composed of the first image and the second image feature Two images (such as fundus images) are obtained by feature fusion; the first loss function value is determined according to the first distribution function and the second distribution function; furthermore, the fusion of the sampling result of the second distribution function and the first image can be used to determine the value to be obtained. Compare the image features; furthermore, the second loss function value can be determined according to the comparison between the image features to be compared and the label to be compared, and the network of the single-modal detection network can be calculated according to the first loss function value and the second loss function value The parameters are adjusted until the loss function value converges; wherein, the loss function value includes a first loss function value and a second loss function value. According to the description of the above solution, on the one hand, the present disclosure can overcome the problem of poor parameter adjustment effect to a certain extent, and then improve the processing effect of the network model on the input image; The network parameters of the single-mode detection network are adjusted in a fusion manner to realize the training of the single-mode detection network, improve the network training effect, and then improve the classification effect and recognition effect of the single-mode detection network for the input image, which is implemented in this disclosure For example, when applied to focus recognition, it can improve the accuracy of focus recognition for fundus images.

In addition, the embodiments of the present disclosure can be applied to OCT and fundus disease detection algorithms: paired OCT images and fundus images of a patient are input, and eye diseases are diagnosed by combining the OCT image with the fundus image. Because the testing stage is unsupervised, the method can be freely embedded in an OCT or fundus screening system to supply the system with information from another modality as a supplement, simultaneously improving the diagnostic capability of both the fundus network and the OCT network.

It should be noted that the fundus images mentioned in the embodiments of the present disclosure may be captured by a fundus camera, and the OCT images may be captured by an OCT imaging device. Optionally, the fundus image and the OCT image may also be acquired simultaneously by the OCT imaging device, which is not limited in the embodiments of the present disclosure. The OCT image is a coherence tomography image of a specific position in the fundus image.

In addition, the above fundus camera is a medical camera used to acquire images of the human retina (i.e., fundus images). The OCT imaging device uses the basic principle of a weakly coherent light interferometer to detect the back-reflected or scattered signals of incident weakly coherent light at different depth layers of biological tissue, and obtains two-dimensional or three-dimensional structural images of the tissue (i.e., OCT images) by scanning.

The above steps of this example embodiment are described in more detail below.

In step S310, feature extraction is performed on the first image, feature encoding is performed on the extracted first image feature to obtain the first distribution function, and feature encoding is performed on the extracted second image feature to obtain the second distribution function; the first image feature corresponds to the first image, and the second image feature is obtained by feature fusion of the first image and the second image.

In this example embodiment, both the first image and the second image may be eye images, where the eye images include a fundus image and an OCT image. If the first image is the fundus image, the second image is the OCT image; if the first image is the OCT image, the second image is the fundus image. The data size of the fundus image may be 496×496, and the data size of the OCT image may be 496×X, where X may be 496, 768, or 1024.

In this example embodiment, if the data size of the fundus image is 496×496 and the data size of the OCT image is 496×496, the parameter adjustment method for the single-modal detection network may further include the following steps before step S310:

calculating the ratio of the image mean to the image variance for the first image and the ratio of the image mean to the image variance for the second image, so as to standardize the first image and the second image;

performing operations such as random rotation, random horizontal flipping, random elastic deformation, or adding random speckle noise on the standardized first image and second image to obtain multiple groups of images of different forms, each group including a first image and a second image, for use in step S310.
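A minimal sketch of this preprocessing, assuming NumPy arrays as the image representation. The standardization shown uses the common (x - mean) / std form, and the rotation range and speckle-noise strength are illustrative assumptions rather than values given in this disclosure; random elastic deformation is omitted for brevity.

```python
import numpy as np

def standardize(img: np.ndarray) -> np.ndarray:
    # A common form of per-image standardization; the disclosure describes combining
    # the image mean and variance, so the exact formula may differ in practice.
    return (img - img.mean()) / (img.std() + 1e-8)

def augment(fundus: np.ndarray, oct_img: np.ndarray, rng: np.random.Generator):
    # Random horizontal flip applied consistently to the paired images.
    if rng.random() < 0.5:
        fundus, oct_img = fundus[:, ::-1], oct_img[:, ::-1]
    # Random rotation by a multiple of 90 degrees (illustrative choice of range).
    k = int(rng.integers(0, 4))
    fundus, oct_img = np.rot90(fundus, k), np.rot90(oct_img, k)
    # Random multiplicative speckle noise; the 0.05 strength is an assumption.
    fundus = fundus * (1.0 + 0.05 * rng.standard_normal(fundus.shape))
    oct_img = oct_img * (1.0 + 0.05 * rng.standard_normal(oct_img.shape))
    return fundus, oct_img

# Usage: build several augmented (fundus, OCT) pairs from one standardized pair.
rng = np.random.default_rng(0)
fundus = standardize(np.random.rand(496, 496))
oct_img = standardize(np.random.rand(496, 496))
pairs = [augment(fundus, oct_img, rng) for _ in range(4)]
```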

In this example embodiment, optionally, feature encoding is performed on the extracted first image feature to obtain the first distribution function as follows:

extracting the first image feature corresponding to the first image through the single-modal detection network;

performing feature encoding on the first image feature to determine the first distribution function corresponding to the first image.

In this example embodiment, the first image feature is a subset of the first distribution function, and the first distribution function may be represented as a set (of features).

In this example embodiment, the first image feature corresponding to the first image is extracted through the single-modal detection network as follows: the single-modal detection network performs N convolution operations on the first image, applies non-linear activation (i.e., a ReLU layer) to the convolution result, and takes the non-linear activation result as the first image feature corresponding to the first image, where N is a positive integer.
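A rough PyTorch sketch of these N stacked convolution + ReLU blocks; the channel width, kernel size, and N = 3 are assumptions, not values from this disclosure.

```python
import torch
import torch.nn as nn

class ConvFeatureExtractor(nn.Module):
    """N stacked convolution + ReLU blocks, as in the feature-extraction step above."""
    def __init__(self, in_ch: int = 1, width: int = 32, n_blocks: int = 3):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(n_blocks):
            layers += [nn.Conv2d(ch, width, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            ch = width
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The activation output is taken as the image feature.
        return self.body(x)

# e.g. a single-channel 496x496 image -> a feature map
feat = ConvFeatureExtractor()(torch.randn(1, 1, 496, 496))
```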

In this example embodiment, feature encoding is performed on the first image feature to determine the first distribution function corresponding to the first image as follows: the first image feature is input into an encoder so that the encoder performs feature encoding on it, thereby determining the first distribution function corresponding to the first image; the encoder (Encoder) consists of a series of convolution, activation, and batch normalization layers.
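A hedged sketch of such an encoder: a convolution + ReLU + batch-normalization stack followed by a projection to the mean and log-variance of a Gaussian, which is one common way to represent the distribution function; the latent dimension and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    """Conv/activation/batch-norm stack that maps a feature map to (mu, logvar)."""
    def __init__(self, in_ch: int = 32, latent_dim: int = 16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)

    def forward(self, feat: torch.Tensor):
        h = self.conv(feat).flatten(1)
        # Parameters of the fitted Gaussian distribution function.
        return self.mu(h), self.logvar(h)
```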

It can be seen that this optional implementation enables the encoder to perform feature encoding on the first image to determine the first distribution function, which can then be used to approximate its similarity to the second distribution function and improve the image feature generation of the single-modal detection network.

In this example embodiment, optionally, feature encoding is performed on the extracted second image feature to obtain the second distribution function as follows:

fusing the first image and the second image through a multi-modal detection network, and generating the second image feature according to the fusion result;

performing feature encoding on the second image feature to determine the second distribution function jointly corresponding to the first image and the second image.

In this example embodiment, the second image feature is a subset of the second distribution function, and the second distribution function may be represented as a set (of features).

In this example embodiment, the first image and the second image are fused through the multi-modal detection network, and the second image feature is generated according to the fusion result as follows: the multi-modal detection network fuses the first image and the second image into an image to be processed, and then extracts the second image feature corresponding to the image to be processed; the second image feature includes the image feature corresponding to the first image and the image feature corresponding to the second image.

Further, the second image feature corresponding to the image to be processed is extracted as follows: the multi-modal detection network performs N convolution operations on the image to be processed, applies non-linear activation to the convolution result, and takes the non-linear activation result as the second image feature corresponding to the image to be processed, where N is a positive integer.

In this example embodiment, feature encoding is performed on the second image feature to determine the second distribution function jointly corresponding to the first image and the second image as follows: the second image feature is input into an encoder so that the encoder performs feature encoding on it, thereby determining the second distribution function jointly corresponding to the first image and the second image.

It can be seen that this optional implementation enables the encoder to perform feature encoding on the second image to determine the second distribution function, which can then be used to approximate its similarity to the first distribution function and improve the image feature generation of the single-modal detection network.

In step S320, the first loss function value is determined according to the first distribution function and the second distribution function.

In this example embodiment, the first loss function value is used to characterize the difference between the first distribution function and the second distribution function.
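Since this loss is later described as a KL loss between the prior and posterior distributions, a closed-form KL divergence between two diagonal Gaussians is a natural sketch of the first loss function value; treating both distributions as diagonal Gaussians parameterized by (mu, logvar) is an assumption consistent with the encoder sketch above.

```python
import torch

def kl_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over latent dims."""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(dim=-1).mean()

# First loss value: posterior branch (multi-modal) against prior branch (single-modal), e.g.
# loss1 = kl_gaussians(mu_post, logvar_post, mu_prior, logvar_prior)
```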

In step S330, the image features to be compared are determined according to the fusion of the sampling result of the second distribution function with the first image.

In this example embodiment, the sampling result of the second distribution function may be determined as follows: values are assigned to the unknowns in the second distribution function, and the value of the second distribution function is determined from the resulting expression and taken as the sampling result of the second distribution function; the sampling result may include at least one value of the second distribution function.
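One common way to realize this "assigning values to the unknowns" for a Gaussian latent is the reparameterization trick; the sketch below is an assumption about the sampling mechanism rather than a statement of the exact procedure in this disclosure.

```python
import torch

def sample_latent(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mu, exp(logvar)) via z = mu + sigma * eps (reparameterization)."""
    eps = torch.randn_like(mu)
    return mu + (0.5 * logvar).exp() * eps
```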

In this example embodiment, optionally, the image features to be compared are determined according to the fusion of the sampling result of the second distribution function with the first image as follows:

fusing the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fusing the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;

determining a third distribution function corresponding to the third image feature, and determining a fourth distribution function corresponding to the fourth image feature;

determining a third loss function value according to the third distribution function and the fourth distribution function;

fusing the sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fusing the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;

determining a fifth distribution function corresponding to the fifth image feature, and determining a sixth distribution function corresponding to the sixth image feature;

determining a fourth loss function value according to the fifth distribution function and the sixth distribution function;

fusing the sampling result of the sixth distribution function with the fifth image feature to obtain the image features to be compared.

In this example embodiment, the sampling result of the second distribution function is fused with the first image feature corresponding to the first image to obtain the third image feature as follows: N convolution operations are performed on the first image feature, where N is a positive integer; the latent variable sampled from the Gaussian distribution fitted by the posterior encoder is fused with the convolved first image feature, and the fusion result is then max-pooled to obtain the third image feature. The fourth, fifth, and sixth image features are obtained in the same way as the third image feature, which is not repeated here.
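A hedged sketch of this fusion: the sampled latent is broadcast over the spatial grid, combined with the convolved feature map, and the result is max-pooled. Concatenation is used here as one plausible fusion operator; the disclosure does not fix the operator.

```python
import torch
import torch.nn as nn

def fuse_latent_with_features(z: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
    """z: (B, D) latent sample; feat: (B, C, H, W) convolved image features."""
    b, _, h, w = feat.shape
    z_map = z[:, :, None, None].expand(b, z.shape[1], h, w)  # tile the latent spatially
    fused = torch.cat([feat, z_map], dim=1)                   # fusion (here: concatenation)
    return nn.functional.max_pool2d(fused, kernel_size=2)     # max-value pooling

# e.g. third_feat = fuse_latent_with_features(sample_latent(mu, logvar), conv_features)
```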

In this example embodiment, the sampling result of the second distribution function can be understood as a latent variable sampled from the Gaussian distribution fitted by the first posterior encoder; the sampling result of the fourth distribution function can be understood as a latent variable sampled from the Gaussian distribution fitted by the second posterior encoder; and the sampling result of the sixth distribution function can be understood as a latent variable sampled from the Gaussian distribution fitted by the third posterior encoder.

In addition, it should be noted that the single-modal detection network may include three prior networks, the multi-modal detection network may include three posterior networks, and each network includes a corresponding encoder. Optionally, the single-modal detection network may include at least one prior network and the multi-modal detection network may include at least one posterior network, which is not limited in the embodiments of the present disclosure.

The first posterior network includes a first posterior encoder, the second posterior network includes a second posterior encoder, and the third posterior network includes a third posterior encoder. Similarly, the first prior network includes a first prior encoder, the second prior network includes a second prior encoder, and the third prior network includes a third prior encoder. Both the prior encoders and the posterior encoders are used to perform feature encoding on image features to determine the corresponding distribution functions.

In this example embodiment, the first distribution function is determined through the first prior network and the second distribution function is determined through the first posterior network; the third distribution function is determined through the second prior network and the fourth distribution function is determined through the second posterior network; the fifth distribution function is determined through the third prior network and the sixth distribution function is determined through the third posterior network.

It can be seen that this optional implementation determines three loss function values, so that the network parameters of the single-modal detection network can be adjusted according to these loss function values to improve the classification and recognition performance of the single-modal detection network on the input image.

In step S340, the second loss function value is determined according to the comparison between the image features to be compared and the label to be compared, and the network parameters of the single-modal detection network are adjusted according to the first loss function value and the second loss function value until the loss function value converges; the loss function value includes the first loss function value and the second loss function value.

In this example embodiment, the network parameters of the single-modal detection network are used to extract image features. The label to be compared may be a preset label corresponding to the input image; if the classification of the image features to be compared shows that they match the label to be compared, it indicates that the single-modal detection network classifies well and can determine the label corresponding to the input image. In addition, the network parameters of the single-modal detection network may use the parameters of the Dense121 network (i.e., a deep learning network) pre-trained on the ImageNet dataset, while the newly added convolution layers, fusion layers, encoder-decoder sub-networks, and fully connected layers may all be initialized with a Gaussian distribution with a variance of 0.01 and a mean of 0, and then perform the corresponding convolution, fusion, feature encoding, or classification processing on the image features.
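A minimal sketch of this initialization, assuming the torchvision implementation of DenseNet-121 (torchvision ≥ 0.13 API; the pretrained weights download on first use) stands in for the Dense121 backbone, and that the new head shown here is an illustrative placeholder; variance 0.01 corresponds to standard deviation 0.1.

```python
import torch.nn as nn
from torchvision.models import densenet121

backbone = densenet121(weights="IMAGENET1K_V1")  # ImageNet-pretrained parameters

def init_new_layer(m: nn.Module) -> None:
    """Zero-mean Gaussian init with variance 0.01 (std 0.1) for newly added layers."""
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.1)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

# Hypothetical newly added head; layer sizes are assumptions.
new_head = nn.Sequential(
    nn.Conv2d(1024, 256, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(256, 2),
)
new_head.apply(init_new_layer)
```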

It should be noted that the parameters and bias parameters of the above convolution layers can be solved with the Adam-based gradient descent method. The most basic optimization scheme for a neural network is the backpropagation algorithm combined with gradient descent. Gradient descent makes the network parameters converge to a global (or local) minimum, but because a neural network has many layers, the error usually needs to be propagated layer by layer from the output to the input through backpropagation, updating the network parameters layer by layer. Since the gradient direction is the direction in which the function value increases fastest, the negative gradient direction is the direction in which it decreases fastest; iterating step by step along the negative gradient direction therefore converges quickly to the minimum of the function. Adam is an optimizer that combines the advantages of the AdaGrad and RMSProp optimization algorithms; it jointly considers the first-moment estimate of the gradient (i.e., the mean of the gradient) and the second-moment estimate (i.e., the uncentered variance of the gradient) to compute the update step size.
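A minimal sketch of one Adam update step in PyTorch; the stand-in network, learning rate, and loss term are illustrative assumptions, not the network or values of this disclosure.

```python
import torch
import torch.nn as nn

# Stand-in for the detection network under training (illustrative only).
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam: first/second moment estimates

x, y = torch.randn(4, 1, 64, 64), torch.randint(0, 2, (4,))
loss = nn.functional.cross_entropy(model(x), y)  # placeholder for the combined loss value
optimizer.zero_grad()
loss.backward()   # backpropagation: gradients flow from output to input layer by layer
optimizer.step()  # one step along the negative gradient, scaled by Adam's moment estimates
```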

In this example embodiment, optionally, the second loss function value is determined according to the comparison between the image features to be compared and the label to be compared as follows:

performing feature processing on the image features to be compared, comparing the processed image features to be compared with the label to be compared, and determining the second loss function value according to the comparison result; the feature processing includes convolution, pooling, and non-linear activation.

In this example embodiment, the processed image features to be compared are compared with the label to be compared, and the second loss function value is determined according to the comparison result as follows: the processed image features to be compared are input into a fully connected layer to determine the class to which they belong, and the second loss function value is determined according to the comparison between the label corresponding to that class and the label to be compared.
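A hedged sketch of this comparison step: the processed features are globally pooled, passed through a fully connected layer to produce class scores, and the cross-entropy against the target label gives the second loss function value; the channel count and the number of classes are assumptions.

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Global pooling + fully connected layer over the features to be compared."""
    def __init__(self, in_ch: int = 64, num_classes: int = 2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_ch, num_classes)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.fc(self.pool(feat).flatten(1))

head = ClassificationHead()
logits = head(torch.randn(4, 64, 31, 31))            # features to be compared
labels = torch.randint(0, 2, (4,))                    # labels to be compared
loss2 = nn.functional.cross_entropy(logits, labels)   # second loss function value
```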

It can be seen that this optional implementation determines the loss function used to adjust the network parameters through comparison with the label to be compared, thereby improving the image recognition performance of the network model through the loss function.

In this example embodiment, optionally, the parameter adjustment method for the single-modal detection network may further include the following step:

determining the sum of the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value as the loss function value.

Further, the network parameters of the single-modal detection network are adjusted according to the first loss function value and the second loss function value until the loss function value converges as follows:

adjusting the network parameters of the single-modal detection network according to the loss function value until the loss function value converges.
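A small sketch of the combined objective described above: the three KL terms between the prior and posterior branches plus the cross-entropy term, summed with equal weights (the plain sum stated here) and minimized until convergence; the training-loop outline is schematic.

```python
def total_loss(kl1, ce2, kl3, kl4):
    """Loss value = first + second + third + fourth loss function values."""
    return kl1 + ce2 + kl3 + kl4

# Schematic training loop:
# for fundus, oct_img, label in loader:
#     ... forward through the prior/posterior branches to get kl1, ce2, kl3, kl4 ...
#     loss = total_loss(kl1, ce2, kl3, kl4)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```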

In this example embodiment, the first loss function value, the third loss function value, and the fourth loss function value may each correspond to a KL loss function, which is used to bring the distributions of the three pairs of prior and posterior networks closer together. The second loss function, corresponding to the second loss function value, may be a cross-entropy loss function, which is used to guide the prior networks to learn the lesion information present in the fundus image or OCT image, or to classify the lesions; the lesion information includes at least the area of the lesion region and the center coordinates of the lesion region, which is not limited in the embodiments of the present disclosure.

It can be seen that this optional implementation can adjust the network parameters through the above loss function values, which improves the network training effect and thereby improves the recognition performance of the single-modal detection network on the input image.

It can be seen that implementing the parameter adjustment method for the single-modal detection network shown in FIG. 3 can, to a certain extent, overcome the problem of poor parameter tuning and thereby improve the network model's processing of the input image; moreover, the network parameters of the single-modal detection network can be adjusted through multi-modal fusion to train the single-modal detection network, improve the training effect, and thereby improve the classification and recognition performance of the single-modal detection network on the input image. When the embodiments of the present disclosure are applied to lesion recognition, the accuracy of lesion recognition in fundus images can be improved.

In addition, this example embodiment also provides a method for predicting lesions in an eye image. The method may be applied to the above server 105 or to one or more of the above terminal devices 101, 102, and 103, which is not specifically limited in this example embodiment. Referring to FIG. 4, the method for predicting lesions in an eye image may include the following steps S410 to S440:

Step S410: Input the eye image into the single-modal detection network, and fit a first distribution function corresponding to the eye image according to the single-modal detection network.

Step S420: Fuse the sampling result of the first distribution function with the first image feature corresponding to the eye image to obtain a second image feature.

Step S430: Fit a second distribution function corresponding to the second image feature according to the single-modal detection network, and fuse the sampling result of the second distribution function with the second image feature to obtain a third image feature.

Step S440: Fit a third distribution function corresponding to the third image feature according to the single-modal detection network, fuse the sampling result of the third distribution function with the third image feature to obtain a fourth image feature, and predict the lesion in the eye image according to the fourth image feature.

The single-modal detection network is adjusted according to the parameter adjustment method for a single-modal detection network provided in the embodiments of the present disclosure.

The above steps of this example embodiment are described in more detail below.

In step S410, the eye image is input into the single-modal detection network, and the first distribution function corresponding to the eye image is fitted according to the single-modal detection network.

In this example embodiment, the above eye image may be an OCT image or a fundus image, which is not limited in the embodiments of the present disclosure.

In this example embodiment, the first distribution function corresponding to the eye image is fitted according to the single-modal detection network as follows: the single-modal detection network performs N convolution operations on the eye image, applies non-linear activation to the convolution result, and takes the non-linear activation result as the image feature corresponding to the eye image, where N is a positive integer; the image feature corresponding to the eye image is then input into the encoder so that the encoder performs feature encoding on it, thereby determining the first distribution function corresponding to the image feature of the eye image.

In step S420, the sampling result of the first distribution function is fused with the first image feature corresponding to the eye image to obtain the second image feature.

In this example embodiment, the second image feature is obtained in step S420 in the same way as the third image feature is obtained in FIG. 3; refer to the embodiment shown in FIG. 3, which is not repeated here.

In step S430, the second distribution function corresponding to the second image feature is fitted according to the single-modal detection network, and the sampling result of the second distribution function is fused with the second image feature to obtain the third image feature.

In this example embodiment, the third image feature is obtained in step S430 in the same way as the third image feature is obtained in FIG. 3; refer to the embodiment shown in FIG. 3, which is not repeated here.

In step S440, the third distribution function corresponding to the third image feature is fitted according to the single-modal detection network, the sampling result of the third distribution function is fused with the third image feature to obtain the fourth image feature, and the lesion in the eye image is predicted according to the fourth image feature.

In this example embodiment, if the eye image is an OCT image, the first, second, third, and fourth image features corresponding to the OCT image all implicitly contain the image features of the fundus image, where the OCT image corresponds to the fundus image.

In this example embodiment, the lesion in the eye image is predicted according to the fourth image feature as follows: feature recognition is performed on the fourth image feature to determine the feature portion corresponding to the lesion position, and the lesion position is marked in the corresponding eye image according to this feature portion.
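A schematic of this inference path through the prior branch only (steps S410 to S440), assuming the component sketches given earlier (feature extractor, Gaussian encoders, latent fusion, classification head); the function and parameter names are illustrative, not identifiers from this disclosure.

```python
import torch

@torch.no_grad()
def predict(eye_image, extractor, priors, fuse, sample, head):
    """eye_image: (1, 1, H, W) tensor; priors: three prior Gaussian encoders."""
    feat = extractor(eye_image)                # first image feature of the eye image
    for prior in priors:                       # steps S410 / S430 / S440: fit, sample, fuse
        mu, logvar = prior(feat)               # fit the distribution for the current feature
        feat = fuse(sample(mu, logvar), feat)  # fuse the sample with the current feature
    logits = head(feat)                        # classify the fused (fourth) image feature
    return logits.softmax(dim=-1)              # class probabilities used for lesion prediction
```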

It can be seen that implementing the method for predicting lesions in an eye image shown in FIG. 4 can generate the features of the fundus image corresponding to the OCT image through the upper branch of the unsupervised multi-modal fusion network (i.e., the above single-modal detection network), improving the efficiency and accuracy of determining the lesion position. In addition, since the method can be applied to a fundus intelligent diagnosis system or an OCT intelligent diagnosis system, it can improve the diagnostic capability of intelligent screening systems of various modalities. Moreover, because the method is highly extensible and can support eye images of at least one modality, it can improve the effect of eye disease screening.

Referring to FIG. 5, FIG. 5 schematically shows an eye image according to an embodiment of the present disclosure. As shown in FIG. 5, the eye image 500 includes a fundus image 501 and an OCT image 502. The position of the black arrow in the fundus image 501 (which can also be understood as the position of the eye lesion) corresponds to the OCT image 502, and the eye image 500 is the eye image involved in the embodiment of FIG. 4. In addition, the fundus image 501 and the OCT image 502 can be used as the first image and the second image to train the single-modal detection network.

With reference to the eye image shown in FIG. 5, please refer to FIG. 6, which schematically shows an architecture diagram of the parameter adjustment method for a single-modal detection network according to an embodiment of the present disclosure. As shown in FIG. 6, the architecture includes: a first image 601, a second image 602, N stacked convolution and non-linear activation layers 603, a max pooling layer 604, a fusion layer 605, a max pooling layer 606, three convolution + pooling + non-linear activation layers 607, a fully connected layer 608, a first prior network 609, a second prior network 610, a third prior network 611, a first posterior network 612, a second posterior network 613, and a third posterior network 614.

Specifically, the first distribution function corresponding to the first image 601 can be determined according to the N stacked convolution and non-linear activation layers 603, the max pooling layer 604, and the first prior network 609, and the second distribution function jointly corresponding to the first image 601 and the second image 602 can be determined according to the N stacked convolution and non-linear activation layers 603 (where N is a positive integer), the max pooling layer 604, and the first posterior network 612. The second distribution function is then sampled through the first posterior network 612, and the sampling result is fused with the first image feature corresponding to the first image 601 to obtain the third image feature, which is input to the next N stacked convolution and non-linear activation layers 603; the first image feature is obtained by processing the first image 601 with the N stacked convolution and non-linear activation layers 603 and the max pooling layer 604. The sampling result of the second distribution function is also fused with the second image feature to obtain the fourth image feature, which is input to the next N stacked convolution and non-linear activation layers 603; the second image feature is obtained by processing the first image 601 and the second image 602 with the N stacked convolution and non-linear activation layers 603 and the max pooling layer 604. Next, the third distribution function corresponding to the third image feature is determined through the second prior network 610, the fourth distribution function corresponding to the fourth image feature is determined through the second posterior network 613, and the third loss function value is determined according to the third distribution function and the fourth distribution function. The fourth distribution function is then sampled through the second posterior network 613, and the sampling result is fused with the third image feature to obtain the fifth image feature, which is input to the next N stacked convolution and non-linear activation layers 603; the sampling result of the fourth distribution function is also fused with the fourth image feature to obtain the sixth image feature. The fifth distribution function corresponding to the fifth image feature is then determined through the third prior network 611, the sixth distribution function corresponding to the sixth image feature is determined, and the fourth loss function value is determined according to the fifth distribution function and the sixth distribution function. The sixth distribution function is then sampled through the third posterior network 614, and the sampling result is fused with the fifth image feature to obtain the image features to be compared. Finally, feature processing is performed on the image features to be compared through the three convolution + pooling + non-linear activation layers 607, the feature processing result is globally pooled and input into the fully connected layer 608 to compare the image features to be compared with the label to be compared, and the comparison result is trained with the cross-entropy loss function.

The first loss function value, the third loss function value, and the fourth loss function value are each determined by the KL loss function.

In addition, it should be noted that any of the above inputs to "the next N stacked convolution and non-linear activation layers 603" also passes through the fusion layer 605 and the max pooling layer 606.

It can be seen that performing the parameter adjustment method for the single-modal detection network shown in FIG. 3 in combination with the architecture shown in FIG. 6 can, to a certain extent, overcome the problem of poor parameter tuning and thereby improve the network model's processing of the input image; moreover, the network parameters of the single-modal detection network can be adjusted through multi-modal fusion to train the single-modal detection network, improve the training effect, and thereby improve the classification and recognition performance of the single-modal detection network on the input image.

With reference to the eye image shown in FIG. 5, please refer to FIG. 7, which schematically shows an architecture diagram of the method for predicting lesions in an eye image according to an embodiment of the present disclosure. As shown in FIG. 7, the architecture includes: a first image 701, N stacked convolution and non-linear activation layers 702, a max pooling layer 703, a fusion layer 704, a max pooling layer 705, three convolution + pooling + non-linear activation layers 706, a fully connected layer 707, a first prior network 708, a second prior network 709, and a third prior network 710.

Specifically, the first distribution function corresponding to the first image 701 can be determined according to the N stacked convolution and non-linear activation layers 702, the max pooling layer 703, and the first prior network 708. The first distribution function is then sampled through the first prior network 708, and the sampling result is fused with the first image feature corresponding to the eye image to obtain the second image feature, which is input to the next N stacked convolution and non-linear activation layers 702. The second distribution function corresponding to the second image feature is then determined according to the second prior network 709, the second distribution function is sampled through the second prior network 709, and the sampling result is fused with the second image feature to obtain the third image feature. The third distribution function corresponding to the third image feature is then determined according to the third prior network 710, the third distribution function is sampled through the third prior network 710, and the sampling result is fused with the third image feature to obtain the fourth image feature. Finally, feature processing is performed on the fourth image feature through the three convolution + pooling + non-linear activation layers 706, the feature processing result is globally pooled and input into the fully connected layer 707 to compare the fourth image feature with the label to be compared, obtaining the label corresponding to the first image 701 and the predicted lesion position.

In addition, it should be noted that any of the above inputs to "the next N stacked convolution and non-linear activation layers 702" also passes through the fusion layer 704 and the max pooling layer 705.

It can be seen that performing the method for predicting lesions in an eye image shown in FIG. 4 in combination with the architecture shown in FIG. 7 can generate the features of the fundus image corresponding to the OCT image through the upper branch of the unsupervised multi-modal fusion network (i.e., the above single-modal detection network), improving the efficiency and accuracy of determining the lesion position. In addition, since the method can be applied to a fundus intelligent diagnosis system or an OCT intelligent diagnosis system, it can improve the diagnostic capability of intelligent screening systems of various modalities. Moreover, because the method is highly extensible and can support eye images of at least one modality, it can improve the effect of eye disease screening.

Further, this example embodiment also provides a parameter adjustment apparatus 800 for a single-modal detection network. The parameter adjustment apparatus 800 may be applied to a server or a terminal device. Referring to FIG. 8, the parameter adjustment apparatus 800 for a single-modal detection network may include: a distribution function determination unit 801, a loss function value determination unit 802, a feature fusion unit 803, and a parameter adjustment unit 804, where:

the distribution function determination unit 801 is configured to perform feature extraction on the first image, perform feature encoding on the extracted first image feature to obtain the first distribution function, and perform feature encoding on the extracted second image feature to obtain the second distribution function; the first image feature corresponds to the first image, and the second image feature is obtained by feature fusion of the first image and the second image;

the loss function value determination unit 802 is configured to determine the first loss function value according to the first distribution function and the second distribution function;

the feature fusion unit 803 is configured to determine the image features to be compared according to the fusion of the sampling result of the second distribution function with the first image;

the parameter adjustment unit 804 is configured to determine the second loss function value according to the comparison between the image features to be compared and the label to be compared, and to adjust the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges; the loss function value includes the first loss function value and the second loss function value.

It can be seen that implementing the parameter adjustment apparatus 800 for the single-modal detection network shown in FIG. 8 can, to a certain extent, overcome the problem of poor parameter tuning and thereby improve the network model's processing of the input image; moreover, the network parameters of the single-modal detection network can be adjusted through multi-modal fusion to train the single-modal detection network, improve the training effect, and thereby improve the classification and recognition performance of the single-modal detection network on the input image. When the embodiments of the present disclosure are applied to lesion recognition, the accuracy of lesion recognition in fundus images can be improved.

In an exemplary embodiment of the present disclosure, the distribution function determination unit 801 performs feature encoding on the extracted first image feature to obtain the first distribution function as follows:

the distribution function determination unit 801 extracts the first image feature corresponding to the first image through the single-modal detection network;

the distribution function determination unit 801 performs feature encoding on the first image feature to determine the first distribution function corresponding to the first image.

It can be seen that this exemplary embodiment enables the encoder to perform feature encoding on the first image to determine the first distribution function, which can then be used to approximate its similarity to the second distribution function and improve the image feature generation of the single-modal detection network.

In an exemplary embodiment of the present disclosure, the distribution function determination unit 801 performs feature encoding on the extracted second image feature to obtain the second distribution function as follows:

the distribution function determination unit 801 fuses the first image and the second image through the multi-modal detection network, and generates the second image feature according to the fusion result;

the distribution function determination unit 801 performs feature encoding on the second image feature to determine the second distribution function jointly corresponding to the first image and the second image.

It can be seen that this exemplary embodiment enables the encoder to perform feature encoding on the second image to determine the second distribution function, which can then be used to approximate its similarity to the first distribution function and improve the image feature generation of the single-modal detection network.

In an exemplary embodiment of the present disclosure, the feature fusion unit 803 determines the image features to be compared according to the fusion of the sampling result of the second distribution function with the first image as follows:

the feature fusion unit 803 fuses the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain the third image feature, and fuses the sampling result of the second distribution function with the second image feature to obtain the fourth image feature;

the feature fusion unit 803 determines the third distribution function corresponding to the third image feature, and determines the fourth distribution function corresponding to the fourth image feature;

the feature fusion unit 803 determines the third loss function value according to the third distribution function and the fourth distribution function;

the feature fusion unit 803 fuses the sampling result of the fourth distribution function with the third image feature to obtain the fifth image feature, and fuses the sampling result of the fourth distribution function with the fourth image feature to obtain the sixth image feature;

the feature fusion unit 803 determines the fifth distribution function corresponding to the fifth image feature, and determines the sixth distribution function corresponding to the sixth image feature;

the feature fusion unit 803 determines the fourth loss function value according to the fifth distribution function and the sixth distribution function;

the feature fusion unit 803 fuses the sampling result of the sixth distribution function with the fifth image feature to obtain the image features to be compared.

It can be seen that this exemplary embodiment determines three loss function values, so that the network parameters of the single-modal detection network can be adjusted according to these loss function values to improve the classification and recognition performance of the single-modal detection network on the input image.

In an exemplary embodiment of the present disclosure, the parameter adjustment unit 804 determines the second loss function value according to the comparison between the image features to be compared and the label to be compared as follows:

the parameter adjustment unit 804 performs feature processing on the image features to be compared, compares the processed image features to be compared with the label to be compared, and determines the second loss function value according to the comparison result; the feature processing includes convolution, pooling, and non-linear activation.

It can be seen that this exemplary embodiment can determine the loss function used to adjust the network parameters through comparison with the label to be compared, thereby improving the image recognition performance of the network model through the loss function.

In an exemplary embodiment of the present disclosure, the loss function value determination unit 802 is further configured to determine the sum of the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value as the loss function value.

In an exemplary embodiment of the present disclosure, the loss function value determination unit 802 adjusts the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges as follows:

the loss function value determination unit 802 adjusts the network parameters of the single-modal detection network according to the loss function value until the loss function value converges.

It can be seen that this exemplary embodiment can adjust the network parameters through the above loss function value, which improves the network training effect and thereby improves the recognition performance of the single-modal detection network on the input image.

更进一步的,本示例实施方式中,还提供了一种眼部图像中的病灶预测装置900。该眼部图像中的病灶预测装置900可以应用于一服务器或终端设备。参考图9所示,该眼部图像中的病灶预测装置900可以包括:函数拟合单元901、图像特征融合单元902以及图像特征获取单元903,其中:Furthermore, in this exemplary embodiment, an apparatus 900 for predicting lesions in eye images is also provided. The apparatus 900 for predicting lesions in eye images can be applied to a server or a terminal device. Referring to FIG. 9, the device 900 for predicting lesions in eye images may include: a function fitting unit 901, an image feature fusion unit 902, and an image feature acquisition unit 903, wherein:

函数拟合单元901,用于将眼部图像输入单模态检测网络,根据单模态检测网络拟合眼部图像对应的第一分布函数;A function fitting unit 901, configured to input the eye image into the single-mode detection network, and fit the first distribution function corresponding to the eye image according to the single-mode detection network;

图像特征融合单元902,用于将第一分布函数的采样结果与眼部图像对应的第一图像特征进行融合,得到第二图像特征;An image feature fusion unit 902, configured to fuse the sampling result of the first distribution function with the first image feature corresponding to the eye image to obtain the second image feature;

图像特征获取单元903,用于根据单模态检测网络拟合第二图像特征对应的第二分布函数,并将第二分布函数的采样结果与第二图像特征进行融合,得到第三图像特征;The image feature acquisition unit 903 is configured to fit a second distribution function corresponding to the second image feature according to the single-modal detection network, and fuse the sampling result of the second distribution function with the second image feature to obtain a third image feature;

图像特征获取单元903,还用于根据单模态检测网络拟合第三图像特征对应的第三分布函数,并将第三分布函数的采样结果与第三图像特征进行融合,得到第四图像特征并根据第四图像特征预测眼部图像中的病灶;The image feature acquisition unit 903 is further configured to fit the third distribution function corresponding to the third image feature according to the single-modal detection network, and fuse the sampling result of the third distribution function with the third image feature to obtain the fourth image feature And predicting the lesion in the eye image according to the fourth image feature;

wherein the single-modal detection network is obtained by adjustment according to the parameter adjustment method for a single-modal detection network provided by the embodiments of the present disclosure.

It can be seen that, by implementing the apparatus 900 for predicting lesions in eye images shown in FIG. 9, fundus image features corresponding to an OCT image can be generated through the upper branch of the unsupervised multi-modal fusion network (that is, the above single-modal detection network), which improves the efficiency and accuracy of locating lesions. In addition, since the method can be applied to an intelligent fundus diagnosis system or an intelligent OCT diagnosis system, it can improve the diagnostic capability of intelligent screening systems across modalities. Moreover, the method is highly extensible and can support eye images of at least one modality, which improves the effectiveness of eye disease screening.
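As an illustration of the fit-sample-fuse procedure performed by units 901 to 903, the following is a minimal PyTorch sketch. The Gaussian parameterisation of the distribution functions, the convolutional placeholder layers, and all class and parameter names are assumptions made for the sketch, not the disclosed network structure.

```python
# Minimal sketch (PyTorch) of the three-stage fit-sample-fuse loop performed by
# units 901-903: each stage fits a distribution to the current feature, samples
# it, and fuses the sample back into the feature before the lesion prediction.
# Gaussian parameterisation and placeholder layers are assumptions.
import torch
import torch.nn as nn
from torch.distributions import Normal

class FitHead(nn.Module):
    """Fits a Gaussian distribution function to a feature map (assumption)."""
    def __init__(self, channels, latent):
        super().__init__()
        self.mu = nn.Conv2d(channels, latent, kernel_size=1)
        self.log_var = nn.Conv2d(channels, latent, kernel_size=1)

    def forward(self, feature):
        return self.mu(feature), self.log_var(feature)

class FuseBlock(nn.Module):
    """Fuses a sampling result with the current feature by concatenation."""
    def __init__(self, channels, latent):
        super().__init__()
        self.conv = nn.Conv2d(channels + latent, channels, kernel_size=3, padding=1)

    def forward(self, sample, feature):
        return torch.relu(self.conv(torch.cat([sample, feature], dim=1)))

class LesionPredictor(nn.Module):
    def __init__(self, in_channels=3, channels=32, latent=8, num_classes=2):
        super().__init__()
        self.encoder = nn.Conv2d(in_channels, channels, kernel_size=3, padding=1)
        self.fit_heads = nn.ModuleList([FitHead(channels, latent) for _ in range(3)])
        self.fuse_blocks = nn.ModuleList([FuseBlock(channels, latent) for _ in range(3)])
        self.prediction_head = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, eye_image):
        feature = self.encoder(eye_image)              # first image feature
        for fit, fuse in zip(self.fit_heads, self.fuse_blocks):
            mu, log_var = fit(feature)                 # fit the distribution function
            std = torch.exp(0.5 * log_var)
            sample = Normal(mu, std).rsample()         # sampling result
            feature = fuse(sample, feature)            # second / third / fourth feature
        return self.prediction_head(feature)           # predict lesions in the eye image
```

For example, LesionPredictor()(torch.randn(1, 3, 256, 256)) returns a two-channel map with the same spatial size from which lesion locations could be read off; a real system would replace the placeholder layers with the trained single-modal detection network.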

It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.

Since the functional modules of the parameter adjustment apparatus for a single-modal detection network in the exemplary embodiments of the present disclosure correspond to the steps of the exemplary embodiments of the parameter adjustment method for a single-modal detection network described above, for details not disclosed in the apparatus embodiments of the present disclosure, reference may be made to the embodiments of the parameter adjustment method for a single-modal detection network described above.

In addition, since the functional modules of the apparatus for predicting lesions in eye images in the exemplary embodiments of the present disclosure correspond to the steps of the exemplary embodiments of the method for predicting lesions in eye images described above, for details not disclosed in the apparatus embodiments of the present disclosure, reference may be made to the embodiments of the method for predicting lesions in eye images described above.

It should be noted that the computer-readable medium shown in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or by hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not constitute a limitation on the units themselves.

As another aspect, the present application also provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.

Other embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be considered exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise constructions described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A parameter adjustment method for a single-modal detection network, characterized in that the method comprises:
performing feature extraction on a first image, and performing feature encoding on the extracted first image feature to obtain a first distribution function, and performing feature encoding on an extracted second image feature to obtain a second distribution function, wherein the first image feature corresponds to the first image, and the second image feature is obtained by feature fusion of the first image and a second image;
determining a first loss function value according to the first distribution function and the second distribution function;
determining an image feature to be compared according to fusion of a sampling result of the second distribution function with the first image;
determining a second loss function value according to a comparison between the image feature to be compared and a label to be compared, and adjusting network parameters of the single-modal detection network according to the first loss function value and the second loss function value until a loss function value converges, wherein the loss function value comprises the first loss function value and the second loss function value.

2. The method according to claim 1, wherein performing feature encoding on the extracted first image feature to obtain the first distribution function comprises:
extracting, through the single-modal detection network, the first image feature corresponding to the first image;
performing feature encoding on the first image feature to determine the first distribution function corresponding to the first image.

3. The method according to claim 1, wherein performing feature encoding on the extracted second image feature to obtain the second distribution function comprises:
fusing the first image and the second image through a multi-modal detection network, and generating the second image feature according to the fusion result;
performing feature encoding on the second image feature to determine the second distribution function jointly corresponding to the first image and the second image.

4. The method according to claim 1, wherein determining the image feature to be compared according to the fusion of the sampling result of the second distribution function with the first image comprises:
fusing the sampling result of the second distribution function with the first image feature corresponding to the first image to obtain a third image feature, and fusing the sampling result of the second distribution function with the second image feature to obtain a fourth image feature;
determining a third distribution function corresponding to the third image feature, and determining a fourth distribution function corresponding to the fourth image feature;
determining a third loss function value according to the third distribution function and the fourth distribution function;
fusing a sampling result of the fourth distribution function with the third image feature to obtain a fifth image feature, and fusing the sampling result of the fourth distribution function with the fourth image feature to obtain a sixth image feature;
determining a fifth distribution function corresponding to the fifth image feature, and determining a sixth distribution function corresponding to the sixth image feature;
determining a fourth loss function value according to the fifth distribution function and the sixth distribution function;
fusing a sampling result of the sixth distribution function with the fifth image feature to obtain the image feature to be compared.

5. The method according to claim 1, wherein determining the second loss function value according to the comparison between the image feature to be compared and the label to be compared comprises:
performing feature processing on the image feature to be compared, comparing the feature-processed image feature to be compared with the label to be compared, and determining the second loss function value according to the comparison result, wherein the feature processing comprises convolution processing, pooling processing, and nonlinear activation processing.

6. The method according to claim 4, further comprising:
determining the sum of the first loss function value, the second loss function value, the third loss function value, and the fourth loss function value as the loss function value.

7. The method according to claim 6, wherein adjusting the network parameters of the single-modal detection network according to the first loss function value and the second loss function value until the loss function value converges comprises:
adjusting the network parameters of the single-modal detection network according to the loss function value until the loss function value converges.

8. A method for predicting a lesion in an eye image, characterized in that the method comprises:
inputting an eye image into a single-modal detection network, and fitting a first distribution function corresponding to the eye image according to the single-modal detection network;
fusing a sampling result of the first distribution function with a first image feature corresponding to the eye image to obtain a second image feature;
fitting a second distribution function corresponding to the second image feature according to the single-modal detection network, and fusing a sampling result of the second distribution function with the second image feature to obtain a third image feature;
fitting a third distribution function corresponding to the third image feature according to the single-modal detection network, fusing a sampling result of the third distribution function with the third image feature to obtain a fourth image feature, and predicting a lesion in the eye image according to the fourth image feature;
wherein the single-modal detection network is adjusted according to the method of any one of claims 1 to 6.

9. A parameter adjustment apparatus for a single-modal detection network, characterized in that the apparatus comprises:
a distribution function determining unit, configured to perform feature extraction on a first image, perform feature encoding on the extracted first image feature to obtain a first distribution function, and perform feature encoding on an extracted second image feature to obtain a second distribution function, wherein the first image feature corresponds to the first image, and the second image feature is obtained by feature fusion of the first image and a second image;
a loss function value determining unit, configured to determine a first loss function value according to the first distribution function and the second distribution function;
a feature fusion unit, configured to determine an image feature to be compared according to fusion of a sampling result of the second distribution function with the first image;
a parameter adjustment unit, configured to determine a second loss function value according to a comparison between the image feature to be compared and a label to be compared, and adjust network parameters of the single-modal detection network according to the first loss function value and the second loss function value until a loss function value converges, wherein the loss function value comprises the first loss function value and the second loss function value.

10. An electronic device, characterized in that it comprises:
a processor; and
a memory configured to store executable instructions of the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 8 by executing the executable instructions.
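To make the structure of claim 1 easier to follow, the following is a minimal PyTorch sketch of one parameter-adjustment step under stated assumptions: the distribution functions are Gaussian, the first loss function value is taken as a KL divergence between them, and the second loss function value as a cross-entropy against the label to be compared. The claims do not fix these choices, and single_modal, multi_modal, fuse, and classifier are hypothetical modules.

```python
# Minimal sketch (PyTorch) of the training step in claim 1 under stated
# assumptions: Gaussian distribution functions, KL divergence as the first
# loss, cross-entropy against the label to be compared as the second loss.
# single_modal, multi_modal, fuse and classifier are hypothetical modules.
import torch
import torch.nn.functional as F
from torch.distributions import Normal, kl_divergence

def adjustment_step(single_modal, multi_modal, fuse, classifier, optimizer,
                    first_image, second_image, label):
    # Single-modal branch: first image feature -> first distribution function.
    first_feature, (mu1, std1) = single_modal(first_image)
    first_dist = Normal(mu1, std1)

    # Multi-modal branch: fused feature of both images -> second distribution function.
    second_feature, (mu2, std2) = multi_modal(first_image, second_image)
    second_dist = Normal(mu2, std2)

    # First loss: distance between the two distribution functions (KL is an assumption).
    first_loss = kl_divergence(first_dist, second_dist).mean()

    # Fuse a sample of the second distribution with the first image feature,
    # then compare the resulting feature with the label to be compared
    # (label is assumed to hold class indices for cross-entropy).
    compare_feature = fuse(second_dist.rsample(), first_feature)
    second_loss = F.cross_entropy(classifier(compare_feature), label)

    # Adjust the network parameters according to the two loss function values.
    loss = first_loss + second_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```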
CN201910723272.7A 2019-08-06 2019-08-06 Parameter adjustment method, lesion prediction method, parameter adjustment device and electronic equipment Active CN110503636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910723272.7A CN110503636B (en) 2019-08-06 2019-08-06 Parameter adjustment method, lesion prediction method, parameter adjustment device and electronic equipment


Publications (2)

Publication Number Publication Date
CN110503636A true CN110503636A (en) 2019-11-26
CN110503636B CN110503636B (en) 2024-01-26

Family

ID=68587972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910723272.7A Active CN110503636B (en) 2019-08-06 2019-08-06 Parameter adjustment method, lesion prediction method, parameter adjustment device and electronic equipment

Country Status (1)

Country Link
CN (1) CN110503636B (en)


Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0549470A1 (en) * 1991-12-23 1993-06-30 Lg Electronics Inc. Variable length-adaptive image data compression method and apparatus
DE19501551A1 (en) * 1994-01-21 1995-08-03 Mitsubishi Electric Corp Movement vector determiner for various prediction methods
EP0997839A2 (en) * 1998-10-29 2000-05-03 Fujitsu Limited Word recognizing apparatus and method for dynamically generating feature amount of word
CA2868448A1 (en) * 2012-03-26 2013-10-03 Euclid Discoveries, Llc Context based video encoding and decoding
WO2018124309A1 (en) * 2016-12-30 2018-07-05 Mitsubishi Electric Corporation Method and system for multi-modal fusion model
CN108345860A (en) * 2018-02-24 2018-07-31 江苏测联空间大数据应用研究中心有限公司 Personnel based on deep learning and learning distance metric recognition methods again
CN109117831A (en) * 2018-09-30 2019-01-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN109543745A (en) * 2018-11-20 2019-03-29 江南大学 Feature learning method and image-recognizing method based on condition confrontation autoencoder network
CN109711241A (en) * 2018-10-30 2019-05-03 百度在线网络技术(北京)有限公司 Object detecting method, device and electronic equipment
CN109815965A (en) * 2019-02-13 2019-05-28 腾讯科技(深圳)有限公司 A kind of image filtering method, device and storage medium
CN109902767A (en) * 2019-04-11 2019-06-18 网易(杭州)网络有限公司 Model training method, image processing method and device, equipment and medium
CN109919915A (en) * 2019-02-18 2019-06-21 广州视源电子科技股份有限公司 Retina fundus image abnormal region detection method and device based on deep learning


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AHMED ELTANBOLY et al.: "A computer-aided diagnostic system for detecting diabetic retinopathy in optical coherence tomography images", MEDICAL PHYSICS, vol. 44, no. 3, pages 914-923, XP055502462, DOI: 10.1002/mp.12071 *
GE LEI: "Research on Diagnosis Methods for Solitary Pulmonary Nodules Based on Deep Autoencoding", China Master's Theses Full-text Database (Medicine and Health Sciences), no. 02, pages 072-342 *
BIAN CHENG: "Anatomical Layer Segmentation and Lesion Detection in Whole-Breast Ultrasound Based on Deep Learning", China Master's Theses Full-text Database (Medicine and Health Sciences), no. 07, pages 060-17 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178343A (en) * 2020-04-13 2020-05-19 腾讯科技(深圳)有限公司 Multimedia resource detection method, device, equipment and medium based on artificial intelligence
CN112016554A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Semantic segmentation method and device, electronic equipment and storage medium
CN112016554B (en) * 2020-08-04 2022-09-02 杰创智能科技股份有限公司 Semantic segmentation method and device, electronic equipment and storage medium
CN114119588A (en) * 2021-12-02 2022-03-01 北京大恒普信医疗技术有限公司 A method, device and system for training a detection model of a fundus macular degeneration region

Also Published As

Publication number Publication date
CN110503636B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN109376636B (en) Capsule network-based eye fundus retina image classification method
US10706333B2 (en) Medical image analysis method, medical image analysis system and storage medium
WO2020215984A1 (en) Medical image detection method based on deep learning, and related device
US11972571B2 (en) Method for image segmentation, method for training image segmentation model
CN111488921A (en) Panoramic digital pathological image intelligent analysis system and method
Ai et al. DR-IIXRN: detection algorithm of diabetic retinopathy based on deep ensemble learning and attention mechanism
Tennakoon et al. Image quality classification for DR screening using convolutional neural networks
CN113096137B (en) A field-adaptive segmentation method and system for OCT retinal images
CN110490242A (en) Training method, eye fundus image classification method and the relevant device of image classification network
Xie et al. Optic disc and cup image segmentation utilizing contour-based transformation and sequence labeling networks
CN110503636B (en) Parameter adjustment method, lesion prediction method, parameter adjustment device and electronic equipment
Tariq et al. Diabetic retinopathy detection using transfer and reinforcement learning with effective image preprocessing and data augmentation techniques
Pujitha et al. Solution to overcome the sparsity issue of annotated data in medical domain
Yang et al. Retinal vessel segmentation based on an improved deep forest
Nguyen et al. Automatic part segmentation of facial anatomies using geometric deep learning toward a computer-aided facial rehabilitation
Kadan et al. Optimized hybrid classifier for diagnosing diabetic retinopathy: iterative blood vessel segmentation process
CN116934777B (en) Uncertainty-guided retina blood vessel image segmentation method and system
CN115526882A (en) Method, device, equipment and storage medium for classifying medical images
CN118657993B (en) Focus perception fundus image classification method and system with interpretability
Nabijiang et al. BAM: Block attention mechanism for OCT image classification
Kiyani et al. Deep learning based Glaucoma Network Classification (GNC) using retinal images
CN119090843A (en) A pathology detection system and method based on image recognition
CN117952930A (en) Image processing method and device for mental diseases of brain
Jiang et al. DeepRayburst for automatic shape analysis of tree-like structures in biomedical images
Basreddy et al. Preprocessing, Feature Extraction, and Classification Methodologies on Diabetic Retinopathy Using Fundus Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TG01 Patent term adjustment
TG01 Patent term adjustment