
CN117333703A - Tongue image quality evaluation method and system based on deep learning and feature fusion - Google Patents


Info

Publication number
CN117333703A
CN117333703A (application CN202311281197.6A)
Authority
CN
China
Prior art keywords
tongue image
tongue
features
image
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311281197.6A
Other languages
Chinese (zh)
Inventor
崔曼曼
张腾达
张洪来
陈慧仪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University of Traditional Chinese Medicine
Original Assignee
Guangzhou University of Traditional Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Guangzhou University of Traditional Chinese Medicine filed Critical Guangzhou University of Traditional Chinese Medicine
Priority to CN202311281197.6A
Publication of CN117333703A
Legal status: Pending (current)


Classifications

    • G06V10/764 Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/0985 Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G06T7/0012 Biomedical image inspection
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/454 Local feature extraction integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V10/54 Extraction of image or video features relating to texture
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V10/806 Fusion of extracted features
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30168 Image quality inspection
    • G06V2201/03 Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tongue image quality assessment method based on deep learning and feature fusion, comprising the following steps: constructing a tongue image dataset; training an evaluation model on the tongue image data to obtain a trained evaluation model, the evaluation model comprising a tongue segmentation module, a feature extraction module and a tongue image classification module; inputting the tongue image to be tested into the trained evaluation model; segmenting the tongue image to be tested with the tongue segmentation module to obtain a target tongue image; extracting features from the target tongue image with the feature extraction module to obtain shallow features and deep semantic features, the shallow features comprising at least texture features, natural scene statistics features and color features; and, with the tongue image classification module, serially fusing the shallow features and the deep semantic features and classifying according to the fused feature vector to obtain a quality assessment result. The method can judge the quality of tongue images efficiently and accurately.

Description

Tongue image quality assessment method and system based on deep learning and feature fusion

Technical Field

The present invention relates to the technical field of traditional Chinese medicine (TCM) image processing, and in particular to a tongue image quality assessment method and system based on deep learning and feature fusion.

Background

As an important part of TCM inspection, tongue diagnosis is non-invasive and easy to perform. In recent years, with the rapid development of science and technology, digital tongue diagnosis has emerged: by objectively collecting a patient's tongue image data with modern technical means and combining it with TCM theory, information such as the deficiency or excess of yin and yang in the human body can be obtained. However, unqualified tongue images not only degrade subsequent data processing but may even lead to misdiagnosis, seriously hindering the development and application of intelligent TCM diagnosis and TCM telemedicine.

Color shifts of the tongue body caused by ambient lighting and blurring of the tongue image caused by motion jitter all introduce a degree of image distortion, posing challenges to the intelligent development of TCM tongue diagnosis. To guarantee the accuracy of digital-intelligent tongue diagnosis, the collected tongue images usually have to be checked manually for acceptability; this approach incurs high labor costs, is inefficient, and greatly restricts the intelligent development of TCM tongue diagnosis.

Therefore, efficient and accurate tongue image quality assessment is crucial to the intelligent and objective study of TCM tongue diagnosis.

Summary of the Invention

The purpose of the present invention is to provide a tongue image quality assessment method and system based on deep learning and feature fusion, so as to solve the existing problems of inaccurate and inefficient tongue image quality judgment.

To solve the above technical problems, the present invention provides solutions in two aspects. In a first aspect, a tongue image quality assessment method based on deep learning and feature fusion is provided, comprising the following steps:

constructing a tongue image dataset;

training an evaluation model based on the tongue image data to obtain a trained evaluation model;

inputting the tongue image to be tested into the trained evaluation model;

the evaluation model comprising a tongue segmentation module, a feature extraction module and a tongue image classification module;

segmenting the tongue image to be tested based on the tongue segmentation module to obtain a target tongue image;

performing feature extraction on the target tongue image based on the feature extraction module to obtain shallow features and deep semantic features;

wherein the shallow features include at least texture features, natural scene statistics (NSS) features and color features;

normalizing and serially fusing the shallow features and the deep semantic features based on the tongue image classification module, and classifying the image according to the fused feature vector to obtain a tongue image quality assessment result.

In some embodiments of the first aspect, the shallow feature extraction step specifically includes: converting the target tongue image into a gray-level co-occurrence matrix (GLCM) and computing statistics of the GLCM to obtain multi-dimensional texture features of the target tongue image; extracting natural image statistics from the target tongue image to obtain multi-dimensional natural scene statistical features of the target tongue image; and converting the color space of the target tongue image to obtain multi-dimensional color features of the target tongue image.

In some embodiments of the first aspect, the texture features include contrast, dissimilarity, homogeneity, energy, correlation and angular second moment, which are specifically calculated as follows:

The contrast:

$$\mathrm{CON}=\sum_{i=1}^{n}\sum_{j=1}^{n}(i-j)^{2}\,P(i,j,d,\theta)$$

The dissimilarity:

$$\mathrm{DIS}=\sum_{i=1}^{n}\sum_{j=1}^{n}\lvert i-j\rvert\,P(i,j,d,\theta)$$

The homogeneity:

$$\mathrm{HOM}=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{P(i,j,d,\theta)}{1+(i-j)^{2}}$$

The energy:

$$\mathrm{ENE}=\sqrt{\mathrm{ASM}}$$

The correlation:

$$\mathrm{COR}=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{(i-\mu_{i})(j-\mu_{j})\,P(i,j,d,\theta)}{\sigma_{i}\sigma_{j}}$$

where

$$\mu_{i}=\sum_{i=1}^{n}\sum_{j=1}^{n}i\,P(i,j,d,\theta),\quad \mu_{j}=\sum_{i=1}^{n}\sum_{j=1}^{n}j\,P(i,j,d,\theta),\quad \sigma_{i}^{2}=\sum_{i=1}^{n}\sum_{j=1}^{n}(i-\mu_{i})^{2}\,P(i,j,d,\theta),\quad \sigma_{j}^{2}=\sum_{i=1}^{n}\sum_{j=1}^{n}(j-\mu_{j})^{2}\,P(i,j,d,\theta)$$

The angular second moment:

$$\mathrm{ASM}=\sum_{i=1}^{n}\sum_{j=1}^{n}P(i,j,d,\theta)^{2}$$

where i and j are the gray values of pixel (x, y) and pixel (x+Δx, y+Δy), respectively; P(i, j, d, θ) is the probability that a pixel (x, y) with gray level i and a pixel (x+Δx, y+Δy) with gray level j occur together; d is the distance between the two pixels, taken as 1 and 2; θ is the direction along which the co-occurrence matrix is generated, taken as the four directions 0°, 45°, 90° and 135°; n is the number of gray levels.

In some embodiments of the first aspect, the step of extracting natural image statistics from the target tongue image to obtain the multi-dimensional natural scene statistical features includes the following steps: downsampling the target tongue image to obtain a downsampled tongue image; performing feature extraction of the shape parameter α and scale parameter σ on the target tongue image and the downsampled tongue image respectively by using a GGD fitting distribution method, yielding multi-dimensional natural scene distribution statistical features that describe whether the tongue image is distorted; and performing feature extraction of the shape parameter η, mean parameter μ, left scale parameter $\sigma_l$ and right scale parameter $\sigma_r$, fitted along four directions (horizontally adjacent, vertically adjacent and the two diagonally adjacent pairs), on the target tongue image and the downsampled tongue image by using an AGGD fitting distribution method, yielding multi-dimensional natural scene edge-distribution statistical features that describe whether the tongue image is distorted;

where

the GGD is specifically calculated as follows:

$$f(x;\alpha,\sigma^{2})=\frac{\alpha}{2\beta\,\Gamma(1/\alpha)}\exp\!\left(-\left(\frac{\lvert x\rvert}{\beta}\right)^{\alpha}\right),\qquad \beta=\sigma\sqrt{\frac{\Gamma(1/\alpha)}{\Gamma(3/\alpha)}}$$

In the above formula, x is the mean-subtracted contrast-normalized (MSCN) coefficient corresponding to the luminance of a pixel block in the natural image; α controls the shape of the distribution; σ is the standard deviation and controls the variance; β is an intermediate variable;

the AGGD is specifically calculated as follows:

$$f(x;\eta,\sigma_{l}^{2},\sigma_{r}^{2})=\begin{cases}\dfrac{\eta}{(\beta_{l}+\beta_{r})\,\Gamma(1/\eta)}\exp\!\left(-\left(\dfrac{-x}{\beta_{l}}\right)^{\eta}\right), & x<0\\[1.5ex]\dfrac{\eta}{(\beta_{l}+\beta_{r})\,\Gamma(1/\eta)}\exp\!\left(-\left(\dfrac{x}{\beta_{r}}\right)^{\eta}\right), & x\ge 0\end{cases}\qquad \beta_{l}=\sigma_{l}\sqrt{\frac{\Gamma(1/\eta)}{\Gamma(3/\eta)}},\quad \beta_{r}=\sigma_{r}\sqrt{\frac{\Gamma(1/\eta)}{\Gamma(3/\eta)}}$$

In the above formula, x is the MSCN coefficient corresponding to the luminance of a pixel block in the natural image; η controls the shape of the distribution; the left and right scale parameters $\sigma_l$ and $\sigma_r$ control the variance of the left and right sides of the distribution, respectively; $\beta_l$ and $\beta_r$ are intermediate variables; μ is the mean parameter.

In some embodiments of the first aspect, the color space of the target tongue image is converted into the YIQ, YCbCr, HSV and CIELAB spaces, and the grayscale (luminance) component of each space is extracted to obtain the multi-dimensional color features of the target tongue image.

In some embodiments of the first aspect, the deep semantic feature extraction step specifically includes: adding a fully connected layer with multiple neurons to a ResNet-34 network model, inputting the tongue image dataset into the ResNet-34 network model, and taking the outputs of the neurons of the fully connected layer as the multi-dimensional deep semantic features of the tongue image; wherein the number of neurons in the fully connected layer equals the number of dimensions of the shallow features.

In some embodiments of the first aspect, the tongue segmentation module includes a YOLOv5s network model and a U2-Net network model; the collected tongue image is input into the YOLOv5s network model to extract a target region containing the complete tongue body; the target region is input into the U2-Net network model, and the complete tongue body is segmented and extracted to obtain the target tongue image.

In some embodiments of the first aspect, the step of serially fusing the shallow features and the deep semantic features based on the tongue image classification module and classifying the image according to the fused feature vector to obtain the tongue image quality assessment result specifically includes: the dataset includes multiple classes; pairing the classes of the dataset to form multiple binary classification tasks; for each binary classification task, serially fusing the shallow features and the deep semantic features based on the tongue image classification module; training a corresponding binary support vector machine (SVM) model with the fused feature vectors; training multiple binary SVM models on the tongue image dataset, the number of SVM models being equal to the number of pairwise class combinations; inputting the tongue image data to be tested into all the trained SVM models to obtain multiple binary classification results; and deciding among the classification results by a relative majority (plurality) voting strategy to complete the multi-class assessment task.

In some embodiments of the first aspect, before serially fusing the multi-dimensional shallow features and the multi-dimensional deep semantic features, the method further includes the following step: linearly normalizing both the multi-dimensional shallow features and the multi-dimensional deep semantic features to obtain data scaled into [-1, 1].

In a second aspect, this solution provides a tongue image quality assessment system based on deep learning and feature fusion that applies the method of the first aspect, comprising: a data processing unit for segmenting the tongue image to obtain a target tongue image; a controller construction unit for constructing a tongue image quality assessment model based on deep-shallow feature fusion; and a computing unit for inputting the target tongue image into the tongue image quality assessment model for assessment to obtain a tongue image quality assessment result.

The beneficial effects of the present invention are as follows:

In the deep-learning-based tongue segmentation module, the collected tongue image is input into the YOLOv5s network model to extract a target region containing the complete tongue body, and the target region is input into the U2-Net network model to segment and extract the complete tongue image. Compared with traditional tongue segmentation, this improves segmentation accuracy while effectively reducing the computational cost of tongue image processing.

In the tongue image quality assessment model based on deep-shallow feature fusion, the deep semantic features and the shallow features of the tongue image are serially fused, and the fused feature vectors are used for model training to obtain the tongue image quality assessment model. Compared with a model trained on a single type of feature, it captures tongue image characteristics more fully and judges tongue image quality objectively and comprehensively, achieving higher classification accuracy; it is suitable for tongue image analysis systems, helping them screen high-quality tongue images.

Brief Description of the Drawings

To explain the technical solution of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of the TCM tongue image quality assessment based on deep learning and feature fusion according to the present invention.

Fig. 2 is a schematic diagram of the YOLOv5 labeling process of the present invention.

Fig. 3 is a schematic diagram of the U2-Net labeling process of the present invention.

Fig. 4 is a mask image converted after U2-Net annotation according to the present invention.

Fig. 5 is the YOLOv5s network structure diagram of the present invention.

Fig. 6 is the U2-Net network structure diagram of the present invention.

Fig. 7 is the tongue target region after coarse segmentation by YOLOv5s according to the present invention.

Fig. 8 is the target tongue image after fine segmentation by U2-Net according to the present invention.

Fig. 9 is a schematic diagram of the ResNet-34-based deep semantic feature extraction method of the present invention.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.

In existing TCM tongue image data collection, color shifts of the tongue body caused by ambient lighting and blurring of the tongue image caused by motion jitter all introduce a degree of image distortion, posing challenges to the intelligent development of TCM tongue diagnosis. To guarantee the accuracy of digital-intelligent tongue diagnosis, the collected tongue images usually have to be checked manually for acceptability. This approach not only incurs high labor costs but is also inefficient, greatly restricting the intelligent development of TCM tongue diagnosis.

To this end, the present invention designs a two-stage tongue image quality assessment method of "segment first, then assess". As shown in Fig. 1, the coarse and fine segmentation of the tongue image are completed in sequence, the deep and shallow features of the image are then extracted, and finally the tongue image quality is classified by the feature-fused TCM tongue image quality assessment model. On the one hand, this reduces the computational cost of the algorithm, improves tongue image classification accuracy, and solves the problem that existing tongue image judgment is inaccurate and inefficient; on the other hand, it can provide TCM clinical researchers with a high-quality standard tongue image dataset containing complete tongue information, which is of great significance both for establishing a standard TCM tongue image database and for promoting the development of intelligent TCM tongue diagnosis technology.

The present invention provides a tongue image quality assessment method based on deep learning and feature fusion, comprising the following steps:

S1: constructing a tongue image dataset.

Specifically, the collected tongue image dataset is classified by tongue image quality and then partitioned: 80% of the tongue image data is selected in a stratified manner as the training set, 10% as the validation set, and the remaining 10% as the test set.

Here, stratified selection means the split is performed separately within each of the divided classes: high-quality tongue images, blurred tongue images, high-brightness tongue images and low-brightness tongue images. For example, 80% of the high-quality tongue images are selected as training data, 10% as validation data and the remaining 10% as test data, and likewise for the blurred, high-brightness and low-brightness tongue images.
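As a concrete illustration of the stratified 80/10/10 split, the following is a minimal sketch assuming the dataset is held as lists of file paths and quality labels; the helper name and the random seed are illustrative, not part of the patent.

```python
# Minimal sketch of the stratified 80/10/10 split described above.
from sklearn.model_selection import train_test_split

def stratified_split(paths, labels, seed=42):
    # First hold out 20%, stratified over the four quality classes,
    # then split that 20% evenly into validation and test sets.
    train_p, rest_p, train_y, rest_y = train_test_split(
        paths, labels, test_size=0.2, stratify=labels, random_state=seed)
    val_p, test_p, val_y, test_y = train_test_split(
        rest_p, rest_y, test_size=0.5, stratify=rest_y, random_state=seed)
    return (train_p, train_y), (val_p, val_y), (test_p, test_y)
```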

S2: training the tongue image quality assessment model on the tongue image data to obtain a trained assessment model.

The assessment model comprises a tongue segmentation module, a feature extraction module and a tongue image classification module.

As to the segmentation module (the tongue segmentation part shown in Fig. 1), it is mainly used to segment the tongue image to be tested to obtain a complete tongue body image.

Specifically, the segmentation module comprises coarse segmentation based on the YOLOv5s network model (whose output is the target region containing the complete tongue body) and fine segmentation based on the U2-Net network model (whose output is the target tongue image); both networks must be pre-trained to meet the segmentation requirements.

Before pre-training the segmentation models, the tongue images are first annotated with the labelImg tool (Fig. 2 shows the annotation process), with the save type set to YOLO format; files in this format contain the bounding-box coordinates of the target tongue region of each tongue image, and the coordinates are saved as .txt files. Similarly, to train the U2-Net network, the collected tongue images are manually annotated with the labelme software (Fig. 3 shows the annotation process) and saved as .json files; each .json file contains the label classes of the image, the coordinates of each point, the class and shape of the annotated polygon, and the image information. The files are then converted into masks and saved as .png images; Fig. 4 shows the resulting mask image.

For the coarse segmentation network model, the YOLOv5s network (structure shown in Fig. 5), which has a shallow depth but the fastest detection speed, is used to detect the target region containing the complete tongue body. The YOLOv5s model consists of an input layer (Input), a backbone network (Backbone), a feature pyramid (Neck) and a prediction layer (Predict). After an image enters the input layer, the Backbone extracts features while progressively shrinking the feature maps; the Neck structure fuses feature maps at different levels, and the target bounding box is predicted from the fused features. Based on this model, training is performed on the preprocessed training set, with input tongue images resized to 640×640, a batch size of 10 per iteration, and training stopped after 200 epochs. The validation set is then used for validation and optimization, and the above process is repeated until the segmentation accuracy requirement is met, yielding the trained YOLOv5s network model.

For the fine segmentation network model, the U2-Net network is used to remove small amounts of interference such as the lips and to extract the complete target tongue image. Fig. 6 shows the U2-Net structure: the overall framework is a large U-shaped structure of 11 stages, each filled with a residual U-block (RSU). The network contains two kinds of RSU structures, the encoder (En) and the decoder (De); the encoder downsamples the feature maps, while the decoder upsamples them. Dilated convolutions at the bottom of the network keep the feature map size unchanged. Finally, the feature maps of each stage are fused, restored to the image size by convolution layers and upsampling, concatenated, and the final segmentation result is output through a sigmoid function. Based on this network, training is performed on the preprocessed training set with a batch size of 10 per iteration and a learning rate of 0.001, and training stops after 360 epochs. The validation set is then used for validation and optimization, and the process is repeated until the segmentation accuracy requirement is met, yielding the trained U2-Net network model.

After the YOLOv5s and U2-Net network models have been trained, the tongue image data to be tested is input into the trained YOLOv5s network model for coarse segmentation, which detects the coordinates of the target region containing the complete tongue body and outputs that region, completing the coarse segmentation of the tongue image (Fig. 7 shows the tongue target region after coarse segmentation). Fine segmentation follows: the tongue target region is input into the trained U2-Net network model, which outputs the complete target tongue image with the interfering background set to black, completing the tongue segmentation (Fig. 8 shows the target tongue image after fine segmentation).
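The two-stage inference can be sketched as follows. This assumes a custom YOLOv5s checkpoint loaded through torch.hub and a generic U2-Net forward pass returning a saliency map; the file name `tongue_yolov5s.pt`, the 320×320 U2-Net input size and the 0.5 mask threshold are assumptions, not values from the patent.

```python
# Sketch of the two-stage inference: YOLOv5s crops the tongue region,
# then U2-Net produces a fine mask inside that crop.
import cv2
import numpy as np
import torch

# Custom YOLOv5s detector trained on the annotated tongue boxes.
yolo = torch.hub.load('ultralytics/yolov5', 'custom', path='tongue_yolov5s.pt')

def segment_tongue(bgr_image, u2net):
    # Stage 1: coarse detection of the box containing the complete tongue.
    rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
    det = yolo(rgb).xyxy[0]                     # (n, 6): x1, y1, x2, y2, conf, cls
    x1, y1, x2, y2 = det[0, :4].int().tolist()  # highest-confidence box
    crop = bgr_image[y1:y2, x1:x2]

    # Stage 2: fine segmentation of the tongue body within the crop.
    inp = cv2.resize(crop, (320, 320)).astype(np.float32) / 255.0
    inp = torch.from_numpy(inp.transpose(2, 0, 1))[None]
    with torch.no_grad():
        prob = torch.sigmoid(u2net(inp)[0])     # first side-output saliency map
    mask = (prob[0, 0].numpy() > 0.5).astype(np.uint8)
    mask = cv2.resize(mask, (crop.shape[1], crop.shape[0]))

    # Interfering background is set to black, as described above.
    return crop * mask[..., None]
```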

As to the feature extraction module, it performs feature extraction on the segmented tongue image to obtain shallow features and deep semantic features.

Regarding the shallow features, they comprise texture features, natural scene statistics features and color features, 88 dimensions in total.

For texture feature extraction, the gray-level co-occurrence matrix (GLCM) is mainly used. The GLCM is a texture descriptor based on the probability of repeated gray-level structures in an image: it is defined as the frequency with which a pair of pixels with gray values i and j, separated by distance d along direction θ, occur together. The present invention derives the texture features by computing the contrast, dissimilarity, homogeneity, energy, correlation and angular second moment of the GLCM of the tongue image data.

Contrast:

$$\mathrm{CON}=\sum_{i=1}^{n}\sum_{j=1}^{n}(i-j)^{2}\,P(i,j,d,\theta)$$

Dissimilarity:

$$\mathrm{DIS}=\sum_{i=1}^{n}\sum_{j=1}^{n}\lvert i-j\rvert\,P(i,j,d,\theta)$$

Homogeneity:

$$\mathrm{HOM}=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{P(i,j,d,\theta)}{1+(i-j)^{2}}$$

Energy:

$$\mathrm{ENE}=\sqrt{\mathrm{ASM}}$$

Correlation:

$$\mathrm{COR}=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{(i-\mu_{i})(j-\mu_{j})\,P(i,j,d,\theta)}{\sigma_{i}\sigma_{j}}$$

where

$$\mu_{i}=\sum_{i=1}^{n}\sum_{j=1}^{n}i\,P(i,j,d,\theta),\quad \mu_{j}=\sum_{i=1}^{n}\sum_{j=1}^{n}j\,P(i,j,d,\theta),\quad \sigma_{i}^{2}=\sum_{i=1}^{n}\sum_{j=1}^{n}(i-\mu_{i})^{2}\,P(i,j,d,\theta),\quad \sigma_{j}^{2}=\sum_{i=1}^{n}\sum_{j=1}^{n}(j-\mu_{j})^{2}\,P(i,j,d,\theta)$$

Angular second moment:

$$\mathrm{ASM}=\sum_{i=1}^{n}\sum_{j=1}^{n}P(i,j,d,\theta)^{2}$$

where i and j are the gray values of pixel (x, y) and pixel (x+Δx, y+Δy), respectively; P(i, j, d, θ) is the probability that a pixel (x, y) with gray level i and a pixel (x+Δx, y+Δy) with gray level j occur together; d is the distance between the two pixels, taken here as 1 and 2; θ is the direction along which the co-occurrence matrix is generated, usually the four directions 0°, 45°, 90° and 135°; n is the number of gray levels. That is, the present invention extracts along the four directions 0°, 45°, 90° and 135° with distances d = 1 and d = 2, obtaining 8 gray-level co-occurrence matrices; from each matrix the 6 texture features defined by the contrast, dissimilarity, homogeneity, energy, correlation and angular second moment formulas are computed, giving 48-dimensional texture features in total.
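A minimal sketch of this 48-dimensional texture descriptor using scikit-image follows; the quantization to 32 gray levels is an assumption (the patent only fixes d, θ and the six statistics).

```python
# Sketch of the 48-dimensional GLCM texture feature described above.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_u8, levels=32):
    q = (gray_u8 // (256 // levels)).astype(np.uint8)  # quantize gray levels
    glcm = graycomatrix(q,
                        distances=[1, 2],                           # d = 1 and d = 2
                        angles=[0, np.pi/4, np.pi/2, 3*np.pi/4],    # 0/45/90/135 degrees
                        levels=levels, normed=True)
    props = ['contrast', 'dissimilarity', 'homogeneity',
             'energy', 'correlation', 'ASM']
    # 2 distances x 4 angles x 6 statistics = 48 values.
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```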

For natural scene statistics feature extraction, the present invention fits a zero-mean generalized Gaussian distribution (GGD) to each tongue image; the resulting shape parameter α and scale parameter σ serve as NSS distribution features 1 and 2 of the tongue image, giving 2-dimensional natural scene statistics.

In addition, for each tongue image, a zero-mean asymmetric generalized Gaussian distribution (AGGD) is fitted to the statistical distribution of the neighboring coefficients of the natural image along four directions: horizontal (H), vertical (V) and the two diagonals (D1, D2). From each of the four neighboring-coefficient distributions the following parameters are extracted:

the shape parameter $\eta$ and the mean parameter $\mu$,

the left scale parameter $\sigma_l$ and the right scale parameter $\sigma_r$,

giving 16-dimensional natural scene statistical features in total.

Therefore, combining the GGD and AGGD extraction for each tongue image, 18-dimensional NSS features are extracted in total. To obtain richer NSS features and characterize the statistical changes of tongue image quality in natural scenes, two scales are used: the original image and the image downsampled once.

Specifically, the tongue image is first downsampled to obtain a downsampled tongue image; the GGD fitting distribution method is then applied to both the segmented tongue image and the downsampled tongue image to extract the shape parameter α and scale parameter σ, yielding multi-dimensional NSS distribution features that describe whether the tongue image is distorted.

Here, the GGD is specifically calculated as follows:

$$f(x;\alpha,\sigma^{2})=\frac{\alpha}{2\beta\,\Gamma(1/\alpha)}\exp\!\left(-\left(\frac{\lvert x\rvert}{\beta}\right)^{\alpha}\right)$$

where

$$\beta=\sigma\sqrt{\frac{\Gamma(1/\alpha)}{\Gamma(3/\alpha)}},\qquad \Gamma(a)=\int_{0}^{\infty}t^{a-1}e^{-t}\,dt$$

In the above formula, x is the mean-subtracted contrast-normalized (MSCN) coefficient corresponding to the luminance of a pixel block in the natural image (since the normalized luminance of a natural image approaches a unit Gaussian, MSCN coefficients are commonly used to describe the normalized luminance of the tongue image); α controls the shape of the distribution; σ is the standard deviation and controls the variance; β is an intermediate variable; t is the integration variable of the gamma function Γ(·), with integration interval [0, +∞).
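A sketch of the MSCN computation and a moment-matching GGD fit in the BRISQUE style is given below; the 7/6 Gaussian window width and the grid of candidate shape parameters are conventional choices, not values stated in the patent.

```python
# Sketch of the spatial-domain NSS step: MSCN coefficients of the
# luminance, followed by a moment-matching GGD fit.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma

def mscn(gray):
    # gray: float luminance image in [0, 255].
    mu = gaussian_filter(gray, 7/6)
    sigma = np.sqrt(np.abs(gaussian_filter(gray * gray, 7/6) - mu * mu))
    return (gray - mu) / (sigma + 1.0)

def fit_ggd(coeffs):
    # Match E[x^2] / (E|x|)^2 against its closed form
    # r(a) = Gamma(1/a) * Gamma(3/a) / Gamma(2/a)^2 over a grid of shapes.
    alphas = np.arange(0.2, 10.0, 0.001)
    r = gamma(1/alphas) * gamma(3/alphas) / gamma(2/alphas) ** 2
    ratio = np.mean(coeffs ** 2) / (np.mean(np.abs(coeffs)) ** 2 + 1e-12)
    alpha = alphas[np.argmin((r - ratio) ** 2)]
    return alpha, np.sqrt(np.mean(coeffs ** 2))   # (shape, scale)

# Usage: the 2 GGD features of one scale of a tongue image.
# alpha, sigma = fit_ggd(mscn(gray.astype(np.float64)).ravel())
```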

Thereafter, the AGGD fitting distribution method is applied to the tongue image and the downsampled tongue image to fit the statistical distribution of the neighboring coefficients along the horizontal (H), vertical (V) and two diagonal (D1, D2) directions, extracting the shape parameter η, mean parameter μ, left scale parameter $\sigma_l$ and right scale parameter $\sigma_r$ of the four neighboring-coefficient distributions, yielding multi-dimensional NSS edge-distribution features that describe whether the tongue image is distorted.

The AGGD is specifically calculated as follows:

$$f(x;\eta,\sigma_{l}^{2},\sigma_{r}^{2})=\begin{cases}\dfrac{\eta}{(\beta_{l}+\beta_{r})\,\Gamma(1/\eta)}\exp\!\left(-\left(\dfrac{-x}{\beta_{l}}\right)^{\eta}\right), & x<0\\[1.5ex]\dfrac{\eta}{(\beta_{l}+\beta_{r})\,\Gamma(1/\eta)}\exp\!\left(-\left(\dfrac{x}{\beta_{r}}\right)^{\eta}\right), & x\ge 0\end{cases}$$

where

$$\beta_{l}=\sigma_{l}\sqrt{\frac{\Gamma(1/\eta)}{\Gamma(3/\eta)}},\qquad \beta_{r}=\sigma_{r}\sqrt{\frac{\Gamma(1/\eta)}{\Gamma(3/\eta)}},\qquad \mu=(\beta_{r}-\beta_{l})\,\frac{\Gamma(2/\eta)}{\Gamma(1/\eta)}$$

In the above formula, x is the mean-subtracted contrast-normalized (MSCN) coefficient corresponding to the luminance of a pixel block in the natural image; η controls the shape of the distribution; the left and right scale parameters $\sigma_l$ and $\sigma_r$ control the variance of the left and right sides of the distribution, respectively; $\beta_l$ and $\beta_r$ are intermediate variables; μ is the mean parameter.
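The AGGD step can be sketched in the same style: paired products of the MSCN coefficients along the four neighbor directions, then a moment-matching fit. The estimator follows the standard BRISQUE recipe and is an assumption here; `mscn` refers to the sketch above.

```python
# Sketch of the AGGD features for one scale of a tongue image.
import numpy as np
from scipy.special import gamma

def paired_products(m):
    # m: 2-D array of MSCN coefficients.
    return {'H':  m[:, :-1] * m[:, 1:],
            'V':  m[:-1, :] * m[1:, :],
            'D1': m[:-1, :-1] * m[1:, 1:],
            'D2': m[:-1, 1:] * m[1:, :-1]}

def fit_aggd(x):
    x = x.ravel()
    sig_l = np.sqrt(np.mean(x[x < 0] ** 2))
    sig_r = np.sqrt(np.mean(x[x >= 0] ** 2))
    g = sig_l / sig_r
    rhat = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    R = rhat * (g**3 + 1) * (g + 1) / (g**2 + 1) ** 2
    etas = np.arange(0.2, 10.0, 0.001)
    r = gamma(2/etas) ** 2 / (gamma(1/etas) * gamma(3/etas))
    eta = etas[np.argmin((r - R) ** 2)]
    # mu = (beta_r - beta_l) * Gamma(2/eta) / Gamma(1/eta).
    mu = ((sig_r - sig_l) * np.sqrt(gamma(1/eta) / gamma(3/eta))
          * gamma(2/eta) / gamma(1/eta))
    return eta, mu, sig_l, sig_r   # 4 parameters per direction

# 4 directions x 4 parameters = the 16 AGGD features of one scale:
# feats = [p for d in paired_products(m).values() for p in fit_aggd(d)]
```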

The fitted model parameters are then extracted to form the following 16-dimensional NSS edge-distribution statistical features:

$$\left\{\eta_{k},\ \mu_{k},\ \sigma_{l,k},\ \sigma_{r,k}\right\},\qquad k\in\{H,\ V,\ D_{1},\ D_{2}\}$$

Finally, for each tongue image, natural scene statistical features in a 36-dimensional spatial domain are obtained in total (18 dimensions at each of the two scales).

For color feature extraction, to meet practical application requirements, the RGB color space of the tongue image is converted into other color spaces. The present invention uses the color functions of the scikit-image library to transform the input RGB tongue image into YIQ, YCbCr, HSV and CIELAB color space images, and extracts the grayscale (luminance) component of each color space, obtaining 4-dimensional tongue image color features.

Here, RGB represents the red, green and blue channels. The YIQ color space is derived from the YUV color space and is designed to exploit the color-response characteristics of human vision: Y is the luminance (brightness) signal of each color, I is the in-phase component (ranging from orange to cyan) and Q is the quadrature component (ranging from purple to yellow-green). YCbCr is a scaled and offset version of the YUV color space, where Y is the luminance signal and Cb and Cr likewise encode color, only represented differently. HSV is a color space built on the intuitive properties of color: hue (H), saturation (S) and value (V). The CIELAB color model is based on human color perception; its values describe all colors visible to a person with normal vision and consist of three components: lightness (L) and color channels a and b, where a runs from dark green (low values) through gray (middle values) to bright pink (high values), and b runs from bright blue (low values) through gray (middle values) to yellow (high values).
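A minimal sketch of the 4-dimensional color feature with scikit-image follows; taking the mean of each luminance/gray component as the scalar feature is an assumption, since the patent does not state how each component is reduced to one dimension.

```python
# Sketch of the 4-dimensional color feature: one scalar per luminance
# component of the four converted color spaces.
import numpy as np
from skimage import color

def color_features(rgb):
    rgb = rgb.astype(np.float64) / 255.0
    comps = [
        color.rgb2yiq(rgb)[..., 0],      # Y of YIQ
        color.rgb2ycbcr(rgb)[..., 0],    # Y of YCbCr
        color.rgb2hsv(rgb)[..., 2],      # V of HSV
        color.rgb2lab(rgb)[..., 0],      # L of CIELAB
    ]
    return np.array([c.mean() for c in comps])
```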

So far, for each input tongue image to be tested, 88-dimensional shallow features are extracted in total: 48-dimensional GLCM-based texture features, 36-dimensional spatial-domain natural scene statistics features and 4-dimensional color features.

Regarding the deep semantic features, as shown in Fig. 9, they are extracted by adding a fully connected layer with multiple neurons to the ResNet-34 network model, inputting the tongue image dataset into the ResNet-34 network model, and taking the outputs of the neurons of the fully connected layer as the multi-dimensional deep semantic features of the tongue image; the number of neurons in the fully connected layer equals the number of dimensions of the shallow features.

Specifically, the annotated tongue image training set is input into the ResNet-34 network for model training with a batch size of 10 per iteration; training stops after 300 epochs. The validation set is then used for validation and optimization, and the process is repeated until the classification accuracy requirement is met, yielding the trained ResNet-34 network. For the tongue image data to be assessed, the 88-dimensional feature values output by the fully connected layer are extracted as the deep semantic features of the tongue image.
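A sketch of this feature extractor in PyTorch is shown below; the 4-class training head (matching the four quality classes) and the 224×224 input are assumptions.

```python
# Sketch of the deep-feature extractor: ResNet-34 whose head is replaced
# by an 88-neuron fully connected layer, matching the shallow features.
import torch
import torch.nn as nn
from torchvision import models

class DeepFeatureNet(nn.Module):
    def __init__(self, n_classes=4, feat_dim=88):
        super().__init__()
        backbone = models.resnet34(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, feat_dim)  # 88-dim layer
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, n_classes)  # used only during training

    def forward(self, x):
        feat = self.backbone(x)          # 88-dim deep semantic features
        return self.head(feat), feat

# After training, the 88-dim `feat` tensor is taken as the deep feature:
# _, feat = DeepFeatureNet()(torch.randn(1, 3, 224, 224))
```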

As to the tongue image quality classification module, it normalizes and serially fuses the shallow features and the deep semantic features, and classifies the image according to the fused feature vector to obtain the tongue image quality assessment result.

First, to eliminate the influence of differing scales among the different types of feature values, the extracted feature data must be normalized. A linear normalization is used to scale the data into [-1, 1], with the following formula:

$$x'=\frac{x-x_{mean}}{x_{max}-x_{min}}$$

where x is the original value, x' is the linearly normalized value, and $x_{max}$, $x_{min}$ and $x_{mean}$ are the maximum, minimum and mean of the original data, respectively.

Then, serial fusion is applied: the extracted 88-dimensional shallow features and 88-dimensional deep semantic features are concatenated to obtain a new 176-dimensional tongue image feature vector.
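A minimal sketch of the normalization and serial fusion, assuming the min/max/mean statistics are computed on the training set and reused at test time:

```python
# Sketch of the linear normalization and serial (concatenation) fusion.
import numpy as np

def normalize(x, x_min, x_max, x_mean):
    return (x - x_mean) / (x_max - x_min + 1e-12)   # scaled into [-1, 1]

def fuse(shallow_88, deep_88, stats_s, stats_d):
    s = normalize(shallow_88, *stats_s)
    d = normalize(deep_88, *stats_d)
    return np.concatenate([s, d])                   # 176-dim fused vector
```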

Finally, the classes of the dataset are paired to form multiple binary classification tasks. For each binary task, the shallow features and the deep semantic features of the input tongue image data are extracted, normalized and serially fused by the tongue image quality classification module, and a corresponding binary support vector machine (SVM) model is trained on the fused feature vectors. Multiple binary SVM models are thus trained on the tongue image dataset, their number equal to the number of pairwise class combinations. The tongue image data to be tested is input into all the trained SVM models to obtain multiple binary classification results, which are then decided by a relative majority (plurality) voting strategy to complete the multi-class assessment task.

Specifically, the support vector machine model uses the Gaussian kernel (RBF kernel), with a kernel coefficient of 0.1 and an error-term penalty factor C of 1.

The RBF kernel formula is as follows:

$$K(x_{i},x_{j})=\exp\!\left(-\frac{\lVert x_{i}-x_{j}\rVert^{2}}{2\delta^{2}}\right)$$

where $K(x_i, x_j)$ is the kernel function and δ is the bandwidth of the Gaussian kernel, whose value is greater than zero.
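In scikit-learn terms, the one-vs-one scheme with plurality voting described above corresponds to a single SVC, which internally trains one binary SVM per class pair and votes; the sketch below assumes the stated kernel coefficient maps to gamma = 0.1.

```python
# Sketch of the one-vs-one RBF-SVM classifier with plurality voting.
from sklearn.svm import SVC

clf = SVC(kernel='rbf', gamma=0.1, C=1.0, decision_function_shape='ovo')

# fused_train: (n_samples, 176) fused vectors; y_train: quality labels.
# clf.fit(fused_train, y_train)
# quality = clf.predict(fuse(shallow_88, deep_88, stats_s, stats_d)[None])
```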

This completes the establishment of the TCM tongue image quality assessment model based on deep learning and feature fusion.

S3: inputting the tongue image to be tested into the trained TCM tongue image quality assessment model.

Specifically, the tongue image to be tested is input into the above TCM tongue image quality assessment model for quality assessment, and the final assessment result is obtained. If the image is a high-quality tongue image, it is saved directly for subsequent tongue image processing; otherwise, the quality problem present in the tongue image is reported.

In summary, in the above embodiments, the proposed TCM tongue image quality assessment based on deep learning and feature fusion is divided into two stages: TCM tongue segmentation based on YOLOv5 and U2-Net, and TCM tongue image quality assessment based on deep-shallow tongue image feature fusion. In the first stage, the bounding-box coordinates of the tongue target region are obtained with the YOLOv5 network and the region containing the complete tongue body is extracted; this region is then input into the U2-Net network, from which the complete tongue image is segmented and used as the input for the subsequent quality assessment. In the second stage, 88-dimensional shallow features are extracted from the tongue image (GLCM-based texture features, natural scene statistics features and color features), and 88-dimensional deep semantic features are obtained by adding a fully connected layer with 88 neurons before the ResNet-34 output layer. After the shallow and deep semantic features are normalized and fused by concatenation, a support vector machine model is trained, finally completing the TCM tongue image quality assessment.

The above sets out the basic content of the first aspect of the present invention. A system applying the method of the first aspect is given below: this solution provides a tongue image quality assessment system based on deep learning and feature fusion.

Specifically, it comprises: a data processing unit for segmenting the tongue image to obtain a segmented target tongue image; a controller construction unit for constructing a tongue image quality assessment model based on deep-shallow feature fusion; and a computing unit for inputting the tongue image into the tongue image quality assessment model for assessment to obtain a tongue image quality assessment result.

The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements are also regarded as falling within the protection scope of the present invention.

Claims (10)

1. A tongue image quality evaluation method based on deep learning and feature fusion, characterized by comprising the following steps:
constructing a tongue image dataset;
training an evaluation model on the tongue image dataset to obtain a trained evaluation model;
inputting the tongue image to be tested into the trained evaluation model;
the evaluation model comprises a tongue segmentation module, a feature extraction module and a tongue image classification module;
segmenting the tongue image to be tested based on the tongue segmentation module to obtain a target tongue image;
carrying out feature extraction processing on the target tongue image based on the feature extraction module to obtain shallow features and deep semantic features respectively;
wherein the shallow features at least comprise texture features, natural scene statistical features and color features;
and carrying out normalization and series fusion on the shallow features and the deep semantic features based on the tongue image classification module, and carrying out image classification according to the fused feature vectors to obtain a tongue image quality assessment result.
2. The tongue image quality evaluation method based on deep learning and feature fusion according to claim 1, wherein the shallow feature extraction step specifically comprises:
converting the target tongue image into a gray level co-occurrence matrix, and obtaining multidimensional texture features of the target tongue image by calculating the gray level co-occurrence matrix;
carrying out natural-image statistical feature extraction on the target tongue image to obtain multi-dimensional natural scene statistical features of the target tongue image;
and performing color space conversion on the target tongue image to obtain multi-dimensional color characteristics of the target tongue image.
3. The tongue image quality assessment method based on deep learning and feature fusion according to claim 2, wherein the texture features include contrast, dissimilarity, homogeneity, energy, correlation and angular second moment, which are specifically calculated as follows:
the contrast is calculated as:

$\mathrm{CON}=\sum_{i=1}^{N}\sum_{j=1}^{N}(i-j)^{2}\,p(i,j,d,\theta)$

the dissimilarity is calculated as:

$\mathrm{DIS}=\sum_{i=1}^{N}\sum_{j=1}^{N}|i-j|\,p(i,j,d,\theta)$

the homogeneity is calculated as:

$\mathrm{HOM}=\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{p(i,j,d,\theta)}{1+(i-j)^{2}}$

the energy is calculated as:

$\mathrm{ENE}=\sqrt{\mathrm{ASM}}$

the correlation is calculated as:

$\mathrm{COR}=\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{(i-\mu_{i})(j-\mu_{j})\,p(i,j,d,\theta)}{\sigma_{i}\sigma_{j}}$

wherein $\mu_{i}=\sum_{i=1}^{N}\sum_{j=1}^{N}i\,p(i,j,d,\theta)$, $\mu_{j}=\sum_{i=1}^{N}\sum_{j=1}^{N}j\,p(i,j,d,\theta)$, $\sigma_{i}^{2}=\sum_{i=1}^{N}\sum_{j=1}^{N}(i-\mu_{i})^{2}\,p(i,j,d,\theta)$ and $\sigma_{j}^{2}=\sum_{i=1}^{N}\sum_{j=1}^{N}(j-\mu_{j})^{2}\,p(i,j,d,\theta)$;

the angular second moment is calculated as:

$\mathrm{ASM}=\sum_{i=1}^{N}\sum_{j=1}^{N}p(i,j,d,\theta)^{2}$

wherein i and j represent the gray values corresponding to the pixel (x, y) and the pixel (x+Δx, y+Δy), respectively; p(i, j, d, θ) represents the probability that a pixel (x, y) with gray level i and a pixel (x+Δx, y+Δy) with gray level j occur simultaneously; d is the separation distance between the two pixels, taken as 1 and 2; θ is the generation direction of the gray-level co-occurrence matrix, taken in the four directions 0°, 45°, 90° and 135°; N is the number of gray levels.
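The six properties above map directly onto scikit-image's GLCM utilities. Below is a minimal sketch, assuming scikit-image >= 0.19 (for the graycomatrix/graycoprops spelling) and an 8-bit grayscale input; the 16-level quantization is an illustrative choice, not taken from the patent:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray: np.ndarray, levels: int = 16) -> np.ndarray:
    """48-D texture vector: 6 properties x (d in {1, 2}) x (4 angles)."""
    # Quantize the 8-bit image so the co-occurrence matrix stays small.
    q = (gray.astype(np.float64) / 256.0 * levels).astype(np.uint8)
    glcm = graycomatrix(
        q,
        distances=[1, 2],                                  # d = 1 and 2, as in the claim
        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],   # 0, 45, 90, 135 degrees
        levels=levels, symmetric=True, normed=True,
    )
    props = ["contrast", "dissimilarity", "homogeneity",
             "energy", "correlation", "ASM"]
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])
```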
4. The tongue image quality evaluation method based on deep learning and feature fusion according to claim 2, wherein the step of extracting the statistical features of the natural image from the target tongue image to obtain the multi-dimensional natural scene statistical features of the target tongue image comprises the steps of:
downsampling the target tongue image to obtain a downsampled tongue image;
performing, on the target tongue image and the downsampled tongue image by using a GGD fitting distribution method, feature extraction of the shape parameter α and the scale parameter σ, so as to describe whether the tongue image is distorted;
performing, on the target tongue image and the downsampled tongue image by using an AGGD fitting distribution method, feature extraction of the fitted shape parameter η, the mean parameter μ, the left scale parameter σ_l and the right scale parameter σ_r in four directions (horizontally adjacent, vertically adjacent, and the two diagonally adjacent directions), so as to obtain multi-dimensional natural-scene edge-distribution statistical features describing whether the tongue image is distorted;
wherein,

the GGD is given by:

$f(x;\alpha,\sigma^{2})=\frac{\alpha}{2\beta\,\Gamma(1/\alpha)}\exp\left(-\left(\frac{|x|}{\beta}\right)^{\alpha}\right)$, with $\beta=\sigma\sqrt{\frac{\Gamma(1/\alpha)}{\Gamma(3/\alpha)}}$ and $\Gamma(t)=\int_{0}^{\infty}x^{t-1}e^{-x}\,dx$;

in the formula: x is the mean-subtracted contrast-normalized coefficient corresponding to the luminance value of a pixel block in the natural image; α controls the shape of the distribution; σ is the standard deviation and controls the variance; β is an intermediate variable of the calculation; t is the argument of the gamma function Γ(·), whose integration interval is (0, +∞);

the AGGD is given by:

$f(x;\eta,\sigma_{l}^{2},\sigma_{r}^{2})=\begin{cases}\frac{\eta}{(\beta_{l}+\beta_{r})\,\Gamma(1/\eta)}\exp\left(-\left(\frac{-x}{\beta_{l}}\right)^{\eta}\right), & x<0\\ \frac{\eta}{(\beta_{l}+\beta_{r})\,\Gamma(1/\eta)}\exp\left(-\left(\frac{x}{\beta_{r}}\right)^{\eta}\right), & x\geq 0\end{cases}$

with $\beta_{l}=\sigma_{l}\sqrt{\frac{\Gamma(1/\eta)}{\Gamma(3/\eta)}}$, $\beta_{r}=\sigma_{r}\sqrt{\frac{\Gamma(1/\eta)}{\Gamma(3/\eta)}}$ and mean parameter $\mu=(\beta_{r}-\beta_{l})\frac{\Gamma(2/\eta)}{\Gamma(1/\eta)}$;

in the formula: x is the mean-subtracted contrast-normalized coefficient corresponding to the luminance value of a pixel block in the natural image; η controls the shape of the distribution; the left and right scale parameters σ_l and σ_r control the variance of the left and right sides of the distribution; β_l and β_r are intermediate variables of the calculation; μ is the mean parameter.
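In practice the shape and scale parameters are usually estimated by moment matching rather than by maximizing the likelihood of the density above. Below is a minimal sketch of the GGD estimator over the normalized coefficients, assuming SciPy is available; the grid resolution is an illustrative choice, and the AGGD parameters are estimated in the same moment-matching style with separate left and right second moments:

```python
import numpy as np
from scipy.special import gamma as G

# rho(alpha) = Gamma(1/a) * Gamma(3/a) / Gamma(2/a)^2 is monotone in alpha,
# so it can be inverted over a precomputed grid.
ALPHAS = np.arange(0.2, 10.0, 0.001)
RHO = G(1.0 / ALPHAS) * G(3.0 / ALPHAS) / G(2.0 / ALPHAS) ** 2

def fit_ggd(x: np.ndarray):
    """Estimate (alpha, sigma) of a zero-mean GGD from normalized coefficients x."""
    sigma_sq = np.mean(x ** 2)                    # second moment of the samples
    rho_hat = sigma_sq / np.mean(np.abs(x)) ** 2  # sample value of rho(alpha)
    alpha = ALPHAS[np.argmin(np.abs(RHO - rho_hat))]
    return float(alpha), float(np.sqrt(sigma_sq))
```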
5. The tongue image quality assessment method based on deep learning and feature fusion according to claim 2, wherein,
and converting the color space of the target tongue image into YIQ, YCbCr, HSV and CIELAB space images, and extracting the gray components of the YIQ, YCbCr, HSV and CIELAB space images to obtain the multi-dimensional color features of the target tongue image.
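A minimal sketch of this conversion with OpenCV follows. YCrCb, HSV and CIELAB conversions are built in (OpenCV orders the chroma channels as YCrCb); YIQ has no OpenCV constant, so its luma is computed directly with the NTSC weights. Pooling each gray component to its mean is an assumption made here for illustration; the claim specifies the spaces and components, not the pooled statistic:

```python
import cv2
import numpy as np

def color_features(bgr: np.ndarray) -> np.ndarray:
    """Gray (luminance) component of YIQ, YCbCr, HSV and CIELAB, pooled to means."""
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float64) / 255.0
    y_yiq = rgb @ np.array([0.299, 0.587, 0.114])              # Y of YIQ (NTSC luma)
    y_ycbcr = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)[..., 0]   # Y of YCbCr
    v_hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)[..., 2]       # V of HSV
    l_lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)[..., 0]       # L of CIELAB
    grays = [y_yiq, y_ycbcr / 255.0, v_hsv / 255.0, l_lab / 255.0]
    return np.array([g.mean() for g in grays])                 # one value per space
```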
6. The tongue image quality assessment method based on deep learning and feature fusion according to claim 1, wherein the deep semantic feature extraction step specifically comprises:
adding a fully connected layer with a plurality of neurons to a ResNet-34 network model, inputting the tongue image dataset into the ResNet-34 network model, and outputting the neuron activations of the fully connected layer as the multi-dimensional deep semantic features of the tongue image;
wherein the number of neurons of the fully connected layer is the same as the number of dimensions of the shallow features.
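A minimal PyTorch sketch of this modification, assuming torchvision >= 0.13; the number of quality classes is an illustrative assumption. The 88 activations of the inserted layer are read out as the deep semantic features:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34

class DeepFeatureNet(nn.Module):
    def __init__(self, num_classes: int = 5, feat_dim: int = 88):
        super().__init__()
        backbone = resnet34(weights=None)      # pretrained weights could be loaded here
        backbone.fc = nn.Identity()            # expose the 512-D pooled features
        self.backbone = backbone
        self.feat = nn.Linear(512, feat_dim)   # the added 88-neuron layer
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor):
        feats = torch.relu(self.feat(self.backbone(x)))
        return self.head(feats), feats         # logits for training, 88-D features
```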
7. The tongue image quality assessment method based on deep learning and feature fusion according to claim 1, wherein the tongue segmentation module comprises a YOLOv5s network model and a U2-Net network model;
inputting the collected tongue image into the YOLOv5s network model, and extracting a target area containing the complete tongue body;
inputting the target area into the U2-Net network model, and segmenting and extracting the complete tongue image to obtain the target tongue image.
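A minimal sketch of this two-step pipeline follows. The YOLOv5s call uses the public ultralytics/yolov5 torch.hub entry point; the default checkpoint carries COCO weights, so in practice a checkpoint fine-tuned on tongue images would be loaded instead, as noted in the comment. The U2-Net segmenter is passed in as a callable returning a probability mask, since its loading code depends on the specific U2-Net implementation, and at least one detection is assumed:

```python
from typing import Callable
import numpy as np
import torch

# Default COCO weights; a tongue-trained checkpoint would be loaded with
# torch.hub.load("ultralytics/yolov5", "custom", path="tongue_yolov5s.pt").
detector = torch.hub.load("ultralytics/yolov5", "yolov5s")

def extract_tongue(image_rgb: np.ndarray,
                   segment: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    det = detector(image_rgb).xyxy[0]            # rows: x1, y1, x2, y2, conf, cls
    x1, y1, x2, y2 = det[0, :4].int().tolist()   # take the highest-confidence box
    crop = image_rgb[y1:y2, x1:x2]
    mask = segment(crop)                         # U2-Net probability map, H x W
    return crop * (mask[..., None] > 0.5)        # zero out non-tongue pixels
```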
8. The tongue image quality evaluation method based on deep learning and feature fusion according to claim 1, wherein the step of performing series fusion on the shallow features and the deep semantic features based on the tongue image classification module and performing image classification according to the fused feature vectors to obtain a tongue image quality evaluation result specifically comprises the following steps:
the dataset includes a plurality of categories;
combining the categories of the dataset pairwise to form a plurality of binary classification tasks;
aiming at each classification task, carrying out series fusion on the shallow features and the deep semantic features based on the tongue image classification module;
training a corresponding binary support vector machine (SVM) model according to the fused feature vectors;
training a plurality of binary SVM models based on the tongue image dataset, wherein the number of obtained SVM models is equal to the number of pairwise combinations of the dataset categories;
respectively inputting tongue image data to be detected into all trained SVM models to obtain a plurality of classification results;
and adopting a relative majority voting strategy to make a decision on the classification result, and completing the multi-classification evaluation task.
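A minimal scikit-learn sketch of this one-vs-one scheme with relative-majority voting; the kernel choice and other hyperparameters are illustrative assumptions. (scikit-learn's SVC also implements one-vs-one internally; the explicit loop below mirrors the wording of the claim.)

```python
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

def train_ovo_svms(X: np.ndarray, y: np.ndarray) -> dict:
    """One binary SVM per pair of quality categories: C(k, 2) models in total."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        idx = (y == a) | (y == b)                # keep only the two classes of this task
        models[(a, b)] = SVC(kernel="rbf").fit(X[idx], y[idx])
    return models

def predict_ovo(models: dict, x: np.ndarray):
    votes = [clf.predict(x.reshape(1, -1))[0] for clf in models.values()]
    return max(set(votes), key=votes.count)      # relative-majority decision
```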
9. The tongue image quality assessment method based on deep learning and feature fusion according to claim 1, further comprising, before the multi-dimensional shallow features and the multi-dimensional deep semantic features are fused in series, the step of:
carrying out linear normalization processing on the multi-dimensional shallow features and the multi-dimensional deep semantic features to obtain data scaled to [-1, 1].
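A minimal sketch of this normalization with scikit-learn, where the per-dimension minima and maxima are fitted on the training features and reused at test time; the random matrices are placeholders for the real feature matrices:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

shallow = np.random.rand(100, 88)    # placeholder for the 88-D shallow features
deep = np.random.rand(100, 88)       # placeholder for the 88-D deep features

# One scaler per branch, fitted on the training set and reused on new samples.
s_scaler = MinMaxScaler(feature_range=(-1, 1)).fit(shallow)
d_scaler = MinMaxScaler(feature_range=(-1, 1)).fit(deep)
fused = np.hstack([s_scaler.transform(shallow), d_scaler.transform(deep)])  # 176-D
```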
10. A tongue image quality evaluation system based on deep learning and feature fusion, characterized in that the tongue image quality evaluation method based on deep learning and feature fusion according to any one of claims 1 to 9 is applied, comprising:
the data processing unit is used for carrying out segmentation processing on the tongue image to obtain a target tongue image;
the controller construction unit is used for building a tongue image quality assessment model based on deep and shallow feature fusion;
the computing unit is used for inputting the target tongue image into the tongue image quality evaluation model for evaluation, and obtaining a tongue image quality evaluation result.
CN202311281197.6A 2023-10-07 2023-10-07 Tongue image quality evaluation method and system based on deep learning and feature fusion Pending CN117333703A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311281197.6A CN117333703A (en) 2023-10-07 2023-10-07 Tongue image quality evaluation method and system based on deep learning and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311281197.6A CN117333703A (en) 2023-10-07 2023-10-07 Tongue image quality evaluation method and system based on deep learning and feature fusion

Publications (1)

Publication Number Publication Date
CN117333703A true CN117333703A (en) 2024-01-02

Family

ID=89276675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311281197.6A Pending CN117333703A (en) 2023-10-07 2023-10-07 Tongue image quality evaluation method and system based on deep learning and feature fusion

Country Status (1)

Country Link
CN (1) CN117333703A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120411015A (en) * 2025-04-18 2025-08-01 德派软件(北京)有限公司 Intelligent tongue coating detection system and detection method based on YOLOv5 deep learning algorithm



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination