Detailed Description
Referring to fig. 1, fig. 1 is a flowchart of the deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. The deep learning-based thyroid tumor classification and recurrence risk prediction method comprises the following steps: S1, collecting and preprocessing multi-modal data related to thyroid follicular carcinoma (FTC) or follicular adenoma (FTA), wherein the multi-modal data comprise dynamic micro blood flow data, anatomical data and whole-slide pathological image (WSI) data; S2, respectively adopting different feature extraction methods for the preprocessed dynamic micro blood flow data, anatomical data and WSI data to obtain dynamic micro blood flow features F_CEUS, anatomical features F_CT and pathological features F_path; S3, generating weight vectors for the dynamic micro blood flow features F_CEUS, anatomical features F_CT and pathological features F_path by using a multi-head self-attention mechanism, conducting weighted fusion with an adaptive weight adjustment mechanism combining feature quality evaluation and task relevance analysis, and utilizing a Transformer encoder to obtain a first optimized fusion feature F_optimal; S4, adopting a correlation matrix between the dynamic micro blood flow features F_CEUS and the anatomical features F_CT to generate a second weight vector, conducting weighted fusion to obtain a second fusion feature, and optimizing it through residual connection and a GRU network to obtain a second optimized fusion feature; and S5, constructing a preoperative classification model on the basis of the first optimized fusion feature, and constructing a postoperative recurrence risk prediction model on the basis of the second optimized fusion feature and the pathological features F_path.
The preprocessing, feature extraction, fusion and optimization, model training and construction, and interpretability enhancement techniques in the deep learning-based thyroid tumor classification and recurrence risk prediction method are implemented in Python. In this specification, the dynamic micro blood flow feature F_CEUS may also be referred to as the CEUS feature, and the anatomical feature F_CT may also be referred to as the CT feature.
Referring to fig. 2, a flow chart of data acquisition and preprocessing according to the present invention is shown. Mainly comprises the following steps:
1. CEUS data acquisition, namely acquiring CEUS time-series images and capturing arterial-phase and venous-phase dynamic characteristics.
2. CT data acquisition, namely acquiring three-phase enhanced CT images comprising an arterial phase, a venous phase and a delay phase.
3. Pathological section data acquisition, namely acquiring whole-slide images of H&E-stained pathological sections and carrying out block segmentation.
4. Data cleaning, namely removing invalid blank areas in the images and filling in lost or incomplete data.
5. Format conversion, namely saving the CEUS data in the standard DICOM format and converting the pathological sections into JPEG format to facilitate deep learning processing.
6. Normalization processing, namely normalizing brightness and contrast and unifying resolution, for example CEUS at 640×480, CT at 256×256 and pathological sections at 256×256.
7. Spatial registration, namely using Elastix to spatially align the CT and CEUS images and register the multi-modal data, ensuring consistency of the images in spatial position.
8. Preprocessing completion, namely outputting the normalized and registered multi-modal data.
More specifically, in one embodiment of the present invention, regarding the data acquisition portion, the following standardized acquisition procedure is proposed for the multi-modal characteristics of FTC:
(1) Dynamic micro blood flow feature capture (CEUS): a time-series acquisition technique is adopted to accurately divide the arterial period (30-60 seconds) and the venous period (60-120 seconds), the spatio-temporal dynamic characteristics of tumor micro blood flow perfusion are captured by a high-frequency ultrasonic probe (frequency ≥ 12 MHz), and the data are stored in DICOM format (resolution 640×480, frame rate 15 fps).
(2) High-resolution anatomical feature extraction (enhanced CT): multi-phase scanning (arterial phase, venous phase and delay phase) is adopted with a slice thickness of 1.0-1.5 mm, and noise is reduced by combining an iterative reconstruction algorithm, ensuring fine depiction of the three-dimensional anatomical structure.
(3) Digital processing of pathology data: the whole-slide pathological image (WSI) is digitized by a digital scanner (resolution 0.25 μm/pixel), and block processing (256×256 pixels) is combined with a tissue region screening algorithm (Otsu thresholding) to exclude blank-region interference.
(4) Data storage and cross-platform compatibility: a standardized data warehouse is constructed, CEUS (DICOM), CT (DICOM) and pathology images (JPEG) are uniformly mapped to the same spatial coordinate system, and lossless transmission and cross-equipment analysis of multi-center data are supported.
Through the standardized acquisition flow, the standardized acquisition of CEUS, CT and pathological data can be ensured, the accurate acquisition of dynamic micro-blood flow characteristics, high-resolution anatomical characteristics and pathological characteristics is realized, and unified storage and cross-platform compatibility of multi-mode data are supported by constructing a standardized data warehouse, so that nondestructive transmission and analysis of multi-center data are realized.
In one embodiment of the invention, regarding data preprocessing, the following operations are employed:
(1) Spatial registration and dynamic alignment, namely adopting an Elastix registration framework (based on B-spline elastic transformation), achieving sub-pixel-level spatial alignment of the CEUS and enhanced CT images through a mutual-information-maximization algorithm, and eliminating artifacts caused by respiratory motion and equipment offset.
(2) Multi-scale optimization of pathological data, namely, for the WSI data, adopting a pyramid blocking strategy in which 256×256 pixel blocks are respectively extracted under 5×, 10× and 20× magnification and input into an InceptionResNetV2 multi-scale feature fusion module, which outputs 512-dimensional feature vectors containing cross-scale information of cell morphology and tissue structure, realizing cross-scale representation of cell morphology and tissue structure.
(3) Noise suppression and enhancement processing, namely adopting adaptive median filtering (window size 3×3) combined with wavelet transformation (Daubechies basis functions) for the CEUS data to effectively separate blood flow signals from background noise, and preserving the anatomical details of the CT data through non-local means denoising (NL-Means) while suppressing metal artifacts.
Through the data preprocessing operation, sub-pixel level space alignment of CEUS and CT images is realized, artifacts caused by respiratory motion and equipment offset are eliminated, multi-scale optimization is carried out on pathological data, cross-scale representation of cell morphology and tissue structure is extracted, CEUS blood flow signals and background noise are effectively separated through noise suppression and enhancement processing, and CT anatomical details are reserved.
Referring to fig. 3, a feature extraction block diagram of the deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention is shown. Mainly comprises the following parts:
1. CEUS dynamic micro blood flow feature extraction, namely extracting spatial features of single-frame CEUS images using a Convolutional Neural Network (CNN), and capturing time-series features with a Gated Recurrent Unit (GRU) to generate a micro blood flow feature vector (e.g., 128 dimensions).
2. CT static anatomical feature extraction, namely extracting anatomical features such as tumor boundary ambiguity and capsule invasion by utilizing ResNet, and using a Pyramid Pooling Module (PPM) to enhance the combination of global and local features to generate a CT feature vector (e.g., 256 dimensions).
3. Pathological feature extraction, namely dividing the whole-slide pathological image into small blocks and cleaning them, extracting cell morphology using InceptionResNetV2, and, based on a multi-instance learning (MIL) framework, weighting and aggregating the patch features to generate a pathology feature vector (e.g., 512 dimensions).
More specifically, the extraction of dynamic micro-blood flow features, anatomical features and pathological features according to the invention is achieved by:
1. Extraction of dynamic micro-blood flow characteristics F_CEUS of CEUS
Adopting a CNN-GRU hybrid architecture design, comprising:
Spatial feature extraction, namely constructing a 3-layer convolution network (Conv3×3, ReLU, BatchNorm) and outputting a 128-dimensional feature vector for each CEUS frame;
Time-series feature modeling, namely capturing the hemodynamic changes from the arterial phase to the venous phase with a bi-directional GRU (hidden layer of 64 units) and outputting the dynamic features (e.g., 128 dimensions).
Mathematical expression of the CNN spatial feature extraction:
F_spatial^i = CNN(I_i)
(wherein I_i represents the i-th frame CEUS image, and F_spatial^i represents the extracted 128-dimensional spatial feature vector.)
Modeling GRU time sequence characteristics:
h_t = GRU(F_spatial^t, h_{t-1})
r_t = σ(W_r·[h_{t-1}, F_spatial^t])
z_t = σ(W_z·[h_{t-1}, F_spatial^t])
ñ_t = tanh(W·[r_t * h_{t-1}, F_spatial^t])
h_t = (1 - z_t) * h_{t-1} + z_t * ñ_t
Wherein h_t is the hidden state at time t, r_t is the reset gate, z_t is the update gate, and σ is the sigmoid activation function; the dynamic feature F_CEUS = h_T is finally obtained, where T is the last time step of the sequence; ñ_t is the candidate hidden state in the GRU, representing the new candidate state calculated at time step t from the current input and the reset history information; and W is the weight matrix for calculating the candidate hidden state in the GRU, used for mapping the reset-gated history state and the current input to the candidate state space.
In summary, through the CNN-GRU hybrid architecture, the hemodynamic changes from arterial phase to venous phase are captured, and 128-dimensional dynamic micro-blood flow characteristics are output.
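As an illustrative reference only, a minimal PyTorch sketch of the CNN-GRU extractor described above is given below. The layer widths follow the text (three Conv3×3 + ReLU + BatchNorm blocks, 128-dimensional spatial features per frame, a bi-directional GRU with 64 hidden units), while the strides, pooling and the 64×64 input size are assumptions introduced only for this example.

import torch
import torch.nn as nn

class CEUSFeatureExtractor(nn.Module):
    def __init__(self, spatial_dim=128, gru_hidden=64):
        super().__init__()
        # 3-layer convolution stack: Conv3x3 + ReLU + BatchNorm per block
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(32),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(128),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(128, spatial_dim)          # F_spatial^i, 128-dim per frame
        # bi-directional GRU: 2 x 64 hidden units -> 128-dim dynamic feature
        self.gru = nn.GRU(spatial_dim, gru_hidden, batch_first=True, bidirectional=True)

    def forward(self, frames):                           # frames: (B, T, 1, H, W)
        b, t = frames.shape[:2]
        x = frames.reshape(b * t, *frames.shape[2:])
        f_spatial = self.proj(self.cnn(x).flatten(1)).reshape(b, t, -1)
        _, h_n = self.gru(f_spatial)                     # last hidden states of both directions
        return torch.cat([h_n[0], h_n[1]], dim=-1)       # F_CEUS: (B, 128)

# example: a 10-frame CEUS clip at 64x64 resolution -> 128-dimensional F_CEUS
f_ceus = CEUSFeatureExtractor()(torch.randn(2, 10, 1, 64, 64))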
2. Enhanced CT anatomical feature F_CT extraction
Using ResNet-PPM joint optimization, multi-level features (Conv1 to Conv5) are extracted based on a pre-trained ResNet50, 4-level scale features (1×1, 2×2, 3×3, 6×6 pooling) are fused by a Pyramid Pooling Module (PPM), and 256-dimensional global-local joint features are output. Because the data scale of enhanced CT is small, directly extracting features with a randomly initialized ResNet would cause problems such as parameter redundancy, low feature learning efficiency and difficult optimization, and would perform poorly in a small-data scenario; therefore, the method adopts a pre-trained ResNet and, through transfer learning, migrates the general feature representations learned on large-scale data to the small dataset, significantly reducing the dependence of the model on target data.
ResNet50 feature extraction:
F_conv_i = ResNet50_layer_i(I_CT), i ∈ {1,2,3,4,5}
wherein I_CT represents the CT input image, and F_conv_i represents the output feature map of the i-th convolution stage of ResNet50.
Pyramid Pooling Module (PPM):
F_pool_j = Pool_j(F_conv_5), j ∈ {1×1, 2×2, 3×3, 6×6}
F_ppm_j = Conv(Upsample(F_pool_j))
F_CT = Concat(F_ppm_1, F_ppm_2, F_ppm_3, F_ppm_4)
Wherein F_pool_j is the multi-scale feature map obtained by applying pooling operations of different scales (1×1, 2×2, 3×3 and 6×6) to the layer-5 feature map of ResNet50; F_ppm_j is the feature map obtained by upsampling and convolving the pooled feature F_pool_j to restore it to the original size for subsequent concatenation into the final 256-dimensional CT feature vector; and F_CT is the 256-dimensional CT feature vector containing multi-scale feature information.
In summary, based on ResNet-PPM joint optimization, multi-scale anatomical features are extracted, and 256-dimensional global-local joint features are output.
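A hedged PyTorch sketch of the ResNet-PPM combination is given below; the pooling bins (1×1, 2×2, 3×3, 6×6) and the 256-dimensional output follow the text, while the 64 reduction channels and the global average pooling before the final projection are illustrative assumptions (in practice the backbone would be loaded with pre-trained weights, as noted above).

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class PPMHead(nn.Module):
    def __init__(self, in_ch=2048, out_dim=256, bins=(1, 2, 3, 6)):
        super().__init__()
        # one pooling branch per bin size, each reduced to 64 channels
        self.stages = nn.ModuleList(
            [nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(in_ch, 64, 1)) for b in bins])
        self.fc = nn.Linear(in_ch + 64 * len(bins), out_dim)

    def forward(self, f_conv5):                           # F_conv_5: (B, 2048, h, w)
        size = f_conv5.shape[-2:]
        pooled = [F.interpolate(s(f_conv5), size, mode="bilinear", align_corners=False)
                  for s in self.stages]                   # upsample back to the conv5 size
        f_ppm = torch.cat([f_conv5] + pooled, dim=1)      # global-local concatenation
        return self.fc(f_ppm.mean(dim=(2, 3)))            # F_CT: (B, 256)

backbone = nn.Sequential(*list(resnet50(weights=None).children())[:-2])  # Conv1-Conv5 only
f_ct = PPMHead()(backbone(torch.randn(1, 3, 256, 256)))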
3. Extraction of pathological features F_path
Weakly supervised feature learning of pathology images (MIL framework) is adopted, wherein the WSI is divided into N blocks (256×256) through multi-instance learning and attention mechanisms, 512-dimensional features are extracted through InceptionResNetV2, and an instance-level feature pool is constructed. An attention weight network (fully connected layer + sigmoid) is designed to dynamically screen key image blocks related to FTC (such as vascular infiltration areas), and the aggregated 512-dimensional pathological features are output.
Example-level feature extraction:
f_i = InceptionResNetV2(p_i), i ∈ {1,2,...,N}
Where p_i represents the ith pathology tile and f_i represents the corresponding 512-dimensional instance feature.
Attention weight calculation:
w_i = sigmoid(W·f_i + b)
Where W and b are a learnable weight matrix and bias, and w_i ∈ [0,1] represents the attention weight of the i-th tile.
Weighted aggregation:
F_path = Σ(w_i·f_i) / Σ(w_i)
Wherein F_path is the final 512-dimensional pathological feature vector, representing a global representation of the pathology image.
In summary, key pathological patches related to FTC are screened through a multi-instance learning and attention mechanism, and 512-dimensional pathological features are output.
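The attention-based MIL aggregation can be sketched as below; the 512-dimensional instance features are assumed to have been extracted beforehand by InceptionResNetV2 as described, and the single-layer gate (FC + sigmoid) mirrors the formula w_i = sigmoid(W·f_i + b).

import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())   # W, b + sigmoid

    def forward(self, f_instances):                 # (N, 512): one row per WSI patch
        w = self.attn(f_instances)                  # w_i in [0, 1], shape (N, 1)
        f_path = (w * f_instances).sum(0) / w.sum().clamp_min(1e-8)   # Σ(w_i·f_i)/Σ(w_i)
        return f_path, w.squeeze(-1)                # 512-dim slide feature + patch weights

f_path, patch_weights = AttentionMIL()(torch.randn(300, 512))   # e.g. 300 patches of one WSI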
Referring to fig. 4, fig. 4 is a schematic diagram of tri-modal feature fusion and optimization of a deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. It mainly comprises the following parts:
1. Adaptive weighting mechanism, which dynamically calculates the weights of the dynamic micro blood flow feature F_CEUS, the anatomical feature F_CT and the pathological feature F_path, and adjusts the contribution proportion of each modal feature according to task requirements.
2. Deep self-attention network, which captures the global dependency relationships among modalities through a multi-head (e.g. 8-head) attention mechanism and optimizes the fused feature expression.
3. Inter-modal feature optimization, which further improves the expressive power of the fused features through feature interaction and redundancy elimination.
4. Fused feature output, which outputs high-dimensional optimized fusion features for subsequent classification and prediction.
Specifically, the adaptive fusion and optimization of the trimodal features of the present invention includes the following operations:
1. Self-attention-based dynamic weighted fusion
1.1 A weight distribution algorithm is adopted: the CEUS (128-dimensional), CT (256-dimensional) and pathology (512-dimensional) features are input, the correlation matrix among the modalities is calculated through a multi-head self-attention mechanism (head number = 8), and a first initial weight vector [α0, β0, γ0] is generated.
First a query (Q), key (K) and value (V) matrix is calculated:
Q = W_Q·F_concat
K = W_K·F_concat
V = W_V·F_concat
Wherein F_concat = [F_CEUS, F_CT, F_path] is the concatenated feature vector.
The attention score is then calculated:
Attention(Q, K, V) = softmax(QK^T/√d_k)V
where d_k is the dimension of the key vector in the attention mechanism for scaling the attention score to prevent the gradient from disappearing.
Multi-head attention splice (head number=8) was then performed:
MultiHead(F_concat) = Concat(head_1, head_2, ..., head_8)·W_O
head_i = Attention(Q_i, K_i, V_i)
wherein W_O is the output weight matrix of the multi-head attention, and is used for mapping the spliced multi-head attention result to the final output feature space.
A first initial weight vector is then generated from the multi-headed attention result:
[α0, β0, γ0] = softmax(W_modal·MultiHead(F_concat))
Wherein W_modal is the modal weight generation matrix, which converts the multi-head attention output into the weight coefficients of the three modalities.
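A minimal sketch of step 1.1 is given below: each modality feature is projected to a common token dimension, passed through 8-head self-attention, and mapped by W_modal to the three modality weights. Treating each modality as a single token and the 256-dimensional common projection are assumptions of this example, not requirements of the method.

import torch
import torch.nn as nn

class ModalWeightGenerator(nn.Module):
    def __init__(self, dims=(128, 256, 512), d_model=256, heads=8):
        super().__init__()
        # project F_CEUS, F_CT, F_path to a shared token dimension before attention
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        self.mha = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.w_modal = nn.Linear(d_model * 3, 3)          # W_modal

    def forward(self, f_ceus, f_ct, f_path):
        tokens = torch.stack([p(f) for p, f in zip(self.proj, (f_ceus, f_ct, f_path))], dim=1)
        attn_out, _ = self.mha(tokens, tokens, tokens)    # MultiHead(F_concat)
        return torch.softmax(self.w_modal(attn_out.flatten(1)), dim=-1)   # [α0, β0, γ0]

w0 = ModalWeightGenerator()(torch.randn(4, 128), torch.randn(4, 256), torch.randn(4, 512))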
1.2 Adaptive adjustment of the first initial weight vector in combination with feature quality assessment and task relevance analysis:
The first initial weight vector [α0, β0, γ0] is dynamically adjusted based on feature quality assessment and task relevance analysis:
1.2.1 Calculating a quality assessment score:
quality_CEUS = w_q·Quality(F_CEUS) + w_r·Relevance(F_CEUS,task)
quality_CT = w_q·Quality(F_CT) + w_r·Relevance(F_CT,task)
quality_path = w_q·Quality(F_path) + w_r·Relevance(F_path,task)
wherein quality_CEUS is the quality score of the CEUS features, calculated as a weighted combination of feature quality and task relevance; w_q is the quality weight coefficient; the Quality function evaluates the quality of the current features; w_r is the relevance weight coefficient; the Relevance function evaluates the degree of relevance between the features and the current task; quality_CT is the quality score of the CT features; and quality_path is the quality score of the pathological features.
1.2.2 Normalized mass fraction:
[q_α, q_β, q_γ] = softmax([quality_CEUS, quality_CT, quality_path])
where q_α, q_β, q_γ are the quality score of the normalized CEUS feature, the quality score of the normalized CT feature, and the quality score of the normalized pathology feature, respectively.
1.2.3 Adjustment is made based on the initial weights:
α1_temp = λ·α0+ (1-λ)·q_α
β1_temp = λ·β0+ (1-λ)·q_β
γ1_temp = λ·γ0+ (1-λ)·q_γ
Where λ = 0.7 is the initial weight retention coefficient.
1.2.4 Renormalization:
[α1, β1, γ1] = softmax([α1_temp, β1_temp, γ1_temp])
Wherein [α1, β1, γ1] is the first initial weight vector after the first adjustment.
2. Task-aware weight optimization:
L_weight = L_task + λ·R(α1, β1, γ1)
Where L_task is the task loss (classification or regression) and R is a weight regularization term.
2.1 Obtaining a first initial weight vector after second adjustment through task self-adaptive updating based on gradient:
α2 = α1 - η_α·∂L_weight/∂α1
β2 = β1 - η_β·∂L_weight/∂β1
γ2 = γ1 - η_γ·∂L_weight/∂γ1
Where η_α is the learning rate corresponding to the weight α1, used for controlling the update step size of the weight parameter α1; η_β is the learning rate corresponding to the weight β1, used for controlling the update step size of the weight parameter β1; and η_γ is the learning rate corresponding to the weight γ1, used for controlling the update step size of the weight parameter γ1.
Finally, the first weight vector [α, β, γ] = softmax([α2, β2, γ2]) is obtained.
2.2 Task-sensitive adaptive adjustment:
When diagnostic tasks are prioritized, η_α is increased to raise the CEUS feature weight;
When the recurrence prediction task is prioritized, η_γ is increased to raise the pathological feature weight;
The validation set performance is monitored in real time and the final weight vector is adjusted dynamically.
In practical applications, α is typically larger for diagnostic tasks (biased towards CEUS, α ≈ 0.4), while γ is larger for recurrence prediction tasks (biased towards pathological features, γ ≈ 0.5). The weights automatically adapt to the equipment characteristics and data quality of different institutions.
Finally, the dynamic micro blood flow feature F_CEUS, the anatomical feature F_CT and the pathological feature F_path are fused by the first weight vector to obtain a first fused feature F_fused, wherein F_fused = α·F_CEUS + β·F_CT + γ·F_path, with α + β + γ = 1.
In conclusion, through a multi-head self-attention mechanism, weights of CEUS, CT and pathological features are dynamically distributed, and accurate fusion of cross-modal features is achieved.
2. Global optimization of deep self-attention network
Using Transformer-based feature interaction, the fused features (512 dimensions) are input into a 6-layer Transformer encoder (head number = 8, feed-forward layer dimension = 2048), and cross-modal global dependencies are captured through position encoding and layer normalization (LayerNorm);
The specific process is as follows:
X' = LayerNorm(X + MultiHeadAttention(X))
X'' = LayerNorm(X' + FFN(X'))
Wherein MultiHeadAttention(X) = Concat(head_1, ..., head_8)·W_O
head_i = Attention(XW_i^Q, XW_i^K, XW_i^V)
Attention(Q,K,V) = softmax(QK^T/√d_k)V
Wherein X' is the feature obtained by applying multi-head self-attention and a residual connection to X followed by layer normalization; X'' is the final output feature representation after processing by the feed-forward network FFN, a second residual connection and layer normalization; W_O is the output projection matrix of the multi-head attention mechanism, used for mapping the concatenated multi-head attention results to the final output dimension; XW_i^Q is the query vector Q_i obtained by transforming the input feature X with the query weight matrix W_i^Q of the i-th head; XW_i^K is the key vector K_i obtained by transforming the input feature X with the key weight matrix W_i^K of the i-th head; and XW_i^V is the value vector V_i obtained by transforming the input feature X with the value weight matrix W_i^V of the i-th head.
Position coding:
PE(pos, 2i) = sin(pos/10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos/10000^(2i/d_model))
Transformer encoder layer:
x' = LayerNorm(x + MultiHeadAttention(x, x, x))
output = LayerNorm(x' + FFN(x'))
Wherein FFN is feed forward network:
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2
By stacking 6 layers of the Transformer encoder, the final features are expressed as:
F_fused_optimized = Transformer_6(Transformer_5(...Transformer_1(F_fused)...))
In summary, cross-modal global dependencies are captured by the Transformer encoder, which outputs 512-dimensional optimized features.
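A compact sketch of this global optimization step is given below, using PyTorch's built-in Transformer encoder with the stated configuration (6 layers, 8 heads, feed-forward dimension 2048) and the sinusoidal position encoding from the formulas above; treating the 512-dimensional fused feature as a single-token sequence is an assumption of this example.

import torch
import torch.nn as nn

def positional_encoding(seq_len, d_model):
    # PE(pos, 2i) = sin(pos/10000^(2i/d_model)); PE(pos, 2i+1) = cos(pos/10000^(2i/d_model))
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos / 10000 ** (i / d_model))
    pe[:, 1::2] = torch.cos(pos / 10000 ** (i / d_model))
    return pe

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                           dim_feedforward=2048, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)     # 6 stacked encoder layers

f_fused = torch.randn(4, 1, 512)                        # fused tri-modal feature as one token
f_fused_optimized = encoder(f_fused + positional_encoding(1, 512)).squeeze(1)   # (4, 512)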
Referring to fig. 5, fig. 5 is a schematic diagram of bimodal feature fusion and optimization of the deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. It is worth noting that, in the case of tri-modal fusion, because of the large semantic difference between modalities, a Transformer encoder and position encoding are adopted to establish a global context relation for deep cross-modal feature interaction; the bimodal fusion mainly processes the two relatively homogeneous image data of CEUS and CT and pays more attention to the time-series evolution trend of CEUS, so residual connection and a GRU network are adopted to capture time dependence.
Specifically, the bimodal adaptive fusion and optimization of the present invention includes the following operations:
1. Attention-based weight distribution
The CEUS (128-dimensional) and CT (256-dimensional) features are input, a feature similarity matrix is calculated, and a second initial weight vector [α0', β0'] is generated.
The specific process is as follows:
Cross-modality similarity score is calculated:
similarity_matrix = F_CEUS·F_CT^T/√d
cross_similarity = mean(softmax(similarity_matrix))
Where F_CEUS·F_CT^T represents a feature dot product operation, d is the feature dimension, and softmax normalizes the similarity into a probability distribution.
Modality autocorrelation calculation:
self_sim_CEUS = mean(F_CEUS·F_CEUS^T/√128)
self_sim_CT = mean(F_CT·F_CT^T/√256)
weight raw score calculation:
w_CEUS_raw = cross_similarity * self_sim_CEUS
w_CT_raw = (1 - cross_similarity) * self_sim_CT
Weight vector normalization:
[α0', β0'] = softmax([w_CEUS_raw, w_CT_raw])
2. task aware dynamic weight adjustment
Weight optimization is based on gradient descent and task loss:
L_weight' = L_task' + λ'·R'(α0', β0')
Where L_task' is the primary task loss function, λ' is the regularization coefficient, and R'(α0', β0') is the weight regularization term. The validation set performance is monitored in real time and the weight parameters are dynamically fine-tuned.
2.1 Gradient-based weight update
Gradient update formula:
α1' = α0' - η_α'·∂L_weight'/∂α0'
β1' = β0' - η_β'·∂L_weight'/∂β0'
[α', β'] = softmax([α1', β1'])
wherein:
η_α' and η_β' are the learning rate parameters of the CEUS and CT modalities, respectively; α1' and β1' are the temporary weights after the gradient update; and α' and β' are the normalized weights, i.e. the second weight vector.
The invention adopts a task self-adaptive adjustment strategy in the fusion process of the dynamic micro blood flow characteristic F_CEUS and the anatomical characteristic F_CT, and the task self-adaptive adjustment strategy is as follows:
when diagnostic tasks are prioritized:
Increasing η_α' emphasizes the micro blood flow characteristics of CEUS; a typical weight distribution is α' ≈ 0.6, β' ≈ 0.4.
When other analysis tasks are prioritized:
Increasing η_β' emphasizes the anatomical detail features of CT; a typical weight distribution is α' ≈ 0.45, β' ≈ 0.55.
The validation set performance can be monitored in real time and the weight parameters dynamically fine-tuned; the learning rates η_α' and η_β' are automatically adjusted according to the validation set performance, ensuring optimal weight distribution of the model under different task scenarios.
Finally, the fused output is obtained as a second fusion feature:
F_fused' = α'·F_CEUS + β'·F_CT
Where α '+β' =1.
3. Feature optimization and interaction enhancement
3.1 Residual connection and feature enhancement
Introducing a residual connection avoids gradient vanishing and enhances information flow:
F_enhanced = F_fused' + Dropout(FC(F_fused'))
Where F_enhanced is the enhanced feature, F_fused' is the second fusion feature, FC is the fully connected layer, dropout is a random deactivation operation, preventing overfitting.
And then unifying different modal characteristic distributions by applying layer normalization (Layer Normalization) to obtain normalized enhancement characteristic F_normalized:
F_normalized = LayerNorm(F_enhanced)
3.2 Spatio-temporal dependency modeling
The normalized enhancement feature f_normalized is input into the attention enhanced GRU network:
F_temporal = GRU(F_normalized)
where GRU is the gated recurrent unit and F_temporal is the feature after capturing the time dependence, i.e. the second optimized fusion feature.
In summary, the bimodal optimization fusion method of the invention strengthens the space-time feature interaction through a self-attention mechanism (head number=4) by capturing the synergistic effect of the CEUS time sequence feature and the CT space feature.
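A hedged sketch of this bimodal optimization path (residual FC + Dropout enhancement, LayerNorm, 4-head self-attention and a GRU over the fused sequence) is shown below; the 256-dimensional width, the dropout rate and the ordering of attention before the GRU are assumptions introduced for the example.

import torch
import torch.nn as nn

class BimodalOptimizer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.fc, self.drop = nn.Linear(dim, dim), nn.Dropout(0.3)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, f_fused_seq):                       # (B, T, dim): fused feature per frame
        f_enhanced = f_fused_seq + self.drop(self.fc(f_fused_seq))   # residual connection
        f_norm = self.norm(f_enhanced)                    # LayerNorm
        f_attn, _ = self.attn(f_norm, f_norm, f_norm)     # 4-head self-attention
        f_temporal, _ = self.gru(f_attn)                  # capture the time dependence
        return f_temporal[:, -1]                          # second optimized fusion feature

f_temporal = BimodalOptimizer()(torch.randn(2, 6, 256))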
Fig. 6 is a classification and prediction flow chart of a deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. The thyroid tumor classification method is implemented by a preoperative classification model, and the recurrence risk prediction method is implemented by a postoperative recurrence risk prediction model. The construction of the pre-operative classification model and the post-operative recurrence risk prediction model is described below.
1. Construction of preoperative classification model
1.1 Data acquisition and Pre-processing
In one embodiment of the invention, multi-modal data of a total of 1500 patients, including cases of thyroid follicular carcinoma (FTC) and follicular adenoma (FTA), were collected from multiple tertiary hospitals using imaging equipment of different brands and models. The dataset was divided into training, validation and test sets at a ratio of about 70% (1050 cases), 15% (225 cases) and 15% (225 cases), and stratified sampling was used to ensure consistent proportions of FTC and FTA in each subset. CEUS data, CT data and pathological data were then acquired and processed in a standardized manner:
CEUS data, namely precisely dividing arterial period (30-60 seconds) and venous period (60-120 seconds) by adopting a time sequence acquisition technology, and capturing the space-time dynamic characteristics of tumor micro-blood perfusion by a high-frequency ultrasonic probe;
CT data, namely adopting multi-phase scanning, wherein the layer thickness is 1.0-1.5mm, and reducing noise by combining an iterative reconstruction algorithm;
Pathological data, namely digitizing the whole-slide pathological images by a digital scanner (resolution 0.25 μm/pixel) and adopting block processing and a tissue region screening algorithm.
1.2 Training of Pre-operative Classification models
The preoperative classification model is trained by adopting a double-branch full-connection network structure, an input layer receives 512-dimensional fusion characteristics, a two-layer full-connection network (FC-128) is matched with a ReLU activation function to perform characteristic dimension reduction, and an output layer outputs classification probability of FTC and FTA through a Softmax function. Through the double-branch full-connection network, the accurate classification of tumor types is realized, and the classification probability is output.
The preoperative classification model is constructed by adopting a multi-stage transfer learning strategy, utilizing a three-stage progressive training framework in which the three stages are, in sequence, pre-training, feature migration and fine-tuning. The input is the source domain data (1050 cases of multi-center training data), the output is the domain-adaptive model parameters θ, and optimization is carried out for specific medical equipment environments.
The adjustment method of the model parameter theta is as follows:
First stage Pre-training
θ_pretrain = argmin_θ L_src(θ, D_src)
Wherein argmin_θ is the optimization operator, representing finding the optimal parameters θ that minimize the objective function, and L_src is the source-domain loss function, measuring the prediction error of the model on the source-domain dataset.
Second stage feature migration
θ_transfer = argmin_θ [L_src(θ, D_src) + λ_1·d_MMD(F_src, F_tgt)]
Where d_MMD is the maximum mean discrepancy metric, used to reduce the feature distribution difference between the source domain and the target domain.
Third stage of fine tuning
θ_finetune = argmin_θ [L_tgt(θ, D_tgt) + λ_2·R(θ, θ_transfer)]
Where R is a regularization term, limiting parameter deviations from the pre-trained model are excessive.
The distribution difference of the source domain and the target domain is reduced through progressive migration, the adaptability of the model on new equipment is improved, and the accuracy rate on a verification set is improved by about 8-12%.
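The feature-migration stage can be illustrated by the small sketch below, which adds a linear-kernel MMD penalty between batches of source and target features to the source-domain loss; the linear kernel and λ_1 = 0.1 are assumptions, since the specification does not fix the kernel choice.

import torch

def mmd_linear(f_src, f_tgt):
    # maximum mean discrepancy with a linear kernel: ||mean(f_src) - mean(f_tgt)||^2
    delta = f_src.mean(dim=0) - f_tgt.mean(dim=0)
    return (delta * delta).sum()

def stage2_loss(l_src, f_src, f_tgt, lambda_1=0.1):
    # L_src(θ, D_src) + λ_1·d_MMD(F_src, F_tgt)
    return l_src + lambda_1 * mmd_linear(f_src, f_tgt)

loss = stage2_loss(torch.tensor(0.35), torch.randn(16, 256), torch.randn(16, 256))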
The preoperative classification model is constructed by adopting a dynamic fine-tuning mechanism, namely an adaptive parameter adjustment algorithm based on equipment characteristics. The input is the CEUS and CT characteristic parameters of the target equipment, including imaging resolution, frame rate, contrast, etc., and the output is network weights optimized for the particular equipment characteristics.
The method for adjusting the network weight is as follows:
characterization of device characteristics:
v_device = Encoder(device_params)
Condition parameter generation:
α_device = MLP(v_device)
Dynamic weight adjustment:
W_adjusted = W_base * (1 + α_device * Tanh(v_device))
Where w_base is the base weight and α_device is the device-specific adjustment factor.
By automatically adjusting model parameters according to equipment characteristics, the model can adapt to imaging characteristic differences of equipment of different manufacturers, and the cross-equipment performance attenuation is reduced by 20-35%.
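A small sketch of this device-conditioned adjustment is given below: scanner parameters are encoded into v_device, an MLP produces the device-specific factor α_device, and the base weights are rescaled as W_adjusted = W_base·(1 + α_device·tanh(v_device)); the encoder and MLP sizes, and the three example parameters (resolution, frame rate and a contrast index), are assumptions.

import torch
import torch.nn as nn

class DeviceAdapter(nn.Module):
    def __init__(self, n_params=3, hidden=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_params, hidden), nn.ReLU())   # Encoder(device_params)
        self.mlp = nn.Linear(hidden, 1)                                        # MLP(v_device)

    def forward(self, w_base, device_params):
        v_device = self.encoder(device_params)             # device characteristic representation
        alpha_device = self.mlp(v_device)                  # device-specific adjustment factor
        scale = 1 + alpha_device * torch.tanh(v_device).mean(dim=-1, keepdim=True)
        return w_base * scale                              # W_adjusted

w_base = torch.randn(256, 128)                             # base network weights
w_adjusted = DeviceAdapter()(w_base, torch.tensor([[640.0, 480.0, 15.0]]))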
The preoperative classification model is constructed by adopting adaptive learning rate adjustment, namely a cyclic-decay learning rate strategy based on validation set performance. The input is the performance indices (accuracy, loss value) on the validation set, and the output is the dynamically adjusted learning rate.
The learning rate adjustment method is as follows:
basic cycle learning rate:
η_t = η_min + 0.5*(η_max - η_min)*(1 + cos(t/T_cycle * π))
Wherein η_t is the learning rate value at training step t, dynamically calculated by the cosine annealing formula; η_min is the minimum learning rate value, preventing training stagnation caused by a learning rate that is too small; η_max is the maximum learning rate value, which can be dynamically adjusted according to performance during training; and T_cycle is the number of training steps in a complete learning rate cycle, determining the period length of the cosine annealing.
And (3) performance monitoring:
If val_loss_t>val_loss_{t-1} * (1 - ε):
patience += 1
Else:
patience = 0
wherein val_loss_t is the loss value on the validation set at training step t, used for monitoring model performance, and patience is a patience counter that records how many consecutive times the validation set performance has failed to improve, used for triggering learning rate adjustment.
And (3) learning rate adjustment:
If patience>patience_threshold:
η_max = η_max * 0.5
patience = 0
By automatically lowering the upper limit of the learning rate as training proceeds, training is prevented from falling into local optima and model convergence is promoted; the average convergence speed is improved by about 30%.
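The cyclic-decay schedule and its patience-based adjustment can be sketched as below; ε = 0.01, a patience threshold of 5 and the η_min/η_max/T_cycle values are illustrative only.

import math

class CyclicDecayLR:
    def __init__(self, eta_min=1e-5, eta_max=1e-3, t_cycle=50, eps=0.01, patience_threshold=5):
        self.eta_min, self.eta_max, self.t_cycle = eta_min, eta_max, t_cycle
        self.eps, self.patience_threshold = eps, patience_threshold
        self.patience, self.prev_val_loss = 0, float("inf")

    def lr(self, t):
        # η_t = η_min + 0.5·(η_max - η_min)·(1 + cos(t/T_cycle·π))
        return self.eta_min + 0.5 * (self.eta_max - self.eta_min) * (
            1 + math.cos((t % self.t_cycle) / self.t_cycle * math.pi))

    def step(self, val_loss):
        # performance monitoring: count steps where the validation loss fails to improve
        if val_loss > self.prev_val_loss * (1 - self.eps):
            self.patience += 1
        else:
            self.patience = 0
        if self.patience > self.patience_threshold:
            self.eta_max *= 0.5                  # lower the learning-rate ceiling
            self.patience = 0
        self.prev_val_loss = val_loss

scheduler = CyclicDecayLR()
current_lr = scheduler.lr(t=10); scheduler.step(val_loss=0.42)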
The loss function of the preoperative classification model adopts a cross entropy loss function and an L2 regularization term:
L_diag = -Σ y_i log(P(y=i)) + λ||θ||^2
Where y_i is the true label and λ||θ||^2 is the L2 regularization term, preventing overfitting.
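A brief sketch of the classification head and its loss is given below; the two FC-128 layers, ReLU and the two-class output follow the text, while using the optimizer's weight_decay to stand in for the λ||θ||^2 term is an implementation shortcut assumed here.

import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(512, 128), nn.ReLU(),              # FC-128 with ReLU
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 2),                           # logits for FTC vs. FTA (Softmax inside the loss)
)
criterion = nn.CrossEntropyLoss()                # -Σ y_i log P(y=i)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4, weight_decay=1e-4)  # λ||θ||^2

logits = classifier(torch.randn(8, 512))         # 512-dimensional fused features
loss = criterion(logits, torch.randint(0, 2, (8,)))
loss.backward(); optimizer.step()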
2. Construction of postoperative recurrence risk prediction model
The postoperative recurrence risk prediction model is based on a multi-modal contrastive learning framework: the second optimized fusion feature undergoes temporal dependence extraction through a gated recurrent unit (GRU), the pathological features undergo self-attention screening of high-risk regions, and a contrastive loss (Contrastive Loss) is adopted to pull together samples of the same class (recurrence vs. non-recurrence), with a margin threshold δ = 0.5.
The construction of the postoperative recurrence risk prediction model adopts a contrastive learning strategy, utilizing a dual-branch contrastive learning network combined with sample-pair similarity learning to learn discriminative features and achieve cross-modal semantic alignment. The input of the dual-branch contrastive learning network is the feature F_temporal extracted by the image branch and the feature F_path extracted by the pathology branch, and the output is a semantically aligned representation in the feature space. The process involves the following operations:
positive and negative sample pair construction:
y_ij = 1 if samples i and j are both recurrent or both non-recurrent
y_ij = 0 otherwise
Contrastive loss calculation:
L_contra = Σ_i Σ_j [ y_ij·max(0, δ - cos(F_temporal^i, F_path^j)) + (1-y_ij)·max(0, cos(F_temporal^i, F_path^j) - m) ]
Wherein δ = 0.5 is the positive-pair similarity threshold, indicating that the feature distance of same-class samples should be smaller than this value; m = 0.2 is the negative-pair boundary margin, indicating that the feature distance of different-class samples should be larger than this value; cos(·,·) represents cosine similarity; and y_ij = 1 indicates that samples i and j are both recurrent or both non-recurrent.
By promoting the model to learn the distinguishing characteristics of recurrent and non-recurrent cases, the semantic consistency between the image and the pathological mode is maintained, and the recurrence risk AUC is improved by 7-13%.
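A sketch of the contrastive loss in the formula above is given below (δ = 0.5 for same-class pairs, m = 0.2 for different-class pairs); it assumes the image-branch feature F_temporal and the pathology-branch feature F_path have been projected to a common dimension before the cosine similarity is taken.

import torch
import torch.nn.functional as F

def contrastive_loss(f_temporal, f_path, y_recur, delta=0.5, m=0.2):
    # pairwise cosine similarity between image-branch and pathology-branch features
    sim = F.cosine_similarity(f_temporal.unsqueeze(1), f_path.unsqueeze(0), dim=-1)   # (B, B)
    y_ij = (y_recur.unsqueeze(1) == y_recur.unsqueeze(0)).float()   # 1 if same recurrence status
    pos = y_ij * torch.clamp(delta - sim, min=0)        # pull same-class pairs above δ
    neg = (1 - y_ij) * torch.clamp(sim - m, min=0)      # push different-class pairs below m
    return (pos + neg).mean()

l_contra = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256), torch.randint(0, 2, (8,)))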
The construction of the postoperative recurrence risk prediction model adopts a risk scoring mechanism, namely a risk-scoring regressor based on the optimized fusion features; its inputs are the image feature F_temporal and the pathological feature F_path after contrastive-learning alignment, and its output is a continuous recurrence risk score (0-1). The process involves the following operations:
1. Risk score calculation
Risk_score = sigmoid(W_risk·[F_temporal, F_path] + b_risk)
Wherein W_risk is the risk weight matrix, a trainable weight matrix used for mapping the second optimized fusion feature F_temporal and the pathological feature F_path to the risk space. The dimension of W_risk depends on the dimension of the input features and of the risk score, typically (d_combined×1), where d_combined is the dimension after F_temporal and F_path are concatenated. The weight matrix is learned automatically by back-propagation during training and can assign higher weight values to features that are more important for recurrence prediction; the weight values reflect the degree of contribution of different features to recurrence risk prediction. b_risk is the risk bias term, a scalar bias parameter of the risk prediction model used for adjusting the baseline level of the risk score; it ensures that the model output has an appropriate offset, is optimized together with W_risk during training, helps the model handle imbalanced feature distributions, and can be understood as the model's default risk prediction in the absence of any feature input. Together, W_risk and b_risk form the linear part of the risk score, and the output is then mapped to between 0 and 1 by a sigmoid function, representing the probability of postoperative recurrence for the patient.
2. Weighted mixing loss
L_pred = BCE(Risk_score, y_recur) + α·L_contra
Wherein L_pred consists of two parts, a weighted sum of the binary cross-entropy loss (BCE) and the contrastive loss (L_contra). BCE(Risk_score, y_recur) represents the binary cross-entropy loss between the recurrence risk prediction (Risk_score) and the true recurrence label (y_recur), used to directly optimize the accuracy of the risk prediction, where y_recur ∈ {0,1} indicates whether the patient relapsed (1 indicates relapse, 0 indicates no relapse). α is the weight coefficient of the contrastive loss, used to balance the contribution of the direct prediction loss and the feature contrastive loss; in practical training, α is typically set between 0.3 and 0.5 and can be adjusted according to validation set performance. L_contra is the contrastive loss component described above, used to pull together the feature representations of same-class samples (both recurrent or both non-recurrent) and push apart the feature representations of different-class samples, thereby enhancing the model's ability to distinguish recurrence risk.
By means of a risk scoring mechanism, the model can provide finer granularity of risk prediction, rather than just a classification result, with an approximately 25% improvement in clinical decision support.
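The risk-scoring regressor and the mixed loss can be sketched as follows; the 256/512 input dimensions and α = 0.4 (within the stated 0.3-0.5 range) are illustrative, and the contrastive term would be supplied by the loss defined earlier.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RiskScorer(nn.Module):
    def __init__(self, d_temporal=256, d_path=512):
        super().__init__()
        self.w_risk = nn.Linear(d_temporal + d_path, 1)    # W_risk (d_combined x 1) and b_risk

    def forward(self, f_temporal, f_path):
        z = torch.cat([f_temporal, f_path], dim=-1)        # [F_temporal, F_path]
        return torch.sigmoid(self.w_risk(z)).squeeze(-1)   # Risk_score in (0, 1)

scorer = RiskScorer()
f_temporal, f_path = torch.randn(8, 256), torch.randn(8, 512)
y_recur = torch.randint(0, 2, (8,)).float()

risk_score = scorer(f_temporal, f_path)
l_contra = torch.tensor(0.0)                               # contrastive term from the previous sketch
l_pred = F.binary_cross_entropy(risk_score, y_recur) + 0.4 * l_contra    # L_pred = BCE + α·L_contra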
The construction of the postoperative recurrence risk prediction model adopts cross-equipment adaptability optimization, namely a domain adaptation technique, specifically domain-invariant feature learning based on adversarial training. The input is the feature representation of the source-domain data (training equipment), and the output is a domain-invariant feature representation. The process involves the following operations:
1. domain classifier training
L_domain = -Σ[d_i log(D(f_i)) + (1-d_i)log(1-D(f_i))]
2. Feature extractor training (adversarial objective)
L_feature = L_task - λ_d·L_domain
Where D is a domain classifier, d_i is a domain label (0 represents a source domain, 1 represents a target domain), and f_i is a feature representation.
Through adversarial training, the model learns feature representations that are unaffected by equipment differences, so that performance close to that on the original equipment can be reached on new equipment without a large amount of data; the cross-equipment performance retention rate is improved to 85-92%.
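A minimal sketch of the two adversarial objectives is given below; the small two-layer domain classifier and λ_d = 0.1 are assumptions (a gradient-reversal layer is a common alternative way to realize the same objective).

import torch
import torch.nn as nn

domain_clf = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))   # domain classifier D
bce = nn.BCEWithLogitsLoss()

def l_domain(features, domain_labels):
    # L_domain: binary cross-entropy of D on domain labels (0 = source, 1 = target)
    return bce(domain_clf(features).squeeze(-1), domain_labels)

def l_feature(l_task, features, domain_labels, lambda_d=0.1):
    # L_feature = L_task - λ_d·L_domain: the extractor is rewarded for confusing D
    return l_task - lambda_d * l_domain(features, domain_labels)

loss = l_feature(torch.tensor(0.42), torch.randn(8, 256), torch.randint(0, 2, (8,)).float())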
The postoperative recurrence risk prediction model is constructed by adopting an early-stopping strategy, namely an adaptive early-stopping mechanism based on validation set performance. The inputs are the risk prediction performance indices on the validation set, namely the AUC (Area Under the Curve) and Recall@HighRisk (the recall of the high-risk class, i.e. the model's ability to correctly identify high-risk samples), and the output is the training stop signal and the optimal model checkpoint. The process involves the following operations:
1. performance monitoring
metric_t = 0.7 × AUC + 0.3 × Recall@HighRisk
If metric_t>best_metric:
best_metric = metric_t
best_epoch = t
save_model(θ_t)
patience = 0
Else:
patience += 1
Wherein metric_t is the comprehensive performance index of the t-th training epoch, calculated as a weighted combination of AUC and the high-risk recall rate; best_metric is the best performance index recorded so far, used to judge whether the current model has reached a new optimum; best_epoch is the training epoch corresponding to the best performance index, recording the time point of the optimal model; t is the current epoch number, indicating how many epochs the model has been trained; save_model(θ_t) is the model saving operation, saving the model parameters θ_t of the current epoch t as the optimal model checkpoint; and patience is a patience counter that records how many consecutive epochs the performance has failed to improve, used for triggering the early-stopping mechanism.
2. Early stop determination
If patience>patience_max or t - best_epoch>window_size:
stop_training()
restore_model(best_epoch)
By adopting the early-stop strategy, the over-fitting problem can be effectively avoided, unnecessary training time is reduced, the average training time of the model is reduced by 35%, and meanwhile, the performance of the test set is maintained or improved.
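An illustrative sketch of the adaptive early-stopping logic follows; patience_max = 10 and window_size = 20 are example values, and actual checkpointing/restoring is represented only by the returned signal.

class EarlyStopper:
    def __init__(self, patience_max=10, window_size=20):
        self.best_metric, self.best_epoch = float("-inf"), 0
        self.patience, self.patience_max, self.window_size = 0, patience_max, window_size

    def step(self, epoch, auc, recall_high_risk):
        metric = 0.7 * auc + 0.3 * recall_high_risk       # composite performance index
        if metric > self.best_metric:
            self.best_metric, self.best_epoch, self.patience = metric, epoch, 0
            return "save"                                 # save_model(θ_t)
        self.patience += 1
        if self.patience > self.patience_max or epoch - self.best_epoch > self.window_size:
            return "stop"                                 # stop_training(); restore_model(best_epoch)
        return "continue"

stopper = EarlyStopper()
signal = stopper.step(epoch=1, auc=0.86, recall_high_risk=0.78)   # -> "save"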
The pre-operative classification model and the recurrence risk prediction model of the present invention perform on the test set (225 cases) as follows:
The accuracy of FTC/FTA classification reaches 91.3%, the sensitivity is 89.6%, and the specificity is 92.5%.
The recurrence risk prediction model has AUC of 0.873 and recurrence rate prediction accuracy of 83.7% in the high risk group.
Through detailed training strategy design and optimization, the invention can maintain stable performance under different medical institutions and equipment environments, and provides reliable support for clinical thyroid cancer diagnosis and risk assessment.
Fig. 7 is a flow chart of an explanatory analysis of the deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. The flow chart includes the following parts:
1. Grad-CAM heatmap generation:
Image feature input, namely extracting the CEUS features and CT features used in the classification and prediction models.
Key-region heatmap generation, namely generating a heatmap based on the Grad-CAM technique to highlight the key regions attended to by the model.
Dynamic abnormal-region labeling, namely highlighting the blood-perfusion abnormal regions on the CEUS heatmap.
Static abnormal-region labeling, namely labeling the boundary-blurred or calcified regions of the tumor on the CT heatmap.
2. SHAP feature contribution analysis:
and inputting fusion characteristics, namely classifying and predicting input characteristic vectors of the model.
Feature importance is calculated by analyzing the contribution of input features to the classification result based on the SHAP technique.
2.1 Quantified feature contributions:
CEUS feature contributions, e.g. dynamic perfusion rate and time-series pattern.
CT feature contributions, e.g. boundary ambiguity and anatomical morphology.
Pathological feature contributions, e.g. cell morphology and histological structure.
3. Interpretation analysis results show that:
Grad-CAM heatmap display, namely showing the key regions of the image on the user interface.
Feature contribution ranking display, namely quantifying and visualizing the contribution weights of the different modality features.
Interpretability report generation, namely integrating the heatmap with the feature contribution ranking to generate a downloadable report.
Specifically, the Grad-CAM optimization and SHAP feature contribution analysis employed by the interpretive enhancement techniques of the present invention are embodied as follows.
1. Grad-CAM optimization:
The optimization strategy is to superimpose time-dimension weights on the CEUS heatmap to accurately locate the arterial-phase perfusion abnormal region, and to combine CT images and pathological features (such as tumor boundary ambiguity and nuclear division count) to realize multi-modal data fusion, improving the clarity and reliability of the visualization results.
The Grad-CAM is embodied as follows:
For the target class c, the feature map weights α_k^c are calculated:
α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A_ij^k
Wherein Z is a normalization factor equal to the total number of pixels of the feature map, used for normalizing the weights; Σ_i Σ_j is a double summation over all spatial positions (i, j) of the feature map; and ∂y^c/∂A_ij^k is the partial derivative of the predicted score of the target class c with respect to the activation value of the k-th feature map at position (i, j), indicating the degree of influence of that position on the classification result.
Time dimension weight integration (for CEUS sequence):
α_k^c(t) = α_k^c·w_t
where w_t is the weight coefficient of time step t, associated with the perfusion phase.
Grad-CAM heatmap calculation:
L_Grad-CAM^c = ReLU(Σ_k α_k^c A^k)
The weights α_k^c of all channels and the corresponding feature maps A^k are weighted and summed, and the Grad-CAM heatmap is then obtained through a ReLU activation function, highlighting the feature regions important to the target class c.
CEUS-specific heatmap:
L_Grad-CAM^c(CEUS) = ReLU(Σ_k Σ_t α_k^c(t) A^k(t))
On the basis of Grad-CAM, the weights α_k^c(t) and the temporal variation of the feature maps A^k(t) at different time steps t are further considered, yielding a heatmap that better matches the characteristics of the CEUS sequence, used for analyzing the feature importance distribution of the target class over the time series.
The Grad-CAM optimization adopted by the invention can accurately locate the CEUS arterial-phase perfusion abnormal region, and fusion visualization of multi-modal data is realized by combining CT images and pathological features.
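A minimal Grad-CAM sketch is given below: the class score is back-propagated to the last convolutional feature map, the gradients are spatially averaged to form α_k^c, and the weighted feature maps are passed through ReLU. The per-frame time weighting w_t for CEUS is omitted here, and the use of a torchvision ResNet18 layer as the hooked feature layer is purely illustrative.

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def grad_cam(model, image, target_class, feature_layer):
    feats, grads = [], []
    h1 = feature_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = feature_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(image)[0, target_class]                 # y^c
    score.backward()
    h1.remove(); h2.remove()
    alpha = grads[0].mean(dim=(2, 3), keepdim=True)       # α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A_ij^k
    cam = F.relu((alpha * feats[0]).sum(dim=1))           # ReLU(Σ_k α_k^c A^k)
    return cam / cam.max().clamp_min(1e-8)                # normalized heatmap

net = resnet18(weights=None).eval()
heatmap = grad_cam(net, torch.randn(1, 3, 224, 224), target_class=0, feature_layer=net.layer4)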
2. SHAP feature contribution analysis:
Through SHAP feature contribution analysis, the model can quantify the contribution of various features to the prognosis results, helping to reveal the role of different factors in the prediction. The SHAP value assigns an importance score to each feature, and by measuring the influence of each feature on the predicted outcome, a clinician can intuitively understand how factors such as CEUS peak flow rate, boundary ambiguity in CT images, pathological nuclear division count, etc. influence prognosis prediction.
The specific implementation process of SHAP feature contribution analysis is as follows:
SHAP value calculation for feature i:
φ_i = Σ_{S⊆N\{i}} |S|!(|N|-|S|-1)!/|N|! [f_x(S∪{i}) - f_x(S)]
where N is the set of all features, S is the subset that does not contain feature i, and f_x (S) is the prediction of sample x when the model uses feature set S.
Inter-modality feature interaction SHAP values:
φ_ij = Σ_{S⊆N\{i,j}} |S|!(|N|-|S|-2)!/2|N|! [f_x(S∪{i,j}) - f_x(S∪{i}) - f_x(S∪{j}) + f_x(S)]
feature importance quantification:
Importance(i) = Σ_x |φ_i(x)|
modality importance quantification:
Importance(CEUS) = Σ_{i∈CEUS_features} Importance(i)
Importance(CT) = Σ_{i∈CT_features} Importance(i)
Importance(Pathology) = Σ_{i∈Pathology_features} Importance(i)
The SHAP characteristic contribution analysis adopted by the invention can quantify the contribution of each characteristic to the prognosis result and reveal the effect of different factors in prediction.
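For illustration, the sketch below shows one way to obtain per-feature SHAP values with the shap library's KernelExplainer and aggregate them into the modality importances defined above; it assumes predict_fn returns a single risk score per sample and that features are ordered as CEUS (128), CT (256) and pathology (512), both of which are assumptions of this example.

import numpy as np
import shap

def modality_importance(predict_fn, background, samples, dims=(128, 256, 512)):
    explainer = shap.KernelExplainer(predict_fn, background)
    phi = np.abs(np.asarray(explainer.shap_values(samples)))   # |φ_i(x)| for each sample x
    importance = phi.reshape(len(samples), -1).sum(axis=0)     # Importance(i) = Σ_x |φ_i(x)|
    bounds = np.cumsum((0,) + dims)
    return {name: float(importance[bounds[k]:bounds[k + 1]].sum())
            for k, name in enumerate(("CEUS", "CT", "Pathology"))}

# usage (illustrative): predict_fn wraps the trained risk model, and background/samples are
# numpy arrays of concatenated 896-dimensional multi-modal feature vectors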
In summary, the invention combines the dynamic micro blood flow characteristics of CEUS and the high-resolution anatomical information of enhanced CT through a deep learning method, establishes a preoperative classification model based on multi-modal image data, and comprehensively improves the sensitivity and specificity of preoperative FTC classification; it adopts an adaptive dynamic weight adjustment mechanism and realizes the optimized combination of multi-modal image features through a deep self-attention network, fully exploiting the complementary information of each modal feature; and it utilizes joint analysis of images and pathological data to construct a recurrence risk prediction model based on molecular- and cell-level features, providing an accurate individualized management strategy for postoperative high-risk patients.
Furthermore, the invention also provides electronic equipment, which mainly comprises a processor and a memory, wherein a computer program is stored in the memory, and the computer program can cause the processor to execute the thyroid tumor classification and recurrence risk prediction method based on deep learning.
The memory may be random access memory (RAM), read-only memory (ROM), non-volatile memory, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), flash memory, optical memory, registers, and so forth.
The processor may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In the above embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, the computer instructions produce, in whole or in part, a flow or function in accordance with embodiments of the present application. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a solid state disk (Solid State Disk, SSD), etc.