Detailed Description
Referring to fig. 1, fig. 1 is a flowchart of the deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. The deep learning-based thyroid tumor classification and recurrence risk prediction method comprises the following steps: S1, collecting and preprocessing multi-modal data related to thyroid follicular carcinoma (FTC) or follicular adenoma (FTA), wherein the multi-modal data comprise dynamic micro blood flow data, anatomical data and whole-slide pathological image (WSI) data; S2, respectively adopting different feature extraction methods for the preprocessed dynamic micro blood flow data, anatomical data and WSI data to obtain dynamic micro blood flow features F_CEUS, anatomical features F_CT and pathological features F_path; S3, generating weight vectors for the dynamic micro blood flow features F_CEUS, anatomical features F_CT and pathological features F_path by using a multi-head self-attention mechanism, conducting weighted fusion with an adaptive weight adjustment mechanism combining feature quality evaluation and task relevance analysis, and utilizing a Transformer encoder to obtain a first optimized fusion feature F_optimal; S4, adopting a correlation matrix between the dynamic micro blood flow features F_CEUS and the anatomical features F_CT to generate a second weight vector, conducting weighted fusion to obtain a second fusion feature, and optimizing it through residual connection and a GRU network to obtain a second optimized fusion feature; and S5, constructing a preoperative classification model on the basis of the first optimized fusion feature, and constructing a postoperative recurrence risk prediction model on the basis of the second optimized fusion feature and the pathological features F_path.
The preprocessing, feature extraction, fusion and optimization, model training and construction, and interpretability enhancement techniques in the deep learning-based thyroid tumor classification and recurrence risk prediction method are implemented in Python. In this specification, the dynamic micro blood flow feature F_CEUS may also be referred to as the CEUS feature, and the anatomical feature F_CT may also be referred to as the CT feature.
Referring to fig. 2, a flow chart of data acquisition and preprocessing according to the present invention is shown. Mainly comprises the following steps:
1. CEUS data acquisition, namely acquiring CEUS time-series images and capturing arterial-phase and venous-phase dynamic characteristics.
2. CT data acquisition, namely acquiring three-phase enhanced CT images comprising an arterial phase, a venous phase and a delay phase.
3. Pathological section data acquisition, namely acquiring whole-slide images of H&E-stained pathological sections and carrying out block segmentation.
4. Data cleaning, namely removing invalid blank areas in the images and filling in lost or incomplete data.
5. Format conversion, namely saving the CEUS data in the standard DICOM format and converting the pathological sections into JPEG format to facilitate deep learning processing.
6. Normalization processing, namely normalizing brightness and contrast and unifying resolution, for example CEUS at 640×480, CT at 256×256 and pathological sections at 256×256.
7. Spatial registration, namely using Elastix to spatially align the CT and CEUS images and register the multi-modal data, ensuring consistency of the images in spatial position.
8. Preprocessing completion, namely outputting the normalized and registered multi-modal data.
More specifically, in one embodiment of the present invention, regarding the data acquisition portion, the following standardized acquisition procedure is proposed for the multi-modal characteristics of FTC:
(1) Dynamic micro blood flow feature capture (CEUS): a time-series acquisition technique is adopted to accurately divide the arterial period (30-60 seconds) and the venous period (60-120 seconds), the spatio-temporal dynamic characteristics of tumor micro blood flow perfusion are captured by a high-frequency ultrasonic probe (frequency ≥ 12 MHz), and the data are stored in DICOM format (resolution 640×480, frame rate 15 fps).
(2) High-resolution anatomical feature extraction (enhanced CT): multi-phase scanning (arterial phase, venous phase and delay phase) is adopted with a slice thickness of 1.0-1.5 mm, and noise is reduced by combining an iterative reconstruction algorithm, ensuring fine depiction of the three-dimensional anatomical structure.
(3) Digital processing of pathology data: the whole-slide pathological image (WSI) is digitized by a digital scanner (resolution 0.25 μm/pixel), and block processing (256×256 pixels) is combined with a tissue region screening algorithm (Otsu thresholding) to exclude blank-region interference.
(4) Data storage and cross-platform compatibility: a standardized data warehouse is constructed, CEUS (DICOM), CT (DICOM) and pathology images (JPEG) are uniformly mapped to the same spatial coordinate system, and lossless transmission and cross-equipment analysis of multi-center data are supported.
Through the standardized acquisition flow, the standardized acquisition of CEUS, CT and pathological data can be ensured, the accurate acquisition of dynamic micro-blood flow characteristics, high-resolution anatomical characteristics and pathological characteristics is realized, and unified storage and cross-platform compatibility of multi-mode data are supported by constructing a standardized data warehouse, so that nondestructive transmission and analysis of multi-center data are realized.
In one embodiment of the invention, regarding data preprocessing, the following operations are employed:
(1) Spatial registration and dynamic alignment, namely adopting an Elastix registration framework (based on B-spline elastic transformation), achieving sub-pixel-level spatial alignment of the CEUS and enhanced CT images through a mutual-information-maximization algorithm, and eliminating artifacts caused by respiratory motion and equipment offset.
(2) Multi-scale optimization of pathological data, namely, for the WSI data, adopting a pyramid blocking strategy in which 256×256 pixel blocks are respectively extracted under 5×, 10× and 20× magnification and input into an InceptionResNetV2 multi-scale feature fusion module, which outputs 512-dimensional feature vectors containing cross-scale information of cell morphology and tissue structure, realizing cross-scale representation of cell morphology and tissue structure.
(3) Noise suppression and enhancement processing, namely adopting adaptive median filtering (window size 3×3) combined with wavelet transformation (Daubechies basis functions) for the CEUS data to effectively separate blood flow signals from background noise, and preserving the anatomical details of the CT data through non-local means denoising (NL-Means) while suppressing metal artifacts.
Through the data preprocessing operation, sub-pixel level space alignment of CEUS and CT images is realized, artifacts caused by respiratory motion and equipment offset are eliminated, multi-scale optimization is carried out on pathological data, cross-scale representation of cell morphology and tissue structure is extracted, CEUS blood flow signals and background noise are effectively separated through noise suppression and enhancement processing, and CT anatomical details are reserved.
Referring to fig. 3, a feature extraction block diagram of the deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention is shown. Mainly comprises the following parts:
1. CEUS dynamic micro blood flow feature extraction, namely extracting spatial features of single-frame CEUS images using a Convolutional Neural Network (CNN), and capturing time-series features with a Gated Recurrent Unit (GRU) to generate a micro blood flow feature vector (e.g., 128 dimensions).
2. CT static anatomical feature extraction, namely extracting anatomical features such as tumor boundary ambiguity and capsule invasion by utilizing ResNet, and using a Pyramid Pooling Module (PPM) to enhance the combination of global and local features to generate a CT feature vector (e.g., 256 dimensions).
3. Pathological feature extraction, namely dividing the whole-slide pathological image into small blocks and cleaning them, extracting cell morphology using InceptionResNetV2, and, based on a multi-instance learning (MIL) framework, weighting and aggregating the patch features to generate a pathology feature vector (e.g., 512 dimensions).
More specifically, the extraction of dynamic micro-blood flow features, anatomical features and pathological features according to the invention is achieved by:
1. Extraction of dynamic micro-blood flow characteristics F_CEUS of CEUS
Adopting a CNN-GRU hybrid architecture design, comprising:
Spatial feature extraction, namely constructing a 3-layer convolution network (Conv3×3, ReLU, BatchNorm) and outputting a 128-dimensional feature vector for each CEUS frame;
Time-series feature modeling, namely capturing the hemodynamic changes from the arterial phase to the venous phase with a bi-directional GRU (hidden layer of 64 units) and outputting the dynamic features (e.g., 128 dimensions).
Mathematical expression of the CNN spatial feature extraction:
F_spatial^i = CNN(I_i)
(wherein I_i represents the i-th frame CEUS image, and F_spatial^i represents the extracted 128-dimensional spatial feature vector.)
Modeling GRU time sequence characteristics:
h_t = GRU(F_spatial^t, h_{t-1})
r_t = σ(W_r·[h_{t-1}, F_spatial^t])
z_t = σ(W_z·[h_{t-1}, F_spatial^t])
ñ_t = tanh(W·[r_t * h_{t-1}, F_spatial^t])
h_t = (1 - z_t) * h_{t-1} + z_t * ñ_t
Wherein h_t is the hidden state at time t, r_t is the reset gate, z_t is the update gate, and σ is the sigmoid activation function; the dynamic feature F_CEUS = h_T is finally obtained, where T is the last time step of the sequence; ñ_t is the candidate hidden state in the GRU, representing the new candidate state calculated at time step t from the current input and the reset history information; and W is the weight matrix for calculating the candidate hidden state in the GRU, used for mapping the reset-gated history state and the current input to the candidate state space.
In summary, through the CNN-GRU hybrid architecture, the hemodynamic changes from arterial phase to venous phase are captured, and 128-dimensional dynamic micro-blood flow characteristics are output.
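As an illustrative reference only, a minimal PyTorch sketch of the CNN-GRU extractor described above is given below. The layer widths follow the text (three Conv3×3 + ReLU + BatchNorm blocks, 128-dimensional spatial features per frame, a bi-directional GRU with 64 hidden units), while the strides, pooling and the 64×64 input size are assumptions introduced only for this example.

import torch
import torch.nn as nn

class CEUSFeatureExtractor(nn.Module):
    def __init__(self, spatial_dim=128, gru_hidden=64):
        super().__init__()
        # 3-layer convolution stack: Conv3x3 + ReLU + BatchNorm per block
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(32),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(64),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(128),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(128, spatial_dim)          # F_spatial^i, 128-dim per frame
        # bi-directional GRU: 2 x 64 hidden units -> 128-dim dynamic feature
        self.gru = nn.GRU(spatial_dim, gru_hidden, batch_first=True, bidirectional=True)

    def forward(self, frames):                           # frames: (B, T, 1, H, W)
        b, t = frames.shape[:2]
        x = frames.reshape(b * t, *frames.shape[2:])
        f_spatial = self.proj(self.cnn(x).flatten(1)).reshape(b, t, -1)
        _, h_n = self.gru(f_spatial)                     # last hidden states of both directions
        return torch.cat([h_n[0], h_n[1]], dim=-1)       # F_CEUS: (B, 128)

# example: a 10-frame CEUS clip at 64x64 resolution -> 128-dimensional F_CEUS
f_ceus = CEUSFeatureExtractor()(torch.randn(2, 10, 1, 64, 64))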
2. Enhanced CT anatomical feature F_CT extraction
Using ResNet-PPM joint optimization, multi-level features (Conv1 to Conv5) are extracted based on a pre-trained ResNet50, 4-level scale features (1×1, 2×2, 3×3, 6×6 pooling) are fused by a Pyramid Pooling Module (PPM), and 256-dimensional global-local joint features are output. Because the data scale of enhanced CT is small, directly extracting features with a randomly initialized ResNet would cause problems such as parameter redundancy, low feature learning efficiency and difficult optimization, and would perform poorly in a small-data scenario; therefore, the method adopts a pre-trained ResNet and, through transfer learning, migrates the general feature representations learned on large-scale data to the small dataset, significantly reducing the dependence of the model on target data.
ResNet50 feature extraction:
F_conv_i = ResNet50_layer_i(I_CT), i ∈ {1,2,3,4,5}
wherein I_CT represents the CT input image, and F_conv_i represents the output feature map of the i-th convolution stage of ResNet50.
Pyramid Pooling Module (PPM):
F_pool_j = Pool_j(F_conv_5), j ∈ {1×1, 2×2, 3×3, 6×6}
F_ppm_j = Conv(Upsample(F_pool_j))
F_CT = Concat(F_ppm_1, F_ppm_2, F_ppm_3, F_ppm_4)
Wherein F_pool_j is the multi-scale feature map obtained by applying pooling operations of different scales (1×1, 2×2, 3×3 and 6×6) to the layer-5 feature map of ResNet50; F_ppm_j is the feature map obtained by upsampling and convolving the pooled feature F_pool_j to restore it to the original size for subsequent concatenation into the final 256-dimensional CT feature vector; and F_CT is the 256-dimensional CT feature vector containing multi-scale feature information.
In summary, based on ResNet-PPM joint optimization, multi-scale anatomical features are extracted, and 256-dimensional global-local joint features are output.
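A hedged PyTorch sketch of the ResNet-PPM combination is given below; the pooling bins (1×1, 2×2, 3×3, 6×6) and the 256-dimensional output follow the text, while the 64 reduction channels and the global average pooling before the final projection are illustrative assumptions (in practice the backbone would be loaded with pre-trained weights, as noted above).

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class PPMHead(nn.Module):
    def __init__(self, in_ch=2048, out_dim=256, bins=(1, 2, 3, 6)):
        super().__init__()
        # one pooling branch per bin size, each reduced to 64 channels
        self.stages = nn.ModuleList(
            [nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(in_ch, 64, 1)) for b in bins])
        self.fc = nn.Linear(in_ch + 64 * len(bins), out_dim)

    def forward(self, f_conv5):                           # F_conv_5: (B, 2048, h, w)
        size = f_conv5.shape[-2:]
        pooled = [F.interpolate(s(f_conv5), size, mode="bilinear", align_corners=False)
                  for s in self.stages]                   # upsample back to the conv5 size
        f_ppm = torch.cat([f_conv5] + pooled, dim=1)      # global-local concatenation
        return self.fc(f_ppm.mean(dim=(2, 3)))            # F_CT: (B, 256)

backbone = nn.Sequential(*list(resnet50(weights=None).children())[:-2])  # Conv1-Conv5 only
f_ct = PPMHead()(backbone(torch.randn(1, 3, 256, 256)))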
3. Extraction of pathological features F_path
Weakly supervised feature learning of pathology images (MIL framework) is adopted, wherein the WSI is divided into N blocks (256×256) through multi-instance learning and attention mechanisms, 512-dimensional features are extracted through InceptionResNetV2, and an instance-level feature pool is constructed. An attention weight network (fully connected layer + sigmoid) is designed to dynamically screen key image blocks related to FTC (such as vascular infiltration areas), and the aggregated 512-dimensional pathological features are output.
Example-level feature extraction:
f_i = InceptionResNetV2(p_i), i ∈ {1,2,...,N}
Where p_i represents the ith pathology tile and f_i represents the corresponding 512-dimensional instance feature.
Attention weight calculation:
w_i = sigmoid(W·f_i + b)
Where W and b are a learnable weight matrix and bias, and w_i ∈ [0,1] represents the attention weight of the i-th tile.
Weighted aggregation:
F_path = Σ(w_i·f_i) / Σ(w_i)
Wherein F_path is the final 512-dimensional pathological feature vector, representing a global representation of the pathology image.
In summary, key pathological patches related to FTC are screened through a multi-instance learning and attention mechanism, and 512-dimensional pathological features are output.
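The attention-based MIL aggregation can be sketched as below; the 512-dimensional instance features are assumed to have been extracted beforehand by InceptionResNetV2 as described, and the single-layer gate (FC + sigmoid) mirrors the formula w_i = sigmoid(W·f_i + b).

import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())   # W, b + sigmoid

    def forward(self, f_instances):                 # (N, 512): one row per WSI patch
        w = self.attn(f_instances)                  # w_i in [0, 1], shape (N, 1)
        f_path = (w * f_instances).sum(0) / w.sum().clamp_min(1e-8)   # Σ(w_i·f_i)/Σ(w_i)
        return f_path, w.squeeze(-1)                # 512-dim slide feature + patch weights

f_path, patch_weights = AttentionMIL()(torch.randn(300, 512))   # e.g. 300 patches of one WSI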
Referring to fig. 4, fig. 4 is a schematic diagram of tri-modal feature fusion and optimization of a deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. It mainly comprises the following parts:
1. Adaptive weighting mechanism, which dynamically calculates the weights of the dynamic micro blood flow feature F_CEUS, the anatomical feature F_CT and the pathological feature F_path, and adjusts the contribution proportion of each modal feature according to task requirements.
2. Deep self-attention network, which captures the global dependency relationships among modalities through a multi-head (e.g. 8-head) attention mechanism and optimizes the fused feature expression.
3. Inter-modal feature optimization, which further improves the expressive power of the fused features through feature interaction and redundancy elimination.
4. Fused feature output, which outputs high-dimensional optimized fusion features for subsequent classification and prediction.
Specifically, the adaptive fusion and optimization of the trimodal features of the present invention includes the following operations:
1. Self-attention-based dynamic weighted fusion
1.1 A weight distribution algorithm is adopted: the CEUS (128-dimensional), CT (256-dimensional) and pathology (512-dimensional) features are input, the correlation matrix among the modalities is calculated through a multi-head self-attention mechanism (head number = 8), and a first initial weight vector [α0, β0, γ0] is generated.
First a query (Q), key (K) and value (V) matrix is calculated:
Q = W_Q·F_concat
K = W_K·F_concat
V = W_V·F_concat
Wherein F_concat = [F_CEUS, F_CT, F_path] is the concatenated feature vector.
The attention score is then calculated:
Attention(Q, K, V) = softmax(QK^T/√d_k)V
where d_k is the dimension of the key vector in the attention mechanism for scaling the attention score to prevent the gradient from disappearing.
Multi-head attention splice (head number=8) was then performed:
MultiHead(F_concat) = Concat(head_1, head_2, ..., head_8)·W_O
head_i = Attention(Q_i, K_i, V_i)
wherein W_O is the output weight matrix of the multi-head attention, and is used for mapping the spliced multi-head attention result to the final output feature space.
A first initial weight vector is then generated from the multi-headed attention result:
[α0, β0, γ0] = softmax(W_modal·MultiHead(F_concat))
Wherein W_modal is the modal weight generation matrix, which converts the multi-head attention output into the weight coefficients of the three modalities.
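A minimal sketch of step 1.1 is given below: each modality feature is projected to a common token dimension, passed through 8-head self-attention, and mapped by W_modal to the three modality weights. Treating each modality as a single token and the 256-dimensional common projection are assumptions of this example, not requirements of the method.

import torch
import torch.nn as nn

class ModalWeightGenerator(nn.Module):
    def __init__(self, dims=(128, 256, 512), d_model=256, heads=8):
        super().__init__()
        # project F_CEUS, F_CT, F_path to a shared token dimension before attention
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        self.mha = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.w_modal = nn.Linear(d_model * 3, 3)          # W_modal

    def forward(self, f_ceus, f_ct, f_path):
        tokens = torch.stack([p(f) for p, f in zip(self.proj, (f_ceus, f_ct, f_path))], dim=1)
        attn_out, _ = self.mha(tokens, tokens, tokens)    # MultiHead(F_concat)
        return torch.softmax(self.w_modal(attn_out.flatten(1)), dim=-1)   # [α0, β0, γ0]

w0 = ModalWeightGenerator()(torch.randn(4, 128), torch.randn(4, 256), torch.randn(4, 512))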
1.2 Adaptive adjustment of the first initial weight vector in combination with feature quality assessment and task relevance analysis:
The first initial weight vector [α0, β0, γ0] is dynamically adjusted based on feature quality assessment and task relevance analysis:
1.2.1 Calculating a quality assessment score:
quality_CEUS = w_q·Quality(F_CEUS) + w_r·Relevance(F_CEUS,task)
quality_CT = w_q·Quality(F_CT) + w_r·Relevance(F_CT,task)
quality_path = w_q·Quality(F_path) + w_r·Relevance(F_path,task)
wherein quality_CEUS is the quality score of the CEUS features, calculated as a weighted combination of feature quality and task relevance; w_q is the quality weight coefficient; the Quality function evaluates the quality of the current features; w_r is the relevance weight coefficient; the Relevance function evaluates the degree of relevance between the features and the current task; quality_CT is the quality score of the CT features; and quality_path is the quality score of the pathological features.
1.2.2 Normalized mass fraction:
[q_α, q_β, q_γ] = softmax([quality_CEUS, quality_CT, quality_path])
where q_α, q_β, q_γ are the quality score of the normalized CEUS feature, the quality score of the normalized CT feature, and the quality score of the normalized pathology feature, respectively.
1.2.3 Adjustment is made based on the initial weights:
α1_temp = λ·α0+ (1-λ)·q_α
β1_temp = λ·β0+ (1-λ)·q_β
γ1_temp = λ·γ0+ (1-λ)·q_γ
Where λ = 0.7 is the initial weight retention coefficient.
1.2.4 Renormalization:
[α1, β1, γ1] = softmax([α1_temp, β1_temp, γ1_temp])
Wherein [α1, β1, γ1] is the first initial weight vector after the first adjustment.
2. Task-aware weight optimization:
L_weight = L_task + λ·R(α1, β1, γ1)
Where L_task is the task loss (classification or regression) and R is a weight regularization term.
2.1 Obtaining a first initial weight vector after second adjustment through task self-adaptive updating based on gradient:
α2 = α1 - η_α·∂L_weight/∂α1
β2 = β1 - η_β·∂L_weight/∂β1
γ2 = γ1 - η_γ·∂L_weight/∂γ1
Where η_α is the learning rate corresponding to the weight α1, used for controlling the update step size of the weight parameter α1; η_β is the learning rate corresponding to the weight β1, used for controlling the update step size of the weight parameter β1; and η_γ is the learning rate corresponding to the weight γ1, used for controlling the update step size of the weight parameter γ1.
Finally, the first weight vector [α, β, γ] = softmax([α2, β2, γ2]) is obtained.
2.2 Task-sensitive adaptive adjustment:
When diagnostic tasks are prioritized, η_α is increased to raise the CEUS feature weight;
When the recurrence prediction task is prioritized, η_γ is increased to raise the pathological feature weight;
The validation set performance is monitored in real time and the final weight vector is adjusted dynamically.
In practical applications, α is typically larger for diagnostic tasks (biased towards CEUS, α ≈ 0.4), while γ is larger for recurrence prediction tasks (biased towards pathological features, γ ≈ 0.5). The weights automatically adapt to the equipment characteristics and data quality of different institutions.
Finally, the dynamic micro blood flow feature F_CEUS, the anatomical feature F_CT and the pathological feature F_path are fused by the first weight vector to obtain a first fused feature F_fused, wherein F_fused = α·F_CEUS + β·F_CT + γ·F_path, with α + β + γ = 1.
In conclusion, through a multi-head self-attention mechanism, weights of CEUS, CT and pathological features are dynamically distributed, and accurate fusion of cross-modal features is achieved.
2. Global optimization of deep self-attention network
Using Transformer-based feature interaction, the fused features (512 dimensions) are input into a 6-layer Transformer encoder (head number = 8, feed-forward layer dimension = 2048), and cross-modal global dependencies are captured through position encoding and layer normalization (LayerNorm);
The specific process is as follows:
X' = LayerNorm(X + MultiHeadAttention(X))
X'' = LayerNorm(X' + FFN(X'))
Wherein MultiHeadAttention(X) = Concat(head_1, ..., head_8)·W_O
head_i = Attention(XW_i^Q, XW_i^K, XW_i^V)
Attention(Q,K,V) = softmax(QK^T/√d_k)V
Wherein X' is the feature obtained by applying multi-head self-attention and a residual connection to X followed by layer normalization; X'' is the final output feature representation after processing by the feed-forward network FFN, a second residual connection and layer normalization; W_O is the output projection matrix of the multi-head attention mechanism, used for mapping the concatenated multi-head attention results to the final output dimension; XW_i^Q is the query vector Q_i obtained by transforming the input feature X with the query weight matrix W_i^Q of the i-th head; XW_i^K is the key vector K_i obtained by transforming the input feature X with the key weight matrix W_i^K of the i-th head; and XW_i^V is the value vector V_i obtained by transforming the input feature X with the value weight matrix W_i^V of the i-th head.
Position coding:
PE(pos, 2i) = sin(pos/10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos/10000^(2i/d_model))
Transformer encoder layer:
x' = LayerNorm(x + MultiHeadAttention(x, x, x))
output = LayerNorm(x' + FFN(x'))
Wherein FFN is feed forward network:
FFN(x) = max(0, xW_1 + b_1)W_2 + b_2
By stacking 6 layers of the Transformer encoder, the final features are expressed as:
F_fused_optimized = Transformer_6(Transformer_5(...Transformer_1(F_fused)...))
In summary, cross-modal global dependencies are captured by the Transformer encoder, which outputs 512-dimensional optimized features.
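A compact sketch of this global optimization step is given below, using PyTorch's built-in Transformer encoder with the stated configuration (6 layers, 8 heads, feed-forward dimension 2048) and the sinusoidal position encoding from the formulas above; treating the 512-dimensional fused feature as a single-token sequence is an assumption of this example.

import torch
import torch.nn as nn

def positional_encoding(seq_len, d_model):
    # PE(pos, 2i) = sin(pos/10000^(2i/d_model)); PE(pos, 2i+1) = cos(pos/10000^(2i/d_model))
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos / 10000 ** (i / d_model))
    pe[:, 1::2] = torch.cos(pos / 10000 ** (i / d_model))
    return pe

encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8,
                                           dim_feedforward=2048, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)     # 6 stacked encoder layers

f_fused = torch.randn(4, 1, 512)                        # fused tri-modal feature as one token
f_fused_optimized = encoder(f_fused + positional_encoding(1, 512)).squeeze(1)   # (4, 512)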
Referring to fig. 5, fig. 5 is a schematic diagram of bimodal feature fusion and optimization of the deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. It is worth noting that, in the case of tri-modal fusion, because of the large semantic difference between modalities, a Transformer encoder and position encoding are adopted to establish a global context relation for deep cross-modal feature interaction; the bimodal fusion mainly processes the two relatively homogeneous image data of CEUS and CT and pays more attention to the time-series evolution trend of CEUS, so residual connection and a GRU network are adopted to capture time dependence.
Specifically, the bimodal adaptive fusion and optimization of the present invention includes the following operations:
1. Attention-based weight distribution
The CEUS (128-dimensional) and CT (256-dimensional) features are input, a feature similarity matrix is calculated, and a second initial weight vector [α0', β0'] is generated.
The specific process is as follows:
Cross-modality similarity score is calculated:
similarity_matrix = F_CEUS·F_CT^T/√d
cross_similarity = mean(softmax(similarity_matrix))
Where F_CEUS·F_CT^T represents a feature dot product operation, d is the feature dimension, and softmax normalizes the similarity into a probability distribution.
Modality autocorrelation calculation:
self_sim_CEUS = mean(F_CEUS·F_CEUS^T/√128)
self_sim_CT = mean(F_CT·F_CT^T/√256)
weight raw score calculation:
w_CEUS_raw = cross_similarity * self_sim_CEUS
w_CT_raw = (1 - cross_similarity) * self_sim_CT
Weight vector normalization:
[α0', β0'] = softmax([w_CEUS_raw, w_CT_raw])
2. task aware dynamic weight adjustment
Weight optimization is based on gradient descent and task loss:
L_weight' = L_task' + λ'·R'(α0', β0')
Where L_task' is the primary task loss function, λ' is the regularization coefficient, and R'(α0', β0') is the weight regularization term. The validation set performance is monitored in real time and the weight parameters are dynamically fine-tuned.
2.1 Gradient-based weight update
Gradient update formula:
α1' = α0' - η_α'·∂L_weight'/∂α0'
β1' = β0' - η_β'·∂L_weight'/∂β0'
[α', β'] = softmax([α1', β1'])
wherein:
η_α' and η_β' are the learning rate parameters of the CEUS and CT modalities, respectively; α1' and β1' are the temporary weights after the gradient update; and α' and β' are the normalized weights, i.e. the second weight vector.
The invention adopts a task self-adaptive adjustment strategy in the fusion process of the dynamic micro blood flow characteristic F_CEUS and the anatomical characteristic F_CT, and the task self-adaptive adjustment strategy is as follows:
when diagnostic tasks are prioritized:
Increasing η_α' emphasizes the micro blood flow characteristics of CEUS; a typical weight distribution is α' ≈ 0.6, β' ≈ 0.4.
When other analysis tasks are prioritized:
Increasing η_β' emphasizes the anatomical detail features of CT; a typical weight distribution is α' ≈ 0.45, β' ≈ 0.55.
The validation set performance can be monitored in real time and the weight parameters dynamically fine-tuned; the learning rates η_α' and η_β' are automatically adjusted according to the validation set performance, ensuring optimal weight distribution of the model under different task scenarios.
Finally, the fused output is obtained as a second fusion feature:
F_fused' = α'·F_CEUS + β'·F_CT
Where α '+β' =1.
3. Feature optimization and interaction enhancement
3.1 Residual connection and feature enhancement
Introducing a residual connection avoids gradient vanishing and enhances information flow:
F_enhanced = F_fused' + Dropout(FC(F_fused'))
Where F_enhanced is the enhanced feature, F_fused' is the second fusion feature, FC is the fully connected layer, dropout is a random deactivation operation, preventing overfitting.
And then unifying different modal characteristic distributions by applying layer normalization (Layer Normalization) to obtain normalized enhancement characteristic F_normalized:
F_normalized = LayerNorm(F_enhanced)
3.2 Spatio-temporal dependency modeling
The normalized enhancement feature f_normalized is input into the attention enhanced GRU network:
F_temporal = GRU(F_normalized)
where GRU is the gated recurrent unit and F_temporal is the feature after capturing the time dependence, i.e. the second optimized fusion feature.
In summary, the bimodal optimization fusion method of the invention strengthens the space-time feature interaction through a self-attention mechanism (head number=4) by capturing the synergistic effect of the CEUS time sequence feature and the CT space feature.
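A hedged sketch of this bimodal optimization path (residual FC + Dropout enhancement, LayerNorm, 4-head self-attention and a GRU over the fused sequence) is shown below; the 256-dimensional width, the dropout rate and the ordering of attention before the GRU are assumptions introduced for the example.

import torch
import torch.nn as nn

class BimodalOptimizer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.fc, self.drop = nn.Linear(dim, dim), nn.Dropout(0.3)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, f_fused_seq):                       # (B, T, dim): fused feature per frame
        f_enhanced = f_fused_seq + self.drop(self.fc(f_fused_seq))   # residual connection
        f_norm = self.norm(f_enhanced)                    # LayerNorm
        f_attn, _ = self.attn(f_norm, f_norm, f_norm)     # 4-head self-attention
        f_temporal, _ = self.gru(f_attn)                  # capture the time dependence
        return f_temporal[:, -1]                          # second optimized fusion feature

f_temporal = BimodalOptimizer()(torch.randn(2, 6, 256))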
Fig. 6 is a classification and prediction flow chart of a deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. The thyroid tumor classification method is implemented by a preoperative classification model, and the recurrence risk prediction method is implemented by a postoperative recurrence risk prediction model. The construction of the pre-operative classification model and the post-operative recurrence risk prediction model is described below.
1. Construction of preoperative classification model
1.1 Data acquisition and Pre-processing
In one embodiment of the invention, multi-modal data of a total of 1500 patients, including cases of thyroid follicular carcinoma (FTC) and follicular adenoma (FTA), were collected from multiple tertiary hospitals using imaging equipment of different brands and models. The dataset was divided into training, validation and test sets at a ratio of about 70% (1050 cases), 15% (225 cases) and 15% (225 cases), and stratified sampling was used to ensure consistent proportions of FTC and FTA in each subset. CEUS data, CT data and pathological data were then acquired and processed in a standardized manner:
CEUS data, namely precisely dividing arterial period (30-60 seconds) and venous period (60-120 seconds) by adopting a time sequence acquisition technology, and capturing the space-time dynamic characteristics of tumor micro-blood perfusion by a high-frequency ultrasonic probe;
CT data, namely adopting multi-phase scanning, wherein the layer thickness is 1.0-1.5mm, and reducing noise by combining an iterative reconstruction algorithm;
Pathological data, namely digitizing the whole-slide pathological images by a digital scanner (resolution 0.25 μm/pixel) and adopting block processing and a tissue region screening algorithm.
1.2 Training of Pre-operative Classification models
The preoperative classification model is trained by adopting a double-branch full-connection network structure, an input layer receives 512-dimensional fusion characteristics, a two-layer full-connection network (FC-128) is matched with a ReLU activation function to perform characteristic dimension reduction, and an output layer outputs classification probability of FTC and FTA through a Softmax function. Through the double-branch full-connection network, the accurate classification of tumor types is realized, and the classification probability is output.
The preoperative classification model is constructed by adopting a multi-stage transfer learning strategy, utilizing a three-stage progressive training framework in which the three stages are, in sequence, pre-training, feature migration and fine-tuning. The input is the source domain data (1050 cases of multi-center training data), the output is the domain-adaptive model parameters θ, and optimization is carried out for specific medical equipment environments.
The adjustment method of the model parameter theta is as follows:
First stage Pre-training
θ_pretrain = argmin_θ L_src(θ, D_src)
Wherein argmin_θ is the optimization operator, representing finding the optimal parameters θ that minimize the objective function, and L_src is the source-domain loss function, measuring the prediction error of the model on the source-domain dataset.
Second stage feature migration
θ_transfer = argmin_θ [L_src(θ, D_src) + λ_1·d_MMD(F_src, F_tgt)]
Where d_MMD is the maximum mean discrepancy metric, used to reduce the feature distribution difference between the source domain and the target domain.
Third stage of fine tuning
θ_finetune = argmin_θ [L_tgt(θ, D_tgt) + λ_2·R(θ, θ_transfer)]
Where R is a regularization term, limiting parameter deviations from the pre-trained model are excessive.
The distribution difference of the source domain and the target domain is reduced through progressive migration, the adaptability of the model on new equipment is improved, and the accuracy rate on a verification set is improved by about 8-12%.
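The feature-migration stage can be illustrated by the small sketch below, which adds a linear-kernel MMD penalty between batches of source and target features to the source-domain loss; the linear kernel and λ_1 = 0.1 are assumptions, since the specification does not fix the kernel choice.

import torch

def mmd_linear(f_src, f_tgt):
    # maximum mean discrepancy with a linear kernel: ||mean(f_src) - mean(f_tgt)||^2
    delta = f_src.mean(dim=0) - f_tgt.mean(dim=0)
    return (delta * delta).sum()

def stage2_loss(l_src, f_src, f_tgt, lambda_1=0.1):
    # L_src(θ, D_src) + λ_1·d_MMD(F_src, F_tgt)
    return l_src + lambda_1 * mmd_linear(f_src, f_tgt)

loss = stage2_loss(torch.tensor(0.35), torch.randn(16, 256), torch.randn(16, 256))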
The preoperative classification model is constructed by adopting a dynamic fine-tuning mechanism, namely an adaptive parameter adjustment algorithm based on equipment characteristics. The input is the CEUS and CT characteristic parameters of the target equipment, including imaging resolution, frame rate, contrast, etc., and the output is network weights optimized for the particular equipment characteristics.
The method for adjusting the network weight is as follows:
characterization of device characteristics:
v_device = Encoder(device_params)
Condition parameter generation:
α_device = MLP(v_device)
Dynamic weight adjustment:
W_adjusted = W_base * (1 + α_device * Tanh(v_device))
Where w_base is the base weight and α_device is the device-specific adjustment factor.
By automatically adjusting model parameters according to equipment characteristics, the model can adapt to imaging characteristic differences of equipment of different manufacturers, and the cross-equipment performance attenuation is reduced by 20-35%.
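A small sketch of this device-conditioned adjustment is given below: scanner parameters are encoded into v_device, an MLP produces the device-specific factor α_device, and the base weights are rescaled as W_adjusted = W_base·(1 + α_device·tanh(v_device)); the encoder and MLP sizes, and the three example parameters (resolution, frame rate and a contrast index), are assumptions.

import torch
import torch.nn as nn

class DeviceAdapter(nn.Module):
    def __init__(self, n_params=3, hidden=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_params, hidden), nn.ReLU())   # Encoder(device_params)
        self.mlp = nn.Linear(hidden, 1)                                        # MLP(v_device)

    def forward(self, w_base, device_params):
        v_device = self.encoder(device_params)             # device characteristic representation
        alpha_device = self.mlp(v_device)                  # device-specific adjustment factor
        scale = 1 + alpha_device * torch.tanh(v_device).mean(dim=-1, keepdim=True)
        return w_base * scale                              # W_adjusted

w_base = torch.randn(256, 128)                             # base network weights
w_adjusted = DeviceAdapter()(w_base, torch.tensor([[640.0, 480.0, 15.0]]))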
The preoperative classification model is constructed by adopting adaptive learning rate adjustment, namely a cyclic-decay learning rate strategy based on validation set performance. The input is the performance indices (accuracy, loss value) on the validation set, and the output is the dynamically adjusted learning rate.
The learning rate adjustment method is as follows:
basic cycle learning rate:
η_t = η_min + 0.5*(η_max - η_min)*(1 + cos(t/T_cycle * π))
Wherein η_t is the learning rate value at training step t, dynamically calculated by the cosine annealing formula; η_min is the minimum learning rate value, preventing training stagnation caused by a learning rate that is too small; η_max is the maximum learning rate value, which can be dynamically adjusted according to performance during training; and T_cycle is the number of training steps in a complete learning rate cycle, determining the period length of the cosine annealing.
And (3) performance monitoring:
If val_loss_t>val_loss_{t-1} * (1 - ε):
patience += 1
Else:
patience = 0
wherein val_loss_t is the loss value on the validation set at training step t, used for monitoring model performance, and patience is a patience counter that records how many consecutive times the validation set performance has failed to improve, used for triggering learning rate adjustment.
And (3) learning rate adjustment:
If patience>patience_threshold:
η_max = η_max * 0.5
patience = 0
By automatically lowering the upper limit of the learning rate as training proceeds, training is prevented from falling into local optima and model convergence is promoted; the average convergence speed is improved by about 30%.
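The cyclic-decay schedule and its patience-based adjustment can be sketched as below; ε = 0.01, a patience threshold of 5 and the η_min/η_max/T_cycle values are illustrative only.

import math

class CyclicDecayLR:
    def __init__(self, eta_min=1e-5, eta_max=1e-3, t_cycle=50, eps=0.01, patience_threshold=5):
        self.eta_min, self.eta_max, self.t_cycle = eta_min, eta_max, t_cycle
        self.eps, self.patience_threshold = eps, patience_threshold
        self.patience, self.prev_val_loss = 0, float("inf")

    def lr(self, t):
        # η_t = η_min + 0.5·(η_max - η_min)·(1 + cos(t/T_cycle·π))
        return self.eta_min + 0.5 * (self.eta_max - self.eta_min) * (
            1 + math.cos((t % self.t_cycle) / self.t_cycle * math.pi))

    def step(self, val_loss):
        # performance monitoring: count steps where the validation loss fails to improve
        if val_loss > self.prev_val_loss * (1 - self.eps):
            self.patience += 1
        else:
            self.patience = 0
        if self.patience > self.patience_threshold:
            self.eta_max *= 0.5                  # lower the learning-rate ceiling
            self.patience = 0
        self.prev_val_loss = val_loss

scheduler = CyclicDecayLR()
current_lr = scheduler.lr(t=10); scheduler.step(val_loss=0.42)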
The loss function of the preoperative classification model adopts a cross entropy loss function and an L2 regularization term:
L_diag = -Σ y_i log(P(y=i)) + λ||θ||^2
Where y_i is the true label and λ||θ||^2 is the L2 regularization term, preventing overfitting.
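A brief sketch of the classification head and its loss is given below; the two FC-128 layers, ReLU and the two-class output follow the text, while using the optimizer's weight_decay to stand in for the λ||θ||^2 term is an implementation shortcut assumed here.

import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(512, 128), nn.ReLU(),              # FC-128 with ReLU
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 2),                           # logits for FTC vs. FTA (Softmax inside the loss)
)
criterion = nn.CrossEntropyLoss()                # -Σ y_i log P(y=i)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-4, weight_decay=1e-4)  # λ||θ||^2

logits = classifier(torch.randn(8, 512))         # 512-dimensional fused features
loss = criterion(logits, torch.randint(0, 2, (8,)))
loss.backward(); optimizer.step()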
2. Construction of postoperative recurrence risk prediction model
The postoperative recurrence risk prediction model is based on a multi-modal contrastive learning framework: the second optimized fusion feature undergoes temporal dependence extraction through a gated recurrent unit (GRU), the pathological features undergo self-attention screening of high-risk regions, and a contrastive loss (Contrastive Loss) is adopted to pull together samples of the same class (recurrence vs. non-recurrence), with a margin threshold δ = 0.5.
The construction of the postoperative recurrence risk prediction model adopts a contrastive learning strategy, utilizing a dual-branch contrastive learning network combined with sample-pair similarity learning to learn discriminative features and achieve cross-modal semantic alignment. The input of the dual-branch contrastive learning network is the feature F_temporal extracted by the image branch and the feature F_path extracted by the pathology branch, and the output is a semantically aligned representation in the feature space. The process involves the following operations:
positive and negative sample pair construction:
y_ij = 1 if samples i and j are both recurrent or both non-recurrent
y_ij = 0 otherwise
Contrastive loss calculation:
L_contra = Σ_i Σ_j [ y_ij·max(0, δ - cos(F_temporal^i, F_path^j)) + (1-y_ij)·max(0, cos(F_temporal^i, F_path^j) - m) ]
Wherein δ = 0.5 is the positive-pair similarity threshold, indicating that the feature distance of same-class samples should be smaller than this value; m = 0.2 is the negative-pair boundary margin, indicating that the feature distance of different-class samples should be larger than this value; cos(·,·) represents cosine similarity; and y_ij = 1 indicates that samples i and j are both recurrent or both non-recurrent.
By promoting the model to learn the distinguishing characteristics of recurrent and non-recurrent cases, the semantic consistency between the image and the pathological mode is maintained, and the recurrence risk AUC is improved by 7-13%.
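A sketch of the contrastive loss in the formula above is given below (δ = 0.5 for same-class pairs, m = 0.2 for different-class pairs); it assumes the image-branch feature F_temporal and the pathology-branch feature F_path have been projected to a common dimension before the cosine similarity is taken.

import torch
import torch.nn.functional as F

def contrastive_loss(f_temporal, f_path, y_recur, delta=0.5, m=0.2):
    # pairwise cosine similarity between image-branch and pathology-branch features
    sim = F.cosine_similarity(f_temporal.unsqueeze(1), f_path.unsqueeze(0), dim=-1)   # (B, B)
    y_ij = (y_recur.unsqueeze(1) == y_recur.unsqueeze(0)).float()   # 1 if same recurrence status
    pos = y_ij * torch.clamp(delta - sim, min=0)        # pull same-class pairs above δ
    neg = (1 - y_ij) * torch.clamp(sim - m, min=0)      # push different-class pairs below m
    return (pos + neg).mean()

l_contra = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256), torch.randint(0, 2, (8,)))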
The construction of the postoperative recurrence risk prediction model adopts a risk scoring mechanism, namely a risk-scoring regressor based on the optimized fusion features; its inputs are the image feature F_temporal and the pathological feature F_path after contrastive-learning alignment, and its output is a continuous recurrence risk score (0-1). The process involves the following operations:
1. Risk score calculation
Risk_score = sigmoid(W_risk·[F_temporal, F_path] + b_risk)
Wherein W_risk is the risk weight matrix, a trainable weight matrix used for mapping the second optimized fusion feature F_temporal and the pathological feature F_path to the risk space. The dimension of W_risk depends on the dimension of the input features and of the risk score, typically (d_combined×1), where d_combined is the dimension after F_temporal and F_path are concatenated. The weight matrix is learned automatically by back-propagation during training and can assign higher weight values to features that are more important for recurrence prediction; the weight values reflect the degree of contribution of different features to recurrence risk prediction. b_risk is the risk bias term, a scalar bias parameter of the risk prediction model used for adjusting the baseline level of the risk score; it ensures that the model output has an appropriate offset, is optimized together with W_risk during training, helps the model handle imbalanced feature distributions, and can be understood as the model's default risk prediction in the absence of any feature input. Together, W_risk and b_risk form the linear part of the risk score, and the output is then mapped to between 0 and 1 by a sigmoid function, representing the probability of postoperative recurrence for the patient.
2. Weighted mixing loss
L_pred = BCE(Risk_score, y_recur) + α·L_contra
Wherein L_pred consists of two parts, a weighted sum of the binary cross-entropy loss (BCE) and the contrastive loss (L_contra). BCE(Risk_score, y_recur) represents the binary cross-entropy loss between the recurrence risk prediction (Risk_score) and the true recurrence label (y_recur), used to directly optimize the accuracy of the risk prediction, where y_recur ∈ {0,1} indicates whether the patient relapsed (1 indicates relapse, 0 indicates no relapse). α is the weight coefficient of the contrastive loss, used to balance the contribution of the direct prediction loss and the feature contrastive loss; in practical training, α is typically set between 0.3 and 0.5 and can be adjusted according to validation set performance. L_contra is the contrastive loss component described above, used to pull together the feature representations of same-class samples (both recurrent or both non-recurrent) and push apart the feature representations of different-class samples, thereby enhancing the model's ability to distinguish recurrence risk.
By means of a risk scoring mechanism, the model can provide finer granularity of risk prediction, rather than just a classification result, with an approximately 25% improvement in clinical decision support.
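The risk-scoring regressor and the mixed loss can be sketched as follows; the 256/512 input dimensions and α = 0.4 (within the stated 0.3-0.5 range) are illustrative, and the contrastive term would be supplied by the loss defined earlier.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RiskScorer(nn.Module):
    def __init__(self, d_temporal=256, d_path=512):
        super().__init__()
        self.w_risk = nn.Linear(d_temporal + d_path, 1)    # W_risk (d_combined x 1) and b_risk

    def forward(self, f_temporal, f_path):
        z = torch.cat([f_temporal, f_path], dim=-1)        # [F_temporal, F_path]
        return torch.sigmoid(self.w_risk(z)).squeeze(-1)   # Risk_score in (0, 1)

scorer = RiskScorer()
f_temporal, f_path = torch.randn(8, 256), torch.randn(8, 512)
y_recur = torch.randint(0, 2, (8,)).float()

risk_score = scorer(f_temporal, f_path)
l_contra = torch.tensor(0.0)                               # contrastive term from the previous sketch
l_pred = F.binary_cross_entropy(risk_score, y_recur) + 0.4 * l_contra    # L_pred = BCE + α·L_contra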
The construction of the postoperative recurrence risk prediction model adopts cross-equipment adaptability optimization, namely a domain adaptation technique, specifically domain-invariant feature learning based on adversarial training. The input is the feature representation of the source-domain data (training equipment), and the output is a domain-invariant feature representation. The process involves the following operations:
1. domain classifier training
L_domain = -Σ[d_i log(D(f_i)) + (1-d_i)log(1-D(f_i))]
2. Feature extractor training (adversarial objective)
L_feature = L_task - λ_d·L_domain
Where D is a domain classifier, d_i is a domain label (0 represents a source domain, 1 represents a target domain), and f_i is a feature representation.
Through adversarial training, the model learns feature representations that are unaffected by equipment differences, so that performance close to that on the original equipment can be reached on new equipment without a large amount of data; the cross-equipment performance retention rate is improved to 85-92%.
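A minimal sketch of the two adversarial objectives is given below; the small two-layer domain classifier and λ_d = 0.1 are assumptions (a gradient-reversal layer is a common alternative way to realize the same objective).

import torch
import torch.nn as nn

domain_clf = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))   # domain classifier D
bce = nn.BCEWithLogitsLoss()

def l_domain(features, domain_labels):
    # L_domain: binary cross-entropy of D on domain labels (0 = source, 1 = target)
    return bce(domain_clf(features).squeeze(-1), domain_labels)

def l_feature(l_task, features, domain_labels, lambda_d=0.1):
    # L_feature = L_task - λ_d·L_domain: the extractor is rewarded for confusing D
    return l_task - lambda_d * l_domain(features, domain_labels)

loss = l_feature(torch.tensor(0.42), torch.randn(8, 256), torch.randint(0, 2, (8,)).float())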
The postoperative recurrence risk prediction model is constructed by adopting an early-stopping strategy, namely an adaptive early-stopping mechanism based on validation set performance. The inputs are the risk prediction performance indices on the validation set, namely the AUC (Area Under the Curve) and Recall@HighRisk (the recall of the high-risk class, i.e. the model's ability to correctly identify high-risk samples), and the output is the training stop signal and the optimal model checkpoint. The process involves the following operations:
1. performance monitoring
metric_t = 0.7 × AUC + 0.3 × Recall@HighRisk
If metric_t>best_metric:
best_metric = metric_t
best_epoch = t
save_model(θ_t)
patience = 0
Else:
patience += 1
Wherein metric_t is the comprehensive performance index of the t-th training epoch, calculated as a weighted combination of AUC and the high-risk recall rate; best_metric is the best performance index recorded so far, used to judge whether the current model has reached a new optimum; best_epoch is the training epoch corresponding to the best performance index, recording the time point of the optimal model; t is the current epoch number, indicating how many epochs the model has been trained; save_model(θ_t) is the model saving operation, saving the model parameters θ_t of the current epoch t as the optimal model checkpoint; and patience is a patience counter that records how many consecutive epochs the performance has failed to improve, used for triggering the early-stopping mechanism.
2. Early stop determination
If patience>patience_max or t - best_epoch>window_size:
stop_training()
restore_model(best_epoch)
By adopting the early-stop strategy, the over-fitting problem can be effectively avoided, unnecessary training time is reduced, the average training time of the model is reduced by 35%, and meanwhile, the performance of the test set is maintained or improved.
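An illustrative sketch of the adaptive early-stopping logic follows; patience_max = 10 and window_size = 20 are example values, and actual checkpointing/restoring is represented only by the returned signal.

class EarlyStopper:
    def __init__(self, patience_max=10, window_size=20):
        self.best_metric, self.best_epoch = float("-inf"), 0
        self.patience, self.patience_max, self.window_size = 0, patience_max, window_size

    def step(self, epoch, auc, recall_high_risk):
        metric = 0.7 * auc + 0.3 * recall_high_risk       # composite performance index
        if metric > self.best_metric:
            self.best_metric, self.best_epoch, self.patience = metric, epoch, 0
            return "save"                                 # save_model(θ_t)
        self.patience += 1
        if self.patience > self.patience_max or epoch - self.best_epoch > self.window_size:
            return "stop"                                 # stop_training(); restore_model(best_epoch)
        return "continue"

stopper = EarlyStopper()
signal = stopper.step(epoch=1, auc=0.86, recall_high_risk=0.78)   # -> "save"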
The pre-operative classification model and the recurrence risk prediction model of the present invention perform on the test set (225 cases) as follows:
The accuracy of FTC/FTA classification reaches 91.3%, the sensitivity is 89.6%, and the specificity is 92.5%.
The recurrence risk prediction model has AUC of 0.873 and recurrence rate prediction accuracy of 83.7% in the high risk group.
Through detailed training strategy design and optimization, the invention can maintain stable performance under different medical institutions and equipment environments, and provides reliable support for clinical thyroid cancer diagnosis and risk assessment.
Fig. 7 is a flow chart of an explanatory analysis of the deep learning-based thyroid tumor classification and recurrence risk prediction method of the present invention. The flow chart includes the following parts:
1. Grad-CAM heatmap generation:
Image feature input, namely extracting the CEUS features and CT features used in the classification and prediction models.
Key-region heatmap generation, namely generating a heatmap based on the Grad-CAM technique to highlight the key regions attended to by the model.
Dynamic abnormal-region labeling, namely highlighting the blood-perfusion abnormal regions on the CEUS heatmap.
Static abnormal-region labeling, namely labeling the boundary-blurred or calcified regions of the tumor on the CT heatmap.
2. SHAP feature contribution analysis:
and inputting fusion characteristics, namely classifying and predicting input characteristic vectors of the model.
Feature importance is calculated by analyzing the contribution of input features to the classification result based on the SHAP technique.
2.1 Quantified feature contributions:
CEUS feature contributions, e.g. dynamic perfusion rate and time-series pattern.
CT feature contributions, e.g. boundary ambiguity and anatomical morphology.
Pathological feature contributions, e.g. cell morphology and histological structure.
3. Interpretation analysis results show that:
Grad-CAM heatmap display, namely showing the key regions of the image on the user interface.
Feature contribution ranking display, namely quantifying and visualizing the contribution weights of the different modality features.
Interpretability report generation, namely integrating the heatmap with the feature contribution ranking to generate a downloadable report.
Specifically, the Grad-CAM optimization and SHAP feature contribution analysis employed by the interpretive enhancement techniques of the present invention are embodied as follows.
1. Grad-CAM optimization:
The optimization strategy is to superimpose time-dimension weights on the CEUS heatmap to accurately locate the arterial-phase perfusion abnormal region, and to combine CT images and pathological features (such as tumor boundary ambiguity and nuclear division count) to realize multi-modal data fusion, improving the clarity and reliability of the visualization results.
The Grad-CAM is embodied as follows:
For the target class c, the feature map weights α_k^c are calculated:
α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A_ij^k
Wherein Z is a normalization factor equal to the total number of pixels of the feature map, used for normalizing the weights; Σ_i Σ_j is a double summation over all spatial positions (i, j) of the feature map; and ∂y^c/∂A_ij^k is the partial derivative of the predicted score of the target class c with respect to the activation value of the k-th feature map at position (i, j), indicating the degree of influence of that position on the classification result.
Time dimension weight integration (for CEUS sequence):
α_k^c(t) = α_k^c·w_t
where w_t is the weight coefficient of time step t, associated with the perfusion phase.
Grad-CAM heatmap calculation:
L_Grad-CAM^c = ReLU(Σ_k α_k^c A^k)
The weights α_k^c of all channels and the corresponding feature maps A^k are weighted and summed, and the Grad-CAM heatmap is then obtained through a ReLU activation function, highlighting the feature regions important to the target class c.
CEUS-specific heatmap:
L_Grad-CAM^c(CEUS) = ReLU(Σ_k Σ_t α_k^c(t) A^k(t))
On the basis of Grad-CAM, the weights α_k^c(t) and the temporal variation of the feature maps A^k(t) at different time steps t are further considered, yielding a heatmap that better matches the characteristics of the CEUS sequence, used for analyzing the feature importance distribution of the target class over the time series.
The Grad-CAM optimization adopted by the invention can accurately locate the CEUS arterial-phase perfusion abnormal region, and fusion visualization of multi-modal data is realized by combining CT images and pathological features.
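A minimal Grad-CAM sketch is given below: the class score is back-propagated to the last convolutional feature map, the gradients are spatially averaged to form α_k^c, and the weighted feature maps are passed through ReLU. The per-frame time weighting w_t for CEUS is omitted here, and the use of a torchvision ResNet18 layer as the hooked feature layer is purely illustrative.

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

def grad_cam(model, image, target_class, feature_layer):
    feats, grads = [], []
    h1 = feature_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = feature_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(image)[0, target_class]                 # y^c
    score.backward()
    h1.remove(); h2.remove()
    alpha = grads[0].mean(dim=(2, 3), keepdim=True)       # α_k^c = (1/Z) Σ_i Σ_j ∂y^c/∂A_ij^k
    cam = F.relu((alpha * feats[0]).sum(dim=1))           # ReLU(Σ_k α_k^c A^k)
    return cam / cam.max().clamp_min(1e-8)                # normalized heatmap

net = resnet18(weights=None).eval()
heatmap = grad_cam(net, torch.randn(1, 3, 224, 224), target_class=0, feature_layer=net.layer4)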
2. SHAP feature contribution analysis:
Through SHAP feature contribution analysis, the model can quantify the contribution of various features to the prognosis results, helping to reveal the role of different factors in the prediction. The SHAP value assigns an importance score to each feature, and by measuring the influence of each feature on the predicted outcome, a clinician can intuitively understand how factors such as CEUS peak flow rate, boundary ambiguity in CT images, pathological nuclear division count, etc. influence prognosis prediction.
The specific implementation process of SHAP feature contribution analysis is as follows:
SHAP value calculation for feature i:
φ_i = Σ_{S⊆N\{i}} |S|!(|N|-|S|-1)!/|N|! [f_x(S∪{i}) - f_x(S)]
where N is the set of all features, S is the subset that does not contain feature i, and f_x (S) is the prediction of sample x when the model uses feature set S.
Inter-modality feature interaction SHAP values:
φ_ij = Σ_{S⊆N\{i,j}} |S|!(|N|-|S|-2)!/2|N|! [f_x(S∪{i,j}) - f_x(S∪{i}) - f_x(S∪{j}) + f_x(S)]
feature importance quantification:
Importance(i) = Σ_x |φ_i(x)|
modality importance quantification:
Importance(CEUS) = Σ_{i∈CEUS_features} Importance(i)
Importance(CT) = Σ_{i∈CT_features} Importance(i)
Importance(Pathology) = Σ_{i∈Pathology_features} Importance(i)
The SHAP characteristic contribution analysis adopted by the invention can quantify the contribution of each characteristic to the prognosis result and reveal the effect of different factors in prediction.
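For illustration, the sketch below shows one way to obtain per-feature SHAP values with the shap library's KernelExplainer and aggregate them into the modality importances defined above; it assumes predict_fn returns a single risk score per sample and that features are ordered as CEUS (128), CT (256) and pathology (512), both of which are assumptions of this example.

import numpy as np
import shap

def modality_importance(predict_fn, background, samples, dims=(128, 256, 512)):
    explainer = shap.KernelExplainer(predict_fn, background)
    phi = np.abs(np.asarray(explainer.shap_values(samples)))   # |φ_i(x)| for each sample x
    importance = phi.reshape(len(samples), -1).sum(axis=0)     # Importance(i) = Σ_x |φ_i(x)|
    bounds = np.cumsum((0,) + dims)
    return {name: float(importance[bounds[k]:bounds[k + 1]].sum())
            for k, name in enumerate(("CEUS", "CT", "Pathology"))}

# usage (illustrative): predict_fn wraps the trained risk model, and background/samples are
# numpy arrays of concatenated 896-dimensional multi-modal feature vectors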
In summary, the invention combines the dynamic micro blood flow characteristics of CEUS and the high-resolution anatomical information of enhanced CT through a deep learning method, establishes a preoperative classification model based on multi-modal image data, and comprehensively improves the sensitivity and specificity of preoperative FTC classification; it adopts an adaptive dynamic weight adjustment mechanism and realizes the optimized combination of multi-modal image features through a deep self-attention network, fully exploiting the complementary information of each modal feature; and it utilizes joint analysis of images and pathological data to construct a recurrence risk prediction model based on molecular- and cell-level features, providing an accurate individualized management strategy for postoperative high-risk patients.
Furthermore, the invention also provides electronic equipment, which mainly comprises a processor and a memory, wherein a computer program is stored in the memory, and the computer program can cause the processor to execute the thyroid tumor classification and recurrence risk prediction method based on deep learning.
The memory may be random access memory (RAM), read-only memory (ROM), non-volatile memory, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), flash memory, optical memory, registers, and so forth.
The processor may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In the above embodiments, implementation may be in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, the computer instructions produce, in whole or in part, a flow or function in accordance with embodiments of the present application. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a solid state disk (Solid State Disk, SSD), etc.