TWI863845B - Method and system for auxiliary medical diagnosis for gastritis - Google Patents
- Publication number
- TWI863845B (application TW113109840A)
- Authority
- TW
- Taiwan
- Prior art keywords
- image
- model
- gastric
- data set
- module
Landscapes
- Image Analysis (AREA)
Description
The present invention relates to medical image recognition, and in particular to a system and method that use artificial intelligence to diagnose gastric precancerous lesions and Helicobacter pylori infection.
Gastric cancer is a major global health problem. In 2020 alone, there were more than 1 million new cases and approximately 769,000 deaths. Gastric cancer ranks as the fifth most common cancer by incidence and the fourth leading cause of cancer death worldwide. Its main cause is Helicobacter pylori infection, which can be eradicated with a short course of antibiotics. However, even after the infection has been successfully eradicated, individuals may still develop precancerous lesions such as atrophic gastritis and intestinal metaplasia, so patients require continued endoscopic surveillance for early detection and treatment. Histological evaluation is generally recognized as one of the most effective methods for diagnosing gastric precancerous lesions and Helicobacter pylori infection and for stratifying gastric cancer risk. Its bottlenecks, however, are that it is time-consuming, depends on experts, and leaves uncertainty about the ideal number and location of endoscopic biopsy sites.
Statistically, Matsu, an island area off the coast of Taiwan, is a high-risk region for gastric cancer. To eliminate the threat of gastric cancer to the local population, the responsible health authorities have run a large-scale Helicobacter pylori eradication program for nearly twenty years. Although the program has reduced gastric cancer incidence by about 50%, sporadic cases continue to occur. It is generally believed that further stratifying the population by individual risk would yield better disease prevention and control. A medical assistance system that incorporates modern computing technology therefore remains to be developed.
In view of this, the present invention proposes a medical auxiliary diagnosis system based on artificial intelligence (AI) to assist frontline physicians in diagnosing precancerous gastric lesions and Helicobacter pylori infection in real-world practice.
In one embodiment, the medical auxiliary diagnosis system comprises a sampling system, a user device, and an image analysis system. The sampling system collects and stores medical images, among which a first data set and a second data set are stored. The user device is connected to the sampling system and is configured to read medical images from it or to upload medical images. The image analysis system is connected to the sampling system and the user device and is configured to analyze medical images at the user device's request, producing auxiliary images that can assist diagnosis. The image analysis system reads the first data set and the second data set from the sampling system to perform deep learning. Based on the result of the deep learning, the image analysis system performs inference on an image under test provided to the image analysis system, generating an auxiliary diagnostic image that helps identify precancerous lesions and Helicobacter pylori infection. The first data set comes from no particular region and has an average incidence rate; the second data set comes from a specific region whose incidence rate is higher than that average. Both data sets contain upper gastrointestinal endoscopic images.
In one embodiment, the image analysis system comprises a training module, a first model, a second model, and a third model. The training module performs the deep learning on the first and second data sets. The first model, trained by the training module, identifies gastric and non-gastric regions in a sample image. The second model, trained by the training module, further classifies the gastric regions identified in the sample image into the antrum, the body, and the fundus. The third model, trained by the training module, determines whether a precancerous lesion or Helicobacter pylori infection is present in the antrum and body shown in the sample image. The sample image is an upper gastrointestinal endoscopic image from the first or second data set. The precancerous lesions include atrophic gastritis and intestinal metaplasia.
Furthermore, the image analysis system may include a preprocessing module configured to preprocess an input image into a normalized image. During preprocessing, the module converts the input image to grayscale and binarizes the grayscale image with an Otsu threshold. It then performs edge detection on the binary map to identify a target region in the input image, and crops the input image so that only the target region remains. Finally, it resizes the cropped image to a preset dimension, producing the normalized image corresponding to the input image. Before the training module performs deep learning on the first and second data sets, the preprocessing module normalizes the images in both data sets.
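As a concrete illustration, a minimal Python/OpenCV sketch of the preprocessing just described might look as follows. This is a sketch under assumptions, not the patented implementation: the function name is invented, contour detection stands in for the edge-detection step, and the target region is assumed to be the largest bright blob in the binary map.

```python
import cv2

def normalize_endoscopic_image(path, target_size=(224, 224)):
    """Grayscale -> Otsu binary map -> edge/contour detection -> crop -> resize."""
    image = cv2.imread(path)                              # BGR input image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)        # grayscale conversion
    # Otsu's method selects the binarization threshold automatically
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Contours of the binary map stand in for the edge-detection step
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    cropped = image[y:y + h, x:x + w]                     # keep only the target region
    return cv2.resize(cropped, target_size)               # normalize to a preset dimension
```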
Furthermore, when preprocessing the input image, the preprocessing module may apply a Contrast Limited Adaptive Histogram Equalization (CLAHE) operation to enhance image detail and highlight features of the gastric mucosa.
In a further embodiment, the image analysis system includes an inference module configured to use the first, second, and third models to perform inference on the image under test and assess gastric cancer risk. The inference module uses the first model to identify gastric and non-gastric regions in the image under test; uses the second model to further classify the gastric regions into the antrum, the body, and the fundus; and uses the third model to determine whether the precancerous lesion is present in the antrum and body. Finally, the inference module applies a Gradient-weighted Class Activation Mapping (Grad-CAM) operation that superimposes a visual effect on the image under test according to the probability of disease, producing the auxiliary diagnostic image. The image under test is an upper gastrointestinal endoscopic image, collected and stored by the sampling system and provided to the image analysis system through the user device for the inference module to analyze. Before analyzing the image under test, the inference module may also normalize it through the preprocessing module.
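The Grad-CAM overlay step could be sketched as below, assuming a Keras classifier such as those described later; the layer name, the [0, 1] image scaling, and the blending weight are assumptions, not details from the patent.

```python
import cv2
import numpy as np
import tensorflow as tf

def grad_cam_overlay(model, image, last_conv_layer, alpha=0.4):
    """Weight the last conv feature maps by the gradient of the predicted
    disease probability, then blend the resulting heat map onto the image."""
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(last_conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        idx = int(tf.argmax(preds[0]))                # predicted class index
        top_prob = preds[:, idx]                      # its probability
    grads = tape.gradient(top_prob, conv_out)         # d(probability)/d(feature map)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))   # channel-wise gradient averages
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    cam = (cam / (tf.reduce_max(cam) + 1e-8)).numpy() # keep positive evidence, scale 0-1
    cam = cv2.resize(cam, (image.shape[1], image.shape[0]))
    heat = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)
    # assumes `image` is scaled to [0, 1]
    return cv2.addWeighted(np.uint8(255 * image), 1 - alpha, heat, alpha, 0)
```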
In another embodiment, the first model may be trained from the dense network DenseNet201 as a model template, with its parameters fine-tuned by the training module. When the training module trains the first model, it reads the first and second data sets from the sampling system and allocates 70% of them as the training set, 20% as the validation set, and 10% as the test set.
In another embodiment, the second model is trained from the dense network DenseNet121 as a model template, with its parameters fine-tuned by the training module, and comprises a feature extraction module and a classification module. The feature extraction module is a multi-layer neural network configured to extract a feature map from the input data through convolution operations. The classification module comprises one or more fully connected layers coupled to the output of the feature extraction module, and computes from the feature map the probability that the input data belongs to the fundus, the antrum, or the body. When the training module trains the second model, it reads the first and second data sets from the sampling system, allocates 80% of the first data set as the training set and 20% as the validation set, and allocates 10% of the second data set as the test set. Finally, the classification module uses a SoftMax activation function to compute the probability that the input belongs to the fundus, antrum, or body.
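A hedged sketch of such a two-part model in TensorFlow (the framework named later in this description) might look as follows; the hidden-layer width is illustrative, and the three outputs correspond to antrum, body, and fundus.

```python
import tensorflow as tf

def build_second_model(input_shape=(224, 224, 3)):
    """DenseNet121 backbone as the feature extraction module, fully
    connected layers with SoftMax as the classification module."""
    backbone = tf.keras.applications.DenseNet121(
        include_top=False, weights="imagenet", input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)  # pool the feature map
    x = tf.keras.layers.Dense(256, activation="relu")(x)           # illustrative hidden layer
    probs = tf.keras.layers.Dense(3, activation="softmax")(x)      # antrum / body / fundus
    return tf.keras.Model(backbone.input, probs)
```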
In another embodiment, the third model is trained from a Vision Transformer as a model template, with its parameters fine-tuned by the training module. A shifted patch layer in the Vision Transformer translates an input image toward its four diagonals to produce four shifted images, which are stacked with the input image to form a translation stack. A patch-splitting layer coupled to the shifted patch layer cuts the translation stack into multiple patches and flattens them into a one-dimensional sequence in which each patch corresponds to a fixed-length feature vector. A position embedding layer coupled to the patch-splitting layer receives the patches and adds position information to each one. A multi-layer encoder coupled to the position embedding layer performs self-attention across the input patches to extract a feature-vector matrix corresponding to the input image. A multilayer perceptron head coupled to the encoder comprises several fully connected layers and pools the feature-vector matrix into a probability distribution matrix that represents the entire input image. The encoder also performs local self-attention between each patch and its neighbors so that the feature-vector matrix emphasizes local correlation. The multilayer perceptron head uses a Sigmoid activation function to compute the probability distribution matrix, in which each element represents a degree of pathological severity.
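The shifted patch layer and patch-splitting layer described here resemble the shifted patch tokenization proposed for training Vision Transformers on small data sets. Below is a hedged TensorFlow sketch of just that tokenization stage; the patch size, embedding dimension, and shift distance are illustrative, and the position embedding, locality self-attention encoder, and Sigmoid multilayer perceptron head would follow this step.

```python
import tensorflow as tf

def diagonal_shift(images, dy, dx):
    """Zero-pad and crop to translate a batch of images by (dy, dx) pixels."""
    pad = tf.pad(images, [[0, 0], [abs(dy), abs(dy)], [abs(dx), abs(dx)], [0, 0]])
    h, w = images.shape[1], images.shape[2]
    y0, x0 = abs(dy) - dy, abs(dx) - dx
    return pad[:, y0:y0 + h, x0:x0 + w, :]

def shifted_patch_tokens(images, patch=16, dim=256, shift=8):
    """Stack four diagonal shifts with the input, cut into patches, flatten,
    and project each patch to a fixed-length embedding."""
    stack = tf.concat(
        [images] + [diagonal_shift(images, dy, dx)
                    for dy in (-shift, shift) for dx in (-shift, shift)],
        axis=-1)                                    # (B, H, W, 5*C) translation stack
    patches = tf.image.extract_patches(
        stack, sizes=[1, patch, patch, 1], strides=[1, patch, patch, 1],
        rates=[1, 1, 1, 1], padding="VALID")        # non-overlapping patch cutting
    seq = tf.reshape(patches, (tf.shape(images)[0], -1, patches.shape[-1]))
    return tf.keras.layers.Dense(dim)(seq)          # one fixed-length vector per patch
```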
In another embodiment, the present invention proposes a method for computer-assisted diagnosis of gastritis. First, a first data set and a second data set containing upper gastrointestinal endoscopic images are provided. The method preprocesses the images in both data sets and then performs deep learning on them. Finally, based on the deep-learning result, an image under test is inferred to generate an auxiliary image that helps identify whether the image contains a precancerous lesion or Helicobacter pylori infection. The first data set comes from no particular region and has an average incidence rate; the second data set comes from a specific region whose incidence rate is higher than that average.
In summary, the medical image analysis system proposed by the present invention performs pathological identification on endoscopic images and generates lesion heat maps, helping experts improve diagnostic accuracy, reduce misdiagnosis, and optimize the allocation of subsequent treatment resources.
100: Medical auxiliary diagnosis system
110: Sampling system
112: Image acquisition device
114: Image server
120: User device
130: Image analysis system
131: First model
132: Second model
133: Third model
134: Preprocessing module
136: Training module
138: Inference module
200: Auxiliary medical diagnosis process
201-209: Steps
300: Image preprocessing process
301-309: Steps
400: Image inference process
401-407: Steps
500: Vision Transformer
501: Feature extraction module
502: Classification module
$IN: Input image
$OUT: Output result
510: Convolution module
512: Max pooling layer
514: First dense block
520: Convolution module
522: Average pooling layer
524: Second dense block
530: Convolution module
532: Average pooling layer
534: Third dense block
540: Convolution module
542: Average pooling layer
544: Fourth dense block
550: Global pooling layer
552: First fully connected layer
554: Second fully connected layer
600: Vision Transformer model
602: Input image
604: Probability distribution matrix
610: Shifted patch layer
620: Patch-splitting layer
630: Position embedding layer
640: Multi-layer encoder
650: Multilayer perceptron head
710: Original antrum image
712: Enhanced antrum image
720: Original gastric body image
722: Enhanced gastric body image
730: Original fundus image
732: Enhanced fundus image
810: Original image
820: Heat map overlay
The drawings described here provide a further understanding of the present invention and form a part of it; the schematic embodiments and their descriptions explain the invention and do not unduly limit it. In the drawings: FIG. 1 is the medical auxiliary diagnosis system of an embodiment of the present invention; FIG. 2 is a flowchart of the auxiliary medical diagnosis process of an embodiment; FIG. 3 is a flowchart of the image preprocessing process of an embodiment; FIG. 4 is a flowchart of the image inference process of an embodiment; FIG. 5 is the Vision Transformer of an embodiment; FIG. 6 is an embodiment of the training process of the third model of the present application; FIG. 7 shows image enhancement results of an embodiment; and FIG. 8 shows heat map overlay results of an embodiment.
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
Atrophic gastritis and intestinal metaplasia are histological findings associated with gastric cancer risk and are precancerous lesions. Embodiments of the present invention provide a medical auxiliary diagnosis system 100 that analyzes routinely collected endoscopic images to assist experts in diagnosing gastric precancerous lesions. FIG. 1 illustrates the basic architecture of an embodiment of the invention.
FIG. 1 shows the medical auxiliary diagnosis system 100 of an embodiment of the present invention, which feeds histological analysis results into an artificial intelligence algorithm to generate visual images that assist the diagnosis of precancerous lesions in the stomach.
The medical auxiliary diagnosis system 100 of this embodiment uses two data sources with different characteristics. The first source is data from a large medical center with the national average incidence rate; the other is data from a regional hospital in an area with high incidence. The data in this embodiment are gastric endoscopic images. After specific preprocessing, these images are processed by deep-learning algorithms that emulate the process by which human experts perform endoscopic biopsy and histological evaluation. The system can be deployed at specific hospitals with high incidence to support end-to-end telemedicine services. In practice, the robustness of the system's predictions has been validated, and the study is registered at ClinicalTrials.gov (NCT05762991).
The medical auxiliary diagnosis system 100 of the present invention applies artificial intelligence models to assist the identification of gastric endoscopic images. The training workflow generally includes the following steps:
- Data collection: collect the data used to train the model; depending on the problem, the data may be images, text, audio, and so on.
- Data preprocessing: clean, standardize (normalize), and extract features from the collected data to ensure its quality and suitability.
- Data splitting: divide the data into a training set, a validation set, and a test set. The training set trains the model, the validation set tunes hyperparameters and gauges performance during training, and the test set evaluates the model's final generalization ability.
- Model template selection: choose a model template suited to the problem, such as a deep neural network, a decision tree, or a support vector machine.
- Model training: use the training set to train the model, adjusting its parameters with an optimization algorithm (such as gradient descent) so that the model minimizes the loss function.
- Model validation: during training, or at the end of each epoch, evaluate the model on the validation set to guide training and adjust hyperparameters. The validation set helps detect overfitting or underfitting, and hyperparameters such as the learning rate and regularization strength are adjusted accordingly. Validation results are normally not used to update the model's parameters directly; they serve as a guiding evaluation to ensure the model performs well on unseen data.
- Hyperparameter tuning: optimize performance by trying different hyperparameter combinations, such as learning rate, batch size, and number of layers.
- Model evaluation: after training, evaluate the model on the test set to determine its generalization ability and practical effectiveness. Test data usually come from a set with known answers, so that machine-learning predictions can be compared against ground truth and accuracy can be tallied.
- Model deployment: deploy the trained model into real applications to solve practical problems.
For data collection, the first data set used by the medical auxiliary diagnosis system 100 of this embodiment may come from the centralized research database of the National Taiwan University Hospital medical system, which covers ten hospitals on the main island of Taiwan. To satisfy real-world-evidence requirements, since 2006 all electronic medical records from these hospitals, including charts, laboratory data, examination reports, pathology results, and medical images, have been transmitted anonymously to a comprehensive medical database called NTUH-iMD. In this embodiment, the first data set contains patients' endoscopic images together with histological examination results of the antrum, the body, and the gastric mucosa, and serves as the main basis for model training in the medical auxiliary diagnosis system 100.
The second data set, in turn, may come from a local hospital in the Matsu Islands, where statistics indicate a high incidence of gastric cancer among residents. Since 2004 the responsible authorities have invited residents over 30 years old to join a Helicobacter pylori screening program and have provided participants with endoscopic examinations. These examinations built databases of potential pathological lesions and Helicobacter pylori infection, from which the prevalence and severity of precancerous gastric lesions and the presence or absence of infection were derived using clinical-trial methodology, establishing the second data set. In this embodiment, the second data set is used to validate and test the models built from the first data set.
In summary, the sample source of the first data set broadly covers the national population, so its incidence figure is the population average. The sample source of the second data set is limited to a high-incidence area, making it suitable, within the AI training workflow, for validating and testing the recognition results learned from the first data set. In this embodiment, the images in both data sets mainly comprise gastric endoscopic images of the body, the antrum, and the fundus.
To establish standardized histological data for the subsequent training analysis of the medical auxiliary diagnosis system 100, the biopsy sampling for the first and second data sets may follow a modified Sydney Protocol. The Sydney Protocol is a standardized scheme for gastric mucosal biopsy, originally formulated by a group of gastroenterology experts in Australia to standardize biopsy sites and specimen handling and thereby improve diagnostic accuracy and consistency. In this embodiment, gastric mucosal biopsy specimens may be obtained from the antrum (at the greater and lesser curvatures) and the body (at the greater and lesser curvatures). With a standardized sampling procedure, experts obtain biopsy samples under consistent conditions; in other words, the sampling locations are the same for all subjects. Experts can then perform histological evaluation objectively, without knowing the participants' clinical status, which improves the reliability of the statistics. In this embodiment, pathological specimens are classified as acute inflammation (polymorphonuclear infiltration), chronic inflammation (lymphoplasmacytic infiltration), atrophic gastritis (loss of glandular tissue and fibrous replacement), or intestinal metaplasia (presence of goblet and absorptive cells), and the severity of each category is graded as none, mild, moderate, or marked. Based on this classification, the severity of precancerous lesions can be staged from 0 to 4 using the Operative Link on Gastritis Assessment (OLGA) and the Operative Link on Gastric Intestinal Metaplasia (OLGIM).
Architecturally, the medical auxiliary diagnosis system 100 of the present invention comprises a sampling system 110, a user device 120, and an image analysis system 130, connected to one another over a network.
The sampling system 110 in this embodiment broadly refers to a hospital or clinic with examination instruments, and includes an image acquisition device 112 and an image server 114. The image acquisition device 112 broadly refers to X-ray machines, tomography scanners, endoscopes, or any instrument that can produce patient examination data. The image server 114 is a Picture Archiving and Communication System (PACS). In medicine, medical images are transmitted in the Digital Imaging and Communications in Medicine (DICOM) format, so the information the image acquisition device 112 collects from patients is stored centrally on the image server 114 in DICOM format. The user device 120 may be an outpatient computer in the hospital or a mobile device held by medical staff, and can view the data stored on the image server 114 through a specific program interface and access mechanism. It should be understood that there may be many sampling systems 110 distributed among hospitals across the country, not limited to large medical centers or rural clinics, and that the network links among the sampling system 110, the user device 120, and the image analysis system 130 are not limited to wired or wireless networks.
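If such DICOM files are read in Python, a minimal sketch with the pydicom library might look as follows; the file name is hypothetical.

```python
import pydicom

ds = pydicom.dcmread("endoscopy_0001.dcm")   # hypothetical file exported from the PACS
pixels = ds.pixel_array                      # image data, ready for preprocessing
print(ds.Modality, ds.StudyDate)             # standard DICOM metadata fields
```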
The image analysis system 130 is the main server that carries out the artificial intelligence analysis. It contains the first model 131, the second model 132, the third model 133, the preprocessing module 134, the training module 136, and the inference module 138. The image analysis system 130 may be a physical host installed at a large medical center for staff use. To let medical personnel scattered across the country access it through user devices 120, the image analysis system 130 may adopt a public cloud architecture; that is, it may also be a cloud system composed of multiple tiers of virtualized services. Divided by function, a public cloud architecture comprises at least a front-end interface and a back-end platform (not shown). The front-end interface may take the form of a web service operated through a browser on the user device 120. The back-end platform lets the sampling system 110 or the user device 120 upload selected image data to the image analysis system 130 for processing, training, and inference. The components of the image analysis system 130 need not be physically located inside a large medical center; they may be distributed among the virtual services of various cloud vendors. For example, each component may be implemented as customized Software as a Service (SaaS) or with an existing Platform as a Service (PaaS) solution. Because public cloud architecture is a well-known, highly flexible technology with many options, it is not detailed in this embodiment. It should therefore be understood that the image analysis system 130 and its component modules are not limited to any physical implementation, and the sampling system 110 and the image analysis system 130 are not restricted to being in the same or different locations.
The preprocessing module 134 is configured to receive image data provided by the image server 114 and preprocess it. The detailed preprocessing flow is described with reference to FIG. 3.
The training module 136 is configured to perform deep learning for the analysis method of the present invention based on the output of the preprocessing module 134, ultimately training the first model 131, the second model 132, and the third model 133 as purpose-built artificial intelligence models. It should be understood that an artificial intelligence model, as the term is used in this invention, broadly refers to the combination of a neural-network algorithm and specific parameters in digital form, executable on a particular hardware and operating system architecture to produce corresponding outputs from input data. In one embodiment, the training module 136 may be a computing system supporting convolutional neural networks (CNNs), driven by a specific operating system and software to perform machine-learning functions. For example, the training module 136 may comprise program code written in Python, a TensorFlow-based deep-learning framework, and NVIDIA DGX A100 hardware.
The inference module 138 is configured to use the first model 131, the second model 132, and the third model 133 to perform inference on an input image and obtain a recognition result. The input image may be provided by the user device 120. After receiving it, the image analysis system 130 may first preprocess the image with the preprocessing module 134 before passing it to the inference module 138 for the inference procedure.
The first model 131 is trained by the training module 136 to recognize gastric and non-gastric regions in upper gastrointestinal endoscopic images.
The second model 132 is trained by the training module 136 to further classify the gastric regions identified by the first model 131 into three categories: the antrum, the body, and the cardia and fundus.
The third model 133 is trained by the training module 136 to determine whether the antrum and body regions contain precancerous lesions, such as atrophic gastritis and intestinal metaplasia, or Helicobacter pylori infection.
The medical auxiliary diagnosis system 100 proposed by the present invention can improve access to medical care in communities and provide timely, evidence-based information for patient management in regions with a high prevalence of gastric cancer. The implementation of each module is illustrated by the following flowcharts.
FIG. 2 shows an embodiment of the AI lesion-assisted medical diagnosis process 200 of the present invention. This embodiment is based on the architecture of the medical auxiliary diagnosis system 100 of FIG. 1 and is carried out jointly by the sampling system 110, the user device 120, and the image analysis system 130.
In step 201, a first data set with an average incidence rate and a second data set with a high incidence rate are provided on the image server 114. In one embodiment, the first data set contains endoscopic images and histological examination results of patients with atrophic gastritis and intestinal metaplasia provided by a large medical center. The second data set broadly refers to endoscopic images and histological examination results of such patients from a particular community hospital in a high-incidence area. In remote or outlying-island areas such as Matsu, differences in lifestyle and medical resources make residents' risk of gastric disease statistically significantly different, which makes such areas suitable sources for the second data set. The antrum and body image data in both data sets may be collected by local medical personnel with the image acquisition device 112 following the modified Sydney Protocol, saved as standard-format files, and stored on the image server 114.
In a further embodiment, the image analysis system 130 may also contain an image database (not shown) for centrally storing the first data set collected by the large medical center and the second data set collected by regional hospitals across the country. The image database need not be a local storage device; it may be a public cloud solution. In other words, embodiments of the present invention do not limit the number or location of image servers 114, and data transmission among the components may follow known network protocols, which are not repeated here.
Because the sources and acquisition methods of data from sampling systems 110 around the country may differ greatly, the raw images must be preprocessed before the deep-learning stage to remove unnecessary noise such as patient information, timestamps, or watermarks. Preprocessing also helps make the information subsequently learned by the training module 136 more consistent.
In step 203, image preprocessing is performed on the first and second data sets. Preprocessing may be done in advance on the sampling system 110 side or by the preprocessing module 134 of the image analysis system 130. When the first and second data are sent to the preprocessing module 134, it normalizes them into normalized images, which are then passed to the training module 136 for training. The detailed image preprocessing steps are described in the flowchart of FIG. 3.
In step 205, the first and second data sets, converted into normalized images by the preprocessing module 134, are fed into the training module 136 for deep learning, after which the inference models are built.
At run time, the training module 136 may divide the first and second data sets into a training set, a validation set, and a test set. It trains with the cutoff value determined during training and validation, learning to classify a normalized image according to the proportion of patches diagnosed with a particular condition, and then evaluates the learned model's performance on the test set. In this embodiment, to speed up model maturation, the training module 136 uses transfer learning, fine-tuning known models to exploit the knowledge they already contain.
By fine-tuning known models such as ResNet and the densely connected network (DenseNet), and overcoming the difficulties of combining them, embodiments of the present invention developed the first model 131, the second model 132, and the third model 133. The ResNet neural network, published in 2016, backpropagates gradients during training through shortcut connections between earlier and later layers, making it possible to train deeper convolutional neural networks. The DenseNet model, published in 2018, further refines ResNet and the original DenseNet architecture. Its distinguishing feature is that within each block, every layer is concatenated along the channel dimension with the feature maps of all preceding layers and then fed to the next layer. An N-layer block thus has N(N+1)/2 direct connections, which strengthens feature propagation and enables feature reuse while reducing model parameters and computation, improving training efficiency.
The first model 131, the second model 132, and the third model 133 in this embodiment are all neural network models trained by the training module 136 on the first and second data sets.
The first model 131 is trained to distinguish gastric from non-gastric locations in gastric endoscopic images. The second model 132 is trained to further classify gastric locations into the body, the antrum, and the pylorus. The third model 133 can further determine, from the regional gastric images supplied by the second model 132, whether precancerous lesions of atrophic gastritis or intestinal metaplasia are present, and predict their severity.
The model templates the training module 136 uses to train the first model 131, the second model 132, and the third model 133 may include derivatives of various well-known neural network architectures, such as EfficientNet-b0, EfficientNet-b4, EfficientNet-b6, AlexNet, VGG11, VGG19, ResNet18, ResNet152, DenseNet121, Vision Transformer, and DenseNet201. When training is complete, the inference model with the best discriminative ability is selected and deployed in the image analysis system 130 for service.
In the embodiment that trains the first model 131 and the second model 132, the image analysis system 130 may take data from a large medical center, such as National Taiwan University Hospital, shuffle it randomly by patient, and use 80% for training and 20% for validation. For testing, data from hospitals in high-risk areas, such as the Matsu data, may be used to evaluate the accuracy and stability of the first model 131 or the second model 132.
To train the third model 133 into a model that can predict pathological severity (atrophic gastritis and intestinal metaplasia), the data provided to the training module 136 must first exclude patients with missing pathology results, and image data of the antrum and body are selected as input. Selecting the gastric regions can be carried out with the first model 131 and second model 132 already trained in the embodiments above.
When training the third model 133, whether predicting the severity of atrophic gastritis or of intestinal metaplasia, the allocation ratios of the training, validation, and test sets can be set flexibly and adjusted dynamically according to performance. For example, in a first training scheme, the training module 136 may select a rural-hospital or high-risk data set from the image server 114, such as the Matsu data, as input, allocating 70% of it to the training set, 20% to the test set, and 10% to the validation set. In a second training scheme, National Taiwan University Hospital data are further added for mixed training. In mixed training, the data from a large medical center such as National Taiwan University Hospital may be split 8:2 between the training and validation sets; or all of the National Taiwan University Hospital data may go to the training set, with 10% of the Matsu data selected for validation; or all of the National Taiwan University Hospital data may be mixed with 70% of the Matsu data for training, with 10% of the Matsu data for validation. Experiments show that when data from the two different sources are mixed in the training set, the resulting model performs better and generalizes more strongly.
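For illustration, the last mixed scheme above could be realized as in the following sketch; the array names and sizes are placeholders, not data from the patent.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the two sources (shapes are illustrative)
ntuh_images, ntuh_labels = np.zeros((1000, 224, 224, 3)), np.zeros(1000)
matsu_images, matsu_labels = np.zeros((300, 224, 224, 3)), np.zeros(300)

# All medical-center data plus 70% of the high-risk (Matsu) data for training,
# 10% of the Matsu data for validation, the remaining 20% for testing
m_train, m_rest, ym_train, ym_rest = train_test_split(
    matsu_images, matsu_labels, test_size=0.30, random_state=0)
m_val, m_test, ym_val, ym_test = train_test_split(
    m_rest, ym_rest, test_size=2/3, random_state=0)   # 10% validation, 20% test
x_train = np.concatenate([ntuh_images, m_train])      # mix the two sources
y_train = np.concatenate([ntuh_labels, ym_train])
```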
In this embodiment, to train the first model 131 to distinguish gastric from non-gastric images, the training module 136 may read a large number of raw gastroscopic images from the image server 114 as the input data set and use a convolutional neural network to learn the features of gastric locations. The training module 136 may allocate 70% of the input data set to the model's training set, 20% to the validation set, and 10% to the test set. Experiments showed that training performed best when the first model 131 used DenseNet201 as the model template. DenseNet201 is a deep neural network architecture in the DenseNet family. It is composed of dense blocks and has a very deep structure of 201 layers in total. DenseNet201 is trained on the well-known ImageNet data set and is used mainly for image classification. Compared with shallower networks, it has stronger feature-extraction capability. Its core idea is dense connectivity: the input of each layer is connected to the outputs of all preceding layers. This design lets information be passed on and reused more fully, effectively alleviates the vanishing-gradient problem, and promotes feature reuse, improving the network's efficiency and performance. The structure of DenseNet201 is relatively complex, containing multiple dense blocks and transition layers. Layers within a dense block share and pass information through direct connections, while transition layers adjust the size and number of feature maps, helping the network adapt to different input sizes and complexities. DenseNet201 performs well in many computer vision tasks, such as image classification, object detection, and semantic segmentation. Its excellent feature extraction has led to wide use in practical applications and strong results in many competitions and academic studies.
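A hedged sketch of the described 70/20/10 split and DenseNet201 fine-tuning, using TensorFlow and scikit-learn, might look as follows; the placeholder arrays, learning rate, and epoch count are illustrative.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Placeholder input data set: normalized images with gastric/non-gastric labels
images, labels = np.zeros((1000, 224, 224, 3)), np.zeros(1000)

# 70/20/10 split of the pooled input data set, as described for the first model
x_train, x_rest, y_train, y_rest = train_test_split(images, labels, test_size=0.30, random_state=0)
x_val, x_test, y_val, y_test = train_test_split(x_rest, y_rest, test_size=1/3, random_state=0)

backbone = tf.keras.applications.DenseNet201(
    include_top=False, weights="imagenet", pooling="avg", input_shape=(224, 224, 3))
out = tf.keras.layers.Dense(1, activation="sigmoid")(backbone.output)  # gastric vs. non-gastric
model = tf.keras.Model(backbone.input, out)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),  # small learning rate for fine-tuning
              loss="binary_crossentropy", metrics=["AUC"])
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10, batch_size=32)
model.evaluate(x_test, y_test)   # generalization estimate on the held-out 10%
```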
The second model 132 classifies the gastric region, so that when the inference module 138 runs it, a gastric image is sorted into the antrum, the body, or the cardia and fundus. To train the second model 132 for this function, the training module 136 may read from the image server 114 a large number of raw images collected by the large medical center and the community hospital. Of the images from the large medical center, 80% are allocated to the training set and 20% to the validation set; 10% of the images from the community hospital are allocated to the test set. The training and validation sets use data obtained from the large medical center, whose incidence approximates the national average, while the test set mainly uses data obtained from the particular community hospital known to have an incidence above the national average. Experiments showed that the DenseNet121 model trained with better performance, so DenseNet121 was selected to deploy the second model 132. Like DenseNet201, DenseNet121 belongs to the DenseNet family; it consists of multiple dense blocks and has 121 layers in total.
The third model 133 predicts and identifies precancerous gastric disease. In this embodiment, the training module 136 trains the third model 133 with all of the large medical center's data and 70% of the community hospital's data; validation uses 10% of the community hospital's data, and testing uses 20%. For example, the training module 136 may train the third model 133 on the Matsu data set alone and try multiple model templates, including eleven convolutional neural network models, EfficientNet-b0, EfficientNet-b4, EfficientNet-b6, AlexNet, VGG11, VGG19, ResNet18, ResNet50, ResNet152, DenseNet121, and DenseNet201, as well as architectures such as the Vision Transformer (ViT). Evaluating the training results of these templates by the Micro-AUC metric shows that the Vision Transformer improves performance by roughly 13-37% relative to the convolutional models, so the third model 133 may adopt the Vision Transformer for the diagnosis, validation, and testing of the precancerous lesions atrophic gastritis and intestinal metaplasia.
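The Micro-AUC metric mentioned here pools every (sample, label) decision into a single ROC curve, which scikit-learn exposes directly; the arrays below are placeholder multi-label data, not results from the patent.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Placeholder multi-label arrays: one column per severity label
y_true = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 1, 0]])
y_prob = np.array([[0.9, 0.2, 0.1], [0.3, 0.8, 0.7], [0.1, 0.4, 0.6], [0.7, 0.9, 0.2]])
# Micro-averaging pools all sample-label pairs into one ROC curve
micro_auc = roc_auc_score(y_true, y_prob, average="micro")
```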
The outcome of each model the training module 136 trains can be judged for deployment to the image analysis system 130 through a performance evaluation procedure. To assess trained models, this embodiment may use sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC). Among subjects' baseline characteristics, categorical data are reported as percentages (%) and continuous data as mean (standard deviation). These metrics verify whether the inference models correctly identify positive and negative cases and discriminate on data sets that are imbalanced overall. In embodiments of the present invention, the training module 136 may use 95% confidence intervals (CIs) to assess the statistical significance of the trained first model 131, second model 132, and third model 133. While training the three models, the training module 136 may train multiple model templates in parallel and finally, based on the AUROC each template achieves, select the model with the highest discriminative ability for deployment to the image analysis system 130 as the inference model used in practice. In other words, the first model 131, second model 132, and third model 133 deployed in the image analysis system 130 are all selected by AUROC after the training module 136 has finished training on the various model templates and input data.
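A sketch of such an evaluation for a binary decision is shown below; the patent does not specify how the 95% CI is computed, so the bootstrap used here is an assumption.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate_model(y_true, y_prob, threshold=0.5, n_boot=1000, seed=0):
    """Sensitivity, specificity, AUROC, and a bootstrap 95% CI for the AUROC."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)                          # true positive rate
    specificity = tn / (tn + fp)                          # true negative rate
    auroc = roc_auc_score(y_true, y_prob)
    rng = np.random.default_rng(seed)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample with replacement
        if len(np.unique(y_true[idx])) < 2:               # AUROC needs both classes
            continue
        boots.append(roc_auc_score(y_true[idx], y_prob[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])            # 95% confidence interval
    return sensitivity, specificity, auroc, (lo, hi)
```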
本實施例所訓練的第三模型133,可在真實環境中驗證預測值。舉例來說,研究者可將胃蛋白酶原檢測呈陽性的患者進行內視鏡檢查和組織學評估後所得結果,與醫療輔助診斷系統100的分析結果進行比對。實測證實本實施例的醫療輔助診斷系統100可正確識別瞭解剖位置的96.2%圖像和胃部區域的97.5%圖像。在組織學預測方面,診斷胃癌前病變的PPV和NPV分別為0.583(95%CI:0.468-0.690)和0.881(0.815-0.925)。 The third model 133 trained in this embodiment can verify the prediction value in a real environment. For example, researchers can compare the results of endoscopic examination and histological evaluation of patients with positive pepsinogen test results with the analysis results of the medical auxiliary diagnosis system 100. The actual test confirmed that the medical auxiliary diagnosis system 100 of this embodiment can correctly identify 96.2% of the images of the anatomical location and 97.5% of the images of the gastric area. In terms of histological prediction, the PPV and NPV of diagnosing gastric precancerous lesions are 0.583 (95% CI: 0.468-0.690) and 0.881 (0.815-0.925), respectively.
在進一步的實施例中,為了提高第三模型133的辨識性能,進行訓練前,預處理模組134還可事先為輸入的圖像數據執行對比度有限自適應直方圖等化(Contrast Limited Adaptive Histogram Equalization;CLAHE)運算,以增強圖像細節,突顯胃黏膜特徵。CLAHE是一種圖像處理技術,旨在增強 圖像的對比度。它是直方圖等化(Histogram Equalization;HE)的一種變體,用於解決HE在局部對比度增強時可能導致的過度增強雜訊的問題。CLAHE的工作原理是將圖像分成許多局部區域,然後對每個區域應用HE,從而在整個圖像中實現局部對比度增強。但與標準HE不同的是,CLAHE會限制局部區域的對比度增強,以避免產生過度增強的結果。這是透過對局部區域中像素的直方圖進行裁剪和重新分配來實現的。CLAHE的一個重要參數是“對比度限制”,用於控制局部區域中像素亮度值的等化程度。對比度限制越大,則對比度增強的效果越明顯,但可能會導致過度增強的結果。因此,適當調整對比度限制,可以獲得最佳的CLAHE效果。 In a further embodiment, in order to improve the recognition performance of the third model 133, before training, the pre-processing module 134 may also perform a contrast limited adaptive histogram equalization (CLAHE) operation on the input image data in advance to enhance image details and highlight gastric mucosal features. CLAHE is an image processing technique that aims to enhance the contrast of an image. It is a variant of histogram equalization (HE) and is used to solve the problem of excessive noise enhancement that may be caused by HE when local contrast enhancement is performed. The working principle of CLAHE is to divide the image into many local regions and then apply HE to each region, thereby achieving local contrast enhancement in the entire image. However, unlike standard HE, CLAHE limits the contrast enhancement in local areas to avoid over-enhancement. This is achieved by cropping and redistributing the histogram of pixels in the local area. An important parameter of CLAHE is "contrast limit", which is used to control the degree of equalization of pixel brightness values in the local area. The larger the contrast limit, the more obvious the contrast enhancement effect, but it may lead to over-enhancement. Therefore, appropriately adjusting the contrast limit can achieve the best CLAHE effect.
完成訓練的第一模型131,可判斷胃內視鏡圖像中的胃部位置,而第二模型132可分類出胃底、胃竇及胃體等部位。第三模型133可依據第二模型132分辨出來的圖像,進行病理嚴重程度預測(萎縮性胃炎及腸上皮化生)。在進一步衍生的實施例中,藉由不同的輸入圖像以及參數調校,第二模型132還可被訓練為可分類下咽、食道、及十二指腸等部位,而第三模型133也可進一步訓練成可對胃底、胃體、和幽門進行幽門桿菌檢測。 The trained first model 131 can determine the location of the stomach in the gastric endoscopic image, and the second model 132 can classify the gastric fundus, gastric sinus, and gastric body. The third model 133 can predict the severity of the pathology (atrophic gastritis and intestinal metaplasia) based on the image identified by the second model 132. In a further derived embodiment, through different input images and parameter adjustments, the second model 132 can also be trained to classify the hypopharynx, esophagus, and duodenum, and the third model 133 can also be further trained to detect Helicobacter pylori in the gastric fundus, gastric body, and pylorus.
在步驟207中,推測模組138接收樣本數據,並利用步驟205建立的推測模型進行人工智慧推測(Inference)。在醫療輔助診斷系統100對外開放服務時,用戶裝置120的使用者可隨時透過雲端上傳新患者的樣本數據,供影像分析系統130中的推測模組138進行人工智慧推測。推測模組138會利用第一模型131、第二模型132和第三模型133所學習的知識,有效地依序分辨所述樣本數據中的各式特徵,最後判定是否存在胃癌風險。在實作中,推測模組138所使用的硬體和作業系統架構可與訓練模組136相同或共用,可載入第一模型131、第二模型132、及第三模型133而執行對應的人工智慧推測服務。關於推測模組138進行推測的進一步詳細步驟,將於圖4中詳述。 In step 207, the inference module 138 receives the sample data and performs artificial intelligence inference using the inference model established in step 205. When the medical auxiliary diagnosis system 100 is open to the outside world, the user of the user device 120 can upload the sample data of a new patient via the cloud at any time for the inference module 138 in the image analysis system 130 to perform artificial intelligence inference. The inference module 138 will use the knowledge learned by the first model 131, the second model 132, and the third model 133 to effectively distinguish various features in the sample data in sequence, and finally determine whether there is a risk of gastric cancer. In practice, the hardware and operating system architecture used by the inference module 138 can be the same as or shared with the training module 136, and the first model 131, the second model 132, and the third model 133 can be loaded to execute the corresponding artificial intelligence inference service. The further detailed steps of the inference module 138 for inference will be described in detail in FIG. 4.
在步驟209中，影像分析系統130接收已知檢測結果的組織學樣本，以改進影像預處理參數。換句話說，使用者在真實環境中使用醫療輔助診斷系統100時，可主動提供已知檢測結果的組織學樣本給影像分析系統130，以助於驗證或改進模型效能。實測顯示，醫療輔助診斷系統100的推測模組138在診斷癌前病變時達到約80%的靈敏度和特異性。 In step 209, the image analysis system 130 receives histological samples with known test results to improve image preprocessing parameters. In other words, when using the medical auxiliary diagnosis system 100 in a real environment, the user can actively provide histological samples with known test results to the image analysis system 130 to help verify or improve model performance. In practice, the inference module 138 of the medical auxiliary diagnosis system 100 achieves a sensitivity and specificity of about 80% when diagnosing precancerous lesions.
影像分析系統130在接收到這些已知檢測結果的組織學樣本時，同樣地可先執行步驟203，對所輸入的組織學樣本圖像進行預處理，再由推測模組138執行推測程序。藉由對這些陽性個體套用第一模型131、第二模型132及第三模型133所得的分析結果，便可驗證第一模型131、第二模型132、或第三模型133的PPV和NPV效能，及其相應的95%信賴區間。 When the image analysis system 130 receives these histological samples with known test results, it can likewise first perform step 203 to preprocess the input histological sample images, and then the inference module 138 executes the inference procedure. By applying the first model 131, the second model 132, and the third model 133 to these positive individuals, the PPV and NPV performance of the first model 131, the second model 132, or the third model 133, and their corresponding 95% confidence intervals, can be verified.
圖3是本發明預處理模組134進行圖像預處理流程300的示意圖。本實施例進一步說明步驟203的分解步驟。從採樣系統110或用戶裝置120傳送至影像分析系統130的圖像格式，可能因儀器的類型、廠牌不同，畫面的解析度和信息格式皆有不同。舉例來說，傳統的胃內視鏡圖像大小規格雜亂，可能包含720*480、640*480、或512*384等各種長寬比與解析度，且圖像中經常存在時戳或浮水印等雜訊。在進行深度學習階段之前，需要對原始圖像進行預處理，以解決圖像規格不一致的問題或雜訊干擾問題。預處理的程序有助於使訓練模組136進行深度學習時，只處理包含關鍵信息的黏膜外觀圖像。 FIG. 3 is a schematic diagram of the image preprocessing process 300 performed by the preprocessing module 134 of the present invention. This embodiment further describes the sub-steps of step 203. The format of images transmitted from the sampling system 110 or the user device 120 to the image analysis system 130 may differ in resolution and information format depending on the type and brand of instrument. For example, traditional gastroendoscopic images come in inconsistent sizes, possibly including various aspect ratios and resolutions such as 720*480, 640*480, or 512*384, and the images often contain noise such as timestamps or watermarks. Before the deep learning stage, the original images need to be preprocessed to resolve inconsistent image specifications and noise interference. The preprocessing procedure helps the training module 136 process only mucosal appearance images containing key information during deep learning.
在步驟301中,預處理模組134將從採樣系統110或用戶裝置120接收到的輸入圖像轉換為灰階圖像。輸入圖像一般為彩色內視鏡圖像,在本步驟中先轉換為灰度圖像以便後續處理。 In step 301, the pre-processing module 134 converts the input image received from the sampling system 110 or the user device 120 into a grayscale image. The input image is generally a color endoscopic image, which is first converted into a grayscale image in this step for subsequent processing.
在步驟303,利用大津法閾值將灰階影像轉換為二元地圖。大津法(Otsu's method)是一種圖像處理技術,用於自動判定圖像的二元化閾值。首先,對圖像的灰度直方圖進行分析,尋找一個閾值,將圖像分為兩個類別(前景和背景),使得在該閾值下兩個類別之間的類內方差(intra-class variance)最小,而類間方差(inter-class variance)最大。最小類內方差,即為同類別最小變異值。最大類間方差,即為不同類最大變異值。換句話說,大津法所計算出來的閾值,可使得圖像中的不同類差異盡可能大,而同類差異盡可能小。 In step 303, the grayscale image is converted into a binary map using the Otsu's method threshold. Otsu's method is an image processing technique used to automatically determine the binarization threshold of an image. First, the grayscale histogram of the image is analyzed to find a threshold value to divide the image into two categories (foreground and background) so that the intra-class variance between the two categories is minimized and the inter-class variance is maximized under the threshold value. The minimum intra-class variance is the minimum variance of the same category. The maximum inter-class variance is the maximum variance of different categories. In other words, the threshold value calculated by the Otsu's method can make the difference between different categories in the image as large as possible and the difference between the same category as small as possible.
在預處理模組134以大津法計算閾值時，首先計算圖像的灰度直方圖，以統計每個灰度級別的像素數量。接著針對每個可能的閾值(即0到255)將圖像分成兩個類別，再計算每個類別的像素數量和平均灰度值。對於每個可能的閾值，分別計算類內方差和類間方差。最後，在所有閾值中，類間方差最大的閾值，即被選為最佳閾值，並被用於圖像的二元化。大津法常用於圖像分割、邊緣檢測和目標識別等領域，特別是在需要自動處理大量圖像的情況下。它是一種簡單但有效的圖像處理技術，可以幫助提高圖像處理的準確性和效率。 When the pre-processing module 134 calculates the threshold using the Otsu method, the grayscale histogram of the image is first calculated to count the number of pixels at each grayscale level. Then, for each possible threshold (i.e., 0 to 255), the image is divided into two categories, and the number of pixels and the average grayscale value of each category are calculated. For each possible threshold, the intra-class variance and the inter-class variance are calculated respectively. Finally, among all the thresholds, the threshold with the largest inter-class variance is selected as the optimal threshold and used for image binarization. The Otsu method is commonly used in fields such as image segmentation, edge detection, and target recognition, especially when a large number of images need to be processed automatically. It is a simple but effective image processing technique that can help improve the accuracy and efficiency of image processing.
在步驟305中,預處理模組134對二元地圖進行邊緣偵測以識別輸入影像中的目標區域。依據二元地圖中的物件邊緣,預處理模組134可識別出具有最大面積的區域(由x軸和y軸的最大和最小坐標確定)。 In step 305, the pre-processing module 134 performs edge detection on the binary map to identify the target area in the input image. Based on the edges of the objects in the binary map, the pre-processing module 134 can identify the area with the largest area (determined by the maximum and minimum coordinates of the x-axis and y-axis).
在步驟307中,預處理模組134依據步驟305的辨識結果,裁切輸入影像,只保留目標區域圖像。換句話說,輸入影像中不屬於目標區域的範圍被裁剪捨棄,而留下特定長寬比例的影像檔案。在一實施例中,裁剪後的影像檔案的長寬比例設定為一比一,即正方形圖像。 In step 307, the pre-processing module 134 crops the input image according to the recognition result of step 305, and only retains the target area image. In other words, the range of the input image that does not belong to the target area is cropped and discarded, leaving an image file with a specific aspect ratio. In one embodiment, the aspect ratio of the cropped image file is set to one to one, that is, a square image.
在步驟309中,預處理模組134將目標區域的裁切圖像大小正規化至預設維度。舉例來說,預處理模組134可透過圖像縮放演算法,將裁切後的圖像大小調整至方形像素尺寸256×256或128×128。這些預處理步驟確保圖像被適當地準備用於後續的深度學習訓練。 In step 309, the pre-processing module 134 normalizes the cropped image size of the target area to a preset dimension. For example, the pre-processing module 134 may resize the cropped image to a square pixel size of 256×256 or 128×128 through an image scaling algorithm. These pre-processing steps ensure that the image is properly prepared for subsequent deep learning training.
圖4是本發明實施例之一的人工智慧病理圖像推測流程400。 FIG4 is an artificial intelligence pathological image estimation process 400 of one embodiment of the present invention.
在步驟401中,推測模組138利用第一模型131從預處理圖像中去除非胃部圖像。在本實施例中,第一模型131是圖2的步驟205所訓練出來的DenseNet201模型。 In step 401, the inference module 138 removes non-stomach images from the pre-processed images using the first model 131. In this embodiment, the first model 131 is the DenseNet201 model trained in step 205 of FIG. 2.
在步驟403中,推測模組138利用第二模型132對胃部圖像進行區域分類。 In step 403, the estimation module 138 uses the second model 132 to perform regional classification on the stomach image.
在步驟405中,推測模組138利用第三模型133對各區域胃部圖像進行組織學預測。 In step 405, the estimation module 138 uses the third model 133 to perform histological prediction on each regional stomach image.
在步驟407中，依據預測結果於胃部圖像上疊加熱圖。為了更深入地瞭解第一模型131、第二模型132、第三模型133的推測過程，本實施例還利用梯度加權類激活映射(Gradient-weighted Class Activation Mapping;Grad-CAM)運算來生成熱圖(Heatmap)，將輸入圖像中的重要區域視覺化呈現。Grad-CAM是一種深度學習模型預測的視覺呈現方法，透過可視化神經網路中每個特徵圖的重要性，來解釋模型對於特定類別的預測依據。Grad-CAM使用了反向傳播中的梯度信息來計算每個特徵圖對於模型最終預測的重要性。這些梯度可以反映出在預測過程中每個特徵圖的激活程度對於特定類別的貢獻。Grad-CAM運算可由影像分析系統130中的預處理模組134來達成，也可以是由推測模組138進行運算而處理。舉例來說，推測模組138在進行推測時，可根據特定類別的預測結果和梯度信息，生成類激活映射，以顯示在輸入圖像中哪些區域對於模型最終的預測具有重要性。這些區域通常與模型在預測特定類別時關注的區域相關聯。透過將類激活映射與原始圖像進行叠加，使輸入圖像上産生額外的色偏視覺效果，即可用於強調值得關注的圖像區域。這有助於理解模型的推測過程，也能判斷人工智慧辨識效果是否與專家學者的經驗符合。 In step 407, a heat map is superimposed on the stomach image according to the prediction result. In order to have a deeper understanding of the inference process of the first model 131, the second model 132, and the third model 133, the present embodiment also uses the gradient-weighted class activation mapping (Grad-CAM) operation to generate a heat map to visualize the important regions in the input image. Grad-CAM is a visual presentation method for deep learning model predictions, which explains the prediction basis of the model for a specific category by visualizing the importance of each feature map in the neural network. Grad-CAM uses the gradient information in the back propagation to calculate the importance of each feature map to the final prediction of the model. These gradients can reflect the contribution of the activation degree of each feature map to a specific category during the prediction process. The Grad-CAM operation can be performed by the pre-processing module 134 in the image analysis system 130, or it can be processed by the inference module 138. For example, when the inference module 138 makes inferences, it can generate a class activation map based on the prediction results and gradient information of a specific category to show which areas in the input image are important for the model's final prediction. These areas are usually associated with the areas that the model focuses on when predicting a specific category. By superimposing the class activation map on the original image, an additional color-shift visual effect is generated on the input image, which can be used to emphasize the image areas that deserve attention. This helps to understand the inference process of the model and to determine whether the artificial intelligence recognition results are consistent with the experience of expert scholars.
在高風險人群中診斷癌前胃部疾病的先前技術中，通常過度依賴大量同質化的資料來源，例如從資源豐富的大型醫院中篩選的圖像。本發明所提出的醫療輔助診斷系統100，不但整合了人工智慧模型來模擬醫學判斷思維過程，還將資源有限地區的醫療數據納入分析。本發明的實作方式帶來幾個優點。舉例來說，傳統內視鏡圖像的品質，如接近度和模糊度，會影響圖像分類的逐步過程。本發明的實施例提供熱圖疊加的效果，可輔助內視鏡醫師判斷醫療輔助診斷系統100產出的推測結果。在傳統做法中，內視鏡圖像的取樣角度誤差，會影響人工智慧系統的判別能力。本發明除了遵循組織學標準採樣程序，還透過圖像預處理程序產生正規化的圖像數據，以有效減少誤差變數。本發明實施例在利用第三模型133推測癌前胃部疾病時，還增強黏膜圖像細節，有效輔助內視鏡圖像分析工作。此外，本實施例的人工智慧預測模型在對樣本進行推測分析時，將癌前狀況納入考慮。實測結果顯示，系統在整體患病率下具有較高的陰性預測值(NPV)。這表示本發明的醫療輔助診斷系統100可有效減少低風險胃癌患者進行內視鏡監測的要求，節省了不必要的醫療成本。 Prior art for diagnosing precancerous gastric diseases in high-risk populations often relies too heavily on large, homogeneous data sources, such as images screened from large hospitals with abundant resources. The medical auxiliary diagnosis system 100 proposed in the present invention not only integrates an artificial intelligence model to simulate the medical judgment thinking process, but also incorporates medical data from resource-limited areas into the analysis. The implementation of the present invention brings several advantages. For example, the quality of traditional endoscopic images, such as proximity and blur, affects the step-by-step process of image classification. An embodiment of the present invention provides a heat map overlay effect that can assist endoscopists in judging the inference results produced by the medical auxiliary diagnosis system 100. In traditional practice, sampling angle errors of endoscopic images affect the discrimination ability of an artificial intelligence system. In addition to following the standard histological sampling procedure, the present invention also generates normalized image data through an image preprocessing procedure to effectively reduce error variables. When using the third model 133 to infer precancerous gastric diseases, the embodiment of the present invention also enhances the details of the mucosal image, effectively assisting endoscopic image analysis. In addition, the artificial intelligence prediction model of this embodiment takes precancerous conditions into consideration when performing inference analysis on samples. Field results show that the system has a high negative predictive value (NPV) at the overall prevalence. This means that the medical auxiliary diagnosis system 100 of the present invention can effectively reduce the need for endoscopic monitoring of low-risk gastric cancer patients, saving unnecessary medical costs.
本發明的醫療輔助診斷系統100是一個全面性的解決方案,技術特徵涵蓋數據收集、模型開發、和現場實作。相較於僅在模型開發過程中進行標註,醫療輔助診斷系統100可在實作中依據醫生臨床經驗的反饋,來提升辨識準確率。醫療輔助診斷系統100還包含端到端(end-to-end)服務的概念,能夠從影像服務器114中儲存的內視鏡圖像中提取資訊。影像分析系統130可以彈性運用圖像診斷或檢測相關人工智慧演算法。本發明的醫療輔助診斷系統100採用數據的方式具有特殊性。傳統的訓練數據主要來自大型醫學中心。在 研究人員對解剖位置和胃部區域進行分類時,社區醫院所提供的訓練數據尚可通用。然而在進行組織學分級時,必須對樣本中細微黏膜變化有較高敏感度。不同來源的圖像數據中的技術參數或圖像品質誤差,容易造成辨識誤差。 The medical-assisted diagnosis system 100 of the present invention is a comprehensive solution, and its technical features cover data collection, model development, and field implementation. Compared with labeling only during the model development process, the medical-assisted diagnosis system 100 can improve the recognition accuracy in implementation based on the feedback of the doctor's clinical experience. The medical-assisted diagnosis system 100 also includes the concept of end-to-end services, which can extract information from endoscopic images stored in the image server 114. The image analysis system 130 can flexibly use image diagnosis or detection related artificial intelligence algorithms. The medical auxiliary diagnosis system 100 of the present invention uses data in a particular way. Traditional training data mainly comes from large medical centers. When researchers classify anatomical locations and gastric regions, the training data provided by community hospitals can be used universally. However, when performing histological grading, it is necessary to have a high sensitivity to subtle mucosal changes in the sample. Technical parameter or image quality errors in image data from different sources can easily cause identification errors.
本發明的實施例為了解決上述技術瓶頸,在訓練過程中納入來自社區醫院的數據,並利用具有自關注機制的模型來捕捉補丁(Patch)之間的關係,從而使預測結果更加可靠。本發明的醫療輔助診斷系統100在不同數據品質和分布的區域之間使用聯邦學習(Federated Learning)來增強模型的泛化能力。聯邦學習是一種機器學習的分散式訓練方法,旨在保護數據隱私的同時,實現模型的集中訓練。相對於傳統的集中式訓練,聯邦學習可將訓練過程推送到數據所在的本地設備上進行,而不是將數據收集到一個中心化的服務器上。在聯邦學習中,每個參與方都會在本地訓練一個本地模型,使用本地數據進行訓練。然後,本地模型的更新被匿名地匯總到中心服務器上,形成全局模型。中心服務器將全局模型的更新分發給所有參與方,並重複此過程直到模型收斂或達到停止條件。聯邦學習的主要優點之一是保護了用戶的數據隱私,因為數據始終保留在本地,不需要傳輸到中心服務器上。這種方法有助於避免了數據泄露和侵犯隱私的風險。同時,聯邦學習還有助於解決數據分散和不平衡的問題,因為每個參與方都可以使用自己的本地數據進行訓練,而不需要將數據集中在一個地方。 In order to solve the above-mentioned technical bottlenecks, the embodiment of the present invention incorporates data from community hospitals in the training process, and uses a model with a self-attention mechanism to capture the relationship between patches, thereby making the prediction results more reliable. The medical-assisted diagnosis system 100 of the present invention uses federated learning between areas of different data quality and distribution to enhance the generalization ability of the model. Federated learning is a distributed training method for machine learning that aims to achieve centralized training of models while protecting data privacy. Compared with traditional centralized training, federated learning can push the training process to the local device where the data is located, instead of collecting the data on a centralized server. In federated learning, each participant trains a local model locally, using local data for training. Then, updates to the local model are anonymously aggregated on the central server to form a global model. The central server distributes updates to the global model to all participants, and repeats this process until the model converges or a stopping condition is reached. One of the main advantages of federated learning is that it protects the user's data privacy, because the data is always kept locally and does not need to be transmitted to the central server. This approach helps avoid the risk of data leakage and privacy violations. At the same time, federated learning also helps solve the problem of data dispersion and imbalance, because each participant can use their own local data for training without the need to centralize the data in one place.
本發明的醫療輔助診斷系統100還為每個胃底和胃體圖像提供了完整的組織學預測，效能超出傳統胃黏膜病變評分系統(Operative Link on Gastritis Assessment;OLGA)或胃黏膜腸上皮化生評分系統(Operative Link on Gastric Intestinal Metaplasia;OLGIM)的要求。本系統產生的AI生成信息，可做為進一步長期研究的素材，持續增強預測胃癌風險的能力。 The medical auxiliary diagnosis system 100 of the present invention also provides a complete histological prediction for each gastric fundus and gastric body image, exceeding the requirements of the traditional Operative Link on Gastritis Assessment (OLGA) or Operative Link on Gastric Intestinal Metaplasia (OLGIM) staging systems. The AI-generated information produced by this system can be used as material for further long-term research to continuously enhance the ability to predict gastric cancer risk.
綜上所述,本發明的醫療輔助診斷系統100可在遠程醫療中作為一個有價值的深度學習服務系統,特別是用於診斷資源有限地區的癌前胃部狀況及幽門螺旋桿菌感染。本系統提高了醫療服務的可及性,並使有限資源得以最優化地分配給那些真正需要進行程序的人群。 In summary, the medical auxiliary diagnosis system 100 of the present invention can be used as a valuable deep learning service system in remote medical care, especially for diagnosing precancerous gastric conditions and Helicobacter pylori infection in areas with limited resources. The system improves the accessibility of medical services and enables limited resources to be optimally allocated to those who really need procedures.
圖5是本發明實施例的第二模型132。在實作上,第二模型132是在DenseNet121神經網路架構的基礎上修改而成。輸入圖像$IN經過特徵提取模組501和分類模組502兩個階段處理後,產生輸出結果$OUT。 FIG5 is a second model 132 of an embodiment of the present invention. In practice, the second model 132 is modified based on the DenseNet121 neural network architecture. The input image $IN is processed by the feature extraction module 501 and the classification module 502 to generate the output result $OUT.
特徵提取模組501的主要功能是自輸入圖像$IN中提取有意義的特徵。在本實施例中,特徵提取模組501通過神經網絡的多層迭代,產生可代表輸入圖像$IN的高層次摘要,用於供分類模組502進行分類。在本實施例中,特徵提取模組501中包含多個運算模塊,依照處理輸入圖像$IN的資料流順序,包含卷積模組510、最大池化層512、第一密集塊514、卷積模組520、平均池化層522、第二密集塊524、卷積模組530、平均池化層532、第三密集塊534、卷積模組540、平均池化層542、以及第四密集塊544依序串接。 The main function of the feature extraction module 501 is to extract meaningful features from the input image $IN. In this embodiment, the feature extraction module 501 generates a high-level summary that can represent the input image $IN through multiple layers of neural network iterations for classification by the classification module 502. In this embodiment, the feature extraction module 501 includes a plurality of operation modules, which are connected in sequence according to the data flow order of processing the input image $IN, including a convolution module 510, a maximum pooling layer 512, a first dense block 514, a convolution module 520, an average pooling layer 522, a second dense block 524, a convolution module 530, an average pooling layer 532, a third dense block 534, a convolution module 540, an average pooling layer 542, and a fourth dense block 544.
卷積模組510、卷積模組520、卷積模組530、以及卷積模組540泛指類神經網路中的卷積層(convolution layer),可執行卷積運算而提取輸入數據中的特徵的關鍵部分。卷積運算就是在輸入數據上滑動一個小的濾波器(也稱為卷積核或過濾器),計算濾波器與輸入數據之間的點積,以産生輸出特徵圖(feature map)。所述濾波器的參數可通過迭代訓練而更新優化。通過對輸入圖像$IN進行多次迭代的卷積運算,特徵提取模組501能够從數據中提取出各種不同的特徵,例如邊緣、紋理、形狀等,是分類模組502進行後續分類的基礎。 Convolution module 510, convolution module 520, convolution module 530, and convolution module 540 generally refer to convolution layers in a neural network, which can perform convolution operations to extract key features from input data. The convolution operation is to slide a small filter (also called a convolution kernel or filter) on the input data, calculate the dot product between the filter and the input data, and generate an output feature map. The parameters of the filter can be updated and optimized through iterative training. By performing multiple iterative convolution operations on the input image $IN, the feature extraction module 501 can extract various features from the data, such as edges, textures, shapes, etc., which are the basis for the classification module 502 to perform subsequent classification.
本實施例的視覺轉換器500是DenseNet架構的改良版。每個卷積模組的運算結果,在進入下個卷積模組之前,還經過池化(Pooling)和密集塊(Dense Block)的處理。舉例來說,卷積模組510的輸出,經過最大池化層512和第一密集塊514的處理,才傳給卷積模組520。其後依此類推。 The visual converter 500 of this embodiment is an improved version of the DenseNet architecture. The calculation result of each convolution module is processed by pooling and dense blocks before entering the next convolution module. For example, the output of the convolution module 510 is processed by the maximum pooling layer 512 and the first dense block 514 before being passed to the convolution module 520. And so on.
如圖5所示，最大池化層512耦接卷積模組510的輸出端，設置為可減小卷積模組510輸出的特徵圖的空間尺寸，同時保留重要的特徵。換句話說，特徵圖中的每個區域內的最大值被保留，而其他值被丟棄，從而實現對特徵圖的壓縮。最大池化層512有助於減少模型的參數數量和計算量，同時提高特徵的位置不變性，使得第二模型132對於輸入圖像中的微小變化具有更好的魯棒性。 As shown in FIG. 5, the maximum pooling layer 512 is coupled to the output of the convolution module 510 and is configured to reduce the spatial size of the feature map output by the convolution module 510 while retaining important features. In other words, the maximum value in each region of the feature map is retained while other values are discarded, thereby compressing the feature map. The maximum pooling layer 512 helps reduce the number of parameters and the amount of computation of the model, while improving the position invariance of features, so that the second model 132 is more robust to small changes in the input image.
第一密集塊514耦接最大池化層512的輸出端,可與其後的卷積模組520、卷積模組530、及卷積模組540形成直接連結,接收後續卷積模組520、卷積模組530(連接線未圖示)和卷積模組540(連接線未圖示)向前回饋的特徵圖。第一密集塊514將卷積模組510經過最大池化層512輸出的特徵圖,以及卷積模組520、卷積模組530、和卷積模組540向前回饋的特徵圖,以通道維度堆疊(Concatenate)在一起,成為卷積模組520的輸入值。密集塊的設置是DenseNet的核心特色,作用在於促進信息流動和特徵重用。由於每一層卷積模組的輸出串接到前面所有卷積模組的輸入端,可以增强特徵的表達能力,減輕梯度消失問題,使得模型更易於訓練和優化。 The first dense block 514 is coupled to the output end of the maximum pooling layer 512, and can be directly connected to the subsequent convolution module 520, the convolution module 530, and the convolution module 540, and receives the feature map fed back by the subsequent convolution module 520, the convolution module 530 (connection line is not shown), and the convolution module 540 (connection line is not shown). The first dense block 514 concatenates the feature map output by the convolution module 510 through the maximum pooling layer 512 and the feature map fed back by the convolution module 520, the convolution module 530, and the convolution module 540 in the channel dimension to form the input value of the convolution module 520. The setting of dense blocks is the core feature of DenseNet, which is used to promote information flow and feature reuse. Since the output of each layer of convolutional modules is connected in series to the input of all previous convolutional modules, the expressive power of features can be enhanced, the gradient vanishing problem can be alleviated, and the model can be easier to train and optimize.
平均池化層522耦接卷積模組520的輸出端,用於降低特徵圖的空間尺寸,從而减少模型的參數數量和計算量。平均池化層522將輸入區域內的像素值取平均,將特徵圖的空間尺寸降低,從而减少模型的參數數量和計算量。這有助减輕模型的過擬合問題,並提高模型的計算效率。與最大池化層512不同的是,平均池化層522會保留輸入區域內所有像素值的信息,而不僅僅是最大值。這意味著平均池化層522更注重整體特徵的平均表達,而不是局部最顯著特徵的提取。更進一步說,平均池化層522可將特徵圖中的局部變化平緩化,而減低了噪聲對特徵圖的影響。 The average pooling layer 522 is coupled to the output end of the convolution module 520 and is used to reduce the spatial size of the feature map, thereby reducing the number of parameters and the amount of calculation of the model. The average pooling layer 522 averages the pixel values in the input area and reduces the spatial size of the feature map, thereby reducing the number of parameters and the amount of calculation of the model. This helps to alleviate the overfitting problem of the model and improve the computational efficiency of the model. Unlike the maximum pooling layer 512, the average pooling layer 522 retains the information of all pixel values in the input area, not just the maximum value. This means that the average pooling layer 522 pays more attention to the average expression of the overall features rather than the extraction of the most significant local features. Furthermore, the average pooling layer 522 can smooth the local changes in the feature map and reduce the impact of noise on the feature map.
第二密集塊524耦接平均池化層522的輸出端，可與其後的卷積模組530、及卷積模組540形成直接連結，接收後續卷積模組530(連接線未圖示)和卷積模組540(連接線未圖示)向前回饋的特徵圖。圖5中後續平均池化層532和平均池化層542的運作與平均池化層522相同，第二密集塊524、第三密集塊534、第四密集塊544的運作與第一密集塊514相同，而卷積模組530和卷積模組540運作與卷積模組520相同，不再贅述。經過上述元件的迭代運算後，特徵提取模組501產生$IN的特徵圖，並傳送至分類模組502。 The second dense block 524 is coupled to the output end of the average pooling layer 522, and can be directly connected to the subsequent convolution module 530 and the convolution module 540, receiving the feature maps fed back by the subsequent convolution module 530 (connection line not shown) and the convolution module 540 (connection line not shown). The operation of the subsequent average pooling layer 532 and the average pooling layer 542 in FIG. 5 is the same as that of the average pooling layer 522, the operation of the second dense block 524, the third dense block 534, and the fourth dense block 544 is the same as that of the first dense block 514, and the operation of the convolution module 530 and the convolution module 540 is the same as that of the convolution module 520, which will not be repeated. After iterative calculations of the above components, the feature extraction module 501 generates the feature map of $IN and transmits it to the classification module 502.
綜上所述,圖5的特徵提取模組501實作了四層卷積神經網路以利特徵提取,但不實作全連接層。分類模組502中所使用的超參數,可透過超帶寬演算法(Hyperband)權衡每組超參數的運算資源,找出最佳的超參數,做為後續分類模組502的設置依據。Hyperband是用於優化超參數的高效算法, 它結合了隨機搜索和資源分配策略,可在有限的計算資源下,以不同組合有效地找到最佳的超參數組合。 In summary, the feature extraction module 501 of FIG5 implements a four-layer convolutional neural network for feature extraction, but does not implement a fully connected layer. The hyperparameters used in the classification module 502 can be weighed by the hyperband algorithm (Hyperband) to find the best hyperparameters as the setting basis for the subsequent classification module 502. Hyperband is an efficient algorithm for optimizing hyperparameters. It combines random search and resource allocation strategies to effectively find the best hyperparameter combination with different combinations under limited computing resources.
在本實施例中,輸入圖像$IN可以是影像服務器114所提供的胃內視鏡圖像,例如圖2所述的第一數據和第二數據。這些圖像經由預處理模組134預處理後,傳送至特徵提取模組501進行訓練。特徵提取模組501進行卷積神經網路運算所需要的初始權重,可採用已知的影像網路ImageNet模型所預先訓練而得的權重。權重值會隨著特徵提取模組501學習圖像的過程逐步微調。 In this embodiment, the input image $IN can be a gastric endoscopic image provided by the image server 114, such as the first data and the second data described in FIG2. These images are pre-processed by the pre-processing module 134 and then transmitted to the feature extraction module 501 for training. The initial weights required for the feature extraction module 501 to perform convolutional neural network operations can be weights pre-trained by the known image network ImageNet model. The weight values will be gradually fine-tuned as the feature extraction module 501 learns the image.
分類模組502的主要功能是為輸入圖像$IN進行分類。分類模組502可依據特徵提取模組501產生的特徵圖,計算輸入圖像$IN屬於每一類別的機率。在深度學習中,分類模組502通常會在特徵提取模組501之後設置全連接層來實現分類功能。本實施例的分類模組502中包含全局池化層550、第一全連接層552和第二全連接層554。 The main function of the classification module 502 is to classify the input image $IN. The classification module 502 can calculate the probability that the input image $IN belongs to each category based on the feature map generated by the feature extraction module 501. In deep learning, the classification module 502 usually sets a fully connected layer after the feature extraction module 501 to implement the classification function. The classification module 502 of this embodiment includes a global pooling layer 550, a first fully connected layer 552 and a second fully connected layer 554.
全局池化層550(Global Pooling)是一種池化操作,可將特徵提取模組501提供的特徵圖的空間尺寸降低到一個固定大小的向量。與局部池化不同,全域池化不考慮區域,而是直接作用於整個特徵圖。在一實施例中,全局池化層550進行的是全域平均池化(Global Average Pooling)運算,對整個特徵圖中的每個通道進行平均池化,得到每個通道的平均值。最終得到的向量表示整個特徵圖的平均特徵。全局池化層550將整個特徵圖的信息綜合到一個固定大小的向量中,可提高模型的計算效率和泛化能力。 The global pooling layer 550 (Global Pooling) is a pooling operation that can reduce the spatial size of the feature map provided by the feature extraction module 501 to a fixed-size vector. Unlike local pooling, global pooling does not consider the region, but directly acts on the entire feature map. In one embodiment, the global pooling layer 550 performs a global average pooling operation, which performs average pooling on each channel in the entire feature map to obtain the average value of each channel. The final vector represents the average feature of the entire feature map. The global pooling layer 550 integrates the information of the entire feature map into a fixed-size vector, which can improve the computational efficiency and generalization ability of the model.
全局池化層550之後連接了第一全連接層552和第二全連接層554。第一全連接層552是64階層的全連接層,而第二全連接層554是480階層的全連接層。全連接層(Fully Connected Layer),也稱為密集連接層,是深度學習模型中的一種常見結構。第一全連接層552和第二全連接層554位於視覺轉換器500的末尾,用於將前面各層的特徵進行整合和組合,從而輸出最終的預測結果。 The global pooling layer 550 is connected to the first fully connected layer 552 and the second fully connected layer 554. The first fully connected layer 552 is a 64-layer fully connected layer, and the second fully connected layer 554 is a 480-layer fully connected layer. A fully connected layer, also known as a densely connected layer, is a common structure in a deep learning model. The first fully connected layer 552 and the second fully connected layer 554 are located at the end of the visual converter 500 and are used to integrate and combine the features of the previous layers to output the final prediction result.
在第一全連接層552和第二全連接層554中，每個神經元與前一層的所有神經元都有連接，每個連接都有一個權重，這樣可以將前一層的所有信息都傳遞給全連接層。第一全連接層552和第二全連接層554中的每個神經元都可以學習到輸入數據的不同方面或特徵，從而使得模型能够學習到更加複雜的非綫性關係，並對輸入數據進行更精確的建模。由於第一全連接層552和第二全連接層554中的每個神經元都與前一層的所有神經元都有連接，參數量較大，容易出現過擬合，本發明的實施例採用兩種不同階數的全連接層串接，適當調節分類模組502的計算量。 In the first fully connected layer 552 and the second fully connected layer 554, each neuron is connected to all neurons in the previous layer, and each connection has a weight, so that all information of the previous layer can be transmitted to the fully connected layer. Each neuron in the first fully connected layer 552 and the second fully connected layer 554 can learn different aspects or features of the input data, so that the model can learn more complex nonlinear relationships and model the input data more accurately. Since each neuron in the first fully connected layer 552 and the second fully connected layer 554 is connected to all neurons in the previous layer, the number of parameters is large and overfitting is prone to occur. The embodiment of the present invention uses two fully connected layers of different orders connected in series to appropriately adjust the calculation amount of the classification module 502.
在一實施例中,分類模組502可採用SoftMax激活函數輸出3個胃部位置的機率值。SoftMax激活函數可將一個具有任意實數值的向量映射到一個概率分布,因此很適合用來解決分類問題。舉例來說,經由第一全連接層552和第二全連接層554的輸出結果,通常是矩陣向量,經由Softmax激活函數轉換後,即可顯示多種答案的概率分布。 In one embodiment, the classification module 502 may use the SoftMax activation function to output the probability values of the three stomach positions. The SoftMax activation function can map a vector with arbitrary real values to a probability distribution, so it is very suitable for solving classification problems. For example, the output results of the first fully connected layer 552 and the second fully connected layer 554 are usually matrix vectors. After being converted by the Softmax activation function, the probability distribution of multiple answers can be displayed.
在實作中，SoftMax激活函數的公式如下: In practice, the formula of the SoftMax activation function is as follows:

$$\sigma(\mathbf{X})_i = \frac{e^{X_i}}{\sum_{k=1}^{K} e^{X_k}}, \qquad i = 1, \dots, K$$
其中,分母是一個K維實數向量中所有元素X1到XK的指數函數值總和,而分子是第i個元素Xi的自然數指數值。相除後得到第i個元素Xi的發生概率。SoftMax激活函數的作用是將輸入的K維向量轉換成一個概率分布,每個元素Xi以自然數指數化後,可正規化為一個概率值。SoftMax激活函數的輸出值範圍在0到1之間,而所有輸出的和為1,因此可以解釋為一個概率分布。倘若每一個元素代表一個類別,則SoftMax激活函數可應用於多分類問題的概率預測。 Among them, the denominator is the sum of the exponential function values of all elements X1 to XK in a K-dimensional real vector, and the numerator is the natural number exponential value of the i-th element Xi. After division, the probability of occurrence of the i-th element Xi is obtained. The function of the SoftMax activation function is to convert the input K-dimensional vector into a probability distribution. After each element Xi is exponentialized with a natural number, it can be normalized to a probability value. The output value of the SoftMax activation function ranges from 0 to 1, and the sum of all outputs is 1, so it can be interpreted as a probability distribution. If each element represents a category, the SoftMax activation function can be applied to the probability prediction of multi-classification problems.
本實施例的視覺轉換器500可採用轉移訓練法(Transfer Learning),利用已知的模型樣板搭配微調參數,以加速模型訓練。在本實施例中,用於訓練視覺轉換器500的超參數如下。批次大小(Batch size)設為32。最大迭代次數(Epoch)設為50。損失函數(Loss function)選用Categorical cross-entropy用於多類別分類任務。優化器(Optimizer)選用Adam。學習率(Learning rate)初始值設為0.001,其中使用回調函式(ReduceLROnPlateau)並將耐心(Patience)設為3,步進係數(Step Factor)設為0.2,藉此觀測驗證損失(Validation loss)。 The visual converter 500 of this embodiment can adopt transfer training method (Transfer Learning), using known model templates with fine-tuning parameters to accelerate model training. In this embodiment, the hyperparameters used to train the visual converter 500 are as follows. The batch size (Batch size) is set to 32. The maximum number of iterations (Epoch) is set to 50. The loss function (Loss function) uses Categorical cross-entropy for multi-category classification tasks. The optimizer (Optimizer) uses Adam. The initial value of the learning rate (Learning rate) is set to 0.001, in which the callback function (ReduceLROnPlateau) is used and the patience (Patience) is set to 3, and the step factor (Step Factor) is set to 0.2, thereby observing the validation loss (Validation loss).
回調函式ReduceLROnPlateau的作用是根據驗證集上的表現，動態地調整學習率，以幫助優化算法更快地收斂或者避免陷入局部最小值。驗證損失是一種性能指標，用於評估模型在驗證集上的預測結果與真實標籤之間的差異。驗證損失可採用交叉熵損失函數來計算。較低的驗證損失表示模型在驗證集上的預測更接近真實情况，即模型更準確。 The callback function ReduceLROnPlateau is used to dynamically adjust the learning rate according to the performance on the validation set to help the optimization algorithm converge faster or avoid falling into a local minimum. Validation loss is a performance indicator used to evaluate the difference between the model's prediction results on the validation set and the true label. Validation loss can be calculated using the cross entropy loss function. A lower validation loss means that the model's prediction on the validation set is closer to the actual situation, that is, the model is more accurate.
圖6是本發明的視覺轉換器模型600實施例,用於說明第三模型133的訓練流程。 FIG6 is an embodiment of the visual converter model 600 of the present invention, which is used to illustrate the training process of the third model 133.
Google團隊於2017年提出具有自關注機制(Self-Attention)的轉換器(Transformer)神經網路,使模型可平行化訓練,獲得全域性資訊,成為自然語言處理的經典模型。視覺轉換器(Vision Transformer;ViT)於2021年誕生,在轉換器網路的基礎上,進一步突破自然語言處理與電腦視覺的屏障。視覺轉換器適用於圖像分類和其他計算機視覺任務。ViT同樣使用自關注機制(Self-Attention)來捕獲圖像中的全局和局部信息,因此運算不受卷積核大小的限制。ViT模型的基本結構是一個由多個轉換器模塊組成的堆疊,其中每個轉換器模塊由多個關注頭組成。每個關注頭都會將輸入特徵圖中的每個位置與所有其他位置進行比較,並根據它們的相關性為每個位置分配一個權重。這樣,模型可以學習到圖像中不同位置之間的關係,從而有效地捕獲全局和局部信息。ViT模型的結構簡單,易於擴展到不同的圖像尺寸和任務。此外,為了處理圖像中的位置信息,ViT還引入了位置編碼技術,將每個位置的坐標信息嵌入到特徵表示中。由於ViT模型透過多個關注頭同時處理不同尺度的信息,因此可提高模型的泛化能力。ViT模型在繪圖視覺任務上具備優異的性能,包括圖像分類、目標檢測、語義分割等。本實施例將自關注機制應用於第三模型133中,從而有效識別輸入圖像中是否存在萎縮性胃炎和腸上皮化生的癌前病變。 In 2017, the Google team proposed a transformer neural network with a self-attention mechanism, which enabled the model to be trained in parallel and obtain global information, becoming a classic model for natural language processing. The Vision Transformer (ViT) was born in 2021. Based on the transformer network, it further broke through the barrier between natural language processing and computer vision. The Vision Transformer is suitable for image classification and other computer vision tasks. ViT also uses a self-attention mechanism to capture global and local information in the image, so the operation is not limited by the size of the convolution kernel. The basic structure of the ViT model is a stack of multiple transformer modules, each of which consists of multiple attention heads. Each attention head compares each position in the input feature map with all other positions and assigns a weight to each position based on their relevance. In this way, the model can learn the relationship between different positions in the image, thereby effectively capturing global and local information. The ViT model has a simple structure and is easy to expand to different image sizes and tasks. In addition, in order to process the position information in the image, ViT also introduces position encoding technology to embed the coordinate information of each position into the feature representation. Because the ViT model processes information of different scales simultaneously through multiple attention heads, the generalization ability of the model can be improved. The ViT model has excellent performance in drawing vision tasks, including image classification, object detection, semantic segmentation, etc. This embodiment applies the self-attention mechanism to the third model 133, thereby effectively identifying whether there are precancerous lesions of atrophic gastritis and intestinal metaplasia in the input image.
相對於卷積神經網路CNN，ViT普遍缺乏平移關聯性及局部性資訊，且需要大規模資料集進行轉移學習，才可達相當成效。為解決此問題，本發明的實施例在傳統的ViT流程中整合了平移補丁令符化(Shifted patch tokenization)以及局部自關注(Locality self-attention)機制，形成如圖6的視覺轉換器模型600。 Compared with convolutional neural networks (CNNs), ViT generally lacks translational relevance and locality information, and requires large-scale data sets for transfer learning to achieve considerable results. To solve this problem, the embodiment of the present invention integrates shifted patch tokenization and locality self-attention mechanisms into the traditional ViT pipeline to form the visual converter model 600 as shown in FIG. 6.
圖6將視覺轉換器模型分解為多道處理程序,分層說明。可以理解各層內容實質上為邏輯執行步驟,並非限定硬體或軟體實作方式。 Figure 6 decomposes the visual converter model into multiple processing procedures and explains them in layers. It can be understood that the contents of each layer are essentially logical execution steps and do not limit the hardware or software implementation method.
在圖6中,平移補丁層610接收輸入圖像602,將輸入圖像602沿四邊對角線(左上、右上、左下、右下)位移形成四種位移圖像。平移補丁層610再將這四張位移圖像與輸入圖像602堆疊在一起,成為一個平移堆疊圖。 In FIG. 6 , the translation patch layer 610 receives the input image 602 and shifts the input image 602 along the four diagonal lines (upper left, upper right, lower left, and lower right) to form four shifted images. The translation patch layer 610 then stacks the four shifted images with the input image 602 to form a translation stacked image.
圖像切割層620接續著平移補丁層610的輸出端,將上述平移堆疊圖切成多個補丁(patch),展平成一維序列。每個補丁代表輸入圖像602中的一個局部區域。圖像切割層620還可包含線性投影和正規化操作,使每一個補丁對應一個固定長度的特徵向量,類似自然語言處理的令符(token)。 The image cutting layer 620 follows the output of the translation patch layer 610, cutting the above translation stack into multiple patches and flattening it into a one-dimensional sequence. Each patch represents a local area in the input image 602. The image cutting layer 620 may also include linear projection and normalization operations, so that each patch corresponds to a feature vector of a fixed length, similar to a token in natural language processing.
位置嵌入層630接收圖像切割層620輸出的多個補丁後,為每一補丁加上位置資訊。在實作中,位置嵌入層630可將每個補丁的位置編碼添加到補丁的特徵向量中。 After receiving multiple patches output by the image segmentation layer 620, the position embedding layer 630 adds position information to each patch. In practice, the position embedding layer 630 can add the position code of each patch to the feature vector of the patch.
多層編碼器640連接位置嵌入層630的輸出端,對所輸入的多個補丁進行特徵提取。多層編碼器640中可包含多頭關注(Multi-head attention)模組,形成多層迭代架構,透過不斷地進行自關注運算以及正規化運算,逐步學習並提取每個補丁的重要特徵。多層編碼器640中的其中一或多個多頭關注模組,可進行局部自關注運算。局部自關注運算只為每個補丁計算與相鄰補丁的關聯性,而不處理非相鄰補丁。藉此,多層編碼器640可優先強調關聯性較大之局部重要區域,使模型能够更有效地捕獲圖像中的特徵向量矩陣。 The multi-layer encoder 640 is connected to the output end of the position embedding layer 630 to extract features from the input multiple patches. The multi-layer encoder 640 may include a multi-head attention module to form a multi-layer iterative architecture. By continuously performing self-attention operations and normalization operations, the important features of each patch are gradually learned and extracted. One or more multi-head attention modules in the multi-layer encoder 640 can perform local self-attention operations. The local self-attention operation only calculates the correlation between each patch and its neighboring patches, and does not process non-neighboring patches. In this way, the multi-layer encoder 640 can prioritize the local important areas with greater correlation, so that the model can more effectively capture the feature vector matrix in the image.
多層感知頭650銜接多層編碼器640的輸出端，做為視覺轉換器模型600的最後一道處理關卡，可以將多層編碼器640處理補丁後產出的特徵向量矩陣池化，以得到可代表整個輸入圖像602的機率分布矩陣604。在本實施例的視覺轉換器模型600中，多層感知頭650可以是位於深度學習模型頂部的一個或多個全連接層，用於將模型學習到的特徵表示轉換為最終的任務預測或分類結果。多層感知頭650的設計取决於具體的任務和模型結構，通常可以通過調整層數、神經元數量和激活函數等參數來進行客製化。 The multi-layer perception head 650 is connected to the output of the multi-layer encoder 640 and, as the last processing stage of the visual converter model 600, pools the feature vector matrix produced by the multi-layer encoder 640 after processing the patches to obtain a probability distribution matrix 604 that can represent the entire input image 602. In the visual converter model 600 of this embodiment, the multi-layer perception head 650 can be one or more fully connected layers located at the top of the deep learning model, which are used to convert the feature representation learned by the model into the final task prediction or classification result. The design of the multi-layer perception head 650 depends on the specific task and model structure, and can usually be customized by adjusting parameters such as the number of layers, the number of neurons, and the activation functions.
在視覺轉換器模型600的實作方面,可先採用開源資料集大腸癌組織切片圖像(5000,150,150,3)進行模型預訓練20個訓練週期,獲得初始權重向量矩陣。接著藉由學習轉移法,將初始權重向量矩陣輸入視覺轉換器模型600中進一步微調。多層感知頭650的全連接層數可隨著欲分類的個數而調整。在本實施例中,視覺轉換器模型600所預測的病症為多標籤(multi-label)形式, 故多層感知頭650可採用Sigmoid激活函式來分類輸出病理嚴重程度,(severity 0,severity 1,severity 2)。 In the implementation of the visual converter model 600, the open source dataset of colorectal cancer tissue slice images (5000, 150, 150, 3) can be used to pre-train the model for 20 training cycles to obtain the initial weight vector matrix. Then, the initial weight vector matrix is input into the visual converter model 600 for further fine-tuning by using the learning transfer method. The number of fully connected layers of the multi-layer perception head 650 can be adjusted according to the number of categories to be classified. In this embodiment, the disease predicted by the visual converter model 600 is in a multi-label form, so the multi-layer perception head 650 can use the Sigmoid activation function to classify and output the severity of the pathology (severity 0, severity 1, severity 2).
Sigmoid的函數公式如下: The formula of the Sigmoid function is as follows:

$$S(x) = \frac{1}{1 + e^{-x}}$$
其中,x代表輸入值。Sigmoid函數常應用於二分類問題中的輸出層。因為它將輸入值映射到0到1之間,可以將輸出解釋為概率值,表示正類的概率。在梯度下降算法中,Sigmoid函數的導數計算相對簡單,這使得它在反向傳播算法中計算梯度時更加高效。在另外的實施例中,多層感知頭650也可採用ReLU(Rectified Linear Unit)激活函式。 Where x represents the input value. The Sigmoid function is often used in the output layer of binary classification problems. Because it maps the input value to between 0 and 1, the output can be interpreted as a probability value, indicating the probability of the positive class. In the gradient descent algorithm, the derivative calculation of the Sigmoid function is relatively simple, which makes it more efficient when calculating the gradient in the back propagation algorithm. In another embodiment, the multi-layer perception head 650 can also use the ReLU (Rectified Linear Unit) activation function.
視覺轉換器模型600所使用的參數如下。批次大小(Batch size)設為32,最大訓練週期設為50。損失函數(Loss function)選用二元交叉熵(Binary Crossentropy),用於二分類問題,並最小化預測概率與實際標籤之間的差異。在二分類問題中,通常標籤值為0或1,各對應一個類別。當模型的預測結果與實際標籤一致時,損失函數的值應該越接近0。如果模型的預測結果與實際標籤不一致,損失函數的值會增大。二元交叉熵損失函數在訓練過程中會鼓勵模型産生接近於真實標籤的概率輸出,幷且在反向傳播過程中提供良好的梯度信息,有助於模型參數的優化。 The parameters used by the visual converter model 600 are as follows. The batch size is set to 32 and the maximum training epoch is set to 50. The loss function is binary cross entropy, which is used for binary classification problems and minimizes the difference between the predicted probability and the actual label. In binary classification problems, the label value is usually 0 or 1, each corresponding to a category. When the model's prediction result is consistent with the actual label, the value of the loss function should be closer to 0. If the model's prediction result is inconsistent with the actual label, the value of the loss function will increase. The binary cross entropy loss function encourages the model to produce probability outputs close to the true labels during training, and provides good gradient information during backpropagation, which helps optimize model parameters.
本實施例的優化器可選用Adam,一種自適應學習率的優化算法。Adam優化器結合動量梯度下降(Momentum)和自適應學習率調整(Adaptive Learning Rate)的特性,能够在訓練過程中有效地更新模型參數。 The optimizer of this embodiment can use Adam, an adaptive learning rate optimization algorithm. The Adam optimizer combines the characteristics of momentum gradient descent (Momentum) and adaptive learning rate adjustment (Adaptive Learning Rate), and can effectively update model parameters during the training process.
視覺轉換器模型600中的學習率初始值設為0.0001，其中使用ReduceLROnPlateau函式，耐心值設為3，係數Factor設為0.2。也就是當驗證損失超過3個訓練週期未下降，則按照係數減少學習率，使視覺轉換器模型600學習更多的細節與資訊。 The initial value of the learning rate in the visual converter model 600 is set to 0.0001, using the ReduceLROnPlateau function with the patience value set to 3 and the factor set to 0.2. That is, when the validation loss does not decrease for more than 3 training cycles, the learning rate is reduced by the factor, so that the visual converter model 600 learns more details and information.
此外,視覺轉換器模型600還採用提前停止(Early stopping)的機制並給予耐心值為6。在訓練過程中,若經過6個訓練週期後,驗證損失仍未下降,視覺轉換器模型600停止訓練以避免後續無謂的訓練時間。 In addition, the visual converter model 600 also adopts an early stopping mechanism and gives a patience value of 6. During the training process, if the verification loss has not decreased after 6 training cycles, the visual converter model 600 stops training to avoid unnecessary subsequent training time.
綜上所述，本發明的第三模型133是採用視覺轉換器模型600的實施例訓練而成。而在推測模組138為輸入圖像進行診斷時，利用訓練完成的第三模型133來判斷是否發生萎縮性胃炎和腸上皮化生之癌前病變，並預測其嚴重程度。推測模組138還可透過熱圖疊加的方式，將推測結果視覺化，以便於使用者判讀診斷結果。 In summary, the third model 133 of the present invention is trained using the embodiment of the visual converter model 600. When the inference module 138 performs diagnosis on the input image, the trained third model 133 is used to determine whether atrophic gastritis and precancerous lesions of intestinal metaplasia occur and to predict their severity. The inference module 138 can also visualize the inference results by overlaying heat maps to facilitate the user in reading the diagnosis results.
圖7是本發明實施例的影像增強結果。預處理模組134為輸入的圖像數據執行CLAHE運算,以增強圖像細節,突顯胃黏膜特徵。如圖7所示,原始胃竇圖像710、原始胃體圖像720和原始胃底圖像730經過預處理模組134的處理後,成為增強胃竇圖像712、增強胃體圖像722和增強胃底圖像732。由於圖像的對比增強了,更有助於後續的預測運算。 FIG7 is the image enhancement result of an embodiment of the present invention. The preprocessing module 134 performs CLAHE operation on the input image data to enhance the image details and highlight the gastric mucosal features. As shown in FIG7, the original gastric sinus image 710, the original gastric body image 720 and the original gastric fundus image 730 are processed by the preprocessing module 134 to become the enhanced gastric sinus image 712, the enhanced gastric body image 722 and the enhanced gastric fundus image 732. Since the contrast of the image is enhanced, it is more helpful for the subsequent prediction operation.
圖8是本發明實施例的熱圖疊加結果。推測模組138在對原始圖像810進行推測時,可利用梯度加權類激活映射(Gradient-weighted Class Activation Mapping;Grad-CAM)運算來生成熱圖(Heatmap)。透過將熱圖與原始圖像進行叠加,使原始圖像810成為熱圖疊加圖820,其中的色偏視覺效果代表值得關注的區域。藉此,熱圖疊加圖820有助於成為臨床診斷之參考依據。 FIG8 is a heat map superposition result of an embodiment of the present invention. When the inference module 138 infers the original image 810, it can generate a heat map by using a gradient-weighted class activation mapping (Grad-CAM) operation. By superimposing the heat map with the original image, the original image 810 becomes a heat map superposition 820, in which the color deviation visual effect represents the area worthy of attention. Thus, the heat map superposition 820 helps to become a reference for clinical diagnosis.
需要說明的是，在本文中，術語“包括”、“包含”或者其任何其他變體意在涵蓋非排他性的包含，從而使得包括一系列要素的過程、方法、物品或者裝置不僅包括那些要素，而且還包括沒有明確列出的其他要素，或者是還包括為這種過程、方法、物品或者裝置所固有的要素。在沒有更多限制的情況下，由語句“包括一個…”限定的要素，並不排除在包括該要素的過程、方法、物品或者裝置中還存在另外的相同要素。 It should be noted that, in this article, the terms "include", "comprises" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or device. In the absence of more restrictions, the elements defined by the phrase "includes a..." do not exclude the existence of other identical elements in the process, method, article or device including the element.
The embodiments of the present invention have been described above with reference to the drawings, but the present invention is not limited to the specific embodiments described, which are illustrative rather than restrictive. Guided by the present invention, a person of ordinary skill in the art may devise many other forms without departing from the spirit of the invention and the scope of the claims, all of which fall within the protection of the present invention.
100: Medical auxiliary diagnosis system
110: Sampling system
112: Image acquisition device
114: Image server
120: User device
130: Image analysis system
131: First model
132: Second model
133: Third model
134: Preprocessing module
136: Training module
138: Inference module
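To make the relationships among the numbered components concrete, the following minimal sketch (every class and method name here is an illustrative assumption, not the patent's implementation) wires preprocessing module 134, models 131-133, and inference module 138 into one pipeline:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

import numpy as np

# Illustrative type: a model maps a preprocessed frame to named scores.
Model = Callable[[np.ndarray], Dict[str, float]]

@dataclass
class ImageAnalysisSystem:
    """Sketch of image analysis system 130: preprocessing module 134
    feeds models 131-133; inference module 138 gathers their outputs."""
    preprocess: Callable[[np.ndarray], np.ndarray]       # module 134
    models: List[Model] = field(default_factory=list)    # models 131-133

    def infer(self, raw_image: np.ndarray) -> List[Dict[str, float]]:
        # Inference module 138: one normalized frame, all three models.
        x = self.preprocess(raw_image)
        return [m(x) for m in self.models]
```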
Claims (14)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/654,621 (US20240420330A1) | 2023-06-16 | 2024-05-03 | Method and system for auxiliary medical diagnosis for gastritis |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363521393P | 2023-06-16 | 2023-06-16 | |
| US63/521,393 | 2023-06-16 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TWI863845B (en) | 2024-11-21 |
| TW202529126A (en) | 2025-07-16 |
Family
ID=94380244
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW113109840A (TWI863845B) | Method and system for auxiliary medical diagnosis for gastritis | 2023-06-16 | 2024-04-02 |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI863845B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI895216B (en) * | 2025-01-22 | 2025-08-21 | 雲弼股份有限公司 | Automated remote medical control system |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI680440B (en) * | 2018-08-31 | 2019-12-21 | 雲云科技股份有限公司 | Image detection method and image detection device for determining postures of user |
| TW202006742A (en) * | 2018-06-22 | 2020-02-01 | 日商Ai醫療服務股份有限公司 | Diagnosis support method, diagnosis support system, diagnosis support program, and computer-readable recording medium that memorizes the diagnosis support program by endoscopy images of digestive organs |
| TW202037327A (en) * | 2018-11-21 | 2020-10-16 | 日商Ai醫療服務股份有限公司 | Disease diagnostic assistance method based on digestive organ endoscopic images, diagnostic assistance system, diagnostic assistance program, and computer-readable recording medium having diagnostic assistance program stored thereon |
| WO2021147429A1 (en) * | 2020-01-20 | 2021-07-29 | 腾讯科技(深圳)有限公司 | Endoscopic image display method, apparatus, computer device, and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| TW202529126A (en) | 2025-07-16 |