TWI883261B - Method and non-transitory computer readable medium for automated classification of immunophenotypes represented in flow cytometry data - Google Patents

Method and non-transitory computer readable medium for automated classification of immunophenotypes represented in flow cytometry data

Info

Publication number
TWI883261B
Authority
TW
Taiwan
Prior art keywords
flow cytometer
data
cytometer data
matrix
vector
Prior art date
Application number
TW110134284A
Other languages
Chinese (zh)
Other versions
TW202311742A (en
Inventor
李政霖
陳玉霖
李祈均
王毓棻
宋文傑
梁昌昕
Original Assignee
先勁智能有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 先勁智能有限公司
Priority to TW110134284A
Publication of TW202311742A
Application granted
Publication of TWI883261B

Landscapes

  • Investigating Or Analysing Biological Materials (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

Introduced here is an approach to improving the automatic identification of hematological diseases using computer-implemented models that are trained to rapidly distinguish between different collections of immunophenotypes that represent different disease types or disease states. Understanding the different patterns of immunophenotype collections contained in a given sample may permit a proposed diagnosis for a given hematological disease to be produced for the corresponding patient. For example, the proposed diagnoses may be output by a classification model based on the distribution of immunophenotypes across the given sample.

Description

Method and non-transitory computer-readable medium for automated classification of immunophenotypes represented in flow cytometry data

This application claims priority to U.S. Provisional Patent Application No. 63/078,312, titled "Systems and Methods for Automatic Classification of Flow Cytometry Data" and filed on September 14, 2020, and to U.S. Provisional Patent Application No. 63/078,662, titled "Methods for Automatic Preprocessing Flow Cytometry Data" and filed on September 15, 2020.

The embodiments provided in this disclosure relate to computer programs and associated computer-implemented techniques for automatically classifying flow cytometry data. In particular, the disclosure concerns automated classification of immunophenotypes represented in flow cytometry data.

Leukemia (also spelled leukaemia) is a hematological disease that begins in cells that would normally develop into different types of blood cells. Generally, leukemia begins in the bone marrow and results in the production of large numbers of abnormal blood cells. These abnormal blood cells may be called "leukemia cells" or "blasts." The exact cause of leukemia is not known, so it is usually diagnosed based on the results of a blood test or a bone marrow test (also called a "bone marrow biopsy"). Generally, a blood test or bone marrow biopsy is performed when a person (the "patient" or "subject") reports symptoms such as bleeding, bruising, fatigue, and fever.

There are four main types of leukemia: acute lymphocytic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), and chronic myeloid leukemia (CML), as well as some rarer types. Leukemia belongs to a broad group of diseases that affect the blood, bone marrow, and lymphatic system. This group of diseases is often referred to collectively as "tumors of the hematopoietic and lymphoid tissues."

Historically, these types have been distinguished mainly by (i) whether the leukemia is acute (i.e., fast growing) or chronic (i.e., slow growing) and (ii) whether the leukemia begins in myeloid cells or lymphoid cells. Leukemia, including acute myeloid leukemia, generally begins in the bone marrow but often then moves into the blood and other parts of the body, including the lymph nodes, liver, and spleen. The speed at which blasts spread through the body corresponds to whether the underlying leukemia is acute or chronic. The presence and prevalence of blasts can also indicate other hematological diseases, such as lymphoma and multiple myeloma.

Understanding the blood and lymphatic systems of the human body helps in understanding leukemia and lymphoma.

Bone marrow is the soft inner part of some bones. At a high level, bone marrow is made up of blood-forming cells, fat cells, and supporting tissue. A small fraction of the blood-forming cells in bone marrow are typically hematopoietic stem cells. Inside the bone marrow, hematopoietic stem cells undergo changes to develop into red blood cells, platelets, or white blood cells. Red blood cells (RBCs) carry oxygen from the lungs to the other tissues of the body and carry carbon dioxide back to the lungs to be removed (e.g., by exhaling). Platelets are cell fragments made by a type of blood-forming cell called a "megakaryocyte." Platelets are important for plugging holes in blood vessels caused by cuts, scrapes, and the like. White blood cells (WBCs) help the body fight infection.

There are three main types of white blood cells: lymphocytes, granulocytes, and monocytes. Lymphocytes are the main cells that make up the lymphoid tissue in lymph nodes and other parts of the body. Lymphocytes develop from "lymphoblasts" into mature, infection-fighting cells. There are two main types of lymphocytes: B lymphocytes (also called "B cells") and T lymphocytes (also called "T cells"). B cells protect the body by making proteins called antibodies that attach to germs, while T cells generally help destroy those germs. ALL develops from early forms of lymphocytes and can begin in the early maturation stages of either B cells or T cells. Lymphoma also begins in lymphocytes, although it usually affects B cells or T cells in the lymph nodes rather than in the blood and bone marrow. Granulocytes are white blood cells that contain granules. These granules typically contain enzymes and other substances that help destroy germs. There are three types of granulocytes (neutrophils, basophils, and eosinophils), which can be distinguished by the size and color of their granules. Monocytes also help protect the body against bacteria. Normally, monocytes circulate in the blood for a relatively short time (e.g., about a day) before entering tissues and becoming macrophages, which can destroy germs by surrounding and digesting them.

The term "myeloid cells" is generally used to refer to blood stem cells that can develop into red blood cells, platelets, or white blood cells other than lymphocytes. In contrast to ALL, it is these myeloid cells that are abnormal in AML.

The lymphatic system is an organ system that is part of both the circulatory system and the immune system. It is a large network made up of lymph, lymphatic vessels, lymph nodes, lymphatic organs, and lymphoid tissues. The lymphatic vessels carry a clear fluid called "lymph" toward the heart. Unlike the cardiovascular system, the lymphatic system is not a closed system. This means that problems affecting the lymphatic system can spread quickly throughout the body if not treated promptly.

As mentioned above, a diagnosis of leukemia is usually made by a healthcare professional based on the results of a blood test or a bone marrow biopsy. By examining a sample of a person's blood, the healthcare professional can determine whether there are abnormalities in the red blood cells, platelets, or white blood cells that could be a sign of leukemia. A blood test can also detect blasts, although not all types of leukemia cause blasts to circulate in the blood. Sometimes the blasts stay in the bone marrow. For this reason, a healthcare professional may recommend a bone marrow test, in which a sample of bone marrow is taken and examined for blasts.

Although recent medical advances have improved survival rates for individuals diagnosed with leukemia, unexpected outcomes can still abruptly affect the prognosis in some cases. Current clinical practice uses the identification of minimal residual disease (MRD), which is detected using flow cytometry (FC), as a prognostic indicator. At a high level, FC is a technique for detecting and measuring the characteristics of a population of cells.

In an FC experiment (or simply "experiment"), a sample containing cells is initially suspended in a fluid. Typically, the cells are labeled with fluorescent markers that bind only to certain types of cells, so that different cell types can be defined. The sample is then injected into a flow cytometer, where it is focused (ideally so that one cell flows at a time) and passed through a laser beam. The light scattered by each cell is characteristic of that cell, producing a pattern of illumination that reflects the cell types contained in the sample. Because the cells are labeled with fluorescent markers, light is absorbed and then emitted within specific wavelength bands.

Accordingly, the experiment may involve measuring the degree of fluorescence excitation of the antibody markers in order to produce FC data. Historically, medical professionals have examined FC data manually, through visual analysis of two-dimensional plots, to determine an appropriate diagnosis. This approach is laborious and time-consuming, because the number of cells often ranges from tens of thousands to millions, and it is prone to error, because these medical professionals must make subjective decisions. Some entities have proposed using machine learning (ML) or artificial intelligence (AI) algorithms to handle FC data; however, processing large amounts of FC data remains a challenge.

This disclosure concerns an approach to improving the automatic identification of hematological diseases using computer-implemented models (or simply "models") that are trained to rapidly distinguish between different collections of immunophenotypes that represent different disease types or disease states. Understanding the patterns of immunophenotype collections contained in a given sample may permit a proposed diagnosis for a given hematological disease to be produced for the corresponding patient. The proposed diagnosis may be one of multiple outputs produced by a disease analysis platform (or simply "analysis platform") based on the distribution of immunophenotypes across the sample. For example, the analysis platform may produce proposed diagnoses for more than one type of acute leukemia (e.g., ALL and AML), for pancytopenia (i.e., bone marrow neoplasms and one or more non-neoplastic diseases), or for another hematological disease.

This approach may be used as part of a training framework for training a model to automatically classify samples that are represented by FC data. The training framework may include three steps: a first step in which the FC data is processed; a second step in which the processed FC data is transformed into a format better suited to training the model; and a third step in which the formatted, processed FC data is used to train the model. Generally, the training framework is executed tens, hundreds, or thousands of times, since a variety of samples (e.g., corresponding to different hematological diseases) can be used for training. This sequence (i.e., process, transform, then train) allows insight into the distribution of cell types across a sample to be gained reliably and quickly, and that insight can then be passed on to, for example, a medical professional for review and judgment.

This approach may also be used as part of a classification framework for applying a trained model to FC data to produce one or more outputs. Each output may represent a proposed diagnosis for a hematological disease. At a high level, the classification framework may resemble the training framework, in that the processing and transformation steps may also be performed. Thus, upon receiving input that specifically requests a proposed diagnosis for a sample based on an analysis of its FC data, the FC data may first be processed and then transformed into a format that the trained model can handle more readily. The formatted, processed FC data may then be provided to the trained model as input in order to produce the output. Compared with manually analyzing FC data to classify cell types, this automated approach can improve the quality, consistency, and timeliness of healthcare by rapidly producing insights that can be used to diagnose and monitor patients.
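The shared structure of the training and classification frameworks described above can be sketched in a few lines of Python. This is only an illustrative outline: the function names (`process`, `transform`, `prepare`, `train`, `classify`), the negative-value filter, the per-parameter mean, and the nearest-centroid "model" are all placeholders chosen for brevity, not the processing, transformation, or model disclosed in the specification.

```python
from typing import Callable, Dict, List, Sequence

Event = Sequence[float]   # one measured cell: one value per parameter
Matrix = List[Event]      # FC data for one sample: rows are events
Vector = List[float]      # fixed-length, sample-level representation


def process(raw: Matrix) -> Matrix:
    """Placeholder processing step: drop events containing negative values."""
    return [event for event in raw if all(v >= 0 for v in event)]


def transform(processed: Matrix) -> Vector:
    """Placeholder transformation step: per-parameter mean over all events.

    Assumes at least one event survives processing.
    """
    n = len(processed)
    return [sum(column) / n for column in zip(*processed)]


def prepare(raw: Matrix) -> Vector:
    """Steps shared by the training and classification frameworks."""
    return transform(process(raw))


def train(samples: List[Matrix], labels: List[str]) -> Callable[[Vector], str]:
    """Toy nearest-centroid 'model' standing in for the real classifier."""
    grouped: Dict[str, List[Vector]] = {}
    for raw, label in zip(samples, labels):
        grouped.setdefault(label, []).append(prepare(raw))
    centroids = {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }

    def model(vector: Vector) -> str:
        # Return the label whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(vector, centroids[lab])),
        )

    return model


def classify(model: Callable[[Vector], str], raw: Matrix) -> str:
    """Classification framework: same preparation, then apply the trained model."""
    return model(prepare(raw))
```

Routing both `train` and `classify` through the same `prepare` function reflects the point made in the text: data seen at classification time should pass through the same processing and transformation as the data used for training.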

Although embodiments may be described with reference to particular hematological diseases, those diseases have been chosen for purposes of illustration only. For example, the approach may be described in the context of a model that, when applied to the FC data corresponding to a sample, produces outputs indicating proposed diagnoses of ALL, AML, and pancytopenia. However, the approach is equally applicable to other hematological diseases, such as CLL, CML, Hodgkin lymphoma, non-Hodgkin lymphoma (e.g., diffuse large B-cell lymphoma, follicular lymphoma, mantle cell lymphoma, and T-cell lymphoma), multiple myeloma, acute erythroid leukemia (AEL), and acute promyelocytic leukemia (APL), as well as to solid tumors. Moreover, the approach is applicable to both malignant and non-malignant hematological diseases (e.g., pancytopenia). Thus, the model can stratify patients among various hematological diseases (malignant and/or non-malignant) based on a sample-level representation of the cells found in the sample.

Further, the embodiments below may be described in the context of executable instructions. However, those of ordinary skill in the art will appreciate that aspects of the disclosure can be implemented in hardware, firmware, or software. For example, a disease analysis platform (or simply "analysis platform") may be embodied as a computer program that supports reviewing information related to the progression and/or status of hematological malignancies, planning treatments, reviewing diagnoses proposed by models, and the like.

200: Framework
202: Data acquisition phase
204: Data capture phase
206: Data transformation phase
208: Training
210: Classification
300: Database
302: FCS file
304, 800: FC data matrix
350, 351, 352, 601, 602, 603, 701, 702, 703, 850, 851, 852, 901, 902, 903, 904, 905, 1001, 1002, 1003, 1004, 1005: Steps
600, 700, 900, 1000: Process
802: Mixture model
804: Vector
1100: Network environment
1102: Analysis platform
1104: Interface
1106a, 1106b, 1210, 1314: Network
1108, 1204: Server system
1200: System
1202: Flow cytometer
1206: Data storage device
1208: Computing device
1300: Processing system
1302: Processor
1304, 1308, 1328: Instructions
1306: Main memory
1310: Non-volatile memory
1312: Network adapter
1316: Bus
1318: Video display unit
1320: Input/output device
1322: Control device
1324: Drive unit
1326: Storage medium
1330: Signal generation device

FIG. 1 is a diagram showing how hematological diseases have historically been classified.

FIG. 2A is a high-level illustration of a framework that can be executed by an analysis platform to acquire, process, and transform flow cytometry (FC) data in order to facilitate automated detection of hematological abnormalities indicative of hematological diseases.

FIG. 2B illustrates how the framework shown in FIG. 2A can be used to (i) acquire "raw" FC data associated with patients, (ii) select intersecting or interrelated parameters, (iii) transform the "raw" FC data through patient-level encoding, and then either (iv) apply a classification model to the transformed FC data to classify the patients or (v) train a classification model to do the same.

FIG. 3 is a high-level illustration of the process by which FC data is acquired from a source.

FIG. 4 illustrates how spillover signals from other fluorescence intensities can bias the pure signal of the primary fluorescence intensity currently of interest.

FIG. 5 illustrates how a scatter plot can be generated with forward scatter height (FSC-H) along the Y axis and forward scatter area (FSC-A) along the X axis to facilitate manual singlet gating.

FIG. 6 is a flow diagram of a process for automatically performing singlet gating.

FIG. 7 is a flow diagram of a process for normalizing a set of FC data extracted from a Flow Cytometry Standard (FCS) file.

FIG. 8 is a high-level illustration of the process by which processed FC data is transformed from matrix form into a vector.

FIG. 9 is a flow diagram of a process for training a model to classify hematological diseases.

FIG. 10 is a flow diagram of a process for classifying a sample by applying a classification model.

FIG. 11 is a diagram of a network environment that includes an analysis platform.

FIG. 12 is a diagram of an embodiment of a system able to automatically classify different patterns of immunophenotype collections in order to identify hematological diseases.

FIG. 13 is a block diagram of an example of a processing system that can perform the operations described in this disclosure.

Various features of the technology described in this disclosure will become more apparent to those of ordinary skill in the art from the detailed description taken together with the drawings. The embodiments shown in the drawings are provided mainly for purposes of illustration. Those of ordinary skill in the art will recognize that alternative embodiments may be employed without departing from the principles of the technology. Accordingly, the specific examples shown in the drawings do not limit the invention, and various modifications may be made.

Terminology

References in this disclosure to "an embodiment" or "some embodiments" mean that the feature, function, structure, or characteristic being described is included in at least one embodiment. Such terms do not necessarily refer to the same embodiment, nor do they necessarily refer to alternative embodiments that are mutually exclusive of one another.

Unless the context clearly requires otherwise, the terms "comprise," "comprising," and "comprised of" are to be construed in an inclusive sense rather than an exclusive or exhaustive sense (in other words, in the sense of "including but not limited to"). The term "based on" is likewise to be construed in an inclusive sense rather than an exclusive or exhaustive sense. Thus, unless otherwise noted, the term "based on" means "based at least in part on."

The terms "connected," "coupled," and variants thereof refer to any connection or coupling between two or more elements, whether direct or indirect. The connection or coupling can be physical, logical, or a combination of the two. For example, elements may be electrically or communicatively coupled to one another despite not sharing a physical connection.

The term "module" may refer broadly to software, firmware, hardware, or any combination thereof. A module is typically a functional component that generates one or more outputs based on one or more inputs. A computer program may include or utilize one or more modules. For example, a computer program may utilize multiple modules that are responsible for completing different tasks, or it may utilize a single module that is responsible for completing all tasks.

When used in reference to a list of multiple items, the term "or" is intended to cover all of the following interpretations: any of the items in the list, all of the items in the list, and any combination of items in the list.

Introduction to Immunophenotyping

Immunophenotyping by FC is a laboratory technique that is commonly used to detect the presence or absence of WBC markers (also called "antigens"). These antigens are protein structures found on or in white blood cells, and particular groupings of antigens are unique to particular cell types. Because FC immunophenotyping can serve as a sensitive screen or test for hematological diseases, it is a useful tool for staging previously diagnosed hematological diseases, establishing the presence or absence of hematological diseases, monitoring response to treatment (e.g., by analyzing MRD), documenting relapse or progression of hematological diseases, and detecting concurrent hematological diseases. Put simply, FC immunophenotyping can be used to detect normal cells as well as abnormal cells whose marker patterns are typically observed in particular hematological diseases.

Conventionally, the FC data generated by a flow cytometer is plotted either in a single dimension to produce a histogram or in multiple dimensions to produce a "dot plot" or "scatter plot." Regions on these plots are sequentially separated, based on fluorescence intensity, by creating a series of subset extractions (also called "gates"). Different gating protocols are used for different diagnostic purposes, especially for hematology-related diseases. Conventionally, single cells are distinguished from doublets and higher-order cell aggregates through visual analysis of these plots. As used herein, the term "doublet" may refer to an event in which the flow cytometer measures more than one cell. Doublets are usually identified based on the "time-of-flight" or "pulse width" measured as the event passes through the laser beam. Properly identifying doublets is critical in cell sorting, since the corresponding values in the FC data should not influence the analysis. However, because the exclusion of doublets relies heavily on visual analysis, the process is prone to error, as discussed further below.
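As a rough illustration of how doublet exclusion might be automated instead of performed by eye, the sketch below keeps only events whose forward scatter height-to-area ratio stays close to the sample median. It relies on the common observation that singlets fall along a diagonal when FSC-H is plotted against FSC-A, while doublets tend to show more area for a similar height. The function name `singlet_gate` and the 15% tolerance are hypothetical choices for this example, not values taken from the disclosure.

```python
from statistics import median
from typing import List, Tuple

# Each event is an (FSC-A, FSC-H) pair: forward scatter area and height.
ScatterEvent = Tuple[float, float]


def singlet_gate(events: List[ScatterEvent], tolerance: float = 0.15) -> List[ScatterEvent]:
    """Keep events whose FSC-H / FSC-A ratio lies within `tolerance`
    (as a fraction of the median ratio) of the sample's median ratio.

    Events with non-positive area are discarded. Assumes at least one
    event has a positive area.
    """
    ratios = [height / area for area, height in events if area > 0]
    center = median(ratios)
    return [
        (area, height)
        for area, height in events
        if area > 0 and abs(height / area - center) <= tolerance * center
    ]


events = [(100.0, 90.0), (200.0, 181.0), (150.0, 135.0), (300.0, 150.0), (120.0, 108.0)]
singlets = singlet_gate(events)  # drops (300.0, 150.0), whose ratio is far below the median
```

A median-based center is used here because it is not pulled toward the doublets themselves; a real pipeline would likely need a more careful estimate of the singlet diagonal.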

Using FC data, a user can determine the relative size of cells by means of known controls. For example, forward scatter (FSC) and side scatter (SSC) values are commonly used in gating. More specifically, FSC and SSC values can be used to identify cells of interest based on cell size and cell granularity. Generally, FSC and SSC values are used to normalize data related to other light-scatter parameters, particularly the fluorescent markers that are used to identify different cell types through conventional visual analysis of FC data.
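A gate on FSC and SSC values can be expressed as a simple range filter. The following minimal sketch assumes each event is a dictionary keyed by channel name; the channel names "FSC-A" and "SSC-A" and the numeric bounds are illustrative only and would in practice depend on the instrument and panel.

```python
from typing import Dict, List, Tuple

FCEvent = Dict[str, float]               # channel name -> measured value
Bounds = Dict[str, Tuple[float, float]]  # channel name -> (low, high), inclusive


def rectangular_gate(events: List[FCEvent], bounds: Bounds) -> List[FCEvent]:
    """Return the events that fall inside every (low, high) range in `bounds`."""
    return [
        event
        for event in events
        if all(low <= event[channel] <= high for channel, (low, high) in bounds.items())
    ]


events = [
    {"FSC-A": 50_000.0, "SSC-A": 20_000.0},   # inside both ranges
    {"FSC-A": 5_000.0, "SSC-A": 1_000.0},     # too small: likely debris
    {"FSC-A": 60_000.0, "SSC-A": 150_000.0},  # granularity above the range
]
# Hypothetical bounds selecting cells of intermediate size and granularity.
bounds = {"FSC-A": (20_000.0, 120_000.0), "SSC-A": (5_000.0, 60_000.0)}
gated = rectangular_gate(events, bounds)
```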

However, FC immunophenotyping has several drawbacks. First, the data generated by a flow cytometer during an experiment can be difficult to interpret. This can in turn lead to errors, because the medical professionals responsible for analyzing the data may need to make subjective decisions. Second, determining an appropriate classification for a hematological disease can be difficult. FIG. 1 includes a diagram illustrating how hematological diseases have historically been classified. Navigating this diagram correctly, however, depends on an accurate understanding of the distribution of cell types in the sample. Owing to the limitations of visually analyzing FC data populated on plots, cells can easily be mischaracterized.

The approach proposed in this disclosure not only involves classifying individual cells in an automated manner to reduce errors, but may also involve generating a representation of the cell types across a sample in order to determine how the corresponding patient should be stratified among different hematological diseases. In other words, a sample-level representation (also called a "patient-level representation") of cell types can be used to determine which hematological disease, if any, to predict for a given sample (and thus a given patient). Sample-level representations are helpful for stratifying patients among different hematological diseases, for assigning pathological states (e.g., relapse, progression, etc.), and for relating cell type distributions to clinical interventions in order to determine efficacy. Examples of clinical interventions include chemotherapy, targeted therapy, immune checkpoint inhibitors, and chimeric antigen receptor (CAR) T-cell therapy.
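One very simple form of sample-level representation is the fraction of events assigned to each cell type. The sketch below assumes that per-cell labels are already available and that the label set is fixed in advance; the label names in `CELL_TYPES` are hypothetical, and the disclosure does not state that its representation is a plain proportion vector.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical, fixed ordering of cell-type labels for this example.
CELL_TYPES = ("lymphocyte", "monocyte", "granulocyte", "blast")


def sample_level_representation(
    cell_labels: Sequence[str], cell_types: Sequence[str] = CELL_TYPES
) -> List[float]:
    """Summarize a sample as the fraction of its cells in each known cell type.

    Labels outside `cell_types` are ignored; an empty sample maps to all zeros.
    """
    counts = Counter(cell_labels)
    total = sum(counts[t] for t in cell_types)
    if total == 0:
        return [0.0] * len(cell_types)
    return [counts[t] / total for t in cell_types]


labels = ["blast"] * 6 + ["lymphocyte"] * 3 + ["monocyte"]
vector = sample_level_representation(labels)
```

Because the output has the same length for every sample regardless of how many cells were measured, vectors of this kind can be compared across patients or passed to a downstream classifier.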

Overview of Computational Pipelines for Automated Analysis of FC Data

This disclosure concerns an approach to improving the automatic identification of hematological diseases using models that are trained to rapidly (i) distinguish between the different cell types in a sample and then (ii) determine an appropriate prediction based on the distribution of immunophenotype collections across the sample. As discussed further below, the approach may be carried out through a framework that supports multiple computational pipelines, namely a first computational pipeline for training a model to classify cells by the distribution of immunophenotype collections and then classify samples by cell type distribution, and a second computational pipeline for applying the trained model to FC data to produce outputs indicating proposed diagnoses of hematological diseases.

Figure 2A is a high-level illustration of a framework 200 that can be executed by an analysis platform to acquire, process, and transform FC data so as to facilitate the automatic detection of hematological abnormalities indicative of hematological diseases. As further discussed below, FC data can be provided as input to a classification model for training, or FC data can be provided as input to a classification model for classification. The classification model (also referred to as a "classifier model" or simply a "classifier") can perform multi-class classification. Accordingly, when applied to FC data, the classification model may produce multiple outputs that represent proposed diagnoses for different hematological diseases. For example, the analysis platform may employ a classification model for multidimensional, multicolor flow cytometry (MFC) phenotypes that is trained using, for example, a deep neural network (DNN) or support vector machines (SVMs) in combination with a Gaussian mixture model (GMM). In some embodiments, the classification model is developed through supervised learning, by analyzing MFC datasets to develop an interpretation or understanding of MFC so that MRD can be detected objectively. Supervised learning refers to a branch of artificial intelligence in which datasets and accompanying labels are used to train models to provide reliable predictions.

As shown in Figure 2A, the framework 200 can include various stages. These stages can include a data acquisition stage 202, a data capture stage 204, and a data transformation stage 206. The data acquisition stage 202 is further discussed below with reference to Figure 3, the data capture stage 204 is further discussed with reference to Figures 4 to 7, and the data transformation stage 206 is further discussed with reference to Figure 8. After completing the data transformation stage 206, the analysis platform can provide the output to a classification model for the purpose of training 208 or classification 210. Training and classification are further discussed with reference to Figures 9 and 10.

In the data acquisition stage 202, the analysis platform can obtain FC data that describes the characteristics of a sample from a source, where the sample includes cells labeled with fluorescent markers. The FC data can be contained in a file formatted in accordance with the Flow Cytometry Standard (an "FCS file"). FCS is a file format standard for reading and writing data from FC experiments. The format describes a file made up of text data followed by binary data, and its segments are generally ordered as follows: (1) a header segment, (2) a text segment, (3) a data segment, (4) an optional analysis segment, (5) a cyclic redundancy check (CRC) value, and (6) optional other segments. The FC data can be representative of a matrix of measurements of N parameters at M wavelengths, where M and N are integers, that can be extracted from the data segment of the FCS file. These parameters can include light scatter parameters and/or fluorescent marker parameters.
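By way of non-limiting illustration, the segment ordering described above can be located by reading the header segment. The following is a minimal sketch that assumes the fixed-width header layout of FCS versions 3.0 and 3.1 (a six-character version string, four spaces, and then six right-justified, eight-character ASCII byte offsets); the function name `parse_fcs_header` and the dictionary keys are hypothetical and do not appear in the disclosure.

```python
def parse_fcs_header(header: bytes) -> dict:
    """Read the version string and segment offsets from an FCS 3.x header segment."""
    if len(header) < 58:
        raise ValueError("an FCS header segment is at least 58 bytes long")
    version = header[0:6].decode("ascii")
    if not version.startswith("FCS"):
        raise ValueError("not an FCS file")

    def offset(start: int) -> int:
        # Each offset is an 8-byte, right-justified ASCII integer; blank means 0.
        field = header[start:start + 8].decode("ascii").strip()
        return int(field) if field else 0

    # Offsets appear in this order, starting at byte 10 (hypothetical key names).
    names = ["text_start", "text_end", "data_start", "data_end",
             "analysis_start", "analysis_end"]
    offsets = {name: offset(10 + 8 * i) for i, name in enumerate(names)}
    return {"version": version, **offsets}
```

With these offsets, the text segment (keywords) and the data segment (the measurement matrix) can be sliced out of the file. For large files the header offsets may be zero, in which case the actual offsets are expected to be given by keywords in the text segment.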

In some embodiments, the source from which the analysis platform obtains the FC data is the flow cytometer that generated the FC data. In other embodiments, the source is a storage medium that is accessible to the analysis platform (e.g., via the Internet). The storage medium can be associated with the entity that manages the flow cytometer or with another entity. In some embodiments, the storage medium is publicly accessible (e.g., via the Internet). In such embodiments, to obtain FC data from the storage medium, the analysis platform can initiate a connection with the storage medium through a data interface (e.g., an application programming interface). In other embodiments, the storage medium is privately maintained and managed. For example, the storage medium may contain proprietary clinical data generated over time by a healthcare system, and the analysis platform may be granted access to the storage medium under an agreement between the healthcare system and the entity that manages the analysis platform.

In the data capture stage 204, the analysis platform can process the FC data in preparation for further handling. The nature of the data capture stage 204 can depend on the form of the FC data obtained by the analysis platform. For example, assume that the analysis platform extracts an FC data matrix from an FCS file as described above. In that case, the analysis platform can process the values contained in the FC data matrix by performing a compensation operation, a gating operation, and/or a normalization operation, each of which is discussed further below. At a high level, the data capture stage 204 can ensure that the analysis platform is able to analyze large batches of FC data in a consistent, accurate manner within a relatively short interval of time. Because this processing occurs before the analysis platform examines the FC data to gain insights from it, the data capture stage 204 may also be referred to as the "data preprocessing stage" or simply the "data processing stage."

In the data transformation stage 206, the analysis platform can transform the FC data into a form that is better suited to further processing. For example, the analysis platform can implement an ML algorithm that serves as a function for transforming the FC data matrix into a multidimensional vector. In this way, the analysis platform can convert the FC data matrix into an FC data vector.
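As a hedged illustration of one way an FC data matrix could be converted into a single FC data vector, the sketch below computes a Fisher vector with respect to a diagonal-covariance GMM, which are the techniques named elsewhere in this disclosure. It uses the widely published mean and variance gradient formulation, which is an assumption and not necessarily the exact encoding used by the analysis platform. The GMM parameters are assumed to have been fitted beforehand, and the function name `fisher_vector` is hypothetical.

```python
import numpy as np

def fisher_vector(cells, weights, means, variances):
    """Encode an (n_cells, D) matrix as one vector of length 2 * K * D.

    weights (K,), means (K, D), and variances (K, D) describe a
    diagonal-covariance GMM that is assumed to be already fitted.
    """
    x = np.asarray(cells, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    n = x.shape[0]

    # Log-density of every cell under every Gaussian component.
    diff = x[:, None, :] - means[None, :, :]                    # (n, K, D)
    log_prob = -0.5 * np.sum(diff ** 2 / variances
                             + np.log(2 * np.pi * variances), axis=2)
    log_post = np.log(weights) + log_prob                       # (n, K)
    log_post -= log_post.max(axis=1, keepdims=True)             # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                     # soft assignments

    z = diff / np.sqrt(variances)                               # standardized deviations
    grad_mean = np.einsum("nk,nkd->kd", post, z) / (n * np.sqrt(weights)[:, None])
    grad_var = np.einsum("nk,nkd->kd", post, z ** 2 - 1) / (n * np.sqrt(2 * weights)[:, None])
    return np.concatenate([grad_mean.ravel(), grad_var.ravel()])
```

The length of the resulting vector depends only on the number of mixture components and parameters, not on the number of cells, so samples containing different numbers of cells map to vectors of the same size.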

The FC data vector can be used in different ways, depending on the computational pipeline that is currently being implemented or executed by the analysis platform.

For example, assume that the analysis platform is interested in training a classification model to recognize hematological abnormalities indicative of hematological diseases. In that case, the analysis platform can provide (i) the FC data vector and (ii) a set of labels (representing the pattern of the immunophenotype collection for each cell described in the FC data vector) to the classification model for training purposes 208. For example, a label may indicate the disease type, disease state, or physiological state (also referred to as the "pathological state") for each cell described in the vector. Generally, the FC data vector is one of multiple FC data vectors provided to the classification model for training purposes 208, and the multiple FC data vectors may correspond to the different hematological diseases that the classification model is being trained to recognize. The classification model can thereby learn, from the FC data vectors provided as input, how to classify samples among multiple hematological diseases based on the immunophenotypes that are representative of each of those diseases.

Alternatively, the analysis platform may be interested in applying a classification model to the FC data vector for classification purposes 210. In that case, the analysis platform can provide the FC data vector to the classification model as input in order to obtain an output indicating a proposed diagnosis for a hematological disease. As further discussed below, in some embodiments the classification model can classify the FC data vector generated for a given sample (and thus a given patient) among more than one hematological disease. In such embodiments, the classification model can produce multiple outputs, each of which can represent a proposed diagnosis for a different hematological disease.

Figure 2B illustrates how the framework shown in Figure 2A can be used to (i) obtain "raw" FC data associated with patients, (ii) select parameters that intersect or are correlated with one another (e.g., fluorescent marker parameters), (iii) transform the "raw" FC data through patient-level encoding (e.g., using GMMs and Fisher vectorization), and then either (iv) classify a patient by applying a classification model (e.g., a multi-class SVM) to the transformed FC data or (v) train a classification model (e.g., a multi-class SVM) to do the same. At a high level, Figure 2B is an overview of the framework that sets out the general steps of the computational pipelines described above. However, the nature of the training may depend on the target task. For example, step (ii) can be implemented by any of resampling, padding, or selecting fluorescent marker parameters (e.g., based on human knowledge or on outputs produced by a model) to arrive at the feature dimensions. If the target task involves patients with different fluorescent marker representations, step (ii) can be implemented so as to match the feature dimensions of the respective FC data from the different representations. In addition, step (ii) may involve forming the raw FC data into a single matrix that includes both training data and test data. As a result, all of the fluorescent marker parameters can be considered simultaneously during encoding and classification.

A. Data Acquisition

Figure 3 is a high-level illustration of a process for obtaining FC data from a source. Here, the source is a database 300 in which one or more flow cytometers are able to store the FC data generated through experiments. This process can be performed by the analysis platform as part of a data acquisition step (e.g., the data acquisition stage 202 of Figure 2A). At a high level, this is the process by which the analysis platform can obtain FC data that can be used to train a classification model to classify samples based on an analysis of the cells characterized by a flow cytometer. Likewise, this is the process by which the analysis platform can obtain FC data to which the classification model can be applied to produce one or more outputs (e.g., proposed diagnoses).

Generally, the database 300 has entries that contain FC data for different samples (and thus different patients) that have been tested through experiments. In Figure 3, the entries include FCS files 302 in which the FC data associated with the corresponding samples (and thus the corresponding patients) is stored. Note, however, that FC data can be stored in the database 300 in other formats.

The database 300 may be one of multiple databases from which the analysis platform is able to obtain FC data. For example, assume that the analysis platform is interested in acquiring FC data generated by flow cytometers located in different healthcare facilities that are associated with different healthcare systems. In that case, the analysis platform may be permitted to access (i) a first database in which FC data generated by a first flow cytometer is stored and (ii) a second database in which FC data generated by a second flow cytometer is stored. The analysis platform can therefore obtain FC data from more than one source, either sequentially or simultaneously. Generally, the FC data stored in different databases will be associated with different sets of patients, though there may be some overlap (e.g., a patient may have one sample examined by the first flow cytometer associated with a first healthcare system and another sample examined by the second flow cytometer associated with a second healthcare system).

Generally, a flow cytometer will handle a large number of fluorescent markers simultaneously, in addition to several forward and side scatter properties. For example, the FC data generated during an experiment may include 17 to 23 channels, of which 6 channels correspond to forward and side scatter properties and the remaining channels correspond to different fluorescent marker properties. The forward and side scatter properties (also referred to as "optical properties" or "optical parameters") can include forward scatter area (FSC-A), forward scatter width (FSC-W), forward scatter height (FSC-H), side scatter area (SSC-A), side scatter width (SSC-W), and side scatter height (SSC-H). Meanwhile, the fluorescent marker properties (also referred to as "fluorescent marker parameters") can include CD117_PerCP-Cy5-A, KAPPA_FITC-A, HLA-DR_V450-A, CD38_APC-H7-A, and CD123_PE-A, among others. A single experiment can therefore produce a large dataset.

When a flow cytometer analyzes a sample, FC data is produced as output. The FC data can take the form of a matrix having more than one dimension. For example, the FC data can include FSC signals (e.g., FSC-A, FSC-W, or FSC-H signals), SSC signals (e.g., SSC-A, SSC-W, or SSC-H signals), or fluorescence signals, and each of these signals can be treated as a separate dimension. Features of these signals can also be treated as dimensions. Examples of such features include amplitude, frequency, change in amplitude, change in frequency, time dependency, and spatial dependency. Moreover, the fluorescence signals can include a red fluorescence signal, a green fluorescence signal, or fluorescence signals of one or more other colors. Generally, the matrix will have at least three dimensions (and can have seven or more). For the purpose of standardization, the FC data can be presented as a two-dimensional matrix in which the individual signal values used for training, validation, or testing form the rows and the features form the columns. This FC data matrix can be exported from the flow cytometer in an FCS file.

As mentioned above, the analysis platform can apply a classification model to the FC data extracted from an FCS file in order to classify individual cells and then classify the sample as a whole (e.g., as being representative of a hematological disease). In its raw form, however, FC data may be difficult for the classification model to handle. For example, a significant amount of computational resources may be needed for the classification model to rapidly process FC data in matrix form. The classification model can therefore instead be trained to operate on FC data that has been transformed or converted into another form that is more readily handled by the classification model.

While the process by which an FC data matrix can be transformed is described below with reference to Figure 8, the analysis platform can extract the FC data matrix upon obtaining an FCS file as shown in Figure 3. Thus, the analysis platform can initiate a connection with a database to which one or more flow cytometers are able to upload FCS files (step 350), obtain a series of FCS files 302 from the database 300 (step 351), and then extract an FC data matrix from each FCS file so as to obtain a series of FC data matrices 304 (step 352). In embodiments where the analysis platform is interested in classifying a sample rather than training a classification model, the analysis platform may obtain only a single FCS file from the database.

B. Data Capture

As mentioned above, a flow cytometer measures cell types based on the fluorescence response of antibody expression. Depending on the application, the number of cells whose fluorescence is measured in a single experiment can range from several thousand to several million. For this reason, the FC datasets (e.g., in matrix form) produced by flow cytometers can be very large.

Traditionally, these large FC datasets have been handled using dimensionality reduction techniques that result in scatter plots of the individual values (e.g., FSC, SSC, and fluorescence). While some information regarding clustering may be visible on these scatter plots, medical professionals are usually responsible for defining "gates" that identify the different regions on these scatter plots.

Rather than propose clusters of cells at the cell level, the analysis platform can encode large amounts of cell-level data into patient-level representations for automated classification. To accomplish this, the analysis platform can employ an approach to encoding FC data that relies on ML-based techniques (e.g., GMMs and Fisher vectors) to aggregate FC data for recognition tasks at different levels. Training of the GMM involves concatenating all of the cell-level data for all of the patients represented in the FC data used for training, so the approach can consume significant computational resources. To conserve computational resources, downsampling and/or pooling can be employed. Downsampling can be accomplished by selecting a subset of the data (e.g., by uniformly sampling the data), while pooling can be accomplished by statistically representing sets of cells that are grouped together, on the assumption that the processed data is still likely to form a distribution similar to that of the original data. For example, the analysis platform can represent a set of cells (e.g., 3, 5, or 10 cells) with a mean vector to reduce memory consumption. It is important that the FC data provided to the classification model as input be of high quality. The analysis platform can therefore capture or process the FC data before handling it further (e.g., converting it from matrix form into vector form).
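The downsampling and pooling described above can be sketched as follows. This is a minimal illustration only: it assumes that the FC data is a matrix with one row per cell, that uniform downsampling means keeping every k-th row, and that the cells pooled together are consecutive rows. The function names `downsample` and `pool_means` are hypothetical.

```python
import numpy as np

def downsample(cells, keep_every):
    """Uniformly keep every `keep_every`-th cell (row) of the matrix."""
    return np.asarray(cells)[::keep_every]

def pool_means(cells, group_size):
    """Replace each run of `group_size` consecutive cells with their mean vector."""
    x = np.asarray(cells, dtype=float)
    n_groups = x.shape[0] // group_size          # an incomplete final group is dropped
    trimmed = x[: n_groups * group_size]
    return trimmed.reshape(n_groups, group_size, x.shape[1]).mean(axis=1)
```

Pooling with a group size of 5, for instance, reduces the number of rows that must be concatenated for GMM training by a factor of five, at the cost of smoothing cell-level variation.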

As part of the data capture stage, the analysis platform can perform (i) a compensation operation, (ii) a gating operation, and (iii) a normalization operation. Each of these operations is discussed further below.

Compensation is the process by which the analysis platform attempts to obtain a pure signal for each fluorescence intensity by eliminating the spillover signals of the other fluorescence intensities included in the FC dataset. Compensation is thus intended to ensure that the laser performance of the flow cytometer is within an appropriate range. Figure 4 illustrates how spillover signals from other fluorescence intensities can distort the pure signal of the primary fluorescence intensity that is currently of interest. This can (and often does) lead to inappropriate results when a user manually gates the fluorescence intensities on a scatter plot. To address this, before conducting any experiment, a user will typically run compensation beads through the flow cytometer to establish a compensation setting. In other words, the compensation beads can be passed through the flow cytometer without any sample in an attempt to establish the spillover signals. The flow cytometer will then produce a spillover matrix that can later be used to calculate, infer, or otherwise determine the pure signal of each fluorescence intensity. The spillover matrix is generally stored by the flow cytometer in the text segment of each FCS file generated over some interval of time.

When the analysis platform obtains an FCS file, the analysis platform can extract not only the FC dataset from the data segment but also the spillover matrix from the text segment. The analysis platform can then use the spillover matrix to perform the compensation operation. In other words, the analysis platform can use the spillover matrix to produce a compensated FC dataset from the raw FC dataset extracted from the FCS file. Generally, the spillover matrix is an n × n matrix, where n is the number of fluorescent markers associated with the corresponding sample. Given that each row corresponds to the raw measurement of one fluorescent marker, each number in that row can represent the contribution of a fluorescent marker to the measurement. This contribution is referred to as a "spillover coefficient," and its maximum value is 1. Accordingly, the diagonal elements of the spillover matrix are all 1, while the remaining numbers fall between 0 and 1. The compensated measurement for each fluorescent marker can be calculated by multiplying the inverse of the spillover matrix by the uncompensated data matrix.
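As a non-limiting sketch of the compensation operation, the example below assumes that the uncompensated data is arranged with one row per cell and one column per fluorescent marker, and that row i of the spillover matrix holds the fraction of marker i's signal observed in each detector, so that the observed data equals the pure signals multiplied by the spillover matrix. Under that assumed layout, applying the inverse of the spillover matrix recovers the pure signals. The function name `compensate` is hypothetical.

```python
import numpy as np

def compensate(raw, spillover):
    """Recover pure signals from raw (n_cells, n) data and an (n, n) spillover matrix."""
    raw = np.asarray(raw, dtype=float)
    spillover = np.asarray(spillover, dtype=float)
    # Solve compensated @ spillover = raw, which is equivalent to
    # raw @ inverse(spillover) but avoids forming the inverse explicitly.
    return np.linalg.solve(spillover.T, raw.T).T

# Usage with a two-marker panel (illustrative numbers only).
spill = np.array([[1.0, 0.2],
                  [0.1, 1.0]])
pure = np.array([[100.0, 0.0], [0.0, 50.0], [10.0, 10.0]])
observed = pure @ spill          # what the detectors would record
recovered = compensate(observed, spill)
```

If the data matrix is instead arranged with markers as rows, the same computation applies after transposing the data.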

Note that the compensation operation may not be performed in every embodiment. For example, compensation may only be necessary when the analysis platform determines that the quality of the FC dataset is insufficient for training or classification. The analysis platform can determine the quality based on an analysis of the raw FC dataset. For example, the analysis platform can attempt to determine whether the density, distribution, or absolute values of the measurements included in the FC dataset satisfy criteria that collectively define quality. As an example, the analysis platform may determine, through computational analysis, that an FC dataset resembling the scatter plot labeled "uncompensated" in Figure 4 is of insufficient quality, while the analysis platform may determine, through computational analysis, that an FC dataset resembling the scatter plot labeled "compensated" in Figure 5 is of sufficient quality. Better quality may be needed so that the analysis platform can perform automated analysis with greater accuracy.

Singlet gating (or simply "gating") refers to the process of removing inaccurate signals attributable to non-specific binding events or doublets from an FC dataset before the contents of the dataset are actually gated. For example, assume that two cells are measured simultaneously by a flow cytometer because those cells were aligned when passing through the laser beam. To ensure that the corresponding measurement produced by the flow cytometer does not affect the performance of the classification model, it may be desirable to remove that measurement from the FC dataset.

This process (also referred to as "doublet exclusion") has historically involved plotting the height or width against the area of FSC or SSC. For example, Figure 5 illustrates how a scatter plot can be generated with FSC-H along the y-axis and FSC-A along the x-axis to facilitate manual singlet gating. To perform gating, a user has traditionally identified the singlet region by defining an area on the scatter plot. This approach relies on the linear relationship between FSC-H and FSC-A, so the area is normally drawn along a straight line that roughly corresponds to the diagonal shown in Figure 5.

To eliminate the ambiguity inherent in manual gating, the analysis platform can execute a function that performs gating, or doublet identification, in an automated manner. This function can help ensure that each value in the FC dataset corresponds to a single cell. Figure 6 is a flow diagram of a process 600 for automatically performing singlet gating. Initially, the analysis platform can remove the cells whose FSC-A values reach a threshold (step 601). For example, the analysis platform may remove all cells whose FSC-A value is the maximum value. The maximum value may be 2^18 (i.e., 262,144), which is the highest possible value of FC data on a linear scale. In another embodiment, the analysis platform may remove all cells whose FSC-A values fall in the top 2%, 3%, or 5% of the entire FC dataset. Thus, the threshold may be programmed into the instructions executable by the analysis platform, or the threshold may be determined dynamically by the analysis platform based on the FC dataset. When singlet gating is performed, FSC-H and FSC-A are shown on a linear scale, so this step can be performed by the analysis platform to emulate what a medical professional would do. More specifically, this step can be performed automatically to remove the cells that are unnaturally "stuck" to the right side of the scatter plot, as shown in Figure 5, since those cells would not be included in the area if it were manually defined by a medical professional.

The analysis platform can then gate the most densely distributed cells on a scatter plot that includes the remaining cells (step 602). More specifically, the analysis platform can generate a scatter plot based on the FSC-H and FSC-A values included in the compensated FC dataset for the remaining cells, and then the analysis platform can gate the most densely distributed cells on the scatter plot. For example, the analysis platform may gate the 90%, 95%, or 98% of cells that are most densely distributed on the scatter plot. This percentage may be referred to as the "gating fraction." Because of the highly linear relationship between FSC-H and FSC-A, these gates should primarily capture singlets rather than doublets.

The analysis platform can then calculate the coefficient of determination (R²) among the gated cells that remain after step 602 (step 603). If the R² value exceeds an upper limit (e.g., 0.80, 0.85, or 0.90), the function executed by the analysis platform can return the data in the FC dataset that is associated with those cells and then terminate. Otherwise, the function can instruct the analysis platform to repeat steps 602 and 603, each time decreasing the gating fraction by a predetermined amount (e.g., 2%, 3%, 5%, or 10%), until the R² value exceeds the upper limit. If the R² value still does not exceed the upper limit when the gating fraction reaches a lower threshold (e.g., 70%, 75%, or 80%), the analysis platform can generate an alert indicating that the sample lacks linearity between FSC-H and FSC-A. Because this lack of linearity may lead to further problems in using the FC dataset (e.g., in training or classification), the analysis platform may simply return the raw FC dataset or the compensated FC dataset.
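Steps 601 to 603 can be sketched as the loop below. This is a hedged illustration rather than the actual function executed by the analysis platform: the disclosure does not specify how density is estimated, so a two-dimensional histogram is assumed here, and the function name `auto_singlet_gate` together with its parameter names and default values is hypothetical (the defaults are drawn from the example values given above).

```python
import numpy as np

def auto_singlet_gate(fsc_a, fsc_h, max_value=2 ** 18, start_fraction=0.95,
                      step=0.05, min_fraction=0.75, r2_limit=0.90, bins=64):
    """Return a boolean mask of cells kept as singlets, or None if no gate is linear enough."""
    fsc_a = np.asarray(fsc_a, dtype=float)
    fsc_h = np.asarray(fsc_h, dtype=float)
    # Step 601: remove cells whose FSC-A value reaches the maximum value.
    candidates = np.flatnonzero(fsc_a < max_value)
    if candidates.size < 2:
        return None
    a, h = fsc_a[candidates], fsc_h[candidates]

    # Estimate local density with a 2D histogram (an assumed, simple estimator).
    counts, a_edges, h_edges = np.histogram2d(a, h, bins=bins)
    a_bin = np.clip(np.digitize(a, a_edges) - 1, 0, bins - 1)
    h_bin = np.clip(np.digitize(h, h_edges) - 1, 0, bins - 1)
    density = counts[a_bin, h_bin]
    order = np.argsort(-density, kind="stable")   # densest cells first

    fraction = start_fraction
    while fraction >= min_fraction - 1e-9:
        # Step 602: gate the most densely distributed fraction of cells.
        kept = order[: max(2, int(round(fraction * order.size)))]
        # Step 603: coefficient of determination of FSC-H against FSC-A.
        r2 = np.corrcoef(a[kept], h[kept])[0, 1] ** 2
        if r2 > r2_limit:
            mask = np.zeros(fsc_a.shape[0], dtype=bool)
            mask[candidates[kept]] = True
            return mask
        fraction -= step                          # shrink the gating fraction and retry
    return None  # the caller may raise an alert and fall back to the ungated dataset
```

Returning `None` leaves the decision about alerting, and about falling back to the raw or compensated FC dataset, to the caller.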

AI輔助分析代表一種具有吸引力之選擇,其與個人負責手動檢查FC資料集相比,能以更具系統性和一致性的方式對FC資料集所代表的樣本進行分類。然而,為了使人工智慧輔助分析對臨床和診斷實踐產生廣泛影響,機構可能有必要遵守協議。 AI-assisted analysis represents an attractive option to classify samples represented by FC datasets in a more systematic and consistent manner than individuals manually reviewing FC datasets. However, in order for AI-assisted analysis to have a widespread impact on clinical and diagnostic practice, it may be necessary for institutions to adhere to protocols.

標準化是分析平臺可以克服FC資料集的非標準化處理問題的過程。標準化可以作為改善分類模型的性能和訓練穩定性的一種手段,FC資料集作為輸入而被提供給它以用於訓練或分類目的。圖7為一程序700之示意流程圖用以揭示將從流式細胞儀標準檔案(FCS檔案)中提取出的FC資料集進行標準化。如上述,分析平臺通常將在執行補償和圈選操作之後執行標準化操作,以確保在這些值被標準化之前去除不適當和不準確之值。 Normalization is the process by which the analysis platform can overcome the non-standardized processing of FC datasets. Normalization can improve the performance and training stability of the classification model to which the FC dataset is provided as input for training or classification purposes. FIG. 7 is a flow chart of a process 700 for normalizing an FC dataset extracted from a Flow Cytometry Standard (FCS) file. As described above, the analysis platform will typically perform normalization after the compensation and gating operations, so that inappropriate and inaccurate values are removed before the remaining values are normalized.

如前述,FC資料集通常將包含複數參數之值。舉例來說,除了一個或複數個螢光標誌參數之值外,FC資料集還可以包含一個或複數個光散射參數之值。作為標準化操作之一部分,分析平臺最初可將屬於每個參數的值匯總為一個獨特的特徵維度(步驟701)。然後,分析平臺可以將獨特的特徵維度重新取樣到相同的樣本量,以確保每個參數具有相同數量的細胞(步驟702)。值得注意的是,在一些具體實施例中,分析平臺可以對獨特的特徵維度進行重新取樣,以使參數具有大致相同的細胞數(例如:在2%、5%或10%之內)而不是完全相同的細胞數。 As previously mentioned, FC data sets will typically contain values for multiple parameters. For example, in addition to values for one or more fluorescence marker parameters, FC data sets may also contain values for one or more light scattering parameters. As part of the normalization operation, the analysis platform may initially aggregate the values belonging to each parameter into a unique feature dimension (step 701). The analysis platform may then resample the unique feature dimension to the same sample size to ensure that each parameter has the same number of cells (step 702). It is worth noting that in some specific embodiments, the analysis platform may resample the unique feature dimension so that the parameters have approximately the same number of cells (e.g., within 2%, 5%, or 10%) rather than exactly the same number of cells.

在一實施例中,螢光標誌參數(例如:CD56-APC)的值可以作為單個參數在多個樣本聚集,以確保值的數量(即細胞的數量)符合藉由重新採樣所確定的計數標準。在另一實施例中,光散射參數(例如:FSC-A或SSC-A)的值可以在多個樣本中匯總然後降採樣,以確保值的數量(從而細胞的數量)符合通過重新取樣確定的計數標準。更上位地,計數標準可以代表分析平臺確定的適當的樣本數量。 In one embodiment, the values of a fluorescent marker parameter (e.g., CD56-APC) can be aggregated across multiple samples as a single parameter to ensure that the number of values (i.e., the number of cells) meets the counting criteria determined by resampling. In another embodiment, the values of a light scatter parameter (e.g., FSC-A or SSC-A) can be aggregated across multiple samples and then downsampled to ensure that the number of values (and thus the number of cells) meets the counting criteria determined by resampling. More generally, the counting criteria can represent the appropriate number of samples determined by the analysis platform.
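
The resampling of step 702 can be sketched as follows, assuming the values of one parameter have already been pooled across samples (step 701). The helper name `resample_to_count` and the choice to sample with replacement only when too few cells are available are assumptions for illustration.

```python
import numpy as np

def resample_to_count(values, target_count, rng):
    """Resample a 1D array of parameter values to exactly `target_count` cells."""
    values = np.asarray(values)
    # Downsample without replacement; upsample with replacement if too few cells.
    replace = values.size < target_count
    return rng.choice(values, size=target_count, replace=replace)

# Hypothetical usage: pool FSC-A values from two samples, then downsample.
rng = np.random.default_rng(0)
pooled_fsc_a = np.concatenate([np.arange(600.0), np.arange(600.0, 1000.0)])
fsc_a_resampled = resample_to_count(pooled_fsc_a, 200, rng)
```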

以下將進一步說明,藉由標準化分析平臺可以生成一個經過處理的FC資料集,該資料集可以被框架的其他元素作為一個輸入。舉例來說,分析平臺可以根據z-score標準化技術進行標準化,以確保FC資料集中的數值處於類似的尺度(scale)上(步驟703),從而產生一個經過處理的FC資料集。z-score標準化技術是一種尺度變異(a variation of scaling),其代表遠離平均值的標準差的數量。計算z-score中的值(x)之公式如下:

$$z = \frac{x - \mu}{\sigma}$$

其中μ是平均值,σ是標準差。z-score標準化技術可以確保分佈的平均值為0並且標準差為1,因此其為有用的,當有一些離群值但沒有多到需要採取更嚴厲的措施(例如:削減)時。分析平臺還可使用其他標準化技術,舉例來說,分析平臺可以執行對範圍的縮放、削減或對數縮放,以取代或補充Z-score。

As will be further described below, through normalization the analysis platform can generate a processed FC dataset that can be used as an input by other elements of the framework. For example, the analysis platform can normalize according to the z-score normalization technique to ensure that the values in the FC dataset are on a similar scale (step 703), thereby producing a processed FC dataset. The z-score is a variation of scaling that represents the number of standard deviations a value lies away from the mean. The z-score of a value x is calculated as follows:

$$z = \frac{x - \mu}{\sigma}$$

where μ is the mean and σ is the standard deviation. The z-score normalization technique ensures that the distribution has a mean of 0 and a standard deviation of 1, so it is useful when there are some outliers but not so many that more drastic measures (such as clipping) are required. The analysis platform can also use other normalization techniques. For example, the analysis platform can perform range scaling, clipping, or logarithmic scaling instead of, or in addition to, the z-score.
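
As a short illustration of step 703, the z-score can be applied column by column to a matrix whose rows are cells and whose columns are parameters. The function name `zscore` and the guard for constant columns are assumptions added for the example.

```python
import numpy as np

def zscore(matrix):
    """Normalize each column (parameter) to mean 0 and standard deviation 1."""
    matrix = np.asarray(matrix, dtype=float)
    mu = matrix.mean(axis=0)
    sigma = matrix.std(axis=0)
    # Assumed guard: leave constant columns at 0 instead of dividing by zero.
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (matrix - mu) / sigma

# Hypothetical two-parameter dataset with very different scales.
processed = zscore([[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]])
```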

因此,為了處理從FCS檔案中擷取的FC資料集,分析平臺可以執行(i)補償操作、(ii)圈選操作、和(iii)標準化操作。此後,FC資料集可以存儲在分析平臺可以訪問的儲存媒體中,或者FC資料集可以由分析平臺按照適當的計算管道進一步處理。也可以執行其他步驟。舉例來說,分析平臺可以生成處理後留在FC資料集中的數值(例如:FSC-H和FSC-A之值)的視覺指標,作為允許使用者審查分析平臺如何自動補償、圈選和標準化FC資料集的方法。在一實施例中,分析平臺可以生成一份報告,其中包含對處理後留在FC資料集中的數值的分析。或是,分析平臺可以生成一個散射圖,其包含處理後留在FC資料集中的數值。無論其形式和內容如何,視覺指標可以發佈到由分析平臺生成的介面上以供使用者審查。 Therefore, to process the FC dataset extracted from the FCS file, the analysis platform can perform (i) a compensation operation, (ii) a gating operation, and (iii) a normalization operation. Thereafter, the FC dataset can be stored in a storage medium accessible to the analysis platform, or it can be further processed by the analysis platform according to an appropriate computational pipeline. Other steps can also be performed. For example, the analysis platform can generate a visual indicator of the values remaining in the FC dataset after processing (e.g., the FSC-H and FSC-A values) so that the user can review how the analysis platform automatically compensated, gated, and normalized the FC dataset. In one embodiment, the analysis platform can generate a report that includes an analysis of the values remaining in the FC dataset after processing. Alternatively, the analysis platform can generate a scatter plot containing those values. Regardless of its form and content, the visual indicator can be posted to an interface generated by the analysis platform for user review.

以預定的方式處理FC資料集,可以確保分析平臺在相對較短的時間內分析大量的資料並提高品質,而且信號偏移的影響在很大程度上(假若不是完全的狀態下)獲得緩解。 Processing FC data sets in a predetermined manner ensures that the analysis platform can analyze large amounts of data in a relatively short time with improved quality, and the effects of signal offset are largely (if not completely) mitigated.

C.資料轉換 C. Data conversion

雖然藉由分析原始FC資料或「處理過的」FC資料可以獲得有用的分析,但分類模型可能難以處理這些資料,特別是如果分類模型的任務是在短時間內處理數十或數百個樣本的資料。因此,分析平臺可以將處理過的FC資料轉換成更適合進一步使用之形式。特別是,分析平臺可將處理過的FC資料轉化為很適合輸入分類模型之形式。 Although useful analysis can be obtained by analyzing raw or "processed" FC data, classification models may have difficulty processing such data, especially if the classification model is tasked with processing data from dozens or hundreds of samples in a short period of time. Therefore, analysis platforms can transform processed FC data into a form that is more suitable for further use. In particular, analysis platforms can transform processed FC data into a form that is well suited for input into classification models.

圖8為一上位圖示用以揭示處理過的FC資料從其矩陣形式轉化為向量的過程。該過程可由分析平臺執行,作為資料轉換步驟的一部分(例如:圖2A的資料轉換步驟206)。更上位地,分析平臺可以執行該過程,將處理過的FC資料轉換成可由分類模型更容易處理的形式。在一實施例中,可以使用Fisher向量編碼和GMM分佈將處理後的FC資料轉化為向量804。在轉換發生後,每個樣本的表徵可以是一個高維度向量,該表徵相應病人的樣本表型。該表徵可以很容易地被不同類型的分類模型(包含SVMs、DNNs和隨機森林)使用。 FIG8 is a high-level diagram illustrating the process of converting processed FC data from its matrix form to a vector. The process can be performed by the analysis platform as part of a data conversion step (e.g., data conversion step 206 of FIG2A ). More generally, the analysis platform can perform the process to convert the processed FC data into a form that can be more easily processed by a classification model. In one embodiment, the processed FC data can be converted into a vector 804 using Fisher vector encoding and GMM distribution. After the conversion occurs, the representation of each sample can be a high-dimensional vector that corresponds to the sample phenotype of the patient. The representation can be easily used by different types of classification models (including SVMs, DNNs, and random forests).

在一開始,分析平臺可以獲得經過處理的FC資料矩陣800(步驟850)。如前述圖4至7所說明,處理過的FC資料矩陣800通常由分析平臺藉由處理「原始」FC資料矩陣所產生。因此,處理過的FC資料矩陣800可以隨時提供給分析平臺,並且該過程可以簡單地成為由分析平臺執行的框架(例如:圖2A中框架200)中的下一個階段。 At the beginning, the analysis platform can obtain the processed FC data matrix 800 (step 850). As described in Figures 4 to 7 above, the processed FC data matrix 800 is usually generated by the analysis platform by processing the "raw" FC data matrix. Therefore, the processed FC data matrix 800 can be provided to the analysis platform at any time, and the process can simply become the next stage in the framework (e.g., framework 200 in Figure 2A) executed by the analysis platform.

可替代地,分析平臺可以從其他地方獲得經過處理的FC資料矩陣800。舉例來說,分析平臺可以持續或定期地獲得由流式細胞儀產生的FCS檔案。如上述,分析平臺可以處理每個FCS檔案中包含的「原始」FC資料矩陣。然而,分析平臺可以將處理後的FC資料矩陣存儲在儲存媒體中以供將來使用,而 不是立即轉換處理後的FC資料矩陣。因此,分析平臺在處理「原始」FC資料矩陣後,可能不會立即執行圖8所揭示之過程。相反,分析平臺可以儲存經過處理的FC資料矩陣,以便它可以執行「批量訓練」方案,其中訓練定期發生(並且經過處理的FC資料矩陣只需要定期轉換)。 Alternatively, the analysis platform may obtain the processed FC data matrix 800 from elsewhere. For example, the analysis platform may continuously or periodically obtain FCS files generated by a flow cytometer. As described above, the analysis platform may process the "raw" FC data matrix contained in each FCS file. However, the analysis platform may store the processed FC data matrix in a storage medium for future use, rather than immediately converting the processed FC data matrix. Therefore, the analysis platform may not immediately perform the process disclosed in FIG. 8 after processing the "raw" FC data matrix. Instead, the analytics platform can store the processed FC data matrix so that it can perform a "batch training" scheme where training occurs periodically (and the processed FC data matrix only needs to be transformed periodically).

然後,分析平臺可以依據處理過的FC資料矩陣800創建混合模型802(步驟851)。更上位地,混合模型是一種概率模型,其目的是藉由聚類可比的值來代表處理過的FC資料矩陣內細胞類型的存在。因此,混合模型802可以對應於代表細胞類型觀察值在由加工的FC資料矩陣800代表的整個樣本中的概率分佈的混合分佈。該混合模型的一個例子是GMM。 The analysis platform can then create a mixture model 802 based on the processed FC data matrix 800 (step 851). More generally, a mixture model is a probabilistic model that aims to represent the presence of cell types within the processed FC data matrix by clustering comparable values. Thus, the mixture model 802 can correspond to a mixture distribution representing the probability distribution of cell type observations in the entire sample represented by the processed FC data matrix 800. An example of such a mixture model is a GMM.

因此,可使用ML演算法計算混合模型802的梯度,以推導出處理過的FC資料矩陣800的向量表徵(步驟852)。這種依據梯度的特徵空間轉換可以依靠距離函數來估計處理過的FC資料矩陣800中的細胞和由GMM定義的聚類間之關係。在Fisher向量中使用的Fisher kernel distance是距離函數的一個例子,該函數依據概率性叢集(cluster)分佈來測量高階關係。因此,衍生的向量表示可以利用與每個叢集的關係來代表處理過的FC資料矩陣的複雜細胞分佈。舉例來說,分析平臺可以使用混合模型計算Fisher向量,以構建向量804。雖然混合物模型802可以嘗試對可比值進行叢集,但Fisher向量化(當由分析平臺執行時)可以基於混合物模型802的訓練參數對處理過的FC資料矩陣進行進一步編碼。 Therefore, the gradient of the mixture model 802 can be calculated using an ML algorithm to derive a vector representation of the processed FC data matrix 800 (step 852). This gradient-based feature space transformation can rely on a distance function to estimate the relationship between cells in the processed FC data matrix 800 and the clusters defined by the GMM. The Fisher kernel distance used in the Fisher vector is an example of a distance function that measures high-order relationships based on a probabilistic cluster distribution. Therefore, the derived vector representation can represent the complex cell distribution of the processed FC data matrix using the relationship with each cluster. For example, the analysis platform can use the mixture model to calculate the Fisher vector to construct the vector 804. While the mixture model 802 may attempt to cluster comparable values, Fisher vectorization (when performed by the analysis platform) may further encode the processed FC data matrix based on the trained parameters of the mixture model 802.

向量804的維度可以依據處理過的FC資料矩陣800的維度和叢集編號(又稱「混合物編號」)。相應地,如果處理過的FC資料矩陣800包含如上述之各種維度,那麼向量804可以是高維度向量。在處理過的FC資料矩陣800中代表的每個細胞可以與高維度向量中的多個條目相關聯,這些條目中的每個條目可以對應於不同的參數(例如:FSC、SSC、螢光強度、以及諸如振幅、頻率等特徵)以描述與GMM中叢集分佈之關係。將基礎FC資料表示為高維度向量的好處來自高維度(例如:n=17、23或更多參數乘以混合物數量,其可能導致數百或數千個維度),部分來自於藉由學習獲得對不同維度之間相互聯繫的更大洞察能力。 The dimension of vector 804 can depend on the dimension of the processed FC data matrix 800 and on the number of clusters (also called the "number of mixtures"). Accordingly, if the processed FC data matrix 800 includes the various dimensions described above, then vector 804 can be a high-dimensional vector. Each cell represented in the processed FC data matrix 800 can be associated with multiple entries in the high-dimensional vector, and each of these entries can correspond to a different parameter (e.g., FSC, SSC, fluorescence intensity, and features such as amplitude and frequency) that describes the relationship with the cluster distribution in the GMM. The benefit of representing the underlying FC data as a high-dimensional vector comes in part from the high dimensionality itself (e.g., n = 17, 23, or more parameters multiplied by the number of mixtures, which may yield hundreds or thousands of dimensions) and in part from the greater insight into the relationships between dimensions that can be gained through learning.

藉由使用整個訓練資料集(例如:由複數個FC資料集組成)訓練的GMM,分析平臺可以計算每個細胞水平FC資料集的後驗概率,以確定細胞屬於由GMM定義的每個「叢集」或「混合物」之可能性。藉由考慮每個叢集的後驗概率以及細胞向量與為每個叢集創建的中心向量之間的距離,可以使用Fisher向量來轉換細胞向量。Fisher向量中使用的這個距離考慮了平均向量、共變異數矩陣和GMM之權重,因此其可代表細胞向量和每個叢集之間複雜的高階關係。Fisher向量是藉由後驗概率權衡距離的方法中的一個例子。有了GMM參數,其他距離函數也可以被應用於估計細胞與叢集的關係。最後,每個FC資料集可以由一個平均的細胞表示,嵌入其後驗概率的資訊和它與叢集的關係。 By using a GMM trained on the entire training dataset (e.g., consisting of multiple FC datasets), the analysis platform can calculate the posterior probability for each cell-level FC dataset to determine the likelihood that the cell belongs to each "cluster" or "mixture" defined by the GMM. Fisher vectors can be used to transform the cell vector by considering the posterior probability of each cluster and the distance between the cell vector and the center vector created for each cluster. This distance used in the Fisher vector takes into account the mean vector, the covariance matrix, and the weights of the GMM, so it can represent the complex, high-order relationship between the cell vector and each cluster. Fisher vectors are an example of a method that weighs distances by posterior probabilities. With the GMM parameters, other distance functions can also be applied to estimate the relationship between cells and clusters. Finally, each FC dataset can be represented by an averaged cell, embedding the information of its posterior probability and its relationship with the cluster.
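
The conversion of one processed FC data matrix into a Fisher vector can be sketched as follows. The sketch assumes a GMM with diagonal covariances whose parameters have already been fit on the training data (for example with an EM routine), and it uses a commonly used closed-form approximation of the Fisher vector (gradients with respect to the means and variances only). The function name `fisher_vector` and the parameter values in the usage lines are hypothetical.

```python
import numpy as np

def fisher_vector(cells, weights, means, variances):
    """Encode an (N, D) matrix of cells as a 2*K*D Fisher vector.

    weights: (K,) mixture weights; means, variances: (K, D) diagonal GMM.
    """
    cells = np.asarray(cells, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    n = cells.shape[0]

    # Standardized distance of every cell to every cluster center: (N, K, D).
    diff = (cells[:, None, :] - means[None]) / np.sqrt(variances)[None]

    # Posterior probability of each cluster for each cell: (N, K).
    log_norm = np.sum(np.log(2.0 * np.pi * variances), axis=1)[None]
    log_post = np.log(weights)[None] - 0.5 * (np.sum(diff ** 2, axis=2) + log_norm)
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Posterior-weighted gradients, averaged over all cells in the sample.
    grad_mu = (post[:, :, None] * diff).sum(axis=0)
    grad_mu /= n * np.sqrt(weights)[:, None]
    grad_var = (post[:, :, None] * (diff ** 2 - 1.0)).sum(axis=0)
    grad_var /= n * np.sqrt(2.0 * weights)[:, None]
    return np.concatenate([grad_mu.ravel(), grad_var.ravel()])

# Hypothetical example: 500 cells, 3 parameters, 2 mixtures -> 12 dimensions.
rng = np.random.default_rng(0)
cells = rng.normal(size=(500, 3))
fv = fisher_vector(cells,
                   weights=[0.5, 0.5],
                   means=[[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                   variances=np.ones((2, 3)))
```

The output length is twice the number of parameters times the number of mixtures, which is how 17 or 23 parameters can grow into hundreds or thousands of dimensions.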

D.訓練 D. Training

圖9為一示意流程圖用以揭示用於訓練模型以分類血液病之程序900。在一開始,分析平臺可以接收指示選擇一個或複數個來源以獲得FC資料之輸入(步驟901)。在一實施例中,輸入可以指定多個資料庫,其存儲有單獨的FCS檔案集(例如:與不同的病人相關、由不同的流式細胞儀產生)。在另一實施例中,輸入可以指定複數個流式細胞儀,從其獲取FCS檔案。在選擇一個以上來源的具體實施例中,從每個來源獲得的FC資料通常與不同的病人組別有關。 然而,病人可以包含在兩個組別中。另外,輸入可以指定一個單一的資料庫或流式細胞儀,從其獲取FCS檔案。 FIG9 is a schematic flow chart illustrating a process 900 for training a model to classify blood diseases. Initially, the analysis platform may receive an input indicating the selection of one or more sources to obtain FC data (step 901). In one embodiment, the input may specify multiple databases storing separate sets of FCS files (e.g., associated with different patients and generated by different flow cytometers). In another embodiment, the input may specify multiple flow cytometers from which FCS files are obtained. In a specific embodiment in which more than one source is selected, the FC data obtained from each source is typically associated with a different group of patients. However, a patient may be included in two groups. Alternatively, the input can specify a single database or flow cytometer from which to obtain FCS files.

然後,分析平臺可以從一個或多個來源獲得FC資料的多個矩陣,該矩陣描述包含用螢光標誌標記細胞之樣本(步驟902)。舉例來說,分析平臺可以獲得由上述流式細胞儀生成的多個FCS檔案,然後分析平臺可以從每個FCS檔案中擷取FC資料之矩陣。 Then, the analysis platform can obtain multiple matrices of FC data from one or more sources, the matrices describing samples containing cells labeled with fluorescent markers (step 902). For example, the analysis platform can obtain multiple FCS files generated by the above-mentioned flow cytometer, and then the analysis platform can extract the matrix of FC data from each FCS file.

多個FC資料矩陣的性質可取決於分析平臺訓練分類模型之目標。舉例來說,假設分析平臺對訓練分類模型以區分四種不同的血液學疾病有興趣。在這種情況下,對應於FC資料的多個矩陣的樣本可能是已知對應於該四種不同血液疾病的確認實例。因此,分析平臺可以為每種感興趣的血液學疾病獲得至少一個FC資料矩陣。 The nature of the multiple FC data matrices may depend on the analysis platform's goal of training a classification model. For example, suppose the analysis platform is interested in training a classification model to distinguish four different hematological diseases. In this case, the samples corresponding to the multiple matrices of FC data may be confirmed instances known to correspond to the four different hematological diseases. Therefore, the analysis platform may obtain at least one FC data matrix for each hematological disease of interest.

雖然每個FC資料矩陣的內容可能不同,但其結構往往是相當一致的。舉例來說,每個矩陣可以包含:FSC值、SSC值,或由N參數在M波長上的螢光值,其中M和N之數值是整數。因此,每個矩陣可以包含:第一組FSC值、第二組SSC值、或第三組螢光值。 Although the contents of each FC data matrix may be different, their structure is often quite consistent. For example, each matrix may contain: FSC values, SSC values, or fluorescence values at M wavelengths from N parameters, where M and N are integers. Therefore, each matrix may contain: a first set of FSC values, a second set of SSC values, or a third set of fluorescence values.

然後,分析平臺可以執行將FC資料的多個矩陣轉換為FC資料的多個向量的函數(步驟903)。當執行時,該函數可以獨立地將FC資料的每個矩陣轉換為FC資料之相應向量。一般來說,這是藉由使用ML演算法來完成的。舉例來說,如上所述,FC資料的每個矩陣可以使用Fisher向量編碼和GMM分佈轉換為FC資料之相應向量。在函數藉由Fisher向量編碼轉換FC資料的矩陣之具體實施例中,每個向量可以是包含在相應矩陣中的FC資料的Fisher向量表徵。 Then, the analysis platform may execute a function that converts multiple matrices of FC data into multiple vectors of FC data (step 903). When executed, the function may independently convert each matrix of FC data into a corresponding vector of FC data. Generally, this is accomplished by using an ML algorithm. For example, as described above, each matrix of FC data may be converted into a corresponding vector of FC data using Fisher vector encoding and GMM distribution. In a specific embodiment where the function converts matrices of FC data by Fisher vector encoding, each vector may be a Fisher vector representation of the FC data contained in the corresponding matrix.

因此,分析平臺可以提供(i)FC資料的多個向量和(ii)對應的標誌集給分類模型作為訓練資料,以產生訓練好的分類模型(步驟904)。每組標誌可以代表相應向量中編碼或表徵的免疫表型集合的類型,以及相應樣本所代表的血液學疾病的類型。舉例來說,每個標誌集可用以代表其相應向量中每個被描述細胞所代表疾病類型、疾病狀態或生理狀態。因此,標誌可能不會說明分類模型學習如何對單個細胞進行分類,而是學習如何根據其免疫表型集合的分佈對整個樣本(例如:在多種血液病中)進行分類。如上所述,如果向量包含與一種以上的血液病有關的FC資料,那麼分類模型可以被訓練來區分多種血液病(例如:ALL、AML、APL和全部血球減少症)。 Accordingly, the analysis platform can provide (i) the multiple vectors of FC data and (ii) the corresponding sets of labels to the classification model as training data, so as to produce a trained classification model (step 904). Each set of labels can indicate the type of immunophenotype collection encoded or represented in the corresponding vector, as well as the type of hematological disease represented by the corresponding sample. For example, each label set can indicate the disease type, disease state, or physiological state represented by the cells described in its corresponding vector. Thus, the labels need not teach the classification model how to classify individual cells; instead, the model learns how to classify an entire sample (e.g., among multiple hematological diseases) based on the distribution of its immunophenotype collection. As mentioned above, if the vectors contain FC data related to more than one hematological disease, the classification model can be trained to distinguish among multiple hematological diseases (e.g., ALL, AML, APL, and pancytopenia).
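
The shape of the training data in step 904 (one vector and one label per sample, not per cell) can be illustrated with a deliberately simple stand-in classifier. A nearest-centroid rule is used here only to keep the example self-contained; it is not the model described in this disclosure, which names SVMs, DNNs, and random forests as candidates. The class name and the synthetic vectors are assumptions.

```python
import numpy as np

class NearestCentroidClassifier:
    """Stand-in classifier: assigns each sample vector to the closest class mean."""

    def fit(self, vectors, labels):
        vectors = np.asarray(vectors, dtype=float)
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        # One centroid per disease label, computed from the sample-level vectors.
        self.centroids_ = np.stack(
            [vectors[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, vectors):
        vectors = np.asarray(vectors, dtype=float)
        dists = np.linalg.norm(vectors[:, None, :] - self.centroids_[None], axis=2)
        return self.classes_[dists.argmin(axis=1)]

# Synthetic stand-ins for sample-level vectors (e.g., Fisher vectors) and labels.
rng = np.random.default_rng(0)
train_vectors = np.vstack([rng.normal(1.0, 0.1, size=(20, 4)),
                           rng.normal(-1.0, 0.1, size=(20, 4))])
train_labels = np.array(["ALL"] * 20 + ["AML"] * 20)
model = NearestCentroidClassifier().fit(train_vectors, train_labels)
```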

在一些具體實施例中,FC資料的多個向量和相應的標誌集是被包含在用於訓練分類模型的較大的訓練資料集中。該較大的訓練資料集可進一步包含關於一個或複數個光學參數和/或一個或複數個螢光標誌參數之資訊。光學參數之示例包含:正向散射光面積(FSC-A)、正向散射光寬度(FSC-W)、正向散射光高度(FSC-H)、側向散射光面積(SSC-A)、側向散射光寬度(SSC-W)、和側向散射光高度(SSC-H)。同時,螢光標誌參數之示例包含:CD117_PerCP-Cy5-A、KAPPA_FITC-A、HLA-DR_V450-A、CD38_APC-H7-A、和CD123_PE-A。 In some embodiments, the multiple vectors of FC data and the corresponding label sets are included in a larger training dataset used to train the classification model. The larger training dataset may further include information about one or more optical parameters and/or one or more fluorescent marker parameters. Examples of optical parameters include forward scattered light area (FSC-A), forward scattered light width (FSC-W), forward scattered light height (FSC-H), side scattered light area (SSC-A), side scattered light width (SSC-W), and side scattered light height (SSC-H). Examples of fluorescent marker parameters include CD117_PerCP-Cy5-A, KAPPA_FITC-A, HLA-DR_V450-A, CD38_APC-H7-A, and CD123_PE-A.

然後,分析平臺可以將訓練好的分類模型存儲在資料結構中(步驟905)。如下所進一步說明,分析平臺隨後可使用訓練好的分類模型來產生代表不同血液病的建議診斷之分類。因此,分析平臺可以程式化地將訓練有素的分類模型與它可以產生建議診斷的每種血液病進行關聯。舉例來說,分析平臺可以 用識別字(例如:字母數位識別碼符)以填充資料結構,該識別字識別分類模型能夠產生建議診斷的血液學疾病。 The analysis platform may then store the trained classification model in a data structure (step 905). As further described below, the analysis platform may then use the trained classification model to generate classifications representing recommended diagnoses for different blood disorders. Thus, the analysis platform may programmatically associate a trained classification model with each blood disorder for which it can generate a recommended diagnosis. For example, the analysis platform may populate the data structure with an identifier (e.g., an alphanumeric identifier) that identifies the blood disorder for which the classification model can generate a recommended diagnosis.
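
One way to picture the data structure of step 905 is a small registry that associates each stored model with identifiers for the diseases it can classify. The names `ModelRecord` and `ModelRegistry` and the identifier strings are hypothetical; the disclosure does not specify the layout of the data structure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelRecord:
    """A trained model together with the diseases it can propose diagnoses for."""
    model: object
    disease_ids: tuple

class ModelRegistry:
    def __init__(self):
        self._records = {}

    def register(self, name, model, disease_ids):
        # Populate the structure with identifiers for each supported disease.
        self._records[name] = ModelRecord(model, tuple(disease_ids))

    def models_for(self, disease_id):
        """Return the names of all stored models that cover `disease_id`."""
        return sorted(name for name, rec in self._records.items()
                      if disease_id in rec.disease_ids)

registry = ModelRegistry()
registry.register("four_class_v1", object(), ["ALL", "AML", "APL", "PANCYTOPENIA"])
registry.register("aml_only_v1", object(), ["AML"])
```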

綜合上述,FC資料的多個向量和代表FC資料中表徵的免疫表型類型的相應標誌集可被送至分類模型以進行訓練。因此,多個向量可代表訓練資料,可用於訓練分類模型,在不同的血液病中對給定的樣本進行分類。用於訓練分類模型的訓練資料可包含與不同樣本(即不同病人)相關的高維度向量之集合。一旦訓練完成,分類模型可以根據對其相應的FC資料的分析對樣本進行分類,以確定免疫表型集合之不同模式,然後根據免疫表型的全樣本分佈確定樣本是否代表一種血液疾病。 In summary, multiple vectors of FC data and corresponding sets of markers representing the types of immunophenotypes represented in the FC data can be sent to the classification model for training. Therefore, multiple vectors can represent training data that can be used to train the classification model to classify a given sample in different blood diseases. The training data used to train the classification model can include a set of high-dimensional vectors associated with different samples (i.e., different patients). Once the training is completed, the classification model can classify the samples based on the analysis of their corresponding FC data to determine different patterns of immunophenotype sets, and then determine whether the sample represents a blood disease based on the full sample distribution of the immunophenotype.

E.分類 E. Classification

圖10為一示意流程圖用以揭示藉由應用分類模型對樣本進行分類之程序1000。舉例來說,假設分析平臺收到一個指示請求的輸入,以根據檔案的內容提出對一種或多種血液疾病之診斷(步驟1001)。該輸入可以代表透過分析平臺生成的介面選擇檔案(即相應之病人),或者該輸入可以代表收到檔案(例如:來自流式細胞儀)。在一實施例中,該檔案可以按照FCS進行格式化。在這種情況下,分析平臺可以從該檔案中擷取第一種形式的FC資料,然後將FC資料轉化為第二種形式,使其更容易被分類模型處理。舉例來說,分析平臺可以從該檔案中擷取FC資料之矩陣(步驟1002)。然後,分析平臺可以執行一個函數,將FC資料之矩陣轉化為FC資料之向量(步驟1003)。該函數可以是前述圖9中步驟903所討論的同一函數。 FIG. 10 is a schematic flow chart for illustrating a process 1000 for classifying a sample by applying a classification model. For example, assume that an analysis platform receives an input indicating a request to make a diagnosis of one or more blood diseases based on the contents of a file (step 1001). The input may represent the selection of a file (i.e., the corresponding patient) through an interface generated by the analysis platform, or the input may represent the receipt of a file (e.g., from a flow cytometer). In one embodiment, the file may be formatted in accordance with FCS. In this case, the analysis platform may extract FC data in a first form from the file and then convert the FC data into a second form to make it easier to be processed by the classification model. For example, the analysis platform can extract the matrix of FC data from the file (step 1002). Then, the analysis platform can execute a function to convert the matrix of FC data into a vector of FC data (step 1003). The function can be the same function discussed in step 903 of Figure 9 above.

然後,分析平臺可以將FC資料之向量提供給分類模型以作為輸入,而獲得一個或多個輸出(步驟1004)。每個輸出可以代表對不同血液疾病的建議診斷。因此,分析平臺可以根據輸出結果為樣本得出一個分類,該分類是由FC資料所代表的(步驟1005)。如上所述,由分類模型產生的輸出的數量可以依據在訓練階段提供訓練資料的血液學疾病的數量。通常分類模型被訓練成在應用於FC資料向量時產生多種血液學疾病之輸出;然而,分類模型可以被訓練成在應用於FC資料向量時產生一種血液學疾病的單一輸出。在分類模型與單一血液病相關的具體實施例中,分析平臺可以應用多個分類模型,並且這些模型已被訓練為依照本文所述方法對不同的血液疾病進行分類。此外或備選地,由分類模型所產生的輸出的數量可以依據為指定的血液疾病定義的疾病狀態的數量和/或為MRD定義的數位範圍的數量。 The analysis platform can then provide the vector of FC data to the classification model as input to obtain one or more outputs (step 1004). Each output can represent a proposed diagnosis for a different hematological disease. Accordingly, the analysis platform can derive, from the outputs, a classification for the sample represented by the FC data (step 1005). As described above, the number of outputs produced by the classification model can depend on the number of hematological diseases for which training data was provided during the training phase. Typically, the classification model is trained to produce outputs for multiple hematological diseases when applied to a vector of FC data; however, the classification model can instead be trained to produce a single output for one hematological disease. In embodiments where each classification model is associated with a single hematological disease, the analysis platform can apply multiple classification models that have been trained, according to the methods described herein, to classify different hematological diseases. Additionally or alternatively, the number of outputs produced by the classification model can depend on the number of disease states defined for a given hematological disease and/or the number of numerical ranges defined for MRD.
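
Step 1005 can be sketched as a small rule that turns per-disease outputs into a single classification. The sketch assumes each output is a score between 0 and 1 and introduces a hypothetical `min_score` cutoff below which no diagnosis is proposed; neither assumption comes from the disclosure.

```python
def derive_classification(outputs, min_score=0.5):
    """Pick the disease with the highest score, or None if no score is high enough.

    outputs: dict mapping a disease identifier to a model score in [0, 1].
    The scores may come from one multi-class model or several single-disease models.
    """
    if not outputs:
        return None
    disease, score = max(outputs.items(), key=lambda item: item[1])
    return disease if score >= min_score else None

# Hypothetical scores for one sample.
proposed = derive_classification({"ALL": 0.08, "AML": 0.81, "APL": 0.11})
```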

值得注意的是,本揭露過程中的步驟的順序是示例性的,但是步驟可以以各種順序和組合進行。例如,可將步驟添加到這些過程中,或從這些過程中移除。同樣地,步驟可以被替換或重新排序。因此,對這些過程的描述是開放式的。 It is worth noting that the order of steps in the disclosed processes is exemplary, but the steps can be performed in various orders and combinations. For example, steps can be added to these processes, or removed from these processes. Likewise, steps can be replaced or reordered. Therefore, the description of these processes is open-ended.

在一些具體實施例中還可以包含額外的步驟。舉例來說,分析平臺可以根據如上述的分類模型產生的輸出,以得出分類(例如:對血液病的建議診斷)。在這樣的情況下,分析平臺可以在一個介面上顯示該分類,該介面可供與基礎FC資料相關的病人訪問。同樣地,分析平臺可以使分類顯示在醫療專業人員可以訪問的介面上。在一些具體實施例中,分析平臺能夠與醫療保健提供者的中央電腦系統相連接。舉例來說,分析平臺能夠藉由資料介面訪問中央電腦系統以訪問FC資料。在這種情況下,分析平臺可能能夠將分類自動填充到相應病人的電子健康記錄(EHR)中。舉例來說,分析平臺可以將分類傳送給中央電腦系統,並發出指令將分類填入電子健康記錄,以便記錄。 Additional steps may also be included in some specific embodiments. For example, the analysis platform can generate a classification (e.g., a recommended diagnosis for a blood disease) based on the output generated by the classification model as described above. In such a case, the analysis platform can display the classification on an interface that is accessible to patients associated with basic FC data. Similarly, the analysis platform can display the classification on an interface accessible to medical professionals. In some specific embodiments, the analysis platform can be connected to a central computer system of a healthcare provider. For example, the analysis platform can access the central computer system through a data interface to access FC data. In this case, the analytics platform may be able to automatically populate the classification into the corresponding patient's electronic health record (EHR). For example, the analytics platform can transmit the classification to a central computer system and issue a command to populate the classification into the EHR for recording.

E.使用實例 E. Usage examples

在一個典型的環境中,本揭露所述的方法可用於進一步檢查感興趣的FC資料。在一實施例中,感興趣的FC資料可能對應於一個不確定的實驗室結果,因此醫療專業人員希望在確定適當的處理方案之前對於該結果獲得進一步的資訊。為了達到此一目標,分析平臺可以將分類模型應用於所關注的FC資料。舉例來說,假設分類模型被訓練成對不同模式的免疫表型集合進行分類,以便區分多種血液疾病(例如:ALL、AML、APL和全部血球減少症)。藉由分類模型的迅速且準確地分類,醫療專業人員能夠選擇適當的治療方案。 In a typical setting, the methods described in this disclosure can be used to further examine FC data of interest. In one embodiment, the FC data of interest may correspond to an inconclusive laboratory result, so the medical professional wishes to obtain further information about the result before determining an appropriate course of treatment. To that end, the analysis platform can apply a classification model to the FC data of interest. For example, assume that the classification model has been trained to classify different patterns of immunophenotype collections in order to distinguish among multiple hematological diseases (e.g., ALL, AML, APL, and pancytopenia). With the rapid and accurate classification produced by the model, the medical professional is able to select an appropriate treatment plan.

如上述,分析平臺可執行分類模型以便按類型對疾病或生理狀態進行自動分類。一般來說,分析平臺是自動分類系統(或簡稱「系統」)的一部分,以下請參照圖11至12以進一步討論。該系統可包含:一個流式細胞儀、一個可藉由網路訪問的伺服器系統、一個資料記憶體和一個電腦裝置(又稱「電子裝置」或「使用者裝置」)。在一些具體實施例中,整個系統是被設置在單一個殼體內。 As described above, the analysis platform can execute a classification model to automatically classify diseases or physiological conditions by type. Generally speaking, the analysis platform is part of an automatic classification system (or simply "system"), which is further discussed below with reference to Figures 11 to 12. The system may include: a flow cytometer, a server system accessible via a network, a data memory, and a computer device (also known as an "electronic device" or "user device"). In some specific embodiments, the entire system is disposed in a single housing.

一般來說,分析平臺自動分類樣本的過程始於使用者準備樣本並將該樣本插入流式細胞儀中。在一實施例中,該使用者可以準備一系列管子,且每個管子包含不同的樣本。每個管子都可能受到不同且合適的螢光標誌的影響。當該一系列試管被流式細胞儀檢查時,其會產生FC資料並且該FC資料會被編碼成獨立檔案。如上述,這些檔案可由分析平臺用於訓練分類模型,以產生對於診斷有用的輸出。 Generally speaking, the process of automatically classifying samples by the analysis platform begins with the user preparing the sample and inserting the sample into the flow cytometer. In one embodiment, the user can prepare a series of tubes, and each tube contains a different sample. Each tube may be subjected to different and appropriate fluorescent markers. When the series of test tubes are examined by the flow cytometer, FC data is generated and the FC data is encoded into independent files. As described above, these files can be used by the analysis platform to train classification models to produce outputs useful for diagnosis.

為了確保良好的覆蓋率,被分析平臺用來訓練分類模型的訓練資料集可以依據大量的檔案,或從大量的檔案中得到。在一實施例中,訓練資料集 可包含已知且已被診斷為ALL、AML或APL的幾千名(例如:1,000、2,000或4,000名)患者的FC資料。每個樣本可以與一個病人相關聯,儘管一個樣本可以與多個試管相關聯(從而由流式細胞儀生成多個檔案)。舉例來說,因為維度大小之限制,一個大約1,000至2,000個樣本的樣本集可能與大約4,000至12,000個試管相關聯。 To ensure good coverage, the training dataset used by the analysis platform to train the classification model can be based on or obtained from a large number of files. In one embodiment, the training dataset can contain FC data for several thousand (e.g., 1,000, 2,000, or 4,000) patients who are known to have been diagnosed with ALL, AML, or APL. Each sample can be associated with a single patient, although a sample can be associated with multiple test tubes (thus generating multiple files by the flow cytometer). For example, due to dimensionality limitations, a sample set of approximately 1,000 to 2,000 samples may be associated with approximately 4,000 to 12,000 test tubes.

為了更進一步說明其功效,本揭露所描述的框架被用來開發一個四種類分類模型,並且其為使用由流式細胞儀(即Becton Dickinson Biosciences的FACSCanto II)生成的FCS檔案。FCS檔案對應於大約550個骨髓樣本,其中有大約100例為ALL,大約200例為AML,以及大約200例為無血液疾病的全部血球細胞減少症。這些診斷是依據常規的形態學、細胞遺傳學、分子學和臨床結果所進行判定的。將四個類別中每個類別中90%的樣本所利用抗體和螢光染料(fluorochrome)結合物之原始螢光強度和光散射參數來建立GMMs。對於每個GMM,使用Fisher向量化計算每個光散射參數的梯度,而得出用於訓練四種類分類模型的高維度表徵。 To further illustrate its efficacy, the framework described in this disclosure was used to develop a four-class classification model from FCS files generated by a flow cytometer (namely, a FACSCanto II from Becton Dickinson Biosciences). The FCS files correspond to approximately 550 bone marrow samples, of which approximately 100 were ALL, approximately 200 were AML, and approximately 200 were pancytopenia without hematological disease. These diagnoses were established from conventional morphological, cytogenetic, molecular, and clinical findings. GMMs were built from the raw fluorescence intensities of the antibody-fluorochrome conjugates and the light scattering parameters of 90% of the samples in each of the four classes. For each GMM, Fisher vectorization was used to compute the gradient with respect to each light scattering parameter, yielding the high-dimensional representation used to train the four-class classification model.

To evaluate performance, accuracy (ACC) was used to quantify the agreement between the diagnoses made by the manual and automated methods. Sensitivity and specificity were additionally assessed through the area under the receiver operating characteristic (ROC) curve, also called the "AUC".
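Both metrics can be sketched in a few lines. This is a minimal illustration, assuming accuracy is the fraction of samples on which the manual and automated diagnoses agree, and that AUC is computed one class versus the rest using the rank (Mann-Whitney) formulation; the disclosure does not state which AUC computation was used, and the labels and scores below are made up.

```python
def accuracy(manual, automated):
    """Fraction of samples on which the automated diagnosis matches the manual one."""
    if not manual or len(manual) != len(automated):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(m == a for m, a in zip(manual, automated)) / len(manual)


def auc_one_vs_rest(labels, scores, positive):
    """AUC for one class against all others: the probability that a positive
    sample receives a higher score than a negative one (ties count as half)."""
    pos = [s for label, s in zip(labels, scores) if label == positive]
    neg = [s for label, s in zip(labels, scores) if label != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical diagnoses and model scores for the "AML" class.
manual_dx = ["AML", "ALL", "AML", "APL"]
automated_dx = ["AML", "ALL", "ALL", "APL"]
acc = accuracy(manual_dx, automated_dx)
auc = auc_one_vs_rest(["AML", "ALL", "AML", "ALL"], [0.9, 0.2, 0.6, 0.7], "AML")
```

For a multi-class model, the one-versus-rest AUC can be computed per class and then averaged, although other aggregation schemes are equally plausible.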

A single-parameter analysis was performed first. It showed that FSC-A provided the highest accuracy compared with the 36 other parameters, which included 31 markers commonly used to gauge the performance of FC analysis. The complete list of parameters used in the study includes: FSC-A, SSC-H, CD117_PerCP-Cy5-A, FSC-H, KAPPA_FITC-A, HLA-DR_V450-A, CD38_APC-H7-A, CD123_PE-A, FSC-W, CD34_APC-A, CD19_PE-Cy7-A, CD2_V450-A, CD14_APC-H7-A, SSC-W, CD4_PerCP-Cy5-A, CD45_V500-A, CD64_PerCP-Cy5-A, CD7_FITC-A, CD8_APC-H7-A, CD10_APC-A, SSC-A, CD7_PE-A, CD19_PerCP-Cy5-A, CD33_PE-Cy7-A. These parameters therefore comprise both optical parameters and fluorescent marker parameters.

Because an optical parameter (FSC-A) showed the best performance, all six optical parameters were investigated further, and their additive effects on accuracy and AUC were compared. The results are shown in Table 1 below. As can be seen from Table 1, the combination of three optical parameters (FSC-A, SSC-H, and SSC-W) showed a reasonable accuracy of 0.921, while the combination of all six optical parameters showed an accuracy of 0.938. The analysis showed that when one additional optical parameter (FSC-W) was included, accuracy rose to 0.928 with an AUC of 0.990. Adding two optical parameters (FSC-W and FSC-H) raised accuracy only to 0.940, with an AUC of 0.991. A classification model trained with only three parameters therefore performs almost as well as one trained with all six.

[Table 1: Figure 110134284-A0305-12-0033-2]

Further investigation revealed that accuracy and AUC can be improved with selected fluorescent marker parameters, without using all 37 fluorescent marker parameters tested. The results are shown in Table 2 below. As Table 2 shows, adding a CD117 marker (CD117_PerCP-Cy5-5-A) resulted in an accuracy of 0.932 and an AUC of 0.983, a marked improvement over the two-parameter combination of FSC-A and SSC-H. Adding another fluorescent marker parameter (KAPPA_FITC-A) increased accuracy to 0.948 and AUC to 0.990. As Table 2 also shows, adding HLA-DR_V450-A, CD38_APC-H7-A, and CD123_PE-A provided further gains in accuracy and AUC.

[Table 2: Figure 110134284-A0305-12-0034-3]

Analysis platform overview

FIG. 11 is a schematic diagram of a network environment 1100 that includes an analysis platform 1102. An individual (also referred to as a "user") can interact with the analysis platform 1102 through interfaces 1104. In one embodiment, the user can access an interface through which information about a patient, and a proposed diagnosis for that patient, can be viewed. The interfaces 1104 allow the user to interact with the analysis platform 1102 as it implements the framework described in this disclosure. The term "user", as used in this disclosure, can refer to a person interested in reviewing proposed diagnoses (e.g., a patient or a medical professional) or to a person interested in developing, training, or deploying models.

As shown in FIG. 11, the analysis platform 1102 can reside in the network environment 1100. The computer device on which the analysis platform 1102 is implemented can therefore be connected to one or more networks 1106a to 1106b. These networks 1106a to 1106b can be personal area networks (PANs), local area networks (LANs), wide area networks (WANs), metropolitan area networks (MANs), cellular networks, or the Internet. For example, the analysis platform 1102 can be connected to one or more flow cytometers indirectly over the Internet (e.g., through corresponding application programming interfaces), or directly (e.g., through corresponding channels). In another embodiment, the analysis platform 1102 can be connected, directly or indirectly, to storage media managed by respective healthcare systems. These storage media can be part of laboratory information systems, electronic health record systems, and the like. Additionally or alternatively, the analysis platform 1102 can be communicatively coupled to one or more computer devices over a short-range wireless technology (e.g., Bluetooth, Near Field Communication (NFC), or Wi-Fi Direct, also known as "Wi-Fi P2P").

The interfaces 1104 can be accessed through a web browser, desktop application, mobile application, or over-the-top (OTT) application. In one embodiment, a medical professional can access an interface through which information about a patient can be entered. This information can include name, date of birth, symptoms, medications, and laboratory results (e.g., in the form of an FCS file). With this information, the medical professional can run the framework to produce a classification that represents a proposed diagnosis. In another embodiment, an individual can access an interface through which a dataset can be identified and then monitor progress as the analysis platform 1102 runs the framework to train a classification model on that dataset. Accordingly, the interfaces 1104 can be viewed on computer devices such as mobile workstations (also known as "medical carts"), personal computers, tablet computers, mobile phones, and wearable electronic devices.

In some embodiments, at least some components of the analysis platform 1102 are hosted locally. In other words, part of the analysis platform 1102 can reside on the computer device used to access the interfaces 1104. For example, the analysis platform 1102 can be a desktop application executed by a mobile workstation that is accessible to one or more medical professionals. Note, however, that the desktop application can be communicatively connected to a server system 1108 on which other components of the analysis platform 1102 are hosted.

In other embodiments, the analysis platform 1102 is executed entirely by a cloud computing service (e.g., Amazon Web Services, Google Cloud Platform, or Microsoft Azure). In such embodiments, the analysis platform 1102 can reside on a server system 1108 made up of one or more computer servers. These computer servers can hold models, algorithms (e.g., for processing FC data or generating reports), patient information (e.g., profiles, credentials, and health-related information such as age, date of birth, disease classification, and healthcare provider), and other assets. Those skilled in the art will recognize that this information can also be distributed between the server system 1108 and one or more computer devices. For example, for security or privacy purposes, some data generated by the computer device on which the analysis platform 1102 resides can be stored on, and processed by, that computer device.

FIG. 12 is a schematic diagram of an embodiment of a system 1200 that can automatically classify different patterns of immunophenotype collections in order to identify hematological diseases. The system 1200 can include a flow cytometer 1202 that is communicatively connected to an analysis platform. In this embodiment, the analysis platform runs on a network-accessible server system 1204, although it could run elsewhere. The system 1200 also includes a data storage device 1206 and a computer device 1208. The computer device 1208 can be one of several computer devices used to interact with the analysis platform. For example, in an embodiment where multiple users (e.g., medical professionals employed by a healthcare system) can interact with the analysis platform, more than one computer device can be part of the system 1200. As shown in FIG. 12, the components of the system 1200 can be communicatively connected to one another, directly or indirectly, over a network 1210. Additionally or alternatively, the components of the system 1200 can be communicatively connected to one another through physical communication interfaces.

As noted above, the functions of the network-accessible server system 1204, the data storage device 1206, and the computer device 1208 can be performed by a single device. Similarly, the functions of the flow cytometer 1202, the network-accessible server system 1204, the data storage device 1206, and the computer device 1208 can be performed by a single flow cytometer, in which case the flow cytometer can be referred to as a "combined flow cytometer" or an "integrated flow cytometer".

Processing system

FIG. 13 is a block diagram of an example of a processing system 1300 that can perform the operations described in this disclosure. For example, components of the processing system 1300 can be hosted on a computer device that includes an analysis platform (e.g., the analysis platform 1102 of FIG. 11). In another embodiment, components of the processing system 1300 can be hosted on a flow cytometer (e.g., the flow cytometer 1202 of FIG. 12).

The processing system 1300 can include a processor 1302, main memory 1306, non-volatile memory 1310, a network interface card 1312, a video display unit 1318, an input/output device 1320, a control device 1322 (e.g., a keyboard, a pointing device, or a mechanical input such as a button), a drive device 1324 that includes a storage medium 1326, or a signal generation device 1330, all communicatively connected to a bus 1316. The bus 1316 is an abstraction that represents one or more physical buses and/or point-to-point connections joined by appropriate bridges, adapters, or controllers. The bus 1316 can therefore include a system bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express bus, a HyperTransport bus, an Industry Standard Architecture (ISA) bus, a Small Computer System Interface (SCSI) bus, a Universal Serial Bus (USB), an Inter-Integrated Circuit (I2C) bus, or a bus compliant with Institute of Electrical and Electronics Engineers (IEEE) Standard 1394.

The processing system 1300 can share a computer processor architecture similar to that of a computer server, router, desktop computer, tablet computer, mobile phone, video game console, wearable electronic device (e.g., a watch or fitness tracker), network-connected ("smart") device (e.g., a television or home assistant device), augmented or virtual reality system (e.g., a head-mounted display), or another electronic device capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by the processing system 1300.

Although the main memory 1306, non-volatile memory 1310, and storage medium 1326 are each shown as a single medium, the terms "storage medium" and "machine-readable medium" should be understood to include a single medium or multiple media that store instructions. The terms "storage medium" and "machine-readable medium" should also be understood to include any medium capable of storing, encoding, or carrying instructions for execution by the processing system 1300.

In general, the instructions executed to implement embodiments of this disclosure can be implemented as part of an operating system or of a specific application, component, program, object, module, or sequence of instructions (collectively referred to as "computer programs"). Computer programs typically comprise instructions (e.g., instructions 1304, 1308, 1328) that reside, at various times, in the various memory and storage devices of a computing device. When read and executed by the processor 1302, the instructions cause the processing system 1300 to perform operations that carry out this disclosure.

Although embodiments have been described in the context of fully functioning computing devices, those skilled in the art will appreciate that the various embodiments can be distributed as program products in a variety of forms. This disclosure applies regardless of the particular type of machine-readable or computer-readable medium used to actually effect the distribution. Specific examples of machine-readable and computer-readable media include recordable-type media (e.g., volatile and non-volatile memory 1310), removable disks, hard disk drives, optical discs (e.g., Compact Disc Read-Only Memory (CD-ROM) and Digital Versatile Disc (DVD)), cloud-based storage, and transmission-type media (e.g., digital and analog communication links).

The network interface card 1312 enables the processing system 1300 to exchange data over a network 1314 with an entity external to the processing system 1300, using any communication protocol supported by the processing system 1300. The network interface card 1312 can include a network adapter card, a wireless network interface card, a switch, a protocol converter, a gateway, a bridge, a hub, a receiver, a repeater, or a transceiver that includes an integrated circuit (e.g., one enabling communication over Bluetooth or Wi-Fi).

Notes

The foregoing description of various embodiments of the claimed subject matter is provided for purposes of illustration and description only. It is not intended to be exhaustive or to limit the scope of this application to the precise forms disclosed. Modifications and variations will be apparent to those skilled in the art. The embodiments were chosen and described in order to best explain the principles and their practical applications, thereby enabling those skilled in the art to understand the subject matter of this disclosure, the various embodiments, and the various modifications suited to particular uses.

Although preferred embodiments have been disclosed, the technology can be practiced in many ways no matter how detailed the description appears. Embodiments may vary considerably in their implementation details while still falling within the scope of this disclosure. Particular terminology used to describe certain features or aspects of the embodiments should not be taken to imply that the terminology is being redefined in this disclosure to be restricted to any specific characteristics, features, or aspects of the technology with which that terminology is associated. In general, the terms used in the following claims should not be construed to limit the technology to the specific embodiments disclosed herein, unless those terms are explicitly defined in this disclosure. Accordingly, the scope of this disclosure encompasses not only the disclosed embodiments but also all equivalent ways of practicing or implementing them.

The language used in this disclosure has been selected principally for readability and instructional purposes. It has not been selected to delineate or circumscribe the subject matter. The scope of the technology is therefore limited not by this disclosure but by the claims. Accordingly, the disclosure of the various embodiments is intended to illustrate, not to limit, the scope set forth in the following claims.

200: Framework; 202: Data acquisition stage; 204: Data extraction stage; 206: Data transformation stage; 208: Training; 210: Classification

Claims (35)

1. A non-transitory computer-readable medium storing instructions for automated classification of flow cytometry data that, when executed by a processor of a computer device, cause the computer device to perform operations comprising: obtaining a matrix of flow cytometry data that describes a sample comprising cells labeled with fluorescent markers; executing a function that converts the matrix of flow cytometry data into a vector of flow cytometry data; providing (i) the vector of flow cytometry data and (ii) a set of labels as input to a classification model so as to produce a trained classification model, wherein the set of labels represents an immunophenotype collection pattern for each cell described in the vector of flow cytometry data; and storing the trained classification model in a data structure.
2. The non-transitory medium of claim 1, wherein the matrix of flow cytometry data comprises fluorescence values for M wavelengths across N parameters, where M and N are integers.
3. The non-transitory medium of claim 1, wherein the matrix of flow cytometry data is one of a plurality of matrices of flow cytometry data obtained by the computer device; and wherein the plurality of matrices of flow cytometry data correspond to different samples that are known to represent at least two hematological diseases.
4. The non-transitory medium of claim 3, wherein each of the plurality of matrices of flow cytometry data is converted into a corresponding vector of flow cytometry data so as to produce a plurality of vectors of flow cytometry data; and wherein the plurality of vectors of flow cytometry data are provided to the classification model as input so that the classification model learns to distinguish between the at least two hematological diseases.
5. The non-transitory medium of claim 1, wherein, when applied to a new vector of flow cytometry data that corresponds to a new sample, the trained classification model produces, as output, a classification for the new sample based on sample-level analysis rather than cell-level analysis.
6. The non-transitory medium of claim 5, wherein the classification represents a proposed diagnosis of a hematological disease and is determined based on the distribution of a collection of immunophenotypes across the entire new sample.
7. The non-transitory medium of claim 1, wherein the matrix of flow cytometry data comprises a first set of values for fluorescence intensity, a second set of values for forward scatter (FSC), and a third set of values for side scatter (SSC).
8. The non-transitory medium of claim 1, wherein the matrix of flow cytometry data is contained in a file that is received from a flow cytometry instrument used to characterize the sample.
9. The non-transitory medium of claim 1, wherein the matrix of flow cytometry data is retrieved from a storage medium that is accessible to the computer device over a network.
10. The non-transitory medium of claim 1, wherein the function converts the matrix of flow cytometry data into the vector of flow cytometry data through Fisher vector (FV) encoding, such that the vector of flow cytometry data is a Fisher vector representation of the flow cytometry data contained in the matrix of flow cytometry data.
11. A method for automated classification of flow cytometry data, comprising: receiving a Flow Cytometry Standard file (FCS file), generated by a flow cytometry instrument, that describes a sample comprising cells labeled with fluorescent markers of different wavelengths; extracting a matrix of flow cytometry data from the FCS file; converting the matrix of flow cytometry data into a vector of flow cytometry data; and, for each cell defined in the vector of flow cytometry data, providing (i) the vector of flow cytometry data and (ii) a set of labels representing a disease type, a disease state, or a physiological state as input to a classification model so as to produce a trained classification model, wherein the set of labels represents each cell described in the vector of flow cytometry data.
12. The method of claim 11, wherein the FCS file is one of a plurality of FCS files received from a source, each of the plurality of FCS files corresponding to a different sample; wherein a separate vector is generated for each of the plurality of FCS files based on a corresponding matrix of flow cytometry data, so as to produce a plurality of vectors of flow cytometry data; and wherein the classification model is trained using the plurality of vectors of flow cytometry data so that the classification model learns how to distinguish between different hematological diseases.
13. The method of claim 11, wherein the converting comprises: generating a mixture model based on the matrix of flow cytometry data; and computing a gradient of the mixture model to produce the vector of flow cytometry data.
14. The method of claim 11, wherein the vector of flow cytometry data and the set of labels are included in a training dataset that further includes information regarding one or more optical parameters and one or more fluorescent marker parameters.
15. The method of claim 14, wherein the one or more optical parameters include forward scatter area (FSC-A), forward scatter width (FSC-W), side scatter area (SSC-A), side scatter width (SSC-W), side scatter height (SSC-H), or any combination thereof.
16. A method for automated classification of flow cytometry data, comprising: receiving input indicative of a request to propose diagnoses for a plurality of hematological diseases based on analysis of a data file; extracting, from the data file, a matrix of flow cytometry data that describes a sample comprising cells labeled with fluorescent markers of different wavelengths; converting the matrix of flow cytometry data into a vector of flow cytometry data; and providing the vector of flow cytometry data to a classification model as input so as to obtain a plurality of outputs, wherein each of the plurality of outputs represents a proposed diagnosis for a corresponding one of the plurality of hematological diseases.
17. The method of claim 16, wherein the vector of flow cytometry data is a high-dimensional vector that includes, for each cell, values for (i) forward scatter (FSC), (ii) an FSC characteristic, (iii) side scatter (SSC), (iv) an SSC characteristic, (v) fluorescence, and (vi) a fluorescence characteristic.
18. The method of claim 17, wherein the FSC, SSC, and fluorescence characteristics describe the same characteristic.
19. The method of claim 17, wherein the FSC, SSC, and fluorescence characteristics are selected from amplitude, frequency, amplitude variation, frequency variation, time dependency, or spatial dependency.
20. A non-transitory computer-readable medium storing instructions for automated classification of flow cytometry data that, when executed by a processor of a computer device, cause the computer device to perform operations comprising: receiving a Flow Cytometry Standard file (FCS file), generated by a flow cytometry instrument, that describes a sample comprising cells labeled with fluorescent markers of different wavelengths; extracting (i) a flow cytometry dataset and (ii) a spillover matrix from the FCS file; performing, based on the spillover matrix, a compensation operation involving the flow cytometry dataset so as to produce a compensated flow cytometry dataset; executing a function that performs doublet discrimination to ensure that each value included in the compensated flow cytometry dataset corresponds to a single cell; and performing a normalization operation on the compensated flow cytometry dataset so as to produce a normalized flow cytometry dataset.
21. The non-transitory medium of claim 20, wherein the flow cytometry dataset is in the form of a matrix.
22. The non-transitory medium of claim 20, wherein the flow cytometry dataset is extracted from a data segment of the FCS file, and wherein the spillover matrix is extracted from a text segment of the FCS file.
23. The non-transitory medium of claim 20, wherein the operations further comprise: determining, based on analysis of the flow cytometry dataset, that compensation is necessary to improve the quality of the flow cytometry dataset; wherein the compensation operation is performed in response to the determination.
24. The non-transitory medium of claim 20, wherein the operations further comprise: creating a scatter plot based on forward scatter area values (FSC-A values) and forward scatter height values (FSC-H values) included in the compensated flow cytometry dataset.
25. The non-transitory medium of claim 24, wherein, when executed, the function causes the computer device to: (i) remove from the scatter plot the cells whose FSC-A values reach the maximum value; (ii) gate a portion of the cells remaining in the scatter plot; and (iii) compute a coefficient of determination for the gated cells.
26. The non-transitory medium of claim 25, wherein, when executed, the function causes the computer device to: (iv) determine whether the coefficient of determination exceeds a threshold; and (v) return data for the gated cells from the compensated flow cytometry dataset in response to a determination that the coefficient of determination exceeds the threshold.
The non-transitory medium of claim 25, wherein the function, when executed, causes the computing device to: (iv) determine whether the coefficient of determination exceeds a second threshold; and (v) repeat steps (ii) and (iii), each time reducing the gated cells by a predetermined amount, in response to a determination that the coefficient of determination does not exceed the predetermined threshold.
The non-transitory medium of claim 27, wherein steps (ii) and (iii) are repeated, each time reducing the gated cells by a predetermined amount, until the coefficient of determination exceeds the predetermined threshold.
The non-transitory medium of claim 20, wherein the flow cytometer data set comprises values of a plurality of parameters.
The non-transitory medium of claim 29, wherein the plurality of parameters comprises one or more optical parameters and one or more fluorescent marker parameters.
The non-transitory medium of claim 29, wherein the normalization operation comprises: aggregating a plurality of values by treating each parameter of the plurality of parameters as a unique feature dimension; resampling the unique feature dimensions to a common sample size to ensure that each parameter of the plurality of parameters has the same number of values; and normalizing the unique feature dimensions so that the plurality of values have a similar scale.
The non-transitory medium of claim 31, wherein the normalizing involves applying a z-score normalization technique.
A method for automated classification of flow cytometer data, comprising: receiving a Flow Cytometry Standard file (FCS file) generated by a flow cytometer instrument, the FCS file describing a sample that contains cells labeled with fluorescent markers of different wavelengths; extracting (i) a flow cytometer data matrix from a data segment of the FCS file and (ii) a spillover matrix from a text segment of the FCS file; performing, based on the spillover matrix, a compensation operation involving the flow cytometer data matrix to produce a compensated flow cytometer data matrix; executing a function that performs doublet discrimination to ensure that each value contained in the compensated flow cytometer data set corresponds to a single cell; performing a normalization operation on the compensated flow cytometer data set to produce a normalized flow cytometer data set; and storing the normalized flow cytometer data matrix in a memory.
The method of claim 33, further comprising: generating a visual indicator of a plurality of values in the normalized flow cytometer data matrix; and displaying the visual indicator on an interface for review by a person.
The method of claim 34, wherein the visual indicator is a report comprising an analysis of the plurality of values in the normalized flow cytometer data.
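
The preprocessing steps recited in the claims above (compensation with the spillover matrix, doublet discrimination on FSC-A versus FSC-H, and resampling followed by z-score normalization) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the ratio-based gating rule, the 0.99 threshold, the 95% starting gate, the 5% reduction step, and resampling whole events with replacement are all assumptions made for illustration; the claims do not specify them. Compensation is assumed to follow the usual convention that observed signal equals true signal multiplied by the spillover matrix.

```python
import random
import statistics


def compensate(events, spillover):
    """Solve observed = true @ spillover for each event (fluorescence channels only)."""
    n = len(spillover)
    # Gauss-Jordan inversion of the small square spillover matrix.
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(spillover)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    inverse = [row[n:] for row in aug]
    return [[sum(event[k] * inverse[k][j] for k in range(n)) for j in range(n)]
            for event in events]


def r_squared(xs, ys):
    """Coefficient of determination of a straight-line fit of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) * (x - mx) for x in xs)
    syy = sum((y - my) * (y - my) for y in ys)
    return sxy * sxy / (sxx * syy)


def select_singlets(fsc_a, fsc_h, threshold=0.99, start_percent=95, step_percent=5):
    """Return indices of events kept as single cells (hypothetical gating rule)."""
    top = max(fsc_a)
    kept = [i for i, a in enumerate(fsc_a) if a < top]  # (i) drop events at the maximum
    ratios = {i: fsc_a[i] / fsc_h[i] for i in kept}
    centre = statistics.median(ratios.values())
    # Assumption: events whose area/height ratio is closest to the median are gated first.
    ranked = sorted(kept, key=lambda i: abs(ratios[i] - centre))
    for percent in range(start_percent, 0, -step_percent):
        size = max(3, len(ranked) * percent // 100)
        gate = ranked[:size]                                                # (ii) gate
        r2 = r_squared([fsc_a[i] for i in gate], [fsc_h[i] for i in gate])  # (iii)
        if r2 > threshold:                                                  # (iv)
            return sorted(gate)
        # (v) otherwise shrink the gate by step_percent and try again
    return []


def normalize(events, sample_size, seed=0):
    """Resample events to a fixed count, then z-score each parameter column."""
    rng = random.Random(seed)
    rows = [rng.choice(events) for _ in range(sample_size)]
    scaled = []
    for col in zip(*rows):
        mean = statistics.fmean(col)
        spread = statistics.pstdev(col) or 1.0  # guard against a constant column
        scaled.append([(v - mean) / spread for v in col])
    return [list(row) for row in zip(*scaled)]


# Toy data: eight singlets (area == height), two doublets, one saturated event.
fsc_h = [10, 20, 30, 40, 50, 60, 70, 80, 30, 50, 90]
fsc_a = [10, 20, 30, 40, 50, 60, 70, 80, 60, 95, 262143]
singlets = select_singlets(fsc_a, fsc_h)
```

With the toy data, the saturated event is dropped first, the two doublets fall outside the gate once it has shrunk to 85% of the remaining events, and the eight singlets are returned. In practice a numerical library would normally replace the hand-written matrix inversion; it is written out here only to keep the sketch dependency-free.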
TW110134284A 2021-09-14 2021-09-14 Method and non-transitory computer readable medium for automated classification of immunophenotypes represented in flow cytometry data TWI883261B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110134284A TWI883261B (en) 2021-09-14 2021-09-14 Method and non-transitory computer readable medium for automated classification of immunophenotypes represented in flow cytometry data

Publications (2)

Publication Number Publication Date
TW202311742A TW202311742A (en) 2023-03-16
TWI883261B true TWI883261B (en) 2025-05-11

Family

ID=86690531

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110134284A TWI883261B (en) 2021-09-14 2021-09-14 Method and non-transitory computer readable medium for automated classification of immunophenotypes represented in flow cytometry data

Country Status (1)

Country Link
TW (1) TWI883261B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113155936A (en) * 2015-05-20 2021-07-23 普诺森公司 System and method for electrophoretic separation and analysis of analytes
CN113330292A (en) * 2018-07-31 2021-08-31 科罗拉多大学评议会法人团体 System and method for applying machine learning to analyze microscopic images in high throughput systems
