WO2025142539A1 - Biological particle analysis system, information processing device, and information processing method

Info

Publication number: WO2025142539A1
Application number: PCT/JP2024/044143
Authority: WO
Inventors: 翔太山本
Original assignee: Sony Group Corp
Current assignee: Sony Group Corp
Priority date: 2023-12-28
Filing date: 2024-12-13
Publication date: 2025-07-03
Anticipated expiration: 2026-06-28

Abstract

A biological particle analysis system according to the present disclosure comprises: an acquisition unit that acquires measurement data measured from biological particles included in a sample; a compression unit that performs a data compression process on the measurement data acquired by the acquisition unit; a gate unit that gates the measurement data compressed by the compression unit into measurement data for learning and measurement data for verification, and adds a label to the measurement data for learning; a learning unit that constructs a learning model using the measurement data for learning and the label; an estimation unit that inputs the measurement data for verification to the learning model and outputs the degree of certainty of the measurement data for verification; and a threshold value setting unit that, on the basis of the degree of certainty, sets a threshold value for separating the sample.

Description

Biological particle analysis system, information processing device, and information processing method

The present disclosure relates to a biological particle analysis system, an information processing device, and an information processing method.

In fields such as medicine and biochemistry, flow cytometers are commonly used to rapidly measure the characteristics of large numbers of particles. A flow cytometer irradiates flowing particles, such as cells or beads, with a light beam and detects the fluorescence or other light emitted by them, thereby measuring the characteristics of each particle.

Devices have also been developed that sort particles emitting specific fluorescence out of a measurement sample by controlling where each particle is directed, based on the fluorescence information detected by a flow cytometer. Such sorting devices are also called cell sorters.

In recent years, increasing the number of fluorescent substances that a flow cytometer can measure at one time has been studied as a way to enable more detailed particle analysis. However, increasing the number of fluorescent substances increases the dimensionality of the measurement data, which makes analysis with a flow cytometer more complicated.

Various methods for analyzing flow cytometer measurement data have therefore been studied. For example, Patent Document 1 below discloses a technique for estimating shape information of a biological object based on the peak position of a pulse waveform detected from the object when it is irradiated with a light beam.

Patent Document 1: JP 2017-58361 A

A sorting device such as a cell sorter, on the other hand, must measure and analyze the flowing particles and determine from the results whether to sort each particle, all within the limited time during which the particle flows through the device.

Sorting devices such as cell sorters have therefore needed to determine more quickly, and in real time, whether a particle is a sorting target.

The biological particle analysis system of the first disclosure includes: an acquisition unit that acquires measurement data measured from biological particles contained in a sample; a compression unit that performs data compression processing on the measurement data acquired by the acquisition unit; a gating unit that gates the measurement data compressed by the compression unit into training measurement data and verification measurement data and adds a label to the training measurement data; a learning unit that constructs a learning model using the training measurement data and the label; an estimation unit that inputs the verification measurement data to the learning model and outputs a confidence level for the verification measurement data; and a threshold setting unit that sets, based on the confidence level, a threshold for sorting the sample.
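As a rough illustration of the estimation and threshold setting units, the sketch below assumes that a trained model has already produced a confidence level between 0 and 1 for each verification event, and that gating has given each verification event a known true class. It picks the lowest threshold at which the selected events reach a requested purity. The function name `choose_threshold`, the purity criterion, and the sample values are all assumptions made for illustration and are not taken from the disclosure.

```python
def choose_threshold(confidences, is_target, target_purity):
    """Return (threshold, purity, efficiency) for the lowest confidence
    threshold whose selected events reach target_purity.

    confidences: model confidence per verification event (0.0 to 1.0)
    is_target:   True if the event truly belongs to the sorting target
    Returns None when no threshold reaches the requested purity.
    """
    total_targets = sum(is_target)
    if total_targets == 0:
        return None
    # Candidate thresholds are the observed confidence values, lowest first.
    for threshold in sorted(set(confidences)):
        selected = [t for c, t in zip(confidences, is_target) if c >= threshold]
        hits = sum(selected)
        purity = hits / len(selected)       # targets among selected events
        efficiency = hits / total_targets   # selected targets among all targets
        if purity >= target_purity:
            return threshold, purity, efficiency
    return None


# Hypothetical verification data: confidence and true class per event.
conf = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.20]
truth = [True, True, True, False, True, False, False]
result = choose_threshold(conf, truth, target_purity=0.9)
```

Raising the threshold in this sketch trades efficiency for purity: fewer events are selected, but a larger share of them are true targets.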

A block diagram showing a configuration example of the biological particle analysis system according to the embodiment.
An explanatory diagram illustrating the filter-type detection mechanism of the measurement unit.
An explanatory diagram illustrating the spectral-type detection mechanism of the measurement unit.
A block diagram showing a configuration example of the information processing device according to the embodiment.
A table showing an example of information about the fluorescence of biological particles acquired from the sorting device.
An explanatory diagram showing a result of clustering processing.
An explanatory diagram showing a result of clustering processing.
An explanatory diagram showing the result of reducing information about the expression level of each fluorescent substance in the biological particles to two dimensions using the t-SNE algorithm.
A diagram showing verification data according to the first embodiment.
A diagram showing a screen that presents the purity and efficiency of dimension-reduced measurement data according to the first embodiment.
A diagram showing the classes and confidence levels of dimension-reduced measurement data according to the first embodiment.
A diagram showing a screen that presents the relationship between the mode and the purity and efficiency according to the first embodiment.
A diagram for explaining the case where a threshold is set for each piece of measurement data according to the first embodiment.
A diagram for explaining the case where a threshold is set using the ROC curve of the verification measurement data according to the first embodiment.
A diagram showing a display example of dimension-reduced measurement data according to the first embodiment.
A diagram showing a display example in which measurement data of the cells to be measured, before fluorescence correction, is colored according to similarity, according to the first embodiment.
A diagram showing a display example in which measurement data of the cells to be measured, after fluorescence correction, is colored according to similarity, according to the first embodiment.
A functional block diagram of the information processing device according to the first embodiment, which sorts measurement data using deep learning.
A flowchart for explaining the sorting of measurement data using deep learning in the information processing device according to the first embodiment.
A block diagram showing a configuration example of a biological particle analysis system according to a modified example of the first embodiment.
A functional block diagram of an information processing system according to a modified example of the first embodiment.
A functional block diagram showing a modified example of the information processing system according to the first embodiment.
A diagram for explaining the concept of the thresholds for clustering sorting according to the second embodiment.
A diagram for explaining the concept of the range when the threshold for clustering sorting is set to 50% according to the second embodiment.
A functional block diagram of the information processing device according to the second embodiment, which performs clustering sorting.
A diagram showing a first example of the FlowSOM circuit according to the second embodiment.
A diagram showing a second example of the FlowSOM circuit according to the second embodiment.
A diagram showing a third example of the FlowSOM circuit according to the second embodiment.
A flowchart for explaining clustering sorting in the information processing device according to the second embodiment.
A functional block diagram of an information processing system according to a modified example of the second embodiment.
A flowchart for explaining the operation of the first example of the FlowSOM circuit according to the second embodiment.
A flowchart for explaining the operation of the third example of the FlowSOM circuit according to the second embodiment.
A functional block diagram of the information processing device according to the third embodiment, which performs IFCM sorting.
A flowchart for explaining IFCM sorting in the information processing device according to the third embodiment.
A functional block diagram of an information processing system according to a modified example of the third embodiment.
A hardware configuration diagram showing an example of a computer that implements the arithmetic unit of the information processing device according to the embodiment.

Preferred embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and duplicate descriptions are omitted. The description proceeds in the following order.

0. Basic concept
  0.1. Configuration of the biological particle analysis system
  0.2. Configuration of the information processing device
1. First embodiment
  1.1. Sorting based on confidence level
  1.2. How the confidence level is used
  1.3. Threshold setting
    1.3.1. Threshold setting method 1
    1.3.2. Threshold setting method 2
    1.3.3. Threshold setting method 3
  1.4. Visualization using similarity and confidence level
  1.5. Functional block diagram of information processing device 300
  1.6. Description of operation
  1.7. Modified example
2. Second embodiment
  2.1. Sorting based on confidence level (clustering)
  2.2. Thresholds for clustering sorting
    2.2.1. Threshold judgment for each parameter
    2.2.2. Threshold judgment using the average of all parameters
  2.3. Functional block diagram of information processing device 400
  2.4. FlowSOM circuit
  2.5. Description of operation
  2.6. Modified example
  2.7. Flowchart for FlowSOM sorting
3. Third embodiment
  3.1. Sorting based on confidence level (imaging flow cytometer)
  3.2. Functional block diagram of information processing device 600
  3.3. Description of operation
  3.4. Modified example
4. Hardware configuration

<0. Basic concept>
In recent years, a technique called machine learning sorting has been developed, in which machine learning is used to sort target cells out of a sample containing cells and the like based on measurement data (including, for example, the intensity of fluorescence emitted from labeled cells). The basic concept of machine learning sorting is disclosed in Patent Document 2, whose contents may be referred to as appropriate in the present disclosure.

Patent Document 2: JP 2020-193877 A

<0.1. Configuration of the biological particle analysis system>
The biological particle analysis system is described below.

First, the configuration of the biological particle analysis system 1 according to the embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing a configuration example of the biological particle analysis system 1 according to the embodiment.

As shown in FIG. 1, the biological particle analysis system 1 according to the present embodiment includes a sorting device 10, which acquires measurement data from a sample S and sorts target particles based on the determination made by an information processing device 20, and the information processing device 20, which analyzes the measurement data acquired by the sorting device 10 and determines whether each particle is a sorting target. The biological particle analysis system 1 can be used, for example, as a so-called cell sorter.

The sample S consists of biological particles such as cells, microorganisms, or biologically relevant particles, and contains multiple populations of biological particles. By analyzing the measurement data of the sample S, the sorting device 10 can classify the biological particles into multiple populations, each with internal cohesion and external isolation, and sort out a specific population. The sample S may be, for example, biologically derived microparticles such as: cells, including animal cells (for example, blood cells) and plant cells; microorganisms, including bacteria such as E. coli, viruses such as tobacco mosaic virus, and fungi such as yeast; biologically relevant particles that make up cells, such as chromosomes, liposomes, mitochondria, and various organelles; or biologically relevant macromolecules such as nucleic acids, proteins, lipids, sugar chains, and complexes of these.

The sample S may include, for example, synthetic particles such as latex particles, gel particles, and industrial particles. The industrial particles may be made of, for example, an organic or inorganic polymeric material or a metal. Organic polymeric materials include polystyrene, styrene-divinylbenzene, and polymethyl methacrylate. Inorganic polymeric materials include glass, silica, and magnetic materials. Metals include colloidal gold and aluminum. These microparticles may be spherical or non-spherical. A microparticle may have a cavity and be configured to capture a biological particle inside the cavity. The size and mass of these microparticles may be chosen as appropriate by those skilled in the art and are not particularly limited.

Here, the sample S is labeled (stained) with one or more fluorescent dyes. The sample S can be labeled with fluorescent dyes by known methods. For example, when the sample S consists of cells, the cells to be measured can be labeled with a fluorescent dye by mixing them with a fluorescently labeled antibody that selectively binds to an antigen present on the cell surface, so that the fluorescently labeled antibody binds to the antigen on the cell surface.

A fluorescently labeled antibody is an antibody to which a fluorescent dye is bound as a label. Specifically, the fluorescently labeled antibody may be a biotin-labeled antibody to which an avidin-conjugated fluorescent dye is bound through the avidin-biotin reaction. Alternatively, the fluorescently labeled antibody may be an antibody to which a fluorescent dye is bound directly. Either a polyclonal antibody or a monoclonal antibody may be used. The fluorescent dye used to label the cells is likewise not particularly limited, and one or more known dyes used for staining cells and the like may be used.

The sorting device 10 includes a measurement unit and a sorting unit. The sorting device 10 may be a so-called flow cell type sorting device or a microchannel chip type sorting device.

The measurement unit measures the fluorescence emitted from the sample S by irradiating the sample S with a light beam such as laser light. Specifically, the measurement unit aligns the sample S in one direction by making the sheath liquid in which the sample S is dispersed flow as a laminar flow. The measurement unit then irradiates the aligned sample S with laser light having a wavelength capable of exciting the fluorescent dye that labels the sample S, and photoelectrically converts the fluorescence generated by the irradiated sample S using a known photoelectric conversion element such as a CCD (charge-coupled device), a CMOS (complementary metal-oxide-semiconductor) sensor, a photodiode, or a PMT (photomultiplier tube). The measurement unit can thereby acquire the fluorescence from the sample S.

The mechanism by which the measurement unit detects fluorescence from the sample S may be either a filter type or a spectral type. The detection mechanisms are described here with reference to FIG. 2 and FIG. 3. FIG. 2 is an explanatory diagram illustrating the filter-type detection mechanism, and FIG. 3 is an explanatory diagram illustrating the spectral-type detection mechanism.

As shown in FIG. 2, in the filter-type detection mechanism, the sample S flowing through the flow path 13 is irradiated with a light beam from the light source 11, and the resulting fluorescence is split by dichroic mirrors 15A, 15B, and 15C. The filter-type detection mechanism can thereby acquire the fluorescence intensity in each predetermined wavelength band with photodetectors 17A, 17B, and 17C.

Specifically, each of the dichroic mirrors 15A, 15B, and 15C reflects light in a specific wavelength band and transmits light in other wavelength bands. By placing dichroic mirrors 15A, 15B, and 15C that reflect different wavelength bands on the optical path of the fluorescence from the sample S, the measurement unit can split the fluorescence by wavelength band. For example, the measurement unit can place, in order from the side on which the fluorescence from the sample S is incident, the dichroic mirror 15A reflecting light in the red wavelength band, the dichroic mirror 15B reflecting light in the green wavelength band, and the dichroic mirror 15C reflecting light in the blue wavelength band, and thereby split the fluorescence from the sample S by wavelength band.

As shown in FIG. 3, in the spectral-type detection mechanism, the sample S flowing through the flow path 13 is irradiated with a light beam from the light source 11, and the resulting fluorescence is dispersed by the prism 16. The spectral-type detection mechanism can thereby acquire a continuous fluorescence spectrum with the photodetector array 18.

Specifically, the prism 16 is an optical member that disperses incident light. By dispersing the fluorescence from the sample S with the prism 16, the measurement unit can detect a continuous spectrum of the fluorescence with the photodetector array 18, in which multiple photoelectric conversion elements are arranged in an array.

The sorting unit sorts out the portion of the sample S that is the sorting target. Specifically, the sorting unit first generates droplets of the sample S and charges the droplets containing the sample S to be sorted. Next, the sorting unit passes the generated droplets through an electric field generated by deflection plates. The charged droplets are attracted toward the charged deflection plate, which changes their direction of travel. The sorting unit can thus separate droplets of the sample S to be sorted from droplets of the sample S not to be sorted, and can thereby sort out the target biological particles. The sorting method of the sorting unit may be either a jet-in-air method or a cuvette flow cell method. The sample S may be sorted by being ejected outside the flow cell or microchannel chip, or may be sorted inside the microchannel chip. Whether to sort the sample S may be determined by a logic circuit (for example, an FPGA (field-programmable gate array) circuit) provided in the sorting device 10, or according to an instruction from the information processing device 20.

The information processing device 20 analyzes the measurement data of the sample S acquired by the measurement unit and presents the analyzed data to the user. By checking the data analyzed by the information processing device 20, the user can identify the population of biological particles to be sorted.

<0.2. Configuration of the information processing device>
Next, a more specific configuration of the information processing device 20 included in the biological particle analysis system 1 according to the present embodiment is described with reference to FIG. 4. FIG. 4 is a block diagram showing a configuration example of the information processing device 20 according to the present embodiment.

As shown in FIG. 4, the information processing device 20 includes an acquisition unit 201, an analysis unit 203, a reference spectrum storage unit 205, a data compression processing unit 207, an interface unit 209, a learning unit 211, a learning model storage unit 213, and a discrimination unit 215.

The acquisition unit 201 acquires information about the fluorescence of the biological particles from the sorting device 10. Specifically, the sorting device 10 detects light from the biological particles with the spectral-type detection mechanism, and the acquisition unit 201 acquires information about the spectrum of that light. The light from the biological particles may be scattered light or fluorescence from the biological particles irradiated with laser light, or both. The acquisition unit 201 may acquire the information about the light from the biological particles from the sorting device 10 via a network or the like, for example, or via a wired or wireless LAN (Local Area Network) or a wired cable.

For example, the information about the light from the biological particles acquired by the acquisition unit 201 may be information such as that shown in FIG. 5. FIG. 5 is a table showing an example of the information about the light from the biological particles acquired from the sorting device 10.

As shown in FIG. 5, the information about the light from the biological particles may list, for each cell (that is, biological particle) identification number, the gain detected by each of the N photomultiplier tubes (PMTs) arranged in the photodetector array, labeled "PMT1" to "PMTN". These N photomultiplier tubes are arranged in a single row along the direction in which the prism disperses the light. Arranging the gains of the N photomultiplier tubes consecutively as a histogram therefore yields the light spectrum of the cell. FIG. 5 shows the gain measurements of the N photomultiplier tubes for each of N cells.
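The table layout described for FIG. 5 maps naturally onto one row per cell and one column per PMT channel. The following is a minimal sketch of that layout with invented values; the names `events`, `spectrum_of`, and `peak_channel` are hypothetical and do not appear in the disclosure.

```python
# Each entry is one cell, keyed by its identification number. The list holds
# the value from each PMT, ordered along the prism's dispersion direction
# (PMT1 ... PMTN). The numbers are invented for illustration only.
events = {
    1: [12.0, 85.0, 240.0, 90.0, 15.0],
    2: [200.0, 110.0, 30.0, 10.0, 5.0],
}


def spectrum_of(cell_id, table):
    """Return (channel_label, value) pairs for one cell, in channel order."""
    return [(f"PMT{i}", v) for i, v in enumerate(table[cell_id], start=1)]


def peak_channel(cell_id, table):
    """Return the label of the channel with the largest value for one cell."""
    return max(spectrum_of(cell_id, table), key=lambda pair: pair[1])[0]
```

Reading a row in channel order gives the per-cell spectrum that the later analysis steps operate on.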

The analysis unit 203 derives information about the characteristics of the biological particles by analyzing the information about their light measured by the sorting device 10. Specifically, the analysis unit 203 separates the individual fluorescence components contained in the fluorescence spectrum measured by the sorting device 10, and thereby derives the expression level, in the biological particles, of the fluorescent substance corresponding to each component.

The biological particles to be measured are labeled with multiple fluorescent substances whose emission wavelength distributions overlap one another. The analysis unit 203 can therefore derive the expression level of each fluorescent substance by fitting a weighted combination of the wavelength distributions of the fluorescence emitted by the individual fluorescent substances to the fluorescence spectrum measured by the sorting device 10.

More specifically, the analysis unit 203 first acquires, from the reference spectrum storage unit 205, a reference spectrum for each fluorescent substance labeling the biological particles; each reference spectrum indicates the wavelength distribution of the fluorescence emitted by that substance. The analysis unit 203 then superimposes the reference spectra and fits them to the fluorescence spectrum measured by the sorting device 10 using the weighted least squares method, thereby estimating the expression level of each fluorescent substance.
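The fitting step can be sketched as solving the weighted normal equations (S^T W S) a = S^T W y, where y is the measured spectrum, the columns of S are the reference spectra, W is a diagonal matrix of per-channel weights, and a holds the estimated expression levels. The code below is a minimal sketch under those assumptions. The function names, the sample spectra, and the uniform weights are illustrative; the text does not state how the weights are chosen, so they are simply taken as an input here.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def unmix(measured, references, weights):
    """Weighted least-squares fit of reference spectra to one measured spectrum.

    measured:   value per detector channel
    references: one reference spectrum (value per channel) per fluorescent substance
    weights:    non-negative weight per channel (assumed to be given)
    Returns the estimated expression level of each fluorescent substance.
    Assumes the reference spectra are linearly independent.
    """
    d = len(references)
    channels = range(len(measured))
    # Normal equations: (S^T W S) a = S^T W y
    lhs = [[sum(weights[c] * references[i][c] * references[j][c] for c in channels)
            for j in range(d)] for i in range(d)]
    rhs = [sum(weights[c] * references[i][c] * measured[c] for c in channels)
           for i in range(d)]
    return solve_linear(lhs, rhs)


# Hypothetical reference spectra for two dyes over four detector channels.
ref_a = [0.1, 0.6, 0.3, 0.0]
ref_b = [0.0, 0.2, 0.5, 0.3]
# Synthetic measurement: 2 parts dye A plus 3 parts dye B, with no noise.
measured = [2 * a + 3 * b for a, b in zip(ref_a, ref_b)]
abundances = unmix(measured, [ref_a, ref_b], weights=[1.0, 1.0, 1.0, 1.0])
```

Because the synthetic measurement is an exact mixture, the fit recovers values close to 2 and 3; with real, noisy spectra the weights would control how much each channel influences the estimate.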

The reference spectrum storage unit 205 stores, for each fluorescent substance that can label the biological particles, a reference spectrum indicating the wavelength distribution of the fluorescence emitted by that substance. The reference spectrum storage unit 205 may be provided in either the information processing device 20 or the sorting device 10, or in another information processing device or information processing server that can communicate via a network.

　データ圧縮処理部２０７は、解析部２０３にて解析した生体由来粒子の光情報に対してデータ圧縮処理を行う。 The data compression processing unit 207 performs data compression processing on the optical information of the biological particles analyzed by the analysis unit 203.

　データ圧縮処理とは、非線形処理、又は線形処理のいずれをも含む。例えば、非線形処理としては、次元圧縮処理、クラスタリング処理、又はグルーピング処理を含んでもよい。例えば、線形処理としては、蛍光分離を行うことで、生体由来粒子の光のスペクトル情報から蛍光色素ごとの蛍光情報を生成する処理を含んでもよい。 Data compression processing includes both nonlinear processing and linear processing. For example, nonlinear processing may include dimensionality reduction processing, clustering processing, or grouping processing. For example, linear processing may include processing for generating fluorescence information for each fluorescent dye from the optical spectrum information of biological particles by performing fluorescence separation.

　なお、非線形処理には、教師あり若しくは教師なしの機械学習、又は弱教師ありの機械学習のいずれのアルゴリズムが用いられてもよい。ただし、非線形処理に用いられる機械学習アルゴリズムは、後述する学習部２１１にて用いられる機械学習アルゴリズムとは異なることが望ましい。 Note that any algorithm, whether supervised or unsupervised machine learning or weakly supervised machine learning, may be used for the nonlinear processing. However, it is preferable that the machine learning algorithm used for the nonlinear processing is different from the machine learning algorithm used by the learning unit 211 described below.

Specifically, the data compression processing unit 207 may perform clustering processing on information about the expression level of each fluorescent substance in the biological particles. This allows the data compression processing unit 207 to classify the biological particles into a plurality of groups that are externally separated and internally cohesive.

The clustering algorithm is not particularly limited, and any known clustering algorithm can be used. For example, the data compression processing unit 207 may perform clustering using an algorithm for which the number of clusters can be specified, such as k-means, or using an algorithm that determines the number of clusters automatically, such as FlowSOM.

　データ圧縮処理部２０７によるクラスタリング処理の結果は、図６及び図７に示すような形式にてユーザに提示されてもよい。図６及び図７は、クラスタリング処理の結果を示す説明図である。 The results of the clustering process performed by the data compression processing unit 207 may be presented to the user in a format as shown in Figures 6 and 7. Figures 6 and 7 are explanatory diagrams showing the results of the clustering process.

　例えば、図６に示すように、データ圧縮処理部２０７によるクラスタリング結果は、表形式にてユーザに提示されてもよい。 For example, as shown in FIG. 6, the clustering results by the data compression processing unit 207 may be presented to the user in a table format.

　図６では、１０００個の細胞（すなわち、生体由来粒子）の集団がＮ個のクラスタに分割されており、クラスタ及び細胞の各々に付された識別番号にて、各クラスタへの細胞の所属が示されている。具体的には、図６では、識別番号「１」のクラスタには、識別番号「１」、「２」、「３」及び「１０」の細胞が所属しており、識別番号「２」のクラスタには、識別番号「１１」、「１２」、「２２」及び「３１」の細胞が所属しており、識別番号「３」のクラスタには、識別番号「４」～「６」、「１４」及び「１５」の細胞が所属しており、識別番号「Ｎ」のクラスタには、識別番号「１０００」の細胞が所属している。このような表形式によるユーザへの提示では、細胞の各クラスタへの所属を簡潔に示すことができる。 In FIG. 6, a group of 1000 cells (i.e., biological particles) is divided into N clusters, and the identification numbers assigned to the clusters and cells indicate which cells belong to which cluster. Specifically, in FIG. 6, the cells with identification numbers "1", "2", "3", and "10" belong to the cluster with identification number "1", the cells with identification numbers "11", "12", "22", and "31" belong to the cluster with identification number "2", the cells with identification numbers "4" to "6", "14", and "15" belong to the cluster with identification number "3", and the cell with identification number "1000" belongs to the cluster with identification number "N". By presenting the data to the user in such a tabular format, the cell's belonging to each cluster can be simply shown.

　例えば、図７に示すように、データ圧縮処理部２０７によるクラスタリング結果は、ミニマムスパニングツリー（Ｍｉｎｉｍｕｍ　Ｓｐａｎｎｉｎｇ　Ｔｒｅｅ）形式にてユーザに提示されてもよい。 For example, as shown in FIG. 7, the clustering results by the data compression processing unit 207 may be presented to the user in a minimum spanning tree format.

　図７では、複数の色（図７では色をハッチングの種類で区別する）で塗り分けられたレーダチャートが互いに接続された樹状に配列されている。各レーダチャートは、各細胞（すなわち、生体由来粒子）を表している。具体的には、各レーダチャートの分布及び大きさは、細胞の各蛍光物質の発現量に対応するベクトルを表している。ここで、各色で塗り分けられた領域は、各細胞が所属するクラスタを表す。例えば、同じ色（すなわち、同一種のハッチング）で塗り分けられたレーダチャートで示される細胞は、同じクラスタに所属していることを表す。 In Figure 7, radar charts painted in multiple colors (in Figure 7, colors are distinguished by the type of hatching) are arranged in a tree shape that is connected to each other. Each radar chart represents a cell (i.e., a biological particle). Specifically, the distribution and size of each radar chart represents a vector corresponding to the expression level of each fluorescent substance in the cell. Here, the areas painted in each color represent the cluster to which each cell belongs. For example, cells shown in radar charts painted in the same color (i.e., the same type of hatching) belong to the same cluster.

　さらに、図７では、レーダチャート間の距離がレーダチャートで表される細胞同士の類似度に対応している。すなわち、図７では、互いに接近したレーダチャートが表す細胞は互いに類似しており、互いに離れたレーダチャートが表す細胞は互いに類似していないことを示している。このようなミニマムスパニングツリー形式によるユーザへの提示によれば、細胞のクラスタへの所属に加えて、細胞の互いの類似関係を示すことができる。 Furthermore, in Figure 7, the distance between radar charts corresponds to the similarity between the cells represented by the radar charts. In other words, Figure 7 shows that cells represented by radar charts that are close to each other are similar to each other, and cells represented by radar charts that are far from each other are dissimilar to each other. By presenting the data to the user in this minimum spanning tree format, it is possible to show the similarity between the cells in addition to the cluster affiliation of the cells.

Alternatively, the data compression processing unit 207 may perform dimensionality reduction on information about the expression level of each fluorescent substance in the biological particles. By reducing the dimensions of high-dimensional data containing the expression levels of multiple fluorescent substances, the data compression processing unit 207 can visualize the relationships among the high-dimensional data points in an easily understood way on a low-dimensional map. By checking the low-dimensional information after dimensionality reduction, the user can therefore classify the biological particles into multiple groups more easily than with the original high-dimensional information. It is sufficient for the dimensionality reduction to lower the number of dimensions by at least one; however, reducing the information to three dimensions or fewer, for example, makes the relationships among the high-dimensional data points clearer to visualize.

The dimensionality reduction algorithm is not particularly limited, and any known dimensionality reduction algorithm can be used. For example, the data compression processing unit 207 may perform dimensionality reduction using an algorithm such as PCA, t-SNE, or UMAP.

　データ圧縮処理部２０７による次元圧縮処理の結果は、図８に示すような形式にてユーザに提示されてもよい。図８は、生体由来粒子の各蛍光物質の発現量に関する情報をｔ－ＳＮＥアルゴリズムを用いて二次元まで次元圧縮処理した結果を示す説明図である。 The results of the dimensionality compression process by the data compression processing unit 207 may be presented to the user in a format as shown in FIG. 8. FIG. 8 is an explanatory diagram showing the results of dimensionality compression process of information on the expression levels of each fluorescent substance in biological particles to two dimensions using the t-SNE algorithm.

For example, in FIG. 8, the Euclidean distances between high-dimensional data points (the expression levels of each fluorescent substance in each cell) are converted into probabilities using Student's t-distribution and mapped onto two-dimensional coordinates. This allows the user to compare the similarity of the cells' expression levels in a simplified way, without comparing the expression level of each fluorescent substance individually. In FIG. 8, cells belonging to different populations are shown in different colors. FIG. 8 shows that the dimensionality reduction groups the cells of each population with appropriate internal cohesion and external separation.

　インターフェース部２０９は、出力装置及び入力装置を含み、ユーザとの間での情報の入出力を行う。具体的には、インターフェース部２０９は、ＣＲＴ（Ｃａｔｈｏｄｅ　Ｒａｙ　Ｔｕｂｅ）表示装置、液晶表示装置又はＯＬＥＤ（Ｏｒｇａｎｉｃ　Ｌｉｇｈｔ　Ｅｍｉｔｔｉｎｇ　Ｄｉｏｄｅ）表示装置等を用いて、データ圧縮処理部２０７による非線形処理後の情報をユーザに提示してもよい。また、インターフェース部２０９は、タッチパネル、キーボード、マウス、ボタン、マイクロフォン、スイッチ又はレバーなどの入力装置を用いて、分取対象とする生体由来粒子を特定するユーザの入力を受け付けてもよい。 The interface unit 209 includes an output device and an input device, and performs input and output of information with the user. Specifically, the interface unit 209 may present the information after nonlinear processing by the data compression processing unit 207 to the user using a CRT (Cathode Ray Tube) display device, a liquid crystal display device, an OLED (Organic Light Emitting Diode) display device, or the like. The interface unit 209 may also accept user input specifying the biological particles to be separated, using an input device such as a touch panel, a keyboard, a mouse, a button, a microphone, a switch, or a lever.

　ユーザは、インターフェース部２０９から出力されるデータ圧縮処理後の情報を確認することで、分取対象となる生体由来粒子の集団をより容易に指定することができる。例えば、ユーザは、クラスタリング処理後の情報を確認することで、分取対象となる生体由来粒子のクラスタを特定することができる。または、ユーザは、次元圧縮処理後の情報を確認することで、分取対象となる生体由来粒子の集団を範囲指定することができる。 By checking the information after data compression processing output from the interface unit 209, the user can more easily specify the population of biological particles to be separated. For example, by checking the information after clustering processing, the user can identify the cluster of biological particles to be separated. Alternatively, by checking the information after dimensional compression processing, the user can specify the range of the population of biological particles to be separated.

　学習部２１１で実施される学習モデルの構築に関しては後述する。 The construction of the learning model performed by the learning unit 211 will be described later.

　構築された学習モデルは、例えば、情報処理装置２０に備えられる学習モデル記憶部２１３に記憶されてもよい。これによれば、分取装置１０は、情報処理装置２０からの分取制御によって分取対象となる生体由来粒子を分取することができる。または、構築された学習モデルは、分取装置１０に設けられたＦＰＧＡ回路等のロジック回路に実装されてもよい。例えば、分取装置１０には、判別部２１５が設けられており、分取装置１０に設けられたＦＰＧＡ回路には、判別部２１５の種類に基づいて設計され、構築された学習モデルを実行するロジックが実装されていてもよい。構築された学習モデルを実行するロジックは、学習部２１１が設計してもよい。 The constructed learning model may be stored, for example, in a learning model storage unit 213 provided in the information processing device 20. In this way, the sorting device 10 can sort the biological particles to be sorted by sorting control from the information processing device 20. Alternatively, the constructed learning model may be implemented in a logic circuit such as an FPGA circuit provided in the sorting device 10. For example, the sorting device 10 is provided with a discrimination unit 215, and the FPGA circuit provided in the sorting device 10 may be implemented with logic designed based on the type of discrimination unit 215 and for executing the constructed learning model. The logic for executing the constructed learning model may be designed by the learning unit 211.

The machine learning performed by the learning unit 211 is supervised learning that uses, as training data, information on the fluorescence spectra of the biological particles identified as sorting targets. For example, the learning unit 211 may construct a learning model using a machine learning algorithm such as random forest, support vector machine, or deep learning.

Because the bioparticle analysis system 1 according to this embodiment uses various kinds of non-normalized information as training data, a random forest algorithm, which does not require normalization, is well suited to it. A random forest model is also easy to implement in hardware, which suits the bioparticle analysis system 1 because it must quickly determine whether or not a biological particle is a sorting target.

　なお、学習部２１１は、分取対象の判別が十分可能な学習モデルが構築されたか否かを判断し、ユーザに通知してもよい。例えば、学習部２１１は、学習した生体由来粒子の情報の数、又は全体に対する割合が閾値を超えた場合に、分取対象の判別が十分可能な学習モデルが構築されたことをユーザに通知してもよい。 The learning unit 211 may determine whether a learning model capable of sufficiently identifying the separation target has been constructed, and notify the user. For example, when the number of pieces of learned information on biological particles, or the proportion of the total, exceeds a threshold, the learning unit 211 may notify the user that a learning model capable of sufficiently identifying the separation target has been constructed.

In addition, when the accuracy rate of the learning model exceeds a threshold, the learning unit 211 may notify the user that a learning model capable of sufficiently discriminating sorting targets has been constructed. The accuracy rate of the learning model can be determined, for example, by N-fold cross-validation. Specifically, the full set of training data is divided into N parts, a learning model is trained on N-1 of the parts, and the remaining part is then classified with that model to measure its accuracy rate.
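The N-fold cross-validation procedure can be sketched as below. This is an illustrative sketch under stated assumptions: `n_fold_accuracy`, `train_centroids`, and `predict_centroid` are hypothetical names, the nearest-centroid classifier is only a stand-in for the learning model, and the folds are formed by simple striding rather than by random shuffling.

```python
def n_fold_accuracy(samples, labels, n_folds, train, predict):
    """Accuracy over all held-out parts of an N-fold split.

    train(samples, labels) -> model and predict(model, sample) -> label
    are caller-supplied callables (hypothetical interface).
    """
    indices = list(range(len(samples)))
    # Strided split: fold i holds indices i, i + n_folds, i + 2 * n_folds, ...
    folds = [indices[i::n_folds] for i in range(n_folds)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        model = train([samples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct += sum(predict(model, samples[i]) == labels[i]
                       for i in held_out)
    return correct / len(samples)


def train_centroids(xs, ys):
    """Toy stand-in model: mean value per class."""
    sums = {}
    for x, y in zip(xs, ys):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict_centroid(model, x):
    """Assign the class whose mean is nearest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


xs = [0.1, 0.2, 0.3, 0.4, 5.1, 5.2, 5.3, 5.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
accuracy = n_fold_accuracy(xs, ys, n_folds=4,
                           train=train_centroids, predict=predict_centroid)
```

The resulting accuracy could then be compared with the notification threshold mentioned above. In practice the data would usually be shuffled before splitting so that each fold contains a representative mix of classes.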

　学習モデル記憶部２１３は、学習部２１１が構築した学習モデルを記憶する。学習モデル記憶部２１３は、ＦＰＧＡ（Ｆｉｅｌｄ－Ｐｒｏｇｒａｍｍａｂｌｅ　Ｇａｔｅ　Ａｒｒａｙ）回路などを用いて、学習モデルをハードウェア化して記憶してもよい。これによれば、生体由来粒子が分取対象であるか否かの判別をより高速に行うことができる。 The learning model storage unit 213 stores the learning model constructed by the learning unit 211. The learning model storage unit 213 may store the learning model in the form of hardware using a field-programmable gate array (FPGA) circuit or the like. This makes it possible to more quickly determine whether or not a biological particle is a target for separation.

Based on the learning model stored in the learning model storage unit 213, the discrimination unit 215 determines whether or not a biological particle whose fluorescence was measured by the sorting device 10 is a sorting target. If the biological particle is determined to be a sorting target, the discrimination unit 215 instructs the sorting device 10 to sort that biological particle.

The learning model storage unit 213 and the discrimination unit 215 may be provided in the sorting device 10.

Furthermore, when the sorting device 10 is capable of separately sorting a plurality of groups of biological particles, the discrimination unit 215 may instruct the sorting device 10 not only whether or not a biological particle is to be sorted, but also in which collection unit the biological particle should be collected. In such a case, the learning unit 211 performs machine learning using, as training data, information on the fluorescence spectra of biological particles for which the collection unit to be used after sorting has additionally been specified. In this way, the discrimination unit 215 can output an instruction to the sorting device 10 to separately sort a plurality of groups of biological particles.

As described above, in machine learning sorting, biological particles are sorted according to the decision of the discrimination unit 215. Because machine learning sorting outputs the discrimination result with the highest confidence, a particle may be sorted even when that confidence is low, as long as it is higher than the confidence of the other results. This is undesirable when higher purity of the measurement data (confidence in the correct answer) is required.

The bioparticle analysis system according to the embodiment inputs cell information into a machine learning model to make a sorting decision, and then applies a threshold to the particles judged to be sorting targets to make a further sorting decision. Embodiments of the bioparticle analysis system are described below.

<1. First embodiment>
<1.1. Sorting based on confidence>
Sorting by machine learning decides whether or not to sort based on past trends (the training data), and therefore always involves some ambiguity.

　また、ディープラーニング（Ｄｅｅｐ　Ｌｅａｒｎｉｎｇ：深層学習）で出力層にＳｏｆｔｍａｘ関数を採用する場合、それぞれのＣｌａｓｓに分取する場合の確信度が合計して１００％となるように算出される。 In addition, when using the Softmax function in the output layer of deep learning, the confidence levels for each class are calculated to sum to 100%.

If events are sorted into the class with the highest probability without setting a threshold, an event judged as class 0 = 20%, class 1 = 40%, class 2 = 30%, and class 3 = 10% is still sorted into class 1, the most probable class. For a user who wants high purity (confidence), however, such an event should not be sorted, because the probability of class 1 is only 40%. Excluding such events lowers the sorting efficiency. Here, a "class" is a category or group of data.

　そこで、実施の形態では、確信度が高いイベントのみ分取するために閾値を設ける。なお、この閾値は可変で設定でき、ユーザの意図に合わせて調整可能としても良い。 Therefore, in the embodiment, a threshold is set to separate out only events with a high degree of certainty. Note that this threshold can be set variably and can be adjusted according to the user's intentions.
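The effect of the threshold on the four-class example above can be illustrated with a short sketch. The function names are hypothetical, and treating "confidence greater than or equal to the threshold" as the sorting condition is an assumption; the text does not state whether the comparison is inclusive.

```python
import math


def softmax(logits):
    """Convert raw output-layer scores into confidences that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sort_decision(confidences, threshold):
    """Return the winning class index, or None if its confidence is too low.

    Assumption: an event is sorted when confidence >= threshold.
    """
    best = max(range(len(confidences)), key=lambda i: confidences[i])
    return best if confidences[best] >= threshold else None


# The event from the text: class 0 = 20%, class 1 = 40%, class 2 = 30%, class 3 = 10%.
event = [0.20, 0.40, 0.30, 0.10]
no_threshold = sort_decision(event, 0.0)    # sorted into class 1
with_threshold = sort_decision(event, 0.6)  # not sorted (None)
```

Returning `None` for a rejected event keeps the "do not sort" outcome distinct from every class index, including class 0.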

<1.2. How to use the confidence level>
In the first embodiment, the confidence level obtained from deep learning is used in the following procedure. Here, the "confidence level" is the probability that the deep learning estimate is correct.

Step 1: First, run part of the sample and perform dimensionality reduction on its measurement data.

Step 2: On the dimensionality reduction result, specify (gate) the range of the population of bioparticles to be sorted.

Step 3: Divide the dimensionally reduced data of the partial sample into training data and validation data.

Step 4: After training, vary the threshold on the validation data and check how purity and efficiency change.

　なお、上記動作では、学習用と検証用のデータをまとめて次元圧縮をしてゲーティングをしているが、新規に追加したデータに対しても再現性が保たれる次元圧縮アルゴリズムであれば、学習用のデータだけで次元圧縮をしてゲーティングをした後に、その次元圧縮結果に検証用データを新規で追加することで正解ラベルを付けても良い。「ラベル」とは、個々のデータがどのクラスに属するかを示す。 In the above operation, the training and validation data are dimensionally compressed together and then gated, but if the dimensionality compression algorithm maintains reproducibility for newly added data, it is possible to perform dimensionality compression and gating on just the training data, and then add new validation data to the dimensionality compression result to give it a correct label. A "label" indicates which class each piece of data belongs to.

　図９は、第１の実施の形態に係る検証用データを示す図である。図９に示すように、検証データとして、細胞１、細胞２、細胞３、細胞４、・・・のイベントがあり、それぞれのイベントに対して、「正解」、「推定」及び「確信度」が対応付けられている。「正解」のデータは、後述するゲーティング処理において付され、「推定」及び「確信度」のデータは、検証用データを使用した推論処理において付される。ここで、「正解」は、実際にその細胞が含まれるべきクラスを示す。「推定」は、機械学習において推定されたクラスを示す。 FIG. 9 is a diagram showing the verification data according to the first embodiment. As shown in FIG. 9, the verification data includes events of cell 1, cell 2, cell 3, cell 4, ..., and each event is associated with a "correct answer," "estimate," and "confidence." The "correct answer" data is added in the gating process described below, and the "estimate" and "confidence" data are added in the inference process using the verification data. Here, the "correct answer" indicates the class that the cell should actually be included in. The "estimate" indicates the class estimated in machine learning.

The "cell 1" event is associated with a "correct answer" of class 1, an "estimate" of class 2, and a "confidence" of 55%. The "cell 2" event is associated with a "correct answer" of class 3, an "estimate" of class 3, and a "confidence" of 80%. The "cell 3" event is associated with a "correct answer" of class 5, an "estimate" of class 5, and a "confidence" of 98%. The "cell 4" event is associated with a "correct answer" of class 2, an "estimate" of class 4, and a "confidence" of 40%.

In FIG. 9, for example, setting the threshold to 60% excludes the "cell 1" (55%) and "cell 4" (40%) events from sorting, both of which were estimated incorrectly, so purity can be increased. However, setting the threshold too high, for example to 90%, makes it more likely that correctly estimated events are also excluded (here "cell 2", at 80%), so the efficiency of event acquisition decreases.
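The four events of FIG. 9 can be checked against both thresholds with a few lines of code. The dictionary layout and function names are illustrative, and the inclusive comparison (`>=`) is an assumption.

```python
# The validation events listed for FIG. 9.
events = [
    {"event": "cell 1", "correct": 1, "estimated": 2, "confidence": 0.55},
    {"event": "cell 2", "correct": 3, "estimated": 3, "confidence": 0.80},
    {"event": "cell 3", "correct": 5, "estimated": 5, "confidence": 0.98},
    {"event": "cell 4", "correct": 2, "estimated": 4, "confidence": 0.40},
]


def sorted_events(events, threshold):
    """Events whose confidence reaches the threshold (assumed inclusive)."""
    return [e["event"] for e in events if e["confidence"] >= threshold]


def wrongly_rejected(events, threshold):
    """Correctly estimated events that the threshold nevertheless excludes."""
    return [e["event"] for e in events
            if e["estimated"] == e["correct"] and e["confidence"] < threshold]


kept_at_60 = sorted_events(events, 0.60)     # cell 2 and cell 3 remain
kept_at_90 = sorted_events(events, 0.90)     # only cell 3 remains
lost_at_90 = wrongly_rejected(events, 0.90)  # cell 2 is lost despite being correct
```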

　図１０は、第１の実施の形態に係る次元圧縮された測定データの純度と効率（収率）とを示す画面を示す図である。図１０に示すように、画面では、閾値による純度と効率が表示され、また、どの測定データが分取されているが示されている。図１０では、Ｘ軸は次元圧縮した１次元目の測定データの値を示し、Ｙ軸は、次元圧縮した２次元目の測定データの値を示している。図１０では、２次元の測定データの例を示したが、測定データは、３次元で示されてもよい。 FIG. 10 is a diagram showing a screen showing the purity and efficiency (yield) of dimensionally compressed measurement data according to the first embodiment. As shown in FIG. 10, the screen displays purity and efficiency based on a threshold value, and also shows which measurement data has been separated. In FIG. 10, the X-axis shows the value of the dimensionally compressed first-dimensional measurement data, and the Y-axis shows the value of the dimensionally compressed second-dimensional measurement data. Although FIG. 10 shows an example of two-dimensional measurement data, the measurement data may be displayed in three dimensions.

　ここで、「純度」は、測定データに正しいラベル付けがされる百分率であり、「効率」は、ラベル付けされた測定データに含まれる正しい測定データの百分率である。 Here, "purity" is the percentage of measurement data that is correctly labeled, and "efficiency" is the percentage of correct measurement data contained in the labeled measurement data.

　図１０において、黒星、×、黒四角、黒三角、黒丸は、次元圧縮された測定データを示し、四角の実線で囲まれた部分はラベルが付与される領域を示している。ラベル１０１では黒星、ラベル１０２では黒四角、ラベル１０３では黒三角、ラベル１０４では黒丸がラベル付けされるのが正しいものとする。 In Figure 10, black stars, crosses, black squares, black triangles, and black circles indicate dimensionally compressed measurement data, and the solid line-enclosed parts of the squares indicate the areas to which labels are assigned. It is assumed that the correct labels are black stars for label 101, black squares for label 102, black triangles for label 103, and black circles for label 104.

　例えば、閾値が０％の場合（図１０の左側の図）、ラベル１０１の範囲が分取される場合では純度１００％、効率１００％であり、ラベル１０２の範囲が分取される場合では純度７０％、効率７０％であり、ラベル１０３の範囲が分取される場合では純度８０％、効率１００％、ラベル１０４の範囲が分取される場合では純度１００％、効率７０％を示している。 For example, when the threshold is 0% (left diagram in Figure 10), when the range of label 101 is separated, the purity is 100% and the efficiency is 100%, when the range of label 102 is separated, the purity is 70% and the efficiency is 70%, when the range of label 103 is separated, the purity is 80% and the efficiency is 100%, and when the range of label 104 is separated, the purity is 100% and the efficiency is 70%.

　閾値が７０％の場合（図１０の中央の図）、ラベル１０１の範囲が分取される場合では純度１００％、効率１００％であり、ラベル１０２の範囲が分取される場合では純度７５％、効率６０％であり、ラベル１０３の範囲が分取される場合では純度８８．９％、効率１００％、ラベル１０４の範囲が分取される場合では純度１００％、効率６０％を示している。 When the threshold is 70% (center diagram in Figure 10), when the range of label 101 is separated, the purity is 100% and the efficiency is 100%, when the range of label 102 is separated, the purity is 75% and the efficiency is 60%, when the range of label 103 is separated, the purity is 88.9% and the efficiency is 100%, and when the range of label 104 is separated, the purity is 100% and the efficiency is 60%.

　閾値が９０％の場合（図１０の右側の図）、ラベル１０１の範囲が分取される場合では純度９８％、効率８４％であり、ラベル１０２の範囲が分取される場合では純度８５．７％、効率６０％であり、ラベル１０３の範囲が分取される場合では純度１００％、効率８７．５％、ラベル１０４の範囲が分取される場合では純度１００％、効率６０％を示している。 When the threshold is 90% (the diagram on the right side of Figure 10), when the range of label 101 is separated, the purity is 98% and the efficiency is 84%, when the range of label 102 is separated, the purity is 85.7% and the efficiency is 60%, when the range of label 103 is separated, the purity is 100% and the efficiency is 87.5%, and when the range of label 104 is separated, the purity is 100% and the efficiency is 60%.

　図１０のような画面を表示することで、ユーザは、閾値に応じた定量的な純度と効率の変化、どの測定データのプロットが分取判定されているかの定性的な変化を確認しながら閾値を設定することができる。 By displaying a screen like that shown in Figure 10, the user can set the threshold while checking the quantitative changes in purity and efficiency according to the threshold, and the qualitative changes in which plots of measurement data are judged to be fractions.
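How purity and efficiency respond to the threshold can be sketched as follows. The sketch assumes that purity is the fraction of sorted events that truly belong to the target class and that efficiency is the fraction of true target events that are sorted; the values quoted for FIG. 10 are consistent with this reading, but it is an interpretation. The ten events are invented toy data, chosen only so that the results match the values quoted for label 103.

```python
def purity_and_efficiency(events, target_class, threshold):
    """events: iterable of (correct_class, estimated_class, confidence).

    Assumed definitions:
      purity     = true targets among sorted events / all sorted events
      efficiency = true targets among sorted events / all true targets
    """
    sorted_hits = [(c, e) for c, e, conf in events
                   if e == target_class and conf >= threshold]
    true_total = sum(1 for c, _, _ in events if c == target_class)
    true_sorted = sum(1 for c, _ in sorted_hits if c == target_class)
    purity = true_sorted / len(sorted_hits) if sorted_hits else None
    efficiency = true_sorted / true_total if true_total else None
    return purity, efficiency


# Invented toy data: eight true class-3 events and two contaminants
# that the model also estimated as class 3.
toy_events = [(3, 3, 0.95)] * 7 + [(3, 3, 0.85), (2, 3, 0.50), (1, 3, 0.80)]

at_0 = purity_and_efficiency(toy_events, 3, 0.0)   # purity 0.8, efficiency 1.0
at_70 = purity_and_efficiency(toy_events, 3, 0.7)  # purity 8/9, efficiency 1.0
at_90 = purity_and_efficiency(toy_events, 3, 0.9)  # purity 1.0, efficiency 0.875
```

Sweeping the threshold over a grid and plotting both values would give curves of the kind a user could consult when choosing a threshold.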

　図１１は、第１の実施の形態に係る次元圧縮された測定データのクラスと確信度とを示す図である。 FIG. 11 shows the classes and confidence levels of dimensionally compressed measurement data in the first embodiment.

　例えば、図１１においてユーザが測定データをクリックし、又は、ゲート等が複数のイベントを選択する。１つの測定データだけが選択された場合、選択された１つの測定データの各クラスの確信度が表示される。ユーザは選択された測定データの各クラスの確信度を確認することができる。 For example, in FIG. 11, the user clicks on a measurement data item, or a gate or the like selects multiple events. If only one measurement data item is selected, the confidence level of each class of the selected measurement data item is displayed. The user can check the confidence level of each class of the selected measurement data item.

　複数の測定データが選択された場合、選択された複数の測定データの平均や中央値などを使用した各クラスの確信度が表示される。ユーザは、選択された複数の測定データの各クラスの確信度を確認することができる。 When multiple measurement data are selected, the confidence level for each class is displayed using the average or median of the selected measurement data. The user can check the confidence level for each class of the selected measurement data.
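A minimal sketch of the aggregation for a multi-event selection is shown below. The function name and the three example events are hypothetical; the text names the mean and the median only as examples of possible statistics.

```python
from statistics import mean, median


def aggregate_confidence(selected, how="mean"):
    """Combine per-class confidences of several selected events.

    selected: list of per-event confidence lists, one value per class.
    how:      "mean" or "median" (any other value falls back to median).
    """
    reducer = mean if how == "mean" else median
    # zip(*selected) regroups the values class by class.
    return [reducer(column) for column in zip(*selected)]


# Three hypothetical selected events with confidences for classes 0, 1 and 2.
selection = [
    [0.1, 0.7, 0.2],
    [0.3, 0.5, 0.2],
    [0.2, 0.6, 0.2],
]
mean_conf = aggregate_confidence(selection, "mean")
median_conf = aggregate_confidence(selection, "median")
```

The per-class means still sum to 100% because averaging is linear, whereas per-class medians generally do not, which may matter for how the table is labeled.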

　図１１の左側には、選択された複数の測定データの各クラスと確信度とを示す表１０５が示されている。図１１の右側には、選択された１つの測定データの各クラスと確信度とを示す表１０６が示されている。 On the left side of FIG. 11, a table 105 is shown showing the classes and confidence levels of multiple selected measurement data. On the right side of FIG. 11, a table 106 is shown showing the classes and confidence levels of one selected measurement data.

<1.3. Setting the threshold>
<1.3.1. Threshold setting method 1>
A predetermined threshold may be set for each mode (threshold setting method 1). For example, the thresholds are set as Purity mode = 95%, Normal mode = 75%, and Yield mode = 0%. In method 1, the user selects a mode and the threshold is set according to that selection.

　図１２は、第１の実施の形態に係るモードと、純度及び効率との関係の画面を示す図である。図１２では、Ｙｉｅｌｄモード、Ｎｏｒｍａｌモード、Ｐｕｒｉｔｙモードの純度及び効率、また、どの測定データが分取されているが示されている。ユーザは、図１２に示す画面を参照して、どのモードを選択するかを決定しても良い。 FIG. 12 is a diagram showing a screen showing the relationship between the mode and purity and efficiency according to the first embodiment. FIG. 12 shows the purity and efficiency of the Yield mode, Normal mode, and Purity mode, as well as which measurement data is being collected. The user may refer to the screen shown in FIG. 12 to determine which mode to select.

This threshold setting method provides an algorithm in which a mode is selected and the threshold is set according to the selected mode. Users who find it difficult to choose a threshold themselves can therefore set one more easily.
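Threshold setting method 1 amounts to a lookup from mode to a preset value. A minimal sketch, using the example values given for the Purity, Normal, and Yield modes (the names `MODE_THRESHOLDS` and `threshold_for_mode` are hypothetical):

```python
# Preset thresholds per mode, taken from the example values in the text.
MODE_THRESHOLDS = {"Purity": 0.95, "Normal": 0.75, "Yield": 0.0}


def threshold_for_mode(mode):
    """Return the preset confidence threshold for the selected mode."""
    try:
        return MODE_THRESHOLDS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None


selected_threshold = threshold_for_mode("Normal")
```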

<1.3.2. Threshold setting method 2>
The user may enter an arbitrary threshold value on a GUI (Graphical User Interface) (threshold setting method 2). The value may be entered directly as a number, with a slide bar, or by similar means.

<1.3.3. Threshold setting method 3>
In threshold setting method 1, the threshold for each mode is determined in advance based on past data. In threshold setting method 3, an appropriate threshold for each mode is calculated automatically for each set of measurement data.

FIG. 13 is a diagram for explaining the case where a threshold is set for each set of measurement data in the first embodiment. In FIG. 13, the thick solid line indicates the purity of the validation measurement data, the thick dotted line indicates its three-interval moving average, the thin solid line indicates the efficiency of the validation measurement data, and the thin dotted line indicates its three-interval moving average.

Focusing on the purity curve in FIG. 13, the Normal mode threshold may be set to 62-63%, the confidence level at which the slope becomes gentle, and the Purity mode threshold may be set to 87-88%, where the slope changes from gentle back to steep. Whether the slope is gentle or steep may be judged, for example, by comparing the slopes of the threshold intervals immediately before and after a given confidence threshold interval: if the difference between those slopes is at or below a predetermined value, the slope is judged to be gentle, and if it is above that value, the slope is judged to be steep. Alternatively, the Purity mode threshold may be set near a confidence threshold of 99%, where the slope becomes gentle again. In other words, the threshold can be set at any point where the slope of the purity curve, or of a similar curve, shows a characteristic feature.
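One possible reading of the slope test is sketched below. Several details are assumptions rather than statements from the text: the moving average is taken as a centered three-point window, slopes are finite differences between neighboring threshold grid points, and the boundary case (difference exactly equal to the tolerance) is treated as gentle. The curve values are invented.

```python
def moving_average_3(values):
    """Centered 3-point moving average; endpoints use the neighbors available."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - 1): i + 2]
        out.append(sum(window) / len(window))
    return out


def classify_intervals(thresholds, curve, max_gentle_diff):
    """Label each interior threshold interval as 'gentle' or 'steep'.

    Interval i spans thresholds[i]..thresholds[i + 1]. It is 'gentle' when the
    slopes of the intervals before and after it differ by at most
    max_gentle_diff (assumed boundary handling), otherwise 'steep'.
    """
    slopes = [(curve[i + 1] - curve[i]) / (thresholds[i + 1] - thresholds[i])
              for i in range(len(curve) - 1)]
    labels = {}
    for i in range(1, len(slopes) - 1):
        diff = abs(slopes[i + 1] - slopes[i - 1])
        labels[(thresholds[i], thresholds[i + 1])] = (
            "gentle" if diff <= max_gentle_diff else "steep")
    return labels


# Invented purity curve (in %) sampled at confidence thresholds 0, 10, ..., 60.
grid = [0, 10, 20, 30, 40, 50, 60]
purity = [50, 60, 70, 72, 74, 76, 78]
labels = classify_intervals(grid, purity, max_gentle_diff=0.1)
smoothed = moving_average_3(purity)  # could be classified instead of the raw curve
```

A mode threshold could then be placed at the first interval labeled gentle, or at the first steep interval that follows a gentle run, depending on the mode.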

　なお、「純度」ではなく、効率の傾きや、純度と効率とを組みあせたものの傾き、純度の移動平均線の傾き、効率の移動平均線の傾きなどに基づいて、モードと閾値を設定しても良い。また、閾値は、傾きを使わない閾値の算出方法でも良い。 In addition, the mode and threshold may be set based on the slope of efficiency, the slope of a combination of purity and efficiency, the slope of the moving average line of purity, the slope of the moving average line of efficiency, etc., instead of "purity". Also, the threshold may be calculated using a method that does not use the slope.

　閾値を自動で決定する方法としてＲＯＣ（Ｒｅｃｅｉｖｅｒ　Ｏｐｅｒａｔｏｒａｔｉｎｇ　Ｃｈａｒａｓｔｅｒｉｓｔｉｃ）曲線を用いても良い。図１４は、第１の実施の形態に係る検証用の測定データのＲＯＣ曲線を使用して閾値を設定する場合を説明するための図である。 As a method for automatically determining the threshold, a receiver operating characteristic (ROC) curve may be used. Figure 14 is a diagram for explaining a case where a threshold is set using an ROC curve of the measurement data for verification according to the first embodiment.

In FIG. 14, the true positive rate (TPR) is the proportion of all actually positive events that are correctly determined to be positive. The false positive rate (FPR) is the proportion of all actually negative events that are mistakenly determined to be positive.

　純度と効率のバランスが取れた閾値はＲＯＣ曲線を引いた際に最も左上（０，１）に近いところに位置する閾値であるので、この値を閾値として採用しても良い。 The threshold that balances purity and efficiency is the threshold that is located closest to the upper left (0, 1) when drawing the ROC curve, so this value can be used as the threshold.

To find the threshold closest to (0, 1), a search using the Euclidean distance or the like may be performed, or another method may be used.
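For illustration only, a search for the ROC point closest to (0, 1) using the Euclidean distance might look like the sketch below. The function names, the rule that an event is judged positive when its certainty is at or above the threshold, and the sample data are assumptions for this example.

```python
import math


def roc_points(confidences, labels, thresholds):
    """Return (threshold, FPR, TPR) for each candidate threshold.

    labels holds 1 for actually positive events and 0 for actually
    negative events; an event is judged positive when its confidence
    is at or above the threshold (assumed rule).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for c, y in zip(confidences, labels) if c >= t and y == 1)
        fp = sum(1 for c, y in zip(confidences, labels) if c >= t and y == 0)
        points.append((t, fp / negatives, tp / positives))
    return points


def closest_to_top_left(points):
    """Pick the ROC point with the smallest Euclidean distance to (FPR, TPR) = (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[1] - 0.0, p[2] - 1.0))


# Hypothetical verification data.
conf = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.1]
truth = [1, 1, 1, 0, 1, 0, 0, 0]
best = closest_to_top_left(roc_points(conf, truth, [0.2, 0.5, 0.65, 0.92]))
print(best)
```

When two candidate thresholds are equally close to (0, 1), `min` returns the first one in the list, so the order of the candidates acts as a tie-breaker in this sketch.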

<1.4. Visualization using similarity and certainty>
Dimensionality reduction compresses multidimensional information into lower-dimensional information, so relationships in the multidimensional space cannot be fully represented in the lower-dimensional space.

　そのため、ＣＤ４＋Ｔ　ｃｅｌｌとＣＤ８＋Ｔ　ｃｅｌｌのような類似している細胞種であっても離れた距離に分布してしまうことがある。従って、このような類似している細胞腫の解析の効率が落ちてしまう。 As a result, even similar cell types, such as CD4+ T cells and CD8+ T cells, can be distributed at large distances. This reduces the efficiency of analyzing such similar cell types.

The analysis method of the present disclosure selects a cell group on the dimensionality-reduced plot by gating or the like, calculates for each piece of measurement data the similarity or confidence between the selected cell group and the cell being measured, and displays each piece of measurement data in a color that depends on the calculated similarity or confidence.

　可視化の方法の一例として、測定データは類似度に基づいて測定データの濃淡を変えて可視化してもよいし、色を変えて可視化してもよい。 As an example of a visualization method, the measurement data may be visualized by changing the shade of the measurement data based on the similarity, or by changing the color.

　類似度の計算はユークリッド距離やマンハッタン距離、チェビシェフ距離などの距離ベースの計算を用いても良いし、コサイン類似度やジャッカード係数、ダイス係数などの類似度ベースの計算を用いても良いし、それ以外でも良い。 The similarity may be calculated using a distance-based calculation such as Euclidean distance, Manhattan distance, or Chebyshev distance, or a similarity-based calculation such as cosine similarity, Jaccard coefficient, or Dice coefficient, or it may be calculated using other methods.
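As one possible sketch of the similarity calculation, the distance-based and similarity-based measures named above can be written as follows. Comparing each event with the centroid of the selected cell group, and converting a distance into a shade between 0 and 1 by linear normalization, are assumptions of this example rather than requirements of the disclosure.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid(events):
    """Per-parameter average of the selected cell group."""
    return [sum(col) / len(events) for col in zip(*events)]


def shade_by_similarity(events, selected, distance=euclidean):
    """Map each event to a shade in [0, 1]; 1 is most similar (darkest)."""
    ref = centroid(selected)
    dists = [distance(e, ref) for e in events]
    d_max = max(dists) or 1.0  # avoid division by zero when all distances are 0
    return [1.0 - d / d_max for d in dists]


# Hypothetical two-parameter events compared with a selected group.
print(shade_by_similarity([[2, 2], [2, 5], [6, 2]], [[1, 1], [3, 3]]))
```

The returned shades could be passed to any plotting routine as a color or opacity value, so that events more similar to the selected group are drawn darker, as in FIG. 15.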

　本可視化は、解析目的で行っても良いし、分取後の測定データに対して行っても良い。 This visualization can be done for analytical purposes, or it can be done on measurement data after fractionation.

　図１５は、第１の実施の形態に係る次元圧縮された測定データの表示例を示す図である。図１５において、次元圧縮された測定データは、選択された細胞群の測定データ１１１に対する類似度に従って示されている。図１５では、測定データは、選択された細胞群の測定データ１１１に対して類似度が高いほど濃い色で示されている。 FIG. 15 is a diagram showing an example of the display of dimensionally compressed measurement data according to the first embodiment. In FIG. 15, the dimensionally compressed measurement data is displayed according to the similarity to the measurement data 111 of the selected cell group. In FIG. 15, the measurement data is displayed in a darker color the higher the similarity to the measurement data 111 of the selected cell group.

　また、類似度や確信度を用いた可視化は次元圧縮上のプロットだけではなく、図１６に示すように蛍光補正前のデータや、図１７に示すように蛍光補正後のデータに対して行ってもよい。 In addition, visualization using similarity and confidence can be performed not only on plots on dimensionality reduction, but also on data before fluorescence correction as shown in Figure 16, or on data after fluorescence correction as shown in Figure 17.

　図１６は、第１の実施の形態に係る測定対象となる細胞の蛍光補正前の測定データを類似度に従って色を変えて表示する表示例を示す図である。図１６に示すように、測定データは、選択された細胞群の測定データ１１１に対する類似度に従って、色を変えて表示されている。 FIG. 16 shows an example of a display in which the measurement data before fluorescence correction of the cells to be measured in the first embodiment is displayed in a different color according to the similarity. As shown in FIG. 16, the measurement data is displayed in a different color according to the similarity to the measurement data 111 of the selected cell group.

　測定対象となる細胞の蛍光補正前の測定データを表示する場合、各受光系のｃｈの値をｃｈ毎に表示してもよい。また、横軸を各蛍光色素の蛍光強度、縦軸を各受光系のｃｈの値にしてもよい。図１７は、第１の実施の形態に係る測定対象となる細胞の蛍光補正後の測定データを類似度に従って色を変えて表示する表示例を示す図である。ここで、図１７のＸ軸及びＹ軸は、測定データに含まれる各蛍光色素（Ｃｏｌｏｒ）の蛍光補正後の蛍光強度を示している。 When displaying the measurement data of the cells to be measured before fluorescence correction, the channel values of each light receiving system may be displayed for each channel. The horizontal axis may represent the fluorescence intensity of each fluorescent dye, and the vertical axis may represent the channel values of each light receiving system. Figure 17 shows an example of a display in which measurement data after fluorescence correction of the cells to be measured according to the first embodiment is displayed in different colors according to similarity. Here, the X and Y axes in Figure 17 represent the fluorescence intensity after fluorescence correction of each fluorescent dye (Color) included in the measurement data.

<1.5. Functional block diagram of information processing device 300>
FIG. 18 is a functional block diagram of the information processing device 300 according to the first embodiment, which sorts measurement data using deep learning.

　図１８に示すように、情報処理装置３００には、測定装置３１１が接続されている。測定装置３１１は、サンプル（例えば、細胞など）の測定を行い、測定した測定データに必要なデータ（例えば、細胞の蛍光の色、蛍光の強さ等）を付加し、情報処理装置３００に出力する。測定では、少なくとも測定データのイベント（例えば、細胞１など）の測定を行う。 As shown in FIG. 18, a measuring device 311 is connected to the information processing device 300. The measuring device 311 measures a sample (e.g., a cell, etc.), adds necessary data (e.g., the color of the cell's fluorescence, the intensity of the fluorescence, etc.) to the measured measurement data, and outputs the data to the information processing device 300. In the measurement, at least an event of the measurement data (e.g., cell 1, etc.) is measured.

　情報処理装置３００は、取得部３１２、前処理部３１３、次元圧縮部３１４、ゲート部３１５、分割部３１６、学習部３１７、推定部３１８、閾値設定部３１９、表示部３２０、分取部３２１を有する。 The information processing device 300 has an acquisition unit 312, a preprocessing unit 313, a dimensional compression unit 314, a gate unit 315, a division unit 316, a learning unit 317, an estimation unit 318, a threshold setting unit 319, a display unit 320, and a fractionation unit 321.

The acquisition unit 312 acquires multiple pieces of measurement data from the measuring device 311 external to the information processing device 300. The preprocessing unit 313 performs preprocessing, such as downsampling and narrowing down to the target population, on the measurement data acquired by the acquisition unit 312.

The dimensional compression unit 314 performs dimensional compression on the measurement data that has been preprocessed by the preprocessing unit 313. "Dimensional compression" refers to finding features common to multidimensional data and expressing the data in fewer dimensions while preserving, as far as possible, the relationships of the data distribution in the multidimensional space.

After the dimensional compression of the measurement data, the dimensional compression unit 314 determines the range to be sorted. The measurement data compressed by the dimensional compression unit 314 includes measurement data for verification and measurement data for learning.

　測定データの説明変数はスペクトルなど蛍光補正前の生の値を使っても良いし、蛍光補正後のデータであっても良い。また、蛍光補正をする際に逆行列計算を行うが、その際にガウスジョルダン法を用いて解いても良い。また、クラスタリングの前処理としてバッチ効果を抑える目的で正規化などのアルゴリズムを用いても良い。 The explanatory variables for the measurement data may be raw values before fluorescence correction, such as spectra, or may be data after fluorescence correction. In addition, when performing fluorescence correction, an inverse matrix calculation is performed, and the Gauss-Jordan method may be used to solve the problem. Furthermore, algorithms such as normalization may be used as preprocessing for clustering in order to suppress batch effects.
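The inverse matrix calculation mentioned above can be illustrated with a minimal Gauss-Jordan sketch. It assumes a square spillover matrix (as many detectors as fluorescent dyes) in which the observed signal equals the spillover matrix multiplied by the true dye intensities; the function names, the matrix, and the values are hypothetical and are not taken from the disclosure.

```python
def gauss_jordan_inverse(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    # The right half now holds the inverse: [I | A^-1].
    return [row[n:] for row in aug]


def correct_fluorescence(observed, spillover):
    """Recover dye intensities, assuming observed = spillover x true."""
    inv = gauss_jordan_inverse(spillover)
    return [sum(inv[i][j] * observed[j] for j in range(len(observed))) for i in range(len(inv))]


# Hypothetical 2-dye example: true intensities (100, 50) with some spillover.
print(correct_fluorescence([110.0, 60.0], [[1.0, 0.2], [0.1, 1.0]]))
```

When there are more detector channels than dyes, the matrix is not square and a least-squares solution would be needed instead; that case is outside this sketch.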

　ゲート部３１５は、次元圧縮部３１４により次元圧縮された測定データ（検証用の測定データ及び学習用の測定データを含む）をゲートする。また、ゲート部３１５は、次元圧縮部３１４により次元圧縮された測定データの学習用の測定データにラベルを付加する。分割部３１６は、ゲート部３１５によりゲートされた次元圧縮された複数の測定データを学習用の複数の測定データと、検証用の複数の測定データとに分割する。 The gate unit 315 gates the measurement data (including the measurement data for verification and the measurement data for learning) that has been dimensionally compressed by the dimensional compression unit 314. The gate unit 315 also adds a label to the measurement data for learning that has been dimensionally compressed by the dimensional compression unit 314. The division unit 316 divides the multiple pieces of dimensionally compressed measurement data gated by the gate unit 315 into multiple pieces of measurement data for learning and multiple pieces of measurement data for verification.

The learning unit 317 performs machine learning using the measurement data for learning (measurement data before or after fluorescence correction) produced by the division unit 316 and the labels added to that data by the gate unit 315, and thereby constructs a learning model. The learning model outputs, for each piece of measurement data, an estimate of whether the biological particle is a sorting target and the confidence level of that estimate.

　推定部３１８は、学習部３１７によって作成された学習モデルに複数の測定データの少なくとも一部（検証用の測定データ）を入力し、分取対象であるかどうかを推論する。 The estimation unit 318 inputs at least a portion of the multiple measurement data (measurement data for verification) into the learning model created by the learning unit 317, and infers whether or not the data is a target for separation.

Among the multiple pieces of measurement data acquired by the acquisition unit 312, the estimation unit 318 produces, for the measurement data for verification, an estimate of the correct answer and the confidence level of that estimate. Specifically, the estimation unit 318 obtains the estimates and confidence levels for the measurement data for verification using the learning model generated by the learning unit 317.

　推定部３１８は、推定部３１８による推論に使用された複数の測定データとデータ圧縮処理により得られた情報に基づいて推定結果の確信度を算出する確信度算出部を有する。 The estimation unit 318 has a confidence calculation unit that calculates the confidence of the estimation result based on the multiple measurement data used in the inference by the estimation unit 318 and the information obtained by the data compression process.

The threshold setting unit 319 sets, with respect to the confidence level estimated by the estimation unit 318, a threshold used for sorting the multiple pieces of measurement data acquired by the acquisition unit 312.

The display unit 320 displays on a screen the measurement data for verification, the threshold, the classification (class), the mode, and the purity and efficiency of the measurement data for verification, among others. The display unit 320 can also display the results of the estimation by the estimation unit 318.

The fractionation unit 321 sorts, from the multiple pieces of measurement data acquired by the acquisition unit 312, the measurement data to be sorted, based on the threshold set by the threshold setting unit 319. Specifically, the fractionation unit 321 classifies into classes the remaining measurement data and the measurement data for verification, for which the estimation unit 318 has produced estimates and confidence levels, and sorts the measurement data included in each class using the set threshold.

The remaining measurement data is measurement data obtained from samples other than those used for the measurement data for learning and the measurement data for verification. These remaining samples are run through the measuring device 311 after the information processing device 300 issues an instruction to the measuring device 311. The measuring device 311 then sorts the samples that are run through it and outputs their measurement data to the acquisition unit 312 of the information processing device 300. The instruction from the information processing device 300 to the measuring device 311 is issued, for example, after the threshold setting unit 319 has set the threshold.

<1.6. Operation Description>
FIG. 19 is a flowchart for explaining the sorting of measurement data using deep learning by the information processing device 300 according to the first embodiment.

First, some of the multiple samples are run through the measuring device 311 and measured (step S1). Next, preprocessing such as downsampling and narrowing down to the target population is performed on the measurement data of those samples (step S2).

　次に、前処理が行われた一部の複数の測定データの次元圧縮が行われ（ステップＳ３）、次元圧縮された一部の複数の測定データのゲーティングが行われる（ステップＳ４）。ここで次元圧縮対象のデータや学習時の説明変数はスペクトルなど蛍光補正前の生の値を使っても良いし、蛍光補正後のデータであっても良い。また、蛍光補正をする際に逆行列計算を行うが、その際にガウスジョルダン法を用いて解いても良い。また、次元圧縮の前処理としてバッチ効果を抑える目的で正規化などのアルゴリズムを用いても良い。 Next, the preprocessed portion of the multiple measurement data is dimensionally compressed (step S3), and the dimensionally compressed portion of the multiple measurement data is gated (step S4). Here, the data to be dimensionally compressed and the explanatory variables during learning may be raw values before fluorescence correction, such as spectra, or may be data after fluorescence correction. In addition, an inverse matrix calculation is performed when performing fluorescence correction, and the Gauss-Jordan method may be used to solve this. Furthermore, an algorithm such as normalization may be used as a preprocessing step for dimensionality compression in order to suppress batch effects.

　次に、ゲート部３１５によりゲートされた次元圧縮された一部の複数の測定データを学習用の複数の測定データと、検証用の複数の測定データとに分割する（ステップＳ５）。 Next, the multiple measurement data that have been gated by the gate unit 315 and have been dimension-compressed are divided into multiple measurement data for learning and multiple measurement data for validation (step S5).

　次に、分割された学習用の複数の測定データを使用して学習を行い、学習モデルを生成する（ステップＳ６）。そして、生成された学習モデルを使用して、検証用の複数の測定データについて検証用の複数の測定データの正解に対する推定及び推定に対する確信度を推定する（ステップＳ７）。 Next, the divided multiple measurement data for learning are used to perform learning and generate a learning model (step S6). Then, the generated learning model is used to estimate the correct answer for the multiple measurement data for validation and the confidence level for the estimation for the multiple measurement data for validation (step S7).

　そして、推定された確信度に対する閾値が設定される（ステップＳ８）。閾値の設定は、ユーザからの指示により設定されるものや、自動で設定されるものでも良い。次に、ユーザは、表示部３２０に表示された純度、効率の値及び測定データのプロットの様子などを確認し（ステップＳ９）、閾値の設定が妥当でなければ（ステップＳ９のＮＧ）、ステップＳ８の処理に戻り、再度閾値の設定が行われる。 Then, a threshold value for the estimated confidence level is set (step S8). The threshold value may be set by user instruction or automatically. Next, the user checks the purity and efficiency values and the plot of the measurement data displayed on the display unit 320 (step S9), and if the threshold value setting is not appropriate (NG in step S9), the process returns to step S8 and the threshold value is set again.
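To illustrate what the user checks in step S9, the sketch below computes purity and efficiency for a candidate confidence threshold on the verification data. It assumes that purity is the fraction of sorted events that are truly targets and that efficiency is the fraction of true target events that are sorted; these definitions, the function name, and the sample values are assumptions for this example.

```python
def purity_and_efficiency(confidences, is_target, threshold):
    """Return (purity, efficiency) when events with confidence >= threshold are sorted.

    purity     = sorted true targets / all sorted events   (assumed definition)
    efficiency = sorted true targets / all true targets    (assumed definition)
    """
    sorted_flags = [c >= threshold for c in confidences]
    n_sorted = sum(sorted_flags)
    n_hit = sum(1 for s, t in zip(sorted_flags, is_target) if s and t)
    n_target = sum(1 for t in is_target if t)
    purity = n_hit / n_sorted if n_sorted else 0.0
    efficiency = n_hit / n_target if n_target else 0.0
    return purity, efficiency


# Hypothetical verification data: confidence and ground-truth gate label.
conf = [0.99, 0.9, 0.7, 0.65, 0.4]
truth = [True, True, False, True, False]
for t in (0.6, 0.88):
    print(t, purity_and_efficiency(conf, truth, t))
```

Raising the threshold in this example increases purity at the cost of efficiency, which is the trade-off the Normal and Purity modes are meant to expose.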

On the other hand, if the threshold setting is appropriate (OK in step S9), the remaining samples are run through the measuring device (step S10), the remaining measurement data measured from those samples is sorted into classes (step S11), and, for the measurement data in each class, whether the data is to be sorted is determined by comparing its confidence level with the set threshold (step S12).

<1.7. Modifications>
In the first embodiment, the case where the information processing device 300 sorts the remaining measurement data has been described. However, because this sorting takes processing time, it may instead be performed on the measuring device 311 side.

　図２０は、第１の実施の形態の変形例に係る生体粒子分析システム１の構成例を示すブロック図である。図４と同一部分には同一符号を付して説明する。 FIG. 20 is a block diagram showing a configuration example of a bioparticle analysis system 1 according to a modified example of the first embodiment. The same parts as those in FIG. 4 are described with the same reference numerals.

　変形例に係る生体粒子分析システム１は、図４で示す情報処理装置２０の機能がネットワークを介して接続された分取装置１０に分割されて設けられる例である。 The biological particle analysis system 1 according to the modified example is an example in which the functions of the information processing device 20 shown in FIG. 4 are divided and provided in a fractionation device 10 connected via a network.

　具体的には、図２０に示すように、変形例に係る生体粒子分析システムは、分取装置１０が解析部２０３、リファレンススペクトル記憶部２０５、データ圧縮処理部２０７、学習部２１１を有する。分取装置１０は、サンプルＳから測定データを取得し、かつ情報処理装置２０の判別に基づいて分取対象の粒子を分取する。情報処理装置２０は、取得部２０１、学習モデル記憶部２１３、判別部２１５を有する。 Specifically, as shown in FIG. 20, in the modified biological particle analysis system, the fractionation device 10 has an analysis unit 203, a reference spectrum storage unit 205, a data compression processing unit 207, and a learning unit 211. The fractionation device 10 acquires measurement data from the sample S, and separates particles to be separated based on the discrimination of the information processing device 20. The information processing device 20 has an acquisition unit 201, a learning model storage unit 213, and a discrimination unit 215.

　なお、情報処理装置２０及び分取装置１０は、インターネット、電話回線網若しくは衛星通信網などの公衆回線網、Ｅｔｈｅｒｎｅｔ（登録商標）を含む各種のＬＡＮ（Ｌｏｃａｌ　Ａｒｅａ　Ｎｅｔｗｏｒｋ）、又はＷＡＮ（Ｗｉｄｅ　Ａｒｅａ　Ｎｅｔｗｏｒｋ）等のネットワークにて互いに通信可能に接続されてもよい。 The information processing device 20 and the fractionation device 10 may be connected to each other so as to be able to communicate with each other via a network such as the Internet, a public line network such as a telephone network or a satellite communication network, various LANs (Local Area Networks) including Ethernet (registered trademark), or a WAN (Wide Area Network).

In the biological particle analysis system according to the modified example, functions with a large computational load (e.g., the analysis unit 203, the data compression processing unit 207, and the learning unit 211) can be assigned to the fractionation device 10. On the other hand, because delays due to the network or the like should be avoided for rapid discrimination, and because their computational load is not large, the functions of the discrimination unit 215 and the learning model storage unit 213 may be assigned to the information processing device 20 that is directly connected to the fractionation device 10.

　図２１は、第１の実施の形態の変形例に係る情報処理システムの機能ブロック図である。なお、図１８と同一部分には同一符号を付して説明する。図２１に示すように、情報処理装置３００に設けられていた分取部３２１が測定装置３１１に設けられても良い。 FIG. 21 is a functional block diagram of an information processing system according to a modified example of the first embodiment. Note that the same parts as those in FIG. 18 are denoted by the same reference numerals. As shown in FIG. 21, the fractionation unit 321 provided in the information processing device 300 may be provided in the measurement device 311.

　なお、前処理部３１３や閾値設定部３１９が測定装置３１１に設けられていてもよい。 The preprocessing unit 313 and the threshold setting unit 319 may be provided in the measurement device 311.

As shown in FIG. 21, the threshold set by the threshold setting unit 319 and the remaining measurement data, that is, the measurement data not used for learning or verification, as processed by the estimation unit 318, are output from the information processing device 300 to the measuring device 311.

　測定装置３１１の分取部３２１は、情報処理装置３００から出力された閾値及び推定された残りの測定データを受信し、受信した閾値を利用して残りの測定データを分取する。 The fractionation unit 321 of the measurement device 311 receives the threshold and the estimated remaining measurement data output from the information processing device 300, and fractionates the remaining measurement data using the received threshold.

The fractionation unit (determination unit) of the measuring device 311, which is a biological particle sorting device, inputs optical information measured from the biological particles for sorting into the learning model created by the learning unit 317, infers whether each such particle is a sorting target, and, when it infers that the particle is a sorting target, makes a sorting decision based on the threshold set by the threshold setting unit 319. The biological particle sorting device sorts the target particles based on the sorting decision of the determination unit. The biological particles for sorting are included in the sample.

Next, a modified example of the information processing system according to the first embodiment will be described. FIG. 22 is a functional block diagram showing this modified example.

　第１の実施の形態に係る情報処理システムの変形例では、図２２に示すように、計算付加が大きい機能（例えば、前処理部３１３、次元圧縮部３１４、ゲート部３１５、分割部３１６、学習部３１７、推定部３１８、閾値設定部３１９）をより演算能力が高い装置（図２２の例では、情報処理サーバ３０１）に担当させることができる。 In a modified example of the information processing system according to the first embodiment, as shown in FIG. 22, functions requiring a large amount of calculation (e.g., preprocessing unit 313, dimensional compression unit 314, gate unit 315, division unit 316, learning unit 317, estimation unit 318, threshold setting unit 319) can be assigned to a device with higher computing power (information processing server 301 in the example of FIG. 22).

The information processing device 300 may be a cloud computer connected to the measuring device 311 via a network. In this case, the cloud computer may execute some of the functions of the information processing device 300, such as those of the dimensional compression unit 314, the learning unit 317 that performs machine learning, and the threshold setting unit 319.

On the other hand, functions for which delays due to the network or the like should be avoided in order to allow rapid discrimination, and functions whose computational load is not very large, may be assigned to the information processing device 300 that is directly connected to the measuring device 311.

　第１の実施の形態の変形例に係る情報処理システムによれば、第１の実施の形態に係る情報処理装置３００と同様に、測定データの分類を適切に行うことができる。 According to the information processing system according to the modified example of the first embodiment, measurement data can be appropriately classified, similar to the information processing device 300 according to the first embodiment.

<2. Second embodiment>
<2.1. Confidence-based sorting (clustering)>
The second embodiment sets a threshold for clustering-based sorting. When sorting is performed using a clustering algorithm, each event is always assigned to the cluster that is relatively most similar among all clusters. However, whether the event is actually close to that cluster in absolute terms is not known.

If the absolute distance to the assigned cluster is large, a user who prioritizes purity may prefer to treat the event as a non-sorting target. The second embodiment therefore provides a threshold, for example so that measurement data is sorted only when it is closer to the cluster than a certain distance.
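The idea can be sketched as nearest-cluster assignment with a rejection distance. The use of the Euclidean distance, the dictionary of cluster representative values, and the names below are assumptions made for illustration; the disclosure leaves the distance measure open.

```python
import math


def assign_with_reject(event, representatives, max_distance):
    """Assign an event to its nearest cluster, or reject it if it is too far away.

    representatives maps a cluster name to its representative value (one
    number per parameter). Returns (cluster name or None, distance).
    """
    best = min(representatives, key=lambda name: math.dist(event, representatives[name]))
    d = math.dist(event, representatives[best])
    return (best if d <= max_distance else None), d


# Hypothetical clusters and one event.
reps = {"A": [0.0, 0.0], "B": [10.0, 0.0]}
print(assign_with_reject([3.0, 4.0], reps, max_distance=6.0))  # accepted into "A"
print(assign_with_reject([3.0, 4.0], reps, max_distance=4.0))  # rejected
```

Without the rejection distance the event would always be assigned to cluster "A", even when it lies far from every cluster; the threshold lets a purity-oriented user exclude such events.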

<2.2. Threshold for clustering sorting>
FIG. 23 is a diagram for explaining the concept of thresholds for clustering sorting according to the second embodiment.

　図２３において、横軸のパラメータは、例えば、蛍光色素抗体や抗原マーカーやCD分類の種類を示し、縦軸は、イベント（例えば、細胞）の蛍光強度を示している。実線は、クラスタの代表値を示し、点線は対象のイベント（残りの測定データの測定値）を示している。 In Figure 23, the parameters on the horizontal axis indicate, for example, the type of fluorescent dye antibody, antigen marker, or CD classification, and the vertical axis indicates the fluorescence intensity of an event (e.g., a cell). The solid line indicates the representative value of the cluster, and the dotted line indicates the target event (the measured value of the remaining measurement data).

　図２３に示すように、例えば、一番左側のパラメータに対応するクラスタの代表値に５０％の閾値が設定された場合、図２４に示すように、クラスタの代表値の２５％～７５％が閾値の範囲とされる。図２４は、第２の実施の形態に係るクラスタリング分取における閾値を５０％にした場合の範囲の考え方を説明するための図である。そして、このクラスタの代表値の２５％～７５％に図２３に示した一番左側の対象イベントのパラメータの測定値（蛍光強度）が入る場合には、分取の対象となる測定値とされる。図２３の場合、一番左側のイベントのパラメータの測定値は、閾値の範囲に入らないので、分取の対象とはされない。 As shown in FIG. 23, for example, if a threshold of 50% is set for the representative value of the cluster corresponding to the leftmost parameter, then 25% to 75% of the cluster's representative value is set as the threshold range, as shown in FIG. 24. FIG. 24 is a diagram for explaining the concept of the range when the threshold is set to 50% in clustering sorting according to the second embodiment. If the measured value (fluorescence intensity) of the parameter of the leftmost target event shown in FIG. 23 falls within 25% to 75% of the representative value of this cluster, then it is set as the measured value to be sorted. In the case of FIG. 23, the measured value of the parameter of the leftmost event does not fall within the threshold range, so it is not set as the target for sorting.

For example, if a threshold of 50% is set for the representative value of the cluster corresponding to the second parameter from the left, the range of 25% to 75% of the cluster's representative value is the threshold range, as shown in FIG. 24. If the measured value of the second parameter from the left of the target event falls within this range, it is treated as a measured value to be sorted. In the case of FIG. 23, the second measured value from the left does not fall within the threshold range, so it is not a target for sorting.

For example, if a threshold of 50% is set for the representative value of the cluster corresponding to the third parameter from the left, the range of 25% to 75% of the cluster's representative value is the threshold range, as shown in FIG. 24. If the third measured value from the left falls within this range, it is treated as a measured value to be sorted. In the case of FIG. 24, the third measured value from the left falls within the threshold range, so it is a target for sorting.

　第２の実施の形態では、クラスタリング分取の閾値は、以下のように判定しても良い。 In the second embodiment, the threshold for clustering may be determined as follows:

<2.2.1. When threshold determination is performed for each parameter>
- An absolute threshold is entered, and the event is sorted if, for every parameter, the measured value is within the cluster representative value ± the threshold.
- A ratio threshold is entered, and the event is sorted if, for every parameter, the measured value is within the cluster representative value ± (representative value × threshold).
- Using, for example, a frequency distribution for each parameter in each cluster, the event is sorted if every parameter falls within the threshold entered by the user.
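The first two per-parameter rules above can be sketched as follows. The function names and sample values are hypothetical, and the ratio rule uses the magnitude of the representative value so that the range stays well defined for negative values, which is an assumption of this sketch.

```python
def within_absolute(measured, representative, threshold):
    """True if every parameter lies within representative +/- threshold."""
    return all(abs(m - r) <= threshold for m, r in zip(measured, representative))


def within_ratio(measured, representative, threshold):
    """True if every parameter lies within representative +/- representative * threshold."""
    return all(abs(m - r) <= abs(r) * threshold for m, r in zip(measured, representative))


# Hypothetical event with three parameters (fluorescence intensities).
rep = [100.0, 200.0, 50.0]
event = [110.0, 190.0, 70.0]
print(within_absolute(event, rep, 15.0))  # third parameter deviates by 20
print(within_ratio(event, rep, 0.5))      # each deviation is within 50% of its representative value
```

Because every parameter must pass, a single outlying parameter is enough to exclude the event from sorting under either rule.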

<2.2.2. When threshold determination is performed on the average over all parameters>
- An absolute threshold is entered, and the event is sorted if mean(|measured value - representative value|) is within the threshold.
- A ratio threshold is entered, and the event is sorted if mean(|measured value - representative value|) is within (mean of the representative values × threshold).

Here, "mean" denotes the average. When a random forest is adopted as the algorithm, the number of decision trees, or the proportion of decision trees, voting for a class in the majority vote may be set as the threshold. The threshold may be determined automatically from the measurement data or may be set by the user.
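The two averaged rules can be sketched in the same way. The function names and sample values are again hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def mean_abs_deviation(measured, representative):
    """mean(|measured value - representative value|) over all parameters."""
    return mean(abs(m - r) for m, r in zip(measured, representative))


def within_mean_absolute(measured, representative, threshold):
    """True if the mean absolute deviation is within the absolute threshold."""
    return mean_abs_deviation(measured, representative) <= threshold


def within_mean_ratio(measured, representative, threshold):
    """True if the mean absolute deviation is within mean(representative) * threshold."""
    return mean_abs_deviation(measured, representative) <= mean(representative) * threshold


# Same hypothetical event as a per-parameter check would use.
rep = [100.0, 200.0, 50.0]
event = [110.0, 190.0, 70.0]
print(mean_abs_deviation(event, rep))          # (10 + 10 + 20) / 3
print(within_mean_absolute(event, rep, 15.0))
print(within_mean_ratio(event, rep, 0.1))
```

Averaging is more tolerant than a per-parameter check: the event above passes an absolute threshold of 15 on average even though one of its parameters deviates by 20.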

　閾値の判定は、平均値だけではなく、クラスタに含まれる複数の測定データの中央値を使用しても良い。また、閾値の判定は、学習部３１７で決定される代表値を使用しても良い。 The threshold value may be determined not only by the average value but also by the median value of multiple measurement data included in a cluster. The threshold value may also be determined by using a representative value determined by the learning unit 317.

<2.3. Functional block diagram of information processing device 400>
FIG. 25 is a functional block diagram of the information processing device 400 according to the second embodiment for performing clustering sorting.

　図２５に示すように、情報処理装置４００には、測定装置４１１が接続されている。測定装置４１１は、サンプル（例えば、細胞など）の測定を行い、測定した測定データに必要なデータ（例えば、細胞の蛍光の色、蛍光の強さ等）を付加し、情報処理装置４００に出力する。測定では、少なくとも測定データのイベント（例えば、細胞１など）の測定を行う。 As shown in FIG. 25, a measuring device 411 is connected to the information processing device 400. The measuring device 411 measures a sample (e.g., a cell, etc.), adds necessary data (e.g., the color of the cell's fluorescence, the intensity of the fluorescence, etc.) to the measured measurement data, and outputs the data to the information processing device 400. In the measurement, at least an event of the measurement data (e.g., cell 1, etc.) is measured.

　情報処理装置４００は、取得部４１２、前処理部４１３、クラスリング及びクラスタリング部４１４、クラスタ選択部４１５、表示部４１６、閾値設定部４１７、分取部４１８を有する。 The information processing device 400 has an acquisition unit 412, a preprocessing unit 413, a classification and clustering unit 414, a cluster selection unit 415, a display unit 416, a threshold setting unit 417, and a fractionation unit 418.

　取得部４１２は、情報処理装置４００の外部の測定装置４１１から複数の測定データを取得する。前処理部４１３は、取得部４１２により測定された測定データに対してダウンサンプリングや目的の集団（ｐｏｐｕｌａｔｉｏｎ）の絞り込みなどを行う。 The acquisition unit 412 acquires multiple pieces of measurement data from a measurement device 411 external to the information processing device 400. The preprocessing unit 413 performs downsampling and narrowing down the target population on the measurement data measured by the acquisition unit 412.

　クラスリング及びクラスタリング部４１４は、取得部４１２により取得された複数の測定データをクラスに分類する。また、クラスリング及びクラスタリング部４１４は、取得部４１２により取得された複数の測定データをクラスタに分類する。 The classifying and clustering unit 414 classifies the multiple pieces of measurement data acquired by the acquiring unit 412 into classes. The classifying and clustering unit 414 also classifies the multiple pieces of measurement data acquired by the acquiring unit 412 into clusters.

　クラスタ選択部４１５は、クラスリング及びクラスタリング部４１４により分類されたクラスから分取対象となるクラスタを選択する。表示部４１６は、クラスリングされた測定データの効率等（例えば、測定データ、クラス、閾値、モード、純度、効率、分類された測定データ、分類された測定データのクラスタ）の画面を表示する。閾値設定部４１７は、クラスタ選択部４１５により選択されたクラスタに含まれる複数の測定データの平均であるクラスタの代表値に対する閾値を設定する。 The cluster selection unit 415 selects the cluster to be sorted from the classes classified by the classification and clustering unit 414. The display unit 416 displays a screen showing the efficiency and related information for the classified measurement data (e.g., measurement data, class, threshold, mode, purity, efficiency, classified measurement data, and clusters of the classified measurement data). The threshold setting unit 417 sets a threshold relative to the representative value of the cluster, which is the average of the multiple measurement data included in the cluster selected by the cluster selection unit 415.

　閾値には、クラスタに含まれる複数の測定データの中央値を使用しても良い。また、閾値の判定は、学習部３１７で決定される代表値を使用しても良い。 The threshold value may be the median value of the multiple measurement data included in the cluster. The threshold value may also be determined using a representative value determined by the learning unit 317.

　分取部４１８は、閾値設定部４１７により設定された閾値に基づいて、クラスリング及びクラスタリング部４１４により分類されたクラスタに含まれる測定データのうち、分取の対象とする測定データを分取する。 The fractionation unit 418 fractionates the measurement data to be fractionated from among the measurement data contained in the clusters classified by the classification and clustering unit 414 based on the threshold set by the threshold setting unit 417.

　具体的には、分取部４１８は、クラスリング及びクラスタリング部４１４により分類されたクラスタに含まれる複数の測定データの全ての測定値が代表値±閾値に収まっていれば、クラスリング及びクラスタリング部４１４により分類されたクラスタに含まれるサンプリングデータを分取の対象として分取する。 Specifically, if all the measurement values of the multiple measurement data included in the cluster classified by the classifying and clustering unit 414 are within a representative value ±threshold, the sorting unit 418 sorts the sampling data included in the cluster classified by the classifying and clustering unit 414 as the target for sorting.

　分取部４１８は、クラスリング及びクラスタリング部４１４により分類されたクラスタに含まれる複数の測定データの全ての測定値が代表値±代表値×閾値に収まっていれば、クラスタリング部により分類されたクラスタに含まれるサンプリングデータを分取の対象として分取しても良い。 Alternatively, the fractionation unit 418 may sort the sampling data included in the cluster classified by the classifying and clustering unit 414 as the target for sorting if all the measurement values of the multiple measurement data included in that cluster are within the representative value ± (representative value × threshold).

　＜２．４．ＦｌｏｗＳＯＭの回路＞
　図２６は、第２の実施の形態に係るＦｌｏｗＳＯＭの回路の第１の例を示す図である。ＦｌｏｗＳＯＭは、公知のクラスタリングアルゴリズムである。図２６に示すように、差分器５５１には、イベントデータａ（ｄ次元）と、ｄ次元の代表値が入っているノード（クラスタ）１のデータｂとが入力され、差分（ａ－ｂ）が算出される。 <2.4. FlowSOM circuit>
FIG. 26 is a diagram showing a first example of a FlowSOM circuit according to the second embodiment. FlowSOM is a known clustering algorithm. As shown in FIG. 26, event data a (d-dimensional) and data b of node (cluster) 1, which holds a d-dimensional representative value, are input to a difference calculator 551, and the difference (a−b) is calculated.

　二乗器５５２は、差分器５５１から算出された差分（ａ－ｂ）の二乗（ａ－ｂ）^２を算出し、総和器５５３に出力する。総和器５５３は、二乗器５５２から算出された差分（ａ－ｂ）の二乗（ａ－ｂ）^２の総和Σ（ａ－ｂ）^２を算出して比較器５５４に出力する。 The squarer 552 calculates the square (a−b)² of the difference (a−b) calculated by the difference calculator 551 and outputs it to a summation unit 553. The summation unit 553 calculates the sum Σ(a−b)² of the squares (a−b)² calculated by the squarer 552 and outputs it to a comparator 554.

　比較器５５４は、最小距離保持器５５５に保持された最小距離と、総和器５５３から出力された総和Σ（ａ－ｂ）^２とを比較して、小さいほうの距離を最小距離として最小距離保持器５５５に保持する。 The comparator 554 compares the minimum distance held in the minimum distance holder 555 with the sum Σ(a−b)² output from the summation unit 553, and holds the smaller of the two in the minimum distance holder 555 as the minimum distance.

　具体的には、比較器５５４は、最小距離保持器５５５に保持されるイベントデータａとデータｂとのユークリッド距離が近い総和Σ（ａ－ｂ）^２に入れ替える。すなわち、比較器５５４は、最も誤差の小さいノードを探索するために比較を行う。これにより、最小距離保持器５５５に保持された最小距離のノード（クラスタ）に分類される。 Specifically, when the new sum Σ(a−b)², which is the squared Euclidean distance between event data a and data b, is smaller than the value held in the minimum distance holder 555, the comparator 554 replaces the held value with the new sum. That is, the comparator 554 performs the comparison in order to search for the node with the smallest error. As a result, the event is classified into the node (cluster) whose distance is held in the minimum distance holder 555.

　差分器５５１には、ノード１、ノード２、．．．、ノードＮのデータｂが直列に順に入力されるが、ノード１、ノード２、．．．、ノードＮのデータｂは並列処理されても良い。 The data b from node 1, node 2, ..., node N are input serially to the difference calculator 551, but the data b from node 1, node 2, ..., node N may be processed in parallel.
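A software analogue of the serial datapath in FIG. 26 can be sketched as below. The circuit element numbers appear only in comments; the function name, the 0-based node indices, and the sample data are assumptions made for illustration.

```python
def nearest_node(event, nodes):
    """Return (index, squared distance) of the node closest to the event.

    nodes is a list of d-dimensional representative vectors, fed one at a time.
    """
    best_index, best_dist = None, float("inf")   # minimum distance holder (555)
    for index, rep in enumerate(nodes):          # nodes arrive serially
        # difference (551), square (552), and sum (553) in one expression
        dist = sum((a - b) ** 2 for a, b in zip(event, rep))
        if dist < best_dist:                     # comparator (554)
            best_index, best_dist = index, dist
    return best_index, best_dist


nodes = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]     # hypothetical node representatives
event = [0.9, 1.2]                               # hypothetical event data a
index, dist = nearest_node(event, nodes)
```

The square root is never taken: comparing squared distances selects the same node as comparing Euclidean distances, which is presumably why the circuit stops at Σ(a−b)².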

　図２７は、第２の実施の形態に係るＦｌｏｗＳＯＭの回路の第２の例を示す図である。図２７に示すように、１００個のノード１～ノード１００のデータが並列数１０で入力される。 FIG. 27 is a diagram showing a second example of a circuit of FlowSOM according to the second embodiment. As shown in FIG. 27, data of 100 nodes 1 to 100 is input with a parallel number of 10.

　具体的には、ノード１、ノード２、ノード３、・・・、ノード１０のｄ次元の代表値が入っているデータｂが並列に差分器５５１＿１～差分器５５１＿１０にそれぞれ入力される。また、差分器５５１＿１～差分器５５１＿１０には、イベントデータａ（ｄ次元）が入力される。 Specifically, data b containing d-dimensional representative values of node 1, node 2, node 3, ..., node 10 is input in parallel to subtractors 551_1 to 551_10. Event data a (d-dimension) is also input to subtractors 551_1 to 551_10.

　ノード１、ノード１１、ノード２１、・・・、ノード９１のデータｂは、順に入力される。ノード２、ノード１２、ノード２２、・・・、ノード９２のデータｂは、順に入力され、ノード３、ノード１３、ノード２３、・・・、ノード９３のデータｂは、順に入力され、・・・、ノード１０、ノード２０、ノード３０、・・・、ノード１００のデータｂは、順に入力される。 The data b of node 1, node 11, node 21, ..., node 91 is input in order. The data b of node 2, node 12, node 22, ..., node 92 is input in order, the data b of node 3, node 13, node 23, ..., node 93 is input in order, ..., the data b of node 10, node 20, node 30, ..., node 100 is input in order.

　差分器５５１＿１～差分器５５１＿１０には、イベントデータａ（ｄ次元）と、ｄ次元の代表値が入っているノード（クラスタ）１、ノード２、．．．、ノードＮのデータｂとがそれぞれ入力され、差分（ａ－ｂ）が算出される。 The difference calculators 551_1 to 551_10 each receive event data a (d-dimensional) and data b of nodes (clusters) 1, 2, ..., N, which hold d-dimensional representative values, and calculate the difference (a−b).

　二乗器５５２＿１～二乗器５５２＿１０は、差分器５５１＿１～差分器５５１＿１０から算出された差分（ａ－ｂ）の二乗（ａ－ｂ）^２をそれぞれ算出し、総和器５５３＿１～総和器５５３＿１０にそれぞれ出力する。総和器５５３＿１～総和器５５３＿１０は、二乗器５５２＿１～二乗器５５２＿１０から算出された差分（ａ－ｂ）の二乗（ａ－ｂ）^２の総和Σ（ａ－ｂ）^２をそれぞれ算出して比較器５５４＿１～比較器５５４＿１０にそれぞれ出力する。 The squarers 552_1 to 552_10 calculate the squares (a−b)² of the differences (a−b) calculated by the difference calculators 551_1 to 551_10, respectively, and output them to the summation units 553_1 to 553_10, respectively. The summation units 553_1 to 553_10 calculate the sums Σ(a−b)² of the squares (a−b)² calculated by the squarers 552_1 to 552_10, respectively, and output them to the comparators 554_1 to 554_10, respectively.

　比較器５５４＿１～比較器５５４＿１０は、最小距離保持器５５５＿１～最小距離保持器５５５＿１０に保持された最小距離と、総和器５５３＿１～総和器５５３＿１０から出力された総和Σ（ａ－ｂ）^２とをそれぞれ比較して、小さいほうの距離を最小距離として最小距離保持器５５５＿１～最小距離保持器５５５＿１０にそれぞれ保持する。 The comparators 554_1 to 554_10 compare the minimum distances held in the minimum distance holders 555_1 to 555_10 with the sums Σ(a−b)² output from the summation units 553_1 to 553_10, respectively, and hold the smaller distance as the minimum distance in the minimum distance holders 555_1 to 555_10, respectively.

　これにより、最小距離保持器５５５＿１～最小距離保持器５５５＿１０には、ノード１、ノード１１、ノード２１、・・・、ノード９１のうちの最小距離のノード（クラスタ）、ノード２、ノード１２、ノード２２、・・・、ノード９２のうちの最小距離のノード（クラスタ）、・・・、ノード１０、ノード２０、ノード３０、・・・、ノード１００のうちの最小距離のノード（クラスタ）に分類される。 As a result, the minimum distance holders 555_1 to 555_10 respectively hold the minimum-distance node (cluster) among nodes 1, 11, 21, ..., 91; the minimum-distance node (cluster) among nodes 2, 12, 22, ..., 92; ...; and the minimum-distance node (cluster) among nodes 10, 20, 30, ..., 100.

　比較器５５６は、最小距離保持器５５５＿１～最小距離保持器５５５＿１０に保持された最小距離を比較し、小さいほうの距離を最小距離として最小距離保持器２５７に保持する。これにより、最小距離保持器２５７には、ノード１～ノード１００のうちの最小距離のノード（クラスタ）に分類される。 The comparator 556 compares the minimum distances held in the minimum distance holders 555_1 to 555_10 and holds the smallest of them as the minimum distance in the minimum distance holder 257. As a result, the event is classified into the node (cluster) with the minimum distance among nodes 1 to 100, which is held in the minimum distance holder 257.

　なお、図２７では、ノード数を１００、並列数を１０としているが、回路リソースに応じて柔軟な値をとっても良い。また、図２７では、比較器５５６が１つの場合を示したが、比較器５５６も複数の比較器５５６を使用して並列処理を行っても良い。 In FIG. 27, the number of nodes is 100 and the number of parallel connections is 10, but these values may be flexible depending on the circuit resources. Also, in FIG. 27, the case where one comparator 556 is used is shown, but multiple comparators 556 may be used to perform parallel processing.
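The striped assignment of nodes to lanes in FIG. 27 can be illustrated with the sketch below. It runs the lanes one after another, so it only models the data flow and not the hardware parallelism; the function names, 0-based node indices, and test data are assumptions.

```python
def lane_minimum(event, nodes, lane, lanes):
    """Minimum over the nodes handled by one lane: lane, lane+lanes, lane+2*lanes, ..."""
    best_index, best_dist = None, float("inf")       # per-lane holder (555_k)
    for index in range(lane, len(nodes), lanes):
        dist = sum((a - b) ** 2 for a, b in zip(event, nodes[index]))
        if dist < best_dist:                         # per-lane comparator (554_k)
            best_index, best_dist = index, dist
    return best_index, best_dist


def nearest_node_striped(event, nodes, lanes=10):
    """Reduce the per-lane minima to one overall minimum (final comparator, 556)."""
    partial = [lane_minimum(event, nodes, lane, lanes) for lane in range(lanes)]
    return min(partial, key=lambda pair: pair[1])


# 100 hypothetical two-dimensional node representatives on a diagonal.
nodes = [[float(i), float(i)] for i in range(100)]
event = [41.2, 40.9]
index, dist = nearest_node_striped(event, nodes, lanes=10)
```

With 100 nodes and 10 lanes, each lane evaluates 10 distances and the final stage compares 10 candidates. Changing `lanes` models the trade-off against circuit resources mentioned above.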

　図２８は、第２の実施の形態に係るＦｌｏｗＳＯＭの回路の第３の例を示す図である。図２８において、◇はメタクラスタを示しており、□は最小値が選択されたメタクラスタに紐づいたノードを示している。 FIG. 28 is a diagram showing a third example of a circuit of FlowSOM according to the second embodiment. In FIG. 28, ◇ indicates a metacluster, and □ indicates a node associated with the metacluster for which the minimum value is selected.

　図２８においては、メタクラスタ数を８、最小値が選択されたメタクラスタに紐づくノード数を１０としているが、メタクラスタ数やノード数はこれに限られない。図２８では、メタクラスタに紐づくノード１～ノード１０が直列に計算される場合を示しているが、計算は並列に行われても良い。 In FIG. 28, the number of metaclusters is 8, and the number of nodes linked to the metacluster for which the minimum value was selected is 10, but the number of metaclusters and nodes is not limited to this. In FIG. 28, the case where nodes 1 to 10 linked to the metacluster are calculated in series is shown, but the calculations may be performed in parallel.

　図２８に示した、ＦｌｏｗＳＯＭの回路の第３の例では、メタクラスタ１－８の中で最小距離となるメタクラスタを差分器５７１～最小距離保持器５７５の処理で見つける。その後、最小距離となるメタクラスタに紐づく１０個のノードの中から最小距離となるノードに分類される。 In the third example of the FlowSOM circuit shown in FIG. 28, the metacluster with the minimum distance among metaclusters 1 to 8 is found by the processing of the difference calculator 571 through the minimum distance holder 575. The event is then classified into the node with the minimum distance among the 10 nodes linked to that metacluster.

　図２８において、差分器５７１には、イベントデータａ（ｄ次元）と、ｄ次元の代表値が入っている誤差が最小の選択されたメタクラスタに紐づくノード（クラスタ）のデータｂとが入力され、差分（ａ－ｂ）が算出される。 In FIG. 28, event data a (d dimension) and data b of a node (cluster) linked to a selected meta cluster with the smallest error containing the representative value of the d dimension are input to a subtractor 571, and the difference (a-b) is calculated.

　二乗器５７２は、差分器５７１から算出された差分（ａ－ｂ）の二乗（ａ－ｂ）^２を算出し、総和器５７３に出力する。総和器５７３は、二乗器５７２から算出された差分（ａ－ｂ）の二乗（ａ－ｂ）^２の総和Σ（ａ－ｂ）^２を算出して比較器５７４に出力する。 The squarer 572 calculates the square (a−b)² of the difference (a−b) calculated by the difference calculator 571 and outputs it to a summation unit 573. The summation unit 573 calculates the sum Σ(a−b)² of the squares (a−b)² calculated by the squarer 572 and outputs it to a comparator 574.

　比較器５７４は、最小距離保持器５７５に保持された最小距離と、総和器５７３から出力された総和Σ（ａ－ｂ）^２とを比較して、小さいほうの距離を最小距離として最小距離保持器５７５に保持する。 The comparator 574 compares the minimum distance held in the minimum distance holder 575 with the sum Σ(a−b)² output from the summation unit 573, and holds the smaller of the two in the minimum distance holder 575 as the minimum distance.

　具体的には、比較器５７４は、最小距離保持器５７５に保持されるイベントデータａとデータｂとのユークリッド距離が近い総和Σ（ａ－ｂ）^２に入れ替える。すなわち、比較器５７４は、最も誤差の小さいノードを探索するために比較を行う。これにより、最小距離保持器５７５に保持された最小距離のノード（クラスタ）がクラスタリングされる。 Specifically, when the new sum Σ(a−b)², which is the squared Euclidean distance between event data a and data b, is smaller than the value held in the minimum distance holder 575, the comparator 574 replaces the held value with the new sum. That is, the comparator 574 performs the comparison in order to search for the node with the smallest error. As a result, the event is classified into the node (cluster) whose distance is held in the minimum distance holder 575.

　差分器５７１には、誤差が最小距離のメタクラスタに紐づけられたノード１、ノード２、．．．、ノード１０のデータｂが直列に順に入力されるが、ノード１、ノード２、．．．、ノード１０のデータｂは並列処理されても良い。 The data b from node 1, node 2, ..., node 10 linked to the meta cluster with the smallest error distance is input in series to the difference calculator 571, but the data b from node 1, node 2, ..., node 10 may be processed in parallel.

　＜２．５．動作説明＞
　図２９は、第２の実施の形態に係る情報処理装置４００のクラスタリング分取を説明するためのフローチャートである。 <2.5. Operation Description>
FIG. 29 is a flowchart for explaining the clustering sorting of the information processing device 400 according to the second embodiment.

　まず、測定装置４１１に複数のサンプルの一部が流されて、一部の複数のサンプルが測定される（ステップＳ２１）。次に、測定された一部の複数のサンプルの測定データのダウンサンプリングや目的の集団の絞り込みなどの前処理が行われる（ステップＳ２２）。 First, a portion of the multiple samples is passed through the measuring device 411 and the portion of the multiple samples is measured (step S21). Next, pre-processing such as downsampling of the measurement data of the portion of the multiple samples and narrowing down of the target group is performed (step S22).

　次に、前処理が行われた一部の複数の測定データをクラスに分類するクラスリングが行われる（ステップＳ２３）。クラスリングが行われたクラスタから分取対象となるクラスタが選択される（ステップＳ２４）。 Next, classification is performed to classify the preprocessed part of the measurement data into classes (step S23). A cluster to be collected is selected from the classified clusters (step S24).

　次に、選択されたクラスタに含まれる複数の測定データの平均である代表値に対する閾値が設定される（ステップＳ２５）。なお、選択されたクラスタに含まれる複数の測定データの中央値に対する閾値であっても良い。次に、ユーザは、表示部４１６に表示された効率の値を確認し（ステップＳ２６）、効率が１００％でなければ（ステップＳ２６のＮＧ）、ステップＳ２５の処理に戻り、再度閾値の設定が行われる。なお、効率の値は１００％ではなくユーザ判断で任意の値でも良い。 Next, a threshold is set for a representative value that is the average of multiple measurement data included in the selected cluster (step S25). Note that the threshold may be set for the median value of multiple measurement data included in the selected cluster. Next, the user checks the efficiency value displayed on the display unit 416 (step S26), and if the efficiency is not 100% (NG in step S26), the process returns to step S25, where the threshold is set again. Note that the efficiency value may not be 100% but may be any value determined by the user.

　一方、効率が１００％である場合（ステップＳ２６のＯＫ）、残りのサンプルが流され（ステップＳ２７）、残りの測定データについて、クラスタリングが行われる（ステップＳ２８）。次に、設定された閾値に基づいて、分類されたクラスタに含まれる残りの測定データのうち、分取の対象とする測定データを設定された閾値を利用して分類する（ステップＳ２９）。 On the other hand, if the efficiency is 100% (OK in step S26), the remaining sample is passed through (step S27), and clustering is performed on the remaining measurement data (step S28). Next, from the remaining measurement data contained in the clusters classified based on the set threshold, the measurement data to be collected is classified using the set threshold (step S29).

　ここで、クラスタリング対象のデータの説明変数はスペクトルなど蛍光補正前の生の値を使っても良いし、蛍光補正後のデータであっても良い。また、蛍光補正をする際に逆行列計算を行うが、その際にガウスジョルダン法を用いて解いても良い。また、クラスタリングの前処理としてバッチ効果を抑える目的で正規化などのアルゴリズムを用いても良い。 Here, the explanatory variables for the data to be clustered may be raw values before fluorescence correction, such as spectra, or may be data after fluorescence correction. In addition, an inverse matrix calculation is performed when performing fluorescence correction, and the Gauss-Jordan method may be used to solve this. Furthermore, algorithms such as normalization may be used as preprocessing for clustering in order to suppress batch effects.
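As an illustration of solving the fluorescence-correction system by Gauss-Jordan elimination, the sketch below assumes a simple square mixing model, measured = M × true, where M is a hypothetical spillover matrix. Solving the system this way gives the same result as multiplying by the inverse of M. The matrix and intensities are invented for the example.

```python
def gauss_jordan_solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augmented matrix [M | rhs], copied so the inputs are not modified.
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest pivot magnitude for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row (above and below).
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] for i in range(n)]


spillover = [[1.0, 0.2],    # hypothetical: detector 1 sees 20 % of dye 2
             [0.1, 1.0]]    # hypothetical: detector 2 sees 10 % of dye 1
measured = [110.0, 60.0]    # raw detector values for one event
corrected = gauss_jordan_solve(spillover, measured)
```

For these inputs the corrected intensities come out as approximately 100 and 50, the values that reproduce the measured signals under the assumed matrix.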

　　＜２．６．変形例＞
　第２の実施の形態では、情報処理装置４００が残りの測定データについて分類を行う場合について説明したが、残りの測定データについての分類は、処理に時間を要するため、測定装置４１１側で行っても良い。 <2.6. Modifications>
In the second embodiment, the case where the information processing device 400 classifies the remaining measurement data has been described. However, since classification of the remaining measurement data takes time, it may be performed on the measuring device 411 side.

　図３０は、第２の実施の形態の変形例に係る情報処理システムの機能ブロック図である。なお、図２５と同一部分には同一符号を付して説明する。図３０に示すように、情報処理装置４００に設けられていた分取部４１８が測定装置４１１に設けられても良い。 FIG. 30 is a functional block diagram of an information processing system according to a modified example of the second embodiment. Note that the same parts as those in FIG. 25 are denoted by the same reference numerals. As shown in FIG. 30, the fractionation unit 418 provided in the information processing device 400 may be provided in the measurement device 411.

　図３０に示すように、閾値設定部４１７により設定された閾値及びクラスリング及びクラスタリング部４１４によりクラスタリングされたクラスタが情報処理装置４００から測定装置４１１に出力される。 As shown in FIG. 30, the threshold set by the threshold setting unit 417 and the clusters clustered by the classification and clustering unit 414 are output from the information processing device 400 to the measurement device 411.

　測定装置４１１の分取部４１８は、情報処理装置４００から出力された閾値及びクラスタリングされたクラスタを受信し、受信した閾値を利用してクラスタに含まれる測定データを分取する。 The fractionation unit 418 of the measurement device 411 receives the threshold and the clustered clusters output from the information processing device 400, and fractionates the measurement data contained in the clusters using the received threshold.

　第２の実施の形態の変形例に係る情報処理システムによれば、第２の実施の形態に係る情報処理装置４００と同様に、測定データの分類を適切に行うことができる。 The information processing system according to the modified example of the second embodiment can appropriately classify measurement data, similar to the information processing device 400 according to the second embodiment.

　＜２．７．ＦｌｏｗＳＯＭ分取時のフローチャート＞
　次に、図２６に示したＦｌｏｗＳＯＭの回路の第１の例の動作について説明する。図３１は、第２の実施の形態に係るＦｌｏｗＳＯＭの回路の第１の例の動作を説明するためのフローチャートである。 <2.7. Flowchart for FlowSOM fractionation>
Next, a description will be given of the operation of the first example of the circuit of FlowSOM shown in Fig. 26. Fig. 31 is a flowchart for explaining the operation of the first example of the circuit of FlowSOM according to the second embodiment.

　図３１に示すように、ｉ＝０が設定され（ステップＳ３１）、ｉ＜ｄ（ｄ：次元数）かが判断される（ステップＳ３２）。ステップＳ３２において、ｉ＜ｄである場合（ステップＳ３２のＹｅｓ）、各ノードの代表ベクトルのｉ次元の値と分取対象イベントのｉ次元の値との差分を計算する（ステップＳ３３）。 As shown in FIG. 31, i = 0 is set (step S31), and it is determined whether i < d (d: number of dimensions) (step S32). If i < d in step S32 (Yes in step S32), the difference between the i-th dimension value of the representative vector of each node and the i-th dimension value of the event to be sorted is calculated (step S33).

　次に、ステップＳ３３で計算された各ノードの代表ベクトルのｉ次元の値と分取対象イベントのｉ次元の値との差分の値を二乗し（ステップＳ３４）、二乗した差分の値を積算する（ステップＳ３５）。次に、ｉ＝ｉ＋１として（ステップＳ３６）、ステップＳ３２の処理に戻る。 Next, the difference between the i-dimension value of the representative vector of each node calculated in step S33 and the i-dimension value of the event to be sorted is squared (step S34), and the squared difference value is integrated (step S35). Next, i = i + 1 is set (step S36), and the process returns to step S32.

　ステップＳ３２において、ｉ＜ｄでない場合（ステップＳ３２のＮｏ）、二乗した差分の積算値が最小値のノードを算出し（ステップＳ３７）、処理を終了する。これにより、誤差が最小距離のノード（クラスタ）がクラスタリングされる。 If i<d is not satisfied in step S32 (No in step S32), the node with the smallest integrated value of the squared differences is calculated (step S37), and the process ends. As a result, the node (cluster) with the smallest error distance is clustered.
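The loop of FIG. 31 iterates over dimensions in the outer loop and updates every node in each pass. A sketch with the step numbers mapped to comments follows; the function name, 0-based node indices, and data are hypothetical.

```python
def classify_event(event, node_vectors):
    """Return the index of the node whose accumulated squared difference is smallest."""
    d = len(event)
    accumulated = [0.0] * len(node_vectors)      # one running sum per node
    i = 0                                        # step S31
    while i < d:                                 # step S32
        for n, vec in enumerate(node_vectors):
            diff = vec[i] - event[i]             # step S33: per-dimension difference
            accumulated[n] += diff * diff        # steps S34 and S35: square, accumulate
        i += 1                                   # step S36
    # Step S37: node with the minimum accumulated value.
    return min(range(len(node_vectors)), key=accumulated.__getitem__)


node_vectors = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
event = [0.9, 1.2]
assigned = classify_event(event, node_vectors)
```

Keeping one accumulator per node lets all nodes advance together one dimension at a time.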

　次に、図２８に示したＦｌｏｗＳＯＭの回路の第３の例の動作について説明する。図３２は、第２の実施の形態に係るＦｌｏｗＳＯＭの回路の第３の例の動作を説明するためのフローチャートである。 Next, the operation of the third example of the FlowSOM circuit shown in FIG. 28 will be described. FIG. 32 is a flowchart for explaining the operation of the third example of the FlowSOM circuit according to the second embodiment.

　図３２に示すように、ｉ＝０が設定され（ステップＳ４１）、ｉ＜ｄ（ｄ：次元数）かが判断される（ステップＳ４２）。ステップＳ４２において、ｉ＜ｄである場合（ステップＳ４２のＹｅｓ）、各メタクラスタの代表ベクトルのｉ次元の値と分取対象イベントのｉ次元の値との差分を計算する（ステップＳ４３）。 As shown in FIG. 32, i = 0 is set (step S41), and it is determined whether i < d (d: number of dimensions) (step S42). If i < d in step S42 (Yes in step S42), the difference between the i-th dimension value of the representative vector of each metacluster and the i-th dimension value of the event to be sorted is calculated (step S43).

　次に、ステップＳ４３で計算された各メタクラスタの代表ベクトルのｉ次元の値と分取対象イベントのｉ次元の値との差分の値を二乗し（ステップＳ４４）、二乗した差分の値を積算する（ステップＳ４５）。次に、ｉ＝ｉ＋１として（ステップＳ４６）、ステップＳ４２の処理に戻る。 Next, the difference between the i-dimension value of the representative vector of each metacluster calculated in step S43 and the i-dimension value of the event to be sorted is squared (step S44), and the squared difference value is integrated (step S45). Next, i = i + 1 is set (step S46), and the process returns to step S42.

　ステップＳ４２において、ｉ＜ｄでない場合（ステップＳ４２のＮｏ）、二乗した差分の値が最小値のメタクラスタを算出し（ステップＳ４７）、ｊ＝０と設定する（ステップＳ４８）。 In step S42, if i<d is not satisfied (No in step S42), the metacluster with the smallest squared difference is calculated (step S47), and j is set to 0 (step S48).

　次に、ｊ＜ｄ（ｄ：次元数）かが判断される（ステップＳ４９）。ステップＳ４９において、ｊ＜ｄである場合（ステップＳ４９のＹｅｓ）、二乗した差分の値が最小値のメタクラスタに所属する各ノードの代表ベクトルのｊ次元の値と分取対象イベントのｊ次元の値との差分を計算する（ステップＳ５０）。 Next, it is determined whether j<d (d: number of dimensions) (step S49). If j<d is true in step S49 (Yes in step S49), the difference between the j-dimensional value of the representative vector of each node belonging to the metacluster with the smallest squared difference value and the j-dimensional value of the event to be sorted is calculated (step S50).

　次に、ステップＳ５０で計算されたメタクラスタに所属する各ノードの代表ベクトルのｊ次元の値と分取対象イベントのｊ次元の値との差分の値を二乗し（ステップＳ５１）、二乗した差分の値を積算する（ステップＳ５２）。次に、ｊ＝ｊ＋１として（ステップＳ５３）、ステップＳ４９の処理に戻る。 Next, the difference between the j-dimension value of the representative vector of each node belonging to the meta-cluster calculated in step S50 and the j-dimension value of the event to be sorted is squared (step S51), and the squared difference value is integrated (step S52). Next, j = j + 1 is set (step S53), and the process returns to step S49.

　ステップＳ４９において、ｊ＜ｄでない場合（ステップＳ４９のＮｏ）、二乗した差分の積算値が最小値のノードを算出し（ステップＳ５４）、処理を終了する。これにより、誤差が最小距離のノード（クラスタ）がクラスタリングされる。 If j<d is not satisfied in step S49 (No in step S49), the node with the smallest integrated value of the squared differences is calculated (step S54), and the process ends. As a result, the node (cluster) with the smallest error distance is clustered.
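The two-stage search of FIG. 32 can be sketched as follows. The data layout (a list of metacluster representatives plus a list of member node indices per metacluster) and all names are assumptions, since the text does not specify how the linkage is stored.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_stage_assign(event, meta_reps, members, node_reps):
    """Pick the nearest metacluster, then the nearest node among its members.

    members[m] lists the node indices linked to metacluster m.
    """
    # Stage 1 (steps S41 to S47): nearest metacluster.
    meta = min(range(len(meta_reps)),
               key=lambda m: squared_distance(event, meta_reps[m]))
    # Stage 2 (steps S48 to S54): nearest node within that metacluster only.
    node = min(members[meta],
               key=lambda n: squared_distance(event, node_reps[n]))
    return meta, node


meta_reps = [[0.0, 0.0], [10.0, 10.0]]                 # hypothetical metaclusters
node_reps = [[-1.0, 0.0], [1.0, 0.0], [9.0, 10.0], [11.0, 10.0]]
members = [[0, 1], [2, 3]]
meta, node = two_stage_assign([10.6, 9.8], meta_reps, members, node_reps)
```

With 8 metaclusters and 10 linked nodes, as in FIG. 28, this evaluates 8 + 10 = 18 distances per event rather than one per node. Because only one metacluster is searched, the selected node is not guaranteed to match an exhaustive search over all nodes.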

　第３の例では、まず最もユークリッド距離が近いメタクラスタを選択してから、そのメタクラスタに所属する１つ１つのノードとの距離を計算することで、計算リソースの削減と処理速度の高速化が期待できる。
　＜３．第３の実施の形態＞
　　＜３．１．確信度に基づく分取＞
　第２の実施の形態では、画像を使用しない蛍光強度を主とした一般的なＦＣＭ（Ｆｌｏｗ　Ｃｙｔｏｍｅｔｅｒ）観点で説明した。第３の実施の形態では、ＩＦＣＭ（画像フローサイトメータ：Ｉｍａｇｉｎｇ　Ｆｌｏｗ　Ｃｙｔｏｍｅｔｅｒ）に確信度に基づく分取を適用する場合について説明する。 In the third example, a metacluster with the shortest Euclidean distance is first selected, and then the distance to each node belonging to that metacluster is calculated, which is expected to reduce computing resources and increase processing speed.
<3. Third embodiment>
<3.1. Fractionation based on confidence>
In the second embodiment, the explanation is given from the viewpoint of a general FCM (Flow Cytometer) that does not use images and focuses mainly on fluorescence intensity. In the third embodiment, the explanation is given for a case where certainty-based sorting is applied to an IFCM (Imaging Flow Cytometer).

　ＩＦＣＭでは、通常のＦＣＭと同様に蛍光強度が測定できることに加えて、１つ１つの細胞の画像を撮影することができる。第３の実施の形態では、蛍光強度または画像を入力として次元圧縮やクラスタリング等で分取したい集団を特定した後（目的変数）、蛍光強度または画像を説明変数として学習する。その後、適切な閾値を設定して分取を実行する。 In IFCM, in addition to being able to measure fluorescence intensity like regular FCM, it is also possible to take images of individual cells. In the third embodiment, the fluorescence intensity or image is used as input to identify the group to be separated using dimensionality reduction, clustering, etc. (objective variable), and then the fluorescence intensity or image is used as the explanatory variable for learning. After that, an appropriate threshold is set and separation is performed.

　ここで、蛍光強度は蛍光補正前のデータでも蛍光補正後のデータでもどちらでも良く、画像については、そのままの画像データでも畳み込みなどの前処理を加えてもどちらでも良い。また、閾値の設定には、＜１．３．閾値の設定＞で説明した方法を採用しても良い。 Here, the fluorescence intensity data may be either data before or after fluorescence correction, and the image may be either the raw image data or data that has undergone preprocessing such as convolution. In addition, the method described in <1.3. Setting the threshold> may be used to set the threshold.

　＜３．２．情報処理装置６００の機能ブロック図＞
　図３３は、第３の実施の形態に係る情報処理装置６００のＩＦＣＭ分取を行う機能ブロック図を示す図である。 <3.2. Functional block diagram of information processing device 600>
FIG. 33 is a functional block diagram showing IFCM fractionation of an information processing device 600 according to the third embodiment.

　図３３に示すように、情報処理装置６００には、測定装置６１１が接続されている。測定装置６１１は、サンプルの測定を行い、測定した測定データに必要なデータを付加し、情報処理装置６００に出力する。測定では、少なくとも測定データのイベント（例えば、細胞１など）の測定を行う。 As shown in FIG. 33, a measuring device 611 is connected to the information processing device 600. The measuring device 611 measures the sample, adds necessary data to the measured measurement data, and outputs the data to the information processing device 600. In the measurement, at least an event of the measurement data (e.g., cell 1, etc.) is measured.

　情報処理装置６００は、取得部６１２、前処理部６１３、決定部６１４、次元圧縮／クラスタリング部６１５、集団特定部６１６、分割部６１７、学習部６１８、推定部６１９、表示部６２０、閾値設定部６２１、分取部６２２を有する。 The information processing device 600 has an acquisition unit 612, a preprocessing unit 613, a determination unit 614, a dimensionality reduction/clustering unit 615, a population identification unit 616, a division unit 617, a learning unit 618, an estimation unit 619, a display unit 620, a threshold setting unit 621, and a fractionation unit 622.

　取得部６１２は、情報処理装置６００の外部の測定装置６１１から複数の測定データを取得する。前処理部６１３は、取得部６１２により測定された測定データに対してダウンサンプリングや目的の集団（ｐｏｐｕｌａｔｉｏｎ）の絞り込みなどを行う。 The acquisition unit 612 acquires multiple pieces of measurement data from a measurement device 611 external to the information processing device 600. The pre-processing unit 613 performs downsampling and narrowing down the target population on the measurement data measured by the acquisition unit 612.

　決定部６１４は、取得部６１２により取得された複数の測定データのうち、測定データに含まれる蛍光データ又は画像データを入力とするかを決定する。次元圧縮／クラスタリング部６１５は、決定部により決定された蛍光データ又は画像データを次元圧縮又はクラスタに分類する。 The determination unit 614 determines whether to input the fluorescence data or image data contained in the measurement data acquired by the acquisition unit 612 from among the multiple measurement data. The dimensionality reduction/clustering unit 615 performs dimensionality reduction or classifies the fluorescence data or image data determined by the determination unit into clusters.

　集団特定部６１６は、次元圧縮／クラスタリング部６１５により分類された次元圧縮された蛍光データ又は画像データ、又は分類されたクラスタから分取対象となる集団を特定する。分割部６１７は、集団特定部６１６により特定された蛍光データ又は画像データを学習用の蛍光データ又は画像データと、検証用の前記蛍光データ又は画像データとに分割する。 The population identification unit 616 identifies a population to be separated from the dimensionally compressed fluorescent data or image data classified by the dimensionality compression/clustering unit 615, or from the classified clusters. The division unit 617 divides the fluorescent data or image data identified by the population identification unit 616 into fluorescent data or image data for learning and the fluorescent data or image data for verification.

　学習部６１８は、分割部６１７により分割された学習用の複数の測定データを使用して学習を行い、学習モデルを生成する。推定部６１９は、集団特定部６１６により特定された集団に含まれる測定データのうち、検証用の複数の測定データについて検証用の複数の測定データに対する推定及び推定に対する確信度を推定する。具体的には、推定部６１９は、学習部６１８により生成された学習モデルにより検証用の蛍光データ又は画像データの確信度を推定する。 The learning unit 618 performs learning using the multiple pieces of measurement data for learning split by the splitting unit 617, and generates a learning model. The estimation unit 619 estimates an estimate for multiple pieces of measurement data for verification among the measurement data included in the population identified by the population identification unit 616, and estimates the confidence level of the estimate. Specifically, the estimation unit 619 estimates the confidence level of the fluorescence data or image data for verification using the learning model generated by the learning unit 618.

　表示部６２０は、検証用の測定データの純度、効率の他、必要に応じて検証用の測定データ、閾値、分類（クラス）、モードなどを画面に表示する。 The display unit 620 displays on a screen the purity and efficiency of the measurement data for verification and, as necessary, the measurement data for verification, the threshold, the classification (class), the mode, and the like.

　閾値設定部６２１は、推定部６１９により推定された確信度に対して、取得部６１２により取得された複数の測定データを分類するための閾値を設定する。分取部６２２は、閾値設定部６２１により設定された閾値に基づいて、次元圧縮／クラスタリング部６１５により分類された次元圧縮された蛍光データ又は画像データ、又は分類されたクラスタに含まれる測定データを分取の対象とする測定データとして分取する。 The threshold setting unit 621 sets a threshold for classifying the multiple measurement data acquired by the acquisition unit 612 based on the confidence level estimated by the estimation unit 619. The sorting unit 622 sorts the dimensionally compressed fluorescent data or image data classified by the dimensionality compression/clustering unit 615, or the measurement data included in the classified cluster, as the measurement data to be sorted, based on the threshold set by the threshold setting unit 621.

　分取部６２２は、取得部６１２により取得された複数の測定データのうち、検証用の前記蛍光データ又は前記画像データ及び前記学習用の前記蛍光データ又は前記画像データ以外の残りの蛍光データ又は画像データを分取の対象として分取する。 The sorting unit 622 sorts out the remaining fluorescence data or image data other than the fluorescence data or image data for verification and the fluorescence data or image data for learning from the multiple measurement data acquired by the acquisition unit 612.

　具体的には、分取部６２２は、により分類されたクラスタに含まれる複数の測定データの全ての測定値が代表値±閾値に収まっていれば、により分類されたクラスタに含まれるサンプリングデータを分取の対象として分取する。 Specifically, if all the measurement values of the multiple measurement data included in a classified cluster are within the representative value ± threshold, the sorting unit 622 sorts the sampling data included in that cluster as the target for sorting.

　分取部６２２は、により分類されたクラスタに含まれる複数の測定データの全ての測定値が代表値±代表値×閾値に収まっていれば、クラスタリング部により分類されたクラスタに含まれるサンプリングデータを分取の対象として分取しても良い。 Alternatively, the sorting unit 622 may sort the sampling data included in a classified cluster as the target for sorting if all the measurement values of the multiple measurement data included in that cluster are within the representative value ± (representative value × threshold).

　＜３．３．動作説明＞
　図３４は、第３の実施の形態に係る情報処理装置６００におけるＩＦＣＭ分取を説明するためのフローチャートである。 <3.3. Operation Description>
FIG. 34 is a flowchart for explaining IFCM sorting in information processing device 600 according to the third embodiment.

　まず、測定装置６１１にサンプルの一部が流されて一部の複数のサンプルが測定される（ステップＳ１３１）。次に、測定された一部の複数のサンプルの測定データのダウンサンプリングや目的の集団の絞り込みなどの前処理が行われる（ステップＳ１３２）。 First, a portion of the sample is passed through the measuring device 611 and a portion of the multiple samples is measured (step S131). Next, pre-processing such as downsampling of the measurement data of the portion of the multiple samples that have been measured and narrowing down of the target group is performed (step S132).

　次に、前処理が行われた一部の複数の測定データの蛍光又は画像のどちらを入力とするかが決定される（ステップＳ１３３）。そして、ステップＳ１３３において決定された蛍光又は画像について次元圧縮及びクラスタリングが行われる（ステップＳ１３４）。次に、ステップＳ１３４においてクラスタリングが行われたクラスタのうち、分取対象となる集団が特定される（ステップＳ１３５）。ここで、「集団」は蛍光又は画像について次元圧縮された島であり、この島となっている次元圧縮された蛍光又は画像がゲーティングされる。 Next, it is determined whether the fluorescence or the image of the preprocessed portion of the measurement data is to be used as input (step S133). Then, dimensionality reduction and clustering are performed on the fluorescence or image determined in step S133 (step S134). Next, the population to be sorted is identified from among the clusters obtained in step S134 (step S135). Here, a "population" is an island in the dimensionally reduced fluorescence or image data, and the dimensionally reduced fluorescence or image data forming this island is gated.

Here, the input data for dimensionality reduction and clustering, and the explanatory variables used during learning, may be raw values before fluorescence correction, such as spectra, or may be data after fluorescence correction. When images are used, they may be used as raw data or after preprocessing such as convolution. Fluorescence correction involves an inverse matrix calculation, which may be solved using the Gauss-Jordan method. In addition, an algorithm such as normalization may be applied as preprocessing in order to suppress batch effects.
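As an illustration of the inverse matrix calculation mentioned above, the following sketch solves a small linear system by Gauss-Jordan elimination with partial pivoting. The text does not specify the correction model; the sketch assumes the common form in which the observed channel values equal a spillover matrix multiplied by the true fluorochrome abundances. The function name and the example matrix are hypothetical.

```python
def gauss_jordan_solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augmented matrix [A | b], copied as floats so the inputs stay untouched.
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Partial pivoting: use the row with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row (above and below the pivot).
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


# Hypothetical 2-channel spillover matrix and observed intensities.
spillover = [[1.0, 0.2],
             [0.1, 1.0]]
observed = [110.0, 60.0]
corrected = gauss_jordan_solve(spillover, observed)
print(corrected)  # approximately [100.0, 50.0]
```

Solving the system directly for each event avoids forming the explicit inverse; when many events share one spillover matrix, inverting once and reusing the result may be preferable.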

Next, the multiple measurement data included in the population identified in step S135 are divided into measurement data for learning and measurement data for verification (step S136).

Next, learning is performed on the measurement data for learning, with the fluorescence or images as explanatory variables, to generate a learning model (step S137). The generated learning model is then used to produce, for each piece of measurement data for verification, an estimate of its correct answer and a confidence level for that estimate (step S138).
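Steps S136 to S138 can be sketched as follows. The text does not name a model type, so this sketch substitutes a nearest-centroid classifier whose confidence is a softmax over negative distances; that choice, the function names, and the 25% verification fraction are all assumptions made for illustration.

```python
import math
import random


def split_train_validation(events, labels, validation_fraction=0.25, seed=0):
    """Shuffle indices and split events into disjoint training and verification sets."""
    indices = list(range(len(events)))
    random.Random(seed).shuffle(indices)
    n_val = int(len(indices) * validation_fraction)

    def pick(idx):
        return [events[i] for i in idx], [labels[i] for i in idx]

    return pick(indices[n_val:]), pick(indices[:n_val])


def fit_centroids(events, labels):
    """Stand-in 'learning model': the mean feature vector of each label."""
    sums, counts = {}, {}
    for x, y in zip(events, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_with_confidence(centroids, x):
    """Return (estimated label, confidence) using a softmax over negative distances."""
    scores = {y: -math.dist(x, c) for y, c in centroids.items()}
    top = max(scores.values())
    weights = {y: math.exp(s - top) for y, s in scores.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


# Hypothetical 2-D features, e.g. coordinates after dimensionality reduction.
events = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
labels = ["other", "other", "target", "target"]
model = fit_centroids(events, labels)
print(predict_with_confidence(model, [5.1, 5.0]))
```

Keeping the training and verification sets disjoint means the confidence levels used for threshold setting come from events the model has not seen.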

A threshold for the estimated confidence level is then set (step S139). Next, the user checks the purity and efficiency values and the plot of the measurement data displayed on the display unit 320 (step S140). If the threshold setting is not appropriate (NG in step S140), the process returns to step S139 and the threshold is set again.
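The purity and efficiency values the user reviews in step S140 could be computed from the verification data along the following lines. The definitions used here are assumptions, since this passage does not define them: purity is taken as the fraction of selected events that are true targets, and efficiency as the fraction of true targets that are selected. The function name and the sample numbers are hypothetical.

```python
def purity_and_efficiency(confidences, is_target, threshold):
    """Evaluate a confidence threshold on labeled verification events.

    confidences: model confidence that each event is a sorting target
    is_target:   ground-truth flags for the same events
    Returns (purity, efficiency) under the assumed definitions above.
    """
    selected = [t for c, t in zip(confidences, is_target) if c >= threshold]
    true_selected = sum(selected)
    total_targets = sum(is_target)
    purity = true_selected / len(selected) if selected else 0.0
    efficiency = true_selected / total_targets if total_targets else 0.0
    return purity, efficiency


confidences = [0.95, 0.90, 0.80, 0.60, 0.40, 0.20]  # hypothetical values
is_target = [True, True, False, True, False, False]
for threshold in (0.5, 0.85):
    print(threshold, purity_and_efficiency(confidences, is_target, threshold))
```

Raising the threshold tends to increase purity at the cost of efficiency, which is the trade-off the user weighs before accepting the setting or returning to step S139.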

On the other hand, if the threshold setting is appropriate (OK in step S140), the remainder of the sample is run through the measuring device (step S141), and the measurement data obtained from the remaining sample are classified into clusters (step S142). Next, based on the set threshold, the measurement data to be sorted are selected by confidence level from the remaining measurement data included in the classified clusters and sorted (step S143).
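The confidence-based selection in step S143 can be sketched as below. This is a simplified illustration that omits the cluster classification of step S142: an event is selected only when the model infers it to be a sorting target and its confidence level reaches the threshold. The `classify` callable, the label string "target", and the toy classifier are hypothetical stand-ins for the learning model.

```python
def select_events_to_sort(events, classify, threshold, target_label="target"):
    """Return the indices of events that should be sorted.

    classify: callable returning (label, confidence) for one event;
              a hypothetical stand-in for the trained learning model.
    """
    selected = []
    for index, event in enumerate(events):
        label, confidence = classify(event)
        # Sort only events inferred as targets with sufficient confidence.
        if label == target_label and confidence >= threshold:
            selected.append(index)
    return selected


def toy_classify(event):
    """Toy model: treats the first feature as the probability of being a target."""
    p = event[0]
    return ("target", p) if p > 0.5 else ("other", 1.0 - p)


remaining = [[0.9], [0.6], [0.2], [0.99]]  # hypothetical remaining events
print(select_events_to_sort(remaining, toy_classify, threshold=0.8))
```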

<3.4. Modifications>
The third embodiment describes the case where the information processing device 600 classifies the remaining measurement data. However, because this classification takes processing time, it may instead be performed on the measuring device 611 side.

FIG. 35 is a functional block diagram of an information processing system according to a modification of the third embodiment. Parts identical to those in FIG. 30 are given the same reference numerals. As shown in FIG. 35, the sorting unit 622, which was provided in the information processing device 600, may instead be provided in the measuring device 611.

As shown in FIG. 35, the threshold set by the threshold setting unit 621 and the clusters produced by the dimensionality reduction/clustering unit 615 are output from the information processing device 600 to the measuring device 611.

The sorting unit 622 of the measuring device 611 receives the threshold and the clusters output from the information processing device 600, and uses the received threshold to sort the measurement data included in the clusters.

According to the information processing system of the modification of the third embodiment, IFCM sorting can be performed appropriately.

<4. Hardware Configuration>
FIG. 36 is a hardware configuration diagram showing an example of a computer that realizes the arithmetic units of the information processing devices 20, 300, 400, and 600 and the measuring devices 311, 411, and 611 according to the embodiments.

The computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface 1500, and an input/output interface 1600. The components of the computer 1000 are connected by a bus 1050.

The CPU 1100 operates based on programs stored in the ROM 1300 or the HDD 1400 and controls each component. For example, the CPU 1100 loads the programs stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes the processes corresponding to the various programs.

The ROM 1300 stores boot programs such as the BIOS (Basic Input Output System) executed by the CPU 1100 when the computer 1000 starts up, as well as programs that depend on the hardware of the computer 1000.

The HDD 1400 is a computer-readable recording medium that non-transitorily records programs executed by the CPU 1100 and data used by those programs. Specifically, the HDD 1400 is a recording medium that records the application program according to the present disclosure, which is an example of the program data 1450.

The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet). For example, the CPU 1100 receives data from other devices and transmits data generated by the CPU 1100 to other devices via the communication interface 1500.

The input/output interface 1600 is an interface for connecting an input/output device 1650 to the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. The CPU 1100 also transmits data to an output device such as a display, a speaker, or a printer via the input/output interface 1600. The input/output interface 1600 may also function as a media interface that reads programs and the like recorded on a predetermined recording medium. Examples of such media include optical recording media such as a DVD (Digital Versatile Disc) and a PD (Phase Change Rewritable Disk), magneto-optical recording media such as an MO (Magneto-Optical Disk), tape media, magnetic recording media, and semiconductor memory.

Note that the CPU 1100 reads the program data 1450 from the HDD 1400 and executes it; as another example, it may obtain these programs from other devices via the external network 1550.

Preferred embodiments of the present disclosure have been described above in detail with reference to the attached drawings, but the technical scope of the present disclosure is not limited to these examples. It is clear that a person with ordinary knowledge in the technical field of the present disclosure could conceive of various changes or modifications within the scope of the technical ideas described in the claims, and it is understood that these also naturally fall within the technical scope of the present disclosure.

Furthermore, the effects described in this specification are merely explanatory or exemplary and are not limiting. In other words, the technology according to the present disclosure may achieve other effects that are apparent to a person skilled in the art from the description in this specification, in addition to or in place of the above effects.

The present technology can also be configured as follows.
[1]
A bioparticle analysis system comprising:
an acquisition unit that acquires measurement data measured from biogenic particles contained in a sample;
a compression unit that performs a data compression process on the measurement data acquired by the acquisition unit;
a gate unit that gates the measurement data compressed by the compression unit into training measurement data and verification measurement data and adds a label to the training measurement data;
a learning unit that constructs a learning model using the training measurement data and the label;
an estimation unit that inputs the verification measurement data to the learning model and outputs a confidence level of the verification measurement data; and
a threshold setting unit that sets a threshold for sorting the sample based on the confidence level.
[2]
The bioparticle analysis system according to [1], further comprising a display unit that displays the efficiency and yield of the biogenic particles based on the output confidence level and the threshold.
[3]
The bioparticle analysis system according to [1] or [2], wherein the training measurement data and the verification measurement data included in the measurement data are different from each other.
[4]
The bioparticle analysis system according to any one of [1] to [3], comprising a biogenic particle sorting device having a determination unit that inputs measurement data measured from biogenic particles for sorting into the learning model, infers whether the biogenic particles for sorting are sorting targets, and, when they are inferred to be sorting targets, makes a sorting determination based on the threshold set by the threshold setting unit.
[5]
The bioparticle analysis system according to [4], wherein the biogenic particle sorting device includes a sorting unit that sorts the particles to be sorted based on the sorting determination of the determination unit.
[6]
The bioparticle analysis system according to [4], wherein the biogenic particles for sorting are contained in the sample.
[7]
The bioparticle analysis system according to any one of [1] to [6], wherein a predetermined threshold is set as the threshold.
[8]
The bioparticle analysis system according to [7], wherein the predetermined threshold is determined according to one or more modes.
[9]
The bioparticle analysis system according to any one of [1] to [6], wherein the threshold is set by a user.
[10]
The bioparticle analysis system according to any one of [1] to [9], wherein the data compression process is dimensionality reduction, and a sorting target range is determined after the dimensionality reduction.
[11]
An information processing device comprising:
a compression unit that performs a data compression process on measurement data measured from biogenic particles contained in a sample;
a gate unit that gates the measurement data compressed by the compression unit into training measurement data and verification measurement data and adds a label to the training measurement data;
a learning unit that uses the training measurement data and the label to construct a learning model for determining whether the biogenic particles are sorting targets;
an inference unit that inputs the verification measurement data into the learning model constructed by the learning unit and infers whether the biogenic particles are sorting targets;
a confidence level calculation unit that calculates a confidence level of the verification measurement data used in the inference; and
a threshold setting unit that sets a threshold for sorting the sample based on the confidence level calculated by the confidence level calculation unit.
[12]
The information processing device according to [11], which outputs the constructed learning model to a microparticle sorting device.
[13]
An information processing method comprising:
a compression step of performing a data compression process on measurement data measured from biogenic particles contained in a sample;
a gating step of gating the data-compressed measurement data into training measurement data and verification measurement data, and adding a label to the training measurement data;
a learning step of constructing, using the training measurement data and the label, a learning model for determining whether the biogenic particles are sorting targets;
an inference step of inputting the verification measurement data into the learning model constructed in the learning step and inferring whether the biogenic particles are sorting targets;
a confidence level calculation step of calculating a confidence level of the verification measurement data used in the inference; and
a threshold setting step of setting a threshold for sorting the sample based on the confidence level calculated in the confidence level calculation step.
[14]
The information processing method according to [13], further comprising a measurement step in which the measurement data is measured using a microparticle analysis device.
[15]
The information processing method according to [14], further comprising a step of inputting optical information measured from biogenic particles for sorting in the microparticle analysis device into the learning model constructed in the learning step, inferring whether the biogenic particles for sorting are sorting targets, and, when they are inferred to be sorting targets, making a sorting determination based on the threshold set in the threshold setting step.
[16]
The information processing method according to [15], further comprising a step of sorting the particles to be sorted based on the sorting determination.
[17]
An information processing device comprising:
an acquisition unit that acquires a plurality of pieces of measurement data including optical information measured from biogenic particles contained in a sample;
a clustering unit that classifies the plurality of pieces of measurement data acquired by the acquisition unit into a plurality of clusters;
a cluster selection unit that selects a cluster to be sorted from the clusters classified by the clustering unit; and
a threshold setting unit that sets a threshold based on the plurality of pieces of measurement data included in the cluster selected by the cluster selection unit.
[18]
The information processing device according to [17], wherein the clustering unit classifies the plurality of pieces of measurement data acquired by the acquisition unit into clusters, the information processing device further comprising a sorting unit that sorts, based on the threshold set by the threshold setting unit, the measurement data to be sorted from among the measurement data included in the clusters classified by the clustering unit.
[19]
The information processing device according to [17] or [18], wherein the threshold setting unit sets a threshold for a representative value of the cluster, or a threshold for the median of the plurality of pieces of measurement data included in the cluster selected by the cluster selection unit.

300, 400, 600 Information processing device
311, 411, 611 Measuring device
312, 412, 612 Acquisition unit
313, 413, 613 Preprocessing unit
314 Dimensionality reduction unit
315 Gate unit
316 Division unit
317, 618 Learning unit
318, 619 Estimation unit
319, 417, 621 Threshold setting unit
320, 416, 620 Display unit
321, 418, 622 Sorting unit
414 Classifying and clustering unit
415 Cluster selection unit
614 Determination unit
615 Dimensionality reduction/clustering unit
616 Population identification unit

Claims

1. A bioparticle analysis system comprising:
an acquisition unit that acquires measurement data measured from biogenic particles contained in a sample;
a compression unit that performs a data compression process on the measurement data acquired by the acquisition unit;
a gate unit that gates the measurement data compressed by the compression unit into training measurement data and verification measurement data and adds a label to the training measurement data;
a learning unit that constructs a learning model using the training measurement data and the label;
an estimation unit that inputs the verification measurement data to the learning model and outputs a confidence level of the verification measurement data; and
a threshold setting unit that sets a threshold for sorting the sample based on the confidence level.

2. The bioparticle analysis system according to claim 1, further comprising a display unit that displays the efficiency and yield of the biogenic particles based on the output confidence level and the threshold.

3. The bioparticle analysis system according to claim 1, wherein the training measurement data and the verification measurement data included in the measurement data are different from each other.

4. The bioparticle analysis system according to claim 1, comprising a biogenic particle sorting device having a determination unit that inputs measurement data measured from biogenic particles for sorting into the learning model, infers whether the biogenic particles for sorting are sorting targets, and, when they are inferred to be sorting targets, makes a sorting determination based on the threshold set by the threshold setting unit.

5. The bioparticle analysis system according to claim 4, wherein the biogenic particle sorting device includes a sorting unit that sorts the particles to be sorted based on the sorting determination of the determination unit.

6. The bioparticle analysis system according to claim 4, wherein the biogenic particles for sorting are contained in the sample.

7. The bioparticle analysis system according to claim 1, wherein a predetermined threshold is set as the threshold.

8. The bioparticle analysis system according to claim 7, wherein the predetermined threshold is determined according to one or more modes.

9. The bioparticle analysis system according to claim 1, wherein the threshold is set by a user.

10. The bioparticle analysis system according to claim 1, wherein the data compression process is dimensionality reduction, and a sorting target range is determined after the dimensionality reduction.

11. An information processing device comprising:
a compression unit that performs a data compression process on measurement data measured from biogenic particles contained in a sample;
a gate unit that gates the measurement data compressed by the compression unit into training measurement data and verification measurement data and adds a label to the training measurement data;
a learning unit that uses the training measurement data and the label to construct a learning model for determining whether the biogenic particles are sorting targets;
an inference unit that inputs the verification measurement data into the learning model constructed by the learning unit and infers whether the biogenic particles are sorting targets;
a confidence level calculation unit that calculates a confidence level of the verification measurement data used in the inference; and
a threshold setting unit that sets a threshold for sorting the sample based on the confidence level calculated by the confidence level calculation unit.

12. The information processing device according to claim 11, which outputs the constructed learning model to a microparticle sorting device.

13. An information processing method comprising:
a compression step of performing a data compression process on measurement data measured from biogenic particles contained in a sample;
a gating step of gating the data-compressed measurement data into training measurement data and verification measurement data, and adding a label to the training measurement data;
a learning step of constructing, using the training measurement data and the label, a learning model for determining whether the biogenic particles are sorting targets;
an inference step of inputting the verification measurement data into the learning model constructed in the learning step and inferring whether the biogenic particles are sorting targets;
a confidence level calculation step of calculating a confidence level of the verification measurement data used in the inference; and
a threshold setting step of setting a threshold for sorting the sample based on the confidence level calculated in the confidence level calculation step.

14. The information processing method according to claim 13, further comprising a measurement step in which the measurement data is measured using a microparticle analysis device.

15. The information processing method according to claim 14, further comprising a step of inputting optical information measured from biogenic particles for sorting in the microparticle analysis device into the learning model constructed in the learning step, inferring whether the biogenic particles for sorting are sorting targets, and, when they are inferred to be sorting targets, making a sorting determination based on the threshold set in the threshold setting step.

16. The information processing method according to claim 15, further comprising a step of sorting the particles to be sorted based on the sorting determination.

17. An information processing device comprising:
an acquisition unit that acquires a plurality of pieces of measurement data including optical information measured from biogenic particles contained in a sample;
a clustering unit that classifies the plurality of pieces of measurement data acquired by the acquisition unit into a plurality of clusters;
a cluster selection unit that selects a cluster to be sorted from the clusters classified by the clustering unit; and
a threshold setting unit that sets a threshold based on the plurality of pieces of measurement data included in the cluster selected by the cluster selection unit.

18. The information processing device according to claim 17, wherein the clustering unit classifies the plurality of pieces of measurement data acquired by the acquisition unit into clusters, the information processing device further comprising a sorting unit that sorts, based on the threshold set by the threshold setting unit, the measurement data to be sorted from among the measurement data included in the clusters classified by the clustering unit.

19. The information processing device according to claim 17, wherein the threshold setting unit sets a threshold for a representative value of the cluster, or a threshold for the median of the plurality of pieces of measurement data included in the cluster selected by the cluster selection unit.