WO2003070938A1

WO2003070938A1 - Gene expression data analyzer, and method, program and recording medium for gene expression data analysis

Info

Publication number: WO2003070938A1
Application number: PCT/JP2003/001900
Authority: WO
Inventors: Nobukazu Ono; Yoshiyuki Takahara; Quingwei Zhang; Hiroshi Tanaka
Original assignee: Ajinomoto Co Inc
Current assignee: Ajinomoto Co Inc
Priority date: 2002-02-21
Filing date: 2003-02-21
Publication date: 2003-08-28
Anticipated expiration: 2004-08-21
Also published as: AU2003211240A1; JPWO2003070938A1; JP4438414B2

Abstract

Gene expression levels, expressed as fluorescence intensity data measured in experiments using DNA microarrays or DNA chips for a comparison group and a control group, are corrected based on a novel mathematical model. From the corrected scatter plot, a new X-Y axis system is constructed whose X axis is proportional to the fluorescence intensity of the genes. Windows each containing a fixed number of genes are then set along the X axis, and a confidence limit at an arbitrary significance level is determined in each window according to Student’s t-distribution. The windows are shifted by a fixed number of genes along the X axis, and a confidence limit is determined at each position. The resulting confidence limits are interpolated by smoothing (a spline curve) to give a confidence curve for expression variation. Genes lying outside this confidence curve are extracted as genes whose expression has changed.

Description

Gene expression information analysis apparatus, gene expression information analysis method, program, and recording medium

Technical field

The present invention relates to a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium, and in particular to ones that perform background correction of measurement data from DNA microarrays, DNA chips, and the like, and that can extract genes whose expression level has changed with statistically high reliability.

Background art

In molecular biology research, drug research and development, clinical diagnosis, and similar fields, it is very important to search for genes whose messenger RNA expression level has changed and to identify those genes. As methods for examining expression changes at the RNA level, two technologies are currently attracting attention and being used: DNA microarrays, in which cDNA fragments reverse-transcribed from messenger RNA with reverse transcriptase are immobilized at high density on a glass slide, and the DNA chip (trade name) of Affymetrix (company name), in which many kinds of oligonucleotides are synthesized on a substrate using microfabrication technology.

Expression analysis using these DNA microarrays and DNA chips is effective for comprehensively identifying, in a single experiment, genes whose expression level has changed among hundreds to tens of thousands of genes. At present, measured values are generally corrected in two main steps, called the background correction step and the normalization step.

In the background correction step, the method mainly used is simply to subtract, from the measured fluorescence intensity of each spot, either the average background value of blank spots or the background value of the region surrounding that spot.

In the normalization step, on the other hand, the measured values of all genes are corrected by the coefficients that transform a nonparametric regression line, obtained by the least-squares method, Lowess smoothing (a local quadratic estimator using a bandwidth corresponding to the neighborhood), or the like, onto the Y = X line of the fluorescence intensity scatter plot.

However, expression analysis using DNA microarrays and DNA chips has had the problem that no highly reliable method for analyzing the measured values has been established. This problem is described concretely below.

First, conventional correction methods are easily affected by differences in measuring instruments, sample-to-sample error, fluorescent labeling efficiency, and the like. In the normalization step, the least-squares method strictly speaking yields two regression lines, while Lowess smoothing (Dudoit S, Yang YH, Callow MJ, Speed TP (2000) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical report, Department of Statistics, UC-Berkeley. http://www.stat.berkeley.edu/users/terry/zarray/Html/papersindex.html, etc.) is a normalization based on rules of thumb and has no theoretical basis.

Furthermore, in conventional methods for extracting genes whose expression level has changed, genes showing a corrected fluorescence intensity ratio above an arbitrary fold change are extracted as differentially expressed, and that reference fold change has been set to 2-fold, 3-fold, and so on without any basis (Chen Y, Dougherty ER, Bittner ML (1997) Ratio-based decisions and the quantitative analysis of cDNA microarray images. J Biomed Opt 2: 364-374; Susan G. Hilsenbeck, etc. (1999) Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. Journal of the National Cancer Institute, Vol. 91, No. 5, etc.).

On the other hand, several methods have been developed that detect genes by optimization, assuming an error model or a probability distribution for gene expression (Chen Y, Dougherty ER, Bittner ML (1997) Ratio-based decisions and the quantitative analysis of cDNA microarray images. J Biomed Opt 2: 364-374; Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW (2001) On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. J Comp Biol 8: 37-52, etc.), but these methods lack stability and reproducibility and have not necessarily reached a practical level. Moreover, no statistical table exists to guide experimental design by indicating how many times an experiment should be repeated to obtain the desired detection reliability, so the relationship among the number of replicates, detection sensitivity, and detection reliability has not been clarified.

Accordingly, an object of the present invention is to provide, for expression analysis using DNA microarrays and DNA chips, a general formula for reliably comparing gene expression levels, and a robust, highly reliable method for extracting genes with changed expression that is fitted to the actual data distribution.

Disclosure of the invention

A gene expression information analysis apparatus according to the present invention comprises: background correction means for creating background-corrected luminance data by removing a background value from the measured luminance data of each spot, the spots having been measured for fluorescence intensity indicating the expression level of the same gene under two conditions; bias correction means for creating a fluorescence intensity scatter plot with the logarithms of the background-corrected luminance data on the X and Y axes, determining for each gene spot the bias relative to a fluorescence intensity equilibrium axis, and removing that bias from the luminance data so as to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and an expression fold-change axis; and gene detection means for detecting variable genes, that is, genes whose expression level has changed, based on the fluorescence intensity scatter plot in the new X-Y axis system constructed by the bias correction means.

According to this apparatus, background-corrected luminance data are created by removing a background value from the measured luminance data of each spot whose fluorescence intensity, indicating the expression level of the same gene, was measured under two conditions with a DNA microarray, DNA chip, or the like. Here, the average of the measured fluorescence intensities of blank spots may be used as the background value, or the average of the measured blank fluorescence intensities in the region surrounding each spot may be used. Background correction may also be performed by any other method.
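The background correction and log transform described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the use of base-2 logarithms, and the `floor` parameter (which keeps corrected values positive so that the logarithm is defined) are assumptions introduced here.

```python
import math

def background_correct(spots, blanks, floor=1.0):
    """Subtract the mean blank-spot intensity from every spot intensity.

    `floor` is an assumption of this sketch: corrected values are clipped
    to it so that a logarithm can be taken afterwards.
    """
    background = sum(blanks) / len(blanks)
    return [max(s - background, floor) for s in spots]

def log2_pairs(cond_a, cond_b):
    """Pair the two conditions gene by gene as (log2 A, log2 B) scatter-plot points."""
    return [(math.log2(a), math.log2(b)) for a, b in zip(cond_a, cond_b)]

corrected = background_correct([110.0, 250.0, 95.0], [100.0, 90.0, 110.0])
points = log2_pairs([8.0, 16.0], [2.0, 16.0])
```

A per-spot local background, as the text also allows, would replace the single `background` value with one value per spot.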

Also according to this apparatus, a fluorescence intensity scatter plot is created with the logarithms (natural logarithm, base-2 logarithm, or the like) of the background-corrected luminance data on the X and Y axes; the bias of each gene spot relative to the fluorescence intensity equilibrium axis, along which the fluorescence intensities are equal (that is, the asymptote obtained from the population of genes whose expression levels are equivalent under the two conditions), is determined; and that bias is removed from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis. It therefore becomes possible to determine which fluorescence component contains more bias, remove that bias, and then construct a new orthogonal axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis.

Also according to this apparatus, variable genes are detected based on the fluorescence intensity scatter plot in the constructed new X-Y axis system, so that, compared with conventional gene detection methods, genes whose expression level has changed can be detected accurately without being affected by differences in measuring instruments, sample-to-sample error, fluorescent labeling efficiency, and the like.

A gene expression information analysis apparatus according to the next invention is the apparatus described above, wherein the bias correction means further comprises: first principal component creation means for performing principal component analysis using the logarithmic values of the highly expressed gene population and determining the slope and intercept of the asymptote that is the first principal component; coordinate rotation means for taking θ as the angle between the asymptote determined by the first principal component creation means and the X axis, and calculating the coordinates obtained by rotating the coordinates of the weakly expressed gene population in the X-Y axis system to the right (clockwise) by the angle θ; bias determination means for calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the weakly expressed gene population after the rotation by the coordinate rotation means, and determining from the calculated slope which of the luminance data of the two conditions contains more of the bias; and corrected plot generation means for constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis, by subtracting the bias from the luminance data of the condition determined by the bias determination means to contain more of the bias.

This shows one example of the bias correction means more concretely. According to this apparatus, control gene samples for quality control in a DNA concentration dilution series (for example, an external gene such as a λ DNA sample, or housekeeping gene samples such as ribosomal genes whose expression level hardly changes) are measured together with the target gene samples. Control genes are removed one at a time, starting from the one with the smallest product of the fluorescence intensity data; each time, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples, and the correlation coefficient of the data is calculated.

The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion for strong correlation (for example, 0.8 or more) is taken as threshold 1, and the population of all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the highly expressed gene population. The corresponding product at which the successively calculated correlation coefficient first satisfies a criterion for weak correlation (for example, 0.5 or more) is taken as threshold 2 (where threshold 2 < threshold 1), and the population of all gene samples whose product of fluorescence intensity data under the two conditions falls below threshold 2 is taken as the weakly expressed gene population.

Principal component analysis is performed using the logarithmic fluorescence intensities of the highly expressed gene population, and the slope and intercept of the asymptote that is the first principal component are determined. Taking θ as the angle between this asymptote and the X axis, the coordinates obtained by rotating the coordinates of the weakly expressed gene population in the X-Y axis system to the right (clockwise) by the angle θ are calculated. Using the coordinates of the weakly expressed gene population after the rotation, the slope of the fluorescence intensity equilibrium axis is calculated, and from the calculated slope (for example, positive, negative, or zero) it is determined which of the luminance data of the two conditions contains more bias. The bias is then subtracted from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates for a gene population having a constant bias), thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis. As a result, the bias in the measured values can be removed efficiently, and a fluorescence intensity scatter plot that clearly expresses the nature of the data can be created.
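The principal component and rotation steps can be sketched for the two-dimensional case as follows. For a 2x2 variance-covariance matrix the direction of the first principal component has a closed form, so no linear algebra library is needed. The function names are hypothetical, and shifting by the intercept before rotating (so that the asymptote maps exactly onto the new X axis) is an assumption of this sketch rather than something the text states.

```python
import math

def first_principal_axis(points):
    """Angle, slope and intercept of the first principal component of 2-D points.

    Uses the variance-covariance matrix: tan(2*theta) = 2*sxy / (sxx - syy).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(theta)
    return theta, slope, my - slope * mx

def rotate_clockwise(points, theta, intercept=0.0):
    """Rotate points clockwise by theta after removing the intercept from y."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + (y - intercept) * s, -x * s + (y - intercept) * c)
            for x, y in points]

# Highly expressed genes lying on y = x + 1 define the asymptote.
high = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (4.0, 5.0)]
theta, slope, intercept = first_principal_axis(high)
# A weakly expressed gene sitting 0.5 above that asymptote.
low_rotated = rotate_clockwise([(1.0, 2.5)], theta, intercept)
```

After the rotation, the first coordinate runs along the asymptote and the second measures the deviation from it; a trend in the second coordinate of the weakly expressed genes is what the bias determination step inspects.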

Note that the apparatus is not limited to determining the magnitude of the bias after the axis rotation; for example, the magnitude of the bias may also be determined before the axis rotation by comparing the slopes of the high-expression asymptote and the low-expression asymptote.

A gene expression information analysis apparatus according to the next invention is the apparatus described above, wherein the principal component analysis is performed using a variance-covariance matrix.

This shows one example of the principal component analysis more concretely. According to this apparatus, the principal component analysis is performed using a variance-covariance matrix, so that, unlike principal component analysis using the correlation matrix conventionally employed in expression analysis, no normalization is required and the principal component analysis can be performed efficiently.

A gene expression information analysis apparatus according to the next invention is the apparatus described above, wherein the gene detection means further comprises: window setting means for setting a window within a predetermined interval along the fluorescence intensity equilibrium axis; confidence limit point determination means for determining a confidence limit point within each window set by the window setting means; window moving means for moving the window by a fixed number of genes along the fluorescence intensity equilibrium axis; confidence boundary line creation means for obtaining a confidence limit point by the confidence limit point determination means for each window moved by the window moving means, and creating a confidence boundary line based on the plurality of confidence limit points obtained; and variable gene extraction means for extracting genes located outside the confidence boundary line created by the confidence boundary line creation means as variable genes whose expression level has changed.

This shows one example of the gene detection means more concretely. According to this apparatus, a window within a predetermined interval is set, and within each window a confidence limit point is determined using at least one of the mean, the standard deviation, a p-value (for example, p = 0.05), the centroid, and the like of the gene luminance data. The window is then moved by a fixed number of genes along the fluorescence intensity equilibrium axis, a confidence limit point is obtained for each moved window, a confidence boundary line is created based on the plurality of confidence limit points obtained, and genes located outside the confidence boundary line are extracted as variable genes whose expression level has changed. Expressed genes can therefore be extracted with high stability, reproducibility, and reliability.
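The sliding-window step might look like the sketch below. Several details are assumptions made for illustration: the limits are taken as mean ± t·s of the fold-change coordinate inside each window, each window is represented by the mean of its X values, and the t critical value is approximated with a Cornish-Fisher expansion so that only the standard library is required. None of the names come from the source.

```python
from statistics import NormalDist, mean, stdev

def t_critical(p, dof):
    """Approximate two-sided Student's t critical value (Cornish-Fisher expansion)."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    g1 = (z ** 3 + z) / 4.0
    g2 = (5.0 * z ** 5 + 16.0 * z ** 3 + 3.0 * z) / 96.0
    return z + g1 / dof + g2 / dof ** 2

def window_limits(points, window_size, step, p=0.05):
    """Return (x_center, lower, upper) for windows of `window_size` genes.

    Points are (x, y) pairs in the rotated axis system; windows advance by
    `step` genes along x.
    """
    pts = sorted(points)
    limits = []
    for start in range(0, len(pts) - window_size + 1, step):
        window = pts[start:start + window_size]
        ys = [y for _, y in window]
        half = t_critical(p, window_size - 1) * stdev(ys)
        limits.append((mean(x for x, _ in window), mean(ys) - half, mean(ys) + half))
    return limits

demo = [(float(i), 0.1 if i % 2 == 0 else -0.1) for i in range(20)]
limits = window_limits(demo, window_size=10, step=5)
```

With 20 genes, a window of 10 and a step of 5, three windows are produced; genes whose Y value falls outside the band of the window they belong to would be the candidates for extraction.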

This also makes it possible, even for experimental data with different error ranges, to set the threshold for the expression fold change according to that error.

A gene expression information analysis apparatus according to the next invention is the apparatus described above, wherein the confidence limit point determination means determines the confidence limit point using the t-distribution, based on a test statistics table for replicated data obtained by simulation.

This shows one example of determining the confidence limit point more concretely. According to this apparatus, the confidence limit point is determined using the t-distribution based on a test statistics table for replicated data obtained by simulation, so that the confidence limit point can be obtained more accurately and efficiently than with conventional methods. In addition, this test table for replicated data makes it possible to determine, at the experimental design stage, the number of replicate experiments required.

A gene expression information analysis apparatus according to the next invention is the apparatus described above, wherein the confidence boundary line creation means creates the confidence boundary line by smoothing, namely by creating a spline curve based on the plurality of confidence limit points.

This shows one example of creating the confidence boundary line more concretely. According to this apparatus, the confidence boundary line is created by smoothing with a spline curve based on the plurality of confidence limit points, so that the confidence limit points can be interpolated efficiently to create a confidence curve.
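The text does not say which kind of spline is used. As one possibility, the sketch below interpolates the confidence limit points with a natural cubic spline (second derivative zero at both ends), solved with the standard tridiagonal elimination. The choice of a natural, interpolating spline and the function name are assumptions of this example.

```python
from bisect import bisect_right

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing and contain at least two knots.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots
    if n >= 2:
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1]
                      - (ys[k + 1] - ys[k]) / h[k]) for k in range(n - 1)]
        for k in range(1, n - 1):  # forward elimination
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):  # back substitution
            sol[k] = (rhs[k] - h[k + 1] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def evaluate(x):
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate

upper = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
line = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Here `xs` would be the window centers and `ys` the upper (or lower) confidence limit points. A smoothing spline that does not pass exactly through every point would be an equally plausible reading of the text.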

A gene expression information analysis apparatus according to the next invention is the apparatus described above, wherein, for the region of high fluorescence intensity, the confidence boundary line creation means creates the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.

This shows one example of creating the confidence boundary line more concretely. According to this apparatus, for the region of high fluorescence intensity, the confidence limit line is created using a horizontal extension, parallel to the X axis, of the confidence limit point obtained in the last window (the rightmost window), so that an appropriate confidence limit line can be created even when the slope is small and it cannot be judged to which side the line converges.

A gene expression information analysis apparatus according to the next invention is the apparatus described above, wherein, for the region of low fluorescence intensity, the confidence boundary line creation means uses as the confidence limit line the extrapolation of an asymptote obtained by the least-squares method from the confidence limit points determined in the windows.

This shows one example of creating the confidence boundary line more concretely. According to this apparatus, for the region of low fluorescence intensity, the extrapolation of the asymptote obtained by the least-squares method from the confidence limit points determined in, for example, roughly the first few tens of windows is used as the confidence limit line, so that spots of genes with low fluorescence intensity can also be detected accurately.
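The two extension rules, a horizontal line beyond the last window and a least-squares extrapolation below the first window, can be sketched together. In this illustration the in-range part is filled by plain linear interpolation as a stand-in for the spline, and `n_low` (how many of the first windows feed the fit) is a hypothetical parameter; the text only says "roughly the first few tens".

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def boundary_at(limit_points, x, n_low=3):
    """Confidence limit at x, given (x, limit) points sorted by x."""
    xs = [p[0] for p in limit_points]
    ls = [p[1] for p in limit_points]
    if x >= xs[-1]:
        # High-intensity region: horizontal extension of the last window's limit.
        return ls[-1]
    if x <= xs[0]:
        # Low-intensity region: extrapolate a line fitted to the first windows.
        slope, intercept = fit_line(xs[:n_low], ls[:n_low])
        return slope * x + intercept
    # In range: linear interpolation (stand-in for the spline in this sketch).
    for (x0, l0), (x1, l1) in zip(limit_points, limit_points[1:]):
        if x0 <= x <= x1:
            return l0 + (l1 - l0) * (x - x0) / (x1 - x0)

pts = [(1.0, 3.0), (2.0, 2.5), (3.0, 2.0), (4.0, 1.8), (5.0, 1.7)]
high_value = boundary_at(pts, 10.0)
low_value = boundary_at(pts, 0.0)
mid_value = boundary_at(pts, 2.5)
```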

A gene expression information analysis apparatus according to the next invention is the apparatus described above, further comprising gene number input means for having the user input the number of genes in a window, wherein the window setting means sets the window within the interval that contains the number of genes input through the gene number input means.

This shows one example of setting the window more concretely. According to this apparatus, the user inputs the number of genes in a window and the window is set within the interval containing that number of genes, so that the number of genes can be varied by the user from experiment to experiment.

A gene expression information analysis apparatus according to the next invention is the apparatus described above, further comprising confidence limit value input means for having the user input a confidence limit value, wherein the confidence limit point determination means determines the confidence limit point within the window based on the confidence limit value input through the confidence limit value input means.

This shows one example of determining the confidence limit point more concretely. According to this apparatus, the user inputs a confidence limit value and the confidence limit point is determined within the window based on that value, so that the confidence limit value can be varied by the user from experiment to experiment and the error of each experiment can be kept within an appropriate range.

A gene expression information analysis apparatus according to the next invention is the apparatus described above, further comprising: simulation condition setting means for having the user input simulation conditions including information on at least one of the shape of the distribution of the unchanged genes, the shape of the distribution of the variable genes, the detection criterion for the variable genes, the number of experimental replicates, and the number of simulations; simulation execution means for, in accordance with the simulation conditions set by the simulation condition setting means, repeatedly generating data for the same gene group from the same distributions, executing the gene detection means, running the simulation for detecting the expressed genes a plurality of times, calculating the false positive rate and false negative rate of the results of the detection means, calculating the relationship among the number of experimental replicates, the simulation conditions, and detection sensitivity and detection reliability, and creating a test statistics table for genes whose expression level changes; and simulation result output means for outputting, for each simulation condition, the simulation results of the simulation execution means.

According to this apparatus, the user inputs simulation conditions that include information on at least one of the following: the shape of the distribution of the non-fluctuating genes (for example, its standard deviation; when the distribution of genes whose expression does not change is taken to be the standard normal distribution with standard deviation σ = 1 and center μ = 0, the width of the standard deviation σ is set in the range 0.1 to 1.5); the shape of the distribution of the fluctuating genes (for example, its center; under the same conditions, the width of the center μ is set in the range 0.4 to 3); the detection criterion for the fluctuating genes (for example, the ratio of detections to the total number, set to 2/3, 2/4, 3/4, 3/6, 4/6, and so on); the number of experimental replicates; and the number of simulations (for example, set in the range 3 to 10). In accordance with the conditions so set, data for the same gene group are repeatedly generated from the same distribution, gene detection is executed, and the simulation that detects expressed genes is run a plurality of times. The false positive rate and false negative rate of the detection results are calculated, the relationship among the number of experimental replicates, the simulation conditions, and the detection sensitivity and detection reliability is calculated, and a test statistics table for genes whose expression level changes is created. The simulation results are output for each simulation condition, so that by combining results obtained under various conditions the detection power and detection reliability of each combination can be known. That is, by repeating a control experiment under the same conditions, detecting fluctuating genes in each of the resulting data sets, and selecting only those genes that are detected at least a predetermined number of times, fluctuating genes can be detected with the expected reliability or detection power.
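The simulation described above can be sketched in a few lines. This is a minimal illustration, not the apparatus itself: it assumes that unchanged genes are drawn from N(0, sigma) and fluctuating genes from N(mu, sigma) with the same spread, that a gene is flagged in one replicate when its absolute value exceeds a fixed `cutoff`, and that it is finally called when flagged in at least `min_hits` of `n_replicates` replicates. The function name, the parameters, and the fixed cutoff are hypothetical stand-ins for the window-based detection in the text.

```python
import random


def simulate_error_rates(n_genes=1000, frac_changed=0.1, sigma=1.0, mu=2.0,
                         n_replicates=3, min_hits=2, cutoff=1.96,
                         n_simulations=5, seed=0):
    """Return (false_positive_rate, false_negative_rate) estimated by simulation.

    Assumptions (not from the source): normal distributions for both gene
    classes, a fixed cutoff per replicate, and a k-of-n detection criterion.
    """
    rng = random.Random(seed)
    n_changed = int(n_genes * frac_changed)
    n_unchanged = n_genes - n_changed
    false_pos = 0
    false_neg = 0
    for _ in range(n_simulations):
        for gene in range(n_genes):
            changed = gene < n_changed
            center = mu if changed else 0.0
            # Count the replicates in which this gene exceeds the cutoff.
            hits = sum(abs(rng.gauss(center, sigma)) > cutoff
                       for _ in range(n_replicates))
            called = hits >= min_hits
            if called and not changed:
                false_pos += 1      # type I error
            elif changed and not called:
                false_neg += 1      # type II error
    return (false_pos / (n_unchanged * n_simulations),
            false_neg / (n_changed * n_simulations))


if __name__ == "__main__":
    for k in (2, 3):
        fpr, fnr = simulate_error_rates(min_hits=k, seed=1)
        print(f"{k}/3 criterion: FPR={fpr:.4f} FNR={fnr:.4f}")
```

Tabulating the two rates over a grid of `sigma`, `mu`, `n_replicates`, and `min_hits` would give something like the test statistics table the text refers to; a stricter criterion lowers the false positive rate at the cost of a higher false negative rate.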

In addition, by calculating and comparing the errors in which a gene whose expression level does not change is detected as a fluctuating gene (type I errors) and the errors in which a fluctuating gene is detected as a gene whose expression does not change (type II errors), the detection power and reliability with which the above technique detects fluctuating genes can be determined from the simulation data. A combination of the number of experimental replicates, the detection criterion for fluctuating genes, and the confidence limit point can then be set so as to obtain the expected detection power and reliability for actual experimental data.

This also makes it possible to predict how many times an experiment must be performed to obtain accurate experimental data, which can markedly improve experimental efficiency.

The gene expression information analysis apparatus according to the next invention is the apparatus described above, wherein the gene detection means further comprises deviation value calculation means for calculating a deviation value for each spot.

This illustrates gene detection more concretely. According to this apparatus, a deviation value is calculated for each spot. Using these deviation values in place of the variation ratio (fold change) enables an analysis that is not affected by differences in error between slides.
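As a rough illustration of why a deviation value is comparable across slides, the sketch below uses the conventional standard-score definition (50 plus 10 times the z-score). This definition is an assumption: the passage above does not give the formula, and the calculation actually used by the apparatus may differ (for example, it may be computed per window). The function name is hypothetical.

```python
import statistics


def deviation_values(values):
    """Return the conventional deviation value 50 + 10 * z for each input.

    Assumption: population standard deviation over all supplied spots.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # All spots identical: every spot sits exactly at the mean.
        return [50.0] * len(values)
    return [50.0 + 10.0 * (v - mean) / sd for v in values]


if __name__ == "__main__":
    # Two slides with the same pattern but different error scales
    # yield the same deviation values.
    print(deviation_values([-0.5, 0.0, 0.5]))
    print(deviation_values([-2.0, 0.0, 2.0]))
```

Because the score is scaled by each slide's own spread, a slide with larger overall scatter does not produce systematically larger values, which is the property the text relies on.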

In addition, the deviation values calculated by this apparatus can be used, in multivariate analyses such as cluster analysis, in place of the logarithm of the variation ratio or the normalized variation ratio, enabling an analysis that is not swayed by differences in how error affects high and low expression levels.

The present invention also relates to a gene expression information analysis method. The gene expression information analysis method according to the present invention comprises: a background correction step of creating background-corrected luminance data by removing a background value from the measured luminance data of each spot, for which the fluorescence intensity indicating the expression level of the same gene has been measured under two conditions; a bias correction step of creating a fluorescence intensity scatter plot with the logarithms of the luminance data background-corrected in the background correction step on the X and Y axes, determining for each gene spot the bias relative to the fluorescence intensity equilibrium axis, and removing that bias from the luminance data, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the fold-change axis of the expression level; and a gene detection step of detecting fluctuating genes, whose expression level has changed, on the basis of the fluorescence intensity scatter plot in the new X-Y axis system constructed in the bias correction step.

According to this method, background-corrected luminance data are created by removing a background value from the measured luminance data of each spot, for which the fluorescence intensity indicating the expression level of the same gene has been measured under two conditions using a DNA microarray, a DNA chip, or the like. Here, the mean of the fluorescence intensity measurements of the blank spots may be used as the background value removed from each spot's measurement, or the mean of the blank fluorescence intensity measurements in the region surrounding each spot may be used instead. The background correction may also be performed by any other method.
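The first background option described above (subtracting the mean of the blank spots) and the subsequent log transform can be sketched as follows. The function names and the `floor` parameter are hypothetical; the floor is an added assumption so that a spot at or below background still has a defined logarithm, and the text does not say how such spots are handled.

```python
import math


def background_correct(spot_intensities, blank_intensities, floor=1.0):
    """Subtract the mean blank intensity from every spot.

    `floor` (an assumption, not from the source) clamps corrected values
    so that the logarithm taken afterwards is defined.
    """
    background = sum(blank_intensities) / len(blank_intensities)
    return [max(s - background, floor) for s in spot_intensities]


def log_intensity_pairs(condition_a, condition_b):
    """Pair the base-2 logarithms of two conditions as (x, y) scatter points."""
    return [(math.log2(a), math.log2(b))
            for a, b in zip(condition_a, condition_b)]


if __name__ == "__main__":
    blanks = [40.0, 60.0]
    a = background_correct([306.0, 114.0, 45.0], blanks)
    b = background_correct([178.0, 562.0, 52.0], blanks)
    for x, y in log_intensity_pairs(a, b):
        print(f"log2 A = {x:.2f}, log2 B = {y:.2f}")
```

The local-background variant would differ only in passing each spot its own list of surrounding blank measurements.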

Also according to this method, a fluorescence intensity scatter plot is created with the logarithms (natural logarithm, base-2 logarithm, or the like) of the background-corrected luminance data on the X and Y axes; the bias relative to the fluorescence intensity equilibrium axis, along which both conditions show the same fluorescence intensity, is determined for each gene spot; and that bias is removed from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the fold-change axis of the expression level. It is therefore possible to determine which fluorescence component contains more bias and, after removing this bias, to construct a new orthogonal axis system whose two axes are the fluorescence intensity equilibrium axis and the fold-change axis of the expression level.

Also according to this method, fluctuating genes are detected on the basis of the fluorescence intensity scatter plot in the newly constructed X-Y axis system. Compared with conventional gene detection methods, genes whose expression level has changed can therefore be detected accurately, without being affected by differences in measurement method, error between samples, fluorescent labeling efficiency, and the like.

The gene expression information analysis method according to the next invention is the method described above, wherein the bias correction step further comprises: a first principal component creation step of performing a principal component analysis using the logarithmic values of the gene population with high expression levels and obtaining the slope and intercept of the asymptote that forms the first principal component; a coordinate rotation step of taking the angle between the asymptote obtained in the first principal component creation step and the X axis as θ and calculating the coordinates obtained by rotating the coordinates, in the X-Y axis system, of the gene population with low expression levels to the right by the angle θ; a bias determination step of calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the low-expression gene population after the coordinate axis rotation in the coordinate rotation step and determining, on the basis of the calculated slope, which of the luminance data of the two conditions contains more of the bias; and a corrected plot generation step of constructing a fluorescence intensity scatter plot in a new X-Y axis system, whose two axes are the fluorescence intensity equilibrium axis and the fold-change axis of the expression level, by subtracting the bias from the luminance data of the condition determined in the bias determination step to contain more of the bias.

This illustrates the bias correction step more concretely. According to this method, control gene samples for quality control, forming a DNA concentration dilution series (for example, external-gene λ DNA samples, or housekeeping gene samples such as ribosomal genes whose expression level hardly changes), are measured simultaneously with the target gene samples. Control genes are removed one at a time, starting from the gene with the smallest product of fluorescence intensity data; at each stage, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples, and the correlation coefficient of the data is calculated.

The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion indicating strong correlation (for example, 0.8 or more) is taken as threshold 1, and the population of all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the gene population with high expression levels. Likewise, the product for the control sample at which the successively calculated correlation coefficient first satisfies a criterion indicating weak correlation (for example, 0.5 or more) is taken as threshold 2 (where threshold 2 < threshold 1), and the population of all gene samples whose product of fluorescence intensity data under the two conditions falls below threshold 2 is taken as the gene population with low expression levels.

A principal component analysis is then performed using the logarithms of the fluorescence intensities of the high-expression gene population, and the slope and intercept of the asymptote that forms the first principal component are obtained. The angle between this asymptote and the X axis is taken as θ, and the coordinates obtained by rotating the coordinates, in the X-Y axis system, of the low-expression gene population to the right by the angle θ are calculated. Using the coordinates of the low-expression gene population after the coordinate axis rotation, the slope of the fluorescence intensity equilibrium axis is calculated, and on the basis of the calculated slope (for example, positive, negative, or zero) it is determined which of the luminance data of the two conditions contains more bias. The bias is subtracted from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates of the gene population that has a constant bias), thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the fold-change axis of the expression level. The bias in the measured values can thus be removed efficiently, and a fluorescence intensity scatter plot that clearly expresses the nature of the data can be created.
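The principal-axis and rotation part of this procedure can be sketched for the two-variable case. The sketch below assumes that "rotating to the right" means a clockwise rotation and that the first principal axis is obtained from the 2x2 variance-covariance matrix in closed form; the function names are hypothetical, and the threshold selection from control samples is omitted. It returns only the slope of the rotated low-expression population and does not decide which condition that slope's sign implicates, since the passage above does not state the mapping.

```python
import math


def first_principal_axis(points):
    """Return (theta, slope, intercept) of the first principal axis.

    Uses the variance-covariance matrix of 2-D points; theta is the angle
    between the axis and the X axis, in radians.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form orientation of the major axis of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(theta)
    return theta, slope, my - slope * mx


def rotate_clockwise(points, theta):
    """Rotate every point clockwise by theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]


def least_squares_slope(points):
    """Slope of the ordinary least-squares line through the points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sum((x - mx) * (y - my) for x, y in points) / sxx


if __name__ == "__main__":
    high = [(10.0, 10.1), (11.0, 10.9), (12.0, 12.2), (13.0, 12.9)]
    low = [(4.0, 4.6), (5.0, 5.4), (6.0, 6.1), (7.0, 7.0)]
    theta, slope, intercept = first_principal_axis(high)
    residual = least_squares_slope(rotate_clockwise(low, theta))
    print(f"theta={theta:.4f} rad, slope after rotation={residual:.4f}")
```

A non-zero slope after rotation indicates that the low-expression population does not lie along the axis defined by the high-expression population, which is the signal the bias determination step uses.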

Note that this method is not limited to judging the magnitude of the bias after axis rotation; for example, the magnitude of the bias may instead be judged before axis rotation by comparing the slopes of the high-expression asymptote and the low-expression asymptote.

The gene expression information analysis method according to the next invention is the method described above, wherein the principal component analysis is performed using a variance-covariance matrix.

This illustrates the principal component analysis more concretely. According to this method, the principal component analysis is performed using the variance-covariance matrix. Unlike principal component analysis based on the correlation matrix, which has conventionally been used in expression gene analysis, it requires no normalization, so the principal component analysis can be performed efficiently.

The gene expression information analysis method according to the next invention is the method described above, wherein the gene detection step further comprises: a window setting step of setting a window within a predetermined interval along the fluorescence intensity equilibrium axis; a confidence limit point determination step of determining a confidence limit point within each window set in the window setting step; a window moving step of moving the window along the fluorescence intensity equilibrium axis by a fixed number of genes at a time; a confidence boundary line creation step of obtaining a confidence limit point, by the confidence limit point determination step, for each window moved in the window moving step and creating a confidence boundary line from the plurality of confidence limit points so obtained; and a fluctuating gene extraction step of extracting the genes located outside the confidence boundary line created in the confidence boundary line creation step as fluctuating genes whose expression level has changed.

This illustrates the gene detection step more concretely. According to this method, a window is set within a predetermined interval, and within each window a confidence limit point is determined using at least one of the mean, the standard deviation, a P value (for example, p = 0.05), the centroid, and the like of the luminance data of the genes. The window is then moved along the fluorescence intensity equilibrium axis by a fixed number of genes at a time, a confidence limit point is obtained for each moved window, and a confidence boundary line is created from the plurality of confidence limit points so obtained. The genes located outside this confidence boundary line are extracted as fluctuating genes whose expression level has changed, so that expressed genes can be extracted with high stability, reproducibility, and reliability.
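A minimal sketch of the sliding-window step follows, assuming each gene is a point (x, y) with x along the fluorescence intensity equilibrium axis and y along the fold-change axis, and that each limit point is the window mean plus or minus `crit` sample standard deviations. The function name and `crit` are hypothetical; in the method described here the multiplier would come from the chosen risk level (and, in a later variant, from the t-distribution), whereas this sketch simply takes it as a caller-supplied number.

```python
import statistics


def window_confidence_limits(points, window_size, step, crit=1.96):
    """Slide a window of `window_size` genes along x, moving `step` genes.

    Returns a list of (x_center, lower_limit, upper_limit), one per window.
    `crit` is an assumed multiplier on the sample standard deviation.
    """
    ordered = sorted(points)  # order genes along the equilibrium axis
    limits = []
    for start in range(0, len(ordered) - window_size + 1, step):
        window = ordered[start:start + window_size]
        xs = [x for x, _ in window]
        ys = [y for _, y in window]
        center = statistics.fmean(xs)
        mean = statistics.fmean(ys)
        sd = statistics.stdev(ys)
        limits.append((center, mean - crit * sd, mean + crit * sd))
    return limits


if __name__ == "__main__":
    genes = [(float(i), 1.0 if i % 2 == 0 else -1.0) for i in range(10)]
    for center, lower, upper in window_confidence_limits(genes, 4, 2, crit=2.0):
        print(f"x={center:.1f}: [{lower:.3f}, {upper:.3f}]")
```

Both `window_size` and `crit` correspond to quantities that later variants of the method let the user enter for each experiment.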

The gene expression information analysis method according to the next invention is the method described above, wherein the confidence limit point determination step determines the confidence limit point using the t-distribution, on the basis of a test statistics table for replicate data obtained by simulation.

This illustrates the determination of confidence limit points more concretely. According to this method, the confidence limit point is determined using the t-distribution on the basis of a test statistics table for replicate data obtained by simulation, so that the confidence limit point can be obtained more accurately and efficiently than with conventional techniques.

The gene expression information analysis method according to the next invention is the method described above, wherein the confidence boundary line creation step creates the confidence boundary line by smoothing, namely by creating a spline curve from the plurality of confidence limit points.

This illustrates the creation of the confidence boundary line more concretely. According to this method, the confidence boundary line is created by smoothing, namely by creating a spline curve from the plurality of confidence limit points, so that the confidence limit points can be interpolated efficiently to produce a confidence curve.
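One way to turn a set of confidence limit points into a continuous curve is an interpolating natural cubic spline, sketched below in plain Python. This is an assumed choice: the text says only that a spline curve is used and does not specify the variant (a smoothing spline that does not pass exactly through every point would be equally consistent with it). The function name is hypothetical, and the knots must have strictly increasing x values.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n  # second derivatives at the knots; zero at both ends
    if n > 2:
        # Thomas algorithm for the tridiagonal system of interior knots.
        cp = [0.0] * n
        dp = [0.0] * n
        for i in range(1, n - 1):
            a, b, c = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
            d = 6.0 * ((ys[i + 1] - ys[i]) / h[i]
                       - (ys[i] - ys[i - 1]) / h[i - 1])
            denom = b - a * cp[i - 1]
            cp[i] = c / denom
            dp[i] = (d - a * dp[i - 1]) / denom
        for i in range(n - 2, 0, -1):
            m[i] = dp[i] - cp[i] * m[i + 1]

    def evaluate(x):
        # Locate the segment containing x, clamped to the valid range.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        t0, t1 = xs[i + 1] - x, x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t1)

    return evaluate


if __name__ == "__main__":
    upper = natural_cubic_spline([6.0, 8.0, 10.0, 12.0], [2.4, 1.6, 1.2, 1.1])
    for x in (6.0, 7.0, 9.0, 11.0, 12.0):
        print(f"x={x:.1f}: upper limit {upper(x):.3f}")
```

A gene at position x would then be extracted as fluctuating when its fold-change coordinate lies above the upper curve or below the corresponding lower curve.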

The gene expression information analysis method according to the next invention is the method described above, wherein, for the region of high fluorescence intensity, the confidence boundary line creation step creates the confidence limit line using the horizontal extension of the confidence limit point obtained in the last window.

This illustrates the creation of the confidence boundary line more concretely. According to this method, for the region of high fluorescence intensity, the confidence limit line is created using the extension, parallel to the X axis, of the confidence limit point obtained in the last (rightmost) window. An appropriate confidence limit line can therefore be created even when the slope is small and it cannot be judged in which direction the curve converges.

The gene expression information analysis method according to the next invention is the method described above, wherein, for the region of low fluorescence intensity, the confidence boundary line creation step uses as the confidence limit line the extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in the respective windows.

This illustrates the creation of the confidence boundary line more concretely. According to this method, for the region of low fluorescence intensity, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points of, for example, the first several tens of windows is used as the confidence limit line, so that spots of genes with low fluorescence intensity can also be detected accurately.
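The two edge rules (least-squares extrapolation at low intensity, horizontal extension at high intensity) can be combined into a single boundary lookup, sketched below. Between the first and last limit points this sketch uses plain linear interpolation purely as a stand-in for the spline, and `n_fit` defaults to a small number for illustration, whereas the text suggests several tens of windows. All names are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares line through the points: (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sum((x - mx) * (y - my) for x, y in points) / sxx
    return slope, my - slope * mx


def boundary_at(x, limit_points, n_fit=3):
    """Confidence limit at position x, given limit points sorted by x.

    - right of the last point: horizontal extension of the last limit
    - left of the first point: extrapolated least-squares line fitted to
      the first `n_fit` limit points
    - in between: linear interpolation (stand-in for the spline)
    """
    first_x = limit_points[0][0]
    last_x, last_y = limit_points[-1]
    if x >= last_x:
        return last_y
    if x < first_x:
        slope, intercept = fit_line(limit_points[:n_fit])
        return slope * x + intercept
    for (x0, y0), (x1, y1) in zip(limit_points, limit_points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


if __name__ == "__main__":
    upper_limits = [(2.0, 4.0), (3.0, 3.0), (4.0, 2.5), (6.0, 2.0)]
    for x in (1.0, 3.5, 5.0, 10.0):
        print(f"x={x:.1f}: upper limit {boundary_at(x, upper_limits, n_fit=2):.2f}")
```

Applying the same lookup to the lower limit points gives the lower boundary, and a gene is outside the confidence boundary when its fold-change coordinate exceeds the upper value or falls below the lower one.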

The gene expression information analysis method according to the next invention is the method described above, further comprising a gene number input step of having the user input the number of genes in a window, wherein the window setting step sets the window within an interval that contains the number of genes input in the gene number input step.

This illustrates the window setting more concretely. According to this method, the user inputs the number of genes in a window, and the window is set within an interval that contains that number of genes, so that the number of genes set by the user can be varied from experiment to experiment.

The gene expression information analysis method according to the next invention is the method described above, further comprising a confidence limit value input step of having the user input a confidence limit value, wherein the confidence limit point determination step determines the confidence limit point within the window on the basis of the confidence limit value input in the confidence limit value input step.

This illustrates the determination of confidence limit points more concretely. According to this method, the user inputs a confidence limit value, and the confidence limit point within each window is determined from that input value. The user can therefore vary the confidence limit value from experiment to experiment, so that the error of each experiment can be kept within an appropriate range.

The gene expression information analysis method according to the next invention is the method described above, further comprising: a simulation condition setting step of having the user input simulation conditions that include information on at least one of the shape of the distribution of the non-fluctuating genes, the shape of the distribution of the fluctuating genes, the detection criterion for the fluctuating genes, the number of experimental replicates, and the number of simulations; a simulation execution step of, in accordance with the simulation conditions set in the simulation condition setting step, repeatedly generating data for the same gene group from the same distribution, executing the gene detection, running the simulation that detects expressed genes a plurality of times, calculating the false positive rate and false negative rate of the detection results, calculating the relationship among the number of experimental replicates, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistics table for genes whose expression level changes; and a simulation result output step of outputting, for each simulation condition, the simulation results of the simulation execution step.

According to this method, the user inputs simulation conditions that include information on at least one of the following: the shape of the distribution of the non-fluctuating genes (for example, its standard deviation; when the distribution of genes whose expression does not change is taken to be the standard normal distribution with standard deviation σ = 1 and center μ = 0, the width of the standard deviation σ is set in the range 0.1 to 1.5); the shape of the distribution of the fluctuating genes (for example, its center; under the same conditions, the width of the center is set in the range 0.4 to 3); the detection criterion for the fluctuating genes (for example, the ratio of detections to the total number, set to 2/3, 2/4, 3/4, 3/6, 4/6, and so on); the number of experimental replicates; and the number of simulations (for example, set in the range 3 to 10). In accordance with the conditions so set, data for the same gene group are repeatedly generated from the same distribution, gene detection is executed, and the simulation that detects expressed genes is run a plurality of times. The false positive rate and false negative rate of the detection results are calculated, the relationship among the number of experimental replicates, the simulation conditions, and the detection sensitivity and detection reliability is calculated, and a test statistics table for genes whose expression level changes is created. The simulation results are output for each simulation condition, so that by combining results obtained under various conditions the detection power and detection reliability of each combination can be known. That is, by repeating a control experiment under the same conditions, detecting fluctuating genes in each of the resulting data sets, and selecting only those genes that are detected at least a predetermined number of times, fluctuating genes can be detected with the expected reliability or detection power.

In addition, by calculating and comparing the errors in which a gene whose expression level does not change is detected as a fluctuating gene (type I errors) and the errors in which a fluctuating gene is detected as a gene whose expression does not change (type II errors), the detection power and reliability with which the above technique detects fluctuating genes can be determined from the simulation data. A combination of the number of experimental replicates, the detection criterion for fluctuating genes, and the confidence limit point can then be set so as to obtain the expected detection power and reliability for actual experimental data.

This also makes it possible to predict how many times an experiment must be performed to obtain accurate experimental data, which can markedly improve experimental efficiency.

The gene expression information analysis method according to the next invention is the method described above, wherein the gene detection step further comprises a deviation value calculation step of calculating a deviation value for each spot.

This shows an example of gene detection more specifically. According to this method, a deviation value is calculated for each spot, and using these deviation values in place of the variation ratio (fold change) enables an analysis that is not affected by differences in error between slides.

In addition, the deviation values calculated by this method can be used in place of the logarithm of the variation ratio or the normalized variation ratio in multivariate analyses such as cluster analysis, enabling an analysis that is not affected by differences in the influence of error arising from the magnitude of the expression level.

The present invention also relates to a program. The program according to the present invention causes a computer to execute a gene expression information analysis method comprising: a background correction step of creating background-corrected luminance data by removing a background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene was measured under two conditions; a bias correction step of creating a fluorescence intensity scatter plot with the logarithms of the luminance data background-corrected in the background correction step on the X and Y axes, determining for each gene spot the bias with respect to the fluorescence intensity equilibrium axis, and removing that bias from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression-level ratio axis; and a gene detection step of detecting variable genes whose expression level has changed, based on the fluorescence intensity scatter plot in the new X-Y axis system constructed in the bias correction step.

According to this program, background-corrected luminance data is created by removing a background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene was measured under two conditions using a DNA microarray, a DNA chip, or the like. Here, the mean of the fluorescence intensity measurements of the blank spots may be used as the background value to be subtracted from the fluorescence intensity measurement of each spot, or the mean of the blank fluorescence intensity measurements in the region surrounding each spot may be used as the background value. Background correction may also be performed by any other program.
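The first option described above (subtracting the mean blank-spot intensity from every spot) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the sample intensities, and the `floor` parameter are hypothetical and do not come from the specification.

```python
from statistics import mean


def background_correct(spot_values, blank_values, floor=1.0):
    """Subtract the mean blank-spot intensity from each spot intensity.

    `floor` is a hypothetical safeguard (not from the source) that keeps
    corrected values positive so that a logarithm can be taken afterwards.
    """
    background = mean(blank_values)
    return [max(value - background, floor) for value in spot_values]


# Made-up intensities for one condition and its blank spots.
spots = [520.0, 1310.0, 95.0, 8040.0]
blanks = [80.0, 100.0, 90.0]
corrected = background_correct(spots, blanks)
print(corrected)
```

The same function would be applied separately to the measurements of each of the two conditions; using the blanks around each spot instead would only change which `blank_values` are passed in.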

According to this program, a fluorescence intensity scatter plot is created with the logarithms (natural logarithm, base-2 logarithm, or the like) of the background-corrected luminance data on the X and Y axes; for each gene spot, the bias with respect to the fluorescence intensity equilibrium axis, along which both conditions show the same fluorescence intensity, is determined; and that bias is removed from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression-level ratio axis. It is therefore possible to determine which fluorescence component contains more bias, remove that bias, and then construct a new orthogonal axis system whose two axes are the fluorescence intensity equilibrium axis and the expression-level ratio axis.

According to this program, variable genes whose expression level has changed are detected based on the fluorescence intensity scatter plot in the newly constructed X-Y axis system. Compared with conventional gene detection methods, genes whose expression level has changed can therefore be detected accurately without being affected by the measurement program, errors between samples, differences in fluorescent labeling efficiency, and the like.

A program according to the next invention is the program described above, wherein the bias correction step further includes: a first principal component creation step of performing principal component analysis using the logarithmic values of a gene group with high expression levels and determining the slope and intercept of the asymptote that forms the first principal component; a coordinate rotation step of, with θ denoting the angle between the asymptote determined in the first principal component creation step and the X axis, calculating the coordinates obtained by rotating the X-Y coordinates of a gene group with low expression levels to the right (clockwise) by the angle θ; a bias determination step of calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the low-expression gene group after the rotation in the coordinate rotation step, and determining, based on the calculated slope, which of the luminance data of the two conditions contains more of the bias; and a corrected plot generation step of subtracting the bias from the luminance data of the condition determined in the bias determination step to contain more of the bias, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression-level ratio axis.

This shows an example of the bias correction step more specifically. According to this program, control gene samples for quality control that form a DNA concentration dilution series (for example, external-gene λ DNA samples, or housekeeping gene samples such as ribosomal genes whose expression level hardly changes) are measured simultaneously with the target gene samples. Control genes are then removed one at a time, starting with the gene whose product of fluorescence intensity data is smallest; each time, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples, and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the sequentially calculated correlation coefficient first satisfies the criterion for a strong correlation (for example, 0.8 or more) is defined as threshold 1, and the group of all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 is defined as the high-expression gene group. Likewise, the product of the fluorescence intensity data under the two conditions for the control sample at which the sequentially calculated correlation coefficient first satisfies the criterion for a weak correlation (for example, 0.5 or more) is defined as threshold 2 (where threshold 2 < threshold 1), and the group of all gene samples whose product of fluorescence intensity data under the two conditions falls below threshold 2 is defined as the low-expression gene group.

Principal component analysis is performed using the logarithmic fluorescence intensities of the high-expression gene group to determine the slope and intercept of the asymptote that forms the first principal component. With θ denoting the angle between this asymptote and the X axis, the coordinates obtained by rotating the X-Y coordinates of the low-expression gene group to the right (clockwise) by the angle θ are calculated. The slope of the fluorescence intensity equilibrium axis is calculated using the coordinates of the low-expression gene group after the rotation, and, based on the calculated slope (for example, positive, negative, or zero), it is determined which of the luminance data of the two conditions contains more bias. The bias is then subtracted from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates of a gene group having a constant bias), thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression-level ratio axis. The bias in the measured values can therefore be removed efficiently, and a fluorescence intensity scatter plot that clearly expresses the nature of the data can be created. Note that this program is not limited to determining the magnitude of the bias after axis rotation; for example, the magnitude of the bias may instead be determined before axis rotation by comparing the slopes of the high-expression asymptote and the low-expression asymptote.
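The principal-component and rotation steps described above can be sketched in a few lines. This is a minimal illustration, assuming the high- and low-expression groups have already been selected with thresholds 1 and 2; the function names and sample values are hypothetical. The first principal axis is taken from the 2x2 variance-covariance matrix, and the mapping from the sign of the resulting slope to a particular condition is deliberately left open because this section does not state it.

```python
import math
from statistics import mean


def first_principal_axis(xs, ys):
    """Slope, intercept and angle (radians) of the first principal component,
    computed from the 2x2 variance-covariance matrix of the points."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Orientation of the major axis of the covariance ellipse.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(theta)
    return slope, my - slope * mx, theta


def rotate_clockwise(xs, ys, theta):
    """Rotate points clockwise by theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in zip(xs, ys)]


def least_squares_slope(points):
    """Ordinary least-squares slope of y on x."""
    mx = mean([p[0] for p in points])
    my = mean([p[1] for p in points])
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den


# Made-up log intensities: high-expression genes lie on the line y = x.
high_x = [10.0, 11.0, 12.0, 13.0]
high_y = [10.0, 11.0, 12.0, 13.0]
slope, intercept, theta = first_principal_axis(high_x, high_y)

# Made-up low-expression genes that drift away from y = x.
low_x = [3.0, 4.0, 5.0]
low_y = [4.0, 4.5, 5.0]
rotated_low = rotate_clockwise(low_x, low_y, theta)
low_slope = least_squares_slope(rotated_low)
print(round(math.degrees(theta), 3), round(low_slope, 3))
```

A nonzero `low_slope` after rotation indicates that the low-expression group departs from the equilibrium axis, which is the signal the bias determination step evaluates.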

A program according to the next invention is the program described above, wherein the principal component analysis is performed using a variance-covariance matrix.

This shows an example of the principal component analysis more specifically. According to this program, the principal component analysis is performed using a variance-covariance matrix. Unlike the principal component analysis using a correlation matrix that has conventionally been used for expressed-gene analysis, no normalization is required, so the principal component analysis can be performed efficiently.

A program according to the next invention is the program described above, wherein the gene detection step further includes: a window setting step of setting a window within a predetermined interval along the fluorescence intensity equilibrium axis; a confidence limit point determination step of determining a confidence limit point within each window set in the window setting step; a window moving step of moving the window along the fluorescence intensity equilibrium axis by a fixed number of genes at a time; a confidence boundary line creation step of determining, by the confidence limit point determination step, a confidence limit point for each window moved in the window moving step, and creating a confidence boundary line based on the resulting plurality of confidence limit points; and a variable gene extraction step of extracting genes located outside the confidence boundary line created in the confidence boundary line creation step as variable genes whose expression level has changed.

This shows an example of the gene detection step more specifically. According to this program, a window within a predetermined interval is set, and within each window a confidence limit point is determined using at least one of the mean, the standard deviation, a P value (for example, P = 0.05), the centroid, and the like of the luminance data of the genes. The window is then moved along the fluorescence intensity equilibrium axis by a fixed number of genes at a time, a confidence limit point is determined for each window position, a confidence boundary line is created based on the resulting plurality of confidence limit points, and genes located outside the confidence boundary line are extracted as variable genes whose expression level has changed. Expressed genes can therefore be extracted with high stability, reproducibility, and reliability.
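The sliding-window procedure can be illustrated with a short sketch. It assumes the spots are already expressed in the new axis system (x along the fluorescence intensity equilibrium axis, y along the expression-level ratio axis) and uses mean ± k·(standard deviation) as the confidence limit in each window. The multiplier `k`, the function name, and the toy data are assumptions made for this illustration; the specification itself derives the limit from the t-distribution and a simulated test statistics table.

```python
from statistics import mean, stdev


def window_limit_points(points, window_size, step, k=1.96):
    """Return (x_center, lower_limit, upper_limit) for each window.

    points      : (x, y) pairs in the corrected axis system
    window_size : number of genes per window
    step        : number of genes by which the window is shifted
    k           : hypothetical multiplier standing in for the critical value
    """
    ordered = sorted(points)  # order genes along the equilibrium axis
    limits = []
    for start in range(0, len(ordered) - window_size + 1, step):
        window = ordered[start:start + window_size]
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        center, spread = mean(ys), stdev(ys)
        limits.append((mean(xs), center - k * spread, center + k * spread))
    return limits


# Toy data: ten genes whose ratio-axis value alternates between +1 and -1.
toy_points = [(float(i), 1.0 if i % 2 == 0 else -1.0) for i in range(10)]
limits = window_limit_points(toy_points, window_size=4, step=2)
for x_center, lower, upper in limits:
    print(x_center, round(lower, 3), round(upper, 3))
```

Each tuple is one confidence limit point pair; connecting the upper values and the lower values across windows gives the raw material for the confidence boundary lines.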

A program according to the next invention is the program described above, wherein the confidence limit point determination step determines the confidence limit point using the t-distribution, based on a test statistics table for replicate data obtained by simulation.

This shows an example of confidence limit point determination more specifically. According to this program, the confidence limit point is determined using the t-distribution, based on the test statistics table for replicate data obtained by simulation, so that the confidence limit point can be determined more accurately and efficiently than with conventional techniques.

A program according to the next invention is the program described above, wherein the confidence boundary line creation step creates the confidence boundary line by smoothing, in which a spline curve is created based on the plurality of confidence limit points.

This shows an example of confidence boundary line creation more specifically. According to this program, the confidence boundary line is created by smoothing with a spline curve based on the plurality of confidence limit points, so that the confidence limit points can be interpolated efficiently to create a confidence curve.

A program according to the next invention is the program described above, wherein, for the region of high fluorescence intensity, the confidence boundary line creation step creates the confidence limit line using the horizontal extension of the confidence limit point determined in the last window.

This shows an example of confidence boundary line creation more specifically. According to this program, for the region of high fluorescence intensity, the confidence limit line is created using the extension, parallel to the X axis, of the confidence limit point determined in the last (rightmost) window. An appropriate confidence limit line can therefore be created even when the slope is small and it cannot be determined toward which side the line converges.

A program according to the next invention is the program described above, wherein, for the region of low fluorescence intensity, the confidence boundary line creation step uses as the confidence limit line the extrapolation of an asymptote determined by the least-squares method from the confidence limit points determined in the respective windows.

This shows an example of confidence boundary line creation more specifically. According to this program, for the region of low fluorescence intensity, the extrapolation of an asymptote determined by the least-squares method from the confidence limit points determined in, for example, the first several tens of windows is used as the confidence limit line, so that spots of genes with low fluorescence intensity can also be detected accurately.
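The three regions of the confidence boundary described above can be combined into one function, as in the sketch below. Two simplifications are assumed here and are not part of the specification: linear interpolation between neighbouring limit points stands in for the spline smoothing, and only the first three limit points (`n_low=3`) are used for the least-squares line, whereas the text mentions several tens of windows. All names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    slope = num / den
    return slope, my - slope * mx


def make_upper_boundary(limit_points, n_low=3):
    """Build an upper confidence boundary from (x, upper_limit) pairs
    sorted by x."""
    xs = [p[0] for p in limit_points]
    us = [p[1] for p in limit_points]
    slope, intercept = fit_line(xs[:n_low], us[:n_low])

    def boundary(x):
        if x <= xs[0]:
            # Low intensity: extrapolate the least-squares line.
            return slope * x + intercept
        if x >= xs[-1]:
            # High intensity: horizontal extension of the last limit point.
            return us[-1]
        # Middle: linear interpolation (stand-in for the spline curve).
        for (x0, u0), (x1, u1) in zip(limit_points, limit_points[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return u0 + t * (u1 - u0)

    return boundary


# Made-up upper confidence limit points from successive windows.
upper_points = [(2.0, 3.0), (3.0, 2.5), (4.0, 2.0), (6.0, 1.5), (8.0, 1.4)]
upper = make_upper_boundary(upper_points)

# Spots above the boundary are extracted as candidate variable genes.
spots = [(1.0, 3.6), (5.0, 1.0), (10.0, 1.5)]
outside = [p for p in spots if p[1] > upper(p[0])]
print(upper(1.0), upper(5.0), upper(10.0), outside)
```

A lower boundary would be built the same way from the lower limit points, with the comparison reversed.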

A program according to the next invention is the program described above, further including a gene number input step of having the user input the number of genes in a window, wherein the window setting step sets the window within an interval that contains the number of genes input in the gene number input step.

This shows an example of window setting more specifically. According to this program, the user inputs the number of genes in a window, and the window is set within an interval that contains that number of genes, so that the number of genes set by the user can be varied from experiment to experiment.

A program according to the next invention is the program described above, further including a confidence limit value input step of having the user input a confidence limit value, wherein the confidence limit point determination step determines the confidence limit point within the window based on the confidence limit value input in the confidence limit value input step.

This shows an example of confidence limit point determination more specifically. According to this program, the user inputs a confidence limit value, and the confidence limit point is determined within the window based on that value, so that the confidence limit value set by the user can be varied from experiment to experiment and the error of each experiment can be kept within an appropriate range.

A program according to the next invention is the program described above, further including: a simulation condition setting step of having the user input simulation conditions including information on at least one of the shape of the distribution of the genes that do not vary, the shape of the distribution of the variable genes, the detection criterion for the variable genes, the number of experimental repetitions, and the number of simulations; a simulation execution step of, in accordance with the simulation conditions set in the simulation condition setting step, repeatedly generating data for the same gene group from the same distribution, executing the gene detection means, executing the simulation for detecting expressed genes multiple times, calculating the false positive rate and false negative rate of the results of the detection means, calculating the relationship among the number of experimental repetitions, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistics table for genes whose expression level changes; and a simulation result output step of outputting, for each simulation condition, the simulation results of the simulation execution step.

According to this program, the user inputs simulation conditions including information on at least one of the following: the shape of the distribution of the genes that do not vary (for example, the standard deviation of the distribution; for instance, taking the distribution of genes whose expression does not change to be the standard normal distribution with standard deviation σ = 1 and center μ = 0, the standard deviation σ is set in the range 0.1 to 1.5); the shape of the distribution of the variable genes (for example, its center; for instance, under the same condition, the center μ is set in the range 0.4 to 3); the detection criterion for the variable genes (for example, the proportion of detections relative to the total number, set to 2/3, 2/4, 3/4, 3/6, 4/6, or the like); the number of experimental repetitions; and the number of simulations (for example, set in the range of 3 to 10). In accordance with the simulation conditions thus set, data is repeatedly generated for the same gene group from the same distribution, gene detection is executed, the simulation for detecting expressed genes is executed multiple times, the false positive rate and false negative rate of the detection results are calculated, the relationship among the number of experimental repetitions, the simulation conditions, and the detection sensitivity and detection reliability is calculated, a test statistics table for genes whose expression level changes is created, and the simulation results are output for each simulation condition. By combining the simulation results obtained under various conditions, the detection power and detection reliability of each combination can therefore be determined. That is, by repeating a control experiment under the same conditions, detecting variable genes in each of the resulting data sets, and selecting only those genes that are detected at least a predetermined number of times, variable genes can be detected with the expected reliability or power.
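The effect of the detection criterion (for example, "detected in at least 2 of 3 repetitions") on the two error rates can be explored with a small Monte Carlo sketch. It follows the mixture model described above, with unchanged genes drawn from a normal distribution centered at 0 and variable genes drawn from one centered at μ. The fixed two-sided `cutoff`, the gene counts, and the function name are assumptions made for this illustration; the specification uses the windowed confidence boundary rather than a fixed cutoff.

```python
import random


def simulate(n_null=2000, n_changed=200, sigma=1.0, mu=3.0,
             repeats=3, min_hits=2, cutoff=1.96, seed=0):
    """Estimate (false_positive_rate, false_negative_rate).

    A gene counts as detected when its simulated value exceeds `cutoff`
    in absolute value in at least `min_hits` of `repeats` repetitions.
    """
    rng = random.Random(seed)

    def detected(center):
        hits = sum(abs(rng.gauss(center, sigma)) > cutoff
                   for _ in range(repeats))
        return hits >= min_hits

    false_pos = sum(detected(0.0) for _ in range(n_null))
    false_neg = sum(not detected(mu) for _ in range(n_changed))
    return false_pos / n_null, false_neg / n_changed


# Compare three detection criteria on the same simulated draws.
for hits in (1, 2, 3):
    fp_rate, fn_rate = simulate(min_hits=hits)
    print(f"{hits}/3 criterion: false positive {fp_rate:.4f}, "
          f"false negative {fn_rate:.4f}")
```

Because the seed is fixed, all three criteria are evaluated on identical draws, so raising `min_hits` can only lower the false positive rate and raise the false negative rate. Tabulating such rates over σ, μ, the number of repetitions, and the criterion corresponds to the kind of test statistics table the text describes.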

A program according to the next invention is the program described above, wherein the gene detection step further includes a deviation value calculation step of calculating a deviation value for each spot.

This shows an example of gene detection more specifically. According to this program, a deviation value is calculated for each spot, and using these deviation values in place of the variation ratio (fold change) enables an analysis that is not affected by differences in error between slides.
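Why a deviation value is insensitive to slide-to-slide differences in error can be seen in a small sketch. This section does not define the deviation value, so the conventional standard-score form 50 + 10·(value − mean)/(standard deviation) is assumed here, computed over all log ratios of one slide; the actual definition in the specification may differ (for example, it may be computed per window). The function name and data are hypothetical.

```python
from statistics import mean, pstdev


def deviation_values(log_ratios):
    """Convert log ratios to deviation values (assumed form: 50 + 10 * z)."""
    center = mean(log_ratios)
    spread = pstdev(log_ratios)
    return [50.0 + 10.0 * (value - center) / spread for value in log_ratios]


# Two made-up slides measuring the same genes; slide B has twice the spread.
slide_a = [-1.0, 0.0, 1.0]
slide_b = [-2.0, 0.0, 2.0]
dev_a = deviation_values(slide_a)
dev_b = deviation_values(slide_b)
print([round(v, 2) for v in dev_a])
print([round(v, 2) for v in dev_b])
```

The raw ratios of the two slides differ by a factor of two, yet their deviation values coincide, which is the property that makes them usable in place of fold changes when comparing slides.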

In addition, the deviation values calculated by this program can be used in place of the logarithm of the variation ratio or the normalized variation ratio in multivariate analyses such as cluster analysis, enabling an analysis that is not affected by differences in the influence of error arising from the magnitude of the expression level.

The present invention also relates to a recording medium, and the recording medium according to the present invention is characterized in that the program described above is recorded on it.

According to this recording medium, by having a computer read and execute the program recorded on the recording medium, the program described above can be realized using a computer, and the same effects as those of each of these methods can be obtained.

Brief Description of the Drawings

FIG. 1 is a diagram showing the concept of principal component analysis using a variance-covariance matrix according to the present invention.
FIG. 2 is a diagram showing the concept of the process of determining an asymptote in the new coordinate system according to the present invention.
FIG. 3 is a diagram conceptually showing the reconstruction of the distribution plot according to the present invention.
FIGS. 4 to 6 are diagrams each showing a mixed normal distribution model of the expression ratio according to the present invention.
FIGS. 7 to 10 are diagrams each showing an example of the results of calculating Type I detection errors by simulation according to the present invention.
FIGS. 11 to 14 are diagrams each conceptually showing the creation of an expression variation confidence curve according to the present invention.
FIG. 15 is a flowchart showing the main processing of the apparatus of the present embodiment.
FIG. 16 is a flowchart showing an example of the background correction processing of the apparatus of the present embodiment.
FIG. 17 is a flowchart showing an example of the bias correction processing of the apparatus of the present embodiment.
FIG. 18 is a flowchart showing an example of the gene detection processing of the apparatus of the present embodiment.
FIG. 19 is a flowchart showing an example of the simulation processing of the apparatus of the present embodiment.
FIG. 20 is a diagram showing an example of the gene extraction condition setting screen output to the output device 114 by the processing of the window setting unit 102i.
FIG. 21 is a diagram showing an example of the simulation condition setting screen output to the output device 114 by the processing of the simulation condition setting unit 102r.
FIG. 22 is a block diagram showing an example of the configuration of the apparatus to which the present invention is applied.
FIG. 23 is a block diagram showing an example of the configuration of the bias correction unit 102b.
FIG. 24 is a block diagram showing an example of the configuration of the gene detection unit 102c.
FIG. 25 is a block diagram showing an example of the configuration of the simulation unit 102d.
FIG. 26 is a flowchart showing an example of the gene detection processing using deviation values in the apparatus of the present embodiment.
FIG. 27 is a conceptual diagram showing the calculation of deviation values in the apparatus of the present embodiment.
FIG. 28 is a conceptual diagram showing an example of the bias determination processing of the apparatus of the present embodiment.

Best Mode for Carrying Out the Invention

Hereinafter, embodiments of the gene expression information analysis apparatus, gene expression information analysis method, program, and recording medium according to the present invention will be described in detail with reference to the drawings. The present invention is not limited by these embodiments.

[Overview of the apparatus]

The basic concept of the apparatus is described first; the configuration, processing, and the like of the apparatus in each embodiment of the present invention are then described in detail.

[Basic concept of the apparatus]

Hereinafter, the basic concept of the present invention will be described with reference to FIGS. 1 to 6 and FIGS. 11 to 14.

1. Two-stage data correction of control fluorescence measurements

In the measurement of expressed genes using a DNA microarray or DNA chip, the expression level of each gene is reflected in the intensity of the fluorescence measurement corresponding to that gene, and the expression ratio of each gene is observed as the ratio to the control fluorescence measurement. However, because of errors in the DNA microarray or DNA chip, errors in the fluorescent labeling reaction, measurement errors, differences in the molar fluorescence coefficients of the fluorescent substances, and the like, the raw ratio of the fluorescence measurements does not accurately reflect the ratio of expression levels. In the present invention, the following processing is therefore performed to handle these errors.

(1) Background correction

Background correction is performed as the first stage of data correction. Let the intensities of gene i measured under the two conditions be (a_i, b_i). The background (BKGa_i, BKGb_i) is subtracted from the intensity of each gene, and the corrected result (a_i - BKGa_i, b_i - BKGb_i) is denoted (A_i, B_i).
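The first-stage correction above can be sketched as follows. This is a minimal illustration only: the function name `background_correct` and the sample intensities are hypothetical and do not come from the specification.

```python
def background_correct(a, b, bkg_a, bkg_b):
    """Subtract the per-gene background from the raw intensities of two conditions.

    a, b         : raw intensities (a_i, b_i) for each gene
    bkg_a, bkg_b : background values (BKGa_i, BKGb_i) for each gene
    Returns the corrected lists (A_i, B_i).
    """
    if not (len(a) == len(b) == len(bkg_a) == len(bkg_b)):
        raise ValueError("all inputs must have one entry per gene")
    corrected_a = [ai - ba for ai, ba in zip(a, bkg_a)]
    corrected_b = [bi - bb for bi, bb in zip(b, bkg_b)]
    return corrected_a, corrected_b


# Hypothetical intensities for two genes.
A, B = background_correct([520.0, 130.0], [480.0, 95.0],
                          [20.0, 30.0], [30.0, 15.0])
```

Values of A_i or B_i that become zero or negative after subtraction cannot be log-transformed in the later steps; how such spots are handled is not stated in this passage.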

(2) Bias correction

Next, as the second stage of data correction, the bias is corrected by the following procedure. An outline of the bias correction of the present invention is given first.

Control gene samples for quality control, prepared as a DNA concentration dilution series (for example, external-gene λ DNA samples, or house-keeping gene samples such as ribosomal genes whose expression level hardly changes), are measured simultaneously with the target gene samples. Control genes are then removed one at a time, starting from the gene with the smallest product of the fluorescence intensity data. At each step, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples, and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion for strong correlation (for example, 0.8 or more) is taken as threshold 1, and all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 form the high-expression gene population. Likewise, the product at which the successively calculated correlation coefficient first satisfies a criterion for weak correlation (for example, 0.5 or more) is taken as threshold 2 (where threshold 2 < threshold 1), and all gene samples whose product falls below threshold 2 form the low-expression gene population.

Principal component analysis is then performed on the logarithmic fluorescence intensities of the high-expression gene population to obtain the slope and intercept of the asymptote that is the first principal component. Taking θ as the angle between this asymptote and the X axis, the coordinates of the low-expression gene population in the X-Y axis system are rotated clockwise by θ. The slope of the fluorescence intensity equilibrium axis is calculated from the rotated coordinates of the low-expression gene population and, based on the calculated slope (for example, positive, negative, or zero), it is determined which of the intensity data for the two conditions contains more bias. The bias is then subtracted from the intensity data of the condition judged to contain more bias (for example, by rotating the coordinates for a gene population having a constant bias), and a fluorescence intensity scatter plot is constructed in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression ratio axis. In this way, the bias in the measured values is removed efficiently, and a fluorescence intensity scatter plot that clearly expresses the nature of the data can be created.

An example of the bias correction procedure is described in detail below.

i) General relational expression for the control fluorescence measurements

The correction of the bias k according to the present invention is based on the general formula (1) or (1') representing the relationship between the fluorescence measurements A and B.

$$\log_2 B = a \log_2 (A - k) + b \tag{1}$$

$$\log_2 (B - k) = a \log_2 A + b \tag{1'}$$

Here, a, b, and k are unknown constant parameters. The average bias k is subtracted from whichever of A and B contains more bias. That is, when the background noise of A is larger than that of B and A contains more bias, Equation 1 is used; when the background noise of B is larger than that of A and B contains more bias, Equation 1' is used. The parameters a, b, and k are estimated from the plot of the fluorescence measurements in the (log₂ A, log₂ B) orthogonal axis system.

ii) Extraction of the fluorescence intensity equilibrium axis by principal component analysis using the variance-covariance matrix

If the expression levels are the same, the fluorescence measurements of a control experiment on a DNA microarray or DNA chip should theoretically lie on the 1:1 straight line log₂ A = log₂ B in the fluorescence intensity scatter plot of the (log₂ A, log₂ B) orthogonal axis system. However, owing to differences in the properties of the fluorescent substances, differences in experimental conditions, and the like, the fluorescence intensity equilibrium axis, which indicates equal fluorescence intensity (that is, the asymptote obtained from the population of genes whose spots show equivalent expression under the two conditions), may not follow log₂ A = log₂ B. In this case, on the premise that the number of genes examined is sufficient as a sample (for example, one thousand or more) and that the variable genes, that is, the genes whose expression level changes, are a small fraction of the total, the fluorescence intensity equilibrium axis is assumed to be the asymptote of the (log₂ A_i, log₂ B_i) population.

When A_i and B_i are much larger than k, that is, when the influence of the bias is small enough to be ignored, Equations 1 and 1' can be approximated by

$$\log_2 B = a \log_2 A + b \tag{2}$$

To obtain the slope a and the intercept b, principal component analysis using the variance-covariance matrix is performed. Unlike the principal component analysis using the correlation matrix that has conventionally been used in gene analysis, principal component analysis using the variance-covariance matrix requires no normalization.

FIG. 1 is a diagram showing the concept of principal component analysis using the variance-covariance matrix. Writing log₂ A as x and log₂ B as y, Equation 2 for the asymptote becomes

$$y = a x + b \tag{3}$$

Accordingly, the distance d_i from each point (x_i, y_i) to the asymptote is given by

$$d_i = \frac{\lvert a x_i - y_i + b \rvert}{\sqrt{a^2 + 1}} \tag{4}$$

and the total distance D from all points to the asymptote is

$$D = \sum_{i=1}^{n} d_i^2 \tag{5}$$

The asymptote parameters a and b that are most appropriate for the distribution are those for which the distance D is minimal. When D is minimal, the two conditions

$$\frac{\partial D}{\partial b} = 0 \tag{6}$$

$$\frac{\partial D}{\partial a} = 0 \tag{7}$$

are satisfied. From Equation 6,

$$b = \bar{y} - a \bar{x} \tag{8}$$

where $\bar{y}$ is the mean of y_i and $\bar{x}$ is the mean of x_i. From Equation 7,

$$a = \frac{(S_{yy} - S_{xx}) \pm \sqrt{(S_{yy} - S_{xx})^2 + 4 S_{xy}^2}}{2 S_{xy}} \tag{9}$$

In Equation 9, a is the one of the two solutions that is greater than zero. S_xx is the variance of x_i, S_yy is the variance of y_i, and S_xy is the covariance of x_i and y_i.

In the actual correction, a and b are determined using the upper gene population (log₂ A, log₂ B) ranked by the product A_i B_i. As a simple method, they are obtained from the upper portion (for example, the top 70%) of all genes ranked by the product A_i B_i. For an accurate determination, control gene samples for quality control, prepared as a DNA concentration dilution series (for example, external-gene λ DNA samples, or house-keeping gene samples such as ribosomal genes whose expression level hardly changes), are measured simultaneously with the target gene samples. Control genes are removed one at a time, starting from the gene with the smallest product of the fluorescence intensity data; calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples; and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion for strong correlation (for example, 0.8 or more) is taken as threshold 1, and all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 form the high-expression gene population.

iii) Correction of bias

In the (log₂ A, log₂ B) orthogonal axis system, when A has the larger background noise and contains more bias than B, let the coordinates of the point where the asymptote intersects the log₂ A axis be (A_c, 0). Then, from Equation 1,

$$a \log_2 (2^{A_c} - k) + b = 0 \tag{10}$$

so that

$$k = 2^{A_c} - 2^{-b/a} \tag{11}$$

When B has the larger background noise and contains more bias than A, let the coordinates of the point where the asymptote intersects the log₂ B axis be (0, B_c). Then, from Equation 1',

$$a \log_2 (2^{B_c} - k) = b \tag{12}$$

so that

$$k = 2^{B_c} - 2^{b/a} \tag{13}$$

Since a and b have already been obtained, A_c and B_c are obtained as the values at which the asymptote fitted to the lower gene population (log₂ A, log₂ B), ranked by the product A_i B_i in the (log₂ A, log₂ B) orthogonal axis system, intersects the log₂ A axis or the log₂ B axis, respectively. Genes with small fluorescence measurements are strongly affected by error. As a simple method, the genes used to calculate the asymptote of the lower gene population are the lowest portion (for example, the bottom 10%) of all genes ranked by the product A_i B_i. For an accurate determination, control gene samples for quality control, prepared as a DNA concentration dilution series (for example, external-gene λ DNA samples, or house-keeping gene samples such as ribosomal genes whose expression level hardly changes), are measured simultaneously with the target gene samples. Control genes are removed one at a time, starting from the gene with the smallest product of the fluorescence intensity data; calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples; and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion for weak correlation (for example, 0.5 or more) is taken as threshold 2 (where threshold 2 < threshold 1), and all gene samples whose product of fluorescence intensity data under the two conditions falls below threshold 2 form the low-expression gene population.

Which of the measurements (A_i) and (B_i) contains the larger bias can be judged from whether the asymptote intersects the log₂ A axis or the log₂ B axis. In this case,

intercept = B_c (when B contains more bias) (14)

or

intercept = A_c (when A contains more bias) (14')
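As a rough illustration of sections ii) and iii), the sketch below fits the equilibrium axis with Equations 8 and 9 and then evaluates the bias with Equations 11 and 13 as they are printed in this text. The function names and sample data are hypothetical, the variances are assumed to be population variances (dividing by n), and the intercepts A_c and B_c are assumed to have been obtained separately from the low-expression population.

```python
import math


def fit_equilibrium_axis(x, y):
    """Fit y = a*x + b from the variance-covariance terms (Equations 8 and 9)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x) / n
    s_yy = sum((yi - mean_y) ** 2 for yi in y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    if s_xy == 0:
        raise ValueError("zero covariance: the slope in Equation 9 is undefined")
    diff = s_yy - s_xx
    root = math.sqrt(diff ** 2 + 4 * s_xy ** 2)
    # The two solutions of Equation 9 have product -1, so exactly one is positive.
    candidates = [(diff + root) / (2 * s_xy), (diff - root) / (2 * s_xy)]
    a = next(c for c in candidates if c > 0)
    b = mean_y - a * mean_x  # Equation 8
    return a, b


def bias_when_a_is_biased(a, b, a_c):
    """Equation 11: k = 2**A_c - 2**(-b/a)."""
    return 2.0 ** a_c - 2.0 ** (-b / a)


def bias_when_b_is_biased(a, b, b_c):
    """Equation 13 as printed: k = 2**B_c - 2**(b/a)."""
    return 2.0 ** b_c - 2.0 ** (b / a)


# Hypothetical log2 values lying exactly on y = 2x + 1.
a, b = fit_equilibrium_axis([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
# a is approximately 2 and b approximately 1 for these points.
k = bias_when_a_is_biased(a=1.0, b=0.0, a_c=3.0)
```

Because the fit minimizes perpendicular rather than vertical distances, swapping the roles of A and B gives the same line, which is why no normalization or choice of dependent variable is needed at this stage.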

iv) Determination of the bias

FIG. 2 is a diagram showing the concept of the process of obtaining the asymptote in the new coordinate system.

By the least squares method, the asymptote of the lower gene population ranked by the product A_i B_i is obtained as

$$y = \alpha x + \beta \tag{15}$$

However, to decide the independent and dependent variables of the least squares method, the (log₂ A, log₂ B) axis system must first be rotated into an axis system whose new X axis is the asymptote obtained from the upper gene population ranked by the product A_i B_i. The new coordinates (log₂ A_i', log₂ B_i') of (log₂ A_i, log₂ B_i) are therefore given by

$$\begin{pmatrix} \log_2 A_i' \\ \log_2 B_i' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \log_2 A_i \\ \log_2 B_i \end{pmatrix} \tag{16}$$

Since tan θ is the slope of the asymptote that becomes the new X axis,

$$\sin\theta = \frac{\tan\theta}{\sqrt{1 + \tan^2\theta}}, \qquad \cos\theta = \frac{1}{\sqrt{1 + \tan^2\theta}}$$

Next, in the new coordinate system, the asymptote of the lower gene population (A_i', B_i') ranked by the product A_i B_i is obtained by the least squares method. Let this asymptote be

$$y = m x + n \tag{17}$$

If m is negative, B is judged to contain more bias than A. If m is positive, A is judged to contain more bias than B.
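The determination step can be sketched as below: rotate the low-expression points by Equation 16, fit Equation 17 by ordinary least squares, and read the sign of m. The function names are hypothetical, and the slope passed in is assumed to be that of the asymptote fitted to the high-expression population.

```python
import math


def rotate_to_axis(points, slope):
    """Rotate (x, y) points clockwise by theta, where tan(theta) = slope (Equation 16)."""
    cos_t = 1.0 / math.sqrt(1.0 + slope ** 2)
    sin_t = slope / math.sqrt(1.0 + slope ** 2)
    return [(cos_t * x + sin_t * y, -sin_t * x + cos_t * y) for x, y in points]


def least_squares_line(points):
    """Ordinary least squares fit y = m*x + n with x as the independent variable."""
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    s_xx = sum((x - mean_x) ** 2 for x, _ in points)
    if s_xx == 0:
        raise ValueError("all x values are identical")
    m = sum((x - mean_x) * (y - mean_y) for x, y in points) / s_xx
    return m, mean_y - m * mean_x


def more_biased_condition(low_points, upper_slope):
    """Return 'B' if m < 0, 'A' if m > 0, and None if m is exactly zero."""
    m, _ = least_squares_line(rotate_to_axis(low_points, upper_slope))
    if m < 0:
        return "B"
    if m > 0:
        return "A"
    return None


# Hypothetical low-expression points that are flatter than an upper-population slope of 1.
verdict = more_biased_condition([(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)], upper_slope=1.0)
```

In this example the low-intensity points rise more slowly than the upper-population axis, so the fitted m is negative after rotation and the sketch reports B as the more biased condition.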

v) Calculation of the bias

When m in the asymptote of Equation 17 is negative, the intercept on the log₂ B axis is obtained in the (log₂ A, log₂ B) axis system by the least squares method (log₂ A as the independent variable, log₂ B as the dependent variable), using the data of the lower gene population (A_i, B_i) ranked by the product A_i B_i. With α denoting the least-squares slope (Equation 18), the intercept is

$$B_c = \frac{1}{n} \sum_{i=1}^{n} \log_2 B_i - \alpha \, \frac{1}{n} \sum_{i=1}^{n} \log_2 A_i \tag{19}$$

When m is positive, the intercept on the log₂ A axis is obtained in the (log₂ A, log₂ B) axis system in the same way, by the least squares method with log₂ B as the independent variable and log₂ A as the dependent variable (Equation 20), using the data of the lower gene population (A_i, B_i) ranked by the product A_i B_i. This gives the intercept A_c (Equation 21).

The second-stage data correction is performed by subtracting the bias obtained from Equation 11 or Equation 13 from the whole of the data for one of the paired measurements.

After the above corrections, the analysis proceeds to the next stage using the new data plot (log₂ A_i, log₂ (B_i - k)) or (log₂ (A_i - k), log₂ B_i), hereinafter referred to as the "corrected plot (log₂ A_i, log₂ B_i)". Equation 1 or Equation 1' can therefore be expressed as

$$\log_2 B = \kappa \log_2 A + I \tag{22}$$

2. Robust detection of genes with changed expression levels by multiple testing

In this method, the corrected data are assumed to consist of a mixture distribution of a gene population whose expression level changes and a gene population whose expression level does not change. First, for each data pair, windows covering a fixed interval are set along the fluorescence intensity equilibrium axis, and within each window the confidence limit points for an arbitrary significance level are determined based on Student's t-distribution. The window is then shifted by a fixed number of genes along the fluorescence intensity equilibrium axis (X axis), and the confidence limit points are determined at each position. The resulting confidence limit points are interpolated by smoothing (spline) to give the confidence boundary (confidence curve).

From this result, genes located outside the confidence boundary are selected as genes whose expression level has changed. To obtain still higher extraction reliability, genes whose expression level has reliably changed are selected on the basis of a majority-vote ratio over repeated experiments. That is, to reduce type 1 errors in extraction, a gene is accepted as having a changed expression level only if, in the multiple test, it is extracted as changed at least a predetermined number of times.

(1) Reconstruction of the data distribution using the fluorescence intensity equilibrium axis and the expression ratio

FIG. 3 is a diagram conceptually showing the reconstruction of the distribution. As shown in FIG. 3, the perpendicular distance from each corrected plot point (log₂ A_i, log₂ B_i) to the fluorescence intensity equilibrium axis log₂ B_i = κ log₂ A_i + I is considered to be proportional to the expression ratio. It is also clear that the fluorescence intensity increases proportionally as one moves to the right along the fluorescence intensity equilibrium axis. Therefore, a fluorescence intensity scatter plot in which the distance from each gene to the equilibrium axis is the Y value and the equilibrium axis is the X axis expresses the nature of the data clearly. The Y value d₂ (expression ratio) of each gene is calculated by Equation 4.

The X value d₁ (fluorescence intensity) of each gene follows Equation 22, which expresses the overall relationship between A and B even though the populations of fluorescence measurements A and B contain various errors. The reconstructed fluorescence intensity scatter plot thus has X-Y axes of (fluorescence intensity, rate of change of expression level).
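One possible reading of this reconstruction is sketched below. It rests on two assumptions not stated in the text: the Y value is taken as the signed perpendicular distance (positive above the axis), so that increases and decreases remain distinguishable, and the X value is taken as the position of the point's projection along the axis, measured from the axis intercept. The function name `reconstruct` is hypothetical.

```python
import math


def reconstruct(log_a, log_b, kappa, intercept):
    """Map corrected (log2 A, log2 B) points to (d1, d2) coordinates.

    d1: position along the equilibrium axis y = kappa*x + intercept (assumed projection)
    d2: signed perpendicular distance from the axis (assumed sign convention)
    """
    norm = math.sqrt(1.0 + kappa ** 2)
    coords = []
    for x, y in zip(log_a, log_b):
        d1 = (x + kappa * (y - intercept)) / norm
        d2 = (y - kappa * x - intercept) / norm
        coords.append((d1, d2))
    return coords


# Hypothetical points: one on the axis y = x, one above it.
coords = reconstruct([2.0, 1.0], [2.0, 3.0], kappa=1.0, intercept=0.0)
```

With this convention a gene lying on the equilibrium axis has d2 equal to zero, and genes with raised or lowered expression fall above or below the new X axis.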

(2) Multiple testing of the mixture distribution model of a gene population whose expression level changes and a gene population whose expression level does not change

FIGS. 4 to 6 are diagrams showing the mixed normal distribution model of the expression ratio.

The actual data distribution can be regarded as a mixture of the population of genes whose expression level has changed (variable genes) and the population of genes whose expression level has not changed (non-variable genes). As shown in FIG. 4, the mixture distribution model of this method assumes that, on the Y axis representing the rate of change of the expression level, the data consist of a distribution of non-variable genes centered on zero and distributions of variable genes centered on points at which the expression ratio has increased and decreased, respectively. For convenience of explanation only the normal distribution is shown here, but the present invention is not limited to the normal distribution and can be applied to data of any distribution.

As shown in FIG. 5, when the proportion of variable genes in the whole is not very large (for example, when the variable gene population accounts for 10% of the total), the mixture distribution approximates a normal distribution, as in FIG. 6. Variable genes can therefore be extracted from the fluorescence ratio data of the mixture distribution on the basis of the t-distribution, for a given confidence limit value, the P value.

Because this method determines the threshold of the expression fold change from the variance and center calculated from the actual data, it is robust. That is, even for experimental data with different error ranges, the threshold of the expression fold change is determined according to that error. Another feature of the method is that detection is performed several times on different data sets obtained from control experiments under the same conditions, and only genes detected at least a predetermined number of times are selected, so that variable genes can be detected with high reliability.
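The selection of genes detected at least a predetermined number of times can be sketched as a simple vote over replicate data sets. The function name and the gene identifiers below are hypothetical.

```python
from collections import Counter


def select_by_vote(detections, min_count):
    """Keep genes detected in at least min_count of the replicate data sets.

    detections: one collection of detected gene identifiers per data set
    """
    # set(run) ensures a gene is counted at most once per data set.
    counts = Counter(gene for run in detections for gene in set(run))
    return sorted(gene for gene, hits in counts.items() if hits >= min_count)


# Three hypothetical replicate detections; a gene must appear in at least two.
selected = select_by_vote(
    [{"g1", "g2"}, {"g1", "g3"}, {"g1", "g2"}],
    min_count=2,
)
```

Raising `min_count` lowers the type 1 error at the cost of a higher type 2 error, which is the trade-off the simulation described next is meant to quantify.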

Furthermore, by simulating the mixture distribution of the non-variable and variable gene populations while varying six parameters (the total number of genes, the proportion of genes whose expression varies, the standard deviation (width) of the gene distribution, the center of the distribution of genes whose expression varies, the detection criterion (number of detections / total number), and the confidence limit value (P value) within each data set (window)), the type 1 and type 2 detection errors can be calculated. The results can serve as guidelines for experiments. Here, a "type 1 detection error" is a false-positive error in which an unchanged gene is detected as changed, and a "type 2 detection error" is a false-negative error in which a changed gene is detected as unchanged.
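A much simplified simulation along these lines is sketched below. It is not the simulation of the specification: it treats each replicate as a single window, draws normal data only, and substitutes the normal quantile from `statistics.NormalDist` for the Student's t quantile (a large-sample approximation). All parameter names and default values are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def simulate_errors(n_genes=1000, frac_variable=0.1, sd=1.0, shift=4.0,
                    n_reps=3, min_count=2, p_value=0.05, seed=0):
    """Estimate type 1 and type 2 detection error rates by simulation."""
    rng = random.Random(seed)
    n_var = int(n_genes * frac_variable)
    if n_var == 0 or n_var == n_genes:
        raise ValueError("need both variable and non-variable genes")
    # Variable genes alternate between up-shifted and down-shifted centers.
    centers = [shift if i % 2 == 0 else -shift for i in range(n_var)]
    centers += [0.0] * (n_genes - n_var)
    z = NormalDist().inv_cdf(1.0 - p_value / 2.0)  # two-sided normal quantile
    hits = [0] * n_genes
    for _ in range(n_reps):
        values = [rng.gauss(c, sd) for c in centers]
        center, spread = mean(values), stdev(values)
        for i, v in enumerate(values):
            if abs(v - center) > z * spread:
                hits[i] += 1
    called = [h >= min_count for h in hits]
    false_pos = sum(called[n_var:])          # unchanged genes called changed
    false_neg = n_var - sum(called[:n_var])  # changed genes not called
    return false_pos / (n_genes - n_var), false_neg / n_var


type1, type2 = simulate_errors()
```

Running the function over a grid of the six parameters gives a table of error rates from which, for instance, the number of replicate experiments needed for a target type 1 error could be read off.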

(3) Creation of an expression variation confidence curve fitted to the data by the moving window method

The smaller the fluorescence intensity of a gene, the more strongly its expression change value is affected by errors such as background. For example, if each fluorescence value in a control experiment carries irremovable errors α_A and α_B, the expression change of a gene i appears as fluorescence ratio = (A_i - α_Ai) / (B_i - α_Bi). Thus, when A_i >> α_Ai and B_i >> α_Bi, the fluorescence ratio can be approximated by A_i / B_i, but when A_i is close to α_Ai and B_i is close to α_Bi, the effect of the error cannot be ignored. Hence, when genes are actually selected on the basis of the t-distribution, given that gene populations affected to different degrees by background and other errors are mixed together, the t value should be determined separately for populations of different fluorescence intensity.

FIGS. 11 to 14 are diagrams conceptually showing the creation of the confidence curve. First, as shown in FIG. 11, the apparatus calculates the variance and center of the distribution of expression ratios of the genes within a window made up of a fixed number of genes, and determines the t value of the fold change (corresponding to a value on the ratio coordinate axis). The value on the fluorescence intensity equilibrium axis of this confidence limit point is taken to be the median of the equilibrium axis values of all genes in the window.

Next, as shown in FIG. 12, after determining the coordinates of the confidence limit points above and below the fluorescence intensity equilibrium axis in the window, the apparatus shifts the window by a fixed number of genes in the direction of increasing equilibrium axis value. This operation is then repeated. After calculating all confidence limit points, the apparatus joins them with a cubic spline curve to create the expression variation confidence curve, which is the boundary line of expression change. In the regions of fluorescence intensity at both ends where interpolation by the cubic spline curve is not possible, as shown in FIG. 13, the horizontal extension of the confidence limit point obtained in the last window is used where the fluorescence intensity is high (shown by a dotted line), and where the fluorescence intensity is low (shown by a dotted line) the extrapolation of the asymptote obtained by the least squares method from the boundary points of the first several tens of windows from the left is used as the expression variation confidence curve (extrapolated expression change boundary).
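The moving-window step can be sketched as follows. The critical value `t_crit` is assumed to be supplied by the caller (for example from a Student's t table for the chosen significance level and window size); the cubic spline interpolation and the extrapolation at both ends are not shown. The function name and the sample data are hypothetical.

```python
from statistics import mean, median, stdev


def confidence_limit_points(points, window_size, step, t_crit):
    """Compute (x_median, lower, upper) for each window along the X axis.

    points      : (x, y) pairs, x = equilibrium axis value, y = expression ratio value
    window_size : number of genes per window
    step        : number of genes by which the window is shifted
    t_crit      : critical value supplied by the caller (assumption, see lead-in)
    """
    if window_size < 2 or step < 1:
        raise ValueError("window_size must be >= 2 and step >= 1")
    ordered = sorted(points)  # order genes by increasing X value
    limits = []
    for start in range(0, len(ordered) - window_size + 1, step):
        window = ordered[start:start + window_size]
        ys = [y for _, y in window]
        center, spread = mean(ys), stdev(ys)
        x_mid = median(x for x, _ in window)
        limits.append((x_mid, center - t_crit * spread, center + t_crit * spread))
    return limits


# Ten hypothetical genes with alternating Y values, windows of 4 shifted by 2.
sample = [(float(i), 1.0 if i % 2 == 0 else -1.0) for i in range(10)]
limits = confidence_limit_points(sample, window_size=4, step=2, t_crit=2.0)
```

Each tuple gives one upper and one lower confidence limit point; joining the upper points and the lower points separately yields the two boundary curves described above.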

Then, as shown in FIG. 14, genes lying outside the region enclosed by the expression variation confidence curves above and below the fluorescence intensity equilibrium axis are extracted as genes whose expression level has changed, that is, increased or decreased. The final gene extraction is performed by the multiple test (2-(2)) described above.
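The extraction of genes outside the two confidence curves can be illustrated as below. This is only a sketch: `lower_curve` and `upper_curve` are hypothetical callables standing in for the expression variation confidence curves, and the final multiple test is not reproduced here.

```python
def extract_variable_genes(points, lower_curve, upper_curve):
    """Split genes into increased and decreased groups.

    points      -- list of (gene_id, x, y) on the corrected X-Y axis system
    lower_curve -- callable x -> lower boundary (hypothetical stand-in)
    upper_curve -- callable x -> upper boundary (hypothetical stand-in)
    """
    increased, decreased = [], []
    for gene_id, x, y in points:
        if y > upper_curve(x):
            increased.append(gene_id)   # above the upper confidence curve
        elif y < lower_curve(x):
            decreased.append(gene_id)   # below the lower confidence curve
    return increased, decreased
```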

[Apparatus configuration]

Next, the configuration of the gene expression information analysis apparatus will be described with reference to FIGS. 22 to 25. FIG. 22 is a block diagram showing an example of the configuration of the apparatus to which the present invention is applied, and conceptually shows only those parts of the configuration that relate to the present invention.

In FIG. 22, the gene expression information analysis apparatus 100 schematically comprises: a control unit 102, such as a CPU, that performs overall control of the apparatus 100; a communication control interface unit 104 connected to a communication device (not shown), such as a router, that is connected to a communication line or the like; an input/output control interface unit 108 connected to an input device 112 and an output device 114; and a storage unit 106 that stores various databases, tables, and the like. These units are communicably connected via arbitrary communication paths. Furthermore, the apparatus 100 may be communicably connected to a network via a communication device such as a router and a wired or wireless communication line such as a dedicated line.

The storage unit 106, which holds the various databases and tables (measured luminance data 106a and simulation result data 106b), is storage means such as a fixed disk device, and stores the programs, tables, files, databases, web page files, and the like used for various processing.

Among these components of the storage unit 106, the measured luminance data 106a is measured luminance data storage means that stores, for each experiment, the measured luminance data of each spot indicating the expression level of a gene tested with a DNA chip, a DNA microarray, or the like. The simulation result data 106b is simulation result data storage means that stores the results of simulations performed by the apparatus.

In FIG. 22, the communication control interface unit 104 controls communication between the gene expression information analysis apparatus 100 and the network (or a communication device such as a router). That is, the communication control interface unit 104 has a function of exchanging data with other terminals via a communication line.

In FIG. 22, the input/output control interface unit 108 controls the input device 112 and the output device 114. As the output device 114, a speaker can be used in addition to a monitor (including a home television); in the following, the output device 114 may be described as a monitor. As the input device 112, a keyboard, a mouse, a microphone, and the like can be used. The monitor also realizes a pointing device function in cooperation with the mouse.

In FIG. 22, the control unit 102 has an internal memory for storing a control program such as an OS (Operating System), programs defining various processing procedures, and required data, and performs information processing for executing various processes by means of these programs. Functionally and conceptually, the control unit 102 comprises a background correction unit 102a, a bias correction unit 102b, a gene detection unit 102c, and a simulation unit 102d.

Of these, the background correction unit 102a is background correction means that creates background-corrected luminance data by removing the background value from the measured luminance data of each spot, where the fluorescence intensity indicating the expression level of the same gene was measured under two conditions.

The bias correction unit 102b is bias correction means that creates a fluorescence intensity scatter plot by taking the logarithms of the luminance data corrected by the background correction means on the X and Y axes, obtains for each gene spot the bias with respect to the fluorescence intensity equilibrium axis, and removes this bias from the luminance data, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the fold-change axis of the expression level.

Here, FIG. 23 is a block diagram showing an example of the configuration of the bias correction unit 102b, and conceptually shows only those parts of the configuration that relate to the present invention. As shown in FIG. 23, the bias correction unit 102b functionally and conceptually comprises a first principal component creation unit 102e, a coordinate rotation unit 102f, a bias determination unit 102g, and a correction plot generation unit 102h.

In FIG. 23, the first principal component creation unit 102e is first principal component creation means that performs principal component analysis using the logarithmic values of the gene group with high expression levels, and obtains the slope and intercept of the asymptote that forms the first principal component.

The coordinate rotation unit 102f is coordinate rotation means that, taking θ as the angle between the asymptote obtained by the first principal component creation means and the X axis, calculates the coordinates obtained by rotating the X-Y coordinates of the gene group with low expression levels to the right by the angle θ.

The bias determination unit 102g is bias determination means that calculates the slope of the fluorescence intensity equilibrium axis using the coordinates of the low-expression gene group after the coordinate rotation by the coordinate rotation means, and determines from the calculated slope which of the luminance data of the two conditions contains more bias. The correction plot generation unit 102h is correction plot generation means that subtracts the bias from the luminance data of the condition determined by the bias determination means to contain more bias, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the fold-change axis of the expression level.

Returning to FIG. 22, the gene detection unit 102c is gene detection means that detects variable genes, that is, genes whose expression level has changed, on the basis of the fluorescence intensity scatter plot in the new X-Y axis system constructed by the bias correction means.

Here, FIG. 24 is a block diagram showing an example of the configuration of the gene detection unit 102c, and conceptually shows only those parts of the configuration that relate to the present invention. As shown in FIG. 24, the gene detection unit 102c functionally and conceptually comprises a window setting unit 102i, a confidence limit point determination unit 102j, a window moving unit 102k, a confidence boundary line creation unit 102m, a variable gene extraction unit 102n, a gene number input unit 102p, a confidence limit value input unit 102q, and a deviation value processing unit 102u.

In FIG. 24, the window setting unit 102i is window setting means that sets a window covering a predetermined section along the fluorescence intensity equilibrium axis.

The confidence limit point determination unit 102j is confidence limit point determination means that determines the confidence limit points within each window set by the window setting means.

The window moving unit 102k is window moving means that moves the window by a fixed number of genes along the fluorescence intensity equilibrium axis.

The confidence boundary line creation unit 102m is confidence boundary line creation means that obtains the confidence limit points, by means of the confidence limit point determination means, for each window moved by the window moving means, and creates the confidence boundary lines from the plurality of confidence limit points thus obtained.

The variable gene extraction unit 102n is variable gene extraction means that extracts genes located outside the confidence boundary lines created by the confidence boundary line creation means as variable genes whose expression level has changed.

The gene number input unit 102p is gene number input means that has the user input the number of genes in the window. The confidence limit value input unit 102q is confidence limit value input means that has the user input the confidence limit value.

The deviation value processing unit 102u is deviation value calculation means that calculates the deviation value of each spot.

Returning to FIG. 22, the simulation unit 102d is simulation means that executes a plurality of simulations in accordance with predetermined conditions and outputs the simulation results for each condition.

Here, FIG. 25 is a block diagram showing an example of the configuration of the simulation unit 102d, and conceptually shows only those parts of the configuration that relate to the present invention. As shown in FIG. 25, the simulation unit 102d functionally and conceptually comprises a simulation condition setting unit 102r, a simulation execution unit 102s, and a simulation result output unit 102t.

In FIG. 25, the simulation condition setting unit 102r is simulation condition setting means that has the user input simulation conditions including information on at least one of: the standard deviation of the gene distribution, the center of the distribution of variable genes, the detection criterion for variable genes, and the number of simulations.

The simulation execution unit 102s is simulation execution means that, in accordance with the simulation conditions set by the simulation condition setting means, repeatedly generates data for the same gene group from the same distribution, executes the gene detection means, and runs the simulation for detecting expressed genes a plurality of times. It calculates the false positive rate and false negative rate of the results from the detection means, calculates the relationship among the number of experimental replicates, the simulation conditions, and the detection sensitivity and detection reliability, and creates a test statistics table for genes whose expression level changes.

The simulation result output unit 102t is simulation result output means that outputs, for each simulation condition, the simulation results produced by the simulation execution means.
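As a rough illustration of how false positive and false negative rates might be estimated by repeated simulation, consider the sketch below. It rests on several assumptions that are not taken from the specification: fold changes are drawn from normal distributions, a fixed threshold stands in for the gene detection means, and the rates are defined per unchanged gene and per changed gene respectively. All names and parameters are hypothetical.

```python
import random

def simulate_error_rates(n_genes, n_changed, sd, shift, threshold, n_runs, seed=0):
    """Estimate false positive and false negative rates by Monte Carlo.

    n_genes   -- total genes per simulated experiment
    n_changed -- number of genes that truly change (must be 1..n_genes-1)
    sd        -- standard deviation of the gene distribution
    shift     -- center of the distribution of variable genes
    threshold -- stand-in detection criterion on |fold change| (assumption)
    n_runs    -- number of simulations
    """
    rng = random.Random(seed)
    false_pos = false_neg = 0
    for _ in range(n_runs):
        for g in range(n_genes):
            changed = g < n_changed
            y = rng.gauss(shift if changed else 0.0, sd)
            detected = abs(y) > threshold
            if detected and not changed:
                false_pos += 1
            elif changed and not detected:
                false_neg += 1
    n_unchanged = n_genes - n_changed
    return (false_pos / (n_unchanged * n_runs),
            false_neg / (n_changed * n_runs))
```

Running this over a grid of conditions (standard deviation, shift, threshold, number of runs) would give one row per condition, in the spirit of the test statistics table mentioned above.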

The details of the processing performed by these units will be described later.

[Processing of the apparatus]

Next, an example of the processing of the apparatus of the present embodiment configured as described above will be described in detail with reference to FIGS. 7 to 10 and FIGS. 15 to 28.

[Main processing of the apparatus]

First, the main processing of the apparatus will be described with reference to FIG. 15. FIG. 15 is a flowchart showing an example of the main processing of the apparatus of the present embodiment.

First, the gene expression information analysis apparatus 100 executes, through the processing of the background correction unit 102a, the background correction processing described later with reference to FIG. 16 (step S-1). That is, the background correction unit 102a creates background-corrected luminance data by removing the background value from the measured luminance data of each spot, where the fluorescence intensity indicating the expression level of the same gene was measured under two conditions with a DNA microarray, a DNA chip, or the like.

Next, the gene expression information analysis apparatus 100 executes, through the processing of the bias correction unit 102b, the bias correction processing described later with reference to FIG. 17 (step S-2). That is, the bias correction unit 102b creates a fluorescence intensity scatter plot by taking the logarithms (natural logarithm, base-2 logarithm, or the like) of the background-corrected luminance data on the X and Y axes, obtains for each gene spot the bias with respect to the fluorescence intensity equilibrium axis, which represents equal fluorescence intensity, and removes this bias from the luminance data, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the fold-change axis of the expression level.

Next, the gene expression information analysis apparatus 100 executes, through the processing of the gene detection unit 102c, the moving-window gene detection processing described later with reference to FIGS. 18 and 20 (step S-3). That is, the gene detection unit 102c detects variable genes whose expression level has changed on the basis of the constructed fluorescence intensity scatter plot in the new X-Y axis system.

Next, the gene expression information analysis apparatus 100 executes, through the processing of the simulation unit 102d, the simulation processing described later with reference to FIGS. 19 and 21 and other figures (step S-4). That is, the simulation unit 102d executes a plurality of simulations in accordance with predetermined conditions and outputs the simulation results for each condition.

This completes the main processing of the apparatus.

[Background correction processing]

Next, the details of the background correction processing will be described with reference to FIG. 16. FIG. 16 is a flowchart showing an example of the background correction processing of the apparatus of the present embodiment. First, the gene expression information analysis apparatus 100, through the processing of the background correction unit 102a, obtains the average or local background value from the luminance of the genes measured under the two conditions (step SA-1), removes this background value from the measured values, and designates the corrected results as group A and group B (step SA-2).
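The subtraction in steps SA-1 and SA-2 can be sketched as follows. The function and parameter names are hypothetical; the sketch simply supports the two options named in the text, an average background (here computed from blank spots) or a local background per spot.

```python
def subtract_background(intensities, blank_intensities=None, local_backgrounds=None):
    """Return background-corrected intensities for one condition.

    intensities       -- measured fluorescence intensity of each spot
    blank_intensities -- intensities of blank spots (average background)
    local_backgrounds -- per-spot background of the surrounding region
    Exactly one of the two background arguments is expected.
    """
    if local_backgrounds is not None:
        # Local correction: each spot has its own background value.
        return [v - b for v, b in zip(intensities, local_backgrounds)]
    if blank_intensities:
        # Average correction: one mean background shared by all spots.
        mean_blank = sum(blank_intensities) / len(blank_intensities)
        return [v - mean_blank for v in intensities]
    raise ValueError("a background source is required")
```

Applying the function once to each condition's measurements would give the corrected groups A and B.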

That is, the background correction unit 102a performs background correction by subtracting, from the measured fluorescence intensity of each spot, either the average background value of the blank spots or the background value of the region surrounding that spot. This completes the background correction processing.

[Bias correction processing]

Next, the details of the bias correction processing will be described with reference to FIG. 17. FIG. 17 is a flowchart showing an example of the bias correction processing of the apparatus of the present embodiment. First, the bias correction unit 102b, through the processing of the first principal component creation unit 102e, calculates the base-2 logarithms of group A and group B, and draws a scatter plot in an orthogonal axis system with Log₂A and Log₂B as the X and Y axes (step SB-1).

Next, the bias correction unit 102b, through the processing of the first principal component creation unit 102e, performs principal component analysis based on the variance-covariance matrix using the logarithmic values of the upper gene group ranked by the product AB (for example, the top 70% of genes), and obtains the slope and intercept of the asymptote that forms the first principal component (step SB-2).
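For two variables, the first principal component of the variance-covariance matrix has a closed form, so step SB-2 can be sketched without a linear algebra library. The function name and the `top_fraction` parameter are hypothetical, and the sketch assumes positive intensities and a non-vertical principal axis.

```python
import math

def first_principal_axis(a_vals, b_vals, top_fraction=0.7):
    """Slope and intercept of the first principal axis in (log2 A, log2 B).

    Only the genes with the largest product A*B are used, as in the text.
    """
    pairs = sorted(zip(a_vals, b_vals), key=lambda p: p[0] * p[1], reverse=True)
    keep = pairs[:max(2, int(len(pairs) * top_fraction))]
    xs = [math.log2(a) for a, _ in keep]
    ys = [math.log2(b) for _, b in keep]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of the 2x2 variance-covariance matrix.
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # Its eigenvector is (lam - syy, sxy); a vertical axis is not handled.
    slope = sxy / (lam - syy)
    intercept = my - slope * mx  # the axis passes through the centroid
    return slope, intercept
```

Unlike ordinary least squares, this fit treats both axes symmetrically, which suits a plot where A and B are both measured with error.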

Next, the bias correction unit 102b, through the processing of the coordinate rotation unit 102f, takes θ as the angle between the obtained asymptote and the Log₂A axis, and calculates the coordinates obtained by rotating to the right by the angle θ the Log₂A-Log₂B coordinates of the gene group belonging to the lower range of the product AB (for example, the genes in the lowest 10%) (step SB-3).
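The rotation in step SB-3 can be written with the standard planar rotation formulas. In this sketch "to the right" is interpreted as clockwise, which is an assumption, as is the choice to rotate about the origin; the function name is hypothetical.

```python
import math

def rotate_clockwise(points, slope):
    """Rotate (x, y) points clockwise by theta = atan(slope) about the origin.

    After the rotation, a line with the given slope through the origin
    lies along the new X axis.
    """
    theta = math.atan(slope)
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]
```

For example, with a slope of 1 the point (1, 1) maps to approximately (1.414, 0), so deviations from the principal axis become Y values in the rotated system.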

Next, the bias correction unit 102b, through the processing of the bias determination unit 102g, calculates the slope of the asymptote using the coordinates of the lower gene group of the product AB after the coordinate rotation (step SB-4).

Next, the bias correction unit 102b, through the processing of the bias determination unit 102g, determines whether the slope of the asymptote is positive (step SB-5). If it is positive, the bias determination unit 102g determines that the data of A contains more bias. Accordingly, the bias correction unit 102b, through the processing of the bias determination unit 102g, uses the coordinates in the Log₂A-Log₂B axis system of the lower gene group of the product AB (for example, the genes in the lowest 10%) and applies the least-squares method, with the Log₂B data as the independent variable and the Log₂A data as the dependent variable, to obtain the value A_c of the intersection (A_c, 0) of the asymptote of the lower gene group with the Log₂A axis (step SB-6). The bias correction unit 102b then obtains the bias through the processing of the correction plot generation unit 102h and subtracts the bias from the corresponding measured data (step SB-7).

On the other hand, if the slope of the asymptote is not positive in step SB-5, the bias determination unit 102g determines whether it is zero (step SB-8). If it is zero, the bias correction processing ends.

If the slope of the asymptote is not zero in step SB-8, the bias determination unit 102g determines that the data of B contains more bias. Accordingly, the bias correction unit 102b, through the processing of the correction plot generation unit 102h, uses the coordinates in the Log₂A-Log₂B axis system of the lower gene group of the product AB (for example, the genes in the lowest 10%) and applies the least-squares method, with the Log₂A data as the independent variable and the Log₂B data as the dependent variable, to obtain the value B_c of the intersection (0, B_c) of the asymptote of the lower gene group with the Log₂B axis (step SB-9), and then performs the processing of step SB-7 described above.

Next, the bias correction unit 102b, through the processing of the correction plot generation unit 102h, uses the bias-subtracted data to construct the orthogonal axis system Log₂(A-k)-Log₂B or Log₂A-Log₂(B-k) (step SB-10). This completes the bias correction processing.
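The branching in steps SB-5 to SB-9 can be sketched as below. The function names and the tolerance used for the zero test are assumptions. The sketch stops at the intercepts A_c and B_c, because how the bias k is derived from them is not stated in this passage.

```python
def least_squares_line(u, v):
    """Fit v = slope * u + intercept by ordinary least squares."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((x - mu) ** 2 for x in u)
    suv = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    slope = suv / suu
    return slope, mv - slope * mu

def locate_bias(rotated_slope, log_a, log_b, tol=1e-12):
    """Decide which condition carries more bias and find the axis intercept.

    rotated_slope -- slope of the low-expression group after rotation (SB-4)
    log_a, log_b  -- Log2 A and Log2 B values of the low-expression group
    Returns ("A", A_c), ("B", B_c), or ("none", None).
    """
    if abs(rotated_slope) <= tol:        # SB-8: slope is zero, nothing to do
        return ("none", None)
    if rotated_slope > 0:                # SB-5: A contains more bias
        _, a_c = least_squares_line(log_b, log_a)  # SB-6: regress A on B
        return ("A", a_c)
    _, b_c = least_squares_line(log_a, log_b)      # SB-9: regress B on A
    return ("B", b_c)
```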

[Gene detection processing]

Next, the details of the gene detection processing will be described with reference to FIG. 18. FIG. 18 is a flowchart showing an example of the gene detection processing of the apparatus of the present embodiment.

First, the gene detection unit 102c of the gene expression information analysis apparatus 100, through the processing of the window setting unit 102i, outputs to the output device 114 a gene extraction condition setting screen on which the user sets the number of genes in the window described above with reference to FIG. 11 and the confidence level (P_e value), which is the confidence limit value (step SC-1).

Here, FIG. 20 shows an example of the gene extraction condition setting screen output to the output device 114 through the processing of the window setting unit 102i. As shown in FIG. 20, the gene extraction condition setting screen includes an input area MA-1 for the number of genes in the window, an input area MA-2 for the confidence level (P_e value), which is the confidence limit value, a setting end button MA-3, and the like.

When the user, viewing the gene extraction condition setting screen shown in FIG. 20, completes the entries in the input areas MA-1 and MA-2 using the input device 112 and then selects the setting end button MA-3, the gene number input unit 102p and the confidence limit value input unit 102q adjust the size of the window, on the basis of the information set on the screen, so that the number of genes in the window shown in FIG. 11 equals the set value.

Returning to FIG. 18, the gene detection unit 102c, through the processing of the confidence limit point determination unit 102j, starts from the leftmost end of the X axis and calculates the variance and center using the Y-axis values (fold change) of the points in the window, and obtains the confidence limit points, namely the boundary value y_limit+ for increased expression and the boundary value y_limit- for decreased expression, together with the centroid on the X axis (step SC-2). Next, the gene detection unit 102c, through the processing of the window moving unit 102k, moves the window by a fixed number of genes in the direction of increasing fluorescence intensity on the X axis and, through the processing of the confidence limit point determination unit 102j, obtains the boundary values y_limit+ and y_limit- of the fold change in expression level, which are the confidence limit points for the new window, together with the centroid on the X axis (step SC-3). The gene detection unit 102c then repeats this processing until the window reaches the rightmost end of the X axis (step SC-4).

Next, the gene detection unit 102c, through the processing of the confidence boundary line creation unit 102m, connects the fold-change boundary points, which are the confidence limit points of expression change in all windows, with a cubic spline curve, and thereby determines the increase boundary line and the decrease boundary line of the expression fold change, which constitute the expression variation confidence curves (step SC-5).
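Connecting the limit points with a cubic spline can be sketched in pure Python as follows. The text specifies only a cubic spline; the natural end condition (zero second derivative at both ends) used here is an assumption, and the function name is hypothetical. Extrapolation beyond the first and last points is deliberately not handled.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function evaluating a natural cubic spline through the knots.

    xs must be strictly increasing; evaluation is limited to [xs[0], xs[-1]].
    """
    n = len(xs) - 1
    if n < 1:
        raise ValueError("at least two points are required")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; ends stay zero
    if n >= 2:
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1]
                      - (ys[k + 1] - ys[k]) / h[k]) for k in range(n - 1)]
        for k in range(1, n - 1):            # forward elimination
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[n - 2] / diag[n - 2]  # back substitution
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the interpolation range")
        i = min(bisect.bisect_right(xs, x) - 1, n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate
```

One spline would be built from the upper limit points and another from the lower limit points, giving the increase and decrease boundary lines.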

Next, the gene detection unit 102c, through the processing of the variable gene extraction unit 102n, extracts the genes (variable genes) lying outside the region enclosed by the increase boundary line and the decrease boundary line of the expression fold change, which are the expression variation confidence curves. In this way, genes whose expression level has changed can be detected robustly by the multiple test (step SC-6).

The present invention may also improve the efficiency of gene detection by calculating the deviation value of each spot. The details of the gene detection processing using the deviation value in the apparatus of the present embodiment are described below with reference to FIGS. 26 and 27. FIG. 26 is a flowchart showing an example of the gene detection processing using the deviation value in the apparatus of the present embodiment.

First, after the user sets the number of genes in the window and the confidence level (P_e value) (step SE-1), the deviation value processing unit 102u sets windows each containing a fixed number of genes along the fluorescence intensity equilibrium axis, as described above, and obtains the mean and the standard deviation using the Y-axis values, which represent the rate of change in expression level, of all genes in each window. Next, the deviation value processing unit 102u obtains the centroid (corresponding to the intermediate value of the fluorescence intensity) using the X-axis values of all the genes (step SE-2).

Subsequently, the deviation value processing unit 102u moves the window by a fixed number of genes in the X-axis direction and repeats the same processing up to the rightmost window (step SE-3).

Next, the deviation value processing unit 102u treats the resulting (intermediate fluorescence intensity, mean) pairs as a series of (x, y) data and interpolates them by smoothing (for example, by fitting a cubic spline curve) to obtain the smoothed mean line shown in FIG. 27. In the same way, it interpolates the (intermediate fluorescence intensity, standard deviation) pairs by smoothing (for example, by fitting a cubic spline curve) to obtain the smoothed standard deviation line shown in FIG. 27 (step SE-4).

Next, for each gene, the deviation value processing unit 102u takes the value on the fluorescence intensity equilibrium axis (the X-axis value), reads off the corresponding Y value on the smoothed mean line and the corresponding Y value on the smoothed standard deviation line, and calculates the deviation value by the following formula (step SE-5).

Deviation value = (y value of the gene - mean obtained from the smoothed line) / (standard deviation σ obtained from the smoothed line)

Using the deviation value of each spot calculated in this way in place of the variation ratio (fold change) enables an analysis that is not affected by differences in error between slides. Conventionally, the physical error of each microarray and the human error introduced when each chip is read are not constant, which has made comparison between chips difficult; using the deviation value makes such comparison easy.
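Steps SE-2 to SE-5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the apparatus's implementation: the function names (`window_stats`, `interpolate`, `deviation_values`) and the default window size and step are hypothetical, and piecewise-linear interpolation is swapped in for the cubic spline smoothing named in the text so that the sketch needs only the standard library.

```python
from bisect import bisect_right
from statistics import mean, stdev


def window_stats(points, window_size, step):
    """Slide a window of `window_size` genes along x (step SE-2, SE-3).

    Returns one (centroid_x, mean_y, sd_y) tuple per window.
    """
    pts = sorted(points)  # order genes along the fluorescence intensity equilibrium axis
    stats = []
    for start in range(0, len(pts) - window_size + 1, step):
        win = pts[start:start + window_size]
        xs = [x for x, _ in win]
        ys = [y for _, y in win]
        stats.append((mean(xs), mean(ys), stdev(ys)))
    return stats


def interpolate(knots_x, knots_y, x):
    """Piecewise-linear stand-in for the cubic spline (step SE-4).

    Values outside the first/last window centroid are clamped (an assumption).
    """
    if x <= knots_x[0]:
        return knots_y[0]
    if x >= knots_x[-1]:
        return knots_y[-1]
    i = bisect_right(knots_x, x)  # knots_x[i-1] <= x < knots_x[i]
    x0, x1 = knots_x[i - 1], knots_x[i]
    t = (x - x0) / (x1 - x0)
    return knots_y[i - 1] + t * (knots_y[i] - knots_y[i - 1])


def deviation_values(points, window_size=50, step=10):
    """Deviation value = (y - smoothed mean at x) / (smoothed SD at x) (step SE-5).

    Raises ZeroDivisionError if a window has zero spread in y.
    """
    stats = window_stats(points, window_size, step)
    cx = [s[0] for s in stats]
    mu = [s[1] for s in stats]
    sd = [s[2] for s in stats]
    return [(y - interpolate(cx, mu, x)) / interpolate(cx, sd, x)
            for x, y in points]
```

Because each gene is scaled by the local standard deviation at its own intensity, the resulting values are comparable between slides whose noise levels differ, which is the property the text relies on.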

Multivariate analysis, typified by cluster analysis using hierarchical clustering (one-dimensional or two-dimensional), the K-means method, the self-organizing map method and the like, has conventionally been used to classify gene expression patterns and to extract co-expressed genes. An example that uses the logarithm of the variation ratio is M. B. Eisen, P. T. Spellman, P. O. Brown, D. Botstein (1998), "Cluster analysis and display of genome-wide expression patterns", Proceedings of the National Academy of Sciences, 95(25): 14863-14868. An example that uses a normalized variation ratio is T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, E. S. Lander (1999), "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring", Science, 286: 531-537. Using the deviation value calculated by the present method in place of the logarithm of the variation ratio or the normalized variation ratio in such multivariate analysis enables an analysis that is not affected by the way the influence of error differs with expression level.

This completes the gene detection processing.

[Simulation processing]

Next, the details of the simulation processing of the present invention are described with reference to FIGS. 19 and 21. FIG. 19 is a flowchart showing an example of the gene detection processing of the apparatus of this embodiment.

First, through the processing of the simulation condition setting unit 102r, the simulation unit 102d of the gene expression information analysis apparatus 100 outputs to the output device 114 a simulation condition setting screen on which the user sets the various condition parameters of the simulation (for example, the standard deviation (width) of the gene distribution, the center of the distribution of genes whose expression changes, the detection criterion (number of detections / total number), and the number of simulations) (step SD-1).

FIG. 21 shows an example of the simulation condition setting screen output to the output device 114 by the processing of the simulation condition setting unit 102r. As shown in FIG. 21, the simulation condition setting screen comprises an input area MB-1 for the standard deviation of the gene distribution, an input area MB-2 for the center of the gene distribution, an input area MB-3 for the detection criterion, an input area MB-4 for the number of simulations, and a setting end button MB-5.

For the standard deviation of the gene distribution, when the distribution of genes whose expression does not change is taken as the standard normal distribution (standard deviation σ = 1, center μ = 0), the standard deviation σ may be set, for example, in the range of 0.1 to 1.5. Under the same conditions, the center μ of the distribution of variable genes may be set, for example, in the range of 0.4 to 3. The detection criterion for variable genes may be set as the ratio of detections to the total number, for example 2/3, 2/4, 3/4, 3/6 or 4/6. The number of simulations may be set, for example, in the range of 3 to 10.

When the user, viewing the simulation condition setting screen, has completed entry of each item in input areas MB-1 to MB-4 using the input device 112 and selects the setting end button MB-5, the simulation unit 102d, through the processing of the simulation execution unit 102s and on the basis of the information set on the simulation condition setting screen, repeatedly executes the background correction processing, bias correction processing and gene detection processing described above. It thereby simulates the mixture distribution of the gene population whose expression level changes (variable gene population), as extracted by the gene detection processing, and the gene population whose expression level did not change (non-variable gene population) (step SD-2).

Next, through the processing of the simulation result output unit 102t, the simulation unit 102d outputs the simulation result screen data shown in FIGS. 7 to 10 to the output device 114 (step SD-3).

FIGS. 7 to 10 show examples of the type I detection error (false positive) calculated by simulation. The mixture distribution depends on the six parameters set as the simulation conditions described above: the total number of genes, the proportion of genes whose expression changes, the standard deviation (width) of the gene distribution, the center of the distribution of genes whose expression changes, the detection criterion (number of detections / total number), and the confidence limit value (P-value) within each data set (window).

FIG. 7 is a graph of the calculated type I detection error when the center μ' of the gene population whose expression changes (variable gene population) is set to ±σ, its standard deviation is set to 1, and the detection criterion is detection in 2 out of 3 runs (detection criterion = 2/3).

FIG. 8, on the other hand, shows the case where the center μ' of the gene population whose expression changes (variable gene population) is set to ±σ, its standard deviation is set to 1, and a gene is regarded as changed in expression when it is detected in 3 out of 4 runs.

Comparison of these two figures shows that α (the type I detection error) depends strongly on the P_e value used for detection within each window. The horizontal axis of the figures represents the proportion of all genes accounted for by the gene population whose expression changed.

That is, genes whose expression does not change are generated from the standard normal distribution (standard deviation σ = 1, center μ = 0), whereas the distribution of genes whose expression changes is generated to the left or to the right of the standard normal distribution with a probability of 50% each.

Note, however, that the mixture distribution depends on a total of six parameters: the number of all genes, the proportion of all genes accounted for by genes whose expression changes, the standard deviation and the center of the distribution of genes whose expression changes, the detection criterion, and the confidence limit within each data set.

Only the type I error α of the multiple test is shown, that is, the error in which a gene whose expression does not change is detected as a gene whose expression changes, and every result is the mean of ten calculations made with the parameters fixed. When the center of the gene population whose expression changes is μ' = ±σ and its standard deviation is 1, comparing (a) the detection criterion under which expression is regarded as changed when a gene is detected in 2 out of 3 runs (FIG. 7) with (b) the detection criterion under which expression is regarded as changed when a gene is detected in 3 out of 4 runs (FIG. 8) shows that α depends strongly on the P_e value used for detection within each window.
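The mixture simulation described above can be approximated with a short Monte Carlo sketch. Several simplifications are assumptions of this sketch rather than statements from the text: all genes are treated as one window, the limit is taken from a normal quantile instead of Student's t-distribution, the direction of the deviation is ignored when counting detections, and the name `simulate_alpha` and its default arguments are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def simulate_alpha(n_genes=2000, frac_changed=0.1, mu=1.0, sigma=1.0,
                   k=2, n_rep=3, p_e=0.15, n_runs=10, seed=0):
    """Estimate the type I error of a 'detected in k of n_rep runs' rule.

    Unchanged genes ~ N(0, 1); changed genes ~ N(+mu or -mu, sigma),
    the sign being chosen with probability 50 % each.
    """
    rng = random.Random(seed)
    # Two-sided limit for risk p_e (normal approximation, an assumption).
    z = NormalDist().inv_cdf(1.0 - p_e / 2.0)
    n_changed = int(n_genes * frac_changed)
    n_unchanged = n_genes - n_changed
    alphas = []
    for _ in range(n_runs):
        # Changed genes come first in the lists, unchanged genes after them.
        centres = ([rng.choice((-mu, mu)) for _ in range(n_changed)]
                   + [0.0] * n_unchanged)
        sds = [sigma] * n_changed + [1.0] * n_unchanged
        hits = [0] * n_genes
        for _ in range(n_rep):
            values = [rng.gauss(c, s) for c, s in zip(centres, sds)]
            m, s = mean(values), stdev(values)
            for i, v in enumerate(values):
                if abs(v - m) > z * s:  # outside the confidence limits
                    hits[i] += 1
        # False positives: unchanged genes that met the detection criterion.
        false_pos = sum(1 for h in hits[n_changed:] if h >= k)
        alphas.append(false_pos / n_unchanged)
    return mean(alphas)  # average over n_runs, as in the text (ten runs)
```

Calling `simulate_alpha(p_e=0.05)` and `simulate_alpha(p_e=0.30)` with otherwise identical settings shows the strong dependence of α on the per-window P_e value that the comparison of FIGS. 7 and 8 describes.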

FIGS. 9 and 10 show the case where the standard deviation of the gene population whose expression changes (variable gene population) is set to 1. In FIG. 9, P_e = 0.15, and in FIG. 10, P_e = 0.25. Therefore, to obtain 95% confidence, the confidence limit P_e within the data set should be set to 0.15 or less when the test criterion is 2 out of 3, and to 0.25 or less when the test criterion is 3 out of 4. The horizontal axis of the figures represents the value obtained by multiplying the center of the gene population whose expression changes by the standard deviation of the gene population whose expression does not change. In the figures, "TNum" denotes the total number of genes, "dif-x%" denotes the proportion accounted for by the gene population whose expression changes, and "2/3" and "3/4" denote the detection criteria. This completes the simulation processing.
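As a rough cross-check of these settings, the chance that an unchanged gene satisfies a "k out of n" criterion can be written as a binomial tail. The sketch below assumes that the runs are independent, that each run flags an unchanged gene with probability exactly P_e, and that direction is ignored. It does not model the mixture with changed genes, so it is only indicative and is not expected to reproduce the simulated curves of FIGS. 9 and 10 exactly. The function name `prob_at_least` is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k successes in n independent trials with success probability p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


# 2-of-3 criterion with P_e = 0.15: 3 * 0.15^2 * 0.85 + 0.15^3 = 0.06075
alpha_2_of_3 = prob_at_least(2, 3, 0.15)

# 3-of-4 criterion with P_e = 0.25: 4 * 0.25^3 * 0.75 + 0.25^4 = 0.05078125
alpha_3_of_4 = prob_at_least(3, 4, 0.25)
```

Both values land near the 5% level, which is consistent in order of magnitude with the P_e limits quoted above; the remaining difference is plausibly due to the simplifications listed in the lead-in.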

[Other embodiments]

Although embodiments of the present invention have been described above, the present invention may also be carried out in various embodiments other than those described, within the scope of the technical idea set out in the claims.

Of the processes described in the embodiment, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by known methods.

In addition, the processing procedures, control procedures, specific names, information including parameters such as simulation conditions, and screen examples shown in the above description and drawings may be changed as desired unless otherwise specified.

The simulation unit 102d may also obtain the confidence level (P_e value), the type I and type II detection errors and the like described above by simulating a mixture with another distribution such as a gamma distribution. The embodiment described above takes as an example the case where both the distribution of genes that do not change and the distribution of genes that change are normal, but the distribution of genes that change may, for example, be generated from a distribution other than the normal distribution (for example, a gamma distribution), and the present invention can be applied to gene populations following any distribution.

The bias determination processing performed by the bias determination unit 102g of the apparatus described above is not limited to determining the magnitude of the bias after the axis rotation. For example, as shown in FIG. 28, the magnitude of the bias may be determined before the axis rotation by comparing the slope a of the high-expression asymptote with the slope b of the low-expression asymptote. Rotation of the coordinates is not a necessary condition of this processing.

The illustrated components of the gene expression information analysis apparatus 100 are functional and conceptual, and need not be physically configured as illustrated.

For example, all or any part of the processing functions of the units of the gene expression information analysis apparatus 100, in particular the processing functions performed by the control unit 102, may be implemented by a CPU (Central Processing Unit) and a program interpreted and executed by that CPU, or may be implemented as hardware by wired logic. The program is recorded on a recording medium described later and is mechanically read by the gene expression information analysis apparatus 100 as required.

The gene expression information analysis apparatus 100 may also be realized by connecting peripheral devices such as a printer, a monitor and an image scanner to a computer (information processing apparatus) such as a known personal computer, workstation or other information processing terminal, and installing on that information processing apparatus software (including programs, data and the like) that implements the method of the present invention.

Furthermore, the specific form of distribution and integration of the gene expression information analysis apparatus 100 is not limited to that illustrated; all or part of it may be functionally or physically distributed or integrated in arbitrary units according to various loads and the like. For example, each database may be configured independently as a separate database apparatus, and part of the processing may be implemented using CGI (Common Gateway Interface).

The program according to the present invention may also be stored on a computer-readable recording medium. Here, "recording medium" includes any "portable physical medium" such as a flexible disk, magneto-optical disk, ROM, EPROM, EEPROM, CD-ROM, MO or DVD; any "fixed physical medium" such as a ROM, RAM or HD built into various computer systems; and any "communication medium" that holds the program for a short time, such as a communication line or carrier wave used when the program is transmitted over a network typified by a LAN, a WAN or the Internet. A "program" is a data processing method written in any language or notation, irrespective of form, such as source code or binary code. A "program" is not necessarily limited to a single unit; it includes programs distributed as a plurality of modules or libraries and programs that achieve their function in cooperation with a separate program typified by an OS (Operating System). Well-known configurations and procedures may be used for the specific configuration for reading the recording medium in each apparatus described in the embodiment, for the reading procedure, and for the installation procedure after reading.

Although the case where the gene expression information analysis apparatus 100 performs processing in stand-alone form has been described as an example, the apparatus may instead be configured to perform processing in response to a request sent over a network from a client terminal housed separately from the gene expression information analysis apparatus 100, and to return the processing result to that client terminal.

Here, the network has the function of interconnecting the gene expression information analysis apparatus 100 and an external client apparatus, and may include, for example, any of the Internet, an intranet, a LAN (wired or wireless), a VAN, a personal computer communication network, a public telephone network (analog or digital), a leased line network (analog or digital), a CATV network, a mobile circuit-switched or mobile packet-switched network such as the IMT2000, GSM or PDC/PDC-P system, a radio paging network, a local wireless network such as Bluetooth, a PHS network, and a satellite communication network such as CS, BS or ISDB. That is, the apparatus can send and receive various data over any network, wired or wireless.

As described in detail above, the present invention can provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can create background-corrected luminance data by removing the background value from the measured luminance data of each spot for which the fluorescence intensity indicating the expression level of the same gene has been measured under two conditions with a DNA microarray, DNA chip or the like.

Further, according to the present invention, a fluorescence intensity scatter plot is created with the logarithms (natural logarithm, base-2 logarithm or the like) of the background-corrected luminance data on the X and Y axes, the bias of each gene spot with respect to the fluorescence intensity equilibrium axis (the axis indicating equal fluorescence intensity) is obtained, and this bias is removed from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression ratio axis. It is thus possible to provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can determine which fluorescence component contains more bias, remove that bias, and construct a new orthogonal axis system whose two axes are the fluorescence intensity equilibrium axis and the expression ratio axis.

Further, according to the present invention, variable genes whose expression level has changed are detected on the basis of the constructed fluorescence intensity scatter plot in the new X-Y axis system. It is thus possible to provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that, compared with conventional gene detection methods, can accurately detect genes whose expression level has changed without being affected by the measuring device, by error between specimens, or by differences in fluorescent labeling efficiency and the like.

Further, according to the present invention, control gene samples for quality control in a DNA concentration dilution series (for example, external-gene λ DNA samples, or housekeeping gene samples such as ribosomal genes whose expression level hardly changes) are measured at the same time as the target gene samples. The control genes are removed one at a time in ascending order of the product of their fluorescence intensity data; at each step, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies the criterion for a strong correlation (for example, 0.8 or more) is taken as threshold 1, and the population of all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the high-expression gene population. Likewise, the product for the control sample at which the successively calculated correlation coefficient first satisfies the criterion for a weak correlation (for example, 0.5 or more) is taken as threshold 2 (where threshold 2 < threshold 1), and the population of all gene samples whose product falls below threshold 2 is taken as the low-expression gene population.

Principal component analysis is then performed on the logarithms of the fluorescence intensities of the high-expression gene population to obtain the slope and intercept of the asymptote that forms the first principal component. With θ the angle between this asymptote and the X axis, the coordinates of the low-expression gene population in the X-Y axis system are rotated clockwise by θ, and the slope of the fluorescence intensity equilibrium axis is calculated from the rotated coordinates of the low-expression gene population. On the basis of the calculated slope (for example, positive, negative or zero), it is determined which of the luminance data of the two conditions contains more bias, and the bias is subtracted from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates of a gene population having a constant bias). A fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression ratio axis is thereby constructed. It is thus possible to provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can efficiently remove bias from the measured values and create a fluorescence intensity scatter plot that clearly expresses the nature of the data.

Further, according to the present invention, the principal component analysis is performed using the variance-covariance matrix, so that, unlike the principal component analysis using a correlation matrix conventionally used in expressed-gene analysis, no normalization is required. It is thus possible to provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can perform principal component analysis efficiently.
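The principal component step on the variance-covariance matrix, followed by the clockwise rotation by θ, can be sketched for two-dimensional data as follows. This is a minimal illustration and not the apparatus's implementation: the function names `first_pc_line` and `rotate_clockwise` are hypothetical, and the closed-form angle of the major axis of a 2x2 covariance matrix is used in place of a general eigen-decomposition.

```python
import math


def first_pc_line(points):
    """Slope, intercept and angle of the first principal component of 2-D data.

    Works on the (unnormalized) variance-covariance matrix, so no scaling
    of x or y is applied beforehand. A vertical principal axis gives a very
    large slope value and is not handled specially in this sketch.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Angle between the major axis of [[sxx, sxy], [sxy, syy]] and the X axis.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(theta)
    intercept = my - slope * mx  # the line passes through the centroid
    return slope, intercept, theta


def rotate_clockwise(points, theta):
    """Rotate every point clockwise ("to the right") by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]
```

In use, `first_pc_line` would be applied to the log intensities of the high-expression population and `rotate_clockwise` to those of the low-expression population; after rotation, a line parallel to the high-expression asymptote is parallel to the X axis, so any remaining slope in the low-expression population reflects the bias discussed above.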

Further, according to the present invention, windows are set within predetermined sections, and within each window a confidence limit point is determined using at least one of the mean, the standard deviation, a P value (for example, the 95% value), the centroid and the like of the gene luminance data. The window is then shifted along the fluorescence intensity equilibrium axis by a fixed number of genes, a confidence limit point is obtained for each shifted window, a confidence boundary line is created from the resulting confidence limit points, and genes located outside the confidence boundary line are extracted as variable genes whose expression level has changed. It is thus possible to provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can extract expressed genes with high stability, reproducibility and reliability.

Further, according to the present invention, it is possible to provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium in which, even for experimental data with different error ranges, the threshold of the expression fold change is determined according to that error.

Further, according to the present invention, the confidence limit point is determined using the t-distribution on the basis of a table of test statistics for replicate data obtained by simulation. It is thus possible to provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can obtain confidence limit points more accurately and efficiently than conventional methods. Further, according to the present invention, the confidence boundary line is created by smoothing, that is, by fitting a spline curve to the plurality of confidence limit points. It is thus possible to provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can efficiently interpolate the confidence limit points to create a confidence curve.

Further, according to the present invention, for a region of high fluorescence intensity, the confidence limit line is created using a horizontal extension, parallel to the X axis, of the confidence limit point obtained in the last window (the rightmost window). It is therefore possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that can create an appropriate confidence limit line even when the slope is so small that it cannot be determined toward which value the line converges. Further, according to the present invention, for a region of low fluorescence intensity, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in, for example, roughly the first several tens of windows is used as the confidence limit line. It is therefore possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that can accurately detect even spots of genes with low fluorescence intensity.
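The two extensions described above can be sketched as follows. This is an illustration under assumptions: the asymptote is approximated by an ordinary least-squares straight line through the leftmost `n_low` confidence limit points, and `fit_line`, `extend_limits`, and `n_low` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def extend_limits(limit_points, n_low):
    """Return f(x) giving the limit outside the range covered by the windows.

    limit_points: (x, limit) pairs sorted by x, one per window.
    n_low:        hypothetical number of leftmost windows used for the fit.
    """
    xs = [x for x, _ in limit_points]
    ys = [y for _, y in limit_points]
    slope, intercept = fit_line(xs[:n_low], ys[:n_low])

    def limit_at(x):
        if x >= xs[-1]:
            return ys[-1]  # horizontal extension of the last window's limit
        if x < xs[0]:
            return slope * x + intercept  # least-squares extrapolation to the left
        raise ValueError("x lies inside the windowed range; use the smoothed curve")

    return limit_at
```

For example, with limit points `[(1, 3.0), (2, 2.5), (3, 2.0), (4, 1.8), (5, 1.7)]` and `n_low=3`, the fitted line is y = -0.5x + 3.5, so the limit at x = 0 is 3.5, while any x at or beyond 5 returns 1.7.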

Further, according to the present invention, the user inputs the number of genes in a window, and the window is set within a section containing the input number of genes. It is therefore possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium in which the number of genes set by the user can be varied for each experiment. Further, according to the present invention, the user inputs a confidence limit value, and the confidence limit point is determined within the window based on the input confidence limit value. It is therefore possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium in which the confidence limit value set by the user can be varied for each experiment and the error of each experiment can be kept within an appropriate range.

Further, according to the present invention, the user inputs simulation conditions including information on at least one of the following: the form of the distribution of genes that do not fluctuate (for example, the standard deviation of the distribution; for example, when the distribution of genes whose expression does not change is taken as the standard normal distribution with standard deviation σ = 1 and center μ = 0, the width of the standard deviation σ is set in the range of 0.1 to 1.5); the form of the distribution of the fluctuating genes (for example, the center; for example, under the same condition, the width of the center μ is set in the range of 0.4 to 3); the detection criterion for the fluctuating genes (for example, the proportion of detections relative to the total number, set to 2/3, 2/4, 3/4, 3/6, 4/6, and so on); the number of repetitions of the experiment; and the number of simulations (for example, set in the range of 3 to 10). According to the set simulation conditions, a simulation that repeatedly generates the same gene group from the same distribution, executes gene detection, and detects expressed genes is executed a plurality of times. The false positive rate and false negative rate of the detection results are calculated, the relationship among the number of repetitions of the experiment, the simulation conditions, and the detection sensitivity and detection reliability is calculated, and a test statistical table of genes whose expression level changes is created. Because a simulation result is output for each simulation condition, the detection power and detection reliability of a given combination can be found by combining the simulation results obtained under various conditions. In other words, by repeating a control experiment under the same conditions, detecting fluctuating genes in each of the resulting data sets, and selecting only the genes detected at least a predetermined number of times, it is possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that can detect fluctuating genes with the expected reliability or detection power.
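A single run of such a simulation can be sketched as follows. This is a simplified illustration, not the simulation of the invention: detection in each replicate uses a fixed threshold on the absolute value instead of the window-based gene detection, both gene groups share the same standard deviation, and all names (`simulate`, `k_detect`, `threshold`) are hypothetical.

```python
import random

def simulate(n_genes, n_changed, mu, sigma, n_reps, k_detect, threshold, seed=0):
    """Return (false_positive_rate, false_negative_rate) for one simulated run.

    The first `n_changed` genes are drawn from N(mu, sigma) and the rest from
    N(0, sigma). A gene is called fluctuating when |value| > threshold in at
    least `k_detect` of `n_reps` replicate experiments.
    Requires 0 < n_changed < n_genes.
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    fp = fn = 0
    for g in range(n_genes):
        changed = g < n_changed
        centre = mu if changed else 0.0
        hits = sum(abs(rng.gauss(centre, sigma)) > threshold for _ in range(n_reps))
        detected = hits >= k_detect
        if detected and not changed:
            fp += 1  # unchanged gene reported as fluctuating
        elif changed and not detected:
            fn += 1  # fluctuating gene missed
    return fp / (n_genes - n_changed), fn / n_changed
```

Running it with the same seed for the criteria 2/3 and 3/3 shows the trade-off the paragraph describes: the stricter criterion can only lower the false positive rate and raise the false negative rate on the same simulated data.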

Further, according to the present invention, the error in which a gene whose expression level does not change is detected as a fluctuating gene (type I error) and the error in which a fluctuating gene is detected as a gene whose expression does not change (type II error) are calculated and compared. The detection power and reliability with which the above method detects fluctuating genes can thus be grasped from the simulation data. It is therefore possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium in which the combination of the number of repetitions of the experiment, the detection criterion for the fluctuating genes, and the confidence limit point can be set so as to obtain the expected detection power and reliability for actual experimental data.

Further, according to the present invention, it becomes possible to predict, based on the test statistical table of duplicate data obtained by simulation, how many times an experiment must be performed to obtain accurate experimental data. It is therefore possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that can markedly improve experimental efficiency.
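The link between the number of repetitions and the achievable error rates can be illustrated with a closed-form stand-in for the simulated table. The sketch below assumes that replicate experiments are independent and that each replicate detects an unchanged gene with probability `p_false` and a fluctuating gene with probability `p_true`. These assumptions and all names are hypothetical; the text itself derives the table by simulation.

```python
from math import comb

def prob_detected(k, n, p):
    """Probability of detection in at least k of n independent replicates."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

def smallest_design(p_false, p_true, max_fp, min_power, n_max=10):
    """Return the first (n, k) meeting both targets, or None if none is found.

    n is the number of repetitions, k the number of detections required.
    """
    for n in range(1, n_max + 1):
        for k in range(1, n + 1):
            if (prob_detected(k, n, p_false) <= max_fp
                    and prob_detected(k, n, p_true) >= min_power):
                return n, k
    return None
```

With `p_false = 0.05` and `p_true = 0.8`, requiring detection in 2 of 3 replicates gives a false positive probability of 0.00725 and a detection power of 0.896, so `smallest_design(0.05, 0.8, 0.01, 0.85)` returns `(3, 2)`.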

Further, according to the present invention, the deviation value of each spot is calculated. By using the deviation value of each spot calculated in this way in place of the fluctuation ratio (magnification), it is possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that enable analysis unaffected by differences in error between slides.
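This passage does not give the formula for the deviation value. The sketch below assumes one common reading of the term, a standard score of the form 50 + 10z computed within a group of spots (for example, the spots of one window); the function name and the default `scale` and `offset` are assumptions.

```python
from statistics import mean, pstdev

def deviation_values(ys, scale=10.0, offset=50.0):
    """Standardise the values in `ys` against their own mean and spread."""
    m, s = mean(ys), pstdev(ys)
    return [offset + scale * (y - m) / s for y in ys]
```

Because each value is divided by the spread of its own group, multiplying all values on one slide by a constant leaves the result unchanged, which is the sense in which such a score does not depend on the size of the error on a given slide.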

Furthermore, according to the present invention, the deviation value calculated by the device can be used in place of the logarithm of the fluctuation ratio or the normalized fluctuation ratio in multivariate analysis typified by cluster analysis. It is therefore possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that enable analysis unaffected by differences in the influence of error depending on the magnitude of the expression level.

Industrial applicability

As described above, the gene expression information analysis device, gene expression information analysis method, program, and recording medium according to the present invention are extremely useful in the field of bioinformatics, in which measured value data from DNA microarrays, DNA chips, and the like are analyzed.

The present invention can be widely implemented in many industrial fields, particularly pharmaceuticals, foods, cosmetics, medical care, and gene expression analysis, and is extremely useful.

Claims

The scope of the claims

1. A gene expression information analysis device comprising:

background correction means for creating background-corrected luminance data by removing a background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene under two conditions has been measured;

bias correction means for creating a fluorescence intensity scatter plot with the logarithms of the luminance data background-corrected by the background correction means as the X-Y axes, obtaining, for each gene spot, the bias with respect to the fluorescence intensity equilibrium axis, and constructing, by removing the bias, a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis; and

gene detection means for detecting fluctuating genes whose expression level has fluctuated, based on the fluorescence intensity scatter plot in the new X-Y axis system constructed by the bias correction means.

2. The gene expression information analysis device according to claim 1, wherein the bias correction means further comprises:

first principal component creation means for performing principal component analysis using the logarithmic values of a gene group with high expression levels to obtain the slope and intercept of an asymptote as the first principal component;

coordinate rotation means for taking the angle between the asymptote obtained by the first principal component creation means and the X axis as θ, and calculating the coordinates of a gene group with low expression levels after the coordinates in the X-Y axis system are rotated to the right by θ;

bias determination means for calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the gene group with low expression levels after rotation of the coordinate axes by the coordinate rotation means, and determining, based on the calculated slope, which of the luminance data of the two conditions contains more of the bias; and

correction plot creation means for constructing the fluorescence intensity scatter plot in the new X-Y axis system, whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis, by subtracting the bias from the luminance data of the condition determined by the bias determination means to contain more of the bias.

3. The gene expression information analysis device according to claim 2, wherein the principal component analysis is performed using a variance-covariance matrix.
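By way of illustration only, the principal component and rotation steps recited in claims 2 and 3 can be sketched for two-dimensional data as follows. The sketch uses the closed-form first principal axis of a 2x2 variance-covariance matrix and rotates about the origin, ignoring the intercept of the asymptote; these simplifications and the function names are assumptions, not part of the claims.

```python
import math

def first_pc_angle(points):
    """Angle in radians between the first principal component and the X axis.

    Uses the closed form for a 2x2 variance-covariance matrix.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

def rotate_right(points, theta):
    """Rotate points clockwise by theta so the principal axis lies along X."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]
```

For spots lying on the line y = x, the angle is 45 degrees and every rotated spot has a second coordinate of zero, so any remaining deviation from zero after rotation can be read as bias or as a change in expression.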

4. The gene expression information analysis device according to any one of claims 1 to 3, wherein the gene detection means further comprises:

window setting means for setting a window within a predetermined section in the direction of the fluorescence intensity equilibrium axis;

confidence limit point determination means for determining a confidence limit point within each window set by the window setting means;

window moving means for moving the window by a fixed number of genes in the direction of the fluorescence intensity equilibrium axis;

confidence boundary line creation means for obtaining, with the confidence limit point determination means, a confidence limit point for each window moved by the window moving means, and creating a confidence boundary line based on the plurality of confidence limit points obtained; and

fluctuating gene extraction means for extracting genes located outside the confidence boundary line created by the confidence boundary line creation means as fluctuating genes whose expression level has fluctuated.

5. The gene expression information analysis device according to claim 4, wherein the confidence limit point determination means determines the confidence limit point using a t-distribution based on a test statistical table of duplicate data obtained by simulation.

6. The gene expression information analysis device according to claim 4 or 5, wherein the confidence boundary line creation means creates the confidence boundary line by smoothing, fitting a spline curve to the plurality of confidence limit points.

7. The gene expression information analysis device according to any one of claims 4 to 6, wherein the confidence boundary line creation means creates, for a region of high fluorescence intensity, the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.

8. The gene expression information analysis device according to any one of claims 4 to 6, wherein the confidence boundary line creation means uses, for a region of low fluorescence intensity, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in the windows as the confidence limit line.

9. The gene expression information analysis device according to any one of claims 4 to 8, further comprising gene number input means for allowing the user to input the number of genes in the window, wherein the window setting means sets the window within a section containing the number of genes input through the gene number input means.

10. The gene expression information analysis device according to any one of claims 4 to 9, further comprising confidence limit value input means for allowing the user to input a confidence limit value, wherein the confidence limit point determination means determines the confidence limit point within the window based on the confidence limit value input through the confidence limit value input means.

11. The gene expression information analysis device according to any one of claims 1 to 10, further comprising:

simulation condition setting means for allowing the user to input simulation conditions including information on at least one of the form of the distribution of genes that do not fluctuate, the form of the distribution of the fluctuating genes, the detection criterion for the fluctuating genes, the number of repetitions of the experiment, and the number of simulations;

simulation execution means for executing a plurality of times, according to the simulation conditions set by the simulation condition setting means, a simulation that repeatedly generates the same gene group from the same distribution, executes the gene detection means, and detects expressed genes, calculating the false positive rate and false negative rate of the results of the detection means, calculating the relationship among the number of repetitions of the experiment, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistical table of genes whose expression level changes; and

simulation result output means for outputting, for each of the simulation conditions, the simulation result of the simulation execution means.

12. The gene expression information analysis device according to any one of claims 1 to 11, wherein the gene detection means further comprises deviation value calculation means for calculating the deviation value of each spot.

13. A gene expression information analysis method comprising:

a background correction step of creating background-corrected luminance data by removing a background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene under two conditions has been measured;

a bias correction step of creating a fluorescence intensity scatter plot with the logarithms of the luminance data background-corrected in the background correction step as the X-Y axes, obtaining, for each gene spot, the bias with respect to the fluorescence intensity equilibrium axis, and constructing, by removing the bias, a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis; and

a gene detection step of detecting fluctuating genes whose expression level has fluctuated, based on the fluorescence intensity scatter plot in the new X-Y axis system constructed in the bias correction step.

14. The gene expression information analysis method according to claim 13, wherein the bias correction step further comprises:

a first principal component creation step of performing principal component analysis using the logarithmic values of a gene group with high expression levels to obtain the slope and intercept of an asymptote as the first principal component;

a coordinate rotation step of taking the angle between the asymptote obtained in the first principal component creation step and the X axis as θ, and calculating the coordinates of a gene group with low expression levels after the coordinates in the X-Y axis system are rotated to the right by θ;

a bias determination step of calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the gene group with low expression levels after rotation of the coordinate axes in the coordinate rotation step, and determining, based on the calculated slope, which of the luminance data of the two conditions contains more of the bias; and

a correction plot creation step of constructing the fluorescence intensity scatter plot in the new X-Y axis system, whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis, by subtracting the bias from the luminance data of the condition determined in the bias determination step to contain more of the bias.

15. The gene expression information analysis method according to claim 14, wherein the principal component analysis is performed using a variance-covariance matrix.

16. The gene expression information analysis method according to any one of claims 13 to 15, wherein the gene detection step further comprises:

a window setting step of setting a window within a predetermined section in the direction of the fluorescence intensity equilibrium axis;

a confidence limit point determination step of determining a confidence limit point within each window set in the window setting step;

a window moving step of moving the window by a fixed number of genes in the direction of the fluorescence intensity equilibrium axis;

a confidence boundary line creation step of obtaining, in the confidence limit point determination step, a confidence limit point for each window moved in the window moving step, and creating a confidence boundary line based on the plurality of confidence limit points obtained; and

a fluctuating gene extraction step of extracting genes located outside the confidence boundary line created in the confidence boundary line creation step as fluctuating genes whose expression level has fluctuated.

17. The gene expression information analysis method according to claim 16, wherein the confidence limit point determination step determines the confidence limit point using a t-distribution based on a test statistical table of duplicate data obtained by simulation.

18. The gene expression information analysis method according to claim 16 or 17, wherein the confidence boundary line creation step creates the confidence boundary line by smoothing, fitting a spline curve to the plurality of confidence limit points.

19. The gene expression information analysis method according to any one of claims 16 to 18, wherein the confidence boundary line creation step creates, for a region of high fluorescence intensity, the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.

20. The gene expression information analysis method according to any one of claims 16 to 19, wherein the confidence boundary line creation step uses, for a region of low fluorescence intensity, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in the windows as the confidence limit line.

21. The gene expression information analysis method according to any one of claims 16 to 20, further comprising a gene number input step of allowing the user to input the number of genes in the window, wherein the window setting step sets the window within a section containing the number of genes input in the gene number input step.

22. The gene expression information analysis method according to any one of claims 16 to 21, further comprising a confidence limit value input step of allowing the user to input a confidence limit value, wherein the confidence limit point determination step determines the confidence limit point within the window based on the confidence limit value input in the confidence limit value input step.

23. The gene expression information analysis method according to any one of claims 13 to 22, further comprising:

a simulation condition setting step of allowing the user to input simulation conditions including information on at least one of the form of the distribution of genes that do not fluctuate, the form of the distribution of the fluctuating genes, the detection criterion for the fluctuating genes, the number of repetitions of the experiment, and the number of simulations;

a simulation execution step of executing a plurality of times, according to the simulation conditions set in the simulation condition setting step, a simulation that repeatedly generates the same gene group from the same distribution, executes the gene detection step, and detects expressed genes, calculating the false positive rate and false negative rate of the detection results, calculating the relationship among the number of repetitions of the experiment, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistical table of genes whose expression level changes; and

a simulation result output step of outputting, for each of the simulation conditions, the simulation result of the simulation execution step.

24. The gene expression information analysis method according to any one of claims 13 to 23, wherein the gene detection step further comprises a deviation value calculation step of calculating the deviation value of each spot.

25. A program that causes a computer to execute a gene expression information analysis method comprising:

a background correction step of creating background-corrected luminance data by removing a background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene under two conditions has been measured;

a bias correction step of creating a fluorescence intensity scatter plot with the logarithms of the luminance data background-corrected in the background correction step as the X-Y axes, obtaining, for each gene spot, the bias with respect to the fluorescence intensity equilibrium axis, and constructing, by removing the bias, a fluorescence intensity scatter plot in a new X-Y axis system; and

a gene detection step of detecting fluctuating genes whose expression level has fluctuated, based on the fluorescence intensity scatter plot in the new X-Y axis system constructed in the bias correction step.

26. The program according to claim 25, wherein the bias correction step further comprises:

a first principal component creation step of performing principal component analysis using the logarithmic values of a gene group with high expression levels to obtain the slope and intercept of an asymptote as the first principal component;

a coordinate rotation step of taking the angle between the asymptote obtained in the first principal component creation step and the X axis as θ, and calculating the coordinates of a gene group with low expression levels after the coordinates in the X-Y axis system are rotated to the right by θ;

a bias determination step of calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the gene group with low expression levels after rotation of the coordinate axes in the coordinate rotation step, and determining, based on the calculated slope, which of the luminance data of the two conditions contains more of the bias; and

a correction plot creation step of constructing the fluorescence intensity scatter plot in the new X-Y axis system, whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis, by subtracting the bias from the luminance data of the condition determined in the bias determination step to contain more of the bias.

27. The program according to claim 26, wherein the principal component analysis is performed using a variance-covariance matrix.

28. The program according to any one of claims 25 to 27, wherein the gene detection step further comprises:

a window setting step of setting a window within a predetermined section in the direction of the fluorescence intensity equilibrium axis;

a confidence limit point determination step of determining a confidence limit point within each window set in the window setting step;

a window moving step of moving the window by a fixed number of genes in the direction of the fluorescence intensity equilibrium axis;

a confidence boundary line creation step of obtaining, in the confidence limit point determination step, a confidence limit point for each window moved in the window moving step, and creating a confidence boundary line based on the plurality of confidence limit points obtained; and

a fluctuating gene extraction step of extracting genes located outside the confidence boundary line created in the confidence boundary line creation step as fluctuating genes whose expression level has fluctuated.

29. The program according to claim 28, wherein the confidence limit point determination step determines the confidence limit point using a t-distribution based on a test statistical table of duplicate data obtained by simulation.

30. The program according to claim 28 or 29, wherein the confidence boundary line creation step creates the confidence boundary line by smoothing, fitting a spline curve to the plurality of confidence limit points.

31. The program according to any one of claims 28 to 30, wherein the confidence boundary line creation step creates, for a region of high fluorescence intensity, the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.

32. The program according to any one of claims 28 to 31, wherein the confidence boundary line creation step uses, for a region of low fluorescence intensity, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in the windows as the confidence limit line.

33. The program according to any one of claims 28 to 32, further comprising a gene number input step of allowing the user to input the number of genes in the window, wherein the window setting step sets the window within a section containing the number of genes input in the gene number input step.

34. The program according to any one of claims 28 to 33, further comprising a confidence limit value input step of allowing the user to input a confidence limit value, wherein the confidence limit point determination step determines the confidence limit point within the window based on the confidence limit value input in the confidence limit value input step.

35. The program according to any one of claims 25 to 34, further comprising:

a simulation condition setting step of allowing the user to input simulation conditions including information on at least one of the form of the distribution of genes that do not fluctuate, the form of the distribution of the fluctuating genes, the detection criterion for the fluctuating genes, the number of repetitions of the experiment, and the number of simulations;

a simulation execution step of executing a plurality of times, according to the simulation conditions set in the simulation condition setting step, a simulation that repeatedly generates the same gene group from the same distribution, executes the gene detection step, and detects expressed genes, calculating the false positive rate and false negative rate of the detection results, calculating the relationship among the number of repetitions of the experiment, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistical table of genes whose expression level changes; and

a simulation result output step of outputting, for each of the simulation conditions, the simulation result of the simulation execution step.

36. The program according to any one of claims 25 to 35, wherein the gene detection step further comprises a deviation value calculation step of calculating the deviation value of each spot.

37. A computer-readable recording medium on which the program according to any one of claims 25 to 36 is recorded.