WO2003070938A1 - Gene expression data analyzer, and method, program and recording medium for gene expression data analysis - Google Patents

Gene expression data analyzer, and method, program and recording medium for gene expression data analysis

Info

Publication number
WO2003070938A1
Authority
WO
WIPO (PCT)
Prior art keywords
gene
fluorescence intensity
window
axis
confidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2003/001900
Other languages
French (fr)
Japanese (ja)
Inventor
Nobukazu Ono
Yoshiyuki Takahara
Quingwei Zhang
Hiroshi Tanaka
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ajinomoto Co Inc
Original Assignee
Ajinomoto Co Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ajinomoto Co Inc
Priority to JP2003569831A (patent JP4438414B2)
Priority to AU2003211240A (patent AU2003211240A1)
Publication of WO2003070938A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation

Definitions

  • the present invention relates to a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium.
  • in particular, the present invention performs background correction on measured data from a DNA microarray, a DNA chip, and the like in order to extract genes whose expression level has changed.
  • TECHNICAL FIELD The present invention relates to a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that can statistically extract genes whose expression level has changed. Background art
  • a DNA microarray, in which cDNA fragments reverse-transcribed from messenger RNA using reverse transcriptase are immobilized on a glass slide at high density, is used.
  • the DNA chip (trade name) of Affymetrix (company name), in which various types of oligonucleotides are synthesized on a substrate using microfabrication technology, has attracted attention and is being used.
  • Expressed-gene analysis methods using these DNA microarrays and DNA chips are effective for comprehensively identifying genes whose expression levels fluctuate, covering hundreds to tens of thousands of genes at once.
  • the method of correcting measured values includes two major steps called a background correction step and a normalization step.
  • in the background correction step, the method mainly used simply subtracts the average background value of the blank spots, or the background value of the area surrounding each spot, directly from the measured fluorescence intensity of each spot.
  • the normalization step uses a regression line obtained by the least-squares method, or nonparametric Lowess smoothing (a local quadratic estimator using the bandwidth corresponding to the neighboring area), and the like.
  • the conventional correction method has a problem that it is easily affected by differences in the measurement apparatus, the difference between samples, and the efficiency of fluorescent labeling.
  • strictly speaking, the least-squares method yields two regression lines (Y on X and X on Y), while Lowess smoothing (Dudoit S, Yang YH, Callow MJ, Speed TP (2000) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Html/papersindex.html, etc.) had the problem that it is a normalization process based on an empirical rule and lacks a theoretical basis.
  • conventionally, a gene whose corrected fluorescence intensity ratio is at or above an arbitrary factor is extracted as a gene having a difference in expression, and this factor has served as the criterion.
  • the magnification was set to 2-fold, 3-fold, etc. without any basis (Chen Y, Dougherty ER, Bittner ML (1997) Ratio-based decisions and the quantitative analysis of cDNA microarray images. J Biomed Opt 2: 364-374; Susan G. Hilsenbeck, et al. (1999) Statistical analysis of array expression
  • the present invention provides a general formula for reliably comparing gene expression levels in expressed-gene analysis using DNA microarrays and DNA chips, and provides a robust method that matches the actual data distribution.
  • An object of the present invention is to provide a highly reliable method for extracting expression-variable genes. Disclosure of the invention
  • the gene expression information analyzing apparatus includes background correction means for removing the background value from the measured luminance data of each spot, in which the fluorescence intensity indicating the expression level of the same gene is measured under two conditions, to create background-corrected luminance data,
  • and bias correction means for creating a fluorescence intensity scatter plot by taking the logarithm of the luminance data background-corrected by the background correction means on the X and Y axes, obtaining for each gene spot the bias relative to the fluorescence intensity equilibrium axis, and removing the bias from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis.
  • according to this apparatus, the background value is removed from the measured luminance data of each spot, in which the fluorescence intensity indicating the expression level of the same gene is measured under two conditions using a DNA microarray or a DNA chip, and background-corrected luminance data are created.
  • the average of the fluorescence intensity measurements of the blank spots may be subtracted from the measurement of each individual spot as the background value, or the average of the fluorescence intensity measurements of the blank area around each spot may be used as the background value. The background correction may also be performed by any other method.
  • the logarithm of the background-corrected luminance data (natural logarithm, base-2 logarithm, etc.) is plotted on the X-Y axes to create a fluorescence intensity scatter diagram (scatter plot),
  • and the bias of each gene spot relative to the fluorescence intensity equilibrium axis (that is, the asymptote obtained from the group of genes whose expression levels are equivalent under the two conditions, along which the two fluorescence intensities are equal) is obtained.
  • after this bias is removed, a new orthogonal axis system can be constructed with the fluorescence intensity equilibrium axis and the magnification axis of the expression level as its two axes.
  • fluctuating genes whose expression levels have changed are detected based on the fluorescence intensity scatter diagram constructed in the new X-Y axis system. This makes it possible to detect genes whose expression levels fluctuate accurately, without being affected by differences between samples, errors between samples, or fluorescent labeling efficiency.
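The background correction and log-scale scatter plot described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `floor` clamp that keeps the logarithm defined, and all intensity values are assumptions made for the example, and base 2 is just one of the logarithm choices the text allows.

```python
import math

def background_correct(raw, background, floor=1.0):
    """Subtract a background value from each raw spot intensity.

    `floor` is an assumption of this sketch: corrected values are clamped
    to it so that the logarithm taken afterwards is always defined.
    """
    return [max(r - background, floor) for r in raw]

def log_scatter(cond_a, cond_b):
    """Return (x, y) points: log2 intensity under condition A vs. condition B."""
    return [(math.log2(a), math.log2(b)) for a, b in zip(cond_a, cond_b)]

blank_spots = [98.0, 102.0, 100.0]       # hypothetical blank-spot readings
background = sum(blank_spots) / len(blank_spots)

raw_a = [228.0, 612.0, 4196.0]           # hypothetical spots, condition A
raw_b = [164.0, 1124.0, 4196.0]          # hypothetical spots, condition B

points = log_scatter(background_correct(raw_a, background),
                     background_correct(raw_b, background))
```

Here the blank-spot average is used as the background; the per-spot surrounding-area variant mentioned in the text would pass one background value per spot instead.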
  • the gene expression information analyzer according to the next invention is the gene expression information analyzer described above, wherein the bias correction means includes first principal component creation means for performing principal component analysis using the logarithmic values of a gene group having a high expression level and calculating the slope and intercept of the asymptote that is the first principal component,
  • coordinate rotation means for setting the angle between the obtained asymptote and the X axis to θ and calculating the coordinates obtained by rotating the coordinates of the gene group having a low expression level in the X-Y axis system to the right by θ degrees,
  • bias determining means for calculating the slope of the fluorescence intensity equilibrium axis from the coordinates of the low-expression gene group after the rotation of the coordinate axes by the coordinate rotation means, and determining, based on the calculated slope, which of the two conditions of the luminance data contains the larger amount of the bias,
  • and correction plot generating means for constructing a fluorescence intensity scatter diagram of a new X-Y axis system, whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis, by subtracting the bias from the luminance data of the condition determined to contain the larger amount of the bias.
  • according to this apparatus, control gene samples for quality control forming a DNA concentration dilution series (for example, an external gene such as a λ DNA sample, or a housekeeping gene sample such as a ribosomal gene whose expression level hardly changes) are measured simultaneously with the target gene samples. The control genes are removed one at a time, starting from the gene with the smallest product of the fluorescence intensity data, and calibration curves relating gene expression level to DNA amount are created each time from the data of all remaining control gene samples.
  • threshold 1 is defined as the product of the fluorescence intensity data under the two conditions of the control sample at the point where the correlation coefficient, calculated in order, first satisfies the criterion for a strong correlation (for example, 0.8 or more),
  • and the population of all gene samples whose product of the fluorescence intensity data under the two conditions exceeds threshold 1 is defined as the gene population with a high expression level.
  • threshold 2 (threshold 2 < threshold 1) is defined as the product of the fluorescence intensity data under the two conditions of the control sample at the point where the correlation coefficient, calculated in order, first satisfies the criterion for a weak correlation (for example, 0.5 or more),
  • and the population of all gene samples whose product is less than threshold 2 is defined as the gene group with a low expression level. Principal component analysis is performed using the logarithmic values of the fluorescence intensity of the high-expression gene group, and the slope and intercept of the asymptote that is the first principal component are calculated.
  • the angle between the obtained asymptote and the X axis is set to θ, and the coordinates obtained by rotating the coordinates of the low-expression gene population in the X-Y axis system to the right by θ degrees are calculated.
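As a rough illustration of the grouping and rotation steps above, the sketch below splits spots by the product of their two intensities and rotates the low-expression group clockwise ("to the right") by the angle θ that the asymptote makes with the X axis. The function names, the threshold values, and the example slope of 1.0 are assumptions; in the text the thresholds come from the control-gene calibration and the slope from the principal component analysis.

```python
import math

def split_by_expression(spots, threshold_1, threshold_2):
    """Split (a, b) intensity pairs into high- and low-expression groups.

    A spot is 'high' when a*b exceeds threshold_1 and 'low' when a*b is
    below threshold_2 (threshold_2 < threshold_1); the rest are left out.
    """
    if not threshold_2 < threshold_1:
        raise ValueError("threshold_2 must be smaller than threshold_1")
    high = [(a, b) for a, b in spots if a * b > threshold_1]
    low = [(a, b) for a, b in spots if a * b < threshold_2]
    return high, low

def rotate_clockwise(points, slope):
    """Rotate (x, y) points clockwise by theta = atan(slope)."""
    theta = math.atan(slope)
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]

# Hypothetical background-corrected intensities under the two conditions.
spots = [(4.0, 5.0), (30.0, 40.0), (900.0, 1000.0), (2000.0, 2100.0)]
high, low = split_by_expression(spots, threshold_1=1.0e5, threshold_2=1.0e2)

low_log = [(math.log2(a), math.log2(b)) for a, b in low]
rotated_low = rotate_clockwise(low_log, slope=1.0)  # slope assumed for the demo
```

After the rotation, a point lying exactly on the asymptote direction has a second coordinate of zero, so any residual trend in the rotated low-expression group shows up as a departure from the horizontal.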
  • this apparatus is not limited to one that determines the magnitude of the bias after the rotation of the axes.
  • for example, the magnitude of the bias may be determined before the rotation of the axes by comparing the slope of the high-expression asymptote with that of the low-expression asymptote.
  • a gene expression information analyzer is characterized in that in the above-described gene expression information analyzer, the principal component analysis is performed using a variance-covariance matrix.
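For two-dimensional data, the first principal component of the variance-covariance matrix has a closed form, which the following sketch uses to obtain the slope and intercept of the asymptote. The function name and the sample points are illustrative assumptions; the text does not specify how the eigen-decomposition is carried out.

```python
import math
from statistics import mean

def first_principal_axis(points):
    """Slope and intercept of the first principal component of 2-D points.

    Uses the closed-form largest eigenvalue of the 2x2 variance-covariance
    matrix; the axis passes through the centroid of the points.
    """
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    if sxy == 0.0:
        if sxx >= syy:
            return 0.0, my          # axis is horizontal
        raise ValueError("first principal axis is vertical")
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam = 0.5 * ((sxx + syy) + math.sqrt((sxx - syy) ** 2 + 4.0 * sxy ** 2))
    slope = (lam - sxx) / sxy       # eigenvector is (sxy, lam - sxx)
    return slope, my - slope * mx

# Hypothetical log-intensities of a high-expression gene group.
slope, intercept = first_principal_axis([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])
```

Unlike an ordinary regression of Y on X, this axis treats both conditions symmetrically, which is why a single line is obtained rather than two.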
  • the gene expression information analyzer according to the next invention is the gene expression information analyzer described above, wherein the gene detection means includes window setting means for setting a window within a predetermined section in the fluorescence intensity equilibrium axis direction,
  • confidence limit point determining means for determining a confidence limit point in each window set by the window setting means,
  • window moving means for moving the window by a given number of genes in the direction of the fluorescence intensity equilibrium axis; confidence boundary line creation means for having the confidence limit point determining means determine a confidence limit point for each window moved by the window moving means, and creating a confidence boundary line based on the obtained plurality of confidence limit points;
  • and fluctuating gene extraction means for extracting genes located outside the confidence boundary line created by the confidence boundary line creation means as fluctuating genes whose expression levels have changed.
  • according to this apparatus, the window is moved by a certain number of genes in the direction of the fluorescence intensity equilibrium axis, a confidence limit point is determined for each moved window, a confidence boundary line is created based on the obtained confidence limit points, and genes located outside that confidence boundary line are extracted as fluctuating genes whose expression levels have changed, so that expressed genes can be extracted with high stability, reproducibility, and reliability.
  • the threshold value of the expression amount variation magnification can be determined according to the error.
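The moving-window procedure can be sketched as below. This is only an illustration under stated assumptions: the limits are taken as the window mean plus or minus `critical_value` times the window standard deviation, where `critical_value` is a hypothetical parameter standing in for the value that the text derives from a t-distribution and a simulated test statistical table; the function names and data are likewise invented for the example.

```python
from statistics import mean, stdev

def confidence_limit_points(points, window_size, step, critical_value):
    """Slide a window of `window_size` genes along the equilibrium axis.

    `points` are (x, y) pairs: x along the fluorescence intensity equilibrium
    axis, y along the magnification axis. For each window the function
    returns (window centre, lower limit, upper limit), where the limits are
    mean(y) -/+ critical_value * stdev(y).
    """
    ordered = sorted(points)
    limits = []
    for start in range(0, len(ordered) - window_size + 1, step):
        window = ordered[start:start + window_size]
        centre = mean(x for x, _ in window)
        ys = [y for _, y in window]
        m, s = mean(ys), stdev(ys)
        limits.append((centre, m - critical_value * s, m + critical_value * s))
    return limits

def outside_limits(point, limit):
    """True when the point's y value lies outside one window's limits."""
    _, lower, upper = limit
    return point[1] < lower or point[1] > upper

# Hypothetical spots already expressed in the new X-Y axis system.
spots = [(float(i), (-1.0) ** i) for i in range(10)]
limits = confidence_limit_points(spots, window_size=4, step=2, critical_value=2.0)
```

Because the limits are recomputed per window, the fold-change threshold adapts to the local scatter instead of being a fixed 2-fold or 3-fold cut-off.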
  • the gene expression information analyzing apparatus according to the next invention is the gene expression information analyzing apparatus described above, wherein the confidence limit point determining means determines the confidence limit point using a t-distribution based on a test statistical table of duplicate data obtained by simulation.
  • according to this apparatus, the confidence limit point is determined using the t-distribution based on the test statistical table of duplicate data obtained by simulation, so the confidence limit point can be obtained more accurately and efficiently than with the conventional method. In addition, the test statistical table of the duplicate data gives the number of duplicate experiments required at the experimental design stage.
  • the gene expression information analyzer according to the next invention is the gene expression information analyzer described above, wherein the confidence boundary line creation means creates the confidence boundary line by creating a spline curve based on the plurality of confidence limit points and thereby smoothing it.
  • according to this apparatus, the confidence boundary line is created by smoothing with a spline curve based on the plurality of confidence limit points, so a confidence curve can be created by efficiently interpolating between the confidence limit points.
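One way to join the confidence limit points into a smooth curve is a natural cubic spline, sketched below in plain Python. The text only says "spline curve", so the choice of a natural cubic spline, the function name, and the sample limit points are assumptions of this example.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    `xs` must be strictly increasing. Outside the range of `xs` the first or
    last cubic segment is simply continued.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system for the second-order coefficients.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution; c[0] = c[n] = 0 gives the 'natural' end conditions.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Locate the segment containing x, clamped to the first/last segment.
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate

# Hypothetical upper confidence limit points (window centre, limit).
centres = [1.0, 2.0, 3.0, 4.0]
upper = [3.0, 2.2, 1.8, 1.6]
boundary = natural_cubic_spline(centres, upper)
```

The returned `boundary` function passes through every limit point, so a spot at position `x` can be tested against `boundary(x)`.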
  • the gene expression information analyzing apparatus according to the next invention is the gene expression information analyzing apparatus described above, wherein, for a region having a high fluorescence intensity, the confidence boundary line creation means creates the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.
  • according to this apparatus, in the region of high fluorescence intensity the confidence limit line is created using an extension, parallel to the X axis, of the confidence limit point obtained in the last (rightmost) window, so an appropriate confidence limit line can be created even when it cannot be judged to which value the slope converges.
  • the gene expression information analyzing apparatus according to the next invention is the gene expression information analyzing apparatus described above, wherein, for a region having a low fluorescence intensity, the confidence boundary line creation means uses as the confidence limit line the extrapolation of the asymptote obtained by the least-squares method from the confidence limit points obtained in each window.
  • according to this apparatus, in the region of low fluorescence intensity the extrapolation of the asymptote obtained by the least-squares method from the confidence limit points of the first several tens of windows is used as the confidence limit line, so even spots of genes with low fluorescence intensity can be detected accurately.
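The two end treatments can be sketched together as follows: a least-squares line through the first few limit points is extrapolated to the left, and the last limit point is held constant to the right. The function names, the parameter `n_first`, and the use of linear interpolation between points (standing in for the spline) are assumptions of this example.

```python
def fit_line(points):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def extended_boundary(limit_points, n_first):
    """Return a function giving the boundary value at any x.

    `limit_points` are (x, limit) pairs sorted by x. Left of the first point
    the least-squares line through the first `n_first` points is
    extrapolated; right of the last point the last limit is held constant.
    Between points, linear interpolation stands in for the spline.
    """
    slope, intercept = fit_line(limit_points[:n_first])
    first_x = limit_points[0][0]
    last_x, last_y = limit_points[-1]

    def boundary(x):
        if x < first_x:
            return slope * x + intercept      # low-intensity extrapolation
        if x >= last_x:
            return last_y                     # high-intensity horizontal extension
        for (x0, y0), (x1, y1) in zip(limit_points, limit_points[1:]):
            if x0 <= x < x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return boundary

# Hypothetical upper confidence limit points from successive windows.
limit_points = [(1.0, 4.0), (2.0, 3.0), (3.0, 2.0), (4.0, 1.5), (5.0, 1.2)]
upper_boundary = extended_boundary(limit_points, n_first=3)
```

In practice `n_first` would correspond to the "several tens of windows from the beginning" mentioned in the text; three is used here only to keep the example small.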
  • the gene expression information analyzing apparatus according to the next invention is the gene expression information analyzing apparatus described above, further comprising gene number input means for allowing a user to input the number of genes in a window,
  • wherein the window setting means sets the window in a section containing the number of genes input by the gene number input means.
  • according to this apparatus, the user inputs the number of genes in a window and the window is set in a section containing that number of genes, so the user can vary the number of genes per window for each experiment.
  • the gene expression information analyzing apparatus according to the next invention is the gene expression information analyzing apparatus described above, further comprising confidence limit value input means for allowing a user to input a confidence limit value,
  • wherein the confidence limit point determining means determines the confidence limit point in the window based on the confidence limit value input by the confidence limit value input means.
  • according to this apparatus, the user inputs a confidence limit value and the confidence limit point in each window is determined from it, so the user can vary the confidence limit for each experiment and keep the error of each experiment within an appropriate range.
  • the gene expression information analyzer according to the next invention is the gene expression information analyzer described above, further comprising
  • simulation condition setting means for allowing a user to input simulation conditions including information on at least one of the form of the distribution of unchanged genes, the form of the distribution of fluctuating genes, the criterion for detecting fluctuating genes, the number of repetitions of the experiment, and the number of simulations;
  • simulation execution means for repeatedly generating the same gene group from the same distribution according to the simulation conditions set by the simulation condition setting means, running a plurality of times the simulation in which the gene detection means is executed to detect expressed genes, calculating the false positive rate and false negative rate of the results obtained by the detection means, calculating the relationship among the number of repetitions of the experiment, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistical table for genes whose expression levels change; and simulation result output means for outputting the simulation results of the simulation execution means for each of the simulation conditions.
  • the distribution form of the above-mentioned fluctuating gene for example, the center (for example, the
  • according to this apparatus, the same gene group is repeatedly generated from the same distribution, the simulation in which gene detection is executed to detect expressed genes is run a plurality of times, and the false positive rate and false negative rate of the results obtained by the detection means are calculated.
  • the relationship among the number of repetitions of the experiment, the simulation conditions, and the detection sensitivity and detection reliability is calculated, a test statistical table for genes whose expression levels change is created, and the simulation results are output for each simulation condition, so the detection power and detection reliability of a given combination can be found by combining simulation results under various conditions. That is, by repeating a control experiment under the same conditions, detecting fluctuating genes in each of the resulting data sets, and selecting only those genes detected at least a predetermined number of times, fluctuating genes can be detected with the expected reliability or power.
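The idea of repeating an experiment and keeping only genes detected at least a given number of times can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption made for the example: normal distributions for unchanged and fluctuating genes, a fixed cut-off on the log-ratio as the detection criterion, and the function and parameter names.

```python
import random

def simulate(n_unchanged, n_changed, shift, n_repeats, cutoff, seed=0):
    """Simulate repeated experiments; return the truth and per-gene hit counts.

    Unchanged genes draw log-ratios from N(0, 1) and changed genes from
    N(shift, 1) in every repeat; a gene is 'detected' in a repeat when
    |log-ratio| > cutoff. The normal distributions and the fixed cutoff are
    assumptions of this sketch.
    """
    rng = random.Random(seed)
    truth = [False] * n_unchanged + [True] * n_changed
    counts = []
    for changed in truth:
        centre = shift if changed else 0.0
        hits = sum(abs(rng.gauss(centre, 1.0)) > cutoff for _ in range(n_repeats))
        counts.append(hits)
    return truth, counts

def error_rates(truth, counts, min_hits):
    """False positive and false negative rates when requiring min_hits detections."""
    called = [c >= min_hits for c in counts]
    fp = sum(c and not t for c, t in zip(called, truth))
    fn = sum(t and not c for c, t in zip(called, truth))
    return fp / truth.count(False), fn / truth.count(True)

truth, counts = simulate(n_unchanged=900, n_changed=100, shift=3.0,
                         n_repeats=3, cutoff=2.0)
for min_hits in (1, 2, 3):
    fpr, fnr = error_rates(truth, counts, min_hits)
    print(min_hits, round(fpr, 3), round(fnr, 3))
```

Tabulating such rates over different numbers of repeats and detection criteria is one way to build the kind of test statistical table the text describes; requiring more detections can only lower the false positive rate and raise the false negative rate for a given data set.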
  • a gene expression information analyzing apparatus is the gene expression information analyzing apparatus described above, wherein the gene detecting means further includes a deviation value calculating means for calculating a deviation value of each spot.
  • the present invention also relates to a gene expression information analysis method. The gene expression information analysis method according to the present invention includes a background correction step of removing the background value from the measured luminance data of each spot, in which the fluorescence intensity indicating the expression level of the same gene is measured under two conditions, to create background-corrected luminance data;
  • a bias correction step in which the logarithm of the above luminance data is plotted on the X-Y axes to create a fluorescence intensity scatter plot,
  • the bias relative to the fluorescence intensity equilibrium axis is determined for each gene spot, and the bias is removed from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system with the fluorescence intensity equilibrium axis and the expression level magnification axis as its two axes;
  • and a gene detection step. According to this method, the background value is removed from the measured luminance data of each spot, in which the fluorescence intensity indicating the expression level of the same gene is measured under two conditions using a DNA microarray or DNA chip, and background-corrected luminance data are created.
  • the average of the fluorescence intensity measurements of the blank spots may be subtracted from the measurement of each individual spot as the background value, or the average of the fluorescence intensity measurements of the blank area around each spot
  • may be used as the background value.
  • the background correction may be performed by any other method.
  • a fluorescence intensity scatter diagram (scatter plot) is created by plotting the logarithm of the background-corrected luminance data (natural logarithm, base-2 logarithm, etc.) on the X-Y axes,
  • the bias of each gene spot relative to the fluorescence intensity equilibrium axis, along which the two fluorescence intensities are equal, is obtained, and the bias is removed from the luminance data to construct a new X-Y axis system with the fluorescence intensity equilibrium axis and the expression level magnification axis as its two axes.
  • that is, the fluorescent component containing more bias is determined, and after this bias is removed, a new orthogonal
  • axis system with the fluorescence intensity equilibrium axis and the expression level magnification axis as its two axes can be constructed.
  • fluctuating genes whose expression levels have changed are detected based on the fluorescence intensity scatter plot constructed in the new X-Y axis system. This makes it possible to detect genes whose expression levels fluctuate accurately, without being affected by differences between samples, errors between samples, or fluorescent labeling efficiency.
  • the method for analyzing gene expression information according to the next invention is the method for analyzing gene expression information described above, wherein the bias correction step includes
  • a first principal component creation step of performing principal component analysis using the logarithmic values of a gene group having a high expression level and calculating the slope and intercept of the asymptote that is the first principal component; and a coordinate rotation step of setting the angle between the asymptote obtained in the first principal component creation step and the X axis to θ and rotating the coordinates of the low-expression gene population in the X-Y axis system to the right by θ degrees,
  • after which the slope of the fluorescence intensity equilibrium axis is calculated.
  • according to this method, control gene samples for quality control forming a DNA concentration dilution series (for example, an external gene such as a λ DNA sample, or a housekeeping gene sample such as a ribosomal gene whose expression level hardly changes) are measured simultaneously with the target gene samples. The control genes are removed one at a time, starting from the gene with the smallest product of the fluorescence intensity data, and calibration curves relating gene expression level to DNA amount are created each time from the data of all remaining control gene samples.
  • the sample population is defined as the gene population with a high expression level, and the correlation coefficient is calculated in order.
  • the product of the fluorescence intensity data under the two conditions is defined as threshold 2 (threshold 2 < threshold 1), and all genes whose product of the fluorescence intensity data under the two conditions is less than threshold 2 are defined as the gene population with a low expression level.
  • principal component analysis is performed using the logarithmic values of the fluorescence intensity of the high-expression gene population, and the slope and intercept of the asymptote that is the first principal component are obtained.
  • it is then determined which condition contains the larger amount of bias, and the bias is subtracted from the luminance data of that condition (for example, by rotating the coordinates of the gene population that has the bias),
  • so that a fluorescence intensity scatter plot in a new X-Y axis system, with the fluorescence intensity equilibrium axis and the expression level magnification axis as its two axes, is constructed; this removes the bias of the measured values efficiently and makes it possible to create a fluorescence intensity scatter plot that clearly expresses the nature of the fluorescence.
  • this method is not limited to determining the magnitude of the bias after the rotation of the axes. For example, the magnitude of the bias may be determined before the rotation of the axes by comparing the slopes of the high-expression asymptote and the low-expression asymptote.
  • a method of analyzing gene expression information according to the next invention is characterized in that in the method of analyzing gene expression information described above, the principal component analysis is performed using a variance-covariance matrix.
  • according to this method, the principal component analysis is performed using a variance-covariance matrix, so, compared with principal component analysis using the correlation matrix conventionally used in expressed-gene analysis, no normalization is necessary and the principal component analysis can be performed efficiently.
  • the method for analyzing gene expression information according to the next invention is the method for analyzing gene expression information described above, wherein the gene detecting step includes a window setting step of setting a window within a predetermined section in the fluorescence intensity equilibrium axis direction; a confidence limit point determination step of determining a confidence limit point in each window set in the window setting step; a window moving step of moving the window by a given number of genes in the direction of the fluorescence intensity equilibrium axis;
  • a confidence boundary line creating step of having the confidence limit point determination step determine a confidence limit point for each window moved in the window moving step and creating a confidence boundary line based on the obtained confidence limit points; and a fluctuating gene extraction step of extracting genes located outside the confidence boundary line created in the confidence boundary line creating step as fluctuating genes whose expression levels have changed.
  • according to this method, a window is moved by a certain number of genes in the direction of the fluorescence intensity equilibrium axis, a confidence limit point is obtained for each moved window, and a confidence boundary line is created based on the obtained confidence limit points.
  • the threshold value of the expression amount variation magnification can be determined according to the error.
  • the method for analyzing gene expression information according to the next invention is characterized in that the confidence limit point determination step determines the confidence limit point using a t-distribution based on a test statistical table of duplicate data obtained by simulation.
  • according to this method, the confidence limit point is determined using the t-distribution based on the test statistical table of duplicate data obtained by simulation, so the confidence limit point can be determined more accurately and efficiently than with the conventional method.
  • the method for analyzing gene expression information according to the next invention is the method for analyzing gene expression information described above, wherein the confidence boundary line creating step creates the confidence boundary line by creating a spline curve based on the plurality of confidence limit points and thereby smoothing it.
  • according to this method, the confidence boundary line is created by smoothing with a spline curve based on the plurality of confidence limit points, so a confidence curve can be created by efficiently interpolating between the confidence limit points.
  • the method for analyzing gene expression information according to the next invention is the method for analyzing gene expression information described above, wherein, for a region having a high fluorescence intensity, the confidence boundary line creating step creates the confidence limit line using a horizontal extension of the confidence limit point determined in the last window.
  • according to this method, the confidence limit line is created using an extension, parallel to the X axis, of the confidence limit point obtained in the last (rightmost) window, so an appropriate confidence limit line can be created even when it cannot be judged to which value the slope converges.
  • the method for analyzing gene expression information according to the next invention is the method for analyzing gene expression information described above, wherein, for a region having a low fluorescence intensity, the confidence boundary line creating step uses as the confidence limit line
  • the extrapolation of the asymptote obtained by the least-squares method from the confidence limit points obtained in each window.
  • according to this method, in the region where the fluorescence intensity is low, the extrapolation of the asymptote obtained by the least-squares method from the confidence limit points of the first several tens of windows is used as the confidence limit line,
  • so spots of genes with low fluorescence intensity can be detected accurately.
  • the method for analyzing gene expression information according to the next invention is the method for analyzing gene expression information described above, further comprising a gene number input step of allowing a user to input the number of genes in a window,
  • wherein the window setting step sets the window in a section containing the number of genes input in the gene number input step.
  • according to this method, the user inputs the number of genes in a window and the window is set in a section containing that number of genes, so the user can vary the number of genes per window for each experiment.
  • The gene expression information analysis method according to the next invention is the method described above, further comprising a confidence limit value input step of allowing a user to input a confidence limit value, wherein the confidence limit point determination step determines the confidence limit point in the window based on the confidence limit value input in the confidence limit value input step.
  • According to this method, the user inputs the confidence limit value and the confidence limit point in each window is determined from it, so the confidence limit value can be varied by the user for each experiment and the error of each experiment can be kept within an appropriate range.
  • The method for analyzing gene expression information according to the next invention is the method described above, further comprising:
  • a simulation condition setting step of allowing a user to input simulation conditions including information on at least one of the form of the distribution of the non-variable genes, the form of the distribution of the variable genes, the detection criterion for the variable genes, the number of repetitions of the experiment, and the number of simulations;
  • a simulation execution step in which, according to the simulation conditions set in the simulation condition setting step, the same gene group is repeatedly generated from the same distribution, the gene detection step is executed, the simulation for detecting expressed genes is executed a plurality of times, the false positive rate and the false negative rate of the detection results are calculated, the relationship among the number of repetitions of the experiment, the simulation conditions, the detection sensitivity, and the detection reliability is calculated, and a test statistical table for genes whose expression levels change is created; and
  • a simulation result output step of outputting the simulation result of the simulation execution step for each of the simulation conditions.
  • According to this method, simulation conditions are input that include information on at least one of the form of the distribution of the fluctuating genes (for example, the center μ is set in the range of 0.4 to 3), the detection criterion for the fluctuating genes (for example, the ratio of the number of detections to the total number is set to 2/3, 2/4, 3/4, 3/6, 4/6, etc.), the number of repetitions of the experiment, and the number of simulations (for example, set in the range of 3 to 10 times), and the same gene group is repeatedly generated from the same distribution according to the set simulation conditions.
  • By combining the simulation results, the detection power and detection reliability of each combination of conditions can be known. That is, by repeating a control experiment under the same conditions, detecting fluctuating genes in each of the resulting data sets, and selecting only those genes that are detected more than a predetermined number of times, fluctuating genes can be detected with the expected reliability or power.
  • The method for analyzing gene expression information according to the next invention is the method described above, wherein the gene detecting step further includes a deviation value calculating step of calculating a deviation value of each spot.
  • The present invention also relates to a program. The program according to the present invention causes a computer to execute a gene expression information analysis method comprising:
  • a background correction step of creating background-corrected luminance data by removing the background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene was measured under two conditions;
  • a bias correction step of creating an X-Y axis fluorescence intensity scatter plot from the logarithms of the luminance data corrected in the background correction step, determining the bias with respect to the fluorescence intensity equilibrium axis for each gene spot, and removing the bias from the luminance data to construct a new X-Y axis fluorescence intensity scatter plot having the fluorescence intensity equilibrium axis and the expression level magnification axis as its two axes; and
  • a gene detection step of detecting fluctuating genes whose expression level fluctuates based on the new X-Y axis fluorescence intensity scatter plot constructed in the bias correction step.
  • According to this program, the background value is removed from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene was measured under two conditions using a DNA microarray or DNA chip, thereby creating background-corrected luminance data.
  • For example, the average of the measured fluorescence intensities of the blank spots may be used as the background value and subtracted from the measured fluorescence intensity of each individual spot.
  • The background correction may also be performed by any other program.
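  • As an illustration only, the blank-spot averaging described above could be sketched in Python as follows. The function name `background_correct` and the sample intensities are hypothetical and do not come from the specification.

```python
from statistics import mean


def background_correct(spot_values, blank_values):
    """Subtract the mean blank-spot intensity from every spot intensity."""
    background = mean(blank_values)
    return [value - background for value in spot_values]


# Hypothetical raw intensities for the two conditions and their blank spots.
raw_a, blanks_a = [120.0, 340.0, 95.0], [18.0, 22.0]
raw_b, blanks_b = [110.0, 400.0, 60.0], [9.0, 11.0]

corrected_a = background_correct(raw_a, blanks_a)
corrected_b = background_correct(raw_b, blanks_b)
```

  • Each condition is corrected with its own blank spots here; a per-spot local background, where available, would be substituted in the same way.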
  • The logarithms (natural or base-2) of the background-corrected luminance data are plotted on the X and Y axes to create a fluorescence intensity scatter plot.
  • The bias with respect to the fluorescence intensity equilibrium axis, along which a spot shows the same fluorescence intensity under the two conditions, is determined and removed from the luminance data to construct a new X-Y axis fluorescence intensity scatter plot having the fluorescence intensity equilibrium axis and the expression level magnification axis as its two axes.
  • That is, the fluorescence component containing more bias is identified, and after this bias is removed, a new orthogonal axis system can be constructed with the fluorescence intensity equilibrium axis and the expression level magnification axis as its two axes.
  • Fluctuating genes whose expression level fluctuates are then detected based on the fluorescence intensity scatter plot of the new X-Y axis system, so genes whose expression levels fluctuate can be detected accurately without being affected by errors between samples or by differences such as the efficiency of fluorescent labeling.
  • A program according to the next invention is the program described above, wherein the bias correction step comprises:
  • a first principal component creation step of performing principal component analysis using the logarithmic values of the gene population with high expression levels to find the slope and intercept of the asymptote that is the first principal component;
  • a coordinate rotation step of taking the angle between the asymptote obtained in the first principal component creation step and the X axis as θ and calculating the coordinates obtained by rotating the coordinates, in the X-Y axis system, of the gene group with low expression level to the right by the angle θ; and
  • a step of calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the low-expression gene group after the rotation by the coordinate rotation step and determining, based on the calculated slope, which of the luminance data of the two conditions contains more bias.
  • According to this program, a control gene sample for quality control of a DNA concentration dilution series (for example, a DNA sample containing an external gene, or a house-keeping gene sample such as a ribosome whose expression level hardly changes) is used as a target gene sample.
  • The control genes are removed one by one, starting from the gene with the smallest product of the fluorescence intensity data, and each time calibration curves of the gene expression level and the DNA level are obtained from the data of all remaining control gene samples and the correlation coefficient of the data is calculated.
  • The product of the fluorescence intensity data under the two conditions of the control sample at the point where the correlation coefficient, calculated in this order, first satisfies the criterion at which a strong correlation is recognized (for example, 0.8 or more) is set as threshold 1, and the population of all gene samples whose product of the fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the gene population with high expression level.
  • Likewise, the product of the fluorescence intensity data under the two conditions of the control sample at the point where the correlation coefficient first satisfies the criterion at which a weak correlation is recognized (for example, 0.5 or more) is set as threshold 2 (where threshold 2 ≤ threshold 1), and the population of all gene samples whose product of the fluorescence intensity data under the two conditions is less than threshold 2 is taken as the gene population with low expression level.
  • Principal component analysis is performed using the logarithmic values of the fluorescence intensity of the high-expression gene population, and the slope and intercept of the asymptote, which is the first principal component, are calculated.
  • The angle between the asymptote and the X axis is set to θ, the coordinates of the low-expression gene group in the X-Y axis system are rotated to the right by the angle θ, the slope of the fluorescence intensity equilibrium axis is calculated from the rotated coordinates, and it is determined from that slope which of the two conditions contains more bias.
  • By subtracting the bias from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates for a gene population with a constant bias), a new X-Y axis fluorescence intensity scatter plot is constructed with the fluorescence intensity equilibrium axis and the expression level magnification axis as its two axes. The bias of the measured values can thus be removed efficiently, and a fluorescence intensity scatter plot that clearly expresses the nature of the data can be created.
  • This program is not limited to determining the magnitude of the bias after axis rotation. The magnitude of the bias may instead be determined before axis rotation by comparing the slope of the asymptote of the high-expression group with that of the low-expression group.
  • A program according to the next invention is the program described above, wherein the principal component analysis is performed using a variance-covariance matrix.
  • The program according to the next invention is the program described above, wherein the gene detection step comprises:
  • a window setting step of setting a window within a predetermined section in the direction of the fluorescence intensity equilibrium axis;
  • a confidence limit point determination step of determining a confidence limit point within the window set in the window setting step, the window being moved by a fixed number of genes in the direction of the fluorescence intensity equilibrium axis and a confidence limit point being determined for each moved window;
  • a confidence boundary line creation step of creating a confidence boundary line based on the plurality of confidence limit points so determined; and
  • a variable gene extracting step of extracting a gene located outside the confidence boundary line as a variable gene whose expression level has changed.
  • According to this program, the window is moved by a fixed number of genes in the direction of the fluorescence intensity equilibrium axis, a confidence limit point is determined for each moved window, a confidence boundary line is created based on the plurality of confidence limit points obtained, and genes located outside that confidence boundary line are extracted as fluctuating genes whose expression level has fluctuated. This makes it possible to extract expressed genes in a high-quality, reproducible, and reliable manner.
  • In addition, the threshold value of the expression level variation magnification can be determined according to the error.
  • The program according to the next invention is the program described above, wherein the step of determining the confidence limit point determines it using the t-distribution based on a test statistical table of duplicate data obtained by simulation.
  • According to this program, the confidence limit point is determined using the t-distribution based on the test statistical table of the duplicate data obtained by simulation, so the confidence limit point can be determined more accurately and efficiently than with the conventional method.
  • A program according to the next invention is the program described above, wherein the step of creating a confidence boundary line performs smoothing by creating a spline curve based on the plurality of confidence limit points and thereby creates the confidence boundary line.
  • According to this program, smoothing is performed by creating a spline curve based on the plurality of confidence limit points, so the confidence boundary line can be created efficiently by interpolating between the confidence limit points.
  • The program according to the next invention is the program described above, wherein, for a region of high fluorescence intensity, the step of creating the confidence boundary line creates the confidence limit line using the horizontal extension line of the confidence limit point obtained in the last window.
  • According to this program, the confidence limit line is created using an extension line, parallel to the X axis, of the confidence limit point obtained in the last (rightmost) window, so an appropriate confidence limit line can be created even when it cannot be judged to what value the slope converges.
  • The program according to the next invention is the program described above, wherein, for a region of low fluorescence intensity, the step of creating a confidence boundary line uses, as the confidence limit line, the extrapolation of the asymptote obtained by the least squares method from the confidence limit points obtained in each window.
  • According to this program, in the region with low fluorescence intensity, the extrapolation of the asymptote obtained by the least squares method from the confidence limit points obtained in, for example, the first several windows is used as the confidence limit line, so even gene spots with low fluorescence intensity can be detected accurately.
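  • The three regions of the confidence limit line described above (least-squares extrapolation on the low-intensity side, interpolation between the confidence limit points, and a horizontal extension beyond the last window) could be sketched as follows. This is a simplified illustration: piecewise-linear interpolation is used as a stand-in for the spline smoothing, the x values of the limit points are assumed to be distinct, and the names `least_squares_line`, `boundary_at`, and `n_fit` are hypothetical.

```python
def least_squares_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def boundary_at(x, limit_points, n_fit):
    """Evaluate the confidence limit line at position x on the intensity axis."""
    pts = sorted(limit_points)
    if x >= pts[-1][0]:
        # High-intensity side: horizontal extension of the last limit point.
        return pts[-1][1]
    if x < pts[0][0]:
        # Low-intensity side: extrapolate a line fitted to the first n_fit points.
        slope, intercept = least_squares_line(pts[:n_fit])
        return slope * x + intercept
    # Between limit points: linear interpolation (stand-in for the spline).
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical upper confidence limit points (window center, limit value).
upper_points = [(1.0, 4.0), (2.0, 3.0), (3.0, 2.5), (4.0, 2.2)]
```

  • A gene at position x with expression magnification y would then be treated as lying outside the upper boundary when y exceeds `boundary_at(x, upper_points, n_fit)`; the lower boundary is handled symmetrically.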
  • The program according to the next invention is the program described above, further comprising a gene number input step of allowing a user to input the number of genes in the window, wherein the window setting step sets the window in a section that includes the number of genes input in the gene number input step.
  • According to this program, the user inputs the number of genes in the window, and the window is set within a section that includes the input number of genes, so the number of genes per window can be varied by the user for each experiment.
  • The program according to the next invention is the program described above, further comprising a confidence limit value input step of allowing a user to input a confidence limit value, wherein the step of determining the confidence limit point determines the confidence limit point based on the confidence limit value input in the confidence limit value input step. This shows one example of the determination of the confidence limit point more specifically.
  • According to this program, the user inputs the confidence limit value and the confidence limit point in each window is determined from it, so the confidence limit value can be varied by the user for each experiment and the error of each experiment can be kept within an appropriate range.
  • The program according to the next invention is the program described above, further comprising:
  • a simulation condition setting step of allowing a user to input simulation conditions including information on at least one of the form of the distribution of the non-fluctuating genes, the form of the distribution of the fluctuating genes, the criterion for detecting the fluctuating genes, the number of repetitions of the experiment, and the number of simulations;
  • a simulation execution step in which, according to the simulation conditions set in the simulation condition setting step, the same gene group is repeatedly generated from the same distribution, the gene detection step is executed, the simulation for detecting expressed genes is executed a plurality of times, the false positive rate and the false negative rate of the detection results are calculated, and the relationship among the number of repetitions of the experiment, the simulation conditions, the detection sensitivity, and the detection reliability is calculated; and
  • a simulation result output step of outputting the simulation result of the simulation execution step for each of the simulation conditions.
  • According to this program, the user inputs simulation conditions including information on at least one of the form of the distribution of the non-fluctuating genes (for example, the width of the standard deviation σ is set in the range of 0.1 to 1.5), the form of the distribution of the fluctuating genes (for example, the center μ is set in the range of 0.4 to 3), the detection criterion for the fluctuating genes (for example, the ratio of the number of detections to the total number is 2/3, 2/4, 3/4, 3/6, 4/6, etc.), the number of repetitions of the experiment, and the number of simulations (for example, set from 3 to 10 times).
  • According to the set conditions, the same gene group is repeatedly generated from the same distribution, the gene detection is executed, the simulation for detecting expressed genes is executed multiple times, and the false positive rate and false negative rate of the detection results are calculated.
  • The relationship among the number of repetitions of the experiment, the simulation conditions, the detection sensitivity, and the detection reliability is calculated, a test statistical table for genes whose expression levels change is created, and the simulation results are output, so the detection power and detection reliability of each combination can be known by combining the simulation results under the various conditions.
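  • A minimal sketch of such a simulation is given below, under several assumptions that are not taken from the specification: non-fluctuating genes are drawn from a normal distribution centered on 0, fluctuating genes from normal distributions centered on +μ or -μ with the same σ, a fixed threshold on the absolute value stands in for the confidence boundary, the false positive rate is counted over the non-fluctuating genes, and the false negative rate over the fluctuating genes. All names are hypothetical.

```python
import random


def simulate_error_rates(n_stable, n_variable, sigma, mu, threshold,
                         n_repeats, min_hits, n_simulations, seed=0):
    """Estimate false positive and false negative rates by simulation."""
    rng = random.Random(seed)
    false_pos = 0
    false_neg = 0
    # True centers: 0 for non-fluctuating genes, +mu / -mu for fluctuating genes.
    centers = [0.0] * n_stable + [mu if i % 2 == 0 else -mu
                                  for i in range(n_variable)]
    for _ in range(n_simulations):
        for index, center in enumerate(centers):
            # Repeat the experiment and count how often the gene is detected.
            hits = sum(abs(rng.gauss(center, sigma)) > threshold
                       for _ in range(n_repeats))
            detected = hits >= min_hits
            is_variable = index >= n_stable
            if detected and not is_variable:
                false_pos += 1
            if not detected and is_variable:
                false_neg += 1
    fp_rate = false_pos / (n_stable * n_simulations)
    fn_rate = false_neg / (n_variable * n_simulations)
    return fp_rate, fn_rate


# Detection criterion 2/3: detected in at least 2 of 3 repeated experiments.
fp_rate, fn_rate = simulate_error_rates(
    n_stable=100, n_variable=20, sigma=0.1, mu=3.0, threshold=1.0,
    n_repeats=3, min_hits=2, n_simulations=5)
```

  • Running the function over a grid of σ, μ, detection criteria, and repetition counts would give one row of a test statistical table per combination.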
  • The program according to the next invention is the program described above, wherein the gene detecting step further includes a deviation value calculating step of calculating a deviation value of each spot.
  • The deviation value calculated by this program can be used in place of the ordinary fluctuation ratio or its logarithm in multivariate analyses such as cluster analysis, which enables analysis that is not affected by differences in the influence of errors.
  • The present invention also relates to a recording medium, and the recording medium according to the present invention is characterized by having the program described above recorded on it.
  • According to this recording medium, the program described above can be realized on a computer by causing the computer to read and execute the program recorded on the recording medium, and the same effects as those of the program can be obtained.
  • FIG. 1 is a diagram illustrating the concept of principal component analysis using a variance/covariance matrix according to the present invention.
  • FIG. 2 is a diagram illustrating the concept of the process of obtaining an asymptote in a new coordinate system according to the present invention.
  • FIG. 3 is a diagram conceptually illustrating reconstruction of a distribution map according to the present invention.
  • FIGS. 4 to 6 are diagrams showing a mixed normal distribution model of the expression ratio according to the present invention.
  • FIG. 7 is a diagram relating to a simulation according to the present invention.
  • FIGS. 8 to 10 are diagrams showing examples of calculation results of type 1 detection errors by a simulation according to the present invention.
  • FIGS. 11 to 14 are diagrams conceptually illustrating the creation of an expression variation reliability curve according to the present invention.
  • FIG. 15 is a flowchart showing the main process of the apparatus of the present embodiment.
  • FIG. 16 is a flowchart showing an example of the background correction process of the apparatus of the present embodiment.
  • FIG. 17 is a flowchart of processing of the apparatus of the present embodiment.
  • FIG. 18 is a flowchart showing an example of the gene detection processing of the apparatus of the present embodiment.
  • FIG. 19 is a diagram showing an example of the simulation process of the apparatus of the present embodiment.
  • FIG. 20 is a diagram showing an example of a gene extraction condition setting screen output to the output device 114 by the processing of the window setting unit 102i.
  • FIG. 21 is a diagram showing an example of a simulation condition setting screen output to the output device 114 by the processing of the simulation condition setting unit 102r.
  • FIG. 22 is a diagram showing the configuration of the apparatus to which the present invention is applied.
  • FIG. 23 is a block diagram showing an example of the configuration of the bias correction unit 102b.
  • FIG. 24 is a block diagram showing an example of the configuration of the gene detection unit 102c.
  • FIG. 25 is a block diagram showing an example of the configuration of the simulation unit 102d.
  • FIG. 26 is a flowchart showing an example of gene detection processing using the deviation value in the apparatus of the present embodiment.
  • FIG. 27 is a conceptual diagram showing the calculation of the deviation value.
  • FIG. 28 is a conceptual diagram showing an example of the bias determination processing of the apparatus of the present embodiment.
  • The expression level of each gene is reflected in the luminance of the fluorescence measurement value corresponding to that gene, and the ratio of the expression levels of each gene is observed as the ratio of its fluorescence measurement value to the control fluorescence measurement value. However, because of errors in DNA microarrays and DNA chips, errors in fluorescent labeling reactions, measurement errors, differences in the fluorescence coefficient, and the like, the ratio of the expression levels is not reflected accurately, as it is, in the ratio of the measured fluorescence values. Therefore, in the present invention, the following processing is performed to deal with these errors.
  • Background correction is performed as the first stage of data correction. First, let the luminances of gene $i$ measured under the two conditions be $(a_i, b_i)$, and subtract the backgrounds $(\mathrm{BKGa}_i, \mathrm{BKGb}_i)$ from the luminances of each gene.
  • The correction result $(a_i - \mathrm{BKGa}_i,\ b_i - \mathrm{BKGb}_i)$ is defined as $(A_i, B_i)$.
  • Next, control gene samples are used, such as DNA samples with external genes or house-keeping gene samples such as ribosomes whose expression is almost unchanged.
  • The control genes are removed one by one, starting from the gene with the smallest product of the fluorescence intensity data, and each time calibration curves for the gene expression level and the DNA level are created from the data of all remaining control gene samples and the correlation coefficient of the data is calculated.
  • The product of the fluorescence intensity data under the two conditions of the control sample at the point where the correlation coefficient first satisfies the criterion at which a strong correlation is recognized (for example, 0.8 or more) is defined as threshold 1, and the population of all gene samples whose product of the fluorescence intensity data under the two conditions exceeds threshold 1 is defined as the gene group with high expression level.
  • The product of the fluorescence intensity data under the two conditions of the control sample at the point where the correlation coefficient, calculated in the same order, first satisfies the criterion at which a weak correlation is recognized (for example, 0.5 or more) is defined as threshold 2 (threshold 2 ≤ threshold 1), and the population of all gene samples whose product of the fluorescence intensity data under the two conditions is less than threshold 2 is defined as the gene group with low expression level.
  • Principal component analysis is performed using the logarithmic fluorescence intensities of the high-expression gene population, and the slope and intercept of the asymptote, which is the first principal component, are determined. The angle between the obtained asymptote and the X axis is set to θ.
  • The coordinates of the low-expression gene group in the X-Y axis system are rotated to the right by the angle θ, and the slope of the fluorescence intensity equilibrium axis is calculated from the rotated coordinates of the low-expression gene group.
  • Based on the calculated slope (for example, positive, negative, or zero), it is determined which of the luminance data of the two conditions contains more bias.
  • By subtracting the bias from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates for a gene population having a constant bias), a new X-Y fluorescence intensity scatter plot is constructed with the fluorescence intensity equilibrium axis and the expression level magnification axis as its two axes. The bias of the measured values can thus be removed efficiently, and a scatter plot that clearly expresses the characteristics of the data can be created.
  • The correction of the bias $k$ according to the present invention is based on the general formula (1) or (1′) representing the relationship between the fluorescence measurement values $A$ and $B$:
  • $\log_2 B = a \log_2 (A - k) + b \qquad (1)$
  • $\log_2 (B - k) = a \log_2 A + b \qquad (1')$
  • Formula (1′) corresponds to the case in which the background noise of $B$ is the larger one.
  • Assuming that the number of genes to be examined is sufficient as a sample (for example, a thousand or more), and given the accompanying assumption on the number of fluctuating genes, the fluorescence intensity equilibrium axis is taken to be the asymptote of the $(\log_2 A_i, \log_2 B_i)$ population.
  • principal component analysis using a variance / covariance matrix is performed to obtain the slope a and the intercept b.
  • Principal component analysis using a variance / covariance matrix does not require normalization, unlike principal component analysis using a correlation matrix, which has been conventionally used in gene analysis.
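  • For two variables, the first principal component of the variance-covariance matrix has a closed form, so the slope and intercept of the asymptote can be sketched as below. The closed-form expression is the standard result for a 2×2 covariance matrix rather than a formula quoted from this specification, and the function name `principal_axis` is hypothetical.

```python
import math


def principal_axis(xs, ys):
    """Slope and intercept of the first principal component of (xs, ys).

    Uses the variance-covariance matrix directly, without normalizing
    the variables to unit variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs) / n
    syy = sum((y - mean_y) ** 2 for y in ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    if sxy == 0:
        raise ValueError("covariance is zero; the principal axis has no finite slope formula")
    # Direction of the eigenvector belonging to the largest eigenvalue.
    diff = syy - sxx
    slope = (diff + math.sqrt(diff * diff + 4.0 * sxy * sxy)) / (2.0 * sxy)
    # The principal axis passes through the centroid of the data.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lying exactly on y = 2x + 1 recover that line.
slope, intercept = principal_axis([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

  • In this setting `xs` and `ys` would be the base-2 logarithms of the background-corrected intensities of the high-expression gene group.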
  • FIG. 1 is a diagram showing the concept of principal component analysis using a variance/covariance matrix. Writing $\log_2 A$ as $x$ and $\log_2 B$ as $y$, Equation 2 representing the asymptote is $y = a x + b$.
  • The parameters $a$ and $b$ of the asymptote that are most appropriate on the distribution map are determined.
  • Here, $\bar{x}$ is the mean of $x_i$ and $\bar{y}$ is the mean of $y_i$.
  • Of the two solutions of Equation 9, $a$ is the one that is greater than zero.
  • $S_{xx}$ is the variance of $x_i$, $S_{yy}$ is the variance of $y_i$, and $S_{xy}$ is the covariance of $x_i$ and $y_i$.
  • $a$ and $b$ are obtained from the gene group $(\log_2 A_i, \log_2 B_i)$ with a high product $A_i B_i$. As a simple calculation method, the upper portion (for example, 70%) of all genes ranked by the product $A_i B_i$ may be used.
  • Alternatively, the control genes are removed one by one in order, calibration curves for the gene expression level and the DNA level are created from the remaining control gene sample data, and the correlation coefficient of the data is calculated each time.
  • The product of the fluorescence intensity data under the two conditions of the control sample at the point where the correlation coefficient first satisfies the criterion at which a strong correlation is recognized (for example, 0.8 or more) is defined as threshold 1.
  • The population of all gene samples whose product of the fluorescence intensity data under the two conditions exceeds threshold 1 is defined as the gene population with high expression level.
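  • One possible reading of this threshold procedure is sketched below. Several details are assumptions made for illustration, not statements from the specification: each control is a tuple of known DNA amount and the two intensities, the calibration is judged by the Pearson correlation between DNA amount and intensity in both conditions, at least three controls must remain, and the threshold is taken as the smallest intensity product among the controls remaining when the criterion is first met.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient (0.0 if either series is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def product_threshold(controls, criterion):
    """controls: list of (dna_amount, intensity_a, intensity_b) tuples."""
    # Order the controls by the product of the two intensities, smallest first.
    remaining = sorted(controls, key=lambda c: c[1] * c[2])
    while len(remaining) >= 3:
        dna = [c[0] for c in remaining]
        r_a = pearson(dna, [c[1] for c in remaining])
        r_b = pearson(dna, [c[2] for c in remaining])
        if min(r_a, r_b) >= criterion:
            # Smallest product still inside the well-correlated range.
            return remaining[0][1] * remaining[0][2]
        # Drop the control with the smallest product and try again.
        remaining = remaining[1:]
    return None


# Hypothetical dilution series; the first control is an unreliable weak spot.
controls = [(10.0, 1.0, 1.0), (1.0, 10.0, 10.0),
            (2.0, 20.0, 20.0), (4.0, 40.0, 40.0)]
threshold_1 = product_threshold(controls, 0.8)
```

  • Calling the same function with the weaker criterion (for example, 0.5) would give threshold 2, which by construction cannot exceed threshold 1.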
  • $A_c$ is the coordinate of the point where the asymptote intersects the $\log_2 A$ axis in the case where $A$ has the larger background noise and contains more bias than $B$.
  • $A_c$ and $B_c$ are each obtained, in the $(\log_2 A, \log_2 B)$ orthogonal axis system, as the value at the intersection of the asymptote obtained from the gene group $(\log_2 A_i, \log_2 B_i)$ with a low product $A_i B_i$ and the $\log_2 A$ axis or the $\log_2 B$ axis.
  • A control gene for quality control of a DNA concentration dilution series (for example, a DNA sample containing an external gene, or a house-keeping gene sample such as a ribosome whose expression level hardly changes) is used as the target gene.
  • FIG. 2 is a diagram showing a concept of a process for obtaining an asymptote in a new coordinate system.
  • The $(\log_2 A, \log_2 B)$ axis system must be rotated to an axis system based on the asymptote obtained from the gene group with a high product $A_i B_i$. The new coordinates $(\log_2 A_i', \log_2 B_i')$ of the point $(\log_2 A_i, \log_2 B_i)$ are obtained by this rotation.
  • The second stage of data correction is performed by subtracting the bias obtained by Equation 11 or 13 from the entire data of one of the control measurement values.
  • the corrected data is composed of a mixed distribution of a gene group whose expression level changes and a gene group whose expression level does not change.
  • a window is set within a certain interval in the direction of the fluorescence intensity equilibrium axis, and within each window, the confidence limit of the arbitrary risk factor based on the Student's t-distribution is determined.
  • move the window by a fixed number of genes in the direction of the fluorescence intensity equilibrium axis (X-axis) to find each confidence limit point.
  • the obtained plurality of confidence limit points are complemented by a smooth line (spline) to make a confidence boundary line (confidence curve).
  • genes located outside the confidence boundary are selected as genes whose expression levels have changed.
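  • The sliding-window determination of confidence limit points could be sketched as follows. This is an illustration under stated assumptions: the limits in each window are taken as the window mean plus or minus a critical t value times the sample standard deviation, and the critical value `t_crit` is supplied by the caller (for example, read from a t-table for the chosen risk factor) rather than computed. The function name and parameters are hypothetical.

```python
from statistics import mean, stdev


def window_limits(points, window_size, step, t_crit):
    """Return (window_center_x, lower_limit, upper_limit) for each window.

    points: (x, y) pairs, where x is the position on the fluorescence
    intensity equilibrium axis and y the expression level magnification.
    """
    ordered = sorted(points)
    limits = []
    # Slide the window along the X axis by a fixed number of genes.
    for start in range(0, len(ordered) - window_size + 1, step):
        window = ordered[start:start + window_size]
        center_x = mean([x for x, _ in window])
        ys = [y for _, y in window]
        center_y = mean(ys)
        spread = stdev(ys)
        limits.append((center_x,
                       center_y - t_crit * spread,
                       center_y + t_crit * spread))
    return limits


# Hypothetical data: ten genes alternating between +1 and -1.
points = [(float(i), 1.0 if i % 2 == 0 else -1.0) for i in range(10)]
limits = window_limits(points, window_size=4, step=2, t_crit=2.0)
```

  • The resulting limit points are what the smooth confidence curve is drawn through; genes whose y value falls outside the curve at their x position are the candidates for changed expression.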
  • the genes whose expression levels have been changed are surely selected based on the ratio of the majority decision by repeated experiments.
  • the multi-test recognizes that the gene expression level has changed only if the expression level has changed more than a predetermined number of times.
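The majority decision over repeated experiments can be illustrated with a minimal Python sketch. The function name `select_by_majority`, the gene identifiers, and the representation of each repeated experiment as a set of detected gene IDs are assumptions made for illustration only and do not appear in the original description. The sketch counts the number of experiments in which each gene was detected and keeps the genes that reach the predetermined count.

```python
from collections import Counter


def select_by_majority(detections, min_hits):
    """Return genes detected in at least `min_hits` of the repeated experiments.

    detections: list of sets, one set of detected gene IDs per experiment
                (hypothetical data layout).
    """
    counts = Counter(gene for run in detections for gene in set(run))
    return {gene for gene, n in counts.items() if n >= min_hits}


# Three repeated experiments with a 2/3 detection criterion (illustrative data).
runs = [{"g1", "g2", "g3"}, {"g1", "g3"}, {"g1", "g4"}]
selected = select_by_majority(runs, min_hits=2)
```

With a criterion of 2 detections out of 3 experiments, a gene detected in only one run is discarded, which is how the multiple test suppresses chance detections.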
  • FIG. 3 is a diagram conceptually showing reconstruction of the distribution map.
  • The distance from each gene to the fluorescence intensity equilibrium axis is calculated and used as the value on the Y axis; a fluorescence intensity scatter plot that takes the fluorescence intensity equilibrium axis as the X axis can clearly express the nature of the data.
  • The Y-axis value d2 (magnification of the expression amount) of each gene is calculated by Equation 4.
  • The X-axis value d1 (fluorescence intensity) of each gene is calculated by Equation 22, which expresses the relationship that A and B obey as a whole even though the populations of the fluorescence measurement values A and B contain various errors. The reconstructed fluorescence intensity scatter diagram then has (fluorescence intensity, rate of change of expression level) as its X and Y axes.
  • FIGS. 4 to 6 are diagrams showing a mixed normal distribution model of the expression fold change.
  • the distribution of the actual data can be considered to be a mixed distribution of a group of genes whose expression level has changed (variable genes) and a group of genes whose expression level does not change (non-variable genes).
  • The mixture distribution model of this method is assumed to consist of a population distribution of non-variable genes centered on zero on the Y axis, which represents the rate of change in the expression level, and population distributions of variable genes, each centered on one point, for increased and for decreased expression.
  • A normal distribution is shown here, but the present invention is not limited to the normal distribution and can be applied to data of any distribution.
  • This method is based on calculating the variance and the center of the actual data, which makes it robust. In other words, even for experimental data with different error ranges, the threshold for the fold change in expression level is determined according to the error. Another feature of this method is that detection is performed several times on different data sets obtained from control experiments under the same conditions, and only genes detected at least a predetermined number of times are selected, which makes it possible to detect variable genes with high reliability.
  • By simulating the mixed distribution of the non-variable and variable gene populations using six parameters (total number of genes, percentage of genes whose expression fluctuates, standard deviation (width) of the gene distribution, center of the distribution of genes whose expression fluctuates, detection criterion (number of detections / total number), and confidence limit (P value)), the type 1 and type 2 detection errors can be calculated.
  • The results can be used as experimental guidelines.
  • A "type 1 detection error" is a false positive error, in which something that did not change is detected as having changed.
  • A "type 2 detection error" is a false negative error, in which something that changed is detected as not having changed.
  • FIGS. 11 to 14 are diagrams conceptually showing the creation of a confidence curve.
  • Within a window composed of a fixed number of genes, this device calculates the variance and center of the distribution of the fold change in expression level and determines the t-value of the fold change (corresponding to the value on the fold change coordinate axis). The median of the fluorescence intensity equilibrium axis values of all the genes in the window is used as the value on the fluorescence intensity equilibrium axis of this confidence limit point of expression change.
  • The device determines the coordinates of the confidence limit points of expression change above and below the fluorescence intensity equilibrium axis in the window, and then moves the window by a fixed number of genes in the direction in which the fluorescence intensity equilibrium axis increases. This operation is then repeated. After calculating the confidence limit points of all expression changes, the device connects them with cubic spline curves to create the expression fluctuation reliability curves that form the boundary lines of expression change.
  • In the windows at both ends, there are fluorescence intensity regions that cannot be interpolated by the cubic spline curve (as shown in the figure). Where the fluorescence intensity is high, in the last window (indicated by the dotted line), the horizontal extension of the calculated confidence limit point of expression change is used. Where the fluorescence intensity is low (indicated by the dotted line), the extrapolation of the asymptote obtained by the least-squares method from the boundary points of several tens of consecutive windows starting from the leftmost point is used as the expression fluctuation reliability curve (extrapolated expression change boundary line).
  • Genes outside the region sandwiched between the expression fluctuation reliability curves above and below the fluorescence intensity equilibrium axis are extracted as genes whose expression level has changed, that is, increased or decreased.
  • the final gene extraction is performed by the multiple test (2- (2)) described above.
  • FIG. 22 is a block diagram showing an example of the configuration of the present apparatus to which the present invention is applied, and conceptually shows only those parts of the configuration relating to the present invention.
  • The gene expression information analyzer 100 includes a control unit 102, such as a CPU, that controls the entire gene expression information analyzer 100; a communication control interface unit 104 connected to a communication device (not shown), such as a router, that is connected to a communication line; an input/output control interface unit 108 connected to an input device 112 and an output device 114; and a storage unit 106 that stores various databases and tables. These units are communicably connected via an arbitrary communication path.
  • the gene expression information analyzer 100 may be communicably connected to a network via a communication device such as a router and a wired or wireless communication line such as a dedicated line.
  • The various databases and tables stored in the storage unit 106 (measured luminance data 106a and simulation result data 106b) are held in storage means such as a fixed disk device, which stores the programs, tables, files, databases, page files, and the like used for various types of processing.
  • The measured luminance data 106a is measured luminance data storage means that stores, for each experiment, the measured luminance data of each spot indicating the expression level of a gene tested with a DNA chip or DNA microarray.
  • the simulation result data 106 b is a simulation result data storage unit that stores simulation result data by the present apparatus.
  • a communication control interface unit 104 controls communication between the gene expression information analysis device 100 and a network (or a communication device such as a router). That is, the communication control interface unit 104 has a function of communicating data with another terminal via a communication line.
  • an input / output control interface unit 108 controls the input device 112 and the output device 114.
  • As the output device 114, a speaker can be used in addition to a monitor (including a home television); hereinafter, the output device 114 may be described as a monitor.
  • As the input device 112, a keyboard, a mouse, a microphone, and the like can be used.
  • the monitor also implements a pointing device function in cooperation with the mouse.
  • The control unit 102 has an internal memory for storing a control program such as an OS (Operating System), programs defining various processing procedures, and required data, and performs information processing for executing various processes by means of these programs.
  • The control unit 102 functionally and conceptually includes a background correction unit 102a, a bias correction unit 102b, a gene detection unit 102c, and a simulation unit 102d.
  • The background correction unit 102a is background correction means that creates background-corrected luminance data by removing the background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene was measured under two conditions.
  • The bias correction unit 102b is bias correction means that creates a fluorescence intensity scatter diagram by taking the logarithms of the luminance data background-corrected by the background correction means on the X and Y axes, obtains the bias with respect to the fluorescence intensity equilibrium axis for each gene spot, and removes the bias from the luminance data to construct a new X-Y axis fluorescence intensity scatter diagram whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis.
  • FIG. 23 is a block diagram showing an example of the configuration of the bias correction section 102b, and conceptually shows only those portions of the configuration relating to the present invention.
  • The bias correction unit 102b functionally and conceptually includes a first principal component creation unit 102e, a coordinate rotation unit 102f, a bias determination unit 102g, and a correction plot generation unit 102h.
  • The first principal component creation unit 102e is first principal component creation means that performs principal component analysis using the logarithmic values of the gene group with high expression levels and finds the slope and intercept of the asymptote that is the first principal component.
  • The coordinate rotation unit 102f is coordinate rotation means that takes the angle between the asymptote obtained by the first principal component creation means and the X axis as θ, and calculates the coordinates obtained by rotating the coordinates of the gene group with low expression levels in the X-Y axis system to the right by the angle θ.
  • The bias determination unit 102g is bias determination means that calculates the slope of the fluorescence intensity equilibrium axis using the coordinates of the gene group with low expression levels after the coordinate rotation by the coordinate rotation means, and determines, based on the calculated slope, which of the luminance data under the two conditions contains more bias.
  • The correction plot generation unit 102h subtracts the bias from the luminance data of the condition determined by the bias determination means to contain more bias, thereby obtaining a plot whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis.
  • The gene detection unit 102c is variable gene detection means that detects variable genes whose expression level fluctuates, based on the new X-Y axis fluorescence intensity scatter diagram constructed by the bias correction means.
  • FIG. 24 is a block diagram showing an example of the configuration of the gene detection section 102c, and conceptually shows only a portion related to the present invention in the configuration.
  • The gene detection unit 102c functionally and conceptually includes a window setting unit 102i, a confidence limit point determination unit 102j, a window moving unit 102k, a confidence boundary creation unit 102m, a fluctuating gene extraction unit 102n, a gene number input unit 102p, a confidence limit value input unit 102q, and a deviation value processing unit 102u.
  • a window setting unit 102 i is a window setting unit that sets a window within a predetermined section in the direction of the fluorescence intensity equilibrium axis.
  • the confidence limit point determining unit 102 j is a confidence limit point determination unit that determines a confidence limit point in each window set by the window setting unit.
  • The window moving unit 102k is window moving means that moves the window by a fixed number of genes in the direction of the fluorescence intensity equilibrium axis.
  • The confidence boundary creation unit 102m is confidence boundary creation means that obtains a confidence limit point, by the confidence limit point determination means, for each window moved by the window moving means, and creates a confidence boundary based on the plurality of confidence limit points thus obtained.
  • The fluctuating gene extraction unit 102n is fluctuating gene extraction means that extracts genes located outside the confidence boundary created by the confidence boundary creation means as fluctuating genes whose expression level has fluctuated.
  • the gene number input section 102p is a gene number input means for allowing a user to input the number of genes in the window.
  • the confidence limit value input section 102q is a confidence limit value input means for allowing a user to input a confidence limit value.
  • the deviation value processing unit 102 u is a deviation value calculation means for calculating a deviation value of each spot.
  • the simulation unit 102 d is a simulation unit that executes a plurality of simulations according to predetermined conditions and outputs a simulation result for each condition.
  • FIG. 25 is a block diagram showing an example of the configuration of the simulation section 102d, and conceptually shows only a portion related to the present invention in the configuration.
  • The simulation unit 102d functionally and conceptually includes a simulation condition setting unit 102r, a simulation execution unit 102s, and a simulation result output unit 102t.
  • The simulation condition setting unit 102r is simulation condition setting means that allows the user to input simulation conditions including information on at least one of the standard deviation of the distribution of genes, the center of the distribution of fluctuating genes, the criterion for detecting fluctuating genes, and the number of simulations.
  • The simulation execution unit 102s is simulation execution means that, according to the simulation conditions set by the simulation condition setting means, repeatedly generates the same gene group from the same distribution and simulates multiple times the detection of fluctuating genes by the gene detection means; calculates the false positive rate and false negative rate of the results obtained by the detection means; calculates the relationship among the number of repetitions of the experiment, the simulation conditions, the detection sensitivity, and the detection reliability; and creates a test statistics table of genes whose expression level changes.
  • the simulation result output unit 102t is a simulation result output unit that outputs a simulation result by the simulation execution unit for each simulation condition.
  • FIG. 15 is a flowchart showing an example of a main process of the present apparatus of the present embodiment.
  • The gene expression information analyzing apparatus 100 executes a background correction process, described later with reference to FIG. 16, by the processing of the background correction unit 102a (step S-1).
  • That is, the background correction unit 102a creates background-corrected luminance data by removing the background luminance from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene was measured under two conditions using a DNA microarray, a DNA chip, or the like.
  • The gene expression information analyzing apparatus 100 then executes a bias correction process, described later with reference to FIG. 17, by the processing of the bias correction unit 102b (step S-2). That is, the bias correction unit 102b creates a fluorescence intensity scatter diagram (scatter plot) by taking the logarithm (natural logarithm or base-2 logarithm) of the background-corrected luminance data on the X and Y axes.
  • It then obtains, for each gene spot, the bias with respect to the fluorescence intensity equilibrium axis, which indicates equal fluorescence intensity, and removes the bias from the luminance data to construct a new X-Y axis fluorescence intensity scatter diagram whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis.
  • The gene expression information analyzer 100 executes a gene detection process using a moving window, described later with reference to FIGS. 18 and 20, by the processing of the gene detection unit 102c (step S-3). That is, the gene detection unit 102c detects fluctuating genes whose expression level has fluctuated based on the newly constructed X-Y axis fluorescence intensity scatter diagram. Then, the gene expression information analyzer 100 executes a simulation process, described later with reference to FIGS. 19 and 21 and the like, by the processing of the simulation unit 102d (step S-4). That is, the simulation unit 102d executes the simulation a plurality of times according to predetermined conditions and outputs the simulation result for each condition.
  • FIG. 16 is a flowchart showing an example of the background correction process of the present apparatus of the present embodiment.
  • The gene expression information analyzer 100 calculates the average or local background value from the luminance of the genes measured under the two conditions by the processing of the background correction unit 102a (step SA-1).
  • The background value is removed from the measured values, and the corrected results are taken as group A and group B (step SA-2).
  • That is, the background correction unit 102a calculates either the average background value of the blank spots or the background value of the area surrounding each spot, and performs background correction by subtracting that value from the measured fluorescence intensity of each spot. This ends the background correction processing.
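Steps SA-1 and SA-2 can be sketched in Python as follows. This is a minimal illustration, not the apparatus's implementation: the function names `subtract_mean_background` and `subtract_local_background` and the list-based data layout are assumptions introduced here. The first function uses the average of the blank spots as the background; the second uses a per-spot background taken from the area surrounding each spot.

```python
from statistics import mean


def subtract_mean_background(spots, blank_spots):
    """Subtract the average blank-spot intensity from every spot."""
    background = mean(blank_spots)
    return [s - background for s in spots]


def subtract_local_background(spots, local_backgrounds):
    """Subtract a per-spot background measured around each spot."""
    if len(spots) != len(local_backgrounds):
        raise ValueError("one local background value is needed per spot")
    return [s - b for s, b in zip(spots, local_backgrounds)]


# Illustrative intensities for one condition (group A).
group_a = subtract_mean_background([110.0, 250.0], blank_spots=[8.0, 12.0])
group_a_local = subtract_local_background([110.0, 250.0], [9.0, 15.0])
```

A corrected value can be zero or negative when a spot is no brighter than its background; how such spots are treated before the logarithm is taken is not stated in the text above and is left open in this sketch.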
  • FIG. 17 is a flowchart showing an example of the bias correction process of the present apparatus of the present embodiment.
  • The bias correction unit 102b, by the processing of the first principal component creation unit 102e, calculates the base-2 logarithms of group A and group B and creates a scatter plot on an orthogonal axis system with Log2 A and Log2 B as the X and Y axes (step SB-1).
  • The bias correction unit 102b, by the processing of the first principal component creation unit 102e, performs principal component analysis using the variance-covariance matrix on the logarithmic values of the upper gene group of the product AB (for example, the gene group in the upper 70%) and finds the slope and intercept of the asymptote that is the first principal component (step SB-2).
  • The bias correction unit 102b, by the processing of the coordinate rotation unit 102f, takes the angle between the obtained asymptote and the Log2 A axis as θ and calculates the coordinates obtained by rotating the coordinates, in the Log2 A-Log2 B axis system, of the gene group belonging to the lower range of the product AB to the right by θ degrees (step SB-3).
  • The bias correction unit 102b, by the processing of the bias determination unit 102g, calculates the asymptote using the coordinates of the lower gene group of the product AB after the coordinate rotation (step SB-4).
  • The bias correction unit 102b, by the processing of the bias determination unit 102g, determines whether or not the slope of the asymptote is positive (step SB-5). If it is positive, the bias determination unit 102g determines that the data of A contains more bias.
  • The bias correction unit 102b, by the processing of the bias determination unit 102g, applies the least-squares method to the lower gene group of the product AB (for example, the genes in the bottom 10%) in the Log2 A-Log2 B axis system, with the Log2 B axis data as the independent variable and the Log2 A data as the dependent variable, and finds the value A_c of the intersection (A_c, 0) of the asymptote of the lower gene group with the Log2 A axis (step SB-6). Then, the bias correction unit 102b, by the processing of the correction plot generation unit 102h, calculates the bias and subtracts the bias from the measured data (step SB-7).
  • On the other hand, when the slope of the asymptote is not positive in step SB-5, the bias determination unit 102g determines whether or not the slope is zero (step SB-8). If it is zero, the bias correction process ends.
  • If it is negative, the bias determination unit 102g determines that the data of B contains more bias. Accordingly, the bias correction unit 102b, by the processing of the correction plot generation unit 102h, operates on the lower gene group of the product AB (for example, the genes in the lower 10%) in the Log2 A-Log2 B axis system.
  • The bias correction unit 102b, by the processing of the correction plot generation unit 102h, uses the data from which the bias has been subtracted to construct an orthogonal Log2 (Ak)-Log2 B axis system or Log2 A-Log2 (Bk) axis system (step SB-10), and the bias correction process ends.
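Steps SB-1 to SB-3 can be sketched in Python under stated assumptions. The function names (`upper_group`, `first_principal_axis`, `rotate_clockwise`) are hypothetical, and the closed-form angle θ = ½·atan2(2·s_xy, s_xx − s_yy) is the standard expression for the major axis of a 2×2 covariance matrix, used here in place of a general eigenvalue routine; the original description does not say how the principal component is computed numerically.

```python
import math


def upper_group(a, b, fraction=0.7):
    """Log2 coordinates of the genes with the largest products A*B (step SB-1)."""
    pairs = sorted(zip(a, b), key=lambda p: p[0] * p[1], reverse=True)
    keep = max(2, int(len(pairs) * fraction))
    return [(math.log2(x), math.log2(y)) for x, y in pairs[:keep]]


def first_principal_axis(points):
    """Slope, intercept and angle of the first principal component (step SB-2)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # major-axis angle
    slope = math.tan(theta)
    return slope, my - slope * mx, theta


def rotate_clockwise(points, theta):
    """Rotate points to the right (clockwise) by theta radians (step SB-3)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]


# Illustrative data in which B is exactly twice A, so Log2 B = Log2 A + 1.
a = [2.0, 4.0, 8.0, 16.0]
b = [4.0, 8.0, 16.0, 32.0]
slope, intercept, theta = first_principal_axis(upper_group(a, b))
rotated = rotate_clockwise([(1.0, 1.0)], theta)
```

After the rotation, a point lying along the direction of the principal axis has a Y coordinate close to zero, so the residual slope of the low-intensity gene group (step SB-4) can be read directly in the rotated system.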
  • FIG. 18 is a flowchart showing an example of the gene detection process of the present apparatus of the present embodiment.
  • The gene detection unit 102c of the gene expression information analysis apparatus 100 outputs to the output device 114 a gene extraction condition setting screen that allows the user to set the number of genes in the window described above and the confidence level (P value) (step SC-1).
  • FIG. 20 is a diagram showing an example of a gene extraction condition setting screen output to the output device 114 by the processing of the window setting unit 102i.
  • The gene extraction condition setting screen includes an input area MA-1 for the number of genes in the window, an input area MA-2 for the confidence level (P value), which is the confidence limit value, a setting end button MA-3, and the like.
  • While viewing the gene extraction condition setting screen shown in FIG. 20, the user completes the input of each item in input areas MA-1 and MA-2 using the input device 112, and then selects the setting end button MA-3.
  • The gene number input section 102p and the confidence limit value input section 102q determine the set values based on the information set on the gene extraction condition setting screen, and the size of the window shown in FIG. 11 is adjusted so that it contains the set number of genes.
  • The gene detection unit 102c, by the processing of the confidence limit point determination unit 102j, starts from the leftmost end of the X axis and calculates the variance and center using the Y-axis values (fold change) of the points in the window, and then calculates the confidence limit points of the change in expression level, namely the boundary value y_limit+ for increased fold change and the boundary value y_limit- for decreased fold change, together with the center of gravity on the X axis (step SC-2).
  • The gene detection unit 102c, by the processing of the window moving unit 102k, moves the window by a fixed number of genes in the direction in which the X-axis fluorescence intensity increases, and, by the processing of the confidence limit point determination unit 102j, finds the fold change boundary values y_limit+ and y_limit- and the center of gravity on the X axis for the new window (step SC-3). The gene detection unit 102c repeats this process until the window reaches the rightmost end of the X axis (step SC-4).
  • The gene detection unit 102c, by the processing of the confidence boundary line creation unit 102m, interpolates the fold change boundary points, which are the confidence limit points of expression change in all windows, with a cubic spline curve, and determines the boundary lines for increased and decreased fold change, which are the expression fluctuation reliability curves (step SC-5).
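Steps SC-2 to SC-5 can be illustrated with a minimal, standard-library Python sketch. Several simplifications are assumptions of this sketch and not part of the original description: the critical value `t_crit` is supplied by the caller rather than computed from the Student's t-distribution (2.262 is the two-sided 5% value for 9 degrees of freedom), the boundary points are joined by linear interpolation instead of a cubic spline, and the boundary is extended horizontally at both ends. The function names are hypothetical.

```python
from statistics import mean, stdev


def confidence_limit_points(genes, window_size, step, t_crit):
    """Slide a window of `window_size` genes along X (steps SC-2 to SC-4).

    genes: list of (x, y) = (fluorescence intensity, fold change).
    Returns a list of (x_center, y_limit_minus, y_limit_plus).
    """
    ordered = sorted(genes)
    points = []
    start = 0
    while start + window_size <= len(ordered):
        window = ordered[start:start + window_size]
        ys = [y for _, y in window]
        center, spread = mean(ys), stdev(ys)
        x_center = mean(x for x, _ in window)  # center of gravity on X
        points.append((x_center, center - t_crit * spread, center + t_crit * spread))
        start += step
    return points


def boundary_at(points, x, column):
    """Linear interpolation of a boundary (column 1: lower, 2: upper)."""
    if x <= points[0][0]:
        return points[0][column]
    for p, q in zip(points, points[1:]):
        if p[0] <= x <= q[0] and q[0] > p[0]:
            w = (x - p[0]) / (q[0] - p[0])
            return p[column] + w * (q[column] - p[column])
    return points[-1][column]


def variable_genes(genes, points):
    """Genes lying outside the region between the two boundaries."""
    return [(x, y) for x, y in genes
            if y < boundary_at(points, x, 1) or y > boundary_at(points, x, 2)]


# Illustrative data: small alternating noise with a single strong outlier.
genes = [(float(x), 0.1 if x % 2 == 0 else -0.1) for x in range(20)]
genes[10] = (10.0, 5.0)
limits = confidence_limit_points(genes, window_size=10, step=5, t_crit=2.262)
flagged = variable_genes(genes, limits)
```

Because the limits are derived from the variance inside each window, the boundary widens where the data are noisier, which is the property the description relies on for robustness.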
  • FIG. 26 is a flowchart showing an example of a gene detection process using the deviation value of the present apparatus of the present embodiment.
  • The deviation value processing unit 102u sets windows each containing a fixed number of genes in the direction of the fluorescence intensity equilibrium axis, as described above, and calculates the average and the standard deviation using the Y-axis values, which represent the rate of change in the expression level, of all the genes in each window. Next, the deviation value processing unit 102u obtains the center of gravity (corresponding to the intermediate value of the fluorescence intensity) using the X-axis values of all the genes (step SE-2).
  • The deviation value processing unit 102u shifts the window by a fixed number of genes in the X-axis direction and repeats the same processing up to the rightmost window (step SE-3).
  • The deviation value processing unit 102u interpolates the resulting data sets of (intermediate value of fluorescence intensity, average value), treated as a series of (x, y) data, by smoothing (for example, by creating a cubic spline curve) to obtain the smooth line of the average value shown in FIG. 27.
  • The deviation value processing unit 102u similarly interpolates the data sets of (intermediate value of fluorescence intensity, standard deviation) by smoothing (for example, by creating a cubic spline curve) to obtain the smooth line of the standard deviation shown in FIG. 27 (step SE-4).
  • The deviation value processing unit 102u obtains the Y value on the smooth line of the average value and the Y value on the smooth line of the standard deviation from the fluorescence intensity equilibrium axis value (X-axis value) of each gene.
  • Multivariate analyses such as cluster analysis using hierarchical clustering (one-dimensional or two-dimensional), the k-means method, and the self-organizing map method have been used to classify gene expression patterns and extract co-expressed genes.
  • For example, for an analysis that uses the logarithm of the variation ratio, see M. B. Eisen, P. T. Spellman, P. O. Brown, D. Botstein (1998), "Cluster analysis and display of genome-wide expression patterns", Proceedings of the National Academy of Sciences, 95(25): 14863-14868.
  • FIG. 19 is a flowchart showing an example of the simulation process of the present apparatus of the present embodiment.
  • The simulation unit 102d of the gene expression information analyzer 100, by the processing of the simulation condition setting unit 102r, presents the user with the various condition parameters of the simulation (for example, the standard deviation of the gene distribution) for setting.
  • FIG. 21 is a diagram illustrating an example of a simulation condition setting screen output to the output device 114 by the processing of the simulation condition setting unit 102r.
  • The simulation condition setting screen consists of an input area MB-1 for the standard deviation of the gene distribution, an input area MB-2 for the center of the gene distribution, an input area MB-3 for the detection criterion, an input area MB-4 for the number of simulations, and a setting end button MB-5.
  • For the center of the distribution of fluctuating genes, for example, the width of the center (±) may be set in the range of 0.4 to 3.
  • The detection criterion for fluctuating genes may be set, for example, to 2/3, 2/4, 3/4, 3/6, 4/6, or the like, expressed as number of detections / total number. The number of simulations may be set, for example, in the range of 3 to 10.
  • The simulation unit 102d, by the processing of the simulation execution unit 102s, repeatedly executes the above-described background correction process, bias correction process, and gene detection process based on the information set on the simulation condition setting screen, and performs a simulation of the mixture distribution of the gene group whose expression level changes (variable gene group) and the gene group whose expression level does not change (non-variable gene group) extracted by the gene detection process (step SD-2).
  • The simulation unit 102d, by the processing of the simulation result output unit 102t, outputs the simulation result screen data shown in FIGS. 7 to 10 to the output device 114 (step SD-3).
  • FIG. 7 to FIG. 10 are diagrams showing an example of a calculation result of a first-type detection error (false positive) by simulation.
  • The mixture distribution depends on the parameters set under the six simulation conditions above: the total number of genes, the proportion of genes whose expression fluctuates, the standard deviation (width) of the gene distribution, the center of the distribution of genes whose expression fluctuates, the detection criterion (number of detections / total number), and the confidence limit (P value) in each data set (window).
  • FIG. 8 is a diagram showing the case where the center of the gene group whose expression changes (variable gene group) is set to a given ± value, the standard deviation is set to 1, and the expression is assumed to have changed when a gene is detected under the 3-out-of-4 detection criterion.
  • α denotes the type 1 detection error.
  • The horizontal axis in the figure represents the proportion of all genes accounted for by the gene group whose expression fluctuated.
  • The mixture distribution has a total of six parameters (the number of all genes, the percentage of genes whose expression changes, the standard deviation and center of the distribution of genes whose expression changes, the detection criterion, and the confidence limit within each data set).
  • FIGS. 9 and 10 are diagrams showing the case where the standard deviation of the gene group whose expression changes (variable gene group) is set to 1.
  • P c 0.25.
  • the horizontal axis of the figure represents the value obtained by integrating the center of the gene group whose expression changes and the standard deviation of the gene group whose expression does not change.
  • "TNum" is the total number of genes, "di di—x%" is the percentage of the gene population whose expression is changed, and "2/3" and "3/4" are the detection criteria. This ends the simulation processing.
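The estimation of the type 1 and type 2 detection errors from a simulated mixture distribution can be sketched in Python as follows. This is a simplified illustration under explicit assumptions: the function name `simulate_errors` is hypothetical, detection in each repeated experiment uses a fixed threshold on the absolute fold change instead of the moving-window confidence curve, variable genes are split evenly between +center and -center, and the error rates are expressed as fractions of the non-variable and variable gene groups respectively. None of these choices is taken from the original description.

```python
import random


def simulate_errors(total, frac_variable, sd, center,
                    threshold, min_hits, repeats, seed=0):
    """Estimate false positive and false negative rates by simulation.

    A gene counts as detected in one experiment when |fold change| > threshold,
    and as variable overall when detected in at least `min_hits` of `repeats`.
    """
    n_var = int(total * frac_variable)
    n_non = total - n_var
    if n_var <= 0 or n_non <= 0:
        raise ValueError("both gene groups must be non-empty")
    rng = random.Random(seed)
    # True centers: 0 for non-variable genes, +/-center for variable genes.
    truth = [0.0] * n_non + [center if i % 2 == 0 else -center
                             for i in range(n_var)]
    hits = [0] * total
    for _ in range(repeats):
        for i, mu in enumerate(truth):
            if abs(rng.gauss(mu, sd)) > threshold:
                hits[i] += 1
    detected = [h >= min_hits for h in hits]
    false_positives = sum(detected[:n_non])
    false_negatives = n_var - sum(detected[n_non:])
    return false_positives / n_non, false_negatives / n_var


# Illustrative run: 10% variable genes, 3-out-of-4 detection criterion.
alpha, beta = simulate_errors(total=1000, frac_variable=0.1, sd=1.0,
                              center=3.0, threshold=1.96,
                              min_hits=3, repeats=4)
```

Repeating the call while varying `min_hits`, `repeats`, or `center` gives a table of error rates per condition, which is the kind of result the description proposes to use as an experimental guideline.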
  • All or some of the processes described as being performed automatically may be performed manually, and all or some of the processes described as being performed manually may be performed automatically by a known method.
  • The processing procedures, control procedures, specific names, information including parameters such as simulation conditions, and screen examples shown in the above description and drawings may be changed arbitrarily unless otherwise specified.
  • The simulation unit 102d may simulate a mixture distribution with another distribution, such as a gamma distribution, to obtain the above-described reliability (P value), the type 1 and type 2 detection errors, and the like.
  • The distribution of the unchanging genes and the distribution of the fluctuating genes are normal distributions.
  • The distribution of the fluctuating genes may instead be generated as a distribution other than the normal distribution (e.g., a gamma distribution), so the present invention can also be applied to gene groups that follow any distribution.
  • In the bias determination process performed by the bias determining unit 102g of the apparatus described above, the magnitude of the bias after axis rotation may be determined. In this process, the rotation of the coordinates is not a necessary condition.
  • For the processing functions performed by the control unit 102, all or any part may be implemented by a CPU (Central Processing Unit) and a program interpreted and executed by the CPU, or may be implemented as hardware using wired logic.
  • the program is recorded on a recording medium described later, and is mechanically read by the gene expression information analyzer 100 as necessary.
  • The gene expression information analyzer 100 may be realized by connecting peripheral devices such as a printer, a monitor, and an image scanner to an information processing device such as a known personal computer or workstation, and installing on that device software (including programs, data, and the like) that realizes the method of the present invention.
  • Each database may be configured independently as a separate database device, and part of the processing may be realized using CGI (Common Gateway Interface).
  • the program according to the present invention can be stored in a computer-readable recording medium.
  • this “recording medium” refers to any “portable physical medium” such as a flexible disk, magneto-optical disk, ROM, EPROM, EEPROM, CD-ROM, MO, DVD, etc., and various computer systems.
  • a “program” is a data processing method described in an arbitrary language or description method, regardless of the format of source code or binary code.
  • A program is not necessarily limited to a single unit. It includes programs distributed as multiple modules or libraries, and programs that achieve their functions in cooperation with a separate program typified by an operating system (OS). A known configuration or procedure can be used for the specific configuration for reading a recording medium in each device described in the embodiment, for the reading procedure, and for the installation procedure after reading.
  • The case where the gene expression information analyzing apparatus 100 performs the processing in stand-alone form has been described as an example. However, processing may instead be performed in response to a request transmitted via the network from a client terminal housed separately from the gene expression information analyzing apparatus 100, and the processing result may be returned to that client terminal.
  • the network has a function of interconnecting the gene expression information analyzer 100 and an external client device.
  • The network may include any of the Internet, an intranet, a LAN (both wired and wireless), a VAN, a PC communication network, a public telephone network (both analog and digital), a leased line network (both analog and digital), a CATV network, a mobile circuit-switched or mobile packet-switched network such as the IMT2000, GSM, or PDC/PDC-P system, a paging network, a local wireless network such as Bluetooth, a PHS network, and a satellite communication network such as CS, BS, or ISDB.
  • the present device can transmit and receive various data via any network, whether wired or wireless.
  • According to the present invention, background-corrected luminance data are created by removing the background value from the measured luminance data of each spot, obtained by measuring the fluorescence intensity indicating the expression level of the same gene under two conditions using a DNA microarray, a DNA chip, or the like. A gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium capable of this can therefore be provided.
  • The logarithm (natural logarithm, base-2 logarithm, or the like) of the background-corrected luminance data is plotted on the X and Y axes to create a fluorescence intensity scatter diagram (scatter plot). For the spot of each gene, the bias with respect to the fluorescence intensity equilibrium axis, which indicates equal fluorescence intensity, is obtained and removed from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system. The fluorescent component containing more bias is thereby determined, and after this bias is removed, a new orthogonal axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis can be constructed. A gene expression information analyzer, a gene expression information analysis method, a program, and a recording medium capable of this can be provided.
  • A fluctuating gene, whose expression level has changed, is detected on the basis of the constructed fluorescence intensity scatter plot in the new X-Y axis system. A gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium can therefore be provided that accurately detect genes whose expression level has changed without being affected by errors between samples, differences in fluorescent labeling efficiency, and the like.
  • Control gene samples for quality control consisting of a DNA concentration dilution series (for example, external-gene λDNA samples, or samples of housekeeping genes, such as ribosomal genes, whose expression level hardly changes) are measured simultaneously with the target gene samples.
  • The control genes are removed one at a time, in order from the gene with the smallest product of the fluorescence intensity data. At each step, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples, and the correlation coefficient of the data is calculated.
  • The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies the criterion for a strong correlation (for example, 0.8 or more) is defined as threshold 1, and the population of all gene samples whose product of the fluorescence intensity data under the two conditions exceeds threshold 1 is defined as the gene group with a high expression level.
  • Similarly, the product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies the criterion for a weak correlation (for example, 0.5 or more) is defined as threshold 2 (threshold 2 < threshold 1), and the population of all gene samples whose product falls below threshold 2 is defined as the gene group with a low expression level.
  • Principal component analysis is performed using the logarithmic values of the fluorescence intensity of the gene group with a high expression level, and the slope and intercept of the asymptote that is the first principal component are calculated.
  • The angle between this asymptote and the X axis is defined as θ, and the coordinates obtained by rotating the X-Y coordinates of the gene group with a low expression level clockwise by the angle θ are calculated.
  • Using the rotated coordinates of the gene group with a low expression level, the slope of the fluorescence intensity equilibrium axis is calculated, and on the basis of the calculated slope (e.g., positive, negative, or zero) it is determined which of the luminance data for the two conditions contains more bias.
  • The bias is subtracted from the luminance data of the condition determined to contain more bias, constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis. A gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium can therefore be provided that efficiently remove bias from the measured values and create a fluorescence intensity scatter plot that clearly expresses the nature of the data.
  • The principal component analysis is performed using a variance-covariance matrix. Unlike the principal component analysis using a correlation matrix that has conventionally been used for the analysis of expressed genes, it requires no normalization, so a gene expression information analyzer, a gene expression information analysis method, a program, and a recording medium that perform principal component analysis efficiently can be provided.
  • a window within a predetermined section is set, and within each set window, the average value, standard deviation, and P value (eg, 9
  • The window is moved by a fixed number of genes along the fluorescence intensity equilibrium axis, a confidence limit point is determined in each moved window, and a confidence boundary is created from the resulting plurality of confidence limit points.
  • A gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium can therefore be provided that remain applicable even when experimental data have different error ranges.
  • The confidence limit is determined using the t-distribution on the basis of a test statistic table for replicated data obtained by simulation, so that a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium can be provided. In addition, according to the present invention, the confidence boundary is created by smoothing, that is, by fitting a spline curve through the plurality of confidence limit points, so a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that efficiently interpolate the confidence curve can be provided.
  • A confidence limit line is created using a horizontal extension, parallel to the X axis, from the confidence limit point obtained in the last (rightmost) window. Therefore, even if the slope is so small that it cannot be determined where the line converges, a gene expression information analyzing apparatus, a gene expression information analysis method, a program, and a recording medium capable of creating an appropriate confidence limit line can be provided. Further, according to the present invention, for the region of low fluorescence intensity, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points of, for example, the first several tens of windows is used as the confidence limit line. A gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that accurately detect even spots of genes with low fluorescence intensity can therefore be provided.
  • The user is prompted to input the number of genes in a window, and each window is set over a section containing the input number of genes. A gene expression information analyzer, a gene expression information analysis method, a program, and a recording medium can therefore be provided in which the number of genes per window can be varied by the user for each experiment.
  • The confidence limit point in each window is determined on the basis of a confidence limit value input by the user, so the confidence limit value can be set by the user for each experiment.
  • Simulation conditions are input that include information on at least one of the following: the distribution form of the above-mentioned fluctuating genes (for example, the width of the center ⁇ set in the range of 0.4 to 3); the detection criterion (for example, the ratio of the number of detections to the total number set to 2/3, 2/4, 3/4, 3/6, 4/6, or the like); and the number of repetitions of the experiment and the number of simulations (for example, set in the range of 3 to 10 times).
  • Under the set simulation conditions, the simulation that detects expressed genes is executed multiple times for the same gene group, and the false positive rate and false negative rate of the results obtained by the above detection means are calculated. The relationship among the number of repetitions of the experiment, the simulation conditions, the detection sensitivity, and the detection reliability is then calculated, a test statistic table for genes whose expression level changes is created, and the simulation result for each simulation condition is output.
  • By combining the simulation results under various conditions, the detection power and detection reliability of each combination can be known. In other words, by repeating a control experiment under the same conditions, detecting fluctuating genes in each of the resulting data sets, and selecting only genes that are detected at least a predetermined number of times, a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium can be provided that detect fluctuating genes with the expected reliability or power.
  • The error in which a gene whose expression level does not change is detected as a fluctuating gene (the first type of error) and the error in which a fluctuating gene is detected as a gene whose expression does not change (the second type of error) can be grasped from the simulation data, and the expected power can be compared with actual experimental data. A gene expression information analyzer, a gene expression information analysis method, a program, and a recording medium can therefore be provided in which a combination of the number of repetitions of the experiment, the detection criterion for fluctuating genes, and the confidence limit point can be set so as to obtain the desired reliability and power.
  • the present invention can provide a gene expression information analyzer, a gene expression information analysis method, a program, and a recording medium that can significantly improve the experimental efficiency.
  • The deviation value of each spot is calculated and used in place of the variation ratio (magnification), so a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium can be provided that enable analysis unaffected by differences in error between slides.
  • the deviation value calculated by the present apparatus can be used in place of the logarithm of the variation ratio or the normalized variation ratio in a multivariate analysis represented by a cluster analysis. It is possible to provide a gene expression information analysis device, a gene expression information analysis method, a program, and a recording medium that enable analysis that is not affected by differences in the effects of errors depending on the magnitude.
  • The gene expression information analysis apparatus, gene expression information analysis method, program, and recording medium according to the present invention are very useful in the field of bioinformatics, which analyzes measured value data such as those from DNA microarrays and DNA chips.
  • the present invention can be widely implemented in many industrial fields, particularly in the fields of pharmaceuticals, foods, cosmetics, medical treatment, gene expression analysis, and the like, and is extremely useful.
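
The simulation described in the list above (a mixture of unchanged and fluctuating genes, repeated experiments, and a detection criterion such as 2/4 or 3/4) can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration rather than the procedure of the source: the function names, the choice of N(0, 1) for unchanged genes, independent draws in every repetition, and a fixed symmetric limit of 1.96 standing in for the confidence limit of each data set.

```python
import random

def simulate_counts(n_genes, frac_changed, center, sd, limit, n_repeats, rng):
    """Return one (is_changed, detections) pair per simulated gene.

    Assumption: unchanged genes are drawn from N(0, 1) and fluctuating
    genes from N(center, sd), independently in every repetition.
    """
    n_changed = round(n_genes * frac_changed)
    results = []
    for g in range(n_genes):
        changed = g < n_changed
        hits = 0
        for _ in range(n_repeats):
            value = rng.gauss(center, sd) if changed else rng.gauss(0.0, 1.0)
            if abs(value) > limit:  # outside the confidence limit
                hits += 1
        results.append((changed, hits))
    return results

def error_rates(results, k):
    """Type I and type II error rates when k or more detections are required."""
    unchanged = [hits for changed, hits in results if not changed]
    changed = [hits for changed, hits in results if changed]
    type1 = sum(h >= k for h in unchanged) / len(unchanged)  # false positives
    type2 = sum(h < k for h in changed) / len(changed)       # false negatives
    return type1, type2

rng = random.Random(0)  # fixed seed so the run is repeatable
results = simulate_counts(2000, 0.1, 3.0, 1.0, 1.96, 4, rng)
rates_2_of_4 = error_rates(results, 2)
rates_3_of_4 = error_rates(results, 3)
print("2/4 criterion:", rates_2_of_4)
print("3/4 criterion:", rates_3_of_4)
```

Because both criteria are evaluated on the same simulated detections, the stricter 3/4 criterion can never produce more type I errors, nor fewer type II errors, than the 2/4 criterion. That trade-off is what a table of simulation results is meant to expose.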

Landscapes

  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Materials By The Use Of Chemical Reactions (AREA)

Abstract

Gene expression levels of a comparative group and a control group, expressed as fluorometric data measured in experiments using DNA microarrays and DNA chips, are corrected on the basis of a novel mathematical model. From the scatter plots thus corrected, a novel X-Y axis system is constructed whose X axis is proportional to the fluorescence intensity of the genes. Next, windows each containing a fixed number of genes are set along the X axis, and the confidence limit at an arbitrary risk level is determined in each window in accordance with Student’s t-distribution. The windows are then shifted by a fixed number of genes in the X-axis direction, and a confidence limit is determined at each position. The confidence limits thus determined are interpolated by smoothing (a spline curve) to give a confidence curve of expression variation. Finally, genes located outside this confidence curve are extracted as variable genes.
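
The windowing step in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the critical value `t_crit` is supplied by the caller (for example from a t-table), and piecewise-linear interpolation held flat outside the window centers stands in for the spline smoothing named in the abstract.

```python
import statistics

def window_limits(points, window, step, t_crit):
    """Confidence limit points (x_center, lower, upper) for sliding windows.

    points are (x, y) pairs; windows hold `window` genes and advance by
    `step` genes along the X axis.
    """
    pts = sorted(points)
    limits = []
    for start in range(0, len(pts) - window + 1, step):
        chunk = pts[start:start + window]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        mean_y = statistics.mean(ys)
        half_width = t_crit * statistics.stdev(ys)
        limits.append((statistics.mean(xs), mean_y - half_width, mean_y + half_width))
    return limits

def interpolate(limits, x):
    """Linear stand-in for the spline: (lower, upper) at x, flat beyond the ends."""
    if x <= limits[0][0]:
        return limits[0][1], limits[0][2]
    if x >= limits[-1][0]:
        return limits[-1][1], limits[-1][2]
    for (x0, lo0, hi0), (x1, lo1, hi1) in zip(limits, limits[1:]):
        if x0 <= x <= x1:
            w = (x - x0) / (x1 - x0)
            return lo0 + w * (lo1 - lo0), hi0 + w * (hi1 - hi0)
    return limits[-1][1], limits[-1][2]

def variable_genes(points, limits):
    """Genes whose Y value lies outside the interpolated confidence band."""
    flagged = []
    for x, y in points:
        lower, upper = interpolate(limits, x)
        if y < lower or y > upper:
            flagged.append((x, y))
    return flagged

# Toy data: 40 genes with small alternating noise and one shifted gene.
genes = [(float(i), 0.1 if i % 2 == 0 else -0.1) for i in range(40)]
genes[17] = (17.0, 3.0)
# 2.262 is the two-sided 95% t critical value for 9 degrees of freedom.
limits = window_limits(genes, window=10, step=5, t_crit=2.262)
flagged = variable_genes(genes, limits)
print(flagged)
```

Letting the windows overlap (step smaller than window size) gives neighboring confidence limit points that change gradually, which makes any later smoothing step easier.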

Description

Gene expression information analysis apparatus, gene expression information analysis method, program, and recording medium

Technical Field

The present invention relates to a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium, and in particular to ones that perform background correction of measured value data from DNA microarrays, DNA chips, and the like, and that can extract genes whose expression level has changed with statistically high reliability.

Background Art

In molecular biology research, research and development of new drugs, clinical diagnosis, and the like, it is very important to search for genes whose messenger RNA expression level has changed and to identify those genes. As methods for examining expression changes at the RNA level, two technologies are currently attracting attention and being used: DNA microarrays, in which cDNA fragments reverse-transcribed from messenger RNA with reverse transcriptase are immobilized at high density on a glass slide, and the DNA Chip (trade name) of Affymetrix (company name), in which many kinds of oligonucleotides are synthesized on a substrate using microfabrication technology.

Expressed-gene analysis methods using these DNA microarrays and DNA chips are effective for comprehensively identifying, in a single experiment, genes whose expression level has changed among several hundred to several tens of thousands of genes. At present, the correction of measured values generally comprises two main steps, called the background correction step and the normalization step.

In the background correction step, the method mainly used is simply to subtract, from the measured fluorescence intensity of each spot, either the average background value of blank spots or the background value of the region surrounding each spot.

The normalization step, on the other hand, corrects the measured values of all genes with a coefficient that transforms a nonparametric regression line, obtained by the least squares method, Lowess smoothing (a local quadratic estimator using a bandwidth corresponding to the neighborhood), or the like, into the Y = X line of the fluorescence intensity scatter plot.

However, expressed-gene analysis methods using DNA microarrays and DNA chips have the problem that no highly reliable method of analyzing the measured values has been established. This problem is described specifically below.

First, the conventional correction methods have the problem that they are easily affected by differences in the measuring device, errors between samples, fluorescent labeling efficiency, and the like. In the normalization step, the least squares method, strictly speaking, allows two regression lines to be drawn, while Lowess smoothing (e.g., Dudoit S, Yang YH, Callow MJ, Speed TP (2000) Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Technical report, Department of Statistics, UC-Berkeley. http://www.stat.berkeley.edu/users/terry/zarray/Html/papersindex.html) is a normalization procedure based on rules of thumb and lacks a theoretical basis.

Furthermore, in conventional methods for extracting genes whose expression level has changed, genes showing a corrected fluorescence intensity ratio at or above an arbitrary fold value are extracted as differentially expressed, and that fold value has been set to 2-fold, 3-fold, or the like without any basis (e.g., Chen Y, Dougherty ER, Bittner ML (1997) Ratio-based decisions and the quantitative analysis of cDNA microarray images. J Biomed Opt 2: 364-374; Susan G. Hilsenbeck et al. (1999) Statistical analysis of array expression data as applied to the problem of tamoxifen resistance. Journal of the National Cancer Institute, Vol. 91, No. 5).

Several methods have also been developed that assume an error model or a probability distribution of gene expression and detect genes by optimization (e.g., Chen Y, Dougherty ER, Bittner ML (1997) Ratio-based decisions and the quantitative analysis of cDNA microarray images. J Biomed Opt 2: 364-374; Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW (2001) On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data. J Comp Biol 8: 37-52). However, these methods are poor in stability and reproducibility and have not necessarily reached a practical level. In addition, no statistical table exists to guide how many times an experiment should be repeated to obtain the desired detection reliability, so the relationship among the number of repetitions of the experiment, the detection sensitivity, and the detection reliability has not been clarified.

Accordingly, an object of the present invention is to provide, for expressed-gene analysis methods using DNA microarrays and DNA chips, a general formula for reliably comparing gene expression levels, and a robust, highly reliable method for extracting genes with changed expression that is matched to the actual data distribution.

Disclosure of the Invention

A gene expression information analysis apparatus according to the present invention comprises: background correction means for creating background-corrected luminance data by removing a background value from the measured luminance data of each spot, obtained by measuring the fluorescence intensity indicating the expression level of the same gene under two conditions; bias correction means for creating a fluorescence intensity scatter plot with the logarithms of the luminance data corrected by the background correction means on the X and Y axes, obtaining for the spot of each gene the bias with respect to a fluorescence intensity equilibrium axis, and removing that bias from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis; and gene detection means for detecting fluctuating genes, whose expression level has changed, on the basis of the fluorescence intensity scatter plot in the new X-Y axis system constructed by the bias correction means.

According to this apparatus, background-corrected luminance data are created by removing a background value from the measured luminance data of each spot, obtained by measuring the fluorescence intensity indicating the expression level of the same gene under two conditions with a DNA microarray, a DNA chip, or the like. Here, the average of the measured fluorescence intensities of blank spots may be used as the background value to be subtracted from the measured fluorescence intensity of each spot, or the average of the measured blank fluorescence intensities in the region surrounding each spot may be used. Background correction may also be performed by any other method.
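
The background correction just described, followed by the logarithmic transform used for the scatter plot, can be sketched in Python. The function names and sample intensities are hypothetical, a single background value is assumed for both conditions, and dropping spots that are not above background is an assumption added so that the logarithm is defined; the source does not say how such spots are handled.

```python
import math

def mean_blank(blank_intensities):
    """Average fluorescence of blank spots, used as the background value."""
    return sum(blank_intensities) / len(blank_intensities)

def background_correct(spots, background):
    """Subtract the background from each (condition 1, condition 2) pair.

    Assumption: spots that are not above background in both conditions
    are dropped, since their logarithm would be undefined.
    """
    corrected = []
    for c1, c2 in spots:
        b1, b2 = c1 - background, c2 - background
        if b1 > 0 and b2 > 0:
            corrected.append((b1, b2))
    return corrected

def log2_pairs(pairs):
    """Base-2 logarithm of each corrected intensity pair (X, Y coordinates)."""
    return [(math.log2(a), math.log2(b)) for a, b in pairs]

blank_spots = [98.0, 102.0, 100.0]
spots = [(1124.0, 612.0), (356.0, 356.0), (90.0, 400.0)]

background = mean_blank(blank_spots)
corrected = background_correct(spots, background)
logged = log2_pairs(corrected)
print(background, corrected, logged)
```

The pairs in `logged` are the coordinates that would be plotted on the X and Y axes of the fluorescence intensity scatter plot.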

Also according to this apparatus, a fluorescence intensity scatter plot is created with the logarithms (natural logarithm, base-2 logarithm, or the like) of the background-corrected luminance data on the X and Y axes. For the spot of each gene, the bias with respect to the fluorescence intensity equilibrium axis, which indicates equal fluorescence intensity (that is, the asymptote obtained from the population of genes whose expression level is equal under the two conditions), is obtained, and that bias is removed from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis. This makes it possible to determine which fluorescent component contains more bias and, with that bias removed, to construct a new orthogonal axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis.

Also according to this apparatus, fluctuating genes, whose expression level has changed, are detected on the basis of the constructed fluorescence intensity scatter plot in the new X-Y axis system. Compared with conventional gene detection methods, genes whose expression level has changed can therefore be detected accurately without being affected by differences in the measuring device, errors between samples, fluorescent labeling efficiency, and the like.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, wherein the bias correction means further comprises: first principal component creation means for performing principal component analysis using the logarithmic values of the gene population with a high expression level and obtaining the slope and intercept of the asymptote that is the first principal component; coordinate rotation means for taking the angle between the asymptote obtained by the first principal component creation means and the X axis as θ and calculating the coordinates obtained by rotating the X-Y coordinates of the gene population with a low expression level clockwise by the angle θ; bias determination means for calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the low-expression gene population after rotation by the coordinate rotation means, and determining, on the basis of the calculated slope, which of the luminance data for the two conditions contains more of the bias; and corrected plot generation means for constructing a fluorescence intensity scatter plot in a new X-Y axis system, whose two axes are the fluorescence intensity equilibrium axis and the expression level magnification axis, by subtracting the bias from the luminance data of the condition determined by the bias determination means to contain more of the bias.

This shows an example of the bias correction means more specifically. With this apparatus, control gene samples for quality control in a DNA concentration dilution series (for example, external-gene λDNA samples, or housekeeping gene samples such as ribosomal genes whose expression level hardly changes) are measured together with the target gene samples. Control genes are removed one at a time, starting from the gene with the smallest product of fluorescence intensity data; each time, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion for strong correlation (for example, 0.8 or more) is taken as threshold 1, and the population of all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the high-expression gene population. Likewise, the product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion for weak correlation (for example, 0.5 or more) is taken as threshold 2 (where threshold 2 < threshold 1), and the population of all gene samples whose product of fluorescence intensity data under the two conditions is below threshold 2 is taken as the low-expression gene population. Principal component analysis is then performed on the logarithmic fluorescence intensities of the high-expression gene population to obtain the slope and intercept of the asymptote that forms the first principal component; the angle between that asymptote and the X axis is taken as θ; the coordinates of the low-expression gene population in the X-Y axis system are rotated to the right (clockwise) by the angle θ; the slope of the fluorescence intensity equilibrium axis is calculated from the rotated coordinates of the low-expression gene population; on the basis of the calculated slope (for example, positive, negative, or zero) it is determined which of the luminance data for the two conditions contains more bias; and the bias is subtracted from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates of a gene population that has a constant bias). A new X-Y fluorescence intensity scatter plot whose two axes are the fluorescence intensity equilibrium axis and the expression-level fold-change axis is thereby constructed, so that bias in the measured values is removed efficiently and a fluorescence intensity scatter plot that clearly expresses the nature of the data can be created.
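By way of illustration only, the first-principal-component fit and the subsequent coordinate rotation may be sketched as follows. The function names `first_principal_axis` and `rotate_clockwise` are assumptions introduced here and do not appear in the specification; the sketch further assumes two-dimensional log-intensity data, takes the principal axis from the variance-covariance matrix, and omits the threshold selection and the bias determination themselves.

```python
import math


def first_principal_axis(xs, ys):
    """Return (slope, intercept, theta) of the first principal component
    of 2-D points, computed from the variance-covariance matrix.

    Hypothetical helper: theta is the angle between the principal axis
    and the X axis, in radians. A vertical axis (theta = pi/2) yields a
    very large slope and is not handled specially here.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Orientation of the major axis of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(theta)
    intercept = my - slope * mx  # the principal axis passes through the mean
    return slope, intercept, theta


def rotate_clockwise(x, y, theta):
    """Rotate the point (x, y) clockwise about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return x * c + y * s, -x * s + y * c


# Usage: fit the axis on a (hypothetical) high-expression population,
# then rotate the low-expression points by the same angle.
high_x = [6.0, 7.0, 8.0, 9.0]
high_y = [6.5, 7.5, 8.5, 9.5]
slope, intercept, theta = first_principal_axis(high_x, high_y)
low_rotated = [rotate_clockwise(x, y, theta) for x, y in [(1.0, 1.2), (2.0, 2.1)]]
```

After the rotation, points lying along a line parallel to the fitted asymptote have a nearly constant second coordinate, so a residual slope in the rotated low-expression points is easy to inspect.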

Note that this apparatus is not limited to determining the magnitude of the bias after the axis rotation; for example, the magnitude of the bias may instead be determined before the axis rotation by comparing the slopes of the high-expression asymptote and the low-expression asymptote.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, wherein the principal component analysis is performed using a variance-covariance matrix.

This shows an example of the principal component analysis more specifically. With this apparatus, the principal component analysis is performed using a variance-covariance matrix, so that, unlike the correlation-matrix-based principal component analysis conventionally used in expressed-gene analysis, no normalization is required and the principal component analysis can be performed efficiently.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, wherein the gene detection means further comprises: window setting means for setting a window within a predetermined interval along the fluorescence intensity equilibrium axis; confidence limit point determination means for determining a confidence limit point within each window set by the window setting means; window moving means for moving the window along the fluorescence intensity equilibrium axis by a fixed number of genes at a time; confidence boundary line creation means for obtaining, by the confidence limit point determination means, a confidence limit point for each window moved by the window moving means, and creating a confidence boundary line on the basis of the plurality of confidence limit points obtained; and fluctuating gene extraction means for extracting genes located outside the confidence boundary line created by the confidence boundary line creation means as fluctuating genes whose expression level has changed.

This shows an example of the gene detection means more specifically. With this apparatus, a window is set within a predetermined interval, and within each window a confidence limit point is determined using at least one of the mean, the standard deviation, the P value (for example, p = 0.05), the centroid, and the like of the luminance data of the genes. The window is then moved along the fluorescence intensity equilibrium axis by a fixed number of genes at a time, a confidence limit point is obtained for each moved window, a confidence boundary line is created on the basis of the plurality of confidence limit points obtained, and genes located outside that confidence boundary line are extracted as fluctuating genes whose expression level has changed, so that expressed genes can be extracted with high stability, reproducibility, and reliability.
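As a non-limiting sketch, the moving-window determination of confidence limit points might look like the following. The function name `confidence_limit_points`, the use of the window mean plus or minus `k` sample standard deviations, and the default `k = 1.96` (a normal approximation for p = 0.05) are assumptions made for illustration; the specification allows other statistics such as the centroid or a t-distribution-based limit.

```python
import statistics


def confidence_limit_points(points, window_size, step, k=1.96):
    """Slide a window of `window_size` genes along the equilibrium axis.

    points: iterable of (x, y), where x is the coordinate on the
            fluorescence intensity equilibrium axis and y the coordinate
            on the fold-change axis (hypothetical layout).
    step:   number of genes by which the window advances each time.
    Returns a list of (window_center_x, lower_limit, upper_limit).
    """
    pts = sorted(points)  # order genes along the equilibrium axis
    limits = []
    for start in range(0, len(pts) - window_size + 1, step):
        window = pts[start:start + window_size]
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        center = statistics.fmean(xs)
        mean_y = statistics.fmean(ys)
        sd_y = statistics.stdev(ys)  # sample standard deviation
        limits.append((center, mean_y - k * sd_y, mean_y + k * sd_y))
    return limits
```

Because the limits are computed per window, a region with larger scatter automatically receives wider limits than a region with smaller scatter.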

This also allows the threshold on the expression-level fold change to be set according to the error, even for experimental data with different error ranges.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, wherein the confidence limit point determination means determines the confidence limit point using the t-distribution, on the basis of a test statistic table for replicate data obtained by simulation.

This shows an example of the determination of the confidence limit point more specifically. With this apparatus, the confidence limit point is determined using the t-distribution on the basis of a test statistic table for replicate data obtained by simulation, so that the confidence limit point can be obtained more accurately and efficiently than with conventional techniques. The test table for the replicate data also gives the number of replicate experiments required at the experimental design stage.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, wherein the confidence boundary line creation means creates the confidence boundary line by smoothing, namely by fitting a spline curve to the plurality of confidence limit points.

This shows an example of the creation of the confidence boundary line more specifically. With this apparatus, the confidence boundary line is created by smoothing, namely by fitting a spline curve to the plurality of confidence limit points, so that the confidence limit points can be interpolated efficiently to create a confidence curve.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, wherein, for the region of high fluorescence intensity, the confidence boundary line creation means creates the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.

This shows an example of the creation of the confidence boundary line more specifically. With this apparatus, for the region of high fluorescence intensity, the confidence limit line is created using an extension, parallel to the X axis, of the confidence limit point obtained in the last window (the rightmost window), so that an appropriate confidence limit line can be created even when the slope is small and it cannot be determined toward which side the line converges.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, wherein, for the region of low fluorescence intensity, the confidence boundary line creation means uses as the confidence limit line the extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in the individual windows.

This shows an example of the creation of the confidence boundary line more specifically. With this apparatus, for the region of low fluorescence intensity, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in, for example, the first few tens of windows is used as the confidence limit line, so that spots of genes with low fluorescence intensity can also be detected accurately.
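The treatment of the two ends of the confidence limit line may be sketched, under stated assumptions, as follows. The names `least_squares_line` and `boundary_at` and the parameter `n_low` are hypothetical. Linear interpolation between limit points is used here purely as a simple stand-in for the spline smoothing, and the points are assumed to be sorted with strictly increasing x.

```python
def least_squares_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def boundary_at(x, limit_points, n_low=3):
    """Evaluate a confidence limit line at position x.

    limit_points: list of (x, y) confidence limit points, sorted by x.
    n_low:        number of leading windows used for the low-intensity
                  least-squares extrapolation (an assumed parameter).
    """
    xs = [p[0] for p in limit_points]
    ys = [p[1] for p in limit_points]
    if x >= xs[-1]:
        # High-intensity side: horizontal extension of the last point.
        return ys[-1]
    if x < xs[0]:
        # Low-intensity side: extrapolate the least-squares line fitted
        # to the first n_low limit points.
        slope, intercept = least_squares_line(xs[:n_low], ys[:n_low])
        return slope * x + intercept
    # Interior: linear interpolation (stand-in for the spline curve).
    for (x0, y0), (x1, y1) in zip(limit_points, limit_points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)


# Usage with hypothetical upper limit points: a gene at (x, y) would be
# regarded as lying outside the upper boundary when y > boundary_at(x, upper).
upper = [(1.0, 4.0), (2.0, 3.0), (3.0, 2.0), (4.0, 1.5), (5.0, 1.2)]
```

In this simplified form the extrapolated line need not meet the first limit point exactly, so a small jump at the left end is possible; a spline-based implementation would normally be arranged to avoid it.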

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, further comprising gene number input means for allowing a user to input the number of genes in a window, wherein the window setting means sets the window within the interval that contains the number of genes input through the gene number input means.

This shows an example of the window setting more specifically. With this apparatus, the user inputs the number of genes in a window and the window is set within the interval that contains the input number of genes, so that the user can vary the number of genes from experiment to experiment.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, further comprising confidence limit value input means for allowing a user to input a confidence limit value, wherein the confidence limit point determination means determines the confidence limit point within the window on the basis of the confidence limit value input through the confidence limit value input means.

This shows an example of the determination of the confidence limit point more specifically. With this apparatus, the user inputs a confidence limit value and the confidence limit point within the window is determined on the basis of the input confidence limit value, so that the user can vary the confidence limit value from experiment to experiment and the error of each experiment can be kept within an appropriate range.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, further comprising: simulation condition setting means for allowing a user to input simulation conditions including information on at least one of the shape of the distribution of the non-fluctuating genes, the shape of the distribution of the fluctuating genes, the detection criterion for the fluctuating genes, the number of experimental replicates, and the number of simulation runs; simulation execution means for, in accordance with the simulation conditions set by the simulation condition setting means, repeatedly generating data for the same gene group from the same distribution, executing the gene detection means, running the simulation for detecting expressed genes a plurality of times, calculating the false positive rate and the false negative rate of the results of the detection means, calculating the relationship among the number of experimental replicates, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistic table for genes whose expression level changes; and simulation result output means for outputting, for each of the simulation conditions, the simulation results of the simulation execution means.

With this apparatus, the user inputs simulation conditions including information on at least one of: the shape of the distribution of the non-fluctuating genes (for example, the standard deviation of the distribution; for example, when the distribution of genes whose expression does not change is taken as the standard normal distribution with standard deviation σ = 1 and center μ = 0, the width of the standard deviation σ is set in the range 0.1 to 1.5); the shape of the distribution of the fluctuating genes (for example, the center; for example, under the same conditions, the width of the center μ is set in the range 0.4 to 3); the detection criterion for the fluctuating genes (for example, the proportion of detections out of the total number, set as 2/3, 2/4, 3/4, 3/6, 4/6, and so on); the number of experimental replicates; and the number of simulation runs (for example, set in the range of 3 to 10). In accordance with the set simulation conditions, data for the same gene group are repeatedly generated from the same distribution, gene detection is executed, the simulation for detecting expressed genes is run a plurality of times, the false positive rate and the false negative rate of the results of the detection means are calculated, the relationship among the number of experimental replicates, the simulation conditions, and the detection sensitivity and detection reliability is calculated, a test statistic table for genes whose expression level changes is created, and the simulation results are output for each simulation condition. By combining the simulation results obtained under various conditions, the detection power and detection reliability of each combination can therefore be known. That is, by repeating a control experiment under the same conditions, detecting fluctuating genes in each of the resulting different data sets, and selecting only the genes detected at least a predetermined number of times, fluctuating genes can be detected with the expected reliability or detection power.
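Purely as an illustrative sketch, a simulation of this kind could be arranged as follows. The function `simulate`, its parameters, the single-replicate cutoff of 1.96σ, and the fraction of fluctuating genes are all assumptions introduced here; the sketch draws non-fluctuating genes from N(0, σ) and fluctuating genes from N(μ, σ), and calls a gene fluctuating when it exceeds the cutoff in at least `k_required` of `n_reps` replicates (for example, 2 of 3).

```python
import random


def simulate(n_genes, n_reps, k_required, mu, sigma=1.0,
             cutoff=1.96, frac_changed=0.1, seed=0):
    """Return (false_positive_rate, false_negative_rate).

    Hypothetical set-up: the first frac_changed * n_genes genes are truly
    fluctuating (center mu); the rest are not (center 0). Requires at
    least one gene in each class.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    n_changed = int(n_genes * frac_changed)
    false_pos = 0
    false_neg = 0
    for g in range(n_genes):
        changed = g < n_changed
        center = mu if changed else 0.0
        # Count the replicates in which this gene exceeds the cutoff.
        hits = sum(
            abs(rng.gauss(center, sigma)) > cutoff * sigma
            for _ in range(n_reps)
        )
        called = hits >= k_required
        if called and not changed:
            false_pos += 1  # type I error
        if changed and not called:
            false_neg += 1  # type II error
    return false_pos / (n_genes - n_changed), false_neg / n_changed


# Usage: compare a lenient and a strict detection criterion on the same data.
fp_1of3, fn_1of3 = simulate(2000, n_reps=3, k_required=1, mu=3.0)
fp_3of3, fn_3of3 = simulate(2000, n_reps=3, k_required=3, mu=3.0)
```

Because the same seed produces the same simulated data, tightening the criterion from 1 of 3 to 3 of 3 can only lower the false positive rate and raise the false negative rate, which is the trade-off such a table of results is meant to expose.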

In addition, by calculating and comparing the errors in which a gene whose expression level does not change is detected as a fluctuating gene (type I errors) and the errors in which a fluctuating gene is detected as a gene whose expression does not change (type II errors), the detection power and reliability with which the above technique detects fluctuating genes can be ascertained from the simulation data. The combination of the number of experimental replicates, the detection criterion for fluctuating genes, and the confidence limit point can then be set so as to obtain the expected detection power and reliability for actual experimental data.

This also makes it possible to predict how many times an experiment must be performed to obtain accurate experimental data, so that experimental efficiency can be improved markedly.

A gene expression information analysis apparatus according to the next invention is the gene expression information analysis apparatus described above, wherein the gene detection means further comprises deviation value calculation means for calculating a deviation value for each spot.

This shows an example of the gene detection more specifically. With this apparatus, a deviation value is calculated for each spot, so that by using the deviation value of each spot in place of the fold change (ratio), an analysis that is not affected by differences in error between slides becomes possible.
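This passage does not give a formula for the deviation value. As a sketch only, and assuming the conventional standard-score definition (50 plus 10 times the number of standard deviations from the mean), the calculation might be written as below; the function name `deviation_values` and the choice of the population standard deviation over a single group of spots are assumptions, and in practice the mean and standard deviation could instead be taken per window.

```python
import statistics


def deviation_values(values):
    """Convert fold-change-axis values to deviation values.

    Assumed definition: 50 + 10 * (value - mean) / standard deviation,
    computed over the supplied group of spots.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [50.0 + 10.0 * (v - mean) / sd for v in values]


# Usage with hypothetical values from one slide.
scores = deviation_values([1.0, 2.0, 3.0])
```

A spot exactly at the group mean receives 50 regardless of how widely the slide's values are spread, which is what makes such scores comparable between slides.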

In addition, the deviation value calculated by this apparatus can be used, in multivariate analyses typified by cluster analysis, in place of the logarithm of the fold change or the normalized fold change, which enables analysis that is not influenced by differences in the effect of error depending on the magnitude of the expression level.

The present invention also relates to a gene expression information analysis method. The gene expression information analysis method according to the present invention comprises: a background correction step of creating background-corrected luminance data by removing a background value from the measured luminance data of each spot, obtained by measuring under two conditions the fluorescence intensity indicating the expression level of the same gene; a bias correction step of plotting the logarithms of the luminance data background-corrected in the background correction step on the X and Y axes to create a fluorescence intensity scatter plot, obtaining for the spot of each gene the bias relative to the fluorescence intensity equilibrium axis, and removing that bias from the luminance data to construct a new X-Y fluorescence intensity scatter plot whose two axes are the fluorescence intensity equilibrium axis and the expression-level fold-change axis; and a gene detection step of detecting fluctuating genes whose expression level has changed on the basis of the new X-Y fluorescence intensity scatter plot constructed in the bias correction step.

With this method, background-corrected luminance data are created by removing a background value from the measured luminance data of each spot, obtained by measuring with a DNA microarray, DNA chip, or the like the fluorescence intensity indicating the expression level of the same gene under two conditions. Here, the mean of the measured fluorescence intensities of the blank spots may be used as the background value for the measured fluorescence intensity of each individual spot, or the mean of the measured blank fluorescence intensities in the area surrounding each spot may be used as the background value. The background correction may also be performed by any other method.
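A minimal sketch of the global-blank variant of the background correction, followed by the logarithmic transformation used for the scatter plot, is given below. The function names, the base-2 logarithm, and the `floor` parameter (which keeps corrected values positive so that the logarithm is defined) are assumptions made for illustration and are not taken from the specification.

```python
import math


def background_correct(spot_values, blank_values, floor=1.0):
    """Subtract the mean blank-spot intensity from every spot intensity.

    floor is a hypothetical lower bound applied after subtraction so that
    the corrected value can still be log-transformed.
    """
    background = sum(blank_values) / len(blank_values)
    return [max(v - background, floor) for v in spot_values]


def log2_pairs(condition_a, condition_b):
    """Pair the log2 intensities of the two conditions as (X, Y) points."""
    return [(math.log2(a), math.log2(b)) for a, b in zip(condition_a, condition_b)]


# Usage with hypothetical raw intensities for two conditions.
blanks = [40.0, 60.0]
cond_a = background_correct([306.0, 114.0, 45.0], blanks)
cond_b = background_correct([178.0, 114.0, 52.0], blanks)
scatter = log2_pairs(cond_a, cond_b)
```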

Furthermore, with this method, the logarithms (natural logarithms, base-2 logarithms, or the like) of the background-corrected luminance data are plotted on the X and Y axes to create a fluorescence intensity scatter plot; for the spot of each gene, the bias relative to the fluorescence intensity equilibrium axis, along which the two fluorescence intensities are equal, is obtained; and that bias is removed from the luminance data to construct a new X-Y fluorescence intensity scatter plot whose two axes are the fluorescence intensity equilibrium axis and the expression-level fold-change axis. The fluorescent component containing more bias can thus be identified and, once this bias has been removed, a new orthogonal axis system can be constructed with the fluorescence intensity equilibrium axis and the expression-level fold-change axis as its two axes.

Furthermore, because this method detects fluctuating genes whose expression level has changed on the basis of the newly constructed X-Y fluorescence intensity scatter plot, it can detect such genes accurately and, unlike conventional gene detection methods, without being affected by differences in measurement method, errors between specimens, or differences in fluorescent labeling efficiency.

A gene expression information analysis method according to the next invention is the gene expression information analysis method described above, wherein the bias correction step further includes: a first principal component creation step of performing principal component analysis on the logarithmic values of a gene population with high expression levels and obtaining the slope and intercept of the asymptote that forms the first principal component; a coordinate rotation step of taking the angle between the asymptote obtained in the first principal component creation step and the X axis as θ, and calculating the coordinates obtained by rotating the coordinates of a gene population with low expression levels in the X-Y axis system to the right (clockwise) by the angle θ; a bias determination step of calculating the slope of the fluorescence intensity equilibrium axis from the coordinates of the low-expression gene population after the rotation in the coordinate rotation step, and determining from the calculated slope which of the luminance data for the two conditions contains more of the bias; and a corrected plot generation step of constructing a new X-Y fluorescence intensity scatter plot, whose two axes are the fluorescence intensity equilibrium axis and the expression-level fold-change axis, by subtracting the bias from the luminance data of the condition determined in the bias determination step to contain more of the bias.

This shows an example of the bias correction step more specifically. With this method, control gene samples for quality control in a DNA concentration dilution series (for example, external-gene λDNA samples, or housekeeping gene samples such as ribosomal genes whose expression level hardly changes) are measured together with the target gene samples. Control genes are removed one at a time, starting from the gene with the smallest product of fluorescence intensity data; each time, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion for strong correlation (for example, 0.8 or more) is taken as threshold 1, and the population of all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the high-expression gene population. Likewise, the product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion for weak correlation (for example, 0.5 or more) is taken as threshold 2 (where threshold 2 < threshold 1), and the population of all gene samples whose product of fluorescence intensity data under the two conditions is below threshold 2 is taken as the low-expression gene population. Principal component analysis is then performed on the logarithmic fluorescence intensities of the high-expression gene population to obtain the slope and intercept of the asymptote that forms the first principal component; the angle between that asymptote and the X axis is taken as θ; the coordinates of the low-expression gene population in the X-Y axis system are rotated to the right (clockwise) by the angle θ; the slope of the fluorescence intensity equilibrium axis is calculated from the rotated coordinates of the low-expression gene population; on the basis of the calculated slope (for example, positive, negative, or zero) it is determined which of the luminance data for the two conditions contains more bias; and the bias is subtracted from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates of a gene population that has a constant bias). A new X-Y fluorescence intensity scatter plot whose two axes are the fluorescence intensity equilibrium axis and the expression-level fold-change axis is thereby constructed, so that bias in the measured values is removed efficiently and a fluorescence intensity scatter plot that clearly expresses the nature of the data can be created.

Note that this method is not limited to determining the magnitude of the bias after the axis rotation; for example, the magnitude of the bias may also be determined before the axis rotation by comparing the slopes of the high-expression asymptote and the low-expression asymptote.
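
The rotation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`first_principal_axis`, `rotate_right`, `least_squares_slope`) are hypothetical, "rotation to the right" is interpreted as a clockwise rotation, and the selection of the high- and low-expression populations by thresholds 1 and 2 is assumed to have been done already.

```python
import math

def first_principal_axis(points):
    """Slope, intercept and angle of the first principal axis of 2-D points,
    derived from the variance-covariance matrix (closed form for 2 x 2)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle to the X axis
    slope = math.tan(theta)
    return slope, my - slope * mx, theta

def rotate_right(points, theta):
    """Rotate points clockwise by theta (assumed meaning of 'to the right')."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]

def least_squares_slope(points):
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    return sxy / sxx

# Hypothetical log-intensity pairs (condition 1, condition 2).
high = [(x, x + 0.5) for x in (8, 9, 10, 11, 12)]   # high-expression population
low = [(4, 5.5), (5, 6.2), (6, 6.9)]                # low-expression population

slope, intercept, theta = first_principal_axis(high)
low_slope = least_squares_slope(rotate_right(low, theta))
# The sign of low_slope is what the text uses to decide which condition
# carries more bias; a slope of zero would indicate no relative bias.
```

In this toy data the rotated low-expression population has a negative slope, which under the scheme above would point to one of the two channels as the more biased one; which channel that is depends on the axis convention chosen.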

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, characterized in that the principal component analysis is performed using a variance-covariance matrix.

This describes one example of the principal component analysis in more detail. In this method, the principal component analysis is performed using a variance-covariance matrix, so that, unlike the correlation-matrix-based principal component analysis conventionally used in gene expression analysis, no normalization is required and the principal component analysis can be performed efficiently.
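
The text gives the absence of a normalization step as the reason for using the variance-covariance matrix. The sketch below illustrates a related, well-known property of the two-variable case that is not stated in the source: with a correlation matrix, the first principal axis of positively correlated data always lies at 45 degrees in the standardized coordinates, whereas the covariance matrix yields the slope in the original log-intensity units. The function names and the sample points are hypothetical.

```python
import math

def covariance_terms(points):
    """Return (sxx, syy, sxy) of 2-D points using the n - 1 denominator."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return sxx, syy, sxy

def slope_from_covariance(points):
    """Slope of the first principal axis using the variance-covariance matrix."""
    sxx, syy, sxy = covariance_terms(points)
    return math.tan(0.5 * math.atan2(2.0 * sxy, sxx - syy))

def slope_from_correlation(points):
    """Slope of the first principal axis using the correlation matrix,
    expressed in standardized (unit-variance) coordinates."""
    sxx, syy, sxy = covariance_terms(points)
    r = sxy / math.sqrt(sxx * syy)
    # The correlation matrix has equal diagonal entries, so the angle
    # depends only on the sign of r.
    return math.tan(0.5 * math.atan2(2.0 * r, 0.0))

pts = [(0.0, 0.0), (2.0, 1.0), (4.0, 2.0), (6.0, 3.0)]  # lies on y = 0.5 x
cov_slope = slope_from_covariance(pts)    # follows the data: about 0.5
corr_slope = slope_from_correlation(pts)  # about 1.0 regardless of the scale
```

To recover the slope in the original units from the correlation-based result, the axes would have to be rescaled by the standard deviations, which is the extra normalization step the covariance-based analysis avoids.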

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, characterized in that the gene detection step further includes: a window setting step of setting a window within a predetermined interval along the fluorescence intensity equilibrium axis; a confidence limit point determination step of determining a confidence limit point within each window set by the window setting step; a window moving step of moving the window along the fluorescence intensity equilibrium axis by a fixed number of genes at a time; a confidence boundary line creation step of obtaining a confidence limit point, by the confidence limit point determination step, for each window moved by the window moving step, and creating a confidence boundary line based on the plurality of confidence limit points thus obtained; and a variable gene extraction step of extracting genes located outside the confidence boundary line created by the confidence boundary line creation step as variable genes whose expression levels have changed.

This describes one example of the gene detection step in more detail. In this method, a window within a predetermined interval is set, and within each window a confidence limit point is determined using at least one of the mean, the standard deviation, a P value (for example, p = 0.05), the centroid, and the like of the luminance data of the genes. The window is then moved along the fluorescence intensity equilibrium axis by a fixed number of genes at a time, a confidence limit point is obtained for each moved window, a confidence boundary line is created based on the plurality of confidence limit points thus obtained, and genes located outside the confidence boundary line are extracted as variable genes whose expression levels have changed. As a result, expressed genes can be extracted with high stability, reproducibility, and reliability.

Furthermore, this allows the threshold for the fold change in expression level to be set according to the error, even for experimental data with differing error ranges.
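
The sliding-window procedure above can be sketched as follows. This is only one possible reading: the text allows the confidence limit point to be based on the mean, standard deviation, P value, or centroid, and the sketch assumes mean ± k standard deviations, with straight-line interpolation between window centers standing in for the boundary line. All function names, the parameter `k`, and the sample data are hypothetical.

```python
import statistics

def confidence_limit_points(points, window_size, step, k=1.96):
    """points: (x, y) pairs in the rotated system, where x runs along the
    fluorescence intensity equilibrium axis and y along the fold-change axis.
    Returns one (x_center, lower, upper) triple per window."""
    pts = sorted(points)
    limits = []
    for start in range(0, len(pts) - window_size + 1, step):
        window = pts[start:start + window_size]
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        mean_y = statistics.mean(ys)
        sd_y = statistics.stdev(ys)
        limits.append((statistics.mean(xs), mean_y - k * sd_y, mean_y + k * sd_y))
    return limits

def boundary_at(limits, x):
    """Lower and upper boundary at x, linearly interpolated between window
    centers and held constant beyond the first and last window."""
    if x <= limits[0][0]:
        return limits[0][1], limits[0][2]
    for (x0, lo0, hi0), (x1, lo1, hi1) in zip(limits, limits[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return lo0 + t * (lo1 - lo0), hi0 + t * (hi1 - hi0)
    return limits[-1][1], limits[-1][2]

def extract_variable_genes(points, limits):
    """Genes lying outside the boundary are reported as variable genes."""
    variable = []
    for x, y in points:
        lower, upper = boundary_at(limits, x)
        if y < lower or y > upper:
            variable.append((x, y))
    return variable

# Hypothetical data: 40 genes with small scatter and one strongly changed gene.
points = [(x, 0.1 if x % 2 == 0 else -0.1) for x in range(40)]
points[20] = (20, 5.0)
limits = confidence_limit_points(points, window_size=10, step=5, k=2.0)
variable = extract_variable_genes(points, limits)
```

Because each window derives its limits from its own scatter, the acceptance band widens where the data are noisy and narrows where they are tight, which is the error-dependent threshold the text describes.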

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, characterized in that the confidence limit point determination step determines the confidence limit point using the t-distribution, based on a test statistic table for replicate data obtained by simulation.

This describes one example of the confidence limit point determination in more detail. In this method, the confidence limit point is determined using the t-distribution, based on a test statistic table for replicate data obtained by simulation, so that the confidence limit point can be obtained more accurately and efficiently than with conventional techniques.

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, characterized in that the confidence boundary line creation step creates the confidence boundary line by smoothing, in which a spline curve is created based on the plurality of confidence limit points.

This describes one example of the confidence boundary line creation in more detail. In this method, the confidence boundary line is created by smoothing, in which a spline curve is created based on the plurality of confidence limit points, so that a confidence curve can be created by efficiently interpolating between the confidence limit points.
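
One way to draw a spline through the confidence limit points is a natural cubic spline, sketched below. The source does not specify the spline type, so the choice of a natural (zero end-curvature) interpolating spline, the function name `natural_cubic_spline`, and the sample points are all assumptions made for illustration.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through the
    points (xs[i], ys[i]); xs must be strictly increasing."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system for the second-order coefficients.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution; c[0] and c[n] stay zero (natural end conditions).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Pick the segment containing x, clamped to the first or last segment.
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + dx * (b[j] + dx * (c[j] + dx * d[j]))

    return evaluate

# Hypothetical upper confidence limit points: (window center, limit value).
centers = [0.0, 1.0, 2.0, 3.0]
upper_limits = [0.5, 0.8, 0.6, 0.7]
upper_boundary = natural_cubic_spline(centers, upper_limits)
```

An interpolating spline passes exactly through every confidence limit point; if the points themselves are noisy, a smoothing spline with a penalty term would be an alternative reading of "smoothing" in the text.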

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, characterized in that, for the region of high fluorescence intensity, the confidence boundary line creation step creates the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.

This describes one example of the confidence boundary line creation in more detail. In this method, for the region of high fluorescence intensity, the confidence limit line is created using a horizontal extension, parallel to the X axis, of the confidence limit point obtained in the last window (the rightmost window), so that an appropriate confidence limit line can be created even when the slope is small and it cannot be determined toward which side the line converges.

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, characterized in that, for the region of low fluorescence intensity, the confidence boundary line creation step uses, as the confidence limit line, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in the respective windows.

This describes one example of the confidence boundary line creation in more detail. In this method, for the region of low fluorescence intensity, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points of, for example, the first several tens of windows is used as the confidence limit line, so that spots of genes with low fluorescence intensity can also be detected accurately.
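
The two end-region rules (a horizontal extension at high intensity and a least-squares extrapolation at low intensity) can be sketched together as follows. The function names, the parameter `n_low_fit` (how many of the first windows feed the fit), and the use of straight-line interpolation in the middle region are assumptions for illustration only.

```python
def fit_line(points):
    """Ordinary least-squares line through (x, y) points: (slope, intercept)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def extended_boundary(limit_points, n_low_fit):
    """Build a boundary function from (x, limit) points that
    - extrapolates a least-squares line fitted to the first n_low_fit
      points into the low-intensity region, and
    - extends the last point horizontally into the high-intensity region."""
    pts = sorted(limit_points)
    slope, intercept = fit_line(pts[:n_low_fit])
    x_first = pts[0][0]
    x_last, y_last = pts[-1]

    def boundary(x):
        if x < x_first:
            return slope * x + intercept      # low-intensity extrapolation
        if x >= x_last:
            return y_last                     # horizontal extension
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= x <= x1:
                return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
        return y_last

    return boundary

# Hypothetical upper confidence limit points from successive windows.
limit_points = [(4, 2.0), (5, 1.5), (6, 1.0), (8, 0.6), (10, 0.5)]
boundary = extended_boundary(limit_points, n_low_fit=3)
```

With these sample points the fitted line is y = 4.0 - 0.5 x, so the boundary keeps widening toward low intensities, while beyond x = 10 it stays at 0.5.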

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, further including a gene number input step of having the user input the number of genes in a window, characterized in that the window setting step sets the window within an interval that contains the number of genes input in the gene number input step.

This describes one example of the window setting in more detail. In this method, the user inputs the number of genes in a window, and the window is set within an interval that contains the input number of genes, so that the number of genes set by the user can be varied from experiment to experiment.

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, further including a confidence limit value input step of having the user input a confidence limit value, characterized in that the confidence limit point determination step determines the confidence limit point within the window based on the confidence limit value input in the confidence limit value input step.

This describes one example of the confidence limit point determination in more detail. In this method, the user inputs a confidence limit value, and the confidence limit point within the window is determined based on the input confidence limit value, so that the confidence limit value set by the user can be varied from experiment to experiment and the error of each experiment can be kept within an appropriate range.

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, characterized by further including: a simulation condition setting step of having the user input simulation conditions that include information on at least one of the shape of the distribution of the non-varying genes, the shape of the distribution of the variable genes, the detection criterion for the variable genes, the number of experimental repetitions, and the number of simulations; a simulation execution step of, in accordance with the simulation conditions set in the simulation condition setting step, repeatedly generating data for the same gene group from the same distribution, executing the gene detection means, executing the simulation for detecting the expressed genes a plurality of times, calculating the false positive rate and the false negative rate of the results of the detection means, calculating the relationship among the number of experimental repetitions, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistic table for genes whose expression levels change; and a simulation result output step of outputting, for each simulation condition, the simulation results of the simulation execution step.

In this method, the user inputs simulation conditions that include information on at least one of the following: the shape of the distribution of the non-varying genes (for example, the standard deviation of the distribution; for example, when the distribution of genes whose expression does not change is taken to be the standard normal distribution with standard deviation σ = 1 and center μ = 0, the width of the standard deviation σ is set in the range of 0.1 to 1.5); the shape of the distribution of the variable genes (for example, the center; for example, under the same condition, the width of the center μ is set in the range of 0.4 to 3); the detection criterion for the variable genes (for example, the ratio of detections to the total number, set to 2/3, 2/4, 3/4, 3/6, 4/6, or the like); the number of experimental repetitions; and the number of simulations (for example, set in the range of 3 to 10). In accordance with the set simulation conditions, data are repeatedly generated for the same gene group from the same distribution, gene detection is executed, the simulation for detecting the expressed genes is executed a plurality of times, the false positive rate and the false negative rate of the results of the detection means are calculated, the relationship among the number of experimental repetitions, the simulation conditions, and the detection sensitivity and detection reliability is calculated, a test statistic table for genes whose expression levels change is created, and the simulation results are output for each simulation condition. By combining the simulation results obtained under various conditions, the detection power and detection reliability for a given combination can be determined. That is, by repeating a control experiment under the same conditions, detecting variable genes in each of the resulting different data sets, and selecting only those genes that are detected at least a predetermined number of times, variable genes can be detected with the expected reliability or detection power.

Furthermore, by calculating and comparing the errors in which a gene whose expression level does not change is detected as a variable gene (type I errors) and the errors in which a variable gene is detected as a gene whose expression does not change (type II errors), the power and reliability with which the above technique detects variable genes can be ascertained from the simulation data, and the combination of the number of experimental repetitions, the detection criterion for variable genes, and the confidence limit point can be set so as to obtain the expected detection power and reliability for actual experimental data.
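
The simulation described above can be sketched as a small Monte Carlo experiment. This is a simplified illustration, not the procedure of the source: each replicate value is drawn from a normal distribution, a gene is "detected" in a replicate when its absolute value exceeds a fixed cutoff (standing in for the confidence limit), and it is called a variable gene when detected in at least `min_hits` of `n_repeats` replicates (for example, 2 of 3). The function name and all parameter names are hypothetical.

```python
import random

def simulate_error_rates(n_unchanged, n_changed, sigma, mu, cutoff,
                         n_repeats, min_hits, seed=0):
    """Estimate the false positive rate (type I) and false negative rate
    (type II) of a 'detected in at least min_hits of n_repeats' criterion."""
    rng = random.Random(seed)

    def called_variable(center):
        # Count the replicates in which the gene falls outside the cutoff.
        hits = sum(abs(rng.gauss(center, sigma)) > cutoff
                   for _ in range(n_repeats))
        return hits >= min_hits

    # Unchanged genes are centered on 0; any call is a false positive.
    false_positives = sum(called_variable(0.0) for _ in range(n_unchanged))
    # Changed genes are centered on mu; any miss is a false negative.
    false_negatives = sum(not called_variable(mu) for _ in range(n_changed))
    return false_positives / n_unchanged, false_negatives / n_changed

# Compare two detection criteria on identical random draws (same seed).
fp_2of3, fn_2of3 = simulate_error_rates(500, 500, 1.0, 3.0, 1.96, 3, 2, seed=1)
fp_3of3, fn_3of3 = simulate_error_rates(500, 500, 1.0, 3.0, 1.96, 3, 3, seed=1)
```

Tabulating such rates over a grid of σ, μ, repetition counts, and detection criteria gives a table of the kind the text calls a test statistic table, from which a combination with acceptable type I and type II error rates can be chosen.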

This also makes it possible to predict how many experiments must be performed to obtain accurate experimental data, so that experimental efficiency can be improved significantly.

The gene expression information analysis method according to the next invention is the gene expression information analysis method described above, characterized in that the gene detection step further includes a deviation value calculation step of calculating the deviation value of each spot.

This describes one example of the gene detection in more detail. In this method, the deviation value of each spot is calculated, and by using the deviation value of each spot thus calculated in place of the variation ratio (fold change), an analysis that is not affected by differences in error between slides becomes possible.
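
This excerpt does not give the formula for the deviation value. The sketch below assumes the conventional standard-score form, 50 + 10 × (value − mean) / standard deviation, applied to the fold-change values of the spots on one slide; both the formula and the function name `deviation_values` are assumptions, and the source may compute the statistics per window rather than per slide.

```python
import statistics

def deviation_values(values):
    """Standard scores rescaled to mean 50 and standard deviation 10
    (assumed definition of the 'deviation value')."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [50.0 + 10.0 * (v - mean) / sd for v in values]

# Two hypothetical slides with the same pattern but different error scales.
slide_a = [1.0, 2.0, 3.0]
slide_b = [10.0, 20.0, 30.0]
scores_a = deviation_values(slide_a)
scores_b = deviation_values(slide_b)
```

Both slides produce the same scores even though their raw spreads differ by a factor of ten, which illustrates why such a score can be compared across slides with different error levels.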

Furthermore, the deviation value calculated by this method can be used in multivariate analysis, typified by cluster analysis, in place of the logarithm of the variation ratio or the normalized variation ratio, enabling an analysis that is not affected by differences in the influence of error that depend on the magnitude of the expression level.

The present invention also relates to a program. The program according to the present invention is characterized by causing a computer to execute a gene expression information analysis method including: a background correction step of creating background-corrected luminance data by removing a background value from the measured luminance data of each spot, obtained by measuring the fluorescence intensity indicating the expression level of the same gene under two conditions; a bias correction step of creating a fluorescence intensity scatter plot with the logarithms of the luminance data background-corrected in the background correction step on the X and Y axes, obtaining for each gene spot the bias with respect to the fluorescence intensity equilibrium axis, and removing that bias from the luminance data, thereby constructing a new X-Y axis fluorescence intensity scatter plot whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis; and a gene detection step of detecting variable genes whose expression levels have changed, based on the new X-Y axis fluorescence intensity scatter plot constructed in the bias correction step.

According to this program, background-corrected luminance data are created by removing a background value from the measured luminance data of each spot, obtained by measuring, with a DNA microarray, DNA chip, or the like, the fluorescence intensity indicating the expression level of the same gene under two conditions. Here, the average of the measured fluorescence intensities of the blank spots may be used as the background value to be removed from the measured fluorescence intensity of each individual spot, or the average of the measured blank fluorescence intensities in the area surrounding each spot may be used as the background value. Background correction may also be performed by any other program.
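
The first background option above (subtracting the mean intensity of the blank spots) can be sketched as follows. The function names and the `floor` parameter are hypothetical; the floor is an assumption added so that the later logarithm is defined when a spot is no brighter than the background, and the source does not say how such spots are handled.

```python
import math
import statistics

def background_correct(spot_intensities, blank_intensities, floor=1.0):
    """Subtract the mean blank-spot intensity from every spot.
    Values are clamped to `floor` (an assumption, see above)."""
    background = statistics.mean(blank_intensities)
    return [max(v - background, floor) for v in spot_intensities]

def log2_intensities(corrected):
    """Base-2 logarithms of the corrected intensities for the scatter plot."""
    return [math.log2(v) for v in corrected]

# Hypothetical raw intensities for one condition and two blank spots.
corrected = background_correct([119.0, 55.0, 40.0], [50.0, 60.0])
logged = log2_intensities(corrected)
```

The second option in the text, a local background taken from the area around each spot, would replace the single `background` value with one value per spot.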

Also according to this program, a fluorescence intensity scatter plot is created with the logarithms (natural logarithms, base-2 logarithms, or the like) of the background-corrected luminance data on the X and Y axes; for each gene spot, the bias with respect to the fluorescence intensity equilibrium axis, along which the fluorescence intensities are equal, is obtained; and that bias is removed from the luminance data to construct a new X-Y axis fluorescence intensity scatter plot whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis. It is therefore possible to determine which fluorescence component contains more bias and, after removing that bias, to construct a new orthogonal axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis.

Also according to this program, variable genes whose expression levels have changed are detected based on the newly constructed X-Y axis fluorescence intensity scatter plot, so that, compared with conventional gene detection methods, genes whose expression levels have changed can be detected accurately without being affected by differences in the measurement program, errors between samples, fluorescent labeling efficiency, and the like.

The program according to the next invention is the program described above, characterized in that the bias correction step further includes: a first principal component creation step of performing principal component analysis using the logarithms for the high-expression gene population and obtaining the slope and intercept of the asymptote that forms the first principal component; a coordinate rotation step of, with the angle between the asymptote obtained in the first principal component creation step and the X axis denoted θ, calculating the coordinates obtained by rotating the coordinates of the low-expression gene population in the X-Y axis system to the right by the angle θ; a bias determination step of calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the low-expression gene population after the axis rotation of the coordinate rotation step, and determining, based on the calculated slope, which of the luminance data for the two conditions contains more of the bias; and a corrected plot generation step of subtracting the bias from the luminance data of the condition determined in the bias determination step to contain more of the bias, thereby constructing a new X-Y axis fluorescence intensity scatter plot whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis.

This describes one example of the bias correction step in more detail. According to this program, control gene samples for quality control, prepared as a DNA concentration dilution series (for example, an external-gene λ DNA sample, or a housekeeping gene sample such as a ribosomal gene whose expression level hardly changes), are measured at the same time as the target gene samples. Control genes are then removed one at a time, starting from the gene with the smallest product of the fluorescence intensity data; at each step, calibration curves of gene expression level and DNA amount are created from the data of all remaining control gene samples, and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies the criterion for a strong correlation (for example, 0.8 or more) is taken as threshold 1, and the population of all gene samples whose product of the fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the high-expression gene population. Likewise, the product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies the criterion for a weak correlation (for example, 0.5 or more) is taken as threshold 2 (where threshold 2 < threshold 1), and the population of all gene samples whose product of the fluorescence intensity data under the two conditions falls below threshold 2 is taken as the low-expression gene population. Principal component analysis is performed on the logarithms of the fluorescence intensities of the high-expression gene population to obtain the slope and intercept of the asymptote that forms the first principal component. With the angle between this asymptote and the X axis denoted θ, the coordinates of the low-expression gene population in the X-Y axis system are rotated to the right by the angle θ. The slope of the fluorescence intensity equilibrium axis is calculated from the coordinates of the low-expression gene population after the axis rotation, and, based on the calculated slope (for example, positive, negative, or zero), it is determined which of the luminance data for the two conditions contains more bias. The bias is then subtracted from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates of a gene population having a constant bias), thereby constructing a new X-Y axis fluorescence intensity scatter plot whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis. As a result, the bias in the measured values can be removed efficiently, and a fluorescence intensity scatter plot that clearly expresses the nature of the data can be created.

Note that this program is not limited to determining the magnitude of the bias after the axis rotation; for example, the magnitude of the bias may also be determined before the axis rotation by comparing the slopes of the high-expression asymptote and the low-expression asymptote.

The program according to the next invention is the program described above, characterized in that the principal component analysis is performed using a variance-covariance matrix.

This describes one example of the principal component analysis in more detail. According to this program, the principal component analysis is performed using a variance-covariance matrix, so that, unlike the correlation-matrix-based principal component analysis conventionally used in gene expression analysis, no normalization is required and the principal component analysis can be performed efficiently.

The program according to the next invention is the program described above, characterized in that the gene detection step further includes: a window setting step of setting a window within a predetermined interval along the fluorescence intensity equilibrium axis; a confidence limit point determination step of determining a confidence limit point within each window set by the window setting step; a window moving step of moving the window along the fluorescence intensity equilibrium axis by a fixed number of genes at a time; a confidence boundary line creation step of obtaining a confidence limit point, by the confidence limit point determination step, for each window moved by the window moving step, and creating a confidence boundary line based on the plurality of confidence limit points thus obtained; and a variable gene extraction step of extracting genes located outside the confidence boundary line created by the confidence boundary line creation step as variable genes whose expression levels have changed.

This shows an example of the gene detection step more concretely. According to this program, a window is set within a predetermined section, and within each window a confidence limit point is determined using at least one of the mean, the standard deviation, a P value (for example, P = 0.05), the centroid, and the like of the gene luminance data. The window is then moved along the fluorescence intensity equilibrium axis by a fixed number of genes at a time, a confidence limit point is obtained for each moved window, and a confidence boundary line is created based on the plurality of confidence limit points thus obtained. Genes located outside this confidence boundary line are extracted as variable genes whose expression levels have changed, so expressed genes can be extracted with high stability, reproducibility, and reliability.

In addition, even for experimental data with different error ranges, the threshold for the fold change in expression level can be set according to the error.
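The window procedure described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the default window size and step, the use of mean ± z × standard deviation as the confidence limit (one of the options the text lists), and the nearest-window lookup that stands in for the smoothed boundary line are all assumptions made for the example.

```python
from statistics import mean, stdev

def confidence_limit_points(points, genes_per_window=50, step=10, z=1.96):
    """Slide a window holding a fixed number of genes along the equilibrium axis.

    points: list of (x, y), where x is the position on the fluorescence
    intensity equilibrium axis and y is the position on the fold-change axis.
    Returns one (window_center_x, lower_limit, upper_limit) tuple per window.
    """
    ordered = sorted(points)  # order genes by position on the equilibrium axis
    limits = []
    for start in range(0, len(ordered) - genes_per_window + 1, step):
        window = ordered[start:start + genes_per_window]
        xs = [p[0] for p in window]
        ys = [p[1] for p in window]
        m, s = mean(ys), stdev(ys)
        # Assumed rule: limit points at mean +/- z standard deviations.
        limits.append((mean(xs), m - z * s, m + z * s))
    return limits

def extract_variable_genes(points, limits):
    """Return the points lying outside the limits of the nearest window."""
    variable = []
    for x, y in points:
        _, lower, upper = min(limits, key=lambda w: abs(w[0] - x))
        if y < lower or y > upper:
            variable.append((x, y))
    return variable
```

Because the limits are recomputed in every window, the cutoff widens where the local scatter is large and narrows where it is small, which is the error-dependent threshold the passage refers to.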

A program according to the next invention is the program described above, wherein the confidence limit point determination step determines the confidence limit point using a t-distribution, based on a test statistic table for replicate data obtained by simulation.

This shows an example of the confidence limit point determination more concretely. According to this program, the confidence limit point is determined using a t-distribution, based on a test statistic table for replicate data obtained by simulation, so the confidence limit point can be obtained more accurately and efficiently than with conventional methods.

A program according to the next invention is the program described above, wherein the confidence boundary line creation step creates the confidence boundary line by smoothing, that is, by creating a spline curve based on the plurality of confidence limit points.

This shows an example of the confidence boundary line creation more concretely. According to this program, the confidence boundary line is created by smoothing with a spline curve based on the plurality of confidence limit points, so a confidence curve can be created by efficiently interpolating between the confidence limit points.

A program according to the next invention is the program described above, wherein, for a region of high fluorescence intensity, the confidence boundary line creation step creates the confidence limit line using a horizontal extension from the confidence limit point obtained in the last window.

This shows an example of the confidence boundary line creation more concretely. According to this program, for a region of high fluorescence intensity, the confidence limit line is created using an extension, parallel to the X axis, from the confidence limit point obtained in the last (rightmost) window. An appropriate confidence limit line can therefore be created even when the slope is small and it cannot be judged which way the line converges.

A program according to the next invention is the program described above, wherein, for a region of low fluorescence intensity, the confidence boundary line creation step uses, as the confidence limit line, an extrapolation of the asymptote obtained by the least squares method from the confidence limit points obtained in the respective windows.

This shows an example of the confidence boundary line creation more concretely. According to this program, for a region of low fluorescence intensity, an extrapolation of the asymptote obtained by the least squares method from the confidence limit points of, for example, roughly the first several tens of windows is used as the confidence limit line, so spots of genes with low fluorescence intensity can also be detected accurately.
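The two end-region rules can be illustrated with a small sketch. The names `fit_line` and `boundary_at` and the parameter `n_low` are hypothetical, the default of three low-intensity points is chosen only to keep the example short (the text speaks of several tens of windows), and linear interpolation between limit points is used here as a simple stand-in for the spline smoothing.

```python
def fit_line(points):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx

def boundary_at(x, limit_points, n_low=3):
    """Evaluate an upper confidence limit line at position x.

    limit_points: (x, limit) pairs sorted by x, one per window.
    """
    first_x, last_x = limit_points[0][0], limit_points[-1][0]
    if x >= last_x:
        # High-intensity side: horizontal extension of the last limit point.
        return limit_points[-1][1]
    if x <= first_x:
        # Low-intensity side: extrapolate a least-squares line fitted to
        # the first n_low limit points.
        slope, intercept = fit_line(limit_points[:n_low])
        return slope * x + intercept
    # Between windows: linear interpolation (stand-in for the spline).
    for (x0, y0), (x1, y1) in zip(limit_points, limit_points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```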

A program according to the next invention is the program described above, further including a gene number input step of having a user input the number of genes in a window, wherein the window setting step sets the window within the section that contains the number of genes input in the gene number input step.

This shows an example of the window setting more concretely. According to this program, the user is made to input the number of genes in a window, and the window is set within the section that contains the input number of genes, so the number of genes set by the user can be changed for each experiment.

A program according to the next invention is the program described above, further including a confidence limit value input step of having a user input a confidence limit value, wherein the confidence limit point determination step determines the confidence limit point within the window based on the confidence limit value input in the confidence limit value input step.

This shows an example of the confidence limit point determination more concretely. According to this program, the user is made to input a confidence limit value, and the confidence limit point is determined within the window based on the input confidence limit value, so the confidence limit value set by the user can be changed for each experiment and the error of each experiment can be kept within an appropriate range.

A program according to the next invention is the program described above, further including: a simulation condition setting step of having a user input simulation conditions containing information on at least one of the shape of the distribution of the non-variable genes, the shape of the distribution of the variable genes, the detection criterion for the variable genes, the number of experimental replicates, and the number of simulations; a simulation execution step of, in accordance with the simulation conditions set in the simulation condition setting step, repeatedly generating data from the same distribution for the same gene group, executing the gene detection means, executing the simulation for detecting the expressed genes a plurality of times, calculating the false positive rate and the false negative rate of the results of the detection means, calculating the relationship among the number of experimental replicates, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistic table for genes whose expression level changes; and a simulation result output step of outputting, for each simulation condition, the simulation results of the simulation execution step.

According to this program, the user is made to input simulation conditions containing information on at least one of the following: the shape of the distribution of the non-variable genes (for example, the standard deviation of the distribution; for example, when the distribution of genes whose expression does not change is taken to be the standard normal distribution with standard deviation σ = 1 and center μ = 0, the width of the standard deviation σ is set in the range 0.1 to 1.5); the shape of the distribution of the variable genes (for example, the center; for example, under the same condition, the width of the center μ is set in the range 0.4 to 3); the detection criterion for the variable genes (for example, the proportion of detections out of the total number, set to 2/3, 2/4, 3/4, 3/6, 4/6, or the like); the number of experimental replicates; and the number of simulations (for example, set in the range of 3 to 10). In accordance with the set simulation conditions, data are repeatedly generated from the same distribution for the same gene group, gene detection is executed, and the simulation for detecting expressed genes is executed a plurality of times. The false positive rate and the false negative rate of the results of the detection means are calculated, the relationship among the number of experimental replicates, the simulation conditions, and the detection sensitivity and detection reliability is calculated, a test statistic table for genes whose expression level changes is created, and the simulation results are output for each simulation condition. By combining simulation results obtained under various conditions, the detection power and detection reliability of each such combination can be known. That is, by repeating a control experiment under the same conditions, detecting variable genes in each of the resulting distinct data sets, and selecting only the genes detected at least a predetermined number of times, variable genes can be detected with the expected reliability or detection power.

Also, by calculating and comparing the errors in which a gene whose expression level does not change is detected as a variable gene (type I errors) and the errors in which a variable gene is detected as a gene whose expression does not change (type II errors), the power and reliability with which the above method detects variable genes can be ascertained from the simulation data. A combination of the number of experimental replicates, the detection criterion for variable genes, and the confidence limit point can then be set so as to obtain the expected power and reliability for actual experimental data.

This also makes it possible to predict how many times an experiment must be run to obtain accurate experimental data, which can significantly improve experimental efficiency.
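A Monte Carlo sketch of this kind of simulation is given below, under stated assumptions. The function name, the default gene counts, the fixed cutoff of 1.96 on a standardized fold-change value, and the use of the same standard deviation for both gene populations are illustrative choices rather than details taken from the text; the "at least `min_hits` of `replicates`" rule mirrors the detection criteria such as 2/3 or 3/4 mentioned above.

```python
import random

def simulate_error_rates(n_unchanged=1000, n_changed=100, mu=3.0, sigma=1.0,
                         cutoff=1.96, replicates=3, min_hits=2, seed=0):
    """Estimate (false positive rate, false negative rate) by simulation.

    Unchanged genes are drawn from N(0, sigma) and changed genes from
    N(mu, sigma). A gene is called 'changed' when |value| > cutoff in at
    least min_hits of the replicates.
    """
    rng = random.Random(seed)

    def called(center):
        # Count the replicates in which this gene exceeds the cutoff.
        hits = sum(abs(rng.gauss(center, sigma)) > cutoff
                   for _ in range(replicates))
        return hits >= min_hits

    false_pos = sum(called(0.0) for _ in range(n_unchanged))   # type I errors
    false_neg = sum(not called(mu) for _ in range(n_changed))  # type II errors
    return false_pos / n_unchanged, false_neg / n_changed
```

Running the function over a grid of `replicates`, `min_hits`, `mu`, and `sigma` values gives a table of error rates from which a replicate count and detection criterion could be chosen; a stricter criterion lowers the false positive rate at the cost of a higher false negative rate.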

A program according to the next invention is the program described above, wherein the gene detection step further includes a deviation value calculation step of calculating a deviation value for each spot.

This shows an example of the gene detection more concretely. According to this program, a deviation value is calculated for each spot. Using the deviation value of each spot in place of the fold change (ratio) enables analysis that is not affected by differences in error between slides.

Also, the deviation value calculated by this program can be used in place of the logarithm of the fold change or the normalized fold change in multivariate analyses typified by cluster analysis, enabling analysis that is not affected by differences in the influence of error arising from the magnitude of the expression level.

The present invention also relates to a recording medium, and the recording medium according to the present invention is characterized by having recorded thereon the program described above.

According to this recording medium, by having a computer read and execute the program recorded on the recording medium, the program described above can be realized using a computer, and the same effects as those of the respective methods can be obtained.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is a diagram showing the concept of principal component analysis using a variance-covariance matrix according to the present invention.
FIG. 2 is a diagram showing the concept of the process of obtaining an asymptote in a new coordinate system according to the present invention.
FIG. 3 is a diagram conceptually showing reconstruction of the scatter plot according to the present invention.
FIG. 4 is a diagram showing a mixture normal distribution model of the expression fold change according to the present invention.
FIG. 5 is a diagram showing a mixture normal distribution model of the expression fold change according to the present invention.
FIG. 6 is a diagram showing a mixture normal distribution model of the expression fold change according to the present invention.
FIG. 7 is a diagram showing an example of the calculation result of type I detection errors obtained by simulation according to the present invention.
FIG. 8 is a diagram showing an example of the calculation result of type I detection errors obtained by simulation according to the present invention.
FIG. 9 is a diagram showing an example of the calculation result of type I detection errors obtained by simulation according to the present invention.
FIG. 10 is a diagram showing an example of the calculation result of type I detection errors obtained by simulation according to the present invention.
FIG. 11 is a diagram conceptually showing creation of an expression variation confidence curve according to the present invention.
FIG. 12 is a diagram conceptually showing creation of an expression variation confidence curve according to the present invention.
FIG. 13 is a diagram conceptually showing creation of an expression variation confidence curve according to the present invention.
FIG. 14 is a diagram conceptually showing creation of an expression variation confidence curve according to the present invention.
FIG. 15 is a flowchart showing the main process of the apparatus of the present embodiment.
FIG. 16 is a flowchart showing an example of the background correction process of the apparatus of the present embodiment.
FIG. 17 is a flowchart showing an example of the bias correction process of the apparatus of the present embodiment.
FIG. 18 is a flowchart showing an example of the gene detection process of the apparatus of the present embodiment.
FIG. 19 is a flowchart showing an example of the simulation process of the apparatus of the present embodiment.
FIG. 20 is a diagram showing an example of the gene extraction condition setting screen output to the output device 114 by the processing of the window setting unit 102i.
FIG. 21 is a diagram showing an example of the simulation condition setting screen output to the output device 114 by the processing of the simulation condition setting unit 102r.
FIG. 22 is a block diagram showing an example of the configuration of the apparatus to which the present invention is applied.
FIG. 23 is a block diagram showing an example of the configuration of the bias correction unit 102b.
FIG. 24 is a block diagram showing an example of the configuration of the gene detection unit 102c.
FIG. 25 is a block diagram showing an example of the configuration of the simulation unit 102d.
FIG. 26 is a flowchart showing an example of the gene detection process using deviation values in the apparatus of the present embodiment.
FIG. 27 is a conceptual diagram showing the calculation of deviation values in the apparatus of the present embodiment.
FIG. 28 is a conceptual diagram showing an example of the bias determination process of the apparatus of the present embodiment.

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the gene expression information analysis apparatus, gene expression information analysis method, program, and recording medium according to the present invention are described in detail below with reference to the drawings. The present invention is not limited by these embodiments.

[Overview of the Apparatus]

The basic concept of the apparatus is described first, followed by a detailed description of the configuration, processing, and the like of the apparatus in each example of the present invention.

[Basic Concept of the Apparatus]

The basic concept of the present invention is described below with reference to FIGS. 1 to 6 and FIGS. 11 to 14.

1. Two-stage data correction of control fluorescence measurements

In measuring expressed genes with a DNA microarray or DNA chip, the expression level of each gene is reflected in the luminance of the fluorescence measurement corresponding to that gene, and the expression ratio of each gene is observed as the ratio to the control fluorescence measurement. However, because of errors in the DNA microarray or DNA chip, errors in the fluorescent labeling reaction, measurement errors, differences in the molar fluorescence coefficients of the fluorescent substances, and so on, the raw ratio of fluorescence measurements does not accurately reflect the expression ratio. The present invention therefore performs the following processing to deal with these errors.

(1) Background correction

As the first stage of data correction, background correction is performed. First, let the luminances of gene $i$ measured under the two conditions be $(a_i, b_i)$, and subtract the background $(\mathrm{BKGa}_i, \mathrm{BKGb}_i)$ from the luminance of each gene. The corrected result $(a_i - \mathrm{BKGa}_i,\ b_i - \mathrm{BKGb}_i)$ is denoted $(A_i, B_i)$.
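As a small worked example of this first-stage correction, the subtraction can be written as below. The function name and the list-based inputs are assumptions for illustration; how non-positive corrected values should be handled is not stated in this passage and is left out of the sketch.

```python
def background_correct(a, b, bkg_a, bkg_b):
    """Subtract the per-spot background from the raw luminances.

    a, b: raw luminances of each gene under the two conditions.
    bkg_a, bkg_b: background luminances for the same spots.
    Returns a list of corrected (A_i, B_i) pairs.
    """
    return [(ai - ba, bi - bb)
            for ai, bi, ba, bb in zip(a, b, bkg_a, bkg_b)]
```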

(2) Bias correction

Next, as the second stage of data correction, bias correction is performed by the following procedure. First, an outline of the bias correction of the present invention is given.

Control gene samples for quality control in a DNA concentration dilution series (for example, exogenous λ DNA samples, or housekeeping gene samples such as ribosomal genes whose expression level hardly changes) are measured at the same time as the target gene samples. Control genes are removed one at a time in ascending order of the product of the fluorescence intensity data, and each time, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion indicating a strong correlation (for example, 0.8 or more) is taken as threshold 1, and the set of all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the high-expression gene population. Likewise, the product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion indicating a weak correlation (for example, 0.5 or more) is taken as threshold 2 (where threshold 2 < threshold 1), and the set of all gene samples whose product of fluorescence intensity data under the two conditions falls below threshold 2 is taken as the low-expression gene population.

Principal component analysis is then performed using the logarithms of the fluorescence intensities of the high-expression gene population, and the slope and intercept of the asymptote that is the first principal component are obtained. With θ denoting the angle between this asymptote and the X axis, the coordinates of the low-expression gene population in the X-Y coordinate system are rotated clockwise by the angle θ. Using the rotated coordinates of the low-expression gene population, the slope of the fluorescence intensity equilibrium axis is calculated, and based on the calculated slope (for example, positive, negative, or zero) it is determined which of the luminance data for the two conditions contains more bias. The bias is subtracted from the luminance data for the condition determined to contain more bias (for example, by rotating the coordinates for a gene population having a constant bias), thereby constructing a new fluorescence intensity scatter plot in an X-Y coordinate system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis. In this way, the bias in the measured values can be removed efficiently, and a fluorescence intensity scatter plot that clearly expresses the nature of the data can be created.
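The rotation step in this outline can be sketched as follows. This is only an illustration of a clockwise rotation by θ = atan(slope); the function name is hypothetical, and rotating about the origin (rather than about the intercept or the centroid) is an assumption, since the passage does not specify the center of rotation.

```python
import math

def rotate_clockwise(points, slope):
    """Rotate (x, y) points clockwise by theta = atan(slope).

    After the rotation, a line of the given slope becomes parallel
    to the X axis.
    """
    theta = math.atan(slope)
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]
```

With this convention, every point on the line y = slope · x + b maps to the same rotated height b · cos θ, so a residual trend in the rotated low-expression population shows up directly as a non-zero slope.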

An example of the bias correction procedure is described in detail below.

i) General relational expression for control fluorescence measurements

The correction of the bias $k$ according to the present invention is based on the general formula (1) or (1') expressing the relationship between the fluorescence measurements $A$ and $B$.

$$\log_2 B = a \log_2 (A - k) + b \tag{1}$$

$$\log_2 (B - k) = a \log_2 A + b \tag{1'}$$

Here, $a$, $b$, and $k$ are unknown parameter constants. The mean bias $k$ is subtracted from whichever of $A$ and $B$ contains more bias. That is, when the background noise of $A$ is larger than that of $B$ and $A$ contains more bias, Equation 1 is used; when the background noise of $B$ is larger than that of $A$ and $B$ contains more bias, Equation 1' is used. The parameters $a$, $b$, and $k$ are estimated from a plot of the fluorescence measurements in the orthogonal $(\log_2 A, \log_2 B)$ coordinate system.

ii) Extraction of the fluorescence intensity equilibrium axis by principal component analysis using a variance-covariance matrix

If the expression levels are the same, the fluorescence measurements of a control experiment on a DNA microarray or DNA chip should in theory lie on the 1:1 line $\log_2 A = \log_2 B$ in the fluorescence intensity scatter plot of the orthogonal $(\log_2 A, \log_2 B)$ coordinate system. However, because of differences in the properties of the fluorescent substances, differences in experimental conditions, and so on, the fluorescence intensity equilibrium axis representing equal fluorescence intensity (that is, the asymptote obtained from the gene population whose spots show equal expression levels under the two conditions) may not follow $\log_2 A = \log_2 B$. In this case, on the premise that the number of genes examined is large enough as a sample (for example, a thousand or more) and that the variable genes, that is, the genes whose expression level changes, make up only a small proportion of the total, the fluorescence intensity equilibrium axis is assumed to be the asymptote of the $(\log_2 A_i, \log_2 B_i)$ population.

Here, when $A_i$ and $B_i$ are much larger than $k$, that is, when the influence of the bias is small enough to be ignored, Equations 1 and 1' can both be approximated by

$$\log_2 B = a \log_2 A + b \tag{2}$$

To obtain the slope $a$ and the intercept $b$, principal component analysis using a variance-covariance matrix is performed. Unlike the principal component analysis based on a correlation matrix that has conventionally been used in gene analysis, principal component analysis using a variance-covariance matrix does not require normalization.

FIG. 1 is a diagram showing the concept of principal component analysis using a variance-covariance matrix. Abbreviating $\log_2 A$ as $x$ and $\log_2 B$ as $y$, Equation 2, which represents the asymptote, becomes

$$y = a x + b \tag{3}$$

Accordingly, the distance $d_i$ from each point $(x_i, y_i)$ to the asymptote is given by

Figure imgf000033_0001

The distance $D$ from all the points to the asymptote is

Figure imgf000034_0001

The asymptote parameters $a$ and $b$ that best fit the scatter plot are those for which the distance $D$ is minimized.

When the distance $D$ is at its minimum, the following two conditions are satisfied:

Figure imgf000034_0002
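The equations in this derivation survive here only as image placeholders. A standard least-squares reading that is consistent with the surrounding text is sketched below; it assumes that $D$ is the sum of squared perpendicular distances, and the equation numbers (4) to (7) are inferred from the later references to Equations 6 to 9, so this should be taken as a reconstruction rather than as the original notation.

```latex
\begin{align}
d_i &= \frac{\lvert a x_i - y_i + b \rvert}{\sqrt{a^2 + 1}} \tag{4} \\
D &= \sum_{i=1}^{n} d_i^2
   = \frac{1}{a^2 + 1} \sum_{i=1}^{n} (a x_i - y_i + b)^2 \tag{5} \\
\frac{\partial D}{\partial b} &= 0 \tag{6} \\
\frac{\partial D}{\partial a} &= 0 \tag{7}
\end{align}
% With S_x, S_y the variances and S_xy the covariance of x_i and y_i,
% eliminating b from (7) gives a quadratic in a:
\begin{equation*}
S_{xy}\, a^2 + (S_x - S_y)\, a - S_{xy} = 0,
\qquad
a = \frac{(S_y - S_x) \pm \sqrt{(S_y - S_x)^2 + 4 S_{xy}^2}}{2 S_{xy}} .
\end{equation*}
```

Under this reading the two roots of the quadratic have product $-1$, so exactly one of them is positive, which matches the instruction in the text to take the solution greater than zero.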

From Equation 6,

$$b = \bar{Y} - a \bar{X} \tag{8}$$

where $\bar{Y}$ is the mean of $y_i$ and $\bar{X}$ is the mean of $x_i$.
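A numerical sketch of this fit is given below. It assumes that the minimized quantity is the sum of squared perpendicular distances, which for positively correlated data is equivalent to taking the first principal axis of the variance-covariance matrix; the function name is hypothetical, and a non-zero covariance is assumed so that the division is defined.

```python
import math

def principal_axis(xs, ys):
    """Fit y = a*x + b by minimizing squared perpendicular distances.

    Uses the variances and covariance of the raw (x, y) values without
    normalization, and returns the positive-slope solution.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = sum((x - mean_x) ** 2 for x in xs) / n            # variance of x
    s_y = sum((y - mean_y) ** 2 for y in ys) / n            # variance of y
    s_xy = sum((x - mean_x) * (y - mean_y)
               for x, y in zip(xs, ys)) / n                 # covariance
    root = math.sqrt((s_y - s_x) ** 2 + 4 * s_xy ** 2)
    # The two candidate slopes have product -1; keep the positive one.
    a = ((s_y - s_x) + root) / (2 * s_xy)
    if a <= 0:
        a = ((s_y - s_x) - root) / (2 * s_xy)
    b = mean_y - a * mean_x                                 # intercept, as in (8)
    return a, b
```

Unlike an ordinary regression of y on x, this fit treats both axes symmetrically, which suits a scatter plot in which both log intensities carry measurement error.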

Also, from Equation 7, the following is obtained:

Figure imgf000034_0003

In Equation 9, of the two solutions, $a$ is taken to be the one greater than zero. $S_x$ and $S_y$ denote the variances of $x_i$ and $y_i$, respectively, and $S_{xy}$ denotes the covariance of $x_i$ and $y_i$. In the actual correction, $a$ and $b$ are obtained using the gene population $(\log_2 A, \log_2 B)$ that ranks highest in terms of the product $A_i B_i$. As a simple calculation method, they are obtained using the gene population in the top portion (for example, 70%) of all genes ranked by the product $A_i B_i$. For an accurate determination, control gene samples for quality control in a DNA concentration dilution series (for example, exogenous λ DNA samples, or housekeeping gene samples such as ribosomal genes whose expression level hardly changes) are measured at the same time as the target gene samples. Control genes are removed one at a time in ascending order of the product of the fluorescence intensity data, and each time, calibration curves of gene expression level against DNA amount are created from the data of all remaining control gene samples and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies a criterion indicating a strong correlation (for example, 0.8 or more) is taken as threshold 1, and the set of all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the high-expression gene population.

iii) Correction of bias

In the orthogonal $(\log_2 A, \log_2 B)$ axis system, when A has large background noise and contains more bias than B, let the coordinates of the point where the asymptote intersects the $\log_2 A$ axis be $(A_c, 0)$. Then, from Equation 1,

$a \log_2 (2^{A_c} - k) + b = 0$ (10)

and therefore

$k = 2^{A_c} - 2^{-b/a}$ (11)

Likewise, when B has large background noise and contains more bias than A, let the coordinates of the point where the asymptote intersects the $\log_2 B$ axis be $(0, B_c)$. Then, from Equation 1',

$a \log_2 (2^{B_c} - k) = b$ (12)

and therefore

$k = 2^{B_c} - 2^{b/a}$ (13)
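Equations 11 and 13 can be evaluated directly once a, b and the relevant axis intercept are known. The short Python sketch below does only that; the function names are hypothetical and chosen for this illustration.

```python
def bias_when_a_is_biased(a, b, a_c):
    """Equation 11: k = 2**A_c - 2**(-b/a), used when A carries more bias."""
    return 2 ** a_c - 2 ** (-b / a)

def bias_when_b_is_biased(a, b, b_c):
    """Equation 13: k = 2**B_c - 2**(b/a), used when B carries more bias."""
    return 2 ** b_c - 2 ** (b / a)
```

For example, with a = 1, b = 0 and A_c = 3, Equation 11 gives k = 8 - 1 = 7 in the original (non-logarithmic) intensity units.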

Since a and b have already been obtained, $A_c$ and $B_c$ are each obtained as the value at the intersection of the $\log_2 A$ axis or the $\log_2 B$ axis with the asymptote calculated from the lower gene population $(\log_2 A, \log_2 B)$ ranked by the product $A_i B_i$ in the orthogonal $(\log_2 A, \log_2 B)$ axis system. Because genes with small fluorescence measurements are strongly affected by errors, the genes used to calculate the asymptote of the lower gene population are selected as follows. As a simple calculation method, the lower portion (for example, the bottom 10%) of the products $A_i B_i$ of all genes is used. For an accurate determination, control gene samples for quality control prepared as a DNA concentration dilution series (for example, external-gene λ DNA samples, or housekeeping gene samples such as ribosomal genes whose expression level hardly changes) are measured at the same time as the target gene samples. The control genes are then removed one at a time, starting from the one with the smallest product of fluorescence intensity data. Each time, calibration curves of gene expression level against DNA amount are created from the data of all the remaining control gene samples, and the correlation coefficient of the data is calculated. The product of the fluorescence intensity data under the two conditions for the control sample at which the successively calculated correlation coefficient first satisfies the criterion for a weak correlation (for example, 0.5 or more) is defined as threshold 2 (where threshold 2 < threshold 1). The population of all gene samples whose product of fluorescence intensity data under the two conditions falls below threshold 2 is defined as the low-expression gene population.

Which of the measured values $(A_i)$ and $(B_i)$ contains the larger bias can be determined from whether the asymptote intersects the $\log_2 A$ axis or the $\log_2 B$ axis. In this case,

intercept = $B_c$ (when B contains more bias) (14)

or

intercept = $A_c$ (when A contains more bias) (14')

iv) Determination of bias

FIG. 2 is a diagram showing the concept of the process for obtaining the asymptote in the new coordinate system.

By the least squares method, the asymptote of the lower gene population of the product $A_i B_i$ is obtained as

$y = \alpha x + \beta$ (15)

However, to decide the independent and dependent variables for the least squares method, the $(\log_2 A, \log_2 B)$ axis system must first be rotated into an axis system whose new X axis is the asymptote obtained from the upper gene population of the product $A_i B_i$. The new coordinates $(\log_2 A_i', \log_2 B_i')$ of $(\log_2 A_i, \log_2 B_i)$ are therefore obtained from

$$\begin{pmatrix} \log_2 A_i' \\ \log_2 B_i' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \log_2 A_i \\ \log_2 B_i \end{pmatrix}$$ (16)

Further, from the slope $\alpha = \tan\theta$,

$$\sin\theta = \frac{\tan\theta}{\sqrt{1 + \tan^2\theta}} = \frac{\alpha}{\sqrt{1 + \alpha^2}}, \qquad \cos\theta = \frac{1}{\sqrt{1 + \tan^2\theta}} = \frac{1}{\sqrt{1 + \alpha^2}}$$

are obtained.
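The rotation of Equation 16, with sin θ and cos θ derived from the slope, can be sketched in a few lines of Python. The function name `rotate_to_asymptote_axes` and the representation of the data as a list of `(log2_a, log2_b)` tuples are assumptions of this example.

```python
import math

def rotate_to_asymptote_axes(points, slope):
    """Rotate (log2 A, log2 B) points so that a line of the given slope becomes the X axis.

    points: iterable of (log2_a, log2_b) pairs (hypothetical input format).
    slope:  tan(theta) of the asymptote that defines the new X axis.
    """
    norm = math.sqrt(1 + slope ** 2)
    sin_t = slope / norm
    cos_t = 1 / norm
    # Equation 16: multiply each point by [[cos, sin], [-sin, cos]].
    return [(cos_t * x + sin_t * y, -sin_t * x + cos_t * y) for x, y in points]
```

With slope 1, the point (1, 1) lies on the asymptote through the origin and is mapped to (√2, 0), that is, onto the new X axis.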

Next, in the new coordinate system, the asymptote of the lower gene population $(A_i', B_i')$ of the product $A_i B_i$ is obtained by the least squares method. Let this asymptote be

$y = m x + n$ (17)

If m is negative, B is determined to contain more bias than A. If m is positive, A is determined to contain more bias than B.
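The sign test on the slope m can be sketched as follows. This is an illustrative Python version using an ordinary least-squares fit; the function names and the return values `"A"` and `"B"` are hypothetical, and the case m = 0, which the text does not address, is returned as `None`.

```python
def least_squares_line(xs, ys):
    """Ordinary least squares fit of y = m*x + n; returns (m, n)."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = s_xy / s_xx
    return m, mean_y - m * mean_x

def more_biased_condition(rotated_points):
    """Decide which condition carries more bias from the slope in the rotated axes."""
    xs = [x for x, _ in rotated_points]
    ys = [y for _, y in rotated_points]
    m, _ = least_squares_line(xs, ys)
    if m < 0:
        return "B"   # negative slope: B contains more bias than A
    if m > 0:
        return "A"   # positive slope: A contains more bias than B
    return None      # m == 0 is not covered by the text
```

The input is expected to be the lower gene population after the rotation of Equation 16.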

v) Calculation of bias

For the asymptote of Equation 17, when m is negative, the intercept with the $\log_2 B$ axis in the $(\log_2 A, \log_2 B)$ axis system is obtained by the least squares method (with $\log_2 A$ as the independent variable and $\log_2 B$ as the dependent variable), using the data of the lower gene population $(A_i', B_i')$ of the product $A_i B_i$. The result is given by Equations 18 and 19:

Figure imgf000038_0001 (18)

$$B_c = \frac{1}{n} \sum_{i=1}^{n} \log_2 B_i - \alpha \, \frac{1}{n} \sum_{i=1}^{n} \log_2 A_i$$ (19)

On the other hand, when m is positive, the intercept with the $\log_2 A$ axis in the $(\log_2 A, \log_2 B)$ axis system is obtained by the least squares method (with $\log_2 B$ as the independent variable and $\log_2 A$ as the dependent variable), using the data of the lower gene population $(A_i', B_i')$ of the product $A_i B_i$. The result is given by Equations 20 and 21:

Figure imgf000038_0002 (20)

$$A_c = \frac{1}{n} \sum_{i=1}^{n} \log_2 A_i - \alpha \, \frac{1}{n} \sum_{i=1}^{n} \log_2 B_i$$ (21)

The second-stage data correction is performed by subtracting the bias obtained from Equation 11 or Equation 13 from all of the data of one of the two paired measurements.

After the above correction, the analysis proceeds to the next stage using the new data plot $(\log_2 A_i, \log_2 (B_i - k))$ or $(\log_2 (A_i - k), \log_2 B_i)$ (hereinafter referred to as the "corrected plot $(\log_2 A_i, \log_2 B_i)$"). Accordingly, Equation 1 or Equation 1' can be expressed as

$\log_2 B = \kappa \log_2 A + I$ (22)
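The second-stage correction described above can be sketched as a small Python function that subtracts k from one condition and takes base-2 logarithms. The function name, the `biased` flag and the decision to skip spots whose corrected intensity is not positive are assumptions of this example; the text does not say how such spots are handled.

```python
import math

def corrected_plot(a_values, b_values, k, biased):
    """Return (log2 A, log2 B) pairs after subtracting bias k from one condition.

    biased: "A" or "B", the condition judged to contain more bias.
    """
    if biased not in ("A", "B"):
        raise ValueError("biased must be 'A' or 'B'")
    pairs = []
    for a_i, b_i in zip(a_values, b_values):
        if biased == "A":
            a_i -= k
        else:
            b_i -= k
        # Assumption: spots with a non-positive intensity have no logarithm and are skipped.
        if a_i > 0 and b_i > 0:
            pairs.append((math.log2(a_i), math.log2(b_i)))
    return pairs
```

For instance, `corrected_plot([10, 3], [4, 8], 2, "A")` yields the points (3, 2) and (0, 3).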

2. Robust detection of genes with changed expression levels by multiple testing

In this method, the corrected data are assumed to consist of a mixed distribution of a gene population whose expression level changes and a gene population whose expression level does not change. First, for each data pair, a window covering a fixed interval along the fluorescence intensity equilibrium axis is set, and within each window the confidence limit point for an arbitrary significance level is obtained based on Student's t-distribution. The window is then moved along the fluorescence intensity equilibrium axis (X axis) by a fixed number of genes at a time, and each confidence limit point is obtained. The resulting confidence limit points are interpolated by smoothing (a spline) to form a confidence boundary line (confidence curve).

From this result, genes located outside the confidence boundary line are selected as genes whose expression level has changed. To obtain even higher extraction reliability, genes whose expression level has reliably changed are selected on the basis of a majority ratio over repeated experiments. In addition, to reduce type I errors in the extraction, the multiple test accepts that the expression level of a gene has changed only when the gene has been extracted as changed at least a predetermined number of times.

(1) Reconstruction of the data distribution using the fluorescence intensity equilibrium axis and the expression ratio

FIG. 3 is a diagram conceptually showing the reconstruction of the distribution plot. As shown in FIG. 3, the perpendicular distance from each corrected plot point $(\log_2 A_i, \log_2 B_i)$ to the fluorescence intensity equilibrium axis $\log_2 B_i = \kappa \log_2 A_i + I$ is considered to be proportional to the expression ratio. It is also clear that the fluorescence intensity increases proportionally as one moves to the right along the fluorescence intensity equilibrium axis. Therefore, a fluorescence intensity scatter plot in which the distance from each gene to the fluorescence intensity equilibrium axis is taken as the Y value and the fluorescence intensity equilibrium axis is taken as the X axis expresses the nature of the data clearly. Here, the Y-axis value $d_2$ of each gene (the fold change of the expression level) is calculated by Equation 4.

The X-axis value $d_1$ of each gene (the fluorescence intensity) follows Equation 22, which expresses the overall relationship between A and B even though the populations of fluorescence measurements A and B contain various errors. The reconstructed fluorescence intensity scatter plot therefore has X-Y axes of (fluorescence intensity, rate of change of expression level).
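One way to picture the reconstructed plot is to project each corrected point onto the equilibrium axis of Equation 22 and to measure its perpendicular offset from that axis. The Python sketch below does this under two assumptions that are not stated in the text: the Y value is kept signed (positive above the axis, negative below), and the X value is measured along the axis from the point where the axis meets $\log_2 A = 0$. The function name is hypothetical.

```python
import math

def to_equilibrium_axes(log_a, log_b, kappa, intercept):
    """Map a corrected point to (d1, d2) relative to the line log2 B = kappa*log2 A + intercept.

    d1: position along the equilibrium axis (fluorescence intensity direction).
    d2: signed perpendicular distance from the axis (expression change direction).
    """
    norm = math.sqrt(1 + kappa ** 2)
    d1 = (log_a + kappa * (log_b - intercept)) / norm
    d2 = (log_b - kappa * log_a - intercept) / norm
    return d1, d2
```

A point lying on the axis gets d2 = 0, so genes with unchanged expression cluster around the new X axis.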

(2) Multiple testing of a mixed distribution model of a gene population whose expression level changes and a gene population whose expression level does not change

FIGS. 4 to 6 are diagrams showing the mixed normal distribution model of the expression fold change.

The distribution of actual data can be regarded as a mixed distribution of a population of genes whose expression level has changed (variable genes) and a population of genes whose expression level has not changed (non-variable genes). As shown in FIG. 4, the mixed distribution model of this method assumes that, on the Y axis representing the rate of change of the expression level, the data consist of a distribution of non-variable genes centered on zero and distributions of variable genes centered on certain points at which the expression ratio has increased or decreased, respectively. For convenience of explanation only normal distributions are shown here, but the present invention is not limited to normal distributions and can be applied to data of any distribution.

As shown in FIG. 5, when the proportion of variable genes in the whole is not very large (for example, when the variable gene population accounts for 10% of the total number), the mixed distribution approximates a normal distribution, as shown in FIG. 6. Variable genes can therefore be extracted from the fold-change data of the mixed distribution on the basis of the t-distribution, under a given confidence limit value, the P value (P-value).

Because this method determines the threshold of the expression fold change from the variance and center calculated from the actual data, it is robust. That is, even for experimental data with different error ranges, the threshold of the expression fold change is determined according to the error. Another feature of this method is that detection is performed several times on different data sets obtained from control experiments under the same conditions, and only genes detected at least a predetermined number of times are selected, so that variable genes can be detected with high reliability.
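The "detected at least a predetermined number of times" rule can be sketched as a simple count over replicate runs. In this Python example the function name and the representation of each run as a list of gene identifiers are assumptions.

```python
from collections import Counter

def genes_detected_repeatedly(detections, min_count):
    """Return genes flagged as changed in at least min_count replicate runs.

    detections: one collection of gene identifiers per replicate data set.
    """
    # set() ensures a gene is counted at most once per run.
    counts = Counter(gene for run in detections for gene in set(run))
    return sorted(gene for gene, count in counts.items() if count >= min_count)
```

With three runs that flag `["g1", "g2"]`, `["g1"]` and `["g1", "g3", "g2"]`, a minimum count of 2 selects `g1` and `g2`.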

Furthermore, the type I and type II detection errors can be calculated by simulating the mixed distribution of the non-variable and variable gene populations while varying six parameters: the total number of genes, the proportion of genes whose expression varies, the standard deviation (width) of the gene distribution, the center of the distribution of genes whose expression varies, the detection criterion (number of detections / total number of runs), and the confidence limit value (P-value) within each data set (window). The results can serve as guidelines for experiments. Here, a "type I detection error" is a false-positive error in which an unchanged gene is detected as changed, and a "type II detection error" is a false-negative error in which a changed gene is detected as unchanged.
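A simulation of this kind might look like the following Python sketch. It is only an illustration under several assumptions that do not come from the text: values are drawn from normal distributions, variable genes are placed alternately at +center and -center, a normal quantile is used as a stand-in for the t value, and the error rates are expressed per non-variable gene (type I) and per variable gene (type II). All names are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def simulate_detection_errors(n_genes, variable_fraction, sd, center,
                              n_runs, min_detections, p_value, seed=0):
    """Estimate type I and type II detection error rates for a mixed distribution."""
    rng = random.Random(seed)
    n_variable = int(n_genes * variable_fraction)
    # True shift of each gene: variable genes first, then non-variable genes at zero.
    shifts = [center if i % 2 == 0 else -center for i in range(n_variable)]
    shifts += [0.0] * (n_genes - n_variable)
    z = NormalDist().inv_cdf(1 - p_value / 2)  # normal stand-in for the t value
    hits = [0] * n_genes
    for _ in range(n_runs):
        values = [rng.gauss(shift, sd) for shift in shifts]
        mid, spread = mean(values), stdev(values)
        for i, value in enumerate(values):
            if abs(value - mid) > z * spread:
                hits[i] += 1
    called = [h >= min_detections for h in hits]
    false_positives = sum(called[n_variable:])
    false_negatives = n_variable - sum(called[:n_variable])
    n_fixed = n_genes - n_variable
    type_1 = false_positives / n_fixed if n_fixed else 0.0
    type_2 = false_negatives / n_variable if n_variable else 0.0
    return type_1, type_2
```

Running the function over a grid of the six parameters gives a table of error rates that could be compared before choosing the number of replicate experiments and the detection criterion.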

(3) Creation of an expression-variation confidence curve fitted to the data using the moving window method

The smaller the fluorescence intensity of a gene, the more strongly its expression change value is affected by errors such as background. For example, if each fluorescence value in a control experiment contains irremovable errors $\alpha_A$ and $\alpha_B$, the expression change of a gene i appears as

fluorescence ratio = $(A_i - \alpha_{Ai}) / (B_i - \alpha_{Bi})$

Therefore, when $A_i \gg \alpha_{Ai}$ and $B_i \gg \alpha_{Bi}$, the fluorescence ratio can be approximated by $A_i / B_i$, but when $A_i$ is close to $\alpha_{Ai}$ and $B_i$ is close to $\alpha_{Bi}$, the effect of the errors cannot be ignored. Consequently, when genes are actually selected on the basis of the t-distribution, and considering that gene populations affected to different degrees by errors such as background are mixed together, the t value should be determined separately for populations of different fluorescence intensity.
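A small numerical example makes the point about weak spots concrete. The Python function below simply evaluates the ratio formula given above; the function name and the sample intensities are chosen for illustration only.

```python
def observed_ratio(a, b, err_a, err_b):
    """Fluorescence ratio (A - alpha_A) / (B - alpha_B) from the text."""
    return (a - err_a) / (b - err_b)

# Both spots have A/B = 2, and both carry the same additive error of 50.
strong_spot = observed_ratio(10000, 5000, 50, 50)  # about 2.01
weak_spot = observed_ratio(200, 100, 50, 50)       # 3.0
```

The same error shifts the ratio of the strong spot by about 0.5% but changes the ratio of the weak spot from 2 to 3, which is why a single threshold for all intensities would be inappropriate.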

FIGS. 11 to 14 are diagrams conceptually showing the creation of the confidence curve. First, as shown in FIG. 11, the apparatus calculates the variance and the center of the distribution of expression fold changes of the genes in a window consisting of a fixed number of genes, and determines the t value of the fold change (corresponding to the value on the fold-change coordinate axis). The value on the fluorescence intensity equilibrium axis of this confidence limit point of the expression change is the median of the fluorescence intensity equilibrium axis values of all the genes in the window.
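The calculation for a single window can be sketched as follows in Python. Two points are assumptions of this example: the limits are taken as center ± t × sample standard deviation, and the critical t value is supplied by the caller (for instance from a t table for the chosen P-value and window size), because the text does not spell out the exact formula. The function name is hypothetical.

```python
import statistics

def window_confidence_limits(window, t_crit):
    """Return (x_median, lower_limit, upper_limit) for one window.

    window: list of (x, y) pairs, x on the fluorescence intensity equilibrium axis,
            y the expression fold change.
    t_crit: critical t value chosen by the caller.
    """
    xs = [x for x, _ in window]
    ys = [y for _, y in window]
    center = statistics.mean(ys)
    spread = statistics.stdev(ys)  # sample standard deviation
    x_pos = statistics.median(xs)  # position of the limit points on the X axis
    return x_pos, center - t_crit * spread, center + t_crit * spread
```

Each window therefore contributes one upper and one lower confidence limit point at the median intensity of its genes.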

Next, as shown in FIG. 12, after determining the coordinates of the confidence limit points of the expression change above and below the fluorescence intensity equilibrium axis within the window, the apparatus moves the window by a fixed number of genes in the direction in which the fluorescence intensity equilibrium axis increases. This operation is then repeated.

After calculating all the confidence limit points of the expression change, the apparatus connects them with a cubic spline curve to create the expression-variation confidence curve, which is the boundary line of the expression change. In the fluorescence intensity regions at both ends, where interpolation by the cubic spline curve is not possible, the curve is extended as shown in FIG. 13. At high fluorescence intensity (shown by the dotted line), the horizontal extension of the confidence limit point obtained in the last window is used. At low fluorescence intensity (shown by the dotted line), the extrapolation of the asymptote obtained by the least squares method from the boundary points of the first several tens of windows from the left is used as the expression-variation confidence curve (the extrapolated expression change boundary line).
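The moving-window procedure can be sketched in Python as below. To keep the example dependency-free, piecewise-linear interpolation is used as a stand-in for the cubic spline, and the low-intensity end is extended horizontally instead of by the least-squares extrapolation described in the text; both simplifications, the caller-supplied `t_crit`, and all function names are assumptions of this sketch.

```python
import statistics

def moving_window_limits(points, window_size, step, t_crit):
    """Slide a window of window_size genes along X and collect limit points.

    Returns a list of (x_median, lower_limit, upper_limit), ordered by x.
    """
    pts = sorted(points)  # order genes along the equilibrium axis
    limits = []
    for start in range(0, len(pts) - window_size + 1, step):
        win = pts[start:start + window_size]
        ys = [y for _, y in win]
        center, spread = statistics.mean(ys), statistics.stdev(ys)
        x_pos = statistics.median([x for x, _ in win])
        limits.append((x_pos, center - t_crit * spread, center + t_crit * spread))
    return limits

def upper_boundary(limits, x):
    """Upper confidence boundary at x (linear stand-in for the cubic spline)."""
    if x >= limits[-1][0]:
        return limits[-1][2]  # horizontal extension at high intensity
    if x <= limits[0][0]:
        return limits[0][2]   # simplification: horizontal extension at low intensity
    for (x0, _, u0), (x1, _, u1) in zip(limits, limits[1:]):
        if x0 <= x <= x1:
            if x1 == x0:
                return u1
            return u0 + (u1 - u0) * (x - x0) / (x1 - x0)
    return limits[-1][2]
```

A gene at position (x, y) would then be a candidate for increased expression when y exceeds `upper_boundary(limits, x)`; the lower boundary can be handled in the same way with the second element of each tuple.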

Then, as shown in FIG. 14, genes lying outside the region enclosed by the expression-variation confidence curves above and below the fluorescence intensity equilibrium axis are extracted as genes whose expression level has changed, that is, genes whose expression level has increased or decreased. The final extraction of genes is performed by the multiple test described above in section 2-(2).

[Device configuration]

Next, the configuration of the gene expression information analyzer will be described with reference to FIGS. 22 to 25. FIG. 22 is a block diagram showing an example of the configuration of the apparatus to which the present invention is applied, and conceptually shows only the parts of the configuration that relate to the present invention.

In FIG. 22, the gene expression information analyzer 100 roughly comprises a control unit 102, such as a CPU, that performs overall control of the gene expression information analyzer 100; a communication control interface unit 104 connected to a communication device (not shown), such as a router, that is connected to a communication line or the like; an input/output control interface unit 108 connected to an input device 112 and an output device 114; and a storage unit 106 that stores various databases and tables. These units are communicably connected via an arbitrary communication path. The gene expression information analyzer 100 may further be communicably connected to a network via a communication device such as a router and a wired or wireless communication line such as a dedicated line.

The various databases and tables stored in the storage unit 106 (the measured luminance data 106a and the simulation result data 106b) are held on storage means such as a fixed disk device, which stores the various programs, tables, files, databases, web page files and the like used for various kinds of processing.

Among these components of the storage unit 106, the measured luminance data 106a is measured luminance data storage means that stores, for each experiment, the measured luminance data of each spot indicating the expression level of a gene examined with a DNA chip, a DNA microarray or the like. The simulation result data 106b is simulation result data storage means that stores the simulation result data produced by the apparatus.

In FIG. 22, the communication control interface unit 104 controls communication between the gene expression information analyzer 100 and a network (or a communication device such as a router). That is, the communication control interface unit 104 has the function of communicating data with other terminals via a communication line.

In FIG. 22, the input/output control interface unit 108 controls the input device 112 and the output device 114. A monitor (including a home television set) or a speaker can be used as the output device 114 (hereinafter the output device 114 may be described as a monitor). A keyboard, a mouse, a microphone and the like can be used as the input device 112. The monitor also realizes a pointing device function in cooperation with the mouse.

In FIG. 22, the control unit 102 has an internal memory for storing a control program such as an OS (Operating System), programs defining various processing procedures, and required data, and performs information processing for executing various kinds of processing by means of these programs. Functionally and conceptually, the control unit 102 comprises a background correction unit 102a, a bias correction unit 102b, a gene detection unit 102c and a simulation unit 102d.

Of these, the background correction unit 102a is background correction means that creates background-corrected luminance data by removing the background value from the measured luminance data of each spot, obtained by measuring under two conditions the fluorescence intensity indicating the expression level of the same gene.

The bias correction unit 102b is bias correction means that creates a fluorescence intensity scatter plot by taking the logarithms of the luminance data background-corrected by the background correction means on the X and Y axes, obtains the bias with respect to the fluorescence intensity equilibrium axis for each gene spot, and removes that bias from the luminance data, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis.

FIG. 23 is a block diagram showing an example of the configuration of the bias correction unit 102b, and conceptually shows only the parts of the configuration that relate to the present invention. As shown in FIG. 23, the bias correction unit 102b functionally and conceptually comprises a first principal component creation unit 102e, a coordinate rotation unit 102f, a bias determination unit 102g and a corrected plot generation unit 102h.

In FIG. 23, the first principal component creation unit 102e is first principal component creation means that performs principal component analysis using the logarithmic values of the high-expression gene population and obtains the slope and intercept of the asymptote that forms the first principal component.

The coordinate rotation unit 102f is coordinate rotation means that takes the angle between the asymptote obtained by the first principal component creation means and the X axis as θ, and calculates the coordinates obtained by rotating the coordinates of the low-expression gene population in the X-Y axis system clockwise by the angle θ.

The bias determination unit 102g is bias determination means that calculates the slope of the fluorescence intensity equilibrium axis using the coordinates of the low-expression gene population after the coordinate axes have been rotated by the coordinate rotation means, and determines, on the basis of the calculated slope, which of the luminance data of the two conditions contains more bias.

The corrected plot generation unit 102h is corrected plot generation means that subtracts the bias from the luminance data of the condition determined by the bias determination means to contain more bias, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis.

再び図 2 2に戻り、 遺伝子検出部 1 0 2 cは、 バイアス補正手段により構築され た新たな X— Y軸系の蛍光強度散布図に基づいて発現量が変動した変動遺伝子を検 出する遺伝子検出手段である。  Returning to Fig. 22 again, the gene detection unit 102c detects a fluctuating gene whose expression level fluctuates based on a new X-Y axis fluorescence intensity scatter diagram constructed by the bias correction means. It is a detecting means.

ここで、 図 2 4は、 遺伝子検出部 1 0 2 cの構成の一例を示すプロック図であり 該構成のうち本発明に関係する部分のみを概念的に示している。 図 2 4に示すよう に、 遺伝子検出部 1 0 2 cは、 機能概念的に、 ウィンドウ設定部 1◦ 2 i、 信頼限 界点決定部 1 0 2 j、 ウィンドウ移動部 1 0 2 k、 信頼境界線作成部 1 0 2 m、 変 動遺伝子抽出部 1 0 2 n、 遺伝子数入力部 1 0 2 p、 信頼限界値入力部 1 0 2 q、 および、 偏差値処理部 1 0 2 uを備えて構成されている。  Here, FIG. 24 is a block diagram showing an example of the configuration of the gene detection section 102c, and conceptually shows only a portion related to the present invention in the configuration. As shown in Fig. 24, the gene detection unit 102 c is functionally conceptualized as a window setting unit 1 2 i, a confidence limit point determination unit 102 j, a window moving unit 102 k, and a reliability Boundary line creation unit 102 m, variable gene extraction unit 102 n, gene number input unit 102 p, confidence limit input unit 102 q, and deviation value processing unit 102 u It is configured.

図 2 4において、 ウィンドウ設定部 1 0 2 iは、 蛍光強度平衡軸方向に予め定め た区間内のウィンドウを設定するウィンドウ設定手段である。  In FIG. 24, a window setting unit 102 i is a window setting unit that sets a window within a predetermined section in the direction of the fluorescence intensity equilibrium axis.

また、 信頼限界点決定部 1 0 2 jは、 ウィンドウ設定手段により設定された各ゥ ィンドウ内において信頼限界点を決定する信頼限界点決定手段である。  The confidence limit point determining unit 102 j is a confidence limit point determination unit that determines a confidence limit point in each window set by the window setting unit.

また、 ウィンドウ移動部 1 0 2 kは、 蛍光強度平衡軸方向に一定遺伝子ずつゥィ ンドウを移動するウィンドゥ移動手段である。  The window moving unit 102 k is a window moving means for moving a window by a certain gene in the direction of the fluorescence intensity equilibrium axis.

The confidence boundary line creation unit 102m is a confidence boundary line creation means that, for each window moved by the window moving means, obtains the confidence limit points by means of the confidence limit point determination means, and creates confidence boundary lines from the resulting plurality of confidence limit points.

The fluctuating gene extraction unit 102n is a fluctuating gene extraction means that extracts genes lying outside the confidence boundary lines created by the confidence boundary line creation means as fluctuating genes whose expression level has changed.

The gene count input unit 102p is a gene count input means that has the user enter the number of genes in a window.

The confidence limit value input unit 102q is a confidence limit value input means that has the user enter the confidence limit value.

The deviation value processing unit 102u is a deviation value calculation means that calculates the deviation value of each spot.

Returning again to FIG. 22, the simulation unit 102d is a simulation means that runs a plurality of simulations under predetermined conditions and outputs the simulation results for each condition.

FIG. 25 is a block diagram showing an example of the configuration of the simulation unit 102d, and conceptually shows only the parts of that configuration that relate to the present invention. As shown in FIG. 25, the simulation unit 102d functionally and conceptually comprises a simulation condition setting unit 102r, a simulation execution unit 102s, and a simulation result output unit 102t.

In FIG. 25, the simulation condition setting unit 102r is a simulation condition setting means that has the user enter simulation conditions containing information on at least one of the standard deviation of the gene distribution, the center of the distribution of fluctuating genes, the detection criterion for fluctuating genes, and the number of simulations.

The simulation execution unit 102s is a simulation execution means that, in accordance with the simulation conditions set by the simulation condition setting means, repeatedly generates data for the same gene group from the same distribution, runs the gene detection means, and performs the simulation of detecting expressed genes a plurality of times. It then calculates the false positive rate and false negative rate of the detection results, calculates the relationship among the number of experimental replicates, the simulation conditions, and the detection sensitivity and detection reliability, and creates a test statistics table for genes whose expression level changes.

The simulation result output unit 102t is a simulation result output means that outputs, for each simulation condition, the simulation results produced by the simulation execution means.

Details of the processing performed by each of these units are described later.

[Processing of the apparatus]

Next, an example of the processing of the apparatus according to the present embodiment, configured as described above, is described in detail below with reference to FIGS. 7 to 10 and FIGS. 15 to 28.

[Main processing of the apparatus]

First, the main processing of the apparatus is described with reference to FIG. 15. FIG. 15 is a flowchart showing an example of the main processing of the apparatus according to the present embodiment.

First, the gene expression information analysis apparatus 100, through the processing of the background correction unit 102a, executes the background correction processing described later with reference to FIG. 16 (step S-1). That is, the background correction unit 102a takes the measured luminance data of each spot, obtained by measuring under two conditions, with a DNA microarray, DNA chip or the like, the fluorescence intensity indicating the expression level of the same gene, and removes the background value from it to create background-corrected luminance data.

Next, the gene expression information analysis apparatus 100, through the processing of the bias correction unit 102b, executes the bias correction processing described later with reference to FIG. 17 (step S-2). That is, the bias correction unit 102b plots the logarithm (natural logarithm, base-2 logarithm, or the like) of the background-corrected luminance data on the X and Y axes to create a fluorescence intensity scatter plot, obtains for each gene spot the bias relative to the fluorescence intensity equilibrium axis, along which the two conditions show the same fluorescence intensity, and removes that bias from the luminance data. This constructs a new X-Y fluorescence intensity scatter plot whose two axes are the fluorescence intensity equilibrium axis and the expression-level fold-change axis.

Next, the gene expression information analysis apparatus 100, through the processing of the gene detection unit 102c, executes the moving-window gene detection processing described later with reference to FIGS. 18 and 20 (step S-3). That is, the gene detection unit 102c detects fluctuating genes whose expression level has changed on the basis of the newly constructed X-Y fluorescence intensity scatter plot.

Next, the gene expression information analysis apparatus 100, through the processing of the simulation unit 102d, executes the simulation processing described later with reference to FIGS. 19 and 21 and other figures (step S-4). That is, the simulation unit 102d runs a plurality of simulations under predetermined conditions and outputs the simulation results for each condition.

This completes the main processing of the apparatus.

[Background correction processing]

Next, the background correction processing is described in detail with reference to FIG. 16. FIG. 16 is a flowchart showing an example of the background correction processing of the apparatus according to the present embodiment.

First, the gene expression information analysis apparatus 100, through the processing of the background correction unit 102a, obtains an average or local background value from the luminance of the genes measured under the two conditions (step SA-1), removes this background value from the measured values, and takes the corrected results as group A and group B (step SA-2).
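As a minimal sketch of steps SA-1 and SA-2, the following Python fragment subtracts either the mean of the blank spots or a per-spot local background from the measured intensities. The function name `background_correct`, its parameters, and the sample numbers are illustrative assumptions and are not taken from the apparatus described here.

```python
from statistics import mean


def background_correct(measured, blank_spots=None, local_background=None):
    """Subtract a background value from each spot's measured intensity.

    measured         -- raw fluorescence intensities, one per spot
    blank_spots      -- intensities of blank spots (their mean is the
                        average background), or None
    local_background -- per-spot background measured around each spot,
                        or None
    Exactly one of blank_spots / local_background must be supplied.
    """
    if (blank_spots is None) == (local_background is None):
        raise ValueError("supply either blank_spots or local_background")
    if blank_spots is not None:
        average_bg = mean(blank_spots)          # average background
        return [m - average_bg for m in measured]
    if len(local_background) != len(measured):
        raise ValueError("one local background value per spot is required")
    return [m - b for m, b in zip(measured, local_background)]  # local background


# Hypothetical intensities for the two conditions; the results play the
# role of group A and group B.
group_a = background_correct([1200.0, 860.0, 15400.0], blank_spots=[95.0, 105.0])
group_b = background_correct([1100.0, 910.0, 7300.0],
                             local_background=[90.0, 110.0, 100.0])
```

Corrected values can become zero or negative for very weak spots; how such spots are treated before the logarithm is taken is not specified in this passage and is left open in the sketch.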

More specifically, the background correction unit 102a performs background correction by subtracting, from the measured fluorescence intensity of each spot, either the average background value of the blank spots or the background value of the region surrounding that spot. This completes the background correction processing.

[Bias correction processing]

Next, the bias correction processing is described in detail with reference to FIG. 17. FIG. 17 is a flowchart showing an example of the bias correction processing of the apparatus according to the present embodiment. First, the bias correction unit 102b, through the processing of the first principal component creation unit 102e, calculates the base-2 logarithm of group A and group B, and draws a scatter plot in an orthogonal axis system with log2 A and log2 B as the X and Y axes (step SB-1).

Next, the bias correction unit 102b, through the processing of the first principal component creation unit 102e, performs a principal component analysis based on the variance-covariance matrix, using the logarithmic values of the upper gene group ranked by the product AB (for example, the gene group up to the top 70%), and obtains the slope and intercept of the asymptote that forms the first principal component (step SB-2).
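The principal component step (step SB-2) can be sketched as follows, assuming the log2 values are already available as two lists. For two variables, the direction of the first principal axis of the variance-covariance matrix has a closed form, which this sketch uses. The function name, the `top_fraction` parameter, and the way the cut-off is rounded are assumptions made for illustration.

```python
import math


def first_principal_axis(log_a, log_b, top_fraction=0.7):
    """Return (slope, intercept, theta) of the first principal axis.

    Genes are ranked by log2 A + log2 B, which gives the same order as
    ranking by the product AB, and only the top fraction is used.
    theta is the angle between the axis and the log2 A axis, in radians.
    """
    order = sorted(range(len(log_a)),
                   key=lambda i: log_a[i] + log_b[i], reverse=True)
    keep = order[:max(2, round(top_fraction * len(order)))]
    xs = [log_a[i] for i in keep]
    ys = [log_b[i] for i in keep]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Entries of the 2 x 2 variance-covariance matrix.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Direction of the eigenvector with the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(theta)          # very large if the axis is near vertical
    intercept = my - slope * mx      # the axis passes through the centroid
    return slope, intercept, theta
```

Unlike an ordinary regression of log2 B on log2 A, the principal axis treats both conditions symmetrically, which suits a plot where neither condition is the reference.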

Next, the bias correction unit 102b, through the processing of the coordinate rotation unit 102f, takes θ as the angle between the obtained asymptote and the log2 A axis, and calculates the coordinates obtained by rotating clockwise by the angle θ the coordinates, in the log2 A / log2 B axis system, of the gene group ranked lowest by the product AB (for example, the genes in the bottom 10%) (step SB-3).

Next, the bias correction unit 102b, through the processing of the bias determination unit 102g, calculates the slope of the asymptote using the coordinates of the lower gene group by product AB after the coordinate axes have been rotated (step SB-4).

Next, the bias correction unit 102b, through the processing of the bias determination unit 102g, determines whether the slope of the asymptote is positive (step SB-5). If it is positive, the bias determination unit 102g determines that the data of A contain more bias. Accordingly, the bias correction unit 102b, through the processing of the bias determination unit 102g, takes the coordinates of the lower gene group by product AB (for example, the genes in the bottom 10%) in the log2 A / log2 B axis system and applies the least squares method, with the log2 B data as the independent variable and the log2 A data as the dependent variable, to obtain the value Ac of the intersection (Ac, 0) between the asymptote of the lower gene group and the log2 A axis (step SB-6).

Next, the bias correction unit 102b, through the processing of the corrected plot generation unit 102h, obtains the bias and subtracts it from the corresponding measurement data (step SB-7).

If, on the other hand, the slope of the asymptote is not positive in step SB-5, the bias determination unit 102g determines whether it is zero (step SB-8). If it is zero, the bias correction processing ends.

If the slope of the asymptote is not zero in step SB-8, the bias determination unit 102g determines that the data of B contain more bias. Accordingly, the bias correction unit 102b, through the processing of the corrected plot generation unit 102h, takes the coordinates of the lower gene group by product AB (for example, the genes in the bottom 10%) in the log2 A / log2 B axis system and applies the least squares method, with the log2 A data as the independent variable and the log2 B data as the dependent variable, to obtain the value Bc of the intersection (0, Bc) between the asymptote of the lower gene group and the log2 B axis (step SB-9), and then performs the processing of step SB-7 described above.

Next, the bias correction unit 102b, through the processing of the corrected plot generation unit 102h, uses the bias-subtracted data to construct an orthogonal axis system, either the log2(A - k) / log2 B axis system or the log2 A / log2(B - k) axis system (step SB-10). This completes the bias correction processing.
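Steps SB-3 to SB-10 can be sketched as follows. This is a hedged illustration, not the patented implementation: the function names are invented, ordinary least squares is assumed wherever the text only says that a slope is calculated, and the exact test for a zero slope is replaced by a small tolerance `eps`. This passage does not state how the bias k is derived from Ac or Bc, so `corrected_axes` takes k as an explicit argument.

```python
import math


def ols(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rotate_clockwise(xs, ys, theta):
    """Rotate points clockwise by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return ([x * c + y * s for x, y in zip(xs, ys)],
            [-x * s + y * c for x, y in zip(xs, ys)])


def judge_bias(log_a, log_b, theta, low_fraction=0.1, eps=1e-9):
    """Return ("A", Ac), ("B", Bc) or (None, None) for the low-AB genes."""
    order = sorted(range(len(log_a)), key=lambda i: log_a[i] + log_b[i])
    low = order[:max(2, round(low_fraction * len(order)))]
    xs = [log_a[i] for i in low]
    ys = [log_b[i] for i in low]
    rx, ry = rotate_clockwise(xs, ys, theta)      # step SB-3
    slope, _ = ols(rx, ry)                        # step SB-4
    if slope > eps:                               # step SB-5: A is biased
        _, a_c = ols(ys, xs)                      # log2 A at log2 B = 0
        return "A", a_c                           # step SB-6
    if slope < -eps:                              # step SB-8: B is biased
        _, b_c = ols(xs, ys)                      # log2 B at log2 A = 0
        return "B", b_c                           # step SB-9
    return None, None                             # slope ~ 0: no correction


def corrected_axes(a, b, which, k):
    """Steps SB-7 and SB-10: subtract bias k and take log2 again.

    Every corrected intensity must stay positive for log2 to be defined.
    """
    if which == "A":
        return [math.log2(x - k) for x in a], [math.log2(y) for y in b]
    return [math.log2(x) for x in a], [math.log2(y - k) for y in b]
```

The sign convention follows from the geometry: an additive bias in A puts a floor under log2 A at low intensity, so the low-intensity points fall away more steeply than the main axis and show a positive slope once that axis has been rotated onto the horizontal.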

[Gene detection processing]

Next, the gene detection processing is described in detail with reference to FIG. 18. FIG. 18 is a flowchart showing an example of the gene detection processing of the apparatus according to the present embodiment.

First, the gene detection unit 102c of the gene expression information analysis apparatus 100, through the processing of the window setting unit 102i, outputs to the output device 114 a gene extraction condition setting screen on which the user sets the number of genes in the window, described above with reference to FIG. 11, and the confidence level (Pe value) that serves as the confidence limit value (step SC-1).

FIG. 20 shows an example of the gene extraction condition setting screen output to the output device 114 by the processing of the window setting unit 102i. As shown in FIG. 20, the gene extraction condition setting screen comprises an input area MA-1 for the number of genes in the window, an input area MA-2 for the confidence level (Pe value) that serves as the confidence limit value, a setting completion button MA-3, and the like.

When the user, viewing the gene extraction condition setting screen shown in FIG. 20, uses the input device 112 to complete the entries in input areas MA-1 and MA-2 and then selects the setting completion button MA-3, the gene count input unit 102p and the confidence limit value input unit 102q use the information set on the screen to adjust the size of the window shown in FIG. 11 so that the number of genes in the window equals the set value.

Returning to FIG. 18, the gene detection unit 102c, through the processing of the confidence limit point determination unit 102j, starts from the leftmost end of the X axis and uses the Y-axis (fold-change) values of the points in the window to calculate the variance and the center. From these it obtains the confidence limit points, namely the boundary value y_limit+ for an increase in expression level and the boundary value y_limit- for a decrease, together with the centroid on the X axis (step SC-2).

Next, the gene detection unit 102c, through the processing of the window moving unit 102k, moves the window by a fixed number of genes in the direction of increasing fluorescence intensity along the X axis and, through the processing of the confidence limit point determination unit 102j, obtains for the new window the fold-change boundary values y_limit+ and y_limit- that serve as confidence limit points, together with the centroid on the X axis (step SC-3).

The gene detection unit 102c then repeats this processing until the window reaches the rightmost end of the X axis (step SC-4).

Next, the gene detection unit 102c, through the processing of the confidence boundary line creation unit 102m, connects the fold-change boundary points, which are the confidence limit points for expression change in all the windows, with a cubic spline curve, and thereby determines the increase boundary line and the decrease boundary line of the expression fold change, which are the expression fluctuation confidence curves (step SC-5).
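The moving-window steps SC-2 to SC-5 can be sketched as follows. Several points are assumptions rather than statements of the text: the limit points are taken as the window center plus or minus a normal quantile times the window standard deviation, the Pe value is read as a two-sided tail probability, and piecewise-linear interpolation stands in for the cubic spline to keep the sketch free of external libraries. All function and parameter names are hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, stdev


def window_limit_points(x, y, genes_per_window, step, p_e):
    """Slide a window holding a fixed number of genes along x.

    Returns one (x_centroid, y_limit_plus, y_limit_minus) per window.
    If (number of genes - window size) is not a multiple of step, the
    last few genes on the right are not covered by their own window.
    """
    z = NormalDist().inv_cdf(1.0 - p_e / 2.0)   # assumed two-sided quantile
    points = sorted(zip(x, y))                   # order genes along the X axis
    limits = []
    for start in range(0, len(points) - genes_per_window + 1, step):
        window = points[start:start + genes_per_window]
        ys = [p[1] for p in window]
        center, spread = mean(ys), stdev(ys)
        limits.append((mean(p[0] for p in window),
                       center + z * spread,
                       center - z * spread))
    return limits


def interpolate(xs, values, xq):
    """Piecewise-linear stand-in for the cubic spline, clamped at the ends."""
    if xq <= xs[0]:
        return values[0]
    if xq >= xs[-1]:
        return values[-1]
    j = bisect_left(xs, xq)                      # xs[j-1] < xq <= xs[j]
    t = (xq - xs[j - 1]) / (xs[j] - xs[j - 1])
    return values[j - 1] + t * (values[j] - values[j - 1])


def fluctuating_genes(x, y, limits):
    """Indices of genes lying outside the two boundary lines."""
    xs = [p[0] for p in limits]
    upper = [p[1] for p in limits]
    lower = [p[2] for p in limits]
    return [i for i, (xi, yi) in enumerate(zip(x, y))
            if yi > interpolate(xs, upper, xi) or yi < interpolate(xs, lower, xi)]
```

Because each window recomputes its own center and spread, the boundary lines widen where the fold-change values scatter more, typically at low fluorescence intensity, instead of applying one fixed fold-change cut-off to every gene.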

Next, the gene detection unit 102c, through the processing of the fluctuating gene extraction unit 102n, extracts the genes (fluctuating genes) that fall outside the region bounded by the increase boundary line and the decrease boundary line of the expression fold change, which are the expression fluctuation confidence curves. In this way, genes whose expression level has changed can be detected robustly by multiple testing (step SC-6).

The present invention may also improve gene detection efficiency by calculating the deviation value of each spot. The gene detection processing using deviation values in the apparatus according to the present embodiment is described in detail below with reference to FIGS. 26 and 27. FIG. 26 is a flowchart showing an example of the gene detection processing using deviation values in the apparatus according to the present embodiment.

First, after the user has set the number of genes in the window and the confidence level (Pe value) (step SE-1), the deviation value processing unit 102u sets, as described above, windows each containing a fixed number of genes along the fluorescence intensity equilibrium axis, and obtains the mean and standard deviation from the Y-axis values, which represent the rate of change in expression level, of all the genes in each window. The deviation value processing unit 102u then obtains the centroid (corresponding to the intermediate value of the fluorescence intensity) from the X-axis values of all those genes (step SE-2).

The deviation value processing unit 102u then moves the window along the X axis by a fixed number of genes at a time, and repeats the same processing up to the rightmost window (step SE-3).

Next, the deviation value processing unit 102u treats the resulting set of (intermediate fluorescence intensity, mean) pairs as a series of (x, y) data and interpolates it by smoothing (for example, by fitting a cubic spline curve) to give the smoothed line of the mean shown in FIG. 27. In the same way, the deviation value processing unit 102u interpolates the set of (intermediate fluorescence intensity, standard deviation) pairs by smoothing (for example, by fitting a cubic spline curve) to give the smoothed line of the standard deviation shown in FIG. 27 (step SE-4).

Next, for each gene, the deviation value processing unit 102u takes the gene's value on the fluorescence intensity equilibrium axis (its X value), reads off the corresponding Y value on the smoothed line of the mean and the corresponding Y value on the smoothed line of the standard deviation, and calculates the deviation value with the following formula (step SE-5).

deviation value = (y value of the gene - mean obtained from the smoothed line) / (standard deviation σ obtained from the smoothed line)

Using the deviation value of each spot calculated in this way in place of the fluctuation ratio (fold change) makes possible an analysis that is unaffected by differences in error between slides. Conventionally, physical errors in each microarray and human errors in reading each chip were not constant, which made comparisons between chips difficult. Using deviation values makes such comparisons easier.
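Steps SE-2 to SE-5 can be sketched as follows. The sketch assumes that the centroid of a window is the mean of its X values, and it uses piecewise-linear interpolation as a simple stand-in for the cubic-spline smoothing mentioned in the text. The function names and parameters are hypothetical.

```python
from bisect import bisect_left
from statistics import mean, stdev


def window_stats(x, y, genes_per_window, step):
    """Per-window (x centroid, mean of y, standard deviation of y)."""
    points = sorted(zip(x, y))                   # order genes along the X axis
    stats = []
    for start in range(0, len(points) - genes_per_window + 1, step):
        window = points[start:start + genes_per_window]
        ys = [p[1] for p in window]
        stats.append((mean(p[0] for p in window), mean(ys), stdev(ys)))
    return stats


def _interp(xs, values, xq):
    """Piecewise-linear stand-in for the smoothed line, clamped at the ends."""
    if xq <= xs[0]:
        return values[0]
    if xq >= xs[-1]:
        return values[-1]
    j = bisect_left(xs, xq)                      # xs[j-1] < xq <= xs[j]
    t = (xq - xs[j - 1]) / (xs[j] - xs[j - 1])
    return values[j - 1] + t * (values[j] - values[j - 1])


def deviation_values(x, y, stats):
    """(y - smoothed mean at x) / (smoothed standard deviation at x) per gene."""
    xs = [s[0] for s in stats]
    means = [s[1] for s in stats]
    sds = [s[2] for s in stats]
    return [(yi - _interp(xs, means, xi)) / _interp(xs, sds, xi)
            for xi, yi in zip(x, y)]
```

The result is a locally standardized score: a gene at low intensity, where the scatter is wide, needs a larger raw fold change to reach the same deviation value as a gene at high intensity.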

Multivariate analyses, typified by cluster analysis using hierarchical clustering (one-dimensional or two-dimensional), the K-means method, the self-organizing map method and the like, have conventionally been used to classify gene expression patterns and to extract co-expressed genes. A known example that uses the logarithm of the fluctuation ratio is M. B. Eisen, P. T. Spellman, P. O. Brown, D. Botstein (1998), "Cluster analysis and display of genome-wide expression patterns", Proceedings of the National Academy of Sciences, 95 (25): 14863-14868. A known example that uses the normalized fluctuation ratio is T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, E. S. Lander (1999), "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring", Science, 286: 531-537. Using the deviation value calculated by the present method in place of the logarithm of the fluctuation ratio or the normalized fluctuation ratio in such multivariate analyses makes possible an analysis that is not swayed by the way the influence of error differs with the magnitude of the expression level.

This completes the gene detection processing.

[Simulation processing]

Next, the simulation processing of the present invention is described in detail with reference to FIGS. 19 and 21. FIG. 19 is a flowchart showing an example of the simulation processing of the apparatus according to the present embodiment.

First, the simulation unit 102d of the gene expression information analysis apparatus 100, through the processing of the simulation condition setting unit 102r, outputs to the output device 114 a simulation condition setting screen on which the user sets the various condition parameters of the simulation (for example, the standard deviation (width) of the gene distribution, the center of the distribution of genes whose expression fluctuates, the detection criterion (number of detections / total number), and the number of simulations) (step SD-1).

FIG. 21 shows an example of the simulation condition setting screen output to the output device 114 by the processing of the simulation condition setting unit 102r. As shown in FIG. 21, the simulation condition setting screen comprises an input area MB-1 for the standard deviation of the gene distribution, an input area MB-2 for the center of the gene distribution, an input area MB-3 for the detection criterion, an input area MB-4 for the number of simulations, and a setting completion button MB-5.

The standard deviation of the gene distribution may be set, for example, by taking the distribution of genes whose expression does not change to be the standard normal distribution with standard deviation σ = 1 and center μ = 0, and choosing the width of the standard deviation σ in the range from 0.1 to 1.5. The center of the distribution of fluctuating genes may be set, for example, under the same conditions, by choosing the center μ in the range from 0.4 to 3. The detection criterion for fluctuating genes may be set, for example, as the ratio of the number of detections to the total number, such as 2/3, 2/4, 3/4, 3/6 or 4/6. The number of simulations may be set, for example, in the range from 3 to 10.

When the user, viewing the simulation condition setting screen, uses the input device 112 to complete the entries in input areas MB-1 to MB-4 and then selects the setting completion button MB-5, the simulation unit 102d, through the processing of the simulation execution unit 102s, repeatedly executes the background correction processing, bias correction processing and gene detection processing described above on the basis of the information set on the screen. It thereby simulates the mixture distribution of the gene group whose expression level changes (the fluctuating gene group), as extracted by the gene detection processing, and the gene group whose expression level did not change (the non-fluctuating gene group) (step SD-2).

Next, the simulation unit 102d, through the processing of the simulation result output unit 102t, outputs the data for the simulation result screens shown in FIGS. 7 to 10 to the output device 114 (step SD-3).

FIGS. 7 to 10 show examples of the results of calculating the type I detection error (false positives) by simulation. The mixture distribution depends on the parameters set as the six simulation conditions described above: the total number of genes, the proportion of genes whose expression fluctuates, the standard deviation (width) of the gene distribution, the center of the distribution of genes whose expression fluctuates, the detection criterion (number of detections / total number), and the confidence limit value (P-value) within each data set (window).

FIG. 7 is a graph of the calculated type I detection error when the center μ' of the gene group whose expression changes (the fluctuating gene group) is set to ±σ, the standard deviation is set to 1, and the detection criterion is detection in two out of three experiments (detection criterion = 2/3).

FIG. 8, by contrast, shows the case where the center μ' of the gene group whose expression changes (the fluctuating gene group) is set to ±σ, the standard deviation is set to 1, and expression is deemed to have changed if the gene is detected in three out of four experiments.

Comparing these two figures shows that α (the type I detection error) depends strongly on the Pe value used for detection within each window. The horizontal axis of each figure is the proportion of all genes accounted for by the population of genes whose expression changed.

That is, the distribution of genes whose expression does not change is generated as a standard normal distribution (standard deviation σ = 1, center μ = 0), while the distribution of genes whose expression changes is generated to the left or to the right of the standard normal distribution, each with a probability of 50%.

Note that the mixture distribution depends on six parameters in total: the number of all genes, the proportion of all genes whose expression changes, the standard deviation and the center of the distribution of genes whose expression changes, the detection criterion, and the confidence limit within each data set.
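The mixture-distribution simulation described above can be sketched roughly as follows. This is a simplified illustration, not the apparatus's actual procedure: it assumes each replicate yields one standardized value per gene and that a gene is flagged in a replicate when that value lies outside a two-sided normal cut-off corresponding to Pe, whereas the apparatus re-runs background correction, bias correction, and gene detection on each data set. The function name `simulate_detection` and all parameter names are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_detection(total_genes=2000, frac_variable=0.1, center=1.0,
                       sd_variable=1.0, pe=0.15, needed=2, replicates=3,
                       seed=0):
    """Estimate type I and type II error rates of a k-of-n detection rule.

    Assumption: non-variable genes follow N(0, 1); variable genes follow
    N(+center, sd_variable) or N(-center, sd_variable) with 50 % probability.
    """
    rng = random.Random(seed)
    # Two-sided cut-off: under N(0, 1), |z| exceeds it with probability pe.
    cutoff = NormalDist().inv_cdf(1.0 - pe / 2.0)
    n_variable = round(total_genes * frac_variable)
    n_null = total_genes - n_variable
    false_pos = 0
    false_neg = 0
    for gene in range(total_genes):
        is_variable = gene < n_variable
        if is_variable:
            # The direction of change is fixed per gene across replicates.
            mu = center if rng.random() < 0.5 else -center
            sd = sd_variable
        else:
            mu, sd = 0.0, 1.0
        hits = sum(abs(rng.gauss(mu, sd)) > cutoff for _ in range(replicates))
        detected = hits >= needed
        if detected and not is_variable:
            false_pos += 1
        elif not detected and is_variable:
            false_neg += 1
    alpha = false_pos / n_null if n_null else 0.0
    beta = false_neg / n_variable if n_variable else 0.0
    return alpha, beta
```

Calling `simulate_detection(pe=0.15, needed=2, replicates=3)` corresponds loosely to the 2/3 criterion, and `simulate_detection(pe=0.25, needed=3, replicates=4)` to the 3/4 criterion. Averaging over several seeds would mirror the averaging over repeated calculations mentioned in the text.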

Only the type I error α of the multiple test is shown, that is, the error in which a gene whose expression does not change is detected as a gene whose expression changes, and every result is the average of ten calculations made after fixing the parameters. With the center of the population of genes whose expression changes set to μ' = ±σ and its standard deviation set to 1, a comparison of (a) the detection criterion under which expression is regarded as changed if the gene is detected in two out of three runs (the case of FIG. 7) with (b) the detection criterion under which expression is regarded as changed if the gene is detected in three out of four runs (the case of FIG. 8) shows that α depends strongly on the Pe value used for detection within each window.

FIGS. 9 and 10 show the case where the standard deviation of the population of genes whose expression changes (the variable gene population) is set to 1. In FIG. 9, Pe = 0.15, and in FIG. 10, Pe = 0.25. Accordingly, to obtain 95% confidence, the confidence limit Pe within each data set should be set to 0.15 or less when the test criterion is two out of three runs, and to 0.25 or less when the test criterion is three out of four runs. The horizontal axis of each figure is the value obtained by multiplying the center of the population of genes whose expression changes by the standard deviation of the population of genes whose expression does not change. In the figures, "TNum" is the total number of genes, "dif-x%" is the proportion accounted for by the population of genes whose expression changes, and "2/3" and "3/4" are the detection criteria. This completes the simulation processing.
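As a rough cross-check of the two Pe settings above, the following sketch computes the probability that a single non-variable gene satisfies a "k out of n" criterion. It assumes the replicates are independent and that such a gene is flagged in each replicate with probability Pe. This is an assumption made for illustration only; the α plotted in the figures comes from simulation and may be defined differently. The function name `k_of_n_probability` is hypothetical.

```python
from math import comb


def k_of_n_probability(p, needed, replicates):
    """Binomial tail: probability of at least `needed` hits in `replicates`
    independent trials, each succeeding with probability `p`."""
    return sum(comb(replicates, j) * p ** j * (1 - p) ** (replicates - j)
               for j in range(needed, replicates + 1))


if __name__ == "__main__":
    print(k_of_n_probability(0.15, 2, 3))  # 2/3 criterion with Pe = 0.15
    print(k_of_n_probability(0.25, 3, 4))  # 3/4 criterion with Pe = 0.25
```

Under these assumptions the two settings give about 0.061 and 0.051 respectively, which is at least of the same order as the 95% confidence level discussed in the text.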

[Other embodiments]

Embodiments of the present invention have been described above. In addition to those embodiments, the present invention may be implemented in various other forms within the scope of the technical idea set out in the claims.

Of the processes described in the embodiments, all or part of those described as being performed automatically may be performed manually, and all or part of those described as being performed manually may be performed automatically by a known method.

In addition, the processing procedures, control procedures, specific names, information including parameters such as simulation conditions, and screen examples shown in the above description and drawings may be changed arbitrarily unless otherwise specified.

The simulation unit 102d may also obtain the confidence level (Pe value), the type I and type II detection errors, and the like described above by simulating a mixture distribution with another distribution such as a gamma distribution. The above embodiment described, as an example, the case where both the distribution of non-variable genes and the distribution of variable genes are normal distributions. However, the distribution of variable genes, for example, may be generated from a distribution other than the normal distribution (for example, a gamma distribution), and the present invention can be applied to gene populations following any distribution.

The bias determination process performed by the bias determination unit 102g of the apparatus is not limited to determining the magnitude of the bias after axis rotation. For example, as shown in FIG. 28, the magnitude of the bias may be determined before axis rotation by comparing the slope a of the high-expression asymptote with the slope b of the low-expression asymptote. Rotation of the coordinates is not a requirement of this process.

Regarding the gene expression information analysis apparatus 100, the illustrated components are functional and conceptual, and need not be physically configured as illustrated.

For example, all or any part of the processing functions of each unit of the gene expression information analysis apparatus 100, in particular the processing functions performed by the control unit 102, can be implemented by a CPU (Central Processing Unit) and a program interpreted and executed by that CPU, or can be implemented as hardware using wired logic. The program is recorded on a recording medium described later and is mechanically read by the gene expression information analysis apparatus 100 as needed.

The gene expression information analysis apparatus 100 may also be realized by connecting peripheral devices such as a printer, a monitor, and an image scanner to a computer (information processing apparatus) such as a known personal computer, workstation, or other information processing terminal, and by installing on that information processing apparatus software (including programs, data, and the like) that implements the method of the present invention.

Furthermore, the specific form of distribution and integration of the gene expression information analysis apparatus 100 is not limited to the illustrated one. All or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads and the like. For example, each database may be configured independently as a separate database device, and part of the processing may be implemented using CGI (Common Gateway Interface).

The program according to the present invention can also be stored on a computer-readable recording medium. Here, "recording medium" includes any "portable physical medium" such as a flexible disk, magneto-optical disk, ROM, EPROM, EEPROM, CD-ROM, MO, or DVD; any "fixed physical medium" such as a ROM, RAM, or HD built into various computer systems; and any "communication medium" that holds the program for a short time, such as a communication line or carrier wave used when the program is transmitted over a network such as a LAN, a WAN, or the Internet.

A "program" is a data processing method written in any language or notation, regardless of format, such as source code or binary code. A "program" is not necessarily limited to a single unit. It also includes one that is distributed across multiple modules or libraries, and one that achieves its function in cooperation with a separate program such as an OS (Operating System). Known configurations and procedures can be used for the specific configuration for reading the recording medium in each device described in the embodiments, for the reading procedure, and for the installation procedure after reading.

Although the case where the gene expression information analysis apparatus 100 performs processing in a stand-alone form has been described as an example, the apparatus may instead be configured to perform processing in response to a request sent over a network from a client terminal housed separately from the gene expression information analysis apparatus 100, and to return the processing result to that client terminal.

Here, the network has the function of interconnecting the gene expression information analysis apparatus 100 and an external client device. It may include, for example, any of the following: the Internet, an intranet, a LAN (wired or wireless), a VAN, a PC communication network, a public telephone network (analog or digital), a leased line network (analog or digital), a CATV network, a mobile circuit-switched or mobile packet-switched network such as IMT2000, GSM, or PDC/PDC-P, a radio paging network, a local wireless network such as Bluetooth, a PHS network, or a satellite communication network such as CS, BS, or ISDB. That is, the apparatus can send and receive various data over any network, whether wired or wireless.

As described in detail above, the present invention can provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can create background-corrected luminance data by removing the background value from the measured luminance data of each spot, where the fluorescence intensity indicating the expression level of the same gene has been measured under two conditions using a DNA microarray, a DNA chip, or the like.

Further, according to the present invention, a fluorescence intensity scatter plot is created by taking the logarithm (natural logarithm, base-2 logarithm, or the like) of the background-corrected luminance data on the X and Y axes. For each gene spot, the bias with respect to the fluorescence intensity equilibrium axis, which indicates equal fluorescence intensity, is obtained, and that bias is removed from the luminance data to construct a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can determine which fluorescence component contains more bias, remove that bias, and then construct a new orthogonal axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis.
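The following sketch illustrates, for a single spot, background subtraction, the logarithmic transform, and a change of axes to an equilibrium axis and a fold-change axis. It is a minimal illustration under stated assumptions: it uses a fixed 45-degree rotation, which is appropriate only when the data carry no bias, whereas the apparatus estimates the bias from the data. The function name, the argument names, and the choice to return `None` for spots at or below background are all hypothetical.

```python
import math


def to_equilibrium_fold_axes(signal_a, background_a, signal_b, background_b):
    """Map one spot to (equilibrium, fold) coordinates.

    Returns None when either background-corrected value is not positive,
    because the logarithm is then undefined (an illustrative choice).
    """
    net_a = signal_a - background_a
    net_b = signal_b - background_b
    if net_a <= 0 or net_b <= 0:
        return None
    x = math.log2(net_a)  # condition A on the original X axis
    y = math.log2(net_b)  # condition B on the original Y axis
    # Rotate by 45 degrees: the line y = x becomes the new horizontal axis.
    equilibrium = (x + y) / math.sqrt(2.0)
    fold = (y - x) / math.sqrt(2.0)
    return equilibrium, fold
```

A spot with equal background-corrected intensities under both conditions lands on the equilibrium axis (fold coordinate 0), and the fold coordinate grows with the log ratio of the two intensities.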

Further, according to the present invention, variable genes whose expression level has changed are detected based on the constructed fluorescence intensity scatter plot in the new X-Y axis system. Compared with conventional gene detection methods, the present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can accurately detect genes whose expression level has changed without being affected by the measuring device, errors between samples, differences in fluorescence labeling efficiency, and the like.

Further, according to the present invention, control gene samples for quality control in a DNA concentration dilution series (for example, an external-gene λ DNA sample, or a housekeeping gene sample such as a ribosomal gene whose expression level hardly changes) are measured at the same time as the target gene samples. Control genes are removed one at a time, starting from the gene with the smallest product of fluorescence intensity data; calibration curves of gene expression level and DNA amount are each created from the data of all remaining control gene samples; and the correlation coefficient of the data is calculated. Threshold 1 is the product of the fluorescence intensity data under the two conditions for the control sample at which the correlation coefficient, calculated in this order, first satisfies the criterion for a strong correlation (for example, 0.8 or more). The population of all gene samples whose product of fluorescence intensity data under the two conditions exceeds threshold 1 is taken as the high-expression gene population. Threshold 2 (where threshold 2 < threshold 1) is the corresponding product for the control sample at which the correlation coefficient, calculated in this order, first satisfies the criterion for a weak correlation (for example, 0.5 or more). The population of all gene samples whose product of fluorescence intensity data under the two conditions is below threshold 2 is taken as the low-expression gene population. Principal component analysis is performed using the logarithms of the fluorescence intensities of the high-expression gene population to obtain the slope and intercept of the asymptote that is the first principal component, and the angle between that asymptote and the X axis is taken as θ. The coordinates of the low-expression gene population in the X-Y axis system are rotated to the right by the angle θ, and the slope of the fluorescence intensity equilibrium axis is calculated using the coordinates of the low-expression gene population after the rotation. Based on the calculated slope (for example, positive, negative, or zero), it is determined which of the luminance data for the two conditions contains more bias, and the bias is subtracted from the luminance data of the condition determined to contain more bias (for example, by rotating the coordinates of a gene population having a constant bias). A fluorescence intensity scatter plot is thereby constructed in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression fold-change axis. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can efficiently remove the bias in the measured values and create a fluorescence intensity scatter plot that clearly expresses the nature of the data.

Further, according to the present invention, the principal component analysis is performed using a variance-covariance matrix. Unlike the principal component analysis based on a correlation matrix that has conventionally been used in expressed gene analysis, it requires no normalization. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can perform principal component analysis efficiently.
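The principal component step and the rotation by θ can be sketched as follows for two-dimensional data. This is a minimal illustration using the closed-form orientation of the major axis of a 2×2 variance-covariance matrix, not necessarily the computation the apparatus performs. The function names are hypothetical, and a vertical principal axis (infinite slope) is not handled.

```python
import math


def first_principal_axis(points):
    """Slope, intercept, and angle (radians) of the first principal component
    of 2-D points, computed from the variance-covariance matrix."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
    syy = sum((y - mean_y) ** 2 for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Orientation of the major axis of the covariance ellipse.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    slope = math.tan(theta)
    intercept = mean_y - slope * mean_x  # the axis passes through the mean
    return slope, intercept, theta


def rotate_clockwise(points, theta):
    """Rotate points to the right (clockwise) by theta about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c + y * s, -x * s + y * c) for x, y in points]
```

In this sketch, `first_principal_axis` would be applied to the log intensities of the high-expression population, and `rotate_clockwise` to the low-expression population, so that the principal axis of the former becomes horizontal in the rotated coordinates.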

Further, according to the present invention, windows are set within predetermined intervals, and within each window a confidence limit point is determined using at least one of the mean, the standard deviation, a P value (for example, the 95% value), the centroid, and the like of the luminance data of the genes. The window is then moved along the fluorescence intensity equilibrium axis by a fixed number of genes at a time, a confidence limit point is obtained for each window position, a confidence boundary line is created based on the plurality of confidence limit points thus obtained, and genes located outside the created confidence boundary line are extracted as variable genes whose expression level has changed. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can extract expressed genes with high stability, reproducibility, and reliability.
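The sliding-window procedure can be sketched as follows. This is an illustrative reading, not the apparatus's exact method: it assumes each spot is given as an (equilibrium, fold) pair, that the limit points in a window are the mean ± z·(standard deviation) of the fold coordinates, and that z is taken from a two-sided normal cut-off for the chosen Pe. The function and parameter names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def confidence_limit_points(spots, window_size=50, step=10, pe=0.05):
    """Return (center, lower, upper) limit points for a sliding window.

    spots: iterable of (equilibrium, fold) pairs. The window holds
    `window_size` genes and advances by `step` genes along the
    equilibrium axis.
    """
    ordered = sorted(spots)  # sort by equilibrium-axis coordinate
    z = NormalDist().inv_cdf(1.0 - pe / 2.0)
    points = []
    for start in range(0, len(ordered) - window_size + 1, step):
        window = ordered[start:start + window_size]
        center = mean(e for e, _ in window)
        folds = [f for _, f in window]
        m, s = mean(folds), stdev(folds)
        points.append((center, m - z * s, m + z * s))
    return points
```

Genes whose fold coordinate falls above the upper or below the lower boundary built from these points would then be candidates for variable genes.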

Further, the present invention can provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium with which the threshold for the fold change in expression level is determined according to the error, even for experimental data with different error ranges.

Further, according to the present invention, confidence limit points are determined using the t-distribution, based on a test statistic table for replicate data obtained by simulation. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can obtain confidence limit points more accurately and efficiently than conventional methods.

Further, according to the present invention, the confidence boundary line is created by smoothing, that is, by fitting a spline curve based on the plurality of confidence limit points. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can efficiently interpolate between the confidence limit points to create a confidence curve.
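One way to turn a set of confidence limit points into a smooth curve is a natural cubic spline through the points, sketched below. The text does not specify which kind of spline is used, so the interpolating natural spline here is an assumption made for illustration, and the function name is hypothetical. The knots `xs` must be strictly increasing.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through
    the knots (xs[i], ys[i]); xs must be strictly increasing."""
    n = len(xs) - 1  # number of intervals
    if n < 1:
        raise ValueError("need at least two knots")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives; the natural ends stay 0
    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1]
                      - (ys[k + 1] - ys[k]) / h[k]) for k in range(n - 1)]
        for k in range(1, n - 1):  # forward elimination
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        for k in range(n - 2, -1, -1):  # back substitution
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        # Pick the interval containing x (clamped to the first or last one).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a = xs[i + 1] - x
        b = x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate
```

The returned function passes exactly through every limit point. If the points are noisy, a smoothing spline or fewer knots may be a better fit to the "smoothing" the text describes.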

Further, according to the present invention, for the region of high fluorescence intensity, the confidence limit line is created using a horizontal extension, parallel to the X axis, of the confidence limit point obtained in the last window (the rightmost window). The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can create an appropriate confidence limit line even when the slope is so small that it cannot be determined which way the line converges.

Further, according to the present invention, for the region of low fluorescence intensity, the extrapolation of an asymptote obtained by the least squares method from the confidence limit points of, for example, the first several tens of windows is used as the confidence limit line. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can accurately detect even spots of genes with low fluorescence intensity.
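The two extension rules above can be sketched as follows: a least-squares line through the first few limit points is extrapolated to the left, and the last limit point is extended horizontally to the right. The function name, the `low_count` parameter, and the choice to return `None` inside the measured range (where the fitted boundary itself would be used) are hypothetical.

```python
def extend_limit_line(points, low_count=30):
    """Build a function giving the confidence limit outside the range
    covered by the windows.

    points: (x, y) confidence limit points sorted by x, with at least two
    distinct x values among the first `low_count` points.
    """
    head = points[:low_count]
    n = len(head)
    mean_x = sum(x for x, _ in head) / n
    mean_y = sum(y for _, y in head) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in head)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in head)
    slope = sxy / sxx  # ordinary least-squares slope
    intercept = mean_y - slope * mean_x
    first_x = points[0][0]
    last_x, last_y = points[-1]

    def limit(x):
        if x < first_x:
            return slope * x + intercept  # extrapolated least-squares line
        if x > last_x:
            return last_y  # horizontal extension of the last limit point
        return None  # inside the measured range: use the boundary itself

    return limit
```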

Further, according to the present invention, the user is prompted to input the number of genes per window, and each window is set within an interval containing that number of genes. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that allow the user to vary the number of genes for each experiment.

Further, according to the present invention, the user is prompted to input a confidence limit value, and the confidence limit point within each window is determined based on the input value. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that allow the user to vary the confidence limit value for each experiment and to keep the error of each experiment within an appropriate range.

Further, according to the present invention, the user is prompted to input simulation conditions including information on at least one of the following: the shape of the distribution of non-variable genes (for example, the standard deviation of the distribution; for instance, when the distribution of genes whose expression does not change is taken as a standard normal distribution with standard deviation σ = 1 and center μ = 0, the width of the standard deviation σ is set in the range 0.1 to 1.5); the shape of the distribution of the variable genes (for example, the center; for instance, under the same condition, the width of the center μ is set in the range 0.4 to 3); the detection criterion for the variable genes (for example, the ratio of detections to the total number, set to 2/3, 2/4, 3/4, 3/6, 4/6, or the like); the number of experiment repetitions; and the number of simulation runs (for example, set in the range 3 to 10). In accordance with the set simulation conditions, data are generated repeatedly from the same distribution for the same gene group, gene detection is executed, and the simulation for detecting expressed genes is run multiple times. The false positive rate and the false negative rate of the detection results are calculated; the relationship among the number of experiment repetitions, the simulation conditions, and the detection sensitivity and detection reliability is calculated; a test statistic table for genes whose expression level changes is created; and the simulation result is output for each simulation condition. By combining the simulation results obtained under various conditions, the detection power and detection reliability of each combination can be known. That is, the present invention can provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium with which variable genes can be detected with the expected reliability or detection power by repeating a control experiment under the same conditions, detecting variable genes in each of the resulting data sets, and selecting only the genes detected at least a predetermined number of times.

Further, according to the present invention, the error in which a gene whose expression level does not change is detected as a variable gene (type I error) and the error in which a variable gene is detected as a gene whose expression does not change (type II error) are calculated and compared. The detection power and reliability with which the above method detects variable genes can thereby be ascertained from the simulation data. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium with which the combination of the number of experiment repetitions, the detection criterion for variable genes, and the confidence limit point can be set so as to obtain the expected detection power and reliability for actual experimental data.

Further, according to the present invention, it becomes possible to predict, based on the test statistic table for replicate data obtained by simulation, how many experiments are needed to obtain accurate experimental data. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that can significantly improve experimental efficiency.

Further, according to the present invention, the deviation value of each spot is calculated. By using the deviation value thus calculated in place of the change ratio (fold change), the present invention can provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that enable analysis unaffected by differences in error between slides.

Furthermore, according to the present invention, the deviation value calculated by the apparatus can be used in multivariate analysis such as cluster analysis in place of the logarithm of the change ratio or the normalized change ratio. The present invention can therefore provide a gene expression information analysis apparatus, a gene expression information analysis method, a program, and a recording medium that enable analysis unaffected by differences in the influence of error arising from the magnitude of the expression level.

Industrial applicability

As described above, the gene expression information analysis device, gene expression information analysis method, program, and recording medium according to the present invention are extremely useful in the field of bioinformatics, in which measurement data from DNA microarrays, DNA chips, and the like are analyzed.

The present invention can be widely practiced in many industrial fields, particularly pharmaceuticals, foods, cosmetics, medical care, and gene expression analysis, and is extremely useful.

Claims

1. A gene expression information analysis device comprising:
background correction means for creating background-corrected luminance data by removing a background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene was measured under two conditions;
bias correction means for creating a fluorescence intensity scatter plot by plotting the logarithms of the luminance data background-corrected by the background correction means on X and Y axes, determining, for the spot of each gene, a bias relative to a fluorescence intensity equilibrium axis, and removing the bias from the luminance data, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and an expression level fold-change axis; and
gene detection means for detecting fluctuating genes whose expression level has changed, based on the fluorescence intensity scatter plot of the new X-Y axis system constructed by the bias correction means.

2. The gene expression information analysis device according to claim 1, wherein the bias correction means further comprises:
first principal component creation means for executing principal component analysis using the logarithmic values of a gene population with high expression levels, and determining the slope and intercept of an asymptote serving as the first principal component;
coordinate rotation means for, taking θ as the angle between the asymptote determined by the first principal component creation means and the X axis, calculating coordinates obtained by rotating the coordinates in the X-Y axis system of a gene population with low expression levels to the right by the angle θ;
bias determination means for calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the gene population with low expression levels after the coordinate axis rotation by the coordinate rotation means, and determining, based on the calculated slope, which of the luminance data of the two conditions contains more of the bias; and
corrected plot generation means for subtracting the bias from the luminance data of the condition determined by the bias determination means to contain more of the bias, thereby constructing the fluorescence intensity scatter plot of the new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level fold-change axis.

3. The gene expression information analysis device according to claim 2, wherein the principal component analysis is performed using a variance-covariance matrix.

4. The gene expression information analysis device according to any one of claims 1 to 3, wherein the gene detection means further comprises:
window setting means for setting a window within a predetermined interval in the direction of the fluorescence intensity equilibrium axis;
confidence limit point determination means for determining a confidence limit point within each window set by the window setting means;
window moving means for moving the window by a fixed number of genes at a time in the direction of the fluorescence intensity equilibrium axis;
confidence boundary line creation means for obtaining, for each window moved by the window moving means, a confidence limit point by the confidence limit point determination means, and creating a confidence boundary line based on the plurality of confidence limit points obtained; and
fluctuating gene extraction means for extracting genes located outside the confidence boundary line created by the confidence boundary line creation means as fluctuating genes whose expression level has changed.

5. The gene expression information analysis device according to claim 4, wherein the confidence limit point determination means determines the confidence limit point using a t-distribution, based on a test statistic table for replicated data obtained by simulation.

6. The gene expression information analysis device according to claim 4 or 5, wherein the confidence boundary line creation means creates the confidence boundary line by performing smoothing through creation of a spline curve based on the plurality of confidence limit points.

7. The gene expression information analysis device according to any one of claims 4 to 6, wherein, for a region of high fluorescence intensity, the confidence boundary line creation means creates the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.

8. The gene expression information analysis device according to any one of claims 4 to 7, wherein, for a region of low fluorescence intensity, the confidence boundary line creation means uses, as the confidence limit line, an extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in the windows.

9. The gene expression information analysis device according to any one of claims 4 to 8, further comprising gene number input means for having a user input the number of genes in a window,
wherein the window setting means sets the window within the interval that contains the number of genes input by the gene number input means.

10. The gene expression information analysis device according to any one of claims 4 to 9, further comprising confidence limit value input means for having a user input a confidence limit value,
wherein the confidence limit point determination means determines the confidence limit point within the window based on the confidence limit value input by the confidence limit value input means.

11. The gene expression information analysis device according to any one of claims 1 to 10, further comprising:
simulation condition setting means for having a user input simulation conditions including information on at least one of the shape of the distribution of the non-fluctuating genes, the shape of the distribution of the fluctuating genes, the detection criterion for the fluctuating genes, the number of experimental repetitions, and the number of simulations;
simulation execution means for, in accordance with the simulation conditions set in the simulation condition setting step, repeatedly generating data from the same distribution for the same gene group, executing the gene detection means, executing a plurality of times a simulation that detects the expressed genes, calculating the false positive rate and false negative rate of the results of the detection means, calculating the relationship among the number of experimental repetitions, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistic table for genes whose expression level changes; and
simulation result output means for outputting, for each of the simulation conditions, the simulation results of the simulation execution means.

12. The gene expression information analysis device according to any one of claims 1 to 11, wherein the gene detection means further comprises deviation value calculation means for calculating the deviation value of each spot.

13. A gene expression information analysis method comprising:
a background correction step of creating background-corrected luminance data by removing a background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene was measured under two conditions;
a bias correction step of creating a fluorescence intensity scatter plot by plotting the logarithms of the luminance data background-corrected in the background correction step on X and Y axes, determining, for the spot of each gene, a bias relative to a fluorescence intensity equilibrium axis, and removing the bias from the luminance data, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and an expression level fold-change axis; and
a gene detection step of detecting fluctuating genes whose expression level has changed, based on the fluorescence intensity scatter plot of the new X-Y axis system constructed in the bias correction step.

14. The gene expression information analysis method according to claim 13, wherein the bias correction step further includes:
a first principal component creation step of executing principal component analysis using the logarithmic values of a gene population with high expression levels, and determining the slope and intercept of an asymptote serving as the first principal component;
a coordinate rotation step of, taking θ as the angle between the asymptote determined in the first principal component creation step and the X axis, calculating coordinates obtained by rotating the coordinates in the X-Y axis system of a gene population with low expression levels to the right by the angle θ;
a bias determination step of calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the gene population with low expression levels after the coordinate axis rotation in the coordinate rotation step, and determining, based on the calculated slope, which of the luminance data of the two conditions contains more of the bias; and
a corrected plot generation step of subtracting the bias from the luminance data of the condition determined in the bias determination step to contain more of the bias, thereby constructing the fluorescence intensity scatter plot of the new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level fold-change axis.

15. The gene expression information analysis method according to claim 14, wherein the principal component analysis is performed using a variance-covariance matrix.

16. The gene expression information analysis method according to any one of claims 13 to 15, wherein the gene detection step further includes:
a window setting step of setting a window within a predetermined interval in the direction of the fluorescence intensity equilibrium axis;
a confidence limit point determination step of determining a confidence limit point within each window set in the window setting step;
a window moving step of moving the window by a fixed number of genes at a time in the direction of the fluorescence intensity equilibrium axis;
a confidence boundary line creation step of obtaining, for each window moved in the window moving step, a confidence limit point by the confidence limit point determination step, and creating a confidence boundary line based on the plurality of confidence limit points obtained; and
a fluctuating gene extraction step of extracting genes located outside the confidence boundary line created in the confidence boundary line creation step as fluctuating genes whose expression level has changed.

17. The gene expression information analysis method according to claim 16, wherein the confidence limit point determination step determines the confidence limit point using a t-distribution, based on a test statistic table for replicated data obtained by simulation.

18. The gene expression information analysis method according to claim 16 or 17, wherein the confidence boundary line creation step creates the confidence boundary line by performing smoothing through creation of a spline curve based on the plurality of confidence limit points.

19. The gene expression information analysis method according to any one of claims 16 to 18, wherein, for a region of high fluorescence intensity, the confidence boundary line creation step creates the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.

20. The gene expression information analysis method according to any one of claims 16 to 19, wherein, for a region of low fluorescence intensity, the confidence boundary line creation step uses, as the confidence limit line, an extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in the windows.

21. The gene expression information analysis method according to any one of claims 16 to 20, further including a gene number input step of having a user input the number of genes in a window,
wherein the window setting step sets the window within the interval that contains the number of genes input in the gene number input step.

22. The gene expression information analysis method according to any one of claims 16 to 21, further including a confidence limit value input step of having a user input a confidence limit value,
wherein the confidence limit point determination step determines the confidence limit point within the window based on the confidence limit value input in the confidence limit value input step.

23. The gene expression information analysis method according to any one of claims 13 to 22, further including:
a simulation condition setting step of having a user input simulation conditions including information on at least one of the shape of the distribution of the non-fluctuating genes, the shape of the distribution of the fluctuating genes, the detection criterion for the fluctuating genes, the number of experimental repetitions, and the number of simulations;
a simulation execution step of, in accordance with the simulation conditions set in the simulation condition setting step, repeatedly generating data from the same distribution for the same gene group, executing the gene detection means, executing a plurality of times a simulation that detects the expressed genes, calculating the false positive rate and false negative rate of the results of the detection means, calculating the relationship among the number of experimental repetitions, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistic table for genes whose expression level changes; and
a simulation result output step of outputting, for each of the simulation conditions, the simulation results of the simulation execution step.

24. The gene expression information analysis method according to any one of claims 13 to 23, wherein the gene detection step further includes a deviation value calculation step of calculating the deviation value of each spot.

25. A program that causes a computer to execute a gene expression information analysis method comprising:
a background correction step of creating background-corrected luminance data by removing a background value from the measured luminance data of each spot at which the fluorescence intensity indicating the expression level of the same gene was measured under two conditions;
a bias correction step of creating a fluorescence intensity scatter plot by plotting the logarithms of the luminance data background-corrected in the background correction step on X and Y axes, determining, for the spot of each gene, a bias relative to a fluorescence intensity equilibrium axis, and removing the bias from the luminance data, thereby constructing a fluorescence intensity scatter plot in a new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and an expression level fold-change axis; and
a gene detection step of detecting fluctuating genes whose expression level has changed, based on the fluorescence intensity scatter plot of the new X-Y axis system constructed in the bias correction step.

26. The program according to claim 25, wherein the bias correction step further includes:
a first principal component creation step of executing principal component analysis using the logarithmic values of a gene population with high expression levels, and determining the slope and intercept of an asymptote serving as the first principal component;
a coordinate rotation step of, taking θ as the angle between the asymptote determined in the first principal component creation step and the X axis, calculating coordinates obtained by rotating the coordinates in the X-Y axis system of a gene population with low expression levels to the right by the angle θ;
a bias determination step of calculating the slope of the fluorescence intensity equilibrium axis using the coordinates of the gene population with low expression levels after the coordinate axis rotation in the coordinate rotation step, and determining, based on the calculated slope, which of the luminance data of the two conditions contains more of the bias; and
a corrected plot generation step of subtracting the bias from the luminance data of the condition determined in the bias determination step to contain more of the bias, thereby constructing the fluorescence intensity scatter plot of the new X-Y axis system whose two axes are the fluorescence intensity equilibrium axis and the expression level fold-change axis.

27. The program according to claim 26, wherein the principal component analysis is performed using a variance-covariance matrix.

28. The program according to any one of claims 25 to 27, wherein the gene detection step further includes:
a window setting step of setting a window within a predetermined interval in the direction of the fluorescence intensity equilibrium axis;
a confidence limit point determination step of determining a confidence limit point within each window set in the window setting step;
a window moving step of moving the window by a fixed number of genes at a time in the direction of the fluorescence intensity equilibrium axis;
a confidence boundary line creation step of obtaining, for each window moved in the window moving step, a confidence limit point by the confidence limit point determination step, and creating a confidence boundary line based on the plurality of confidence limit points obtained; and
a fluctuating gene extraction step of extracting genes located outside the confidence boundary line created in the confidence boundary line creation step as fluctuating genes whose expression level has changed.

29. The program according to claim 28, wherein the confidence limit point determination step determines the confidence limit point using a t-distribution, based on a test statistic table for replicated data obtained by simulation.

30. The program according to claim 28 or 29, wherein the confidence boundary line creation step creates the confidence boundary line by performing smoothing through creation of a spline curve based on the plurality of confidence limit points.

31. The program according to any one of claims 28 to 30, wherein, for a region of high fluorescence intensity, the confidence boundary line creation step creates the confidence limit line using a horizontal extension of the confidence limit point obtained in the last window.

32. The program according to any one of claims 28 to 31, wherein, for a region of low fluorescence intensity, the confidence boundary line creation step uses, as the confidence limit line, an extrapolation of an asymptote obtained by the least squares method from the confidence limit points obtained in the windows.

33. The program according to any one of claims 28 to 32, further including a gene number input step of having a user input the number of genes in a window,
wherein the window setting step sets the window within the interval that contains the number of genes input in the gene number input step.

34. The program according to any one of claims 28 to 33, further including a confidence limit value input step of having a user input a confidence limit value,
wherein the confidence limit point determination step determines the confidence limit point within the window based on the confidence limit value input in the confidence limit value input step.

35. The program according to any one of claims 25 to 34, further including:
a simulation condition setting step of having a user input simulation conditions including information on at least one of the shape of the distribution of the non-fluctuating genes, the shape of the distribution of the fluctuating genes, the detection criterion for the fluctuating genes, the number of experimental repetitions, and the number of simulations;
a simulation execution step of, in accordance with the simulation conditions set in the simulation condition setting step, repeatedly generating data from the same distribution for the same gene group, executing the gene detection means, executing a plurality of times a simulation that detects the expressed genes, calculating the false positive rate and false negative rate of the results of the detection means, calculating the relationship among the number of experimental repetitions, the simulation conditions, and the detection sensitivity and detection reliability, and creating a test statistic table for genes whose expression level changes; and
a simulation result output step of outputting, for each of the simulation conditions, the simulation results of the simulation execution step.

36. The program according to any one of claims 25 to 35, wherein the gene detection step further includes a deviation value calculation step of calculating the deviation value of each spot.

37. A computer-readable recording medium on which the program according to any one of claims 25 to 36 is recorded.
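The bias-correction rotation and the moving-window confidence limits recited in the claims (principal component axis, rotation by θ, window-by-window limit points) can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the claimed implementation: the function names `principal_axis`, `rotate_to_axis`, and `window_limits` are hypothetical, the principal axis is fitted to and applied to the same set of points rather than to separate high-expression and low-expression populations, the rotation is taken about the centroid, and a fixed multiplier `k` stands in for the t-distribution quantile.

```python
import math
import statistics


def principal_axis(points):
    """Return (theta, centroid) of the first principal component in 2-D.

    theta is the angle between the principal axis and the X axis, from the
    closed-form eigenvector of the 2x2 variance-covariance matrix.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return theta, (mx, my)


def rotate_to_axis(points, theta, center):
    """Rotate points clockwise by theta about center (an assumption here),
    so that the principal axis becomes the new horizontal axis."""
    cx, cy = center
    c, s = math.cos(theta), math.sin(theta)
    return [((x - cx) * c + (y - cy) * s, -(x - cx) * s + (y - cy) * c)
            for x, y in points]


def window_limits(rotated, window_size, step, k=2.0):
    """Slide a window of `window_size` genes along the new X axis.

    Returns one (window centre, upper limit, lower limit) tuple per window,
    with limits mean +/- k * sd of the new Y values. k is a placeholder for
    a t-distribution quantile; window_size must be at least 2.
    """
    ordered = sorted(rotated)
    limits = []
    for start in range(0, len(ordered) - window_size + 1, step):
        window = ordered[start:start + window_size]
        ratios = [y for _, y in window]
        mean = statistics.fmean(ratios)
        sd = statistics.stdev(ratios)
        centre = statistics.fmean([x for x, _ in window])
        limits.append((centre, mean + k * sd, mean - k * sd))
    return limits


if __name__ == "__main__":
    # Hypothetical log-intensity pairs lying near the line y = x + 1.
    pts = [(float(i), i + 1 + (0.1 if i % 2 else -0.1)) for i in range(20)]
    theta, center = principal_axis(pts)
    rotated = rotate_to_axis(pts, theta, center)
    for centre, upper, lower in window_limits(rotated, window_size=5, step=5):
        print(f"centre={centre:.2f}  upper={upper:.3f}  lower={lower:.3f}")
```

Spots whose rotated Y value falls outside the limits of their window would be the candidates for fluctuating genes; smoothing the per-window limit points into a boundary line is left out of this sketch.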
PCT/JP2003/001900 2002-02-21 2003-02-21 Gene expression data analyzer, and method, program and recording medium for gene expression data analysis Ceased WO2003070938A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2003569831A JP4438414B2 (en) 2002-02-21 2003-02-21 Gene expression information analyzing apparatus, gene expression information analyzing method, program, and recording medium
AU2003211240A AU2003211240A1 (en) 2002-02-21 2003-02-21 Gene expression data analyzer, and method, program and recording medium for gene expression data analysis

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2002/45407 2002-02-21
JP2002045407 2002-02-21

Publications (1)

Publication Number Publication Date
WO2003070938A1 true WO2003070938A1 (en) 2003-08-28

Family

ID=27750582

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2003/001900 Ceased WO2003070938A1 (en) 2002-02-21 2003-02-21 Gene expression data analyzer, and method, program and recording medium for gene expression data analysis

Country Status (3)

Country Link
JP (1) JP4438414B2 (en)
AU (1) AU2003211240A1 (en)
WO (1) WO2003070938A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006300799A (en) * 2005-04-22 2006-11-02 Sony Corp Biological information processing apparatus and method, program, and recording medium
EP2270747A1 (en) 2009-06-30 2011-01-05 Sysmex Corporation Methods for detecting nucleic acid with microarray and program product for use in microarray data analysis
JP2017146238A (en) * 2016-02-18 2017-08-24 株式会社東芝 Biomarker search device, method for searching biomarker and program

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000039339A1 (en) * 1998-12-28 2000-07-06 Rosetta Inpharmatics, Inc. Statistical combining of cell expression profiles
WO2002001477A1 (en) * 2000-06-28 2002-01-03 Center For Advanced Science And Technology Incubation, Ltd. Method for processing gene expression data, and processing programs

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHEN Y. ET AL.: "Ratio-based decisions and the quantitative analysis of cDNA microarray images", J. BIOMED. OPT., vol. 2, no. 4, 1997, pages 364 - 374, XP002900577 *
KOOPERBERG C. ET AL.: "Improved background correction for spotted DNA microarrays", J. COMPUT. BIOL., vol. 9, no. 1, February 2002 (2002-02-01), pages 55 - 66, XP002970213 *
NEWTON M.A. ET AL.: "On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data", J. COMPUT. BIOL., vol. 8, no. 1, 2001, pages 37 - 52, XP002970214 *
TRAN P.H. ET AL.: "Microarray optimizations: increasing spot accuracy and automated identification of true microarray signals", NUCLEIC ACIDS RES., vol. 30, no. 12, June 2002 (2002-06-01), pages E54, XP002970215 *

Also Published As

Publication number Publication date
AU2003211240A1 (en) 2003-09-09
JPWO2003070938A1 (en) 2005-06-09
JP4438414B2 (en) 2010-03-24

Similar Documents

Publication Publication Date Title
Tang et al. ANPELA: analysis and performance assessment of the label-free quantification workflow for metaproteomic studies
Finotello et al. Next-generation computational tools for interrogating cancer immunity
Caicedo et al. Cell Painting predicts impact of lung cancer variants
Meyer et al. AMBER: assessment of metagenome BinnERs
Schmid et al. On the use of Harrell’s C for clinical risk prediction via random survival forests
Franks et al. Feature specific quantile normalization enables cross-platform classification of molecular subtypes using gene expression data
Coombes et al. Quality control and peak finding for proteomics data collected from nipple aspirate fluid by surface-enhanced laser desorption and ionization
Karpievitch et al. Normalization and missing value imputation for label-free LC-MS analysis
White et al. Bioinformatics strategies for proteomic profiling
US20250069714A1 (en) Systems and methods for determining microsatellite instability
Ren et al. Gaussian graphical model-based heterogeneity analysis via penalized fusion
AU2023273836A1 (en) Systems and methods for spatial alignment of cellular specimens and applications thereof
Cantini et al. Assessing reproducibility of matrix factorization methods in independent transcriptomes
Martin et al. Rank Difference Analysis of Microarrays (RDAM), a novel approach to statistical analysis of microarray expression profiling data
Balagurunathan et al. Noise factor analysis for cDNA microarrays
Tolliver et al. Robust unmixing of tumor states in array comparative genomic hybridization data
RU2744604C2 (en) Method for non-invasive prenatal diagnostics of fetal chromosomal aneuploidy from maternal blood
WO2003070938A1 (en) Gene expression data analyzer, and method, program and recording medium for gene expression data analysis
Yin et al. MIXnorm: normalizing RNA-seq data from formalin-fixed paraffin-embedded samples
De Hertogh et al. A benchmark for statistical microarray data analysis that preserves actual biological and technical variance
Glick et al. Panoramic: a package for constructing eukaryotic pan‐genomes
York et al. Network analysis of reverse phase protein expression data: Characterizing protein signatures in acute myeloid leukemia cytogenetic categories t (8; 21) and inv (16)
JP2005038256A (en) Effective factor information selection apparatus, effective factor information selection method, program, and recording medium
EP3180724B1 (en) Methods and systems for detecting minor variants in a sample of genetic material
Wu et al. SAS-cam: a program for automatic processing and analysis of small-angle scattering data

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 2003569831

Country of ref document: JP

122 EP: PCT application non-entry in European phase

Ref document number: 03705378

Country of ref document: EP

Kind code of ref document: A1