
WO2005083128A2 - Methods for predicting cancer outcome and gene signatures for use therein - Google Patents

Methods for predicting cancer outcome and gene signatures for use therein

Info

Publication number
WO2005083128A2
Authority
WO
WIPO (PCT)
Prior art keywords
genes
classifier
accession
origin
cancer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2005/006201
Other languages
French (fr)
Other versions
WO2005083128A3 (en)
Inventor
Timothy J. Yeatman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of South Florida
University of South Florida St Petersburg
Original Assignee
University of South Florida
University of South Florida St Petersburg
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of South Florida and University of South Florida St Petersburg
Publication of WO2005083128A2
Publication of WO2005083128A3
Legal status: Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/10 Signal processing, e.g. from mass spectrometry [MS] or from PCR
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/106 Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/118 Prognosis of disease development
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • new techniques provide the ability to monitor the expression level of a large number of transcripts at any one time (see, for example, Schena et al., “Quantitative monitoring of gene expression patterns with a complementary DNA micro-array,” Science, 270:467-470 (1995); Lockhart et al., “Expression monitoring by hybridization to high-density oligonucleotide arrays,” Nature Biotechnology, 14:1675-1680 (1996); and Blanchard et al., “Sequence to array: Probing the genome's secrets,” Nature Biotechnology, 14:1649 (1996)).
  • DNA microarrays are also commonly referred to as biochips, DNA chips, gene arrays, gene chips, and genome chips.
  • DNA microarrays exploit a phenomenon known as base-pairing or hybridization.
  • genetic samples are arranged in an orderly manner (typically in a rectangular grid) on a substrate.
  • microarrays include an array of oligonucleotide or peptide nucleic acid (PNA) probes, and the array is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array on the chip is exposed to labeled sample DNA, hybridized, and the identity and abundance of complementary sequences are determined.
  • PNA: peptide nucleic acid
  • Such arrays of expression levels include metadata describing characteristics of the people whose genetic material is sampled and additional metadata which identifies specific genes whose expression levels are represented in such arrays.
  • microarrays are already being used for a number of beneficial purposes including, for example, identifying biomarkers of cancer (Welsh, JB et al., "Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum,” PNAS, 100(6):3410-3415 (March 2003)), creating gene expression-based classifications of cancers (Alizadeh, AA et al., "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Nature, 403:503-11 (2000); and Garber, ME et al., “Diversity of gene expression in adenocarcinoma of the lung," Proc Natl Acad Sci USA, 98:13784-9 (2001)), and in drug discovery (Marton, MJ et al., "Drug target validation and identification of secondary drug target effects using Microarrays," Nat Med, 4(11):1293-301 (1998); and Gray, NS et al., "
  • SAM has been used for a variety of purposes, including identifying potential drugs that would be effective in treating various conditions associated with specific gene expressions (Bunney WE, et al., "Microarray technology: a review of new strategies to discover candidate vulnerability genes in psychiatric disorders," Am J Psychiatry, 160(4):657-66 (Apr. 2003)).
  • SVM: Support Vector Machine
  • SVMs utilize the technique of "kernels" to automatically realize a non-linear mapping to a feature space (Furey, T.S. et al, "Support vector machine classification and validation of cancer tissue samples using microarray expression data,” Bioinformatics, 16(10):906-914 (2000)).
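As a rough illustration of the kernel idea, the sketch below computes two common kernels between expression vectors. The function names, the `gamma` parameter, and the choice of a Gaussian (RBF) kernel are assumptions made for illustration; the text does not say which non-linear kernel, if any, was used.

```python
import math

def linear_kernel(x, y):
    """Plain dot product: the kernel of a linear SVM."""
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2).

    Evaluating this function is equivalent to taking a dot product in an
    implicit non-linear feature space, without building that space.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

An SVM only ever needs kernel values between pairs of samples, which is why swapping the kernel changes the decision boundary without changing the training procedure.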
  • colon cancer is a deadly disease afflicting nearly 130,000 new patients yearly in the United States.
  • Colon cancer is the only cancer that occurs with approximately equal frequency in men and women. There are several potential risk factors for the development of colon and/or rectal cancer.
  • the modified Dukes' staging system for colon cancer discriminates four stages (A, B, C, and D), primarily based on clinicopathologic features such as the presence or absence of lymph node or distant metastases. Specifically, colonic tumors are classified by four Dukes' stages: A, tumor within the intestinal mucosa; B, tumor into muscularis mucosa; C, metastasis to lymph nodes and D, metastasis to other tissues.
  • A: tumor within the intestinal mucosa
  • B: tumor into muscularis mucosa
  • C: metastasis to lymph nodes
  • D: metastasis to other tissues
  • the Dukes' staging system based on the pathological spread of disease through the bowel wall, to lymph nodes, and to distant organ sites such as the liver, has remained the most popular.
  • the Dukes' staging system remains the standard for predicting colon cancer prognosis, and is the primary means for directing adjuvant therapy.
  • the Dukes' staging system has only been found useful in predicting the behaviour of a population of patients, rather than an individual. For this reason, any patient with a Dukes A, B, or C lesion would be predicted to be alive at 36 months while a patient staged as Dukes D would be predicted to be dead.
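The population-level rule described above can be written as a one-line baseline predictor. This is a minimal sketch; the function name and the string encoding of stages are assumptions for illustration.

```python
def dukes_predicts_alive_at_36_months(stage):
    """Baseline rule from the text: stages A, B, C -> alive at 36 months; D -> dead."""
    stage = stage.strip().upper()
    if stage not in {"A", "B", "C", "D"}:
        raise ValueError(f"unknown Dukes' stage: {stage!r}")
    return stage != "D"
```

Because the rule depends only on stage, every patient within a stage receives the same prediction, which is the limitation the text points out.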
  • application of this staging system results in the potential over-treatment or under-treatment of a significant number of patients. Further, Dukes' staging can only be applied after complete surgical resection rather than after a pre-surgical biopsy.
  • Microarray technology has permitted development of multi-organ cancer classifiers (Giordano, T.J. et al., "Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles," Am J Pathol, 159:1231-8 (2001); Ramaswamy, S. et al., “Multiclass cancer diagnosis using tumor gene expression signatures,” Proc Natl Acad Sci USA, 98:15149-54 (2001); and Su, A.I. et al., "Molecular classification of human carcinomas by use of gene expression signatures," Cancer Res, 61:7388-93 (2001)), identification of tumor subclasses (Dyrskjot, L.
  • Vasselli, JR et al., "Predicting survival in patients with metastatic kidney cancer by gene-expression profiling in the primary tumor," Proc Natl Acad Sci USA, 100:6958-63 (2003); and Takahashi, M. et al., "Gene expression profiling of clear cell renal cell carcinoma: gene identification and prognostic classification,” Proc Natl Acad Sci USA, 98:9754-9 (2001)) in many types of cancer. Classification of patient prognosis by microarray analysis has promise in predicting the long-term outcome of any one individual based on the gene expression profile of the tumor at diagnosis.
  • the present invention provides systems and methods for predicting outcomes in patients diagnosed with cancer. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer.
  • the present invention provides a gene expression profile based classifier that provides a means for accurately predicting colon cancer outcome.
  • genes are classified according to degree of correlation with a clinical outcome for a cancer of interest (such as colon cancer). These genes are used to establish a set of reference gene expression levels (also referred to herein as a "classifier"). Biological information regarding the patient is received and used to extrapolate intracellular gene expression. The intracellular gene expression levels are compared to those in the classifier to predict clinical outcome.
  • a method in which the specific gene signatures for colon cancer are identified.
  • tumor specimens from patients with known outcomes are collected and frozen.
  • the outcomes are linked to a specific core set of genes that are weighted in importance by (1) selecting genes of interest by applying microarray analysis; (2) producing a classifier using support vector machines (SVM); and (3) cross-validating the genes of interest and the classifier by comparing them against an independent set of test data.
  • SVM: support vector machines
  • SAM: significance analysis of microarrays
  • Genome-wide microarray analyses can produce large datasets that can be pattern-matched to clinicopathologic parameters such as patient outcomes and prognosis.
  • the subject invention identifies gene expression signatures that would predict colon cancer outcome more accurately than the well-accepted Dukes' staging system.
  • a group of colon cancer patients was examined to develop a survival classifier, which was subsequently validated using an entirely independent test set of data derived on a different microarray platform at a different performance site.
  • the classifier of the subject invention was ultimately based on a core set of genes selected for their correlation to survival. A number of the genes in the core set demonstrated intrinsic biological significance for colon cancer progression. With the ability to predict cancer outcomes/prognosis using the subject invention, appropriate treatment protocols can be selected for patients.
  • patients assessed using the subject invention and identified to have poor outcomes may be treated more aggressively or with specific agents (i.e., anti-sense agents, RNA inhibition agents, small molecule inhibitors of the cancer activity, gene therapy, etc.).
  • an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial.
  • Figure 1A is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when correlated with prognosis/patient survival.
  • Figure 1B is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when grouped by Dukes' stage B and C.
  • Figure 2A graphically illustrates a Kaplan-Meier survival curve based on gene expression profiling in accordance with the present invention.
  • Figure 2B graphically illustrates a Kaplan-Meier survival curve based on Dukes' staging.
  • Figures 3A-3C illustrate survival curves for molecular classifiers in accordance with the subject invention.
  • DETAILED DISCLOSURE OF THE INVENTION: The present invention provides systems and methods for predicting cancer prognosis and outcomes. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer. In a specific embodiment, the present invention provides a gene expression profile based classifier for predicting cancer outcomes/prognosis. Both microarray analysis and binary classification are used to create the classifier of the invention.
  • the subject invention provides methods for predicting patient outcomes comprising: identifying genes that correlate with a clinical outcome for a cancer of interest (such as colon cancer); establishing a set of reference gene expression levels (also referred to herein as a "classifier") for said identified genes; receiving biological information regarding the patient; using the biological information to extrapolate intracellular gene expression; and comparing intracellular gene expression levels to those in the classifier to predict clinical outcome.
  • Biological information of the invention includes, but is not limited to, clinical samples of bodily fluids or tissues; DNA profile information; and RNA profile information. Methods for preparing clinical samples for gene expression analysis are well known in the art, and can be carried out using commercially available kits.
  • the subject invention provides methods for predicting colon cancer patient outcomes using a SAM-selected set of genes derived from a genome-wide analysis of gene expression. Patients with good and bad prognoses are first clustered into groups, which suggests that outcome-rich information is likely present in the gene expression dataset. Subsequently, a supervised SVM analysis identifies a core set of genes that appears in a majority (i.e., 50% or greater, including for example, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%) of the cross validation folds and accurately predicts colon cancer survival. Preferably, a core set of genes that appears in 75% of the cross validation folds is identified by an SVM to be used in predicting colon cancer survival.
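The "core set" idea can be sketched as a frequency count over cross-validation folds. The function name `core_gene_set`, the gene identifiers, and the reading of "appears in 75%" as "at least 75% of folds" are assumptions for illustration.

```python
from collections import Counter

def core_gene_set(fold_selections, min_fraction=0.75):
    """Return genes selected in at least `min_fraction` of the folds.

    fold_selections: one iterable of gene identifiers per cross-validation fold.
    """
    if not fold_selections:
        return []
    counts = Counter()
    for genes in fold_selections:
        counts.update(set(genes))  # count each gene at most once per fold
    n_folds = len(fold_selections)
    return sorted(g for g, c in counts.items() if c / n_folds >= min_fraction)
```

For example, with four folds a gene must be selected in at least three of them to enter the core set at the 75% threshold.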
  • a gene core set is derived from a cDNA microarray that includes both named and unnamed genes.
  • the resultant gene set is highly accurate in predicting cancer survival when compared with Dukes staging data from the same patients.
  • a normalized and scaled oligonucleotide-based cancer database is evaluated against a completely independent set of test data derived from a different microarray platform. Accordingly, the subject invention provides a system for predicting clinical outcome in a patient diagnosed with cancer, wherein the system is useful in offering support/advice in making treatment decisions.
  • the system comprises (1) a data storage device for collecting data (i.e., gene data); and (2) a computing means for receiving and analyzing data to accurately determine genes associated with poor or good patient prognosis.
  • a graphical user interface can be included with the systems of the invention to display clinical data as well as enable user-interaction.
  • the system of the invention further includes an intelligence system that can use the analyzed clinical data to classify gene samples and offer support/advice for making clinical decisions (i.e., to interpret predicted clinical outcome and provide appropriate treatment).
  • An intelligence system of the subject invention can include, but is not limited to, artificial neural networks, fuzzy logic, evolutionary computation, knowledge-based systems, and artificial intelligence.
  • the computing means is preferably a digital signal processor, which can automatically and accurately analyze gene data and determine those genes that strongly correlate to clinical outcome.
  • the system of the subject invention is stationary.
  • the system of the invention can be used within a healthcare setting (i.e., hospital, physician's office).
  • the term "patient” refers to humans as well as non-human animals including, and not limited to, mammals, birds, reptiles, amphibians, and fish.
  • Preferred non-human animals include mammals (i.e., mouse, rat, rabbit, monkey, dog, cat, primate, pig).
  • a patient may also include transgenic animals.
  • a patient may be a laboratory animal raised by humans in a controlled environment other than its natural habitat.
  • cancer refers to a malignant tumor (i.e., colon or prostate cancer) or growth of cells (i.e., leukaemia). Cancers tend to be less differentiated than benign tumors, grow more rapidly, show infiltration, invasion, and destruction, and may metastasize.
  • Cancers include, but are not limited to, colon and rectal cancers, fibrosarcoma, myxosarcoma, angiosarcoma, leukaemia, squamous cell carcinoma, basal cell carcinoma, malignant melanoma, renal cell carcinoma, and hepatocellular carcinoma.
  • a "marker gene,” as used herein, refers to any gene or gene product (i.e., protein, peptide, mRNA) that indicates a particular clinicopathological state (i.e., carcinoma, normal dysplasia and outcomes) or indicates a particular cell type, tissue type, or origin. The expression or lack of expression of a marker gene may indicate a particular physiological and/or diseased state of a patient, organ, tissue, or cell.
  • the expression or lack of expression may be determined using standard techniques such as RT-PCR, sequencing, immunochemistry, gene chip analysis, etc.
  • the level of expression of a marker gene is quantifiable.
  • “polynucleotide” or “oligonucleotide,” as used herein, refers to a polymer of nucleotides. Typically, a polynucleotide comprises at least three nucleotides.
  • the polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (i.e., 2-aminoadenosine, 2-thio-thymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), chemically modified bases, biologically
  • tumor refers to an abnormal growth of cells.
  • the growth of the cells of a tumor typically exceeds the growth of normal tissue and tends to be uncoordinated.
  • the tumor may be benign (i.e., lipoma, fibroma, myxoma, lymphangioma, meningioma, nevus, adenoma, leiomyoma, mature teratoma, etc.) or malignant (i.e., malignant melanoma, ovarian cancer, carcinoma in situ, carcinoma, adenocarcinoma, liposarcoma, mesothelioma, squamous cell carcinoma, basal cell carcinoma, colon cancer, lung cancer, etc.).
  • Bodily fluid refers to a mixture of molecules obtained from a patient. Bodily fluids include, but are not limited to, exhaled breath, whole blood, blood plasma, urine, semen, saliva, lymph fluid, meningeal fluid, amniotic fluid, glandular fluid, sputum, feces, sweat, mucus, and cerebrospinal fluid. Bodily fluid also includes experimentally separated fractions of all of the preceding solutions or mixtures containing homogenized solid material, such as feces, tissues, and biopsy samples.
  • Computing Means: Correlating genes to clinical outcomes in accordance with the subject invention can be performed using software on a computing means.
  • the computing means can also be responsible for maintenance of acquired data as well as the maintenance of the classifier system itself.
  • the computing means can also detect and act upon user input via user interface means known to the skilled artisan (i.e., keyboard, interactive graphical monitors) for entering data to the computing system.
  • the computing means further comprises means for storing and means for outputting processed data.
  • the computing means includes any digital instrumentation capable of processing data input from the user. Such digital instrumentation, as understood by the skilled artisan, can process communicated data by applying algorithm and filter operations of the subject invention.
  • the digital instrumentation is a microprocessor, a personal desktop computer, a laptop, and/or a portable palm device.
  • the computing means can be general purpose or application specific.
  • the subject invention can be practiced in a variety of situations.
  • the computing means can directly or remotely connect to a central office or health care center.
  • the subject invention is practiced directly in an office or hospital.
  • the subject invention is practiced in a remote setting, for example, personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, wherein the patient is located some distance from the physician.
  • the computing means is a custom, portable design and can be carried or attached to the health care provider in a manner similar to other portable electronic devices such as a portable radio or computer.
  • the computing means used in accordance with the subject invention can contain at least one user-interface device including, but not limited to, a keyboard, stylus, microphone, mouse, speaker, monitor, and printer. Additional user-interface devices contemplated herein include touch screens, strip recorders, joysticks, and rollerballs.
  • the computing means comprises a central processing unit (CPU) having sufficient processing power to perform algorithm operations in accordance with the subject invention.
  • CPU: central processing unit
  • the algorithm operations can be embodied in the form of computer processor usable media, such as floppy diskettes, CD-ROMS, zip drives, nonvolatile memory, or any other computer-readable storage medium, wherein the computer program code is loaded into and executed by the computing means.
  • the operational algorithms of the subject invention can be programmed directly onto the CPU using any appropriate programming language, preferably using the C programming language.
  • the computing means comprises a memory capacity sufficiently large to perform algorithm operations in accordance with the subject invention.
  • the memory capacity of the invention can support loading computer program code via a computer-readable storage medium, wherein the program contains the source code to perform the operational algorithms of the subject invention.
  • the memory capacity can support directly programming the CPU to perform the operational algorithms of the subject invention.
  • a standard bus configuration can transmit data between the CPU, memory, ports and any communication devices.
  • the memory capacity of the computing means can be expanded with additional hardware and by saving data directly onto external media including, without limitation, floppy diskettes, zip drives, non-volatile memory and CD-ROMs.
  • the computing means can also include the necessary software and hardware to receive, route and transfer data to a remote location.
  • the patient is hospitalized, and clinical data generated by a computing means is transmitted to a central location, for example, a monitoring station or to a specialized physician located in a different locale.
  • the patient is in remote communication with the health care provider.
  • patients can be located at personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, and by using the classifier system of the invention, still provide clinical data to the health care provider.
  • mobile stations such as ambulances, and mobile clinics, can monitor patient health by using a portable computing means of the subject invention when transporting and/or treating a patient.
  • security measures such as encryption software and firewalls, can be employed.
  • clinical data can be transmitted as unprocessed or "raw" signal(s) and/or as processed signal(s).
  • transmitting raw signals allows any software upgrades to occur at the remote location where a computing means is located.
  • both historical clinical data and real-time clinical data can be transmitted.
  • Communication devices such as wireless interfaces, cable modems, satellite links, microwave relays, and traditional telephonic modems can transfer clinical data from a computing means to a healthcare provider via a network.
  • Networks available for transmission of clinical data include, but are not limited to, local area networks, intranets and the open internet.
  • a browser interface for example, NETSCAPE NAVIGATOR or INTERNET EXPLORER, can be incorporated into communications software to view the transmitted data.
  • a browser or network interface is incorporated into the processing device to allow the user to view the processed data in a graphical user interface device, for example, a monitor.
  • the results of algorithm operations of the subject invention can be displayed in the form of interactive graphics.
  • genes can be selected that most closely correlate with selected survival times. Permutation analysis can then be used to estimate the false discovery rate (FDR).
  • FDR: false discovery rate
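A permutation-based FDR estimate can be sketched as follows. This is a simplified stand-in, not the SAM procedure itself: the per-gene score here is a plain absolute difference of class means, and the function names, threshold, and number of permutations are assumptions for illustration.

```python
import random
from statistics import mean

def gene_scores(expr, labels):
    """Absolute difference in mean expression between the two classes, per gene.

    expr: one row per gene, one column per sample; labels: 0/1 per sample.
    """
    scores = []
    for row in expr:
        ones = [v for v, lab in zip(row, labels) if lab == 1]
        zeros = [v for v, lab in zip(row, labels) if lab == 0]
        scores.append(abs(mean(ones) - mean(zeros)))
    return scores

def estimate_fdr(expr, labels, threshold, n_perm=200, seed=0):
    """Estimated FDR = (mean number of genes called under shuffled labels)
    divided by (number of genes called with the real labels)."""
    rng = random.Random(seed)
    observed = sum(s >= threshold for s in gene_scores(expr, labels))
    if observed == 0:
        return 0.0
    shuffled = list(labels)
    false_calls = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between genes and outcome
        false_calls.append(sum(s >= threshold for s in gene_scores(expr, shuffled)))
    return min(1.0, mean(false_calls) / observed)
```

Raising the threshold calls fewer genes and typically lowers the estimated FDR, which is the trade-off tuned when gene selection is set to satisfy a target FDR.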
  • the resultant mean-centered gene expression vectors can then be clustered and visualized using known computer software (i.e., Cluster 3.0 and Java TreeView 1.03, both of which are provided by de Hoon MJL, et al., "Open Source Clustering Software,” Bioinformatics 2003, in press).
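Mean-centering, the preprocessing step mentioned above, subtracts each gene's average expression so that clustering reflects relative changes across samples. A minimal sketch follows; the function names and the row-per-gene layout are assumptions.

```python
def mean_center(vector):
    """Subtract the vector's mean from every element."""
    m = sum(vector) / len(vector)
    return [v - m for v in vector]

def mean_center_genes(expr):
    """Mean-center each gene (row) across samples (columns)."""
    return [mean_center(row) for row in expr]
```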
  • a gene classifier can be constructed to predict a set time of outcome among a set number of patients using microarray data produced on a cDNA platform.
  • the classifier of the subject invention is produced on a computing means using SAM two-class gene selection and a support vector machine classification.
  • the SAM procedure is empirically set to select enough genes to satisfy a set FDR. Such selected genes can then be used in a linear support vector machine to classify the samples as having poor or good prognosis.
  • Leave-one-out cross-validation (LOOCV) operation can also be utilized to construct a classifier (i.e., neural network-based classifier) as well as to estimate the prediction accuracy of the classifier of the subject invention.
  • LOOCV: Leave-one-out cross-validation
  • the classification process includes both gene selection and SVM classification creation; therefore, both steps can be performed on each training set after the test example is removed.
  • samples can be classified as having "good” or “poor” prognosis based on survival for a certain set amount of time.
  • "good” or “poor” prognosis is based on more or less than 36 months, respectively.
  • the subject invention provides a means for ranking the genes selected. The number of times a particular gene is chosen can be an indicator of the usefulness of that gene for general classification and may imply biological significance.
  • the classifier of the subject invention is prepared by (1) SAM gene selection using a t-test and (2) classification using a neural network.
  • the classifier is prepared after a test sample is left out (from the LOOCV) to avoid bias from the gene selection step. Since the classification problem is a binary decision, a t-test was used for gene selection.
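The point about avoiding selection bias can be made concrete: gene selection is repeated inside every leave-one-out fold, using only the training samples. In this sketch a nearest-centroid rule stands in for the SVM or neural network named in the text, and all function names, the Welch form of the t-statistic, and the parameter `k` are assumptions for illustration. Each class needs at least three samples so that the variance is defined after one is held out.

```python
import math
from statistics import mean, variance

def t_statistic(a, b):
    """Welch's t-statistic; returns 0.0 if both groups have zero variance (a simplification)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return 0.0 if se == 0 else (mean(a) - mean(b)) / se

def select_genes(samples, labels, k):
    """Indices of the k genes with the largest |t| between classes 1 and 0."""
    scores = []
    for g in range(len(samples[0])):
        good = [s[g] for s, lab in zip(samples, labels) if lab == 1]
        poor = [s[g] for s, lab in zip(samples, labels) if lab == 0]
        scores.append((abs(t_statistic(good, poor)), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]

def centroid_predict(train, labels, genes, test_sample):
    """Stand-in classifier: assign the class whose centroid is nearest."""
    def sq_dist_to_centroid(cls):
        rows = [s for s, lab in zip(train, labels) if lab == cls]
        centroid = [mean([r[g] for r in rows]) for g in genes]
        return sum((test_sample[g] - c) ** 2 for g, c in zip(genes, centroid))
    return 1 if sq_dist_to_centroid(1) < sq_dist_to_centroid(0) else 0

def loocv_predictions(samples, labels, k=2):
    """Leave one sample out, select genes and fit on the rest, predict the held-out one."""
    predictions = []
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, k)  # never sees sample i
        predictions.append(centroid_predict(train, train_labels, genes, samples[i]))
    return predictions
```

Selecting genes once on the full dataset and then cross-validating only the classifier would let information from the held-out sample leak into the model, inflating the accuracy estimate.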
  • a feed-forward back-propagation neural network system see Rumelhart, D.E. and J.L.
  • the classifier can split the samples into various groups (i.e., two groups: those predicted as good or poor prognosis). Classifier accuracy can be reported to the user both as overall accuracy and as specificity/sensitivity.
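The three reported figures can be computed from paired true and predicted labels. A minimal sketch, assuming "poor" prognosis is treated as the positive class (the text does not say which class is positive) and using a hypothetical function name:

```python
def classifier_metrics(true_labels, predicted_labels, positive="poor"):
    """Overall accuracy, sensitivity and specificity for a two-class prediction."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else None,
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else None,  # true negative rate
    }
```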
  • McNemar's chi-squared test is used to compare the molecular classifier with a Dukes' staging classifier.
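McNemar's test looks only at the patients on which the two classifiers disagree. A minimal sketch, assuming the continuity-corrected form of the statistic (the source does not say whether a correction was applied) and using the identity that a chi-squared variable with one degree of freedom has upper-tail probability erfc(sqrt(x/2)); the function name is hypothetical.

```python
from math import erfc, sqrt

def mcnemar(truth, pred_a, pred_b):
    """Continuity-corrected McNemar statistic and p-value for two classifiers."""
    # b: A correct, B wrong; c: A wrong, B correct. Agreements are ignored.
    b = sum(a == t and o != t for t, a, o in zip(truth, pred_a, pred_b))
    c = sum(a != t and o == t for t, a, o in zip(truth, pred_a, pred_b))
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = erfc(sqrt(stat / 2))  # chi-squared upper tail, 1 degree of freedom
    return stat, p_value
```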
  • several permutations of the dataset (i.e., 1,000 permutations) are used to measure the significance of the classifier results as compared to chance.
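The permutation idea can be sketched as follows. This is a deliberate simplification and an assumption on my part: it shuffles the outcome labels against a fixed set of predictions, whereas a full analysis would rerun gene selection and classifier training on every permuted dataset. The function names and the fixed random seed are illustrative.

```python
import random

def accuracy(truth, predicted):
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)

def permutation_p_value(truth, predicted, n_perm=1000, seed=0):
    """Share of label shufflings that match or beat the observed accuracy."""
    rng = random.Random(seed)
    observed = accuracy(truth, predicted)
    shuffled = list(truth)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if accuracy(shuffled, predicted) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value indicates that the observed accuracy would rarely arise if outcome labels were unrelated to the predictions.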
  • Example 1 Human Colon Cancer Survival Classifier Training Set Tumor Samples
  • a colon cancer survival classifier was developed using 78 tumor samples, including 3 adenomas and 75 cancers. Informative frozen colorectal cancer samples were selected from the Moffitt Cancer Center Tumor Bank (Tampa, Florida) based on evidence for good (survival > 36 mo) or poor prognosis (survival < 36 mo) from the Tumor Registry. Dukes' stages can include B, C, and D. In this particular embodiment, survival was measured as last contact minus collection date for living patients, or date of death minus collection date for patients who have died.
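The survival measurement and the 36-month split described above can be sketched as below. Two details are assumptions rather than statements from the source: a month is taken as 30.44 days on average, and a survival of exactly 36 months is labeled "poor" because the text only defines the strict inequalities. Function names are hypothetical.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # assumed average month length

def survival_months(collection, last_contact=None, death=None):
    """Date of death (if any) or last contact, minus collection date, in months."""
    end = death if death is not None else last_contact
    return (end - collection).days / DAYS_PER_MONTH

def prognosis(months, cutoff=36):
    """'good' for survival beyond the cutoff, otherwise 'poor'."""
    return "good" if months > cutoff else "poor"
```

For instance, a sample collected on 1 January 2000 from a patient last contacted on 1 January 2004 gives roughly 48 months and is labeled "good".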
  • the number of samples per Dukes' stage was as follows: 23 patients with stage B, 22 patients with stage C and 30 patients with stage D disease. Just as adenomas can be included to help train the classifier to recognize good prognosis patients, Dukes D patients with synchronous metastatic disease can be used to train the classifier to recognize poor prognosis patients.
  • all samples were selected to have at least 36 months of follow-up. The follow-up results in this embodiment showed that thirty-two of the patients survived more than 36 months, while 46 patients died within 36 months. With this particular embodiment, the median follow-up time for all 78 patients was 27.9 months.
  • Test Set Tumor Samples (Denmark) In another embodiment, eighty-eight patients with Dukes' stage B and C colorectal cancer and a minimum follow-up time of 60 months were selected for array hybridization. Ten micrograms of total RNA were used as starting material for the cDNA preparation and hybridized to Affymetrix U133A GeneChips (Santa Clara, CA) by standard protocols supplied by the manufacturer. The U133A gene chip is disclosed in U.S. Patent Nos.
  • RNAzol (WAK-Chemie Medical)
  • spin column technology (Sigma)
  • samples can be microdissected (>80% tumor cells) by frozen section guidance and RNA extraction performed using Trizol followed by secondary purification on RNAEasy columns.
  • the samples can then be profiled on cDNA arrays (i.e., TIGR's 32,488-element spotted cDNA arrays, containing 31,872 human cDNAs representing 30,849 distinct transcripts - 23,936 unique TIGR TCs and 6,913 ESTs, 10 exogenous controls printed 36 times, and negative controls printed 36-72 times).
  • tumor samples are co-hybridized with a common reference pool in the Cy5 channel for normalization purposes.
  • cDNA synthesis, aminoallyl labeling and hybridizations can be performed according to previously published protocols (see Hegde, P. et al, "A concise guide to cDNA microarray analysis," Biotechniques; 29:552-562 (2000) and Yang, IV, et al, "Within the fold: assessing differential expression measures and reproducibility in microarray assays," Genome Biol; 3:research0062 (2002)).
  • labeled first-strand cDNA is prepared and co-hybridized with labeled samples prepared from a universal reference RNA consisting of equimolar quantities of total RNA derived from three cell lines, CaCO2 (colon), KM12L4A (colon), and U118MG (brain).
  • Array probes are identified and local background can be subtracted in Spotfinder (Saeed, A.I. et al, "TM4: a free, open-source system for microarray data management and analysis," Biotechniques; 34:374-8 (2003)).
  • Individual arrays can be normalized in MIDAS (see Saeed, A.I. ibid.) using LOWESS (an algorithm known to the skilled artisan for use in normalizing data) with smoothing parameter set to 0.33.
  • the first and second strand cDNA synthesis can be performed using the Superscript II System (Invitrogen) according to the manufacturer's instructions except using an oligodT primer containing a T7 RNA polymerase promoter site.
  • Labeled cRNA is prepared using the BioArray High Yield RNA Transcript Labeling Kit (Enzo). Biotin labeled CTP and UTP (Enzo) are used in the reaction together with unlabeled NTP's. Following the IVT reaction, the unincorporated nucleotides are removed using RNeasy columns (Qiagen).
  • Fifteen micrograms of cRNA are fragmented at 94°C for 35 min in a fragmentation buffer containing 40 mM Tris-acetate pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization, the fragmented cRNA in a 6xSSPE-T hybridization buffer (1 M NaCl, 10 mM Tris pH 7.6, 0.005% Triton) is heated to 95°C for 5 min and subsequently to 45°C for 5 min before loading onto the Affymetrix HG_U133A probe array cartridge. The probe array is then incubated for 16 h at 45°C at constant rotation (60 rpm). The washing and staining procedure can be performed in an Affymetrix Fluidics Station.
  • the probe array can be exposed to several washes (i.e., 10 washes in 6xSSPE-T at 25°C followed by 4 washes in 0.5xSSPE-T at 50°C).
  • the biotinylated cRNA can then be stained with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, OR) in 6xSSPE-T for 30 min at 25°C followed by 10 washes in 6xSSPE-T at 25°C.
  • An antibody amplification step can then follow, using normal goat IgG as blocking reagent, final concentration 0.1 mg/ml (Sigma) and biotinylated anti-streptavidin antibody (goat), final concentration 3 mg/ml (Vector Laboratories). This can be followed by a staining step with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, OR) in 6xSSPE-T for 30 min at 25°C and 10 washes in 6xSSPE-T at 25°C.
  • the probe arrays are scanned (i.e., at 560 nm using a confocal laser-scanning microscope (Hewlett Packard GeneArray Scanner G2500A)). The readings from the quantitative scanning can then be analyzed by the Affymetrix Gene Expression Analysis Software (MAS 5.0) and normalized to a common mean expression value of 150.
  • the first analysis of the colon cancer survival data can be performed using censored survival time (in months) and 500 permutations. Significance analysis of microarrays (SAM) can then be used to select genes most closely correlated to survival. The subset of genes that correspond to an empirically derived, estimated false discovery rate (FDR) is then chosen. This subset of genes can then be used in subsequent analyses.
  • Cluster 3.0 and Java TreeView 1.03 are used to cluster and visualize the SAM-selected genes.
  • a hierarchical clustering algorithm can be chosen, with complete linkage and the correlation coefficient (i.e., Pearson correlation coefficient) as the similarity metric.
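To make the clustering choice concrete, here is a small sketch of agglomerative clustering with complete linkage on the distance 1 - r, where r is the Pearson correlation. It is a naive illustration, not the Cluster 3.0 implementation; the function names and the stopping rule (a fixed number of clusters) are assumptions.

```python
from math import sqrt
from statistics import mean

def pearson(u, v):
    """Pearson correlation coefficient between two expression profiles."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))
    return num / den

def complete_linkage(profiles, n_clusters):
    """Merge the closest clusters until n_clusters remain.

    Under complete linkage, the distance between two clusters is the
    largest pairwise distance (1 - Pearson r) between their members.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(1 - pearson(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    return [sorted(c) for c in clusters]
```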
  • the Dukes' staging clusters are manually created in the appropriate format.
  • SAM survival analysis can be used to identify a set of genes most correlated with censored survival time using the training set tumor samples. In one embodiment, a set of 53 genes was found, corresponding to a median expected false discovery rate (FDR) of 28%.
  • genes denoted with (+) indicate a positive correlation to survival time and genes without the (+) notation indicate a negative correlation in survival time (over expression in poor prognosis cases). Included in this list of genes in Table 1 are several genes believed to be biologically significant, such as osteopontin and neuregulin.
  • Figure 1A presents a graphical representation of the 53 SAM-selected genes (as described above) as a clustered heat map. The red color represents over-expressed genes relative to green, under-expressed genes.
  • Figure 1A shows only the Dukes' stage B and C cases, whose outcome Dukes' staging predicts poorly. Since only genes correlated with survival are used in clustering, the distinctly illustrated clusters in the heatmap correspond to very different prognosis groups.
  • the 53 SAM-selected genes were also arranged by annotated Dukes' stage in Figure 1B.
  • Figure 2A shows the Kaplan-Meier plot for two dominant clusters of genes correlated with stage B and C test set tumor samples. Clearly, these genes separated the cases into two distinct clusters of patients with good prognosis (cluster 2) and poor prognosis (cluster 1) (P < 0.001 using a log rank test).
  • Figure 2B presents a Kaplan-Meier plot of the survival times of Dukes' stage B and C tumors grouped by stage, showing no statistically significant difference.
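For readers unfamiliar with how a Kaplan-Meier curve is built from censored follow-up data, a minimal product-limit sketch follows. The function name and the input encoding (1 for an observed death, 0 for a censored patient) are assumptions for illustration; it does not reproduce the figures.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time per patient (e.g., months)
    events: 1 if death was observed at that time, 0 if censored
    Returns [(time, survival probability)] at each distinct death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

Censored patients lower the number at risk for later time points without producing a step in the curve, which is why minimum follow-up matters when comparing groups.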
  • gene expression profiles separate good and poor prognosis cases better than Dukes' staging.
  • a gene expression-based classifier is more accurate at predicting patient prognosis than the traditional Dukes' staging.
  • Dukes' Staging provides only a probability of survival for each member of a population of patients, based on historical statistics. Accordingly, the prognosis of an individual patient can be predicted based on historical outcome probabilities of the associated Dukes' stage. For example, if a Dukes' C survival rate was 55% at 36 months of follow up, any individual Dukes' C patient would be classified as having a good prognosis since more than 50% of patients would be predicted to be alive.
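The consequence of this population-level rule can be made explicit: every patient in a stage receives the same prediction, so the rule is correct only for the majority outcome in that stage. In the sketch below, the 0.55 rate for stage C is the worked example from the text, while the 0.20 rate for stage D and the function name are hypothetical placeholders.

```python
def dukes_default_prediction(stage, survival_rate_at_36_months):
    """Stage-level default call and the accuracy that call would achieve."""
    rate = survival_rate_at_36_months[stage]
    call = "good" if rate > 0.5 else "poor"
    # Every patient in the stage gets the same call, so the call is
    # right exactly for the majority-outcome fraction of the stage.
    expected_accuracy = max(rate, 1 - rate)
    return call, expected_accuracy

rates = {"C": 0.55, "D": 0.20}  # "D" value is a made-up illustration
```

With a 55% survival rate, all stage C patients are called "good" and the call is wrong for the 45% who do not survive 36 months.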
  • a classifier of the invention was compared to the Dukes' clinical staging approach currently in widespread use.
  • a classifier (Classifier A) of the present invention predicted 100%, 69%, 55% and 20% for Adenomas, and Dukes' stages B, C and D cancers, respectively.
  • the overall accuracy was 77% (63% sensitivity/97% specificity).
  • Classifier A was evaluated in predicting prognosis for each patient at 36 months follow-up as compared to Dukes' staging predictions.
  • Shown first in Table 2A are the relative accuracies of Dukes' staging and the cDNA classifier (molecular staging) for all tumors and then a comparison by Dukes' stage.
  • Table 2B Dukes' staging was particularly bad at predicting outcome for patients with poor prognosis (70% and 55% for all stages and B and C, respectively).
  • molecular staging as provided by the present invention, identified the good prognosis cases (the "default" classification using Dukes' staging), but also identified poor prognosis cases with a high degree of accuracy, Table 2C.
  • Tables 2A-2C also show the detailed confusion matrix for all samples in the dataset, showing the equivalent misclassification rate of both good and poor prognosis groups by the classifier of the subject invention.
  • Leave-one-out cross-validation technique can be utilized for evaluating the performance of a classifier construction method of the subject invention. This approach tends towards high variance in accuracy estimates, but with low bias.
  • a classifier of the subject invention can be created on all available training data, then tested for accuracy by classifying the left-out example.
  • a classifier was constructed in two steps: first a gene selection procedure was performed with SAM and then a support vector machine was constructed.
  • the gene selection approach used was a univariate selection. SAM (significance analysis of microarrays) was the method chosen for selecting genes. Since gene selection was to be based on two classes (good vs.
  • the two-class SAM method can be used for selecting genes with the best d values.
  • SAM calculates false discovery rates empirically through the use of permutation analysis.
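The two ingredients named here, a per-gene statistic and a permutation-based false discovery estimate, can be sketched as follows. The relative-difference statistic d = (mean1 - mean0) / (s + s0) with a pooled standard error s follows the published SAM definition. The FDR estimate, however, is a simplified stand-in that I am assuming for illustration: it uses a fixed cutoff on |d| and the median count from label permutations, rather than SAM's ordered-statistic procedure. The `s0` default and function names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, median

def d_stat(a, b, s0=0.1):
    """SAM-style relative difference for one gene (a: class 1, b: class 0)."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    s = sqrt((1 / len(a) + 1 / len(b)) * ss / (len(a) + len(b) - 2))
    return (ma - mb) / (s + s0)  # s0 damps genes with tiny variance

def all_d(X, y, s0):
    return [d_stat([r[g] for r, l in zip(X, y) if l == 1],
                   [r[g] for r, l in zip(X, y) if l == 0], s0)
            for g in range(len(X[0]))]

def estimate_fdr(X, y, threshold, n_perm=200, s0=0.1, seed=0):
    """Genes called at |d| >= threshold and a simplified permutation FDR."""
    rng = random.Random(seed)
    called = sum(abs(d) >= threshold for d in all_d(X, y, s0))
    if called == 0:
        return 0, 0.0
    labels = list(y)
    false_counts = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any real link between genes and outcome
        false_counts.append(sum(abs(d) >= threshold for d in all_d(X, labels, s0)))
    return called, min(1.0, median(false_counts) / called)
```

Raising the threshold calls fewer genes but lowers the estimated FDR, which is the trade-off being tuned when a target FDR is chosen.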
  • SAM provides an estimate of the false discovery rate (FDR) along with a list of genes considered significant relative to censored survival. This feature of SAM was used with this particular embodiment to select the number of genes that resulted in the smallest FDR possible. In one embodiment, this FDR was zero.
  • the set of 53 genes (significant genes, as described above) at a FDR of 28% was used in this particular embodiment. Using this subset of 53 genes, the samples were clustered as a way of visualizing the SAM results (see Figures 1A and 1B).
  • a linear support vector machine (SVM) was constructed.
  • the software used for this approach can be implemented in the Weka machine learning toolkit.
  • a linear SVM was then chosen to reduce the potential for overfitting the data, given the small sample sizes and large dimensionality.
  • One further advantage of this approach is the transparency of the constructed model, which is of particular interest when comparing the classifier of the subject invention on two different platforms (see below).
  • using LOOCV via statistical analytic tools for comparing groups (i.e., parametric tests such as t-test/ANOVA; see also Dyrskjot L et al, "Identifying distinct classes of bladder carcinoma using microarrays," Nat.
  • M denotes genes that were used to classify 75% of all tumors, and genes appearing in both the cDNA classifier and the U133A-limited cDNA classifier are marked by *. Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 3 are hereby incorporated by reference.
  • a third human colorectal cancer survival classifier in accordance with the present invention, was prepared using U133A-limited genes selected by LOOCV via statistical analytic tools (i.e., t-test).
  • the list of U133A-limited genes selected using LOOCV via t-test is provided in the following Table 4.
  • the named genes common to both the original classifier (a set of 43 genes) and the U133A-limited classifier are marked with an asterisk.
  • Table 5 illustrates seven genes selected by SAM survival analysis, where osteopontin and neuregulin are noted to be present and in common with the gene lists for all classifiers.
  • genes denoted with (+) indicate a positive correlation to survival time and genes without the (+) notation indicate a negative correlation in survival time (over expression in poor prognosis cases)
  • M denotes genes used to classify 75% of all tumors, and genes appearing in both the cDNA classifier and U133A-limited cDNA classifier are marked by *.
  • Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 4 are hereby incorporated by reference.
  • Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 5 are hereby incorporated by reference.
  • Cross Platform Validation Systems and methods of the subject invention can be tested by applying a classifier to an immediately available, well-annotated, independent test set of colon cancer tumor samples (Denmark, as described above) run on the Affymetrix platform.
  • database software such as the Resourcerer software from TIGR (see also Tsai J et al, "RESOURCERER: a database for annotating and linking microarray resources within and across species," Genome Biol, 2:software0002.1-0002.4 (2001))
  • genes can be mapped from the cDNA chip to a corresponding gene on the Affymetrix platform. The linkage is done by common Unigene IDs.
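The linkage by shared UniGene ID amounts to a join between two annotation tables. A minimal sketch, assuming each platform's annotation is available as a dictionary from probe identifier to UniGene ID; the function name and all identifiers in any example are hypothetical, and this is not the Resourcerer interface.

```python
def map_probes(cdna_to_unigene, affy_to_unigene):
    """Map each cDNA probe to the Affymetrix probe sets sharing its UniGene ID."""
    by_unigene = {}
    for probe_set, unigene in affy_to_unigene.items():
        by_unigene.setdefault(unigene, []).append(probe_set)
    return {
        probe: sorted(by_unigene[unigene])
        for probe, unigene in cdna_to_unigene.items()
        # Unannotated probes (e.g., anonymous ESTs) have no ID and are dropped.
        if unigene is not None and unigene in by_unigene
    }
```

Probes without a UniGene annotation, or whose ID has no counterpart on the other platform, fall out of the mapping, which is why only a subset of the cDNA array carries over.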
  • 12,951 genes (out of 32,000) were mapped to an Affymetrix U133A GeneChip.
  • probes on the cDNA chip are unknown expressed sequence tag markers (ESTs) which can reduce the number of usable genes identified.
  • a classifier of the subject invention can address this lack of correspondence in platforms.
  • a U133A-limited cDNA classifier was constructed in accordance with the subject invention by using the identical approach on this reduced set of overlapping genes. With the U133A-limited cDNA classifier, only those cDNA probes are chosen that (according to Resourcerer) mapped to an Affymetrix probe set. This approach enables cross-platform comparison.
  • the training set samples were used together with the test set tumor samples in a flip-dye design.
  • the end expression value from a cDNA probe is then the log2 of the training set to test set sample ratio.
  • This same reference RNA was used on two U133A Affymetrix chips.
  • the U133A chip value corresponding to a cDNA probe is the ratio of training set to test set sample (on U133A chips).
  • Each of the Affymetrix U133A arrays (both the test set and the reference samples) was scaled to a constant average intensity (150) prior to taking the ratio and the test sample chip values were averaged.
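The scaling and ratio steps can be sketched in a few lines. This is an illustration under assumptions: it scales by the plain arithmetic mean, whereas the vendor software may use a trimmed mean, and the function names are hypothetical. Taking log2 of the ratio puts the oligonucleotide values on the same footing as the log2 ratios from the two-color cDNA arrays.

```python
from math import log2
from statistics import mean

def scale_to_target(intensities, target=150.0):
    """Rescale one array so that its average intensity equals `target`."""
    factor = target / mean(intensities)
    return [value * factor for value in intensities]

def log2_ratios(sample, reference):
    """Per-probe log2(sample / reference) after both arrays are scaled."""
    return [log2(s / r) for s, r in zip(sample, reference)]
```

For example, an array with intensities 100, 200 and 300 is multiplied by 0.75 to reach a mean of 150 before any ratio is taken.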
  • the results of a full LOOCV for the U133A-limited classifier on the test set sample (Moffitt Cancer Center cDNA microarray data set; original 78 samples) are shown in Tables 6A-6C.
  • Figure 3A illustrates the survival curve for a cDNA classifier of the subject invention on the 78 training set samples (LOOCV);
  • Figure 3B illustrates the survival curve for the U133A-limited cDNA classifier (LOOCV);
  • Figure 3C illustrates the survival curve for an independent test set classification (Denmark test set sample).
  • The confusion matrix and accuracy rates by Dukes' stage are also presented in Tables 6A-6C.
  • the U133A-limited classifier was tested on the test set of colorectal cancer samples from Denmark that were profiled on the Affymetrix U133A platform.
  • the normalized and scaled test-set data were evaluated with the U133A- limited cDNA classifier. Because the Denmark cases included only Dukes' stages B and C, classification of outcome by Dukes' staging would predict all samples to be of good prognosis.
  • the accuracy of the cDNA classifier was reduced from 72% in LOOCV of the training set (Tables 6A-6C) to 68% in the Denmark cross-platform test set (Tables 7A-7C).
  • the present invention provides a colon cancer clinical classifier with significant accuracy in LOOCV that exceeds that of Dukes' staging.
  • the utility of the classifier of the subject invention can be validated, such as in an independent colon cancer population using a completely different microarray platform.
  • the gene classifier of the subject invention can be based on a core set of genes that have biological significance for any type of cancer, including human colon cancer progression.
  • the molecular staging/classifier of the subject invention provides more accurate predictions of patient outcome than is currently possible with current clinical staging systems, which may, in fact, misclassify patients.
  • a set of genes is derived from a genome wide analysis of gene expression using known microarray analysis techniques (i.e., SAM). By clustering groups of patients with good and bad prognoses, it is illustrated that the prognosis/classifier of the subject invention presents outcome-rich information.
  • a supervised learning analysis can be used to identify a core set of informative genes.
  • a core set of 43 genes was identified that appeared in 75 % of the cross validation iterations and accurately predicted colorectal cancer survival.
  • This core set was derived from a 32,000-element cDNA microarray that included both named and unnamed genes. This gene set was highly accurate in predicting survival when compared with Dukes' staging data from the same patients.
  • a means for validating a prognosis/survival classifier is provided by the present invention.
  • a normalized and scaled oligonucleotide-based colorectal cancer database from Denmark was evaluated based on the Affymetrix U133A GeneChip™.
  • a colorectal cancer classifier (U133A-based cDNA classifier) was produced on the training data set using a limited set of genes common to both the U133A and the cDNA microarray (for 78 genes). The U133A-based cDNA classifier was then applied directly to the normalized and scaled Denmark test population.
  • the classifier of the subject invention can identify those genes that are most biologically significant based on their frequency of appearance in the classification set.
  • those genes that are most biologically significant to colorectal cancer were identified using the classifier provided in Example 1. Specifically, osteopontin and neuregulin have reported biological significance in the context of colorectal cancer.
  • Osteopontin, a secreted glycoprotein and ligand for CD44 and αvβ3, appears to have a number of biological functions associated with cellular adhesion, invasion, angiogenesis and apoptosis (see Fedarko NS et al, "Elevated serum bone sialoprotein and osteopontin in colon, breast, prostate, and lung cancer," Clin Cancer Res, 7:4060-6 (2001); Yeatman TJ and Chambers AF, "Osteopontin and colon cancer progression," Clin Exp Metastasis, 20:85-90 (2003)).
  • osteopontin was identified as a gene whose expression was strongly associated with colorectal cancer stage progression (Agrawal D et al, "Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling," J Natl Cancer Inst, 94:513-21 (2002)).
  • INSIG-2, one of the 43 core classifier genes provided in Example 1, was recently identified as an osteopontin signature gene, suggesting that an osteopontin pathway may be prominent in regulating colon cancer survival.
  • neuregulin appeared to have biological significance in the context of colorectal cancer based on frequency of appearance in the classification set of the present invention.
  • Neuregulin, a ligand for tyrosine kinase receptors (ERBB receptors), may have biological significance in the context of colorectal cancer where current data suggest a strong relationship between colon cancer growth and the ERBB family of receptors (Carraway KL, 3rd, et al, "Neuregulin-2, a new ligand of ErbB3/ErbB4-receptor tyrosine kinases," Nature, 387:512-6 (1997)).
  • Neuregulin was recently identified as a prognostic gene whose expression correlated with bladder cancer recurrence (Dyrskjot L, et al, "Identifying distinct classes of bladder carcinoma using microarrays," Nat Genet, 33:90-6 (2003)). Accordingly, the identification of such genes may be significant in terms of gene therapy. For example, a therapeutic gene may be identified, which when reintroduced into tumor cells, may arrest or even prevent growth in cancer cells. Additionally, using the classifier of the present invention, a therapeutic gene may be identified that enables increased responsiveness to interventions such as radiation or chemotherapy.

Abstract

The present invention pertains to specific gene signatures for cancer that are used to predict survival and novel processes for identifying such gene signatures. In one embodiment, gene signatures for human colorectal cancer are identified and outcomes are linked to the specific gene signatures using significance analysis of microarrays (SAM) and support vector machines (SVM) to provide a prognosis/survival classifier.

Description

DESCRIPTION
METHODS FOR PREDICTING CANCER OUTCOME AND GENE SIGNATURES FOR USE THEREIN
CROSS-REFERENCE TO A RELATED APPLICATION
This application claims the benefit of U.S. Provisional Application Serial No. 60/547,871, filed February 25, 2004, which is hereby incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION
In the last decade, scientists have labored to complete a high-quality, comprehensive sequence of the human genome. With its recent completion, a large number of genomic data sets have been made available in public databases. The available data, however, do not provide explanations regarding which aspects of human biology affect which genes. Researchers are just beginning to explore genomic function.
Several technological advances have made it possible to accurately measure cellular constituents and therefore derive profiles. For example, new techniques provide the ability to monitor the expression level of a large number of transcripts at any one time (see, for example, Schena et al., "Quantitative monitoring of gene expression patterns with a complementary DNA microarray," Science, 270:467-470 (1995); Lockhart et al, "Expression monitoring by hybridization to high-density oligonucleotide arrays," Nature Biotechnology, 14:1675-1680 (1996); and Blanchard et al., "Sequence to array: Probing the genome's secrets," Nature Biotechnology, 14:1649 (1996)). In organisms for which the complete genome is known, it is possible to analyze the transcripts of all genes within the cell. With other organisms, such as humans, for which there is an increasing knowledge regarding the genome, it is possible to simultaneously monitor large numbers of the genes within the cell.
One aspect of human biology/genomic function that is of great interest to the medical research community is cancer. Currently, genetic samples have been taken from patients having various stages of various types of cancer. Such samples have provided an extensive genetic data collection. To provide a system of organization, such genetic data are collected in DNA microarrays, which are sometimes commonly referred to as biochips, DNA chips, gene arrays, gene chips, and genome chips. DNA microarrays exploit a phenomenon known as base-pairing or hybridization. 
To form the array, genetic samples are arranged in an orderly manner (typically in a rectangular grid) on a substrate. Examples of commonly used substrates include microplates and blotting membranes. Many modern microarrays include an array of oligonucleotide or peptide nucleic acid (PNA) probes, and the array is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. The array on the chip is exposed to labeled sample DNA, hybridized, and the identity/abundance of complementary sequences are determined. There are two major uses of DNA microarray technology. The first involves identification of the gene sequence. The second involves determination of expression level of genes, generally referred to as the abundance of the genes. In particular, expression or abundance of a gene is a measure of a relative level of activity of the gene in replication or translation in the presence of the probe. By analyzing the abundance of various genes in people of various conditions, a relationship between the genetic state of a person, in terms of relative levels of activity of various genes of that person, and that person's condition is assessed. To conduct such analysis, such arrays of expression levels include metadata describing characteristics of the people whose genetic material is sampled and additional metadata which identifies specific genes whose expression levels are represented in such arrays. 
Microarrays are already being used for a number of beneficial purposes including, for example, identifying biomarkers of cancer (Welsh, JB et al., "Large-scale delineation of secreted protein biomarkers overexpressed in cancer tissue and serum," PNAS, 100(6):3410-3415 (March 2003)), creating gene expression-based classifications of cancers (Alizadeh, AA et al., "Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling," Nature, 403:503-11 (2000); and Garber, ME et al., "Diversity of gene expression in adenocarcinoma of the lung," Proc Natl Acad Sci USA, 98:13784-9 (2001)), and in drug discovery (Marton, MJ et al, "Drug target validation and identification of secondary drug target effects using Microarrays," Nat Med, 4(11):1293-301 (1998); and Gray, NS et al., "Exploiting chemical libraries, structure, and genomics in the search for kinase inhibitors," Science, 281:533-538 (1998)).
One tool that has been applied to microarrays to decipher and compare genome expression patterns in biological systems is Significance Analysis of Microarrays, or SAM (Tusher, V. et al., "Significance analysis of microarrays applied to ionizing radiation response," Proceedings of the National Academy of Sciences, 2001. First published Apr. 17, 2001, 10.1073/pnas.091062498). This statistical method was developed as a cluster tool for use in identifying genes with statistically significant changes in expression. SAM has been used for a variety of purposes, including identifying potential drugs that would be effective in treating various conditions associated with specific gene expressions (Bunney WE, et al., "Microarray technology: a review of new strategies to discover candidate vulnerability genes in psychiatric disorders," Am J Psychiatry, 160(4):657-66 (Apr. 2003)).
The known SVM (Support Vector Machine) (as described in Michael P. 
et al., "Knowledge-based analysis of microarray gene expression data by using support vector machines," Proceedings of the National Academy of Sciences, 97(1):262-67 (2000)) is a correlation tool shown to perform well in multiple areas of biological analysis, including evaluating microarray expression data (Brown et al, "Knowledge-based analysis of microarray gene expression data by using support vector machines," Proc Natl Acad Sci USA, 97:262-267 (2000)), detecting remote protein homologies (Jaakkola, T. et al, "Using the Fisher kernel method to detect remote protein homologies," Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, AAAI Press, Menlo Park, CA (1999)), and recognizing translation initiation sites (Zien, A. et al, "Engineering support vector machine kernels that recognize translation initiation sites," Bioinformatics, 16(9):799-807 (2000)). When used for classification, SVMs separate a given set of binary labeled training data with a hyper-plane that is maximally distant from the set of data (the "maximal margin hyper-plane"). Where no linear separation is possible, SVMs utilize the technique of "kernels" to automatically realize a non-linear mapping to a feature space (Furey, T.S. et al, "Support vector machine classification and validation of cancer tissue samples using microarray expression data," Bioinformatics, 16(10):906-914 (2000)).
Ranked as the third most commonly diagnosed cancer and the second leading cause of cancer deaths in the United States (American Cancer Society, "Cancer facts and figures," Washington, DC: American Cancer Society (2000)), colon cancer is a deadly disease afflicting nearly 130,000 new patients yearly in the United States. Colon cancer is the only cancer that occurs with approximately equal frequency in men and women. There are several potential risk factors for the development of colon and/or rectal cancer. 
Known factors for the disease include older age, excessive alcohol consumption, sedentary lifestyle (Reddy, B.S., "Dietary fat and its relationship to large bowel cancer," Cancer Res., 41:3700-3705 (1981)), and genetic predisposition (Potter, JD "Colorectal cancer: molecules and populations," J Natl Cancer Institute, 91:916-932 (1999)). Several molecular pathways have been linked to the development of colon cancer (see, for example, Leeman MF, et al, "New insights into the roles of matrix metalloproteinases in colorectal cancer development and progression," J Pathol, 201(4):528-34 (2003); Kanazawa, T et al, "Does early polypoid colorectal cancer with depression have a pathway other than adenoma-carcinoma sequence?," Tumori., 89(4):408-11 (2003); and Notarnicola, M. et al, "Genetic and biochemical changes in colorectal carcinoma in relation to morphologic characteristics," Oncol Rep., 10(6):1987-91 (2003)), and the expression of key genes in any of these pathways may be affected by inherited or acquired mutation or by hypermethylation. A great deal of research has been performed with regard to identifying genes for which changes in expression may provide an early indicator of colon cancer or a predisposition for the development of colon cancer. Unfortunately, no research has yet been conducted on identifying specific genes associated with colorectal cancer and specific outcomes to provide an accurate prediction of prognosis.
Survival of patients with colon and/or rectal cancer depends to a large extent on the stage of the disease at diagnosis. Devised nearly seventy years ago, the modified Dukes' staging system for colon cancer discriminates four stages (A, B, C, and D), primarily based on clinicopathologic features such as the presence or absence of lymph node or distant metastases. 
Specifically, colonic tumors are classified by four Dukes' stages: A, tumor within the intestinal mucosa; B, tumor into muscularis mucosa; C, metastasis to lymph nodes; and D, metastasis to other tissues. Of the systems available, the Dukes' staging system, based on the pathological spread of disease through the bowel wall, to lymph nodes, and to distant organ sites such as the liver, has remained the most popular. Despite providing only a relative estimate for cure for any individual patient, the Dukes' staging system remains the standard for predicting colon cancer prognosis, and is the primary means for directing adjuvant therapy. The Dukes' staging system, however, has only been found useful in predicting the behaviour of a population of patients, rather than an individual. For this reason, any patient with a Dukes A, B, or C lesion would be predicted to be alive at 36 months while a patient staged as Dukes D would be predicted to be dead. Unfortunately, application of this staging system results in the potential over-treatment or under-treatment of a significant number of patients. Further, Dukes' staging can only be applied after complete surgical resection rather than after a pre-surgical biopsy.
Microarray technology, as described above, has permitted development of multi-organ cancer classifiers (Giordano, T.J. et al, "Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles," Am J Pathol, 159:1231-8 (2001); Ramaswamy, S. et al, "Multiclass cancer diagnosis using tumor gene expression signatures," Proc Natl Acad Sci USA, 98:15149-54 (2001); and Su, A.I. et al, "Molecular classification of human carcinomas by use of gene expression signatures," Cancer Res, 61:7388-93 (2001)), identification of tumor subclasses (Dyrskjot, L. et al, "Identifying distinct classes of bladder carcinoma using microarrays," Nat Genet, 33:90-6 (2003); Bhattacharjee, A.
et al, "Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses," Proc Natl Acad Sci USA, 98:13790-5 (2001); Garber, M.E. et al, "Diversity of gene expression in adenocarcinoma of the lung," Proc Natl Acad Sci USA, 98:13784-9 (2001); and Sorlie, T. et al, "Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications," Proc Natl Acad Sci USA, 98:10869-74 (2001)), discovery of progression markers (Sanchez-Carbayo, M. et al, "Gene Discovery in Bladder Cancer Progression using cDNA Microarrays," Am J Pathol, 163:505-16 (2003); and Frederiksen, CM, et al, "Classification of Dukes' B and C colorectal cancers using expression arrays," J Cancer Res Clin Oncol, 129:263-71 (2003)); and prediction of disease outcome (Henshall, SM et al, "Survival analysis of genome-wide gene expression profiles of prostate cancers identifies new prognostic targets of disease relapse," Cancer Res, 63:4196-203 (2003); Shipp, MA et al, "Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning," Nat Med, 8:68-74 (2002); Beer, DG et al, "Gene-expression profiles predict survival of patients with lung adenocarcinoma," Nat Med, 8:816-24 (2002); Pomeroy, SL et al, "Prediction of central nervous system embryonal tumor outcome based on gene expression," Nature, 415:436-42 (2002); van 't Veer, LJ et al, "Gene expression profiling predicts clinical outcome of breast cancer," Nature, 415:530-6 (2002); Vasselli, JR et al, "Predicting survival in patients with metastatic kidney cancer by gene-expression profiling in the primary tumor," Proc Natl Acad Sci USA, 100:6958-63 (2003); and Takahashi, M. et al, "Gene expression profiling of clear cell renal cell carcinoma: gene identification and prognostic classification," Proc Natl Acad Sci USA, 98:9754-9 (2001)) in many types of cancer.
Classification of patient prognosis by microarray analysis has promise in predicting the long-term outcome of any one individual based on the gene expression profile of the tumor at diagnosis. Inherent to this approach is the hypothesis that every tumor contains informative gene expression signatures, at the time of diagnosis, which can direct the biological behaviour of the tumor over time. To date, however, little success has been achieved in developing a classifier that will predict colon cancer outcome equivalent to or better than that which is possible using the standard clinicopathologic staging systems (i.e., Dukes' stage system). What is needed is a particularly effective mechanism for analyzing genomic array data to provide a classifier that accurately predicts cancer outcomes, in particular, colon cancer outcomes.
BRIEF SUMMARY OF THE INVENTION
The present invention provides systems and methods for predicting outcomes in patients diagnosed with cancer. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer. In a specific embodiment, the present invention provides a gene expression profile based classifier that provides a means for accurately predicting colon cancer outcome.
In accordance with an aspect of the invention, genes are classified according to degree of correlation with a clinical outcome for a cancer of interest (such as colon cancer). These genes are used to establish a set of reference gene expression levels (also referred to herein as a "classifier"). Biological information regarding the patient is received and used to extrapolate intracellular gene expression. The intracellular gene expression levels are compared to those in the classifier to predict clinical outcome.
In one embodiment of the invention, a method is provided in which the specific gene signatures for colon cancer are identified. To do so, tumor specimens from patients with known outcomes are collected and frozen. The outcomes are linked to a specific core set of genes that are weighted in importance by (1) selecting genes of interest by applying microarray analysis; (2) producing a classifier using support vector machines (SVM); and (3) cross-validating the genes of interest and the classifier by comparing them against an independent set of test data. In a preferred embodiment, significance analysis of microarrays (SAM) is utilized to select genes of interest.
Genome wide microarray analyses can produce large datasets that can be pattern-matched to clinicopathologic parameters such as patient outcomes and prognosis. Accordingly, the subject invention identifies gene expression signatures that would predict colon cancer outcome more accurately than the well-accepted Dukes' staging system.
In one embodiment, a group of colon cancer patients was examined to develop a survival classifier, which was subsequently validated using an entirely independent test set of data derived on a different microarray platform at a different performance site. The classifier of the subject invention was ultimately based on a core set of genes selected for their correlation to survival. A number of the genes in the core set demonstrated intrinsic biological significance for colon cancer progression. With the ability to predict cancer outcomes/prognosis using the subject invention, appropriate treatment protocols can be selected for patients. For example, patients assessed using the subject invention and identified to have poor outcomes may be treated more aggressively or with specific agents (i.e., anti-sense agents, RNA inhibition agents, small molecule inhibitors of the cancer activity, gene therapy, etc.). Accordingly, an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial.
DESCRIPTION OF THE FIGURES
The file of this patent contains at least one drawing executed in color. Copies of this patent with color drawing(s) will be provided by the Patent and Trademark Office upon request and payment of the necessary fee.
Figure 1A is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when correlated with prognosis/patient survival.
Figure 1B is a heatmap illustrating cluster analysis of genes selected in accordance with the present invention when grouped by Dukes' stage B and C.
Figure 2A graphically illustrates a Kaplan-Meier survival curve based on gene expression profiling in accordance with the present invention.
Figure 2B graphically illustrates a Kaplan-Meier survival curve based on Dukes' staging.
Figures 3A-3C illustrate survival curves for molecular classifiers in accordance with the subject invention.
DETAILED DISCLOSURE OF THE INVENTION
The present invention provides systems and methods for predicting cancer prognosis and outcomes. Specifically, the subject invention utilizes molecular staging with gene expression profiles to stage patients with cancer. In a specific embodiment, the present invention provides a gene expression profile based classifier for predicting cancer outcomes/prognosis. Both microarray analysis and binary classification are used to create the classifier of the invention.
The subject invention provides methods for predicting patient outcomes comprising: identifying genes that correlate with a clinical outcome for a cancer of interest (such as colon cancer); establishing a set of reference gene expression levels (also referred to herein as a "classifier") for said identified genes; receiving biological information regarding the patient; using the biological information to extrapolate intracellular gene expression; and comparing intracellular gene expression levels to those in the classifier to predict clinical outcome.
Biological information of the invention includes, but is not limited to, clinical samples of bodily fluids or tissues; DNA profile information; and RNA profile information. Methods for preparing clinical samples for gene expression analysis are well known in the art, and can be carried out using commercially available kits.
In one embodiment, the subject invention provides methods for predicting colon cancer patient outcomes using a SAM selected set of genes derived from a genome wide analysis of gene expression. Those patients with good and bad prognoses are first clustered into groups that suggest outcome-rich information that is likely present in the gene expression dataset. Subsequently, a supervised SVM analysis identifies a core set of genes that appears in a majority (i.e., 50% or greater, including for example, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%) of the cross validation folds and accurately predicts colon cancer survival. Preferably, a core set of genes that appears in 75% of the cross validation folds is identified by an SVM to be used in predicting colon cancer survival.
In one embodiment, a gene core set is derived from a cDNA microarray that includes both named and unnamed genes. The resultant gene set is highly accurate in predicting cancer survival when compared with Dukes staging data from the same patients. To validate a cDNA-based classifier of the subject invention, a normalized and scaled oligonucleotide-based cancer database is evaluated against a completely independent set of test data derived from a different microarray platform.
Accordingly, the subject invention provides a system for predicting clinical outcome in a patient diagnosed with cancer, wherein the system is useful in offering support/advice in making treatment decisions.
The system comprises (1) a data storage device for collecting data (i.e., gene data); and (3) a computing means for receiving and analyzing data to accurately determine genes associated with poor or good patient prognosis. A graphical user interface can be included with the systems of the invention to display clinical data as well as enable user-interaction. In one embodiment, the system of the invention further includes an intelligence system that can use the analyzed clinical data to classify gene samples and offer support/advice for making clinical decisions (i.e., to interpret predicted clinical outcome and provide appropriate treatment). An intelligence system of the subject invention can include, but is not limited to, artificial neural networks, fuzzy logic, evolutionary computation, knowledge-based systems, and artificial intelligence. In accordance with the subject invention, the computing means is preferably a digital signal processor, which can automatically and accurately analyze gene data and determine those genes that strongly correlate to clinical outcome. In one embodiment, the system of the subject invention is stationary. For example, the system of the invention can be used within a healthcare setting (i.e., hospital, physician's office).
Definitions
As used herein, the term "patient" refers to humans as well as non-human animals including, and not limited to, mammals, birds, reptiles, amphibians, and fish. Preferred non-human animals include mammals (i.e., mouse, rat, rabbit, monkey, dog, cat, primate, pig). A patient may also include transgenic animals. In certain embodiments, a patient may be a laboratory animal raised by humans in a controlled environment other than its natural habitat.
The term "cancer," as used herein, refers to a malignant tumor (i.e., colon or prostate cancer) or growth of cells (i.e., leukaemia). Cancers tend to be less differentiated than benign tumors, grow more rapidly, show infiltration, invasion, and destruction, and may metastasize. Cancers include, and are not limited to, colon and rectal cancers, fibrosarcoma, myxosarcoma, angiosarcoma, leukaemia, squamous cell carcinoma, basal cell carcinoma, malignant melanoma, renal cell carcinoma, and hepatocellular carcinoma.
A "marker gene," as used herein, refers to any gene or gene product (i.e., protein, peptide, mRNA) that indicates a particular clinicopathological state (i.e., carcinoma, normal, dysplasia, and outcomes) or indicates a particular cell type, tissue type, or origin. The expression or lack of expression of a marker gene may indicate a particular physiological and/or diseased state of a patient, organ, tissue, or cell. Preferably, the expression or lack of expression may be determined using standard techniques such as RT-PCR, sequencing, immunochemistry, gene chip analysis, etc. In certain particular embodiments, the level of expression of a marker gene is quantifiable.
The term "polynucleotide" or "oligonucleotide," as used herein, refers to a polymer of nucleotides. Typically, a polynucleotide comprises at least three nucleotides.
The polymer may include natural nucleosides (i.e., adenosine, thymidine, guanosine, cytidine, uridine, deoxyadenosine, deoxythymidine, deoxyguanosine, and deoxycytidine), nucleoside analogs (i.e., 2-aminoadenosine, 2-thio-thymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, and 2-thiocytidine), chemically modified bases, biologically modified bases (i.e., methylated bases), intercalated bases, modified sugars (i.e., 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose), or modified phosphate groups (i.e., phosphorothioates and 5'-N-phosphoramidite linkages).
As used herein, the term "tumor" refers to an abnormal growth of cells. The growth of the cells of a tumor typically exceeds the growth of normal tissue and tends to be uncoordinated. The tumor may be benign (i.e., lipoma, fibroma, myxoma, lymphangioma, meningioma, nevus, adenoma, leiomyoma, mature teratoma, etc.) or malignant (i.e., malignant melanoma, ovarian cancer, carcinoma in situ, carcinoma, adenocarcinoma, liposarcoma, mesothelioma, squamous cell carcinoma, basal cell carcinoma, colon cancer, lung cancer, etc.).
The term "bodily fluid," as used herein, refers to a mixture of molecules obtained from a patient. Bodily fluids include, but are not limited to, exhaled breath, whole blood, blood plasma, urine, semen, saliva, lymph fluid, meningeal fluid, amniotic fluid, glandular fluid, sputum, feces, sweat, mucous, and cerebrospinal fluid. Bodily fluid also includes experimentally separated fractions of all of the preceding solutions or mixtures containing homogenized solid material, such as feces, tissues, and biopsy samples.
Computing Means
Correlating genes to clinical outcomes in accordance with the subject invention can be performed using software on a computing means. The computing means can also be responsible for maintenance of acquired data as well as the maintenance of the classifier system itself. The computing means can also detect and act upon user input via user interface means known to the skilled artisan (i.e., keyboard, interactive graphical monitors) for entering data to the computing system.
In one embodiment, the computing means further comprises means for storing and means for outputting processed data. The computing means includes any digital instrumentation capable of processing data input from the user. Such digital instrumentation, as understood by the skilled artisan, can process communicated data by applying algorithm and filter operations of the subject invention. Preferably, the digital instrumentation is a microprocessor, a personal desktop computer, a laptop, and/or a portable palm device. The computing means can be general purpose or application specific.
The subject invention can be practiced in a variety of situations. The computing means can directly or remotely connect to a central office or health care center. In one embodiment, the subject invention is practiced directly in an office or hospital. In another embodiment, the subject invention is practiced in a remote setting, for example, personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, wherein the patient is located some distance from the physician. In a related embodiment, the computing means is a custom, portable design and can be carried or attached to the health care provider in a manner similar to other portable electronic devices such as a portable radio or computer.
The computing means used in accordance with the subject invention can contain at least one user-interface device including, but not limited to, a keyboard, stylus, microphone, mouse, speaker, monitor, and printer. Additional user-interface devices contemplated herein include touch screens, strip recorders, joysticks, and rollerballs. Preferably, the computing means comprises a central processing unit (CPU) having sufficient processing power to perform algorithm operations in accordance with the subject invention. The algorithm operations, including the microarray analysis operations (such as SAM or binary classification), can be embodied in the form of computer processor usable media, such as floppy diskettes, CD-ROMS, zip drives, nonvolatile memory, or any other computer-readable storage medium, wherein the computer program code is loaded into and executed by the computing means. Optionally, the operational algorithms of the subject invention can be programmed directly onto the CPU using any appropriate programming language, preferably using the C programming language. In certain embodiments, the computing means comprises a memory capacity sufficiently large to perform algorithm operations in accordance with the subject invention. The memory capacity of the invention can support loading a computer program code via a computer-readable storage media, wherein the program contains the source code to perform the operational algorithms of the subject invention. Optionally, the memory capacity can support directly programming the CPU to perform the operational algorithms of the subject invention. A standard bus configuration can transmit data between the CPU, memory, ports and any communication devices. 
In addition, as understood by the skilled artisan, the memory capacity of the computing means can be expanded with additional hardware and with saving data directly onto external mediums including, for example, without limitation, floppy diskettes, zip drives, non-volatile memory and CD-ROMs. Further, the computing means can also include the necessary software and hardware to receive, route and transfer data to a remote location. In one embodiment, the patient is hospitalized, and clinical data generated by a computing means is transmitted to a central location, for example, a monitoring station or to a specialized physician located in a different locale. In another embodiment, the patient is in remote communication with the health care provider. For example, patients can be located at personal residences, mobile clinics, vessels at sea, rural villages and towns without direct access to healthcare, and ambulances, and by using the classifier system of the invention, still provide clinical data to the health care provider. Advantageously, mobile stations, such as ambulances, and mobile clinics, can monitor patient health by using a portable computing means of the subject invention when transporting and/or treating a patient. To ensure patient privacy, security measures, such as encryption software and firewalls, can be employed. Optionally, clinical data can be transmitted as unprocessed or "raw" signal(s) and/or as processed signal(s). Advantageously, transmitting raw signals allows any software upgrades to occur at the remote location where a computing means is located. In addition, both historical clinical data and real-time clinical data can be transmitted. Communication devices such as wireless interfaces, cable modems, satellite links, microwave relays, and traditional telephonic modems can transfer clinical data from a computing means to a healthcare provider via a network. 
Networks available for transmission of clinical data include, but are not limited to, local area networks, intranets and the open internet. A browser interface, for example, NETSCAPE NAVIGATOR or INTERNET EXPLORER, can be incorporated into communications software to view the transmitted data. Advantageously, a browser or network interface is incorporated into the processing device to allow the user to view the processed data in a graphical user interface device, for example, a monitor. The results of algorithm operations of the subject invention can be displayed in the form of interactive graphics.
Dukes' Staging as a Classifier
Since Dukes' staging describes the survival of a population of patients, rather than an individual, any individual patient can be classified as alive or dead using the survivorship of the population to predict that of the individual. In other words, if the survival of a Dukes C population is 55% at 36 months of follow up, the Dukes C individual patient would be classified as alive at 36 months but with only a 55% accuracy rate. By making these assumptions, the accuracy of staging by a microarray classifier of the subject invention can be compared to that of a clinical staging system.
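The reasoning above can be illustrated with a minimal Python sketch. It assumes each stage is assigned the majority outcome of its population at 36 months; the function names and the survival fractions for stages B and D are hypothetical, and only the 55% figure for Dukes C comes from the text.

```python
def stage_as_classifier(stage_survival, threshold=0.5):
    """Assign every stage the majority outcome of its population."""
    return {stage: ("alive" if fraction >= threshold else "dead")
            for stage, fraction in stage_survival.items()}


def accuracy(predictions, patients):
    """Fraction of (stage, observed_outcome) pairs predicted correctly."""
    hits = sum(1 for stage, outcome in patients if predictions[stage] == outcome)
    return hits / len(patients)


# Survival fractions at 36 months. Only the Dukes C value (0.55) is taken
# from the text; the B and D values are made up for illustration.
stage_survival = {"B": 0.80, "C": 0.55, "D": 0.10}
predictions = stage_as_classifier(stage_survival)

# A hypothetical Dukes C cohort in which 11 of 20 patients are alive at 36 months.
cohort_c = [("C", "alive")] * 11 + [("C", "dead")] * 9
cohort_c_accuracy = accuracy(predictions, cohort_c)
```

Under these assumptions every Dukes C patient is predicted alive, so the accuracy on the cohort equals its survival fraction, which is the point made in the paragraph above.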
Identification of Prognosis-Related Genes
As a first step in the survival analysis of microarray data, genes that best separate cancer patients with poor and good prognosis were identified. Censored-survival analysis using significance analysis of microarrays (SAM) or any other microarray analysis (i.e., clustering methods such as those disclosed by Eisen et al, "Cluster analysis and display of genome-wide expression patterns," Proc. Natl Acad. Sci. USA, 95:14863-14868 (1998); Alon et al, "Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays," Proc. Natl Acad. Sci. USA, 96:6745-6750 (1999); and Ben-Dor et al, "Tissue classification with gene expression profiles," J. Comput. Biol, 7:559-583 (2000); classification trees such as those disclosed by Dubitzky et al, "A database system for comparative genomic hybridization analysis," IEEE Eng Med Biol Mag, 20(4):75-83 (2001); genetic algorithms such as those disclosed by Li et al, "Computational analysis of leukemia microarray expression data using the GA/KNN," in Methods of Microarray Data Analysis, Kluwer Academic Publishers (2001); neural networks such as those disclosed by Hwang et al, "Applying machine learning techniques to analysis of gene expression data: cancer diagnosis," in Methods of Microarray Data Analysis, Kluwer Academic Publishers (2001); and the "Neighborhood Analysis" (a weighted correlation method) as disclosed by Golub et al, "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring," Science, 286:531-537 (1999)) can be used to select genes correlated with prognosis in accordance with the subject invention. Using SAM or any other microarray analysis, genes can be selected that most closely correlate with selected survival times. Permutation analysis can then be used to estimate the false discovery rate (FDR).
The resultant mean-centered gene expression vectors can then be clustered and visualized using known computer software (i.e., Cluster 3.0 and Java TreeView 1.03, both of which are described by de Hoon, M.J.L. et al, "Open Source Clustering Software," Bioinformatics 2003, in press).
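As an illustration of the permutation step and of mean-centering, the following Python sketch estimates an FDR for a two-class comparison. It is a simplified stand-in and not the SAM procedure itself: it uses a Welch-style t-statistic with a small constant `s0` added to the denominator (a hypothetical value loosely mirroring SAM's fudge factor), and it takes the median count of genes called under shuffled labels as the expected number of false calls. All function names and the toy data are assumptions.

```python
import random
import statistics


def mean_center(row):
    """Subtract a gene's mean expression from each of its values."""
    mu = statistics.mean(row)
    return [v - mu for v in row]


def t_stat(values, labels, s0=1e-6):
    """Welch-style statistic between label-1 and label-0 samples (s0 is assumed)."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / (se + s0)


def estimate_fdr(expr, labels, cutoff, n_perm=200, seed=0):
    """Return (genes called, estimated FDR) for an absolute-statistic cutoff."""
    rng = random.Random(seed)
    called = sum(1 for row in expr if abs(t_stat(row, labels)) >= cutoff)
    if called == 0:
        return 0, 0.0
    shuffled = list(labels)
    false_counts = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real link between genes and outcome
        false_counts.append(
            sum(1 for row in expr if abs(t_stat(row, shuffled)) >= cutoff))
    return called, min(1.0, statistics.median(false_counts) / called)


# Toy data: 4 poor-prognosis (1) and 4 good-prognosis (0) samples, 3 genes.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = [
    [5.0, 5.1, 4.9, 5.2, 1.0, 1.1, 0.9, 1.2],  # differs strongly between groups
    [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0],  # no group difference
    [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # no group difference
]
called, fdr = estimate_fdr(expr, labels, cutoff=5.0)
centered = [mean_center(row) for row in expr]
```

In practice the cutoff would be adjusted until the estimated FDR meets the level chosen for the study.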
Classifier Construction and Evaluation
According to the present invention, a gene classifier can be constructed to predict a set time of outcome among a set number of patients using microarray data produced on a cDNA platform. In one embodiment, the classifier of the subject invention is produced on a computing means using SAM two-class gene selection and a support vector machine classification. In one embodiment, the SAM procedure is empirically set to select enough genes to satisfy a set FDR. Such selected genes can then be used in a linear support vector machine to classify the samples as having poor or good prognosis.
A leave-one-out cross-validation (LOOCV) operation can also be utilized to construct a classifier (i.e., neural network-based classifier) as well as to estimate the prediction accuracy of the classifier of the subject invention. In one embodiment, the classification process includes both gene selection and SVM classification creation; therefore, both steps can be performed on each training set after the test example is removed.
According to the subject invention, samples can be classified as having "good" or "poor" prognosis based on survival for a certain set amount of time. In a preferred embodiment, "good" or "poor" prognosis is based on more or less than 36 months, respectively.
By using the leave-one-out cross validation approach, the subject invention provides a means for ranking the genes selected. The number of times a particular gene is chosen can be an indicator of the usefulness of that gene for general classification and may imply biological significance.
In a preferred embodiment, the classifier of the subject invention is prepared by (1) SAM gene selection using a t-test and (2) classification using a neural network. The classifier is prepared after a test sample is left out (from the LOOCV) to avoid bias from the gene selection step. Since the classification problem is a binary decision, a t-test was used for gene selection.
Preferably, once a gene set is selected, a feed-forward back-propagation neural network system (see Rumelhart, D.E. and J.L. McClelland, "Parallel Distributed Processing: Explorations in the Microstructure of Cognition," Cambridge, MA: MIT Press (1986); and Fahlman, S.E., "Faster-Learning Variations on Back-Propagation: An Empirical Study," Proceedings of the 1988 Connectionist Models Summer School, Los Altos, CA: Morgan-Kaufmann (1988)) is used. In one embodiment, a feed-forward back-propagation neural network with a single layer of 10 units is used. Neural network systems are extremely robust to both the number of genes selected and the level of noise in these genes.
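The cross-validation scheme described in this section, in which gene selection is repeated inside every fold so that the held-out sample never influences it, can be sketched in Python as follows. This is an illustration under stated assumptions: a nearest-centroid rule stands in for the SVM or neural network classifier, a plain t-score stands in for SAM, and all names and the toy data are hypothetical.

```python
import statistics
from collections import Counter


def t_score(values, labels):
    """Absolute Welch-style t-score between good- and poor-prognosis samples."""
    good = [v for v, lab in zip(values, labels) if lab == "good"]
    poor = [v for v, lab in zip(values, labels) if lab == "poor"]
    se = (statistics.variance(good) / len(good)
          + statistics.variance(poor) / len(poor)) ** 0.5
    return abs(statistics.mean(good) - statistics.mean(poor)) / (se + 1e-9)


def select_genes(samples, labels, k):
    """Return the indices of the k genes with the highest t-score."""
    n_genes = len(samples[0])
    scores = [(t_score([s[g] for s in samples], labels), g) for g in range(n_genes)]
    return [g for _, g in sorted(scores, reverse=True)[:k]]


def nearest_centroid(train, labels, genes, test):
    """Stand-in classifier: predict the class whose centroid is closest."""
    best_label, best_dist = None, float("inf")
    for label in ("good", "poor"):
        members = [s for s, lab in zip(train, labels) if lab == label]
        centroid = [statistics.mean(m[g] for m in members) for g in genes]
        dist = sum((test[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv(samples, labels, k=1):
    """Leave-one-out CV; genes are re-selected on each training set."""
    gene_counts, correct = Counter(), 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, k)  # held-out sample excluded
        gene_counts.update(genes)
        if nearest_centroid(train, train_labels, genes, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples), gene_counts


def core_set(gene_counts, n_folds, min_fraction=0.75):
    """Genes chosen in at least min_fraction of the folds."""
    return sorted(g for g, c in gene_counts.items() if c / n_folds >= min_fraction)


# Toy data: rows are samples, columns are genes; gene 0 separates the classes.
samples = [
    [5.0, 1.0, 2.0], [5.2, 2.0, 1.0], [4.8, 1.5, 2.5], [5.1, 2.2, 1.2],
    [1.0, 1.1, 2.1], [1.2, 2.1, 1.1], [0.8, 1.4, 2.4], [1.1, 2.3, 1.3],
]
labels = ["good"] * 4 + ["poor"] * 4
loocv_accuracy, counts = loocv(samples, labels, k=1)
core_genes = core_set(counts, n_folds=len(samples))
```

Counting how often each gene is selected across folds gives the ranking described above, and the 75% threshold in `core_set` mirrors the preferred core-set criterion.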
Statistical Significance
Differences between Kaplan-Meier curves can be evaluated using the log-rank test, which is well known to the skilled statistician. This can be performed both for the initial survival analysis and for the classifier results. In accordance with the present invention, the classifier can split the samples into various groups (i.e., two groups: those predicted as good or poor prognosis). Classifier accuracy can be reported to the user both as overall accuracy and as specificity/sensitivity. In one embodiment, a McNemar's Chi-Squared test is used to compare the molecular classifier with the use of a Dukes' staging classifier. In a related embodiment, several permutations of the dataset (i.e., 1,000 permutations) are used to measure the significance of the classifier results as compared to chance.
Example 1: Human Colon Cancer Survival Classifier
Training Set Tumor Samples
In one embodiment of the subject invention, a colon cancer survival classifier was developed using 78 tumor samples, including 3 adenomas and 75 cancers. Informative frozen colorectal cancer samples were selected from the Moffitt Cancer Center Tumor Bank (Tampa, Florida) based on evidence for good (survival > 36 mo) or poor prognosis (survival < 36 mo) from the Tumor Registry. Dukes' stages can include B, C, and D. In this particular embodiment, survival was measured as last contact minus collection date for living patients, or date of death minus collection date for patients who have died. In this embodiment, the number of samples per Dukes' stage was as follows: 23 patients with stage B, 22 patients with stage C and 30 patients with stage D disease. Just as adenomas can be included to help train the classifier to recognize good prognosis patients, Dukes D patients with synchronous metastatic disease can be used to train the classifier to recognize poor prognosis patients. In a related embodiment, all samples were selected to have at least 36 months of follow-up.
The follow-up results in this embodiment showed that thirty-two of the patients survived more than 36 months, while 46 patients died within 36 months. With this particular embodiment, the median follow-up time for all 78 patients was 27.9 months. The median follow-up for the poor prognosis cases (<36 months survival) was 11.7 months and for the good prognosis cases (>36 months survival) it was 64.2 months. Since the NIH consensus conference in 1990, chemotherapeutic application in the United States has been relatively homogeneous, with nearly all Dukes stage B patients avoiding chemotherapy, and nearly all Dukes stage C patients receiving 6 months of adjuvant 5-fluorouracil (5-FU) and leucovorin.
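The survival measurement and the 36-month labelling used for the training set can be sketched in Python as follows. The function names, the average month length of 30.44 days, and the choice to return no label for a living patient with less than 36 months of follow-up are assumptions made for illustration; the text specifies only the date arithmetic and the 36-month cutoff.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # assumed average month length


def survival_months(collection, last_contact=None, death=None):
    """Death minus collection date if the patient died, else last contact minus collection."""
    end = death if death is not None else last_contact
    return (end - collection).days / DAYS_PER_MONTH


def prognosis(months, died, cutoff=36.0):
    """Label a sample 'good' or 'poor'; None when the outcome cannot be decided."""
    if months > cutoff:
        return "good"
    if died:
        return "poor"
    return None  # alive but followed for less than the cutoff (assumed handling)


# Hypothetical patients, not taken from the study data.
alive_months = survival_months(date(2000, 1, 1), last_contact=date(2004, 1, 1))
dead_months = survival_months(date(2000, 1, 1), death=date(2001, 1, 1))
alive_label = prognosis(alive_months, died=False)
dead_label = prognosis(dead_months, died=True)
```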
Test Set Tumor Samples (Denmark)
In another embodiment, eighty-eight patients with Dukes' stage B and C colorectal cancer and a minimum follow-up time of 60 months were selected for array hybridization. Ten micrograms of total RNA were used as starting material for the cDNA preparation and hybridized to Affymetrix U133A GeneChips (Santa Clara, CA) by standard protocols supplied by the manufacturer. The U133A gene chip is disclosed in U.S. Patent Nos. 5,445,934; 5,700,637; 5,744,305; 5,945,334; 6,054,270; 6,140,044; 6,261,776; 6,291,183; 6,346,413; 6,399,365; 6,420,169; 6,551,817; 6,610,482; and 6,733,977; and in European Patent Nos. 619,321 and 373,203, all of which are hereby incorporated in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
With this particular embodiment, there were 28 patients with stage B and 60 patients with stage C colorectal cancers. All Dukes' stage B patients were treated by surgical resection alone whereas all C patients received 5-FU/leucovorin adjuvant chemotherapy in addition to surgery. Colorectal tumor samples were obtained fresh from surgery and were immediately snap-frozen in liquid nitrogen but were not microdissected, with the potential for inclusion of samples with < 80% purity. Total RNA was isolated from 50-150 mg tumor sample using RNAzol (WAK-Chemie Medical) or using spin column technology (Sigma) according to the manufacturer's instructions. Results were noted (i.e., fifty-seven of the patients survived more than 36 months, while 31 died within 36 months).
32K cDNA Array Hybridization and Scanning According to the subject invention, samples can be microdissected (>80% tumor cells) by frozen section guidance and RNA extraction performed using Trizol followed by secondary purification on RNeasy columns. The samples can then be profiled on cDNA arrays (i.e., TIGR's 32,488-element spotted cDNA arrays, containing 31,872 human cDNAs representing 30,849 distinct transcripts - 23,936 unique TIGR TCs and 6,913 ESTs, 10 exogenous controls printed 36 times, and 4 negative controls printed 36-72 times). In one embodiment, tumor samples are co-hybridized with a common reference pool in the Cy5 channel for normalization purposes. cDNA synthesis, aminoallyl labeling and hybridizations can be performed according to previously published protocols (see Hegde, P. et al, "A concise guide to cDNA microarray analysis," Biotechniques; 29:552-562 (2000) and Yang, IV, et al, "Within the fold: assessing differential expression measures and reproducibility in microarray assays," Genome Biol; 3:research0062 (2002)). For example, labeled first-strand cDNA is prepared and co-hybridized with labeled samples prepared from a universal reference RNA consisting of equimolar quantities of total RNA derived from three cell lines, CaCO2 (colon), KM12L4A (colon), and U118MG (brain). Detailed protocols and description of the array are available at <http://cancer.tigr.org>. Array probes are identified and local background can be subtracted in Spotfinder (Saeed, A.I. et al, "TM4: a free, open-source system for microarray data management and analysis," Biotechniques; 34:374-8 (2003)). Individual arrays can be normalized in MIDAS (see Saeed, A.I. ibid.) using LOWESS (an algorithm known to the skilled artisan for use in normalizing data) with smoothing parameter set to 0.33.
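The LOWESS normalization step mentioned above can be illustrated with a minimal sketch. It is a simplified local linear fit with tricube weights and no robustness iterations, not the MIDAS implementation; the function names, the toy intensities, and the choice to regress the log ratio on the mean log intensity are assumptions made for illustration only. Only the smoothing parameter of 0.33 comes from the text.

```python
from math import log2


def lowess_fit(x, y, frac=0.33):
    """Simplified LOWESS: local linear regression with tricube weights.

    `frac` is the fraction of points used for each local fit (0.33 in the text).
    No robustness iterations are performed (an assumption of this sketch).
    """
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [max(0.0, 1 - (abs(xi - x0) / h) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def normalize_spots(sample, reference, frac=0.33):
    """Subtract the intensity-dependent trend from per-spot log2 ratios."""
    m = [log2(s / r) for s, r in zip(sample, reference)]        # log ratio
    a = [0.5 * log2(s * r) for s, r in zip(sample, reference)]  # mean log intensity
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


# Hypothetical spot intensities: the sample channel is uniformly twice the reference.
reference = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
sample = [2 * r for r in reference]
normalized = normalize_spots(sample, reference)
```

In this toy case the uniform two-fold dye bias is removed entirely, so every normalized log ratio is approximately zero.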
Microarray Hybridization and Scanning of Denmark Samples The first and second strand cDNA synthesis can be performed using the Superscript II System (Invitrogen) according to the manufacturer's instructions except using an oligo-dT primer containing a T7 RNA polymerase promoter site. Labeled cRNA is prepared using the BioArray High Yield RNA Transcript Labeling Kit (Enzo). Biotin labeled CTP and UTP (Enzo) are used in the reaction together with unlabeled NTPs. Following the IVT reaction, the unincorporated nucleotides are removed using RNeasy columns (Qiagen). Fifteen micrograms of cRNA are fragmented at 94°C for 35 min in a fragmentation buffer containing 40 mM Tris-acetate pH 8.1, 100 mM KOAc, 30 mM MgOAc. Prior to hybridization, the fragmented cRNA in a 6xSSPE-T hybridization buffer (1 M NaCl, 10 mM Tris pH 7.6, 0.005% Triton) is heated to 95°C for 5 min and subsequently to 45°C for 5 min before loading onto the Affymetrix HG_U133A probe array cartridge. The probe array is then incubated for 16 h at 45°C at constant rotation (60 rpm). The washing and staining procedure can be performed in an Affymetrix Fluidics Station. The probe array can be exposed to several washes (i.e., 10 washes in 6xSSPE-T at 25°C followed by 4 washes in 0.5xSSPE-T at 50°C). The biotinylated cRNA can then be stained with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, OR) in 6xSSPE-T for 30 min at 25°C followed by 10 washes in 6xSSPE-T at 25°C. An antibody amplification step can then follow, using normal goat IgG as blocking reagent, final concentration 0.1 mg/ml (Sigma) and biotinylated anti-streptavidin antibody (goat), final concentration 3 mg/ml (Vector Laboratories). This can be followed by a staining step with a streptavidin-phycoerythrin conjugate, final concentration 2 mg/ml (Molecular Probes, Eugene, OR) in 6xSSPE-T for 30 min at 25°C and 10 washes in 6xSSPE-T at 25°C.
The probe arrays are scanned (i.e., at 560 nm using a confocal laser-scanning microscope (Hewlett Packard GeneArray Scanner G2500A)). The readings from the quantitative scanning can then be analyzed by the Affymetrix Gene Expression Analysis Software (MAS 5.0) and normalized to a common mean expression value of 150.
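As a rough illustration of normalizing each chip to a common mean expression value of 150, the sketch below rescales a list of signal values linearly. Using the plain arithmetic mean is an assumption of this sketch (the vendor software may compute its scaling factor differently, for example from a trimmed mean), and the function name and toy values are hypothetical.

```python
def scale_to_target_mean(intensities, target=150.0):
    """Linearly rescale one chip's signal values so their mean equals `target`."""
    if not intensities:
        raise ValueError("no intensities to scale")
    mean_intensity = sum(intensities) / len(intensities)
    if mean_intensity <= 0:
        raise ValueError("mean intensity must be positive")
    factor = target / mean_intensity  # one scaling factor per chip
    return [value * factor for value in intensities]


# Hypothetical signal values for a four-probe-set chip (mean 100).
chip = [40.0, 80.0, 120.0, 160.0]
scaled = scale_to_target_mean(chip)
```

Because a single factor is applied to every probe set on a chip, ratios between probe sets within the chip are unchanged; only the overall brightness is brought to a common level.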
Survival Analysis The first analysis of the colon cancer survival data can be performed using censored survival time (in months) and 500 permutations. Significance analysis of microarrays (SAM) can then be used to select genes most closely correlated to survival. The subset of genes that correspond to an empirically derived, estimated false discovery rate (FDR) is then chosen. This subset of genes can then be used in subsequent analyses. In one embodiment, Cluster 3.0 and Java TreeView 1.03 are used to cluster and visualize the SAM-selected genes. A hierarchical clustering algorithm can be chosen, with complete linkage and the correlation coefficient (i.e., Pearson correlation coefficient) as the similarity metric. In another embodiment, the Dukes' staging clusters are manually created in the appropriate format. The clustering software produces heatmaps (see Figures 1A and 1B) and dendrograms. The highest level partition of the SAM-selected genes can then be chosen as a survival grouping. Given two clusters of survival times, Kaplan-Meier curves can be plotted (see Figures 2A and 2B).
Identification of Prognosis-Related Genes According to the subject invention, SAM survival analysis can be used to identify a set of genes most correlated with censored survival time using the training set tumor samples. In one embodiment, a set of 53 genes was found, corresponding to a median expected false discovery rate (FDR) of 28%. These genes are listed in the following Table 1, wherein genes denoted with (+) indicate a positive correlation to survival time and genes without the (+) notation indicate a negative correlation to survival time (overexpression in poor prognosis cases). Included in this list of genes in Table 1 are several genes believed to be biologically significant, such as osteopontin and neuregulin.
Figure imgf000023_0001
Figure imgf000024_0001
Figure imgf000025_0001
Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 1 are hereby incorporated by reference. Figure 1A presents a graphical representation of the 53 SAM-selected genes (as described above) as a clustered heat map. The red color represents over-expressed genes relative to green, under-expressed genes. Figure 1A shows only the Dukes' stage B and C cases, whose outcome Dukes' staging predicts poorly. Since only genes correlated with survival are used in clustering, the distinctly illustrated clusters in the heatmap correspond to very different prognosis groups. The 53 SAM-selected genes were also arranged by annotated Dukes' stage in Figure 1B. Unlike Figure 1A, where two gene groups were apparent, there was no discernible gene expression grouping when arranged by Dukes' stage. Figure 2A shows the Kaplan-Meier plot for two dominant clusters of genes correlated with stage B and C test set tumor samples. Clearly, these genes separated the cases into two distinct clusters of patients with good prognosis (cluster 2) and poor prognosis (cluster 1) (P<0.001 using a log rank test). Figure 2B presents a Kaplan-Meier plot of the survival times of Dukes' stage B and C tumors grouped by stage, showing no statistically significant difference. As illustrated in Figures 1A, 1B, 2A, and 2B, gene expression profiles separate good and poor prognosis cases better than Dukes' staging. This suggests that a gene-expression based classifier, as provided by the present invention, is more accurate at predicting patient prognosis than the traditional Dukes' staging.
Dukes' Staging as a Prognosis Classifier As noted above, Dukes' staging provides only a probability of survival for each member of a population of patients, based on historical statistics. Accordingly, the prognosis of an individual patient can be predicted based on historical outcome probabilities of the associated Dukes' stage.
For example, if a Dukes' C survival rate was 55% at 36 months of follow up, any individual Dukes' C patient would be classified as having a good prognosis since more than 50% of patients would be predicted to be alive.
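The majority rule described in this example can be written down as a short sketch. The function name and the lookup table are hypothetical; the 55% figure for stage C is the one used in the example above, while the other rates are placeholder values included only so the table has more than one entry.

```python
def dukes_prognosis(stage, survival_rate_by_stage, threshold=0.5):
    """Assign every patient of a stage the majority outcome for that stage.

    A stage whose historical survival rate at the follow-up point exceeds
    `threshold` is labelled "good" for all of its patients, otherwise "poor".
    """
    rate = survival_rate_by_stage[stage]
    return "good" if rate > threshold else "poor"


# Illustrative survival rates at 36 months (only the stage C value is from the example).
historical_rates = {"B": 0.69, "C": 0.55, "D": 0.20}

stage_c_call = dukes_prognosis("C", historical_rates)
stage_d_call = dukes_prognosis("D", historical_rates)
```

Under this rule every stage C patient receives the same "good" label, which is why a stage-based classifier cannot identify any poor prognosis case within a stage whose survival rate is above 50%.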
Performance of a Colorectal Cancer Survival Classifier of the present invention as compared to Dukes' Staging In order to determine the value of the human colon cancer prognosis/survival classifier of the subject invention, a classifier of the invention was compared to the Dukes' clinical staging approach currently in widespread use. In an initial set of 78 tumors (from the test set tumor samples described above), a classifier (Classifier A) of the present invention predicted 100%, 69%, 55% and 20% for Adenomas, and Dukes' stages B, C and D cancers, respectively. The overall accuracy was 77% (63% sensitivity/97% specificity). Using leave-one-out cross-validation (LOOCV), Classifier A was evaluated in predicting prognosis for each patient at 36 months follow-up as compared to Dukes' staging predictions. The results of LOOCV demonstrated that Classifier A of the subject invention was 90% accurate (93% sensitivity/84% specificity) in predicting the correct prognosis for each patient at 36 months of follow-up. A log-rank test of the two predicted groups (good and poor prognosis) was significant (P<0.001), demonstrating the ability of Classifier A to distinguish the two outcomes (Figure 2A). Permutation analysis demonstrates the result is better than possible by chance (P<0.001, 1000 permutations). This result is also significantly higher than that observed using Dukes' staging as a classifier (77%) for the same group of patients (P=0.03878). The results for both Dukes' staging and molecular staging are summarized in Tables 2A-2C below. Shown first in Table 2A are the relative accuracies of Dukes' staging and the cDNA classifier (molecular staging) for all tumors and then a comparison by Dukes' stage. As shown in Table 2B, Dukes' staging was particularly bad at predicting outcome for patients with poor prognosis (70% and 55% for all stages and B and C, respectively).
In contrast, molecular staging, as provided by the present invention, identified the good prognosis cases (the "default" classification using Dukes' staging), but also identified poor prognosis cases with a high degree of accuracy, Table 2C. Tables 2A-2C also show the detailed confusion matrix for all samples in the dataset, showing the equivalent misclassification rate of both good and poor prognosis groups by the classifier of the subject invention.
Figure imgf000027_0001
Figure imgf000028_0001
Figure imgf000028_0002
* Dukes' staging vs. cDNA Classifier, P=0.03878, one-sided McNemar's test.
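For readers unfamiliar with the test named in this footnote, the sketch below shows one common form of a one-sided McNemar's test, the exact binomial version computed on the discordant pairs. The text does not state which variant was used or what the discordant counts were, so the function name and the counts in the example are hypothetical and the sketch is not expected to reproduce the P value quoted above.

```python
from math import comb


def mcnemar_exact_one_sided(only_first_correct, only_second_correct):
    """Exact one-sided McNemar P value.

    only_first_correct:  patients classified correctly by method 1 only
    only_second_correct: patients classified correctly by method 2 only
    Tests whether method 1 is correct more often than method 2, using a
    Binomial(n, 0.5) distribution over the n discordant pairs.
    """
    n = only_first_correct + only_second_correct
    if n == 0:
        return 1.0  # no discordant pairs, no evidence either way
    tail = sum(comb(n, k) for k in range(only_first_correct, n + 1))
    return tail / 2 ** n


# Hypothetical discordant counts: 5 patients correct only by the first method, 0 only by the second.
p_value = mcnemar_exact_one_sided(5, 0)
```

Patients that both methods classify correctly, or both classify incorrectly, carry no information about which method is better and are therefore ignored by the test.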
Classifier Construction A leave-one-out cross-validation (LOOCV) technique can be utilized for evaluating the performance of a classifier construction method of the subject invention. This approach tends towards high variance in accuracy estimates, but with low bias. Within each step of the leave-one-out cross-validation (or fold), a classifier of the subject invention can be created on all available training data, then tested for accuracy by classifying the left-out example. In one embodiment, a classifier was constructed in two steps: first a gene selection procedure was performed with SAM and then a support vector machine was constructed. In a related embodiment, the gene selection approach used was a univariate selection. SAM (significance analysis of microarrays) was the method chosen for selecting genes. Since gene selection was to be based on two classes (good vs. poor prognosis), the two-class SAM method can be used for selecting genes with the best d values. SAM calculates false discovery rates empirically through the use of permutation analysis. SAM provides an estimate of the false discovery rate (FDR) along with a list of genes considered significant relative to censored survival. This feature of SAM was used with this particular embodiment to select the number of genes that resulted in the smallest FDR possible. In one embodiment, this FDR was zero. The set of 53 genes (significant genes, as described above) at an FDR of 28% was used in this particular embodiment. Using this subset of 53 genes, the samples were clustered as a way of visualizing the SAM results (see Figures 1A and 1B). Once the genes were selected using the SAM method, a linear support vector machine (SVM) was constructed. The software used for this approach can be implemented in the Weka machine learning toolkit. A linear SVM was chosen to reduce the potential for overfitting the data, given the small sample sizes and large dimensionality.
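The structure of the cross-validation procedure described above, with gene selection repeated inside every fold, can be sketched as follows. This is not the SAM/SVM pipeline itself: a simple t-like score stands in for SAM and a nearest-centroid rule stands in for the linear SVM, purely to keep the example self-contained. All function names and the toy data are hypothetical. The sketch also counts how often each gene is selected across folds, which is one way a core gene set could be read off.

```python
from collections import Counter
from statistics import mean, pstdev


def gene_scores(X, y):
    """Absolute t-like score per gene between good (1) and poor (0) prognosis."""
    scores = []
    for g in range(len(X[0])):
        good = [row[g] for row, label in zip(X, y) if label == 1]
        poor = [row[g] for row, label in zip(X, y) if label == 0]
        spread = (pstdev(good) ** 2 / len(good) + pstdev(poor) ** 2 / len(poor)) ** 0.5
        scores.append(abs(mean(good) - mean(poor)) / (spread + 1e-9))
    return scores


def loocv(X, y, top_k, min_fraction=0.75):
    """Leave-one-out CV with gene selection redone inside every fold."""
    hits, selected = 0, Counter()
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # Gene selection uses the training samples only, never the left-out one.
        scores = gene_scores(train_X, train_y)
        genes = sorted(range(len(scores)), key=lambda g: -scores[g])[:top_k]
        selected.update(genes)
        # Nearest-centroid classifier (stand-in for the linear SVM).
        centroids = {}
        for label in (0, 1):
            rows = [r for r, l in zip(train_X, train_y) if l == label]
            centroids[label] = [mean(r[g] for r in rows) for g in genes]
        distance = {
            label: sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroid))
            for label, centroid in centroids.items()
        }
        predicted = min(distance, key=distance.get)
        hits += predicted == y[i]
    core = sorted(g for g, n in selected.items() if n >= min_fraction * len(X))
    return hits / len(X), core


# Toy expression matrix: 6 samples x 3 genes; only gene 0 separates the classes.
X = [[5.0, 1.0, 0.2], [5.2, 0.9, 0.8], [4.9, 1.1, 0.5],
     [1.0, 1.0, 0.6], [1.2, 1.05, 0.1], [0.9, 0.95, 0.7]]
y = [1, 1, 1, 0, 0, 0]
accuracy, core_genes = loocv(X, y, top_k=1)
```

Repeating the selection inside each fold keeps the left-out sample from influencing which genes are chosen, which would otherwise inflate the accuracy estimate.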
One further advantage of this approach is the transparency of the constructed model, which is of particular interest when comparing the classifier of the subject invention on two different platforms (see below). In another embodiment, using LOOCV via statistical analytic tools for comparing groups (i.e., parametric tests such as t-test/ANOVA; see also Dyrskjot L et al, "Identifying distinct classes of bladder carcinoma using microarrays," Nat. Genet., 33:90-6 (2003)), a list of 43 genes (from the 53 SAM selected genes as described above) was selected for use in constructing a second human colorectal cancer survival classifier, in accordance with the present invention. The list of 43 genes is provided in the following Table 3.
Figure imgf000030_0001
Figure imgf000031_0001
Figure imgf000032_0001
Figure imgf000033_0001
Figure imgf000034_0001
Figure imgf000035_0001
M denotes genes that were used to classify 75% of all tumors, and genes appearing in both the cDNA classifier and the U133A-limited cDNA classifier are marked by *. Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 3 are hereby incorporated by reference.
In yet another embodiment, a third human colorectal cancer survival classifier, in accordance with the present invention, was prepared using U133A-limited genes selected by LOOCV via statistical analytic tools (i.e., t-test). The list of U133A-limited genes selected using LOOCV via t-test is provided in the following Table 4. The named genes common to both the original classifier (a set of 43 genes) and the U133A-limited classifier are marked with an asterisk. Table 5 illustrates seven genes selected by SAM survival analysis, where osteopontin and neuregulin are noted to be present and in common with the gene lists for all classifiers. In Table 5, genes denoted with (+) indicate a positive correlation to survival time and genes without the (+) notation indicate a negative correlation to survival time (overexpression in poor prognosis cases).
Figure imgf000035_0002
Figure imgf000036_0001
Figure imgf000037_0001
Figure imgf000038_0001
Figure imgf000039_0001
Figure imgf000040_0001
Figure imgf000041_0001
Figure imgf000042_0001
M denotes genes used to classify 75% of all tumors, and genes appearing in both the cDNA classifier and U133A-limited cDNA classifier are marked by *. Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 4 are hereby incorporated by reference.
Figure imgf000043_0001
Any and all of the nucleotide and/or amino acid sequences associated with the accession numbers listed in Table 5 are hereby incorporated by reference.
Cross Platform Validation Systems and methods of the subject invention can be tested by applying a classifier to an immediately available, well-annotated, independent test set of colon cancer tumor samples (Denmark, as described above) run on the Affymetrix platform. Using database software such as the Resourcerer software from TIGR (see also Tsai J et al, "RESOURCERER: a database for annotating and linking microarray resources within and across species," Genome Biol, 2:software0002.1-0002.4 (2001)), genes can be mapped from the cDNA chip to a corresponding gene on the Affymetrix platform. The linkage is done by common Unigene IDs. In one embodiment, 12,951 genes (out of 32,000) were mapped to an Affymetrix U133A GeneChip. In certain instances, probes on the cDNA chip are unknown expressed sequence tag markers (ESTs) which can reduce the number of usable genes identified. Thus, a classifier of the subject invention can address this lack of correspondence in platforms. Accordingly, in a related embodiment, a U133A-limited cDNA classifier was constructed in accordance with the subject invention by using the identical approach on this reduced set of overlapping genes. With the U133A-limited cDNA classifier, only those cDNA probes are chosen that (according to Resourcerer) mapped to an Affymetrix probe set. This approach enables cross-platform comparison. For example, the training set samples were used together with the test set tumor samples in a flip-dye design. The end expression value from a cDNA probe is then the log2 of the training set to test set sample ratio. This same reference RNA was used on two U133A Affymetrix chips. Once the U133A-limited cDNA classifier was constructed, a linear scaling factor based on the expression of a common training set (H. Lee Moffitt Cancer Center & Research Institute, Tampa, Florida) sample applied to both the cDNA microarrays and the U133A GeneChips, was applied equally to all Affymetrix samples (training set as well as test set samples from Denmark). Using this assumption, the U133A chip value corresponding to a cDNA probe is the ratio of training set to test set sample (on U133A chips). Each of the Affymetrix U133A arrays (both the test set and the reference samples) was scaled to a constant average intensity (150) prior to taking the ratio and the test sample chip values were averaged. The results of a full LOOCV for the U133A-limited classifier on the test set sample (Moffitt Cancer Center cDNA microarray data set; original 78 samples) are shown in Tables 6A-6C. The accuracy of the U133A-limited classifier was 72% (80% sensitivity/59% specificity), which contrasted with the original cDNA classifier results (90%, P=0.001154). Many ESTs were selected both in the SAM survival analysis and in the original cDNA-based classifier, indicating unknown genes (ESTs) may be very important to colorectal cancer outcome. The U133A-limited classifier was not significantly different, however, from the Dukes' staging (77%, P=0.4862 using a two-sided McNemar's test), and still significantly discriminated the two groups, as can be seen in Figure 3B (P<0.001). Figures 3A through 3C illustrate survival curves for molecular classifiers in accordance with the subject invention. Specifically, Figure 3A illustrates the survival curve for a cDNA classifier of the subject invention on the 78 training set samples (LOOCV); Figure 3B illustrates the survival curve for the U133A-limited cDNA classifier (LOOCV); and Figure 3C illustrates the survival curve for an independent test set classification (Denmark test set sample). A large difference in sensitivity can be seen between the Dukes' method and the classifier (Tables 6A-6C).
The confusion matrix and accuracy rates by Dukes' stage are also presented in Tables 6A-6C.
Figure imgf000045_0001
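The cross-platform conversion described in this section, in which Affymetrix signals are turned into log ratios comparable to the two-channel cDNA values, can be sketched as below. This reflects one reading of the text: every chip is first scaled to an average intensity of 150, the reference chips are averaged per probe set, and the log2 ratio of sample to reference is taken. The function names, the direction of the ratio, and the toy values are assumptions of the sketch.

```python
from math import log2


def scale_chip(chip, target=150.0):
    """Scale one chip so that its average intensity equals `target`."""
    factor = target * len(chip) / sum(chip)
    return [value * factor for value in chip]


def affymetrix_log_ratios(sample_chip, reference_chips, target=150.0):
    """Per-probe-set log2(sample / mean reference) after scaling every chip."""
    sample = scale_chip(sample_chip, target)
    references = [scale_chip(chip, target) for chip in reference_chips]
    # Average the scaled reference chips probe set by probe set.
    mean_reference = [sum(values) / len(values) for values in zip(*references)]
    return [log2(s / r) for s, r in zip(sample, mean_reference)]


# Hypothetical three-probe-set chips: one tumor sample and two reference hybridizations.
tumor = [100.0, 200.0, 300.0]
reference_pair = [[200.0, 200.0, 200.0], [100.0, 100.0, 100.0]]
ratios = affymetrix_log_ratios(tumor, reference_pair)
```

Scaling before taking the ratio removes overall brightness differences between chips, so the resulting log ratios depend only on relative expression within each chip.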
With respect to comparing the predictive power of a classifier of the subject invention to Dukes' staging, the U133A-limited classifier was tested on the test set of colorectal cancer samples from Denmark that were profiled on the Affymetrix U133A platform. The normalized and scaled test-set data were evaluated with the U133A-limited cDNA classifier. Because the Denmark cases included only Dukes' stages B and C, classification of outcome by Dukes' staging would predict all samples to be of good prognosis. The accuracy of the cDNA classifier was reduced from 72% in LOOCV of the training set (Tables 6A-6C) to 68% in the Denmark cross-platform test set (Tables 7A-7C). A diminished accuracy (4%) was expected due to the limitations imposed by cross-platform analyses; however, this reduction was very small compared to that caused by limiting the classifier gene set to U133A content. This result is not significantly different from that achieved by classification using Dukes' staging (64%, P=0.7194 using a two-sided McNemar's test) and is better than other reported results (47%) (see Sorlie T et al, "Repeated observation of breast tumor subtypes in independent gene expression data sets," Proc Natl Acad Sci U S A, 100:8418-23 (2003)) for cross-platform analyses where scaling was required. Moreover, the classifier of the subject invention was able to predict the outcome for poor prognosis patients (sensitivity) with an accuracy of 55% whereas 0% would be predicted correctly by Dukes' staging.
Figure imgf000046_0001
Figure imgf000046_0002
Figure imgf000047_0001
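The accuracy, sensitivity and specificity figures quoted for the Denmark test set follow from a two-by-two confusion matrix. The sketch below uses the convention of this document, in which sensitivity is the fraction of poor prognosis cases identified correctly; the function name is hypothetical. The worked example uses the counts stated earlier for the Denmark samples (31 patients died within 36 months and 57 survived longer) together with the observation that Dukes' staging would call every stage B and C case good prognosis.

```python
def summarize_confusion(poor_called_poor, poor_called_good,
                        good_called_good, good_called_poor):
    """Accuracy, sensitivity (poor prognosis) and specificity (good prognosis)."""
    poor_total = poor_called_poor + poor_called_good
    good_total = good_called_good + good_called_poor
    total = poor_total + good_total
    return {
        "accuracy": (poor_called_poor + good_called_good) / total,
        "sensitivity": poor_called_poor / poor_total if poor_total else None,
        "specificity": good_called_good / good_total if good_total else None,
    }


# Dukes' staging on the Denmark set: all 88 stage B and C patients are called "good".
dukes_denmark = summarize_confusion(
    poor_called_poor=0, poor_called_good=31,
    good_called_good=57, good_called_poor=0,
)
```

The resulting accuracy is 57/88, in line with the roughly 64% quoted above for Dukes' staging, while the sensitivity is zero because no poor prognosis case is ever predicted.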
The present invention provides a colon cancer clinical classifier with significant accuracy in LOOCV that exceeds that of Dukes' staging. The utility of the classifier of the subject invention can be validated, such as against an independent colon cancer population using a completely different microarray platform. The gene classifier of the subject invention can be based on a core set of genes that have biological significance for any type of cancer, including human colon cancer progression.
Application of Prognosis Classifier with Therapy The benefit of adjuvant chemotherapy for colorectal cancer appears limited to patients with Dukes' stage C disease where the cancer has metastasized to lymph nodes at the time of diagnosis. For this reason, the clinicopathological Dukes' staging system is critical for determining how adjuvant therapy is administered. Unfortunately, as noted above, Dukes' staging is not very accurate in predicting overall survival and thus its application likely results in the treatment of a large number of patients to benefit an unknown few. Alternatively, there are a number of patients who would benefit from therapy that do not receive it based on the Dukes' staging system. Accordingly, an important contribution of the prognosis/survival classifier of the present invention is the ability to identify those Dukes' stage B and C cases for which chemotherapy may be beneficial. The molecular staging/classifier of the subject invention provides more accurate predictions of patient outcome than is currently possible with clinical staging systems, which may, in fact, misclassify patients. In accordance with the present invention, a set of genes is derived from a genome wide analysis of gene expression using known microarray analysis techniques (i.e., SAM). By clustering groups of patients with good and bad prognoses, it is illustrated that the prognosis/classifier of the subject invention presents outcome-rich information. In a further aspect of the present invention, a supervised learning analysis can be used to identify a core set of informative genes. In a preferred embodiment, a core set of 43 genes was identified that appeared in 75% of the cross validation iterations and accurately predicted colorectal cancer survival. This core set was derived from a 32,000-element cDNA microarray that included both named and unnamed genes.
This gene set was highly accurate in predicting survival when compared with Dukes' staging data from the same patients. A means for validating a prognosis/survival classifier is provided by the present invention. In one embodiment, to validate a cDNA-based classifier for human colorectal cancer, a normalized and scaled oligonucleotide-based colorectal cancer database from Denmark was evaluated based on the Affymetrix U133A GeneChip™. In a related embodiment, a colorectal cancer classifier (U133A-based cDNA classifier) was produced on the training data set using a limited set of genes common to both the U133A and the cDNA microarray (for 78 genes). The U133A-based cDNA classifier was then applied directly to the normalized and scaled Denmark test population. In addition to identifying those patients for whom therapy is most beneficial, the classifier of the subject invention can identify those genes that are most biologically significant based on their frequency of appearance in the classification set. In one embodiment, those genes that are most biologically significant to colorectal cancer were identified using the classifier provided in Example 1. Specifically, osteopontin and neuregulin have reported biological significance in the context of colorectal cancer. Osteopontin, a secreted glycoprotein and ligand for CD44 and αvβ3, appears to have a number of biological functions associated with cellular adhesion, invasion, angiogenesis and apoptosis (see Fedarko NS et al, "Elevated serum bone sialoprotein and osteopontin in colon, breast, prostate, and lung cancer," Clin Cancer Res, 7:4060-6 (2001); Yeatman TJ and Chambers AF, "Osteopontin and colon cancer progression," Clin Exp Metastasis, 20:85-90 (2003)).
Using an oligonucleotide microarray platform, osteopontin was identified as a gene whose expression was strongly associated with colorectal cancer stage progression (Agrawal D et al, "Osteopontin identified as lead marker of colon cancer progression, using pooled sample expression profiling," J Natl Cancer Inst, 94:513-21 (2002)). INSIG-2, one of the 43 core classifier genes provided in Example 1, was recently identified as an osteopontin signature gene, suggesting that an osteopontin pathway may be prominent in regulating colon cancer survival. Similarly, neuregulin appeared to have biological significance in the context of colorectal cancer based on frequency of appearance in the classification set of the present invention. Neuregulin, a ligand for tyrosine kinase receptors (ERBB receptors), may have biological significance in the context of colorectal cancer where current data suggest a strong relationship between colon cancer growth and the ERBB family of receptors (Carraway KL, 3rd, et al, "Neuregulin-2, a new ligand of ErbB3/ErbB4-receptor tyrosine kinases," Nature, 387:512-6 (1997)). Neuregulin was recently identified as a prognostic gene whose expression correlated with bladder cancer recurrence (Dyrskjot L, et al, "Identifying distinct classes of bladder carcinoma using microarrays," Nat Genet, 33:90-6 (2003)). Accordingly, the identification of such genes may be significant in terms of gene therapy. For example, a therapeutic gene may be identified, which when reintroduced into tumor cells, may arrest or even prevent growth in cancer cells. Additionally, using the classifier of the present invention, a therapeutic gene may be identified that enables increased responsiveness to interventions such as radiation or chemotherapy.
Sequences
ACCESSION No. AA149253
ORIGIN 1 aatatggaca gggagtctca ttgtgtttat catatcaatt aatattacag tacatccttg 61 gtaatacaaa attgtacacc ttcatcaaat aaattaggat aaattaaacc aataaattat 121 gcaaagtctt cagaacaata gacaacaaca aaaattcaca attgaaattg cctctagcta 181 aaaaaaacaa acaaaaatca aaaattgact ttatcagttc agttattgta ctatattcaa 241 atcaaagggt ctttattaca aaaaagagct taataatgct atttacaaca tattgctaaa 301 taatataaag gcagtgtttt gtcacggttt atactatata catatgagaa atggctggga 361 caatattgag ggaagcccat gaccttttgg attcttccag gtagcgctga gaccnatccc 421 aatacatttt ttttccttag ttccaaattt gganggcgta atatngcagt tttnagaaat 481 tttccncccc ccntttttag gggggattgg atattttana aaaattccgg atggaatacg 541 gtttccccna aggagggtag cntggtt
ACCESSION No. AA775616
ORIGIN 1 tttttacatt caagataaaa gatttattca caccacaaaa agataatcac aacaaaatat 61 acactaactt aaaaaacaaa agattatagt gacataaaat gttatattct ctttttaagt 121 gggtaaaagt attttgtttg cgtctacata aatttctatt catgagagaa taacaaatat 181 taaaatacag tgatagtttg catttcttct atagaatgaa catagacata accctgaagc 241 ttttagttta cagggagtrt ccatgaagcc acaaactaaa ctaattatca aacacatcag 301 ttatttccag actcaaatag atacacattc aaccaataaa ctgagaaaga agcatttcat 361 gttctctttc attttgctat aaagcatttt ttcttttgac taaatgcaaa gtgagagatt 421 gtattttttc tccttttaat tgacctcaga agatgcacta tctaattcat gagaaatacg 481 aaatttcagg tgtttatctt cttccttact tttggggtct acaccagcat atcttcatgg 541 ctg
ACCESSION No. AA045075
ORIGIN 1 ttttttnttt tttttttttt tttttttttt tccaggaaag acagatgtta tttaccacca 61 atgaattttt atcatattta aatgaacttg aaaatgtcat tcaactcaaa tccctcaatc 121 aacttacttc agcccattct gaaacttcat attgcagcaa accagccatg tgaaagaaat 181 aaattcaat
ACCESSION No. AA425320
ORIGIN 1 ttttcaggtt gtaaatattt atatttctct cacatacaat gttgtatgag acacttgttt 61 taatatgtat ccataggatt aatactcata tggagtataa tgtggaaaag tgcagaacta 121 aagaaataag tctatccgaa aacaaaagca cacatttctc aggatttaaa aatattgcac 181 atagtaaggt tgcacagaaa ttactggctg gttttacaaa cagaatgagg tatcagtcaa 241 tctctagata aagatgagag agaggataaa ctacacacac acaaacacat aaatccatac 301 taagacctaa gagtgccaac aactaagaaa gaaatatgaa aaagctatgt taggtagcca 361 ggatttcaac actacaaaat catttttagg ctggaaccaa acacataaca atctcttggc 421 aatatttcgt taagttttca acttttttcc agcctaaatg actatgggca ataaaaccat 481 ttcctttacc ccagttctac tgtagaaagg cacagcgctg tggtaaatat caaaccattc 541 ctttctcaac
ACCESSION No. AA437223
ORIGIN 1 tttggtgaat aaactaacag ctttattaat gaaggcaaac atcagatcat tgtatgaata 61 ttatatatat atataaaaag aaatccaaac taacagcatt gtatttcaaa agtactgtac 121 ttctgtttct tttaaagaga cttgtcatct gtttttataa aacaaaatgg gtactcttct 181 cctaaaaaat cctggaaaaa tgaaatagtc aatttcaagc tgatgaattg aacacacctt 241 tctttaaatg cagactattg ctaggaagca aataaagtca agcatcagaa agaagatgta 301 tgagaaatgc atgaaagtca gagaaaaggg atgtagtgaa attactgcta atctttcccc 361 cctatattca aagaccatcc aaaactggtc tttcatacaa atataaaata actataaaga 421 gagggaattt gaaaccatac ccatctgaaa tc
ACCESSION No. AA479270
ORIGIN 1 ctctgaattc atttatttag aggtaaaaca cagccattca aaattgtgga atacaatgtc 61 tacacacaga ataaggttgg ggaattaagc tgaattgtta tattccattc acattaataa 121 atatttttaa agaagaaatt gtagatttta aaagcttcat tagacactag tgacacatac 181 aaataactaa actctcatac tgcttgattt tcaggttgaa aggttacaat aatctatata 241 tttcaattac atggcagtaa atacaaaagc attttaaaca tcttttgaac tgtgtagtat 301 actataagca ggagttt
ACCESSION No. AA486233
ORIGIN 1 caaattgaat attttattaa catggtagtt gcctttgtaa catgtgcaca cacactcgca 61 cactcagaat gatctgcctg ggggaaaaat actaaatatg cctaagggga aaatgaaaaa 121 taaaaaaatt cctgtaggtt ttcattattg taggcaatta tgtccacatc acttacaaag 181 ctattgccaa atctgtccaa ggaagcagag tttgaagtga gggctaggga caggaatctt 241 gggaaaaatt caacagtggc atagcagagc tctcaatatg agaaagctga cataatgtgg 301 acttttgctg tgaattacct ctttgcaaaa tatggggaga ggtttatcaa tgggcagaaa 361 ataagagaag gcggtgtgaa gtaggcttct gcagtcaatt ttcctcacag tattgtgcag 421 ggtcatcaag aaaatgctta gtctttctct ggaaccagtt tcagaacttt tccaattgca 481 atggtcttac cctcatctct taagggtgaa cgacccacct aagggaagtc tttaaag
ACCESSION No. AA487274
ORIGIN 1 tattactgca tatgttatat taaatttaca caatgatata taaaaacaca tactgtttat 61 attatatagt aatttaacat caacaggagt atcaacacaa gtactactca tgcacaaaac 121 atgcatatat tggtatacaa aaagcaattt tacacaatac tgtttaccaa aaattttttc 181 ttaaaaaaca gcccttccac ataggatcaa aggtccaatc tggactggat tgcactaata 241 tgttcaggtc aacgcttcgg tggcatagcg ctcagtgagc aattctggga ttggagtcat 301 gcccaagggc tacttcatta atagtga
ACCESSION No. AA488652
ORIGIN 1 tttttttttt tttgcaacgc aagggctctt tattgtcagc gagacgagca ggccaaacgg 61 gcactgaggc tccacggggc ccaggcctct ttccgtggaa gagaggcaag aggggtttca 121 ggattcagag gggtcctccg ctcacgcagc accatgcaaa tatagagcta aaaactttct 181 gaatgtctct ggcttgaaac caactgggcc aacaggttcc acaaccactc tctttttgat 241 cactgggaga caccaaaaat gctgatagag gagctggtct gagtccaccc aggccaaatt 301 cttgacaccc tcgttagagt ccaggtctgt ggtattcagt tgaaacacta ggaaatggaa 361 gacacgtcca tccgtgccca ggctctgcac caccacgggc tgctccaaga ccttggcatc 421 attcccatag aggagccggg cctgagcagg gcactgcaaa agcaaacagg atcatcttgg 481 cccgcagctg atctggttga aggcggtgtg gtcgtaaatt ggctttgtcc agtaagtaca 541 gggtatgggg ataggggtaa ggatag
ACCESSION No. AA694500
ORIGIN 1 tttgacagaa gaaacatttt taattgttct tgtcctgccc catcaccagg ggagtcccgg 61 cattgctcag gctcactgcg cttgctttcc cctgggatgt cgaggacact ttgacctcat 121 ctatgtcata gcccatgtgt ttctcagatg ccaccgccat aagatctagt gccccctggt 181 gccattggga taggcaggcc agagaggcat gggagctggg tgtgcaccag gccacagggc 241 tgtggggcat gcagccgatg gtgcagcttc aggtggatgt gctgggtgaa gcgactccgg 301 cagacactgc actggaaggg ccgggtccgg aggtgca
ACCESSION No. AA704270
ORIGIN 1 ctaaatcaag tagtgctact gaaatccagt gcctaatgga gcagatggtg gaggtcttag 61 actctggaac atttatagtg atgcttctga atgcaaaaca ccaagagtgg atttcacagg 121 ctgtgaatct gatttgattt tgatgggagt aaagcttcca ttttcactgt acttgaacca 181 caaaagaaaa aaagcatgtg tgactgacac aagctagtta agaaaaagga acatgttaaa 241 tattagtccc ataaagggaa gcagtttaaa caagtgatta tttgtttgta tcatttaaca 301 tgattatgtt tgtatacaat accaccgtttAA706226
ACCESSION No. AA709158
ORIGIN 1 tttttttcct tcaactccct ccaagttgtt tatttaataa taataaaaaa gaaatgcaca 61 cacataaacc tgaactcccc cccaccccac cctcccttac tcccagtaac tagctccaaa 121 atgaaaaaac ttcccttgtc ccacctgggg actaaattcc cacctccact gccataacac 181 tagagaaaca aaataaaaaa tatgcagcag ctcaccaccc accccacaac tgaacctcac 241 acaatcccct caaacaaaga agccaggact gggggttcac aggaatgaga ggagccctat 301 attctgaaaa gggatgagaa gagaggtgaa cacccccacc tcaaataagt gcttaacccc 361 cacacctgct ctttccttta ccaattgccc caagcctggg gaatcaggga aatttgaaac 421 agt
ACCESSION No. AA775616
ORIGIN 1 tttttacatt caagataaaa gatttattca caccacaaaa agataatcac aacaaaatat 61 acactaactt aaaaaacaaa agattatagt gacataaaat gttatattct ctttttaagt 121 gggtaaaagt attttgtttg cgtctacata aatttctatt catgagagaa taacaaatat 181 taaaatacag tgatagtttg catttcttct atagaatgaa catagacata accctgaagc 241 ttttagttta cagggagttt ccatgaagcc acaaactaaa ctaattatca aacacatcag 301 ttatttccag actcaaatag atacacattc aaccaataaa ctgagaaaga agcatttcat 361 gttctctttc attttgctat aaagcatttt ttcttttgac taaatgcaaa gtgagagatt 421 gtattttttc tccttttaat tgacctcaga agatgcacta tctaattcat gagaaatacg 481 aaatttcagg tgtttatctt cttccttact tttggggtct acaccagcat atcttcatgg 541 ctg
ACCESSION No. AA777892
ORIGIN 1 cagcttgcat cataagtttt attcccgatg cgggacagat ctttccatcc ctcaaatgta 61 ttacatgtcg ccacggaagg gcttaggatg ctgctcccat ctccaggaaa gatgagaaaa 121 aggtacagac tgggagccag tccaggacca ttctgcagtt cctggctctc ttaccctccc 181 ttctcagcag aggaattatc tctcatccat tcagttaaaa agaaaaaaaa aaaaatcatt 241 aacaaaacaa aacacacctt aagtattggg caggggtgtt cttgtcctca gtaggacgtc 301 aagttctggg tcaccaatgg tgattttttt tgtttttgtt ttttgtcatt tttgtttgtt 361 attttttttt tttnnatttg ttagttatgg ntagcagttg tgtgtccacc tcatctgcag 421 gcagctgcac atagcggacg actgagcccc tgatgaagca gttcttgact gataacatgt 481 gagggtattt ctcagggtct gtgacactga tgtcggttag tttgatattg aggtactggt 541 ccacagagtg gagggttcca cagatgctca ggtcattctt gagttccacg actacatacc 601 ttgccacaag agacttgaaa aaggagtaga agagcat
ACCESSION No. AA873159
ORIGIN 1 tttctgtagg atttttattg gtggcacctg gggccacatg gagggagtcc tcagcacagg 61 cgctggggtg tgggaaattt cagaggcccc tcctgggatg tcacccttca ggtcctcatg 121 agtcaatctt gagtttctcc ttcactttct gaaatggctc tggaaaacca ctcccgcatc 181 ttggcagaaa gttcactctg tttgatgcgg ctgatgagtt cccgagcctt gtcctccagt 241 gtgtttccaa actccttcag cttatccaag gcactggaga cgtctggggt cccctgggct 301 ggggctgggc cttccaagac gatcgacaga accaccacca ggaccgggag cgacaggaag
ACCESSION No. AA969508
ORIGIN 1 tttttttttt ttttttcact tcttcaacaa gtatttattg aacgccaact atggaccagg 61 ccctgtgctc aatgctgggt acagagtgga gactgaacca ggcatggcac ctggcctcat 121 gagcttacac tcgagtggga ggcacagtca accaacaagt aaattacaca aatggatatg 181 cagtggcaaa ttctccatga agggaaagaa cagaggcctt gtgatagagg aactccacaa 241 gtaaagtagt cgaggaaggc ctcttggacg aggcaacgtt gaagccaagg cctgagggtc 301 tgcagaactc agccatgcac agggtagggg aagagcattc ttggcaaagg gaacagcata 361 tgcaaagtg
ACCESSION No. AI203139
ORIGIN 1 ttttttgagt ttggcatgtt aatttttatc agcgacttct ggggcctagc accattcccg 61 gaagaaggga gttgtcgggc agggtcctta atgggggttg caattcttgt cttggttggg 121 aaagagccta gctgggaaca ggggtcgttt gtgtagtaac tgtattaagc
ACCESSION No. AI299969
ORIGIN 1 gcggccgcgc cggctccagg gccatttagc ccccaggagg agaatcgagc aatctttttg 61 gaagtccaga agaagctact ccttccagca ggcctaatag gatggcatct aatatttttg 121 gaccaacaga agaacctcag aacataccca agaggacaaa tcccccaggg ggtaaaggaa 181 gtggtatctt tgacgaatca acccccgtgc agactcgaca gcacctgaac ccacctggag 241 ggaagaccag cgacattttt gggtctccgg tcactgccac ttcacgcttg gcacacccaa 301 acaaacccaa ggatcatgtt ttcttatgtg aaggagaaga accaaaatcg gatcttaaag 361 ctgcaaggag catcccggct ggagcagagc caggtgagaa aggcagcgcc agaaaagcag 421 gccccgccaa ggagcag
ACCESSION No. H17364
ORIGIN 1 tttttacttg aaattaaatt tggnctctaa agttggtgta gcagcagttg atcagnactg 61 aaaaacggtt tttagtctcg gaaaaagact gattttgctt ttttataaat attattagat 121 ttattaattt ttcgtgctca atgtgtaaat tgtattataa ttcattgtga tttatttcac 181 ttttaatttg ctggtgtttt aataaatggg ggtgttactg aatctttctt cccacttcca 241 tttcttttga ccacccctta accctcaact gtgacggtag tagtattatc atttatacca 301 aagttttgca tagtccctgt tgactttgta atgttaacgg agtcataaaa gcactaggca 361 agagaaagat agaaatttgc ttttaatctt tttgcctttt attttgcaca ttatgcaaaa 421 gggaaaacat taaaggacac tttttttaag ngagtgaaac atgggnaagg catccagtgc 481 tttatgcaca ttgtnagcta atcaggccat tat
ACCESSION No. H17627
ORIGIN 1 tttttttttg ggcagatgag aaacagaatt atcatcagag tcttgctaca aacagggaaa 61 aacacaaacc aagatgacac acggacatgg tagattaaac attcctcccc accttcagga 121 tacatttaca ttgnaataaa tactgcaatc tcagcagcgg caaacaagga ggaatntagg 181 aaatgcccac ctcctcccct ctgtcttatc tgtgtgctct cttccttggg tagcaccgat 241 ctccccaggg tgctgggtga gaaacaggac aggggngaag aggtccgtgc atgctcactt 301 gcccttttgc
ACCESSION No. H19822
ORIGIN 1 gaagtcatan tatgataaac attttattac actaaaaaag tcatctgtta actgactgaa 61 ctgcaggggg accacatgtg aggttacttc agaaaaatgg catcagataa catatataga 121 tttctggcat tataaaatgg ctagattctc ccctaccttc cctcattaaa tattaatcag 181 tggcttaggt cagttctagt gggaacactt aattgctgac ttcacataaa accaggntta 241 gcctaatgtg ccaatggtat gagtccattc ctgggccatn ttcccaacag ccagaccgct 301 gtggcttgga caccggaggc aacatctggg gggcctcagt tccactcctc tgtggtnagc 361 ttgctttccc aataactggc tntggagtca catcaacaat ggtggcattn catctggggn 421 ccacatgagc cctttggggg tgctgcatcc ctactng
ACCESSION No. H23551
ORIGIN 1 ttttttttta tgcacactaa ggnatatttt attgtggcat taattagatg aaagttagta 61 atatgncatt gaccaaaaca tttgattgac aagnaccata aaggttaact gagagttttc 121 tttaatataa ttgttgtaca gacaaggatt cctgctgtat agagtatata gaaggatgac 181 atactctagg aattaggaac aatatatatt caatacaata acaaaactat atagtacttt 241 aagaactctt tcacatatat gaacactctt acttaggaac ttcagctgtt taaagtaagc 301 aatatgcaaa cctataaagt acacaccaaa aaaatctaac ctacaaaaca cccaaagcaa 361 atgttagcat atctctatta tcaagaatat cttctcacca tcgtttcttt caaaaatatg 421 tgaaaaagtt ctttctttcc ttatgagtgg caatttttaa aggcccctct tctgaaatta 481 gntatgttcc aatccactat cactcttaag ggaaaatgga accnctctgg g
ACCESSION No. H62801
ORIGIN 1 aatgatatca gaacctttta aatgatctag tatctgtgat gttagcgccc ttgggattca 61 gaaagtggtg tgcatagtaa aagctttcat tgtaactcac cctgcctaga tatgcagaaa 121 gcaaattcag tgataagatc tttcctggga gaccaatcag cagcctcagg ctctgttggg 181 gtctatcaca atgatgttat ctaaatttag ggcaaggaac cctttcccca tcttttagag 241 ggcagtgagt gttctaatca cttcaagata ggtatctgat aaaagtcttg gggccaactt 301 tttcatactt aggnagggca caactaaaat ggatatactt aaaatggtat caaaggaggg 361 ttaggtgtac actctactag gtgtaaggtn tatttcatta caaaatggct ttgg
ACCESSION No. H85015
ORIGIN 1 cacccaggct acagtgcagt agagcaatca caactcactg cagcctcaac ctccctgggn 61 ncatgcaatc ctcccacctc agcctcgcaa gtagctcgga ccatggccac acgccaccac 121 acccggccaa ctttcgtact tcttgcagag agagggattt gccatgttgc ccaggccggt 181 cttgaatttc cgggctcgag tgatccactc acctcagcct cccaaagtac tgtgattaca 241 ggcatgagnc actntgccca gccaataaan tcttt
ACCESSION No. N21630
ORIGIN 1 gaacagacta aatttgtttt aacaatccca tttacaattc aaattccttt aaacaactta 61 atagcattta tacatttaaa aaaatgattc ttttaagcag cattgcaaat gcttgacccc 121 attagcataa accttcccaa gtgcttaact ctcataaaca taataaatta aacatatggt 181 gactttccaa gttctctgaa acatttcagt acttttgcag acttagtaac attttaaaat 241 acctttcaac tgaaactcat aagtctaaaa gtctgttaag cattttaaat tagaatctta 301 aggccagtgt cacatattgt aatatgccaa ttatgtttaa atacttcaaa cagcaaatac 361 tacagtttat ctcaatgaat ataataacca ttcctgctgg gcgcagtggc tcatgccttt 421 aatcccagtc attaaggagg ctgaggtggg aagattgctt gaaaccagga gattgcctca 481 ggcctgggca acatggtgag acctcctatc tcaaaaatcn aaataaaaat tagctgggca 541 ggtggctcat cctgtagccc agcntctcag gaggctgagg tgggaggata gcctcgccta 601 ggagacggag ctgcagtgag c
ACCESSION No. N36176
ORIGIN 1 aataaagaca agtgttcaga tttatttgga aattcacagt ttctaatggc actacagctc 61 cgtagttaca tattgaaaat tctcttccca caacacacag atcacataat ttctcactgt 121 atctctgctc tcatctggac ctcttttcaa ggggcttcta taaaatcagg ncctcttgnt 181 cngganagnn nantngngcn gacaggaaag aaatttaaat cttctaaaac acgctgttaa 241 cctaaagcag caacttaaac aaacaaaaaa ggcgttaaat aagtcacatt acaaacaata 301 cccaagaaag gtattaggca agtttaaaaa cagttatcac tactaaaagt gctcaataag 361 ttataactta aacatcacaa caataaatgg tcaattctct ccctttcaaa aagaaacatg 421 ttccactttc attcactact gtacaatcat acta
ACCESSION No. N72847
ORIGIN 1 attgttactc tagttttaat ggtttcacaa atacaaaagt tgctagataa gcagtaccaa 61 catatctaaa tctccaatga tgttcaatta aaattttatt tatagactca tacactcagc 121 aaaaccactc atttaataag tccaactgaa ataaattctt attaataaaa tacctatatt 181 gaaagtaata tattgtaaga actctacctt aaattgacca tggggatgaa ctacaatgtc 241 ataaaatatg agccaaaatg ttcactcaat aattttaatt acatcacaat taagcccaga 301 actatgcctt ttttttggtg taaggctgaa taaggaccga aactggatgg agagaaaatt 361 gctttctaaa gcctcattta ctggcaataa cttaccttat gcaataacca acatcacgng 421 actgg
ACCESSION No. N92519
ORIGIN 1 ttttttttaa ctcttaaaaa aaatcatttt attgatcctt taccatacaa aatttattca 61 aattacaccc atttgaagtg gtaagatcac agctagagaa caggtcaccc tgtaacaaat 121 ctatttacaa aatccatcat aaaagctttt ttttgttttt ttttacatta tattacatat 181 tttctttttt aaaagcatac aacacaaagc taaactgatt agtagtttgc ctactcccaa 241 ttttgggaga aatacttcct ttttacaaaa tcacgtnccc cgtaggaaaa gaaattccca 301 caccctgaca attggccaac cgacttactc tgcaagccat cttcttcaaa tccctccttc 361 tcatacacac gangttgtca tgcacacact gaatcntaat ttcttttccn ggaagcttaa 421 ncctttaaat accgggaatt attttcagat ctncacgtnc caacaaaaat ggaaacaagg 481 gccccaccaa gnccgggaaa acnaaaccca ataccctntt aaaaatttca aggc
ACCESSION No. R27767
ORIGIN 1 tttttancna tttgtaaata agtttaattt ttnagttttt caatgacatt cagtagagat 61 agttatattg gctatataac acaagtaaag tggtgtttgg aaagtggagg actaggtttt 121 ggcacggggc taggacgggg tgaccgccgc ctcaccacca cagactggag ggggcttttg 181 agagctgggc ttcgctcccg aggactcagc tcagaaactg ctgaggcccg tgatgcagaa 241 ccagtgccgt aggtgggcat ctggccatgg cttcgagctc tcaggatgct tttgtatctt 301 gagagggtgc ctccagagaa tgtctgctcc ttgggcctca tctncccggg ttatnccccg 361 gcag
ACCESSION No. R34578
ORIGIN 1 atttttgaag nngnttcgat gtcttactgt tatgaccata aaaccaataa agctactttg 61 aaaagttaaa gccaggngta attaaacaac tcatacttga ttgttaaagt cagtctctna 121 aaagtgtaat tttaaaaagg taataaaaaa ggtatancat tat
ACCESSION No. R38360
ORIGIN 1 tttttttttt ttcaaaaatg tcaaacttta ttcaagtgtt atggtaagaa atttgaaatt 61 cttaggtaag ctantgaata aatccttggg caggtgcagg catacagatt ctggggtgca 121 gctgctgagt ttaaaagctt cctttggaga tgccccgnng gggnnacacc ccctntcccg 181 cctntcaaga ggaggccatc ctggggcagc acgttagggg caaatggccc agatgcccag 241 ctnagggaaa cctccatgcc tagaggagga ggtcgctctg ggagcaggag gaccttcttg 301 gaacccctgt tnacaggntc ctttttcttg ntttttccag nacctcctgc aggg
ACCESSION No. R43597
ORIGIN 1 tttttttttt ttttttcagg attcactgcc tggggtatcc cactatatat atctcaccta 61 tgatgtagtg gtgcttgaaa tactcatctc attagctcga ttttattatt ctaatctaag 121 gttttttata ttattcatac tatgatattt ttagggacaa tcagtaatat ttggggcaga 181 gtactgaggg acctcttgaa gtctgcaaca gcatgcattt tctttgtttt tgtggggagt 241 gcttccctgt aggctgtctt tgttctagga acactgnctc caaatttatt tccatgggga 301 tgtagggggc tagtaggccc atggtggaaa ggtcttctgt aaatctccnt gggggggtnt 361 gagttattgg gggttatttc taacagggan ttttcccaaa ggggg
ACCESSION No. R43684
ORIGIN 1 tttttttttt ttttcattca aaaatatata atttattgag tacttgctag acacaatgga 61 tacaatgatt atatagtccc aatcctccag gagaacaata gacagacacc tttataatat 121 gtatgtggag tgctctgaca gggaaaagca caaggtccat gggggtggga gtggcccagn 181 agctaaggaa ctcttccccc atgaagtggt tacttacttt ctaatcttta atttaggatt 241 ctctcatgga acatttgant ggtgaaattt tactacataa aggttctcaa ccctaggagg 301 tttatccctg cccccctggg aacatttggn caatgtctga acaacaagtt tattntcaca 361 actggggagg ggngaaggaa gttagcagag gccaaggatg nctggctaaa ccttaaattc 421 ctacat
ACCESSION No. W73732
ORIGIN 1 tatttcaaaa aaagtctttt aattgttcaa aatagcacaa aacgacatcg cactatggta 61 atattgagtc acaggggtta cnctacaata gtgaacggng tactcncctc agaaacaaat 121 cant
ACCESSION No. AA450205
ORIGIN 1 tttttgtttt ctttcattat ctttatttta aatttgatat tttagaatag gaaattatct 61 ttcacagcaa tgcctcctgg tctgataata cagtatctca tttctgaatg taaagattta 121 aaataaatca aaatgaacat taaggcgtac aaagctactt taagtctgct cttaagatca 181 gtttttgctc atattcaaaa tacatggaat gttggcacaa aactgaagct gctgtagaaa 241 gatcacagat gttctgtggg ttactcaaac ttccatttct ctaaaaacat acccttacat 301 ggtcttaatt ttatgaattt aagtgttgag aaatatctaa ataataagta acaattaaaa 361 taaaatgttt tatttgtaaa ttatgtacag aatacacttt acgttacgc
ACCESSION No. AI081269
ORIGIN 1 tttttttttt ttctaaaact acctttattg tggttggctc gacataagat gccgccatca 61 gcagaattat aaaactgtac aggaggcaca aaaataggct gtttaactta gataatgacc 121 ctcatgtctt caagctttaa aaatgcacat aaaagttgta caatctggca gtttataaaa 181 tataaagcta aaaagaggat tttgggttcc acaaagaaga ctgtatcaca caattaacac 241 gtactaatta aacaattaac catccacaca gaagacataa tg
ACCESSION No. R59314
ORIGIN 1 tttttttttt ttttcaaaaa ctttattctt ttctaataaa aatgatatat gttcattata 61 aaaagtttca aacacacatg agtctganga ntgtaaagat cacccaaata ccacagccca 121 gaaaaaaaaa tccttaacat ttggtganga tctctctatg aaacatacat tatcttaaaa 181 tattcaatgt tataaatgag ctcatattca acatatatcc tgtngtctac tttttgattc 241 aataatattt tgggaacata tatccatngc antaaacata tatctaaata tttttaaatg 301 acaactggca tgggnnttta tttaatccat cttttactga gggatgtttc agttgtttcc 361 aatgttttaa tatcataaac atcatggaaa tataccnttg gggctccatg tttgganggc 421 ttggggcaac ctt
ACCESSION No. AA702174
ORIGIN 1 catcttcagc attaagaagt gctgacacaa tatcattaac tgttttatag ttctctccag 61 ttgtcaggat tttactttga actgtttgtt tcaccaggtc tctattaaag cccatttcca 121 aggcagattt aaccacaggt gtattcatca tgacagcatc ttctgaagaa ctttctccag 181 gtccaaaatg aataattggt gggtcagcat tttcttctcc agtggtatct gaagttgaca 241 acagctgttc aagaagatga ggatatctac cttgaatctc atcaacaaac tcttggcctt 301 tcattcgtat caagaactca caccttggaa accacttggc atgttctacc catggatcat 361 ctccagattc ccaacacctc aagccaccat cacaacaaaa gcatttgaca tcatcattgc 421 gacccacata ataaaaacca gcacttgcaa gctgctcagg ctgaactgga acactagatg 481 gccagtacat aaatgttctc attcgagctg catgtgtctg catgctcaga tttgaaatgc 541 taaacctcag agtttctaga gaa
ACCESSION No. AI002566
ORIGIN 1 tttttttttt tttttttttt tttttttttt ttttcacaat tcttaagtct tgttaagaaa 61 gtaaaaaacg tttgggtata ttttgatcca tgggtggcat tttcaaatgt gcaaaaacaa 121 agtcttggaa gagattcctt gtcactagaa agttcgccct tccttttgct gtcagttgta 181 cgtaagagaa attcgtccac attaaggaat ccaaaaaggg taaactaaag ggatttaaaa 241 agagtacatt acaaagaata agaagccctg taacatctat ctgagaatac tagataaatc 301 tgtgagtaga tgtggcacct ggagctactc actacattac taaaaacaga aacaagaaat 361 ctataatggc aggatcacaa catttgcgcg caaatagcta acc
ACCESSION No. AA676797
ORIGIN 1 aataccttct gttttaagtt tttcttttgt tttcatcttg gaaaaaagga aatttagaaa 61 taagacagga aaagaatggc ccagaaattc agcacaaaga gaggtgtaca cattgacgcc 121 atctgtgggt cacatacgaa cgcctctggg acagagctct aaaacgagtc acgtgtcgta 181 gggagtgggc ctgtggcaag gcagtcctcg cagtgtgcag ggacgcaggc ccccttacca 241 tggaagcccc acccagaagg aagtgggtgc cccatgcagg ccgaggtgga tgaggggaca 301 gtggtgtgct cacagctgtc agctccccac tgaagcccca aaccagcaga tgtgggcagg 361 ggctcaagtg gtgtctgact acccaggtca cacgtgcctt aagcgtgaaa gctgtcagct 421 cccggcacgg gctctggtgg ggctgggaac accaggacac acatgggctg aagcttccag 481 agacagtgag acacggaagg gacagagagg tgccctccac acagtgtg
ACCESSION No. AA453508
ORIGIN 1 tttggttatt cagtatttat tctgcaatgc aaaggtgaca aactaaaata taaaaaggct 61 gttatggctt aacatttttg ttgcagatta aatatgcagc attgaaaaat ggaaaggcgt 121 ggcttcatct ctgaccagca gagttaaaaa gaaaaatctc tccattttcc ttcatcatca 181 tgggatacac tgttcaggca atccaaatta ataaagactt gcactttcat atgaacacaa 241 gatcaagtgt accagttagg ttttcacatt cacagtatat aagaaaatac acatggaagg 301 aaaagtaaag ggttaact
ACCESSION No. W93980
ORIGIN 1 tgaatgaggc aacaaaagca gagatttatt gaaaatgaag gtacacttca cagggtggga 61 gtggcttgag caagtggttc aagagcctgg ttaccgaatt ttttgggggt taaatatcct 121 ctagaggttt cccattggtt acttgatgta cacccttgta aatgaagtag tgcccacaat 181 cagtctgatt ggttgaggga ggggacctat cagaggctga agcaagtttc aaagttacac 241 cctatgcaaa tctctgattg attgggaaaa ggctgaagtg aagttacaaa gttatactcc 301 tatgcaaatg aagacttggg cccatgacca gcctcattgg gttgtggaaa gggaccaatc 361 agaggtactt tcaatttttc catctaccat gcagaaaaag gttcgggggt ggggggttgc 421 caaagggaag ttagccnaac aaactcctga cctaccaaca gagggtccca gttgggtagg 481 ggggcctggg
ACCESSION No. AA045308
ORIGIN 1 ctattaatca acacttttta atgtagtaca tatatatctt acagttattt aagtcaaata 61 tgtaaaggtt tacaactgat ttacagatga agcaatcaca gattgcagta atatgtgtgt 121 gtgtatatat atatttatnc catatataca cacacgccaa tcaaggggaa aactgcatcc 181 tggcaatttt acagtctgaa gttttgttgg tatatctacc atttcacatc cttttcatct 241 tgcttttctg tacaaaagat atttttngcc ttcttcattc ctgatgagat ttttctgcga 301 taactttaca ttcgtacatt gccagttgtc gaccaatgtt tcccattgtt atgcctccag 361 caaaaaatat
ACCESSION No. AA953396
ORIGIN 1 atctgtcagt aaattacatg tatcctggct gtttatttca aaaatgcttc agtatgtatt 61 tcctaaaata gggatattct cctttgtaat cacagcaggg tagatactgc tctttagttg 121 tcatgtctct tagccttctt taatgtggaa cacgtccaca ccctttcttt atcttctgtc 181 ttttaaacat cttttctgtt gtccaatttt taacaacaaa gatgttaaaa atcagaaaac 241 tcagaaaagc acatggtgta ttaaaattcc acctaggaat aactgccatt aaagttttgg 301 tgtctccctt tctgtctctt cagatgcaac ttactagtct agacaaagca ggtttctcag 361 tgaataaaac at
ACCESSION No. AA962236
ORIGIN 1 ctaatcctgc gaatatgggt agtgcttcgt tccatggacg ttacgccccg ggagtctctc 61 agtatcttgg tagtggctgg gtccggtggg cataccactg agatcctgag gctgcttggg 121 agcttgtcca atgcctactc acctagacat tatgtcattg ctgacactga tgaaatgagt 181 gccaataaaa taaattcttt tgaactagat cgagctgata gagaccctag taacatgtat 241 accaaatact acattcaccg aattccaaga agccgggagg ttcagcagtc ctggccctcc 301 accgttttca ccaccttgca ctccatgtgg ctctcctttc ccctaattca cagggtgaag 361 ccagatttgg tgttgtgtaa cggaccagga a
ACCESSION No. AA418726
ORIGIN 1 tttgagtttc aaaggattta tttgatttcc ccacatgatc acaaccatgg ttttacattg 61 atagagtctg ttgccactga caaacagaat gcagatgaaa acaaacgcac tcctttcctc 121 tcaaaggtac acagtggggg tgccaggctt cttgtgaggg aggtgtcctt gaagtctctg 181 aacagtctgg ggattcagga cctgattcta attgcttaaa acaactcgga ggcaaaagat 241 attttccaag aggagatgca tgctgtgtgc agtctcgatg tgactgcaca cagaa
ACCESSION No. R43713
ORIGIN 1 tttttttttg atgtgctaat tttatttttc taatacttac caaaataaat gccaccactt 61 aacatagaaa aaattgttcc catgtgacct aaaatcattc ctcagtcacc cctgaactgg 121 ctagtagcga gcatatgtgg agcggtggtg agggcaggat agcctggtta taggaaacct 181 cagantagga aagacctggg ttcaaatccc cactctgcca cttactagnc tgtgtgactt 241 tgggacaagt tgtgaaacct ctctgaggat ttatttcttc atgtaaaatg tcaccgataa 301 tggataactc agtgggtgta agantgatct attttaagga ttctagggca gagtcccngg 361 gcagggcagt taaggcactt aaataggatg gacagnctat tcattnaatt attaggcagt 421 tttttcctta atggagggtc cttgttggaa ggaccccttt tttcttaacc tcc
ACCESSION No. AA664240
ORIGIN 1 tgtgataggg ttccactttt tctctcatac tggtgtgcag ttgctgattc atggctcact 61 gcatcttcag tctcccatgt taaaggaatc ctttcacctc agcctactga gtgtgcacca 121 ccaggtccag ctaattgttt ttttaacttt tttttttttt tttttttctt ggtagagaca 181 gggtcccctc tgttgcccag gatggtttgg aactcctggg ctcaagcaat cctcccactt 241 tggcttccca aagtgctgag attacaggca tgagcactat gcccaacctg agcaggatga 301 cttaaacctg atcaattcta ctccaaaaca gcaactatca ttaagtcagg ggtgtcaagg 361 aggactctgt gaaggcaaag actagactgg gatgtgtgcg agagtgggat aagaaggccc 421 atccctagca gactg
ACCESSION No. AA477404
ORIGIN 1 ggaaaacaaa aggaaaactt atttattctt agaggtggga atgtggggag tggggcagaa 61 caggtggtgg ccctgggaga gggtcccaag gggcagaggt tggggatgtc tcagtaaaga 121 ggggcaggtc atgaatagag cctccacccc cagcaggggt tccttgggcc cgcccaagca 181 ctgggctaaa acgtggaaac tgggcattga caaagtacag cgg
ACCESSION No. AA826237
ORIGIN 1 aaagatgaga accagaatgc ttatatttta ttagtatcca agactgggga gagggatggg 61 gtgggagaga tcaagaattg gggagcagat gggaggcgct acctcactca ggagacacga 121 gttcttatcc aagttcaagg tgaaagaagt gagggcagga agagaaatct ccctgctagc 181 aacagcgact cagggagaaa ctctgggccc atagctagct ggaggcaggg tgacattgct 241 cccaccaatg ggccatcttc ttagctacac ctttgtagct gtggtgccag gcagaagaac 301 cacctggaaa ctgagctaag gcaggttcct tcttccaaca gaagacacag ctgggcaggg 361 actgtgcaga ctcaacaggg ccaggccagc tagtggcang tcagtgttca tgtctctcac 421 cagtgcctgg agggtcccca gccaaggaaa gaactggtca gttcctgc
ACCESSION No. AA007421
ORIGIN 1 gtttgtagca gttccaaaaa gaaagcagaa ctcatttagc aattgtgata aaagaaggaa 61 aaatgcatat gttttaaaag tcattaacgc atcgtgaaag cgctcccaat caacctcatt 121 ccctaggatt ttcagctaac taacaatagt gtctttttaa tttgatgtca tgaaaatctg 181 gtcacagcaa acacaatgtt ttctaaagca gatctggcct ccgagggagg aaagctctcc 241 agggcctcca gtgccttgtt tccatggtaa cgacacaggt caatagctga agtcacacct 301 ttgccagctt tgattctttc tcgcaactgg gagtctgagg caagaggatc acttgagccc 361 aggagtggga ggctgcagta agctatgatt gtgacactgc actccagcct gagcgacaga 421 gcgagaccct atctcttagc atagtccaat cttccttttt cttgag
ACCESSION No. AA478952
ORIGIN 1 tttcccagcc ctcaggccac tttattgctc aagagtggtc agtctggggt atctgcatgc 61 ctgaactcca tgatgatgtc gcctgtgtcg gggtgaaact ccactgcata gctgacagtc 121 cgtgggccac ccagcagtgc tctgggatct ggggcagggc tgaagaagta gacggcctgc 181 ttgcagtggg ggttccagca gcagcccccc tcgggatctg caggctccag gaggccagtg 241 ctgagcgtgc actccggggt caggtggtac tccatccata gcaccgctgc gtggctctgc 301 acgggccttc tgagctccac ggtgccctcg gcacacaggg gctgcagggg ca
ACCESSION No. AA885096
ORIGIN 1 gtctgtgact cttggttagg gcaaatttca aatccattat aatacataca ttgcagcaac 61 actgagtttc ttataatagg tactatccaa agctttcttt tttttacatg tatcacttaa 121 tcctcacaac cacctgagga ttaataccat ttacctgttt tacagataag gaaaacaatc 181 atttttcaat tatgactatg cccccaaaca ctggtttgga tggagccttc actggtatag 241 agaatgacct tcttccctta gactagactc tggctataat aaaggatggt ttaatcatcc 301 cctgaagcaa tgcataagat aatctgcaat gtatcttcac atactgtacc ttatttgata 361 ggcaagagac ccataaagga agctgagcat ggattatcag cttcatcaca aatctgaaga 421 aactgacatt tatgttatgt tgccttaccc aagttgggac atcagagcag caac
ACCESSION No. H29032
ORIGIN 1 tttttttttt tctataaatc tctaatgtta tttaggtttt ttaaggtttt ggaagtaaca 61 gagggataca tacagcaaga tccacttaca tagttttaaa acatgcaaaa caagattata 121 tatcgtccat atgtaattat atctgtggta aaatataaag atatgcattt tggggacata 181 gtcaccagat tattagtagc tcaaggaaag gcaggaggaa gagtgctctg ggtgggggga 241 ggttcacagg gtgcttggac tgtacctatg atttcttcaa ataaaaattt caagcaagta 301 taaaatatgg gatataggaa tgtaaaggat ttgggcaaag ctgggctggg tgggtatcca 361 atgttcctta tcaccatctc tgtacttctc tgantgcttt aaataggtca caatcnttgt 421 aag
ACCESSION No. R10545
ORIGIN 1 tagaatgaat tgcagaggaa agttttatga atatggtgat gagttagtaa aagtggccat 61 tattgggctt attctctgct ttatagttgt gaaatganga gtaaaancaa ttngtttgac 121 tattttaaaa ttatattaga ccttaagctn ttttagcaag c
ACCESSION No. AA448641
ORIGIN 1 agccttagga atggttttta ttcacttgaa cactgtacaa atattacaat ttccttttgc 61 tgcaaaaagt ataaaaataa tctttatata ggaatccatt cgttactgta aatctttcta 121 aatctctgca aatggcccta aatgagggta aatgaaaaag ccgaaatgaa gagagggtta 181 tggggcagca ggaggtgggg ccaatcatca gggctggacc acccagactc ctccccagag 241 acctctgttc cttcttggta gccgccccca ccacctgcag gttctagggc taaaggccca 301 gcagaagtgg gcacgtgaga gggccaggag gagctggagg gtcagggggt gggggatagc 361 gaaggaagct agaagtggtg ctggcatgtg cccagttcca ccccacca
ACCESSION No. R38266
ORIGIN 1 tttttttttt atcttttaaa tgggatttat ttatgtttac ataaaaggta gcaaatgtta 61 cataagttgt ttccttaaga acatttattt tgtacaatca cattgttatc aagcaagact 121 tatggaaaat ttcctgggtc cacaacactg aactttgaaa ctactgtagc attctctttt 181 ccaagtttaa acatgacttt gtgcactgaa gaagtatggc ttcgcattgc acagtgggtc 241 acatgtgaca acctgacacc aagcgagaag ccttttgatg aaggaatgtt ttatcttttg 301 ttgaggttac caaaatgggg actttcatgt gtggtggatt atccaaaccc catanttttt 361 ttttncggtt ccatttctgg cttccaattn aaattaaccc ggtttaaact aggcnggttt 421 nggccaatgn ta
ACCESSION No. H17543
ORIGIN 1 tttttttttt tttaacctct tgctcatttt tattccagaa cctaggaaga actagtacac 61 tgaaggcatt tgatgtttgt tatgaaaagg aaacaacaaa aaaatcaagt tcaggctggg 121 catggtgcct catacccgta atcccaagca ctttgggagg ctgaggcagg agggatgctt 181 gagcccaggg agtttgagat cagcctaggc cacatattca gaccccattg ctaccaaaaa 241 atttttaaat taaaaaatgg ctaggcatgg tgggcataca actgtaattc aagctacttg 301 aggaggctga ggtggggagg atcacttgaa cccggggggt tgagggccac agcgagctgt 361 gattcacaac actacactcc accctggggc gacgaagcaa gatttcgttt tcaaaaaaca 421 atttttgttt caantcccat cttcaccnta aaaacctngc tacattcccc aggggaaaac 481 caattttca
ACCESSION No. T81317
ORIGIN 1 taaagnnatg aggtcttgct ctgtcaccca ggctggagtg cagtggcaat tgtccctcct 61 cagtaagtgc aagccaccat accaggccct ttgaacatat tttaaatggc tgatttaaag 121 tctttgccta atactaaagt ctaacatttg ggcttcctca gggaacattt tctaatttac 181 tgctttctct cctatgtgtg gaccatactt aagtggtttt ttgcatgctt tgtaataaca 241 gtctcttgaa aactaaacat tttaaataag gtaatgtgac aactcgnaaa aatcaggatt 301 cttcccctac cagggnattt gttgttatta ctgtttactg ttggttactg gtttattgtt 361 gttnctntta ggtgactttc ctggaactaa ttatctaana tatta
ACCESSION No. AA453790
ORIGIN 1 aacaaatata tttagatata tttaaaagaa ttaaaaaaaa catttcacaa aacatttgtt 61 gccataggaa ttatttttag caataaatgc ccacatcaaa atttaaacat ttttcaaagt 121 atgattatct gtactaagta atgcaacaaa ttatgtaaac agagtcagat acatttccct 181 gtaggagtca cttccttccc gggattaaag ctgtcccaga catctttcca ggggaccaat 241 taagaaactg ctattttcag agcaacagaa ataaaagctt ttatttgttc atttgaatat 301 aaaacaggcg ttatcacaga tgtacaaagc gtactggtgg tttaacatac aagaaggttg 361 ctgtcctttg cacataaaaa ttttgtttga aactgtggct ggttgagtac atgagtt
ACCESSION No. R22340
ORIGIN 1 ttttttaaca taaaggtttt attgaataaa tacatgcact gtcacgtgaa attagttgaa 61 cagaaaggag gttctctact ttttaacccc catcccccac cgctgttctc tatttgcagt 121 ggggggtcca gctggaggtg gaataaatgc ggcaaccaca ganaaaacac acagctacac 181 acaggcctgc atttggctta tgtgcctgaa aaagaagggc cgacctcttg ataaagaatg 241 tctgtaaaag gaattcttac cgtgcagaat atattatcat gggcnantac agttacaagg 301 ctgcttctat tttatttatt ttttgagacg gagttcacct ctgttgccca gggtgggagt 361 gcagtggtgc gatcttgggc tcactggcaa cctccgcctc ctgggttcaa gcantt
ACCESSION No. AA987675
ORIGIN 1 gggtagatag ctagaagtga tagtgctagg tcatatggta aatatatctt caacatttta 61 agatactgcc aaactggttt ccaacgtgac tgcatgtccc atcaacaatg cgtgagtgtt 121 ttagtttttc cacgtcatta tttcacttcc cccaggtgtt actgtccttt tttattatag 181 cattctagtg ggtaagaagt ggtgtctcac tgtagttttg atttgcatgt ccctgctgac 241 tgatgatgct gaccatcttt tcatgtattt tattgtctat tcctacacct ttttgatgaa 301 atggttattc aaatattttg cctattttaa aaatggggta attatcattt tgttgcgtag 361 ttgtaagtgt atttcatatt ctggatatga gtcctgtatt aaatatatga tttgaatttt 421 taaaaaaaaa aaaaaaacct cgt
ACCESSION No. N51543
ORIGIN 1 acgattaatg ttttattatt catattttga caaagatagc atattatatt ccaggacatg 61 gtagttacca tgtggggaaa cctatcaaag catttttaat gactgcttag aataactgta 121 gaaagtactt tctcaatgat ttttgtatgc aagaaaaaaa atacctgaaa gtaaccaaaa 181 gtttcagact ggaaaatatg ccaggaagat tttcttctct cattctcagg tgaggttata 241 atccagtttt agcaaatgtt tgacaattta aaatactttt gaaaactgga gatttaaaaa 301 atgtaaacaa ttggtaggca cagcaaaatc gtagttttcc cttctgatat tatacatttt 361 ggcatctctc tacagttatg attaaccatt aaatnaaggg nagctaaaac gttccaaaaa 421 taggttttac caacattcan tttttaaaat tttccattca agctggtaat ccttttgggt 481 ttcc
ACCESSION No. N74527
ORIGIN 1 aaacgtggca cagtgtgtgt agtgtatgtg actactatca tttgtgtaag agaaagaaaa 61 gtttactatc agagactgta tctggaggga taaacagact ggcaagggtt gcctctggna 121 agaaaccggg gaatagagag cgggagtaga aagactgtat tagctgggtg tggcagcaca 181 cactgtaggc ccagctactc cagaggctga ggggaagact tgctcaagcc caggagttca 241 ggtccagcct gggcaacaca gcaagactaa aaaaaaacaa ctttcttttc caagaatacc 301 ctttttgtaa cttttgaatt ccgtattttt taatggtcta tggtctacaa acactcatgt 361 gcaaacacat tacacgcaga ataagggatc acctgcacga agctatgaac tatttcctca 421 tcccttctag ccccttccta gaggcgaacc ctccgccccc aaccccaggc actatctgtc 481 ctgcttgcac cca
ACCESSION No. AA121778
ORIGIN 1 tttctgtcaa gctgttcttt atttcangga gagggcaggg gcagagcttt acaggagtag 61 agattttgta tgctattgaa ggtaaattgg tatcagttta aattagattg ttttaagtgt 121 aggatgttaa ctataatccc catagcaacc acaaataaaa catctaacaa atatacacaa 181 aggggagtgg aaagagaatc agactagttc actacaaaaa aacagaaaag aaggccataa 241 agaggaaatg aggggccaaa aaagtatatg acatatagaa gaagtgttaa atggtagaag 301 aaagtccttc cttaattact ttaaatgcaa atggattaaa ttttccaatc caaaaggcag 361 aaattggcag aatggacaga naaaacaana catnaacatg atagtgatat gcctgtc
ACCESSION No. AA258031
ORIGIN 1 ggggccccgt gatctcaacg gtcctgccct cggtctccct cttcccccgc cccgccctgg 61 gccaggtgtt cgaatcccga ctccagaact ggcggcgtcc cagtcccgcg ggcgtggagc 121 gctggaggac ccgccctcgg gctcatggcg gccccggtcc gcatgggccg gaagcgcctg 181 ctgcctgcct gtcccaaccc gctcttcgtt cgctggctga ccgagtggcg ggacgaggcg 241 acccgcagca ggcaccgcac gcgcttcgta tttcagaagg cgctgcgttc cctccgacgg 301 tacccactgc cgctgcgcac gggaaggaag ctaagatcct acagcacttc ggagacgggc 361 tctgctggat gctggacgag cggctgcagc ggcaccgaac atcgggcggt gaccatgccc 421 cggact
ACCESSION No. AA702422
ORIGIN 1 aaatgtcttt aattgctgaa tgcctctttg gctaatattt ggaagatcat tatttagtcc 61 tacaacagac gcattgttcc actttcccat cattttgttt gcaaaccgct aaaagtctta 121 tttcctcatc tctttgacac attaccaaag tggaccctat gctgtaatca cacaggataa 181 tgttggaaag tatgaatatc taaattattt tttaaaggta ttattttttt ccttctgttt 241 tcaaatcatt tctgacagtt tctaaagaca tggtcacagc tgcctgaagc atgtcttctt 301 cactcatagc atcacctaga tcactcccaa gtgctcctga actggtggct ggcctttcac 361 atggatgtga actctgtcct gataggtccc cctgctgctg ctgctgctgc tgctgctgct 421 gctgctgctg ctgttgctgc ttttgctgct gtttttcaaa gtaggcttct cgtctcttcc 481 gaagctcttc tgaagtaaga tttgtacctg atgtctgtgt catatcttga gaaatgtttc 541 g
ACCESSION No. T64924
ORIGIN 1 tgagacggan ttgctctgtc gcttaggctg gagagagact ctgtctcaaa aataaaaata 61 aaaataaaat aggagtaatt cacgaggaaa agattacata ggctgctttc ctgcttttct 121 tatccacagg cagttctttg caatgactat ttaaaaacta aaacaacatc acaagtcatg 181 aagtttgtgc tacccctgaa cttgacaaat tgtctgattc aagtgggcaa agcacaatga 241 ttggatgcat ctgaacagaa cctcctctgg aatgggggcc tcactagagt gagctcttca 301 tgagccttgc caccaggggc aggggattat tctgttattt tggcctgttg tagccaagtc 361 tgcaccccta ggcacccaaa acaaactggg gngagttgg
ACCESSION No. R42984
ORIGIN 1 tttttttttt tttttggaaa acactgttta tttgaaaaca atgagacctc aaatatgaaa 61 tatagttaac aatgacattg acactgttgc tagcactttc ccctaaacca cccgtaagtc 121 ttggacgcat gtgcatgcag cacacacaca cacacacaaa aaccaaaaac aaagccaaaa 181 aaaaaaaant cccaaacaca acattccatg nttgttcatt gaactcctga tgccgggagn 241 acaggactgt taaaagattt tgtctcccac attatctctg ggagtggggc acaaagc
ACCESSION No. R59360
ORIGIN 1 ttttttttgg ttttattttc tcctgaagct gaaaatgttt cacccatata aatgtggcat 61 tttagactct agctataaac ctcatcgacc agtatgtttt cagagttgtt cacaacaaaa 121 tattattcgt ttctaaaatc agttttcact ttttggtgat agtattccag gctggactgc 181 ttgaatttta gatgcagaga tcattttata tatatctgtc aatgtaatac agaaaaatta 241 catgtgaatt gtttatgtgc cccctctacg tagggacaca gtatcaatca ctcaataagg 301 cactgtaaca tcaggtgggt gtttggggat aaataacctc ttcggggttt ctttcaatcc 361 cactaccata tggct
ACCESSION No. R63816
ORIGIN 1 aagtcannga tntttactta atttctttca ttgtatactt gtatctcatt ttctcttaac 61 actgaaaatc ctgacttcta aagaaatgta actacttgtt ttcttacaac atagtattct 121 agatacaata ggttcaaaat aacaccagta ttaccattaa caatgagact actaaatgca 181 ttttcacagt gcactaaaat ctcaggaatt cactggcaat ataattcatc catgtaataa 241 aaaaccactt ggtaactcca aaactattca aataaaangg taataacaaa tttaaaaatg 301 gcattttgng ggtttcttcg gaattttttc accctttata ttcccccaaa gggccttctc 361 ctattaattg nggaggggcc ttgggnattg g
ACCESSION No. T49061
ORIGIN 1 ggaccaaaga actttatatt tattttaaat atcaaagtaa cacaaagaac tagttcaata 61 tacagtacac ttcctactct tcacagagaa ctgaaatttt ctataaagac atttatactt 121 aggaaacatc agacaaccaa agtatgtata aaactcacaa gatattttac acacagttca 181 caataattaa ttctgatatt ttaggntttt tctgtcattg cttttaaagc atccttaatt 241 taaaaacaaa aattattatt tgaggactgg aaaacaggtg gcaaaggcat ttctactttt 301 aattatacac tggtaaatcc ccccttaatc caaaacattt tacttncaca t
ACCESSION No. AA016210
ORIGIN 1 cacagcaatt catctttgct tttattaata atttcaacgt atgttttgag cactttacaa 61 tgtaggaaat gctttcatag acattatttc ctatgattct cacaaaacct tcactgaaaa 121 aaaagacttc aaggtcactt gccctatgtt tataaaataa tccgctttaa ataagcagat 181 aggagtccaa aaattcttac aatcataaga aaaaaaaagt ctaaccagta cttaattatt 241 tcttgtcatg attactttgt tttaacgcca ctgtttcctt gcttccccca ttttcttcag 301 ataagtttac tccttttggc ttgtcctgca tccttttctg acagctgccc tgtgtacacc 361 tgccttaaac atctatcctt ctactctgga atagactaag ccaaaagcaa ttaagaaata 421 tttcattcta aagaaaacag aattttagtc caaaacccaa at
ACCESSION No. AA682585
ORIGIN 1 cctgtgggct atattttcct gtatgttttg tatttttttg ttggaaactg aacattccaa 61 gttttacact ggggaagctc tggaaactga attattttac tcctccagga ttgtttattt 121 ttaaaatttt gctggcttat gataaagggt atttcgagga aacagataaa gggatgtata 181 gggcgaggta tgggggaagg ggtgcagagc ttccatgccc tccgtaggtg caccactctc 241 caggaacctg caggtgttca gctatgtgga ggctccctga atgcggtcct cttgggtttt 301 tatggaagct tcataatgtc agcattcctt cccccaaggt atagggcaag actctctctg 361 gggaaggtct taggaccaca atcagaaaag tgggcagaca ttagagtcct gccttggggc 421 agatgaaagg agggcaggag aaggtcagag aaattgtttt tcttgag
ACCESSION No. AA705040
ORIGIN 1 gtagagtcgc ggtctcactg tgttgcccag actcgtctca aaaaactcct gggctcaagc 61 aatcctcctg cctcagcctc ccaaagtgct gggagtctag gggtgagcca tcatgcccag 121 ccaagcctga ttttaaatca ggtctctgcc actagcagct gagagctcct cactgataaa 181 tcctttgcag ctggaagtat tcaatggtat ccagtatatt cccaatggct cattcctctt 241 ggacagagaa actcaagtta aatgaactct tttggctgtt tttctccctc ccctttgttt 301 cctccctctc ccttgcctgt gtctctctgt ccactctctc aggcccttc
ACCESSION No. AA909959
ORIGIN 1 ttttaatggg caaaagaaca agttgcagtc aatggctgca gaggggtgtc tggggtccaa 61 tgtgggctgc actttgtggg tactgaggaa atgggaagat gctgcttcta ggtcagctgg 121 tgggttggag gttgggggct gtaattagca gcagccttag aactgggatg cctttcaatc 181 cctcctggcc ccttatctct gtggggcagt cacaggacat catctgtttt attcaaagtt 241 gggacttgca gcaggagacc ctgtcctgca tggagtaggg gtcctctgtt gacaaacttc 301 ttggtttcca gctcttcccc atctgcagca ggcctctgga ta
ACCESSION No. AI240881
ORIGIN 1 tcggttaaga tttttattat tccagagaaa aattagaatg tatcggtaaa agaaatagga 61 atgcatattt caactcactg tcacaaacag gtgttttatt atcccaaatg acagtgttgc 121 ctgagatgat gcatgtggca gacgaggaac caatgagtcg gtatccttta ggacaagaat 181 atttaatttg ggatccgaac tggatgtctt tgatcacatg tgccatgcca ttcacaggat 241 ctggaggatt acgacatgat ttacgtttgc acttgtcctt agcacttgtc cagactgagt 301 tttttaggca gatgatagaa aacggtcttc cggaataacc agggcggcat tcatagttca 361 gatatgtccc aatgggaaac tcagagtcat cagttaggtt ggtaggcctg gcaaatggaa 421 gcccattccg gacattgcat tga
ACCESSION No. AA133215
ORIGIN 1 caagaacatc ccttttaatc acaaaccact catccacaaa tgtggctatg gggtaagcag 61 tctaggctgg gaccctttcc agaggtaagt caaggtcacg tccctgcccc cttcctaggg 121 tggcggtggc tccagccagg ggggcttcca ggttaatacc agagcctcgg ctactctgga 181 ctcctgtgag ctcttcttgg ctggaagaag gggggcattg tgggcctgct ctgtcccaag 241 gctccagaag ctgcccctac ccaggcctgc ctgc
ACCESSION No. AA699408
ORIGIN 1 taacagtctt aatattcatg tatttattct cagaacatac aaacttatct tctcagagaa 61 tagaaaacag agatttcact cagtgacaaa gatggacaca gccagttcac cgtgtccccc 121 catctactta gaaaatcccc tgggggaggg gatgcctaga gcatacagca ccccttggtg 181 gccggctgtg cacaggtcta aagactctca acttccttta ccatccaaaa aggaaaacag 241 ctgtccagat gacagtaaga ttccactgtc tgtaatcctc atggtgccag gtctcctggg 301 gcatctaggg caatgatgct actgcagttt atgcagttac acagtcaagt ctgtgccaaa 361 ggaggtccca tccggcggcc aggtttctgt
ACCESSION No. AA910771
ORIGIN 1 ttttgttgta gaaatatatt tattaacata agcagttcac aatttactgt aagaaaaaaa 61 gcaagctaca aaacagtgat tccatgttta tattaaaata aacatacaca aattaaaaat 121 ttccttagat atccatttaa tctctgggat cataagcaat gtttaggtat tttttgctca 181 tttattgcct aggttttaca caatgagcat atatgttaat tgtgtaattt aaaattatgg 241 aattaagtgc aagagttcct aaccaccttt tacaaaactg ttatgagaaa atacattcta 301 gattcaaaca aaaactaagc aatatatccc ttattctaac agctctaaaa tctgttcttc 361 tcattatact cccac
ACCESSION No. AI362799
ORIGIN 1 tttttttttt tttttttgca agggctgcgc ggcattttat tttctgaacc ccccacagca 61 ggggcggcca gtcctgctgc aggcagagtt tcagtcttcg gagtttgacc ttctggccca 121 aggtcatcac agccacaggc ggaggctctg gggaaaggtc cagttcctgg gatgctggcc 181 cctaatgatg ggcccatctt tccagtgccg cccttccctc ccgcctggca caggagttct 241 ggagccacgg tcctgagtct acagaacagc ccggtcagcc tcgtcccgcg gtgcaagcga 301 ggcctggcct ccctccctgc ctgtccttgg cccggccaca tcactccctg cgtttcttct 361 tcttctccgg ctcctggaca ttggccgcct ttgctcgggc actggtcagg ggccgaggtg 421 tcctccttct ttggcgagcc cctttttggc cacgggccct
ACCESSION No. H51549
ORIGIN 1 atacaacatc tttatttggc attgganatc ctgacatttg tncattacag ttccttaaaa 61 aacaaaccaa aaaatcagaa caaattaatc aaaaataaag atccaatggc tctatttaca 121 tatngcaaag acagcccagg natcttccnt gcacacacac accccgcccc gatacagtta 181 aggggttaat aagctttggg gagcgcagga ggcaggttcc acagttcatc aatcccaagn 241 cacccccatg aggtaggggt gcctcacaca gccagacggn tatcaagagt atgattggta 301 gctttttcct c
ACCESSION No. R06568
ORIGIN 1 ctgtcctgat tagaattaat tttcataaag agaacaagaa tcttgactgg ttcacccttc 61 aattccttgt gcccgcaaca gtgaccggca catggaaagc attcagggaa taaaagcaca 121 atggaaaatt aaaacatact cactgcatgc ctgccaccta taggaaccaa attaaatcac 181 tgccaatatg gcatgggggg aaaaccttcc catttttctg ggaataatgt ttacaaaggg 241 tgggaaaata aggtggcaca ttcacctggg gtggggcatt ttaatttaaa cgctngttga 301 ccccagtngg ttgttacntt tttcaggtgg aatta
ACCESSION No. AA001604
ORIGIN 1 cttatgaata atgttagaaa tggaacatga tgttttaaat gtatacataa accttccaat 61 taattatcag gtgatccagt agtagacctg tgacctctga aggctcctgc ttctcatccc 121 ttcccttctg ctgtgatttg ttgtcttccc tctgctcatt ccccttgtgt ctgtttcttc 181 catcctctcc ccatgctccc tctgttgtca tttcccctta ctctccactg cacccagcct 241 ctgttcataa tttttactgc aattccgatg attgaattat aaactggaag ggagcaggga 301 tattgatctt catgtagttg gacatgtact agactcacgg agaacaagga ctgggttgta 361 ggcacaatgc tgtgtgggtt ttgggtaaat ctaactcaca ctcaacttga ttttgttttc 421 c
ACCESSION No. AA132065
ORIGIN 1 gagacacagt acaacagtct ttaatgtata tataaatatg cctacataac agagtttgat 61 aagagaagtt ttggctatat acaactctgc atgtaatcaa actctagaac atcaaatgca 121 actccactgc atagctgttt tgacagagca acagttaagc ataaaatagc tttgcacctt 181 attattttgg agcaaaataa aaaataacca ccacaaaaaa aatctctaca ataatttaaa 241 ctaaaaatgt tgttgaggat agggtaaaca acaaaaaaga aaataatttg atccatatgt 301 gatatttggc tgaagattaa cagtgttaag tctaaccaac agcgagataa ttttaatttt 361 cccaagcatc ttnctaccgg tttattagcc atatttggat attaagggga agggcatttn 421 gccctttacc aaaaccn
ACCESSION No. AA490493
ORIGIN 1 tctttattga cttattgtaa ttttttggca tacaaattac ttaagtatat ttacaattct 61 tacataatgt acattttaga agataatgta ctttgctcca tttacaatga caaactactg 121 taaaactaca ttcatgaatt agatacaaat cctctacata ctaataaaaa gtaaatggac 181 tgttggttat acattcttta aaatatacct tttcacaggt agcaagaaat agtacatgta 241 ataagtcttt atgactggaa tga
ACCESSION No. AA633845
ORIGIN 1 gtttttaaaa gtcagggttt tttgttgttg cttgtgtgtt ttataattaa catagtttat 61 ttttaatact ggcatccaag aatcctggtt tactcaggtg cagaaagact ctctaactaa 121 gcagccaaaa aaatttttgg tatgcaagtt ttatcatttt ttaatttgca tatgacttga 181 acgtgtcttc aagtataggt ctacataata actttttaag aaaattataa agctcaatac 241 aataaatcta atacataaat gctgcttgta agtcaaatat ttaagagact ataaaaatgg 301 gtaattttgt gataaaattt agaatcattt gacaagagat caatgaattg
ACCESSION No. AI261561
ORIGIN 1 cactgttaaa aatacattta tcattaaaat atattacaca tggagacagg atgcatcata 61 tacagtttgg aagacttgct ggcccagaaa atcccacttg tttcaccgaa cactcatttt 121 ttcagggatt ttacatttta tttttagaga cggggtctcc ctctctcacc cgggctggcg 181 tacagtgatg tggtcatagg tcactgcagc ctcaaactcc tgtgctcaag tgagccaccc 241 acgtcagcct cccaagtaac tgggaccaca ggcacgcatc accacgccca gccaattttt 301 taaaaatgtt tttgtagaga gggggtctcc ccgtgt
ACCESSION No. H81024
ORIGIN 1 agcttcagcc tttattaaac aaaggaggag gtagaaaaca gataagggaa cagttaggga 61 tcccttcttt cccctataca tacacagaca tacaaacaca cgcacccgag tgaatgacag 121 ggaccatcag gcgacagatt gaagggcaga gggaggcagc accctccgag agttggcccg 181 gacccaaggg tgggctgaga cctgggccag gggcagccgt tccgaggggt tntgcctgag 241 cagtttggag atgaggtcct gggctcccgt ggggcacaga agcggggaac tttaggtcca 301 ccttggacga tggcgg
ACCESSION No. N75004
ORIGIN 1 tcaagtcata agataaagtt taatcatttg atcatgttaa aagacacaaa acacagccaa 61 tctaaccaaa tttcaggcat gcatttacat aaatatatta aattaagaaa agaaattgta 121 cacttaaacg tccttttcac ctagaaatca ttaaatccac agatcaacaa taaaaccaat 181 tctctgcatt taccacttca agatacaatt gttctatttt aaagataaca caaactncac 241 tagtctggtt aggaatttat ntgcattata catatattat
ACCESSION No. W96216
ORIGIN 1 tctcaggagg tagaagcttt attatgacat cttcaaaaga caatcaaatc aatagacatt 61 tgctgagcac ctgctgtgtg caagcccgtg tagacagtag ggtccagtgt cccacgcatg 121 gctctcgaat ccccggggag aaaaatcaca tcnggggtca gggagttttg cgtggctgag 181 aacaaagtgg gtttctgaac atcaaagtgc aattcgcttt acggggcaaa ctccgangcc 241 cagccccgcg tngggaagcc gcagcngggc gggcccgctt cctggggctn gcggccgggg 301 tttctctaag ccgcacgcnt tgcgtggtgt tgcggggcct ctcaagcaag cccggaagca 361 gcatccttga gctccggttg ttggagcgct gggacctctg gctgccgccc ccgcagcagc 421 agcaaccact actccgctgt c
ACCESSION No. AA045793
ORIGIN 1 caaggtatag ctaattttat tattatcaaa caaaactagt agatataact tccaggaaat 61 aagttacata aatataacag aataaattca ttttcttaag tttcaaatta aagatgatta 121 agaaatacag ctttatgtaa agtttctgct ttttctcaac cacgcctaaa gaggaaagaa 181 ctggcagcag gaacacttgc tcctaggaaa caaatacaac aaaattataa ttaaaaagat 241 cttcaagcta tcaaaatttg tgagagaagg atggtaagaa tgcagtagaa attaccanat 301 gacaaacaaa atcctatcag ttttcaggtt ggtcaaaaag taacttccat gaatatagcc 361 tgtggatccg gccat
ACCESSION No. AA284172
ORIGIN 1 gtgttaaagt tggatggatt tattttttta aaggcccagt acaaaaaaat ggttgaggaa 61 agtgactctt caacaaaata tacacctgta gaaaaaaatc cctaatatac tgatatttaa 121 ttgaacggaa agtactaaag agaacatact ttaatatcta ggcacaattg gtcaggtact 181 aattataatt tctgttctca tttaaaagtt taaaccaatt cttcaactgg actgatgtgt 241 gtgagtctaa tacagagaag gcacctctct catctctcac tctccttaag gaccttttga 301 gagaaactct ttgtaacact ttaagggaca cagacaatgc actatatcta agtatagata 361 tagttattta acatac
ACCESSION No. AA411324
ORIGIN 1 tttttttttt tcccaaacaa tacatatcag attttatcca ttttgttttc tacatgttct 61 ttgtgactca agtttgacat tagcatttgc accccaaatg agttccccta caaataaaat 121 ttgttcatgt tgacacaaag aacacaaagc aagtatagat ccctcaggaa gttgtcacaa 181 ctcttgataa gattaactcc accactatca tcactttttg ctttgtcccc tagtttgaag 241 cctgctggct tttataattc aatgagaatg actccacact cttctccaaa gcgcccatta 301 tttttagttt ttcggtgcgc gactcaacat aaagacctgt ggctcttatg agctgcctgt 361 ttttaaatgg tgcagtagtt tcagtttcca tttaataagt tcccagataa caaatggaga 421 atgggaagaa tcttctcaag gtcacagtga aggtaaaaat aaattatctc catcactgag 481 aggct
ACCESSION No. AA448261
ORIGIN 1 tttccagaaa aggatatttt ttttattcaa gtaactgcaa ataggaaacc agagagggag 61 ccccaggctg ggacaaatca tggctacccc tccccaacag aacaggggga ggaggtggcc 121 cctacaccct ttatggtcga ttcgggcccc cttgctcact ctgctgcagc atcctagggg 181 cagggccagc cttccctggg actggggtag tcggtcaccc agcctgccat gccccagccc 241 ctcttcccca caaagagtat cttgggggag gggatcgtgg gcagaacagg aggcaatgag 301 gatgaacatt tggcgctggt agcagcagca atgacggatt gtcgaagaat ggaacattga 361 aca
ACCESSION No. AA479952
ORIGIN 1 aacagtctgg ctgttgtttg aattaaactc ttaaacagga tgtttagtta gagggtaatt 61 gttgagtaat gatgcataca acagcatact tccctttctt gctgggggtg cagcttttca 121 gttttcttgt tttactttga cagtgcaagg ggaactgaaa ataatttcca ttgtattatt 181 tatcttagtt cagctgaggg ctttatgaga cagtggatgg ggaggcagta agacggtgat 241 gagataaaat gtgtgtgttg cactgactgt ctataaagtt atcctttctt catgaaaaag 301 tagcatttaa atctggatga gtttataaag gattacaaaa tgctgattta tagagtaaac 361 tttaaaatat taaagactaa agactaaaag aagagtaata atgaagtaat gtag
ACCESSION No. AA485752
ORIGIN 1 ttcggcagca actcctttcc tttatttctt ccccttgtaa agggaaattc aagttcagca 61 gcattccttt cctgccccaa gtcctcaacc agacaagagg ctgcaggcac caaatcttgg 121 gctggataat ggcaaaggcc tcagaagctc acctccagct ctgagcttca acagctgttt 181 gtaccagtga gtcagcatta aatccaccag aaaagaacag caccacccaa agactggggg 241 gcagctgggc ctgaagctgt agggtaaatc agaggcaggc ttctgagtga tgagagtcct 301 gagaca
ACCESSION No. AA504266
ORIGIN 1 tttttttttt tttatatata tatataattt tatttaaaat ttagatccct attcccacac 61 tctaataagc tgtataattt ttgtttagaa tttttctgca aacatactac aataagcttc 121 ttttatttgg agacaaaata cagtggcatt actggaagga atatcacaac attacatttt 181 tatcttaaag gacaagcaaa ctttcagggt tgataatggg ataagcatgt ttgagactgg 241 ttaccttctg gcagttcact gcatctggat atttctgaaa agtatagaga agctcttgga 301 ttttaaaaat atcttaaaat acttttagat gaaaaaattg taaaagttct gcttataagt 361 ttacttttct ccacaattac aatatttaaa acaaagtttt gttgattgac gttttaagca 421 tttaaattta gaatgctaaa aacaattcta tcctacactt tcttcagggt aggggaataa 481 atacatcctt aacattgttt tctggatgta aacagaaatc cagcagaggt catcattatt 541 tagtacaacc agtaaataaa tgtaagagaa t
ACCESSION No. AA630376
ORIGIN 1 agcttggcaa acctttttta ttttgtgata aaaatgcttt catataaatt tcatcttaac 61 tacctttaga atgaaacgga aaagtaaaaa caaagtgtgc attttcctta ctacgtttag 121 tcaggaatat gcggtcattt tattggttac tgggtttctc atacaaacag atataatatc 181 acttttaaga gaaatgtaca caaggaagta accatagtac cacttattag tgggggcctc 241 tgggtacata aatgtgtcct cccaaatagt catcatacat tcaatggtat t
ACCESSION No. AA634261
ORIGIN 1 atagtgaaaa tatactttat tttttaatac aatagctgcc agcaatatac tggtgctgat 61 gttccaaaga taaaagaaaa tacatgcatt ctataataag ctttcatttg cctgttcaag 121 aaattataaa gaaaatactc caattctgtt caacattacg gcttgaggag ttgaaatttt 181 tccatgataa aaatatactt tgtgtggccc aaaccttgac tatttataaa ggatggagtt 241 tttaaaagcc cacatgtatc aataatggat gctcccctct ctttgaatta aatgcctaaa 301 ttcaaattaa tgcaagaaat tggtgaatca ttaaatgatg aaatttgtat caaaatgttc 361 atgaaaaaat acatttctat ttcctctaca tttttacttt gtagttattt tctaaatggg 421 tttaagggca cagaaataaa tgctatctac atgcaactct ggagagattc aaaacacaac 481 agaagttaac atgcctaaat cctagagttg atccatttag tgtaagaata aatgtcagaa 541 atc
ACCESSION No. AA701167
ORIGIN 1 ggtagaggca aagtttcgct atgttgccca ggctggtgtc gaattccagg cctcaggtga 61 tcttcccacc ttggcctccc aaagtgctgg gattacaggc gtgaaccacc gtgccaaacc 121 tacattttta gatttattat ggtgttctga ttaacaataa agctaggtta ttagctgcct 181 gggaagagga ggaagtagat ttttacagtc acttttatag aaactgttaa attcacatga 241 gaaattccac cttacgagaa ttggctccct gacatgtctt tggactacct ctgtttctct 301 aagtttttgt ttttttctgg tgtctgaatt aagttggtga cagatttggg ggatatttga 361 gtagcacttt atctagagtt gc
ACCESSION No. AA703019
ORIGIN 1 ggcatttcag taaatttttt taatgacttt aatgattctt atttaagaaa aagcccttaa 61 ataaatgcta ccaaggcagt aatatttgac catatgaacc agaccaaata ccctttaatt 121 ttagtatatt aacctctgct gtaaatgctc ttttaacatt gccacatgta caaatttgtc 181 tagaacttca cgacacaaaa gtgtgcaaat atgagtctaa gattgtgctg aaatagggaa 241 aggctaacac tgatgtgcaa agtaaaaaag aaagataacc gcttctgcaa caggtaataa 301 aacaaggaaa aaacgagtta ggtcctgcat gtgtctccac ttcattgctt ccatgtttga 361 aaaagggagt ctgttctttt gctaggccat gaggctggaa tccacttggc atactgtgtt 421 gagaggtcta agttcagtgg tgctctcagc agcagccggg agg
ACCESSION No. AA706041
ORIGIN 1 cgctgagctg cttatttatt gaaaataaac gacggaaaag tctggccttg ctcctgtgca 61 agcttggagg cctgggtcgc cgctgtggac aagcgtctta gtgtcatgca gaccagaagg 121 cagctgctgt cccagggccg gggccacctc actgcctctg atggggactc ccagccccca 181 tggctccgct gtgccctggg caggggacgg gctgggggca ggggagggct ggagcccagg 241 aggcagcaca gcagccagaa agccgcacgc tgagcctgca cctatggttc cgggaggggc 301 ttgggccgtc acccaagtgt gatccctaag aacaggaggc ccagcaccct ggaaggaggc 361 gctggaaggc ggggcggtgg tggccccgtc a
ACCESSION No. AA773139
ORIGIN 1 ccatgaacac agtagtgaga tattcctttt ccactcctac actatcttct gcttaaaacc 61 ctctgagggg tcccatctct ctcagggtga tgtctagact tcttctgagg ctagaccagg 121 tggtgcggcc ccatgtgcca cgcacccaag ccccctgcct cagtgtcccc catatcccac 181 accacagggg ggtggctgcg ttctgtatgg taggtggtgc tgaccactgg gcctctgcac 241 acgctgctct cagttccctg gccaactctc cttcaggcct cagc
ACCESSION No. AA776813
ORIGIN 1 ttttgtagag ctgggatctc actatgttgc ccaaggtggt ctcaaactcc tggcctcaac 61 tgattctcag gcctcagctc cggaagtgct ggaatcacag gcaggagcac ggtaacccgg 121 gccccacagg ggtttggggt c
ACCESSION No. AA862465
ORIGIN 1 tttatgctag gcaaggaggg atgattattt attagcttct acagattaga caatggggtg 61 ggggtgggct caaggtgaga tgattttttg ggtccaagtc tactcaagac aggcatccca 121 gtcttcggtc tccaaatcca cctcctgtct gtccccccac actgctcctc aggccttgtg 181 gatccattga ctgtgatttc tgtggttcag ctcccacatc aggcaggaag ggcagctact 241 gggtctgaga tcccacattg cctccaaccc ttgcttccta gctggcctcc cagggcacca 301 cgaggggctg ggccaggctg ctgtgctgca cgtggcagga gtagggggct gtgtcctgcg 361 ggggcactgc accaccaccc aggactggta agtgccattt ccattgtgaa gaacatctcc 421 cgtactcagg ctcctgcacc tcgcggcccg agtccagtgc acatcaattt ccctgggtag 481 aagtcgtagg ccagcacttc agtttcttct tttctcctgg gggctggtgg ctggtgacac 541 cacagaggga ggatctgccg gtccaggata tttttgct
ACCESSION No. AA977711
ORIGIN 1 tttggcattg taattatgca gaagaaaatc tttattctta gggatcatgc tgggaactga 61 gggatgaagt atatgcatat tccaaatggt tcaggaaaaa tcctgtctat aaagcataca 121 tgataaaatg tcaacaataa gacaaactag aggaaggata tacaggtgct tactgtcaaa 181 tttcaaattt tctgtaggtt tgagagattc aagatgaaaa cttgggggaa aattatatat 241 tctgataata aaacagatgg gaaacaaaga gggcccataa gacagtcact gattaagatg 301 ctttctacat ggatgggcct catccttttg tccaaaggga ctacctggca tctgttccat 361 gttagtgaca gtgactcacc ccaggttgct gcacagatat gagaggcttt agatcatagc 421 acagtc
ACCESSION No. AI288845
ORIGIN 1 tttttagatg ttttaaaata catttatttc atgtcgtttg tccccagggt ttggagtttg 61 atgttctgga ccaagcgtag gctctgagca aatgctacca gggctggaga atcagttctg 121 ccacttccta gttaagtgat cttagacaaa tttccgcgcc ttagttttct tctcagagaa 181 atgagactag tcctatccac actatggaca agtggtagga ggcgaaggag ctcacgtttg 241 taaagagcct tgcacggtgc ctgagacaaa ttcagtgctt agcaaatgtt agctcacctc 301 tcccttttct tcctgtatcc gattttgtat acaaatgtgt agaaaattta catgaaataa 361 tgcagaaag
ACCESSION No. H15267
ORIGIN 1 tttttttttt ttacatgaag tagaactttt atttggaaag ttgaatttca tgtataatga 61 aaatattttc aaaccataca tagtcataag cataatacaa acaccaccta caatacaaac 121 acgttttata aagttctact atgaatatta atccaagcca aaagaaaaag gtaatcacgt 181 gaacctgttc tacatacctt tcatctcttt tgatgacgta atcgaacaat ttaaggtaca 241 aaacaangaa agctttgggc tgaaccctac ttatttcact ataggaacac taggatatat 301 actaccacag gtaaccaaac ccaatcccat tataattaat ttaacattgt tacatggatc 361 ctatcttaat ggnatgtaaa cat
ACCESSION No. H18956
ORIGIN 1 tttttttttt ttttttttac atgtaagaag tggttttatt ccaggngtgt gtttcataaa 61 gacgaggtcc tcaaggacag ctagtggcac atgctttggt caagaagagg aaaagcaaaa 121 acagaacagg gctgcgttgc cacaaaggac cggctgataa gtgcagagcc tgatctgacc 181 acagcaaagg acagagagac cctcttgaag gccctctggt cagcagtcct cttacattca 241 acaggcgcac ccggctcccc agccccaaag gtccatgccc gagtntggcc cgggcttcta 301 gtccatcctc tgggggagag gcctttgccc tggggcccag ttttgtccta aggtttnggc 361 aggganggtt tcccagatgg aacaggggga tttttagggn tgcacttggg tttncggaag 421 gaaacntcac gacagaggga caggcaaagc ttggccntgg g
ACCESSION No. H73608
ORIGIN 1 aaattttatt aattttattc aggaaagaca ttgactgtta agtttttttt tngggggggg 61 ggtgatgtct tgctattttt taaaaattat atccagacta tgaatttaat atttactacg 121 gctaatcaac tgctcatgtc agtaatcaaa gncagaaatg agccttatac gtacatctac 181 attaaacaca cacacacccc tttaaggggt gctcagtgta gnttctaatg tcagtctgtc 241 cattcaaccc agggcccaag gttgcatcac atcaccaagt tggaatcatg aagacagccc 301 agatttgact gacatgggca cagcagggct ccctcaccac agcccntggc accagttaac 361 tatttctngc tcgngccgaa ttnttgggcc tcgagggcaa ntttccctat tagtnag
ACCESSION No. H99544
ORIGIN 1 gcgnccgccg cccccgcctg ggccgcgctc cccctctccc gctccctccc tccctgctcc 61 aactcctcct ccttctccat gcctctgttc ctcctgctct tacttgtcct gctcctgctg 121 ctcgaggacg ctggagccca gcaaggtgat ggatgtggac acactgtact aggccctgag 181 agtggaaccc ttacatccat aaactaccca cagacctatc ccaacagcac tgtttgtgaa 241 tgggagatcc gtgtaaagat tggganagag gagttcgcat caaatttggt gactttgaca 301 tttgaagatt ctgattcttg tcactntaat tacttgnaga atttataatg ggaattggga 361 gtcagcggaa cttgaaaata aggcaaaata cttggtaggt ctgggggtnt ggcaaaat
ACCESSION No. N45282
ORIGIN 1 ctaggcataa cataaattgt tataattgat cagaatatct tgaatatatt tttacagata 61 actagtggtt tctactagca gattaaaacc aagagaaaat taaaagtaag ttcacattta 121 aaaaaaatta taagcaataa atacagcact acagccacca ctaattctat atacattgga 181 ttacatttaa acaaacactg cattccagaa tgaatatttt atgaataaat gcattggaaa 241 ttaactttag gaaataaaat gacaaattac gaatttagaa aattaaaata tgactttcac 301 aangtaatca cagtaaaatg cagatctaca ttttaaaagc tagaaatttc cccaaattta 361 tttttttgga cagccaagaa gnttgcctta aaaa
ACCESSION No. N48270
ORIGIN 1 tttgcacctt gaaacaattt aataatgtat tacattatag tagcatcaca gcagcagtca 61 ataatgccac tttagacaaa aatcagtatt tccattatgc attctgtgta taagaattca 121 taaatcggta aaagtcattc taagaaaact tggcaaatac agctttggac tggaattggc 181 atttctttgt ctacttttcc ttcccctaga ttctttgttt taaactacag tattcatatt 241 ttaaaatgtt ttaaattatt ttaagacgtt aatatagcag ttacattttt gaatagttat 301 ttgaaagtga ctgtaagata aagttttaga gaatctatta atgggatagg gttgatttac 361 attttcacat ttttcctaaa aatcagcttt ggttttagaa ctgattggtt tttcattttg 421 ggaa
ACCESSION No. N59451
ORIGIN 1 aaaatcactt caagaagcat ttattgagaa tctaagacaa acaccctata ttcaaagagc 61 ttacagttta tggaaaggcc agccaatcaa tatgcaatat ttaagtcttt tcattgaggc 121 aagtgttgat tttgagagca gagagatgat gatcgttttc gagctgagtt accaaggttg 181 gagcttacta aactcacaag ggcagtttca ggaaaggaaa ataccatctg caaaggtata 241 tggctcattc aggggctctc tgaattgtgg ctggagcaaa aggtttgaaa tcttttttct 301 tcccaagaag atgaaagagc tcctggagga cagaaactgc tttttattcc ctttgtatct 361 ctcacagcac ctggatactt aagactaaac tattctttca ctcatatggc ccattatcaa 421 tgtcagcatt gtaaggccct gatggg
ACCESSION No. N95226
ORIGIN 1 tccctttctc cctgtttccc tcccttcttt ccttccttcc ttccttcctt ccttcttaga 61 attcactgaa gtatttccta ggtagccttt tacttactac tttaatcaaa gcttatcttt 121 gtgcccaatg tgtaaaaagt gaaaatgtct cttcgaaatt ctatattaca atatagacag 181 agaagttggg ccttgagggc ttgagtttca cttaaatact atacacatgt ggtatcacac 241 aaggtggagg gggagggaac aaacagaaac ataacaatta tttttattct gtctttacaa 301 aagaaagcct cttctctatg aaaaagtctt tttggcatct gctcccggaa acctgccccg 361 agaacacgtt ccccattgct ttgcaagcat ctctttttaa aagcacanca ctgtccccgg 421 gagtcacgta ggttggatta anctgtctta gttgaccaac gaagaancac tggatgagtt 481 ttccagggat gantggttgt ctggggtgga acatatagtc ctgtctacaa caaatgtaac 541 tcctgatatg ggacnatgaa cncagtgtgt gacccaggag tgnttgatct gtnaacantc 601 gcatgnaatt
ACCESSION No. R37028
ORIGIN 1 ttttttttct ctaagtgata atgatatccc agctagaata attgtgctct ccagaagcaa 61 ttaatctgat ttgcaagcac tgattttttc ttttgcaaaa actaataata ttagcctgac 121 caattatgaa ataattccta aatttacaaa ttcccaaatt tgtgctttca tggcttcctt 181 ctattttaaa tctatattat tttaaacaaa ttttccttaa gnaaaaatga cttaacttca 241 taaaaatcta cccatttatg gtaaataaaa cattaaccaa aaaccaaaat taaagggntt 301 actataaatg gnaacattta cattgctggn tattaaatcc ctttccttgg catt
ACCESSION No. R66605
ORIGIN 1 ttttttatcc ttcttaannn ttattacatg ttttattatc ctgtccccag aggtgggttt 61 atccagaaac caagaaaaaa aatcaatcag aataaactca aaaaaaaaag gtagggggag 121 caaaaccatc aaccaccagg gcagccaggc catcagccca cctccacctc tggagggtcc 181 ccagagaccc acgcccgacg cagacccgga ggaggcatca gcaagggggc ccgggcagag 241 aatcggctat gtctttcatt atgaggaggc agggagagac gggcagagat atgtttgcta 301 gggtgantat atattttata ttaattaaat ccgtaagttt aattaaagta aataggtatt 361 tctctggaag tttttttaat ttctttcntt ttttatagtt tttttggttt tttgtggntt 421 tttttttttt ttttggggtt t
ACCESSION No. T51004
ORIGIN 1 gcagctgttg tcttccaact cagcggcagg tttgctttcc ccacggacac tctggacctt 61 gtagctcctc aagcttccct gtctattgag cagataggaa gccgtgtcaa atatgtggca 121 ccttgaggaa atgcctagtg aatgacagta tgtcctattg tgctctaact ttatttcagc 181 cttatttctt ttctgaatat tatttttcat ttatcttcat ttccttacct attttctttt 241 cttctaaagt atgtatcttt gttagctcca tcatcctttt tgggaatgag gcaagtataa 301 aaataaggta aataaataag gaccccatcc ctaggtattt ttaaggaaac cacccttttg 361 cggggcacac ttggctacct tggggtcttt agggctctgg ggggctttng ggtgtncctc 421 tngggcaggt cctggctggc attggcct
ACCESSION No. T51316
ORIGIN 1 ttcatccgct gcatgtggaa aactggcccg atacctcgca ctacgagttt ctcgccgaca 61 ctatgtggag cgattttgcc tacggtcgca atgccgtata cccggaagcn atcacggcaa 121 cgcanctngt cgcgttatcc cattgaacat tatgagaatc gcgatgtttc ggtcgatggt 181 gcggaaaagc gcggcntgct tcttacttgc cgcattgtgc cgccgattga ccgggaaaag 241 cgattcatgt tgatgttgcg tacatcttgg ggccttgcgt tgagggcgca ccgttcagg
ACCESSION No. T72535
ORIGIN 1 atgacctctg caaagagaag gtcagctata ngtagggaga aaaggaagaa ggcaagaaaa 61 ggagactcga gatgagttta catccaagag aagcacagat gtttgtaatc tacctagaat 121 aatgtgaagt acctgtccag catgtatgct cagatcctcc attcattagc acaagctgaa 181 aacatgaact gcaaattcta caccagcatc ctttgcttcc tccatggcag tgggaggtag 241 caaggggagt ccaacacttc tccatgacgt angaaaggca gggaaaaata ctgnt
ACCESSION No. W72103
ORIGIN 1 gtttgtgaaa aggaacaaaa tgaanttgaa ttggacatgt gctttaagca ngccaacaga 61 caacacacca ctagagacac acatcaaaag caatcacagt gctatgatca aatgatgggt 121 acatgtgaac acatc
All patents, patent applications, provisional applications, and publications referred to or cited herein are incorporated by reference in their entirety, including all figures and tables, to the extent they are not inconsistent with the explicit teachings of this specification.
All nucleotide and/or amino acid sequences associated with accession numbers referred to or cited herein are incorporated by reference in their entirety.
It should be understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application.

Claims

I claim:
1. A system for predicting clinical outcome for a patient diagnosed with cancer comprising a computing means; a user interface means that enables data entry, wherein said interface is coupled to said computing means, wherein said computing means is configured to perform microarray analysis and binary classification to generate a set of genes used in predicting clinical outcome.
2. The system of claim 1, wherein the microarray analysis is significance analysis of microarrays and the binary classification is a support vector machine.
3. The system of claim 1, wherein the computer is further configured to perform leave-one-out cross validation.
4. The system of claim 1, wherein the computer comprises a database for storing the set of genes, said computer further configured to analyze biological information from a patient against the set of genes to generate a predicted clinical outcome.
5. The system of claim 1, wherein the patient is diagnosed with colon cancer.
6. A classifier for predicting clinical outcome in a patient diagnosed with cancer comprising a computing means and a user interface, wherein said computing means comprises a storing means and a means for outputting processed data, wherein said storing means comprises a set of genes classified by outcome, wherein said interface is coupled to said computing means.
7. The classifier of claim 6, wherein said set of genes consists of the following genes: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA976642; AA133215; AA457267; N50073; R38360; AA450205; AA148578; R38640; AA487274; N53172; AA045308; AA045075; N63366; R22340; AA437223; AA481250; AA045793; H87795; AA121806; AA284172; R68106; AA479270; AA432030; R10545; AA453508; AI149393; AA883496; AA167823; AI203139; H19822; W73732; AA777892; AA885478; AA932696; AA481507; H18953; AA709158; AA488652; N39584; H62801; H17638; R43684; N21630; T81317; R45595; T90789; and AA283062.
8. The classifier of claim 6, wherein said set of genes consists of the following genes: AA045075; AA425320; AA437223; AA479270; AA486233; AA487274; AA488652; AA694500; AA704270; AA706226; AA709158; AA775616; AA777892; AA873159; AA969508; AI203139; AI299969; H17364; H17627; H19822; H23551; H62801; H85015; N21630; N36176; N72847; N92519; R27767; R34578; R38360; R43597; R43684; W73732; AA450205; AI081269; R59314; AA702174; AJ002566; AA676797; AA453508; W93980; AA045308; AA953396; AA962236; AA418726; R43713; AA664240; AA477404; AA826237; AA007421; AA478952; W93980; AA045308; AA953396; AA962236; AA418726; R43713; AA664240; AA477404; AA826237; AA007421; AA478952; AA885096; H29032; R10545; AA448641; R38266; H17543; T81317; AA453790; R22340; AA987675; N51543; N74527; AA121778; AA258031; AA702422; T64924; R42984; R59360; R63816; T49061; AA016210; AA682585; AA705040; AA909959; AI240881; AA133215; AA699408; AA910771; AI362799; H51549; R06568; AA001604; AA132065; AA490493; AA633845; AI261561; H81024; N75004; W96216; AA045793; AA284172; AA411324; AA448261; AA479952; AA485752; AA504266; AA630376; AA634261; AA701167; AA703019; AA706041; AA773139; AA776813; AA862465; AA977711; AI288845; H15267; H18956; H73608; H99544; N45282; N48270; N59451; N95226; R37028; R66605; T51004; T51316; T72535; and W72103.
9. The classifier of claim 6, wherein said set of genes consists of the following genes: AA007421; AA045075; AA045308; AA418726; AA425320; AA450205; AA453508; AA453790; AA477404; AA478952; AA479270; AA486233; AA487274; AA664240; AA676797; AA702174; AA706226; AA709158; AA775616; AA826237; AA873159; AA969508; AI002566; AI299969; H17364; H19822; H23551; N36176; N72847; R10545; R27767; R34578; R59314; W73732; AA448641; R59360; AA121778; H51549; H81024; AA490493; R42984; AA258031; AA133215; R63816; N95226; N74527; AA702422; AI261561; AA132065; AI362799; AA045793; AA284172; N51632; AA482110; AA485450; AA699408; N70777; AA993736; AI139498; N59721; AA431885; AA911661; AA775865; R30941; AA703019; AA777192; W72103; H15267; H17638; R60193; R92717; AA706041; AA411324; AA504266; AA932696; AA973494; N45100; AA418410; AA725641; AA954482; H45391; T86932; AA279188; AA485752; AA680132; AA977711; W93370; AA036727; AA071075; AA464612; AA481250; AA598659; AA682905; R17811; W93592; AA017301; AA046406; AA256304; AA416759; AA448261; AA452130; AA457528; AA460542; AA479952; AA481507; AA504342; AA598970; AA630376; AA634261; AA677254; AA757564; AA775888; AA844864; AA862465; AA989139; AI253017; AI394426; H99544; N41021; N45282; N46845; N48270; N59846; R16760; R44546; R92994; T51004; T56281; T70321; and W45025.
10. The classifier of claim 6, wherein said set of genes consists of the following genes: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA883496.
11. A method for predicting a clinical outcome for a patient diagnosed with cancer, said method comprising the steps of: a) classifying at least one gene that correlates with a clinical outcome; b) establishing a set of reference gene expression levels based on the at least one gene; c) receiving biological information from the patient; d) extrapolating from the biological information the level of intracellular expression of said at least one gene; e) comparing said level of intracellular expression against said set of reference gene expression levels; and f) predicting a clinical outcome based on the deviation of the intracellular level expression from that of the reference gene expression levels.
12. The method of claim 1, wherein identification of said at least one gene is performed with any one or combination of the following: significance analysis of microarrays, cluster analysis, support vector technology, neural network, and leave-one-out cross validation.
13. The method of claim 1, further comprising the step of estimating the accuracy of the predicted clinical outcome.
14. The method of claim 1, wherein the biological information is a clinical specimen of bodily fluid or tissue.
15. The method of claim 14, wherein the biological information is a clinical tumor sample.
16. The method of claim 1, wherein the outcome being evaluated is for a patient diagnosed with colon cancer.
17. The method of claim 1, wherein the predicted clinical outcome is the probability of patient survival at a predetermined date.
18. The method of claim 1, further comprising the step of generating a treatment regimen based on the predicted clinical outcome.
19. The method of claim 1, wherein the gene that is identified is one with the accession number selected from the group consisting of: N36176; AA149253; AA425320; AA775616; N72847; AA706226; AA976642; AA133215; AA457267; N50073; R38360; AA450205; AA148578; R38640; AA487274; N53172; AA045308; AA045075; N63366; R22340; AA437223; AA481250; AA045793; H87795; AA121806; AA284172; R68106; AA479270; AA432030; R10545; AA453508; AI149393; AA883496; AA167823; AI203139; H19822; W73732; AA777892; AA885478; AA932696; AA481507; H18953; AA709158; AA488652; N39584; H62801; H17638; R43684; N21630; T81317; R45595; T90789; and AA283062.
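
Illustrative sketch (not part of the claims). The comparison and prediction steps recited in claim 11 (steps b, e and f), together with the leave-one-out cross validation of claim 3 and the accuracy estimate of claim 13, might be sketched in Python as follows. This is a minimal sketch under stated assumptions: a nearest-centroid rule stands in for the support vector machine named in claim 2, squared Euclidean distance is assumed as the measure of deviation, and every function name and expression value below is hypothetical and made up for illustration; none of it is data or code from this application.

```python
from statistics import mean


def centroid(profiles):
    """Per-gene mean expression across a list of equally long profiles."""
    return [mean(values) for values in zip(*profiles)]


def build_reference(samples):
    """Step (b): one reference expression profile per outcome label.

    samples is a list of (expression_profile, outcome) pairs.
    """
    by_outcome = {}
    for profile, outcome in samples:
        by_outcome.setdefault(outcome, []).append(profile)
    return {outcome: centroid(group) for outcome, group in by_outcome.items()}


def deviation(profile, reference):
    """Assumed deviation measure: squared Euclidean distance."""
    return sum((x - r) ** 2 for x, r in zip(profile, reference))


def predict(profile, references):
    """Steps (e) and (f): choose the outcome whose reference deviates least."""
    return min(references, key=lambda outcome: deviation(profile, references[outcome]))


def leave_one_out_accuracy(samples):
    """Hold out each sample once, rebuild the references from the rest,
    and report the fraction of held-out samples predicted correctly."""
    correct = 0
    for i, (profile, outcome) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        if predict(profile, build_reference(training)) == outcome:
            correct += 1
    return correct / len(samples)


# Made-up expression values for three hypothetical genes (illustration only).
TOY_SAMPLES = [
    ([2.0, 0.5, 1.0], "good"),
    ([2.2, 0.4, 1.1], "good"),
    ([1.9, 0.6, 0.9], "good"),
    ([0.5, 2.0, 1.0], "poor"),
    ([0.4, 2.1, 1.2], "poor"),
    ([0.6, 1.9, 0.8], "poor"),
]

if __name__ == "__main__":
    references = build_reference(TOY_SAMPLES)
    print(predict([2.1, 0.5, 1.0], references))
    print(leave_one_out_accuracy(TOY_SAMPLES))
```

The references are rebuilt inside the cross-validation loop so that each held-out sample never contributes to the profile it is compared against; a real implementation would also repeat the gene selection step within each fold.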
PCT/US2005/006201 2004-02-25 2005-02-25 Methods for predicting cancer outcome and gene signatures for use therein Ceased WO2005083128A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US54787104P 2004-02-25 2004-02-25
US60/547,871 2004-02-25

Publications (2)

Publication Number Publication Date
WO2005083128A2 true WO2005083128A2 (en) 2005-09-09
WO2005083128A3 WO2005083128A3 (en) 2006-11-23

Family

ID=34910953

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2005/006201 Ceased WO2005083128A2 (en) 2004-02-25 2005-02-25 Methods for predicting cancer outcome and gene signatures for use therein

Country Status (1)

Country Link
WO (1) WO2005083128A2 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FUREY T S ET AL: "Support vector machine classification and validation of cancer tissue samples using microarray expression data" BIOINFORMATICS, OXFORD UNIVERSITY PRESS, OXFORD, GB, vol. 16, no. 10, October 2000 (2000-10), pages 906-914, XP002318283 ISSN: 1367-4803 cited in the application *
POMEROY SCOTT L ET AL: "Prediction of central nervous system embryonal tumour outcome based on gene expression." NATURE, vol. 415, no. 6870, 24 January 2002 (2002-01-24), pages 436-442, XP002356657 ISSN: 0028-0836 cited in the application -& POMEROY SCOTT L ET AL: "Supplementary Information for Nature's paper: Prediction of central nervous system embryonal tumour outcome based on gene expression." INTERNET ARTICLE, [Online] 24 January 2002 (2002-01-24), pages 1-102, XP002359041 Retrieved from the Internet: <URL:http://www.nature.com/nature/journal/v415/n6870/suppinfo/415436a.html> *
VAN 'T VEER LAURA J ET AL: "The microarray way to tailored cancer treatment." NATURE MEDICINE. JAN 2002, vol. 8, no. 1, January 2002 (2002-01), pages 13-14, XP002356658 ISSN: 1078-8956 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006005461A3 (en) * 2004-07-15 2006-05-04 Bayer Healthcare Ag Diagnostics and therapeutics for diseases associated with g-protein coupled receptor 27 (gpr27)
US10260104B2 (en) 2010-07-27 2019-04-16 Genomic Health, Inc. Method for using gene expression to determine prognosis of prostate cancer
US10214777B2 (en) 2010-09-15 2019-02-26 Almac Diagnostics Limited Molecular diagnostic test for cancer
US10378066B2 (en) 2010-09-15 2019-08-13 Almac Diagnostic Services Limited Molecular diagnostic test for cancer
US10196691B2 (en) 2011-01-25 2019-02-05 Almac Diagnostics Limited Colon cancer gene expression signatures and methods of use
US9953129B2 (en) 2011-09-23 2018-04-24 Agency For Science, Technology And Research Patient stratification and determining clinical outcome for cancer patients
WO2016119191A1 (en) * 2015-01-30 2016-08-04 Bgi Shenzhen Biomarkers for colorectal cancer related diseases
CN108064273A (en) * 2015-01-30 2018-05-22 深圳华大基因研究院 The biomarker of colorectal cancer relevant disease

Also Published As

Publication number Publication date
WO2005083128A3 (en) 2006-11-23

Similar Documents

Publication Publication Date Title
US20060195266A1 (en) Methods for predicting cancer outcome and gene signatures for use therein
JP6824923B2 (en) Signs and prognosis of growth in gastrointestinal cancer
JP6404304B2 (en) Prognosis prediction of melanoma cancer
KR101530689B1 (en) Prognosis prediction for colorectal cancer
US9115401B2 (en) Partition defined detection methods
US20150079591A1 (en) Predicting response to chemotherapy using gene expression markers
EP2419540B1 (en) Methods and gene expression signature for assessing ras pathway activity
Kuo et al. A primer on gene expression and microarrays for machine learning researchers
WO2005083128A2 (en) Methods for predicting cancer outcome and gene signatures for use therein
NZ555353A (en) TNF antagonists
Kelmansky Where statistics and molecular microarray experiments biology meet
Dago Performance assessment of different microarray designs using RNA-Seq as reference
Westbrook Novel Targets for the Diagnosis and Treatment of Breast Cancer Identified by Genomic Analysis
HK1145342B (en) Prognosis prediction for melanoma cancer

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase