
WO2003100030A2 - Kidney toxicity predictive genes - Google Patents

Kidney toxicity predictive genes

Info

Publication number
WO2003100030A2
Authority
WO
WIPO (PCT)
Prior art keywords
genes
gene sequences
members
combo
predictive
Prior art date
Legal status
Ceased
Application number
PCT/US2003/006196
Other languages
French (fr)
Other versions
WO2003100030A3 (en)
WO2003100030B1 (en)
Inventor
Larry Kier
Timothy D. Nolan
Usha Sankar
Maher Derbel
Current Assignee
Phase-1 Molecular Toxicology Inc
Original Assignee
Phase-1 Molecular Toxicology Inc
Priority date
Filing date
Publication date
Application filed by Phase-1 Molecular Toxicology Inc
Priority to EP03741753A (EP1506396A2)
Priority to AU2003273154A (AU2003273154A1)
Priority to CA002477688A (CA2477688A1)
Publication of WO2003100030A2
Anticipated expiration (legal status)
Publication of WO2003100030A3
Publication of WO2003100030B1
Ceased (current legal status)


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • This application contains a gene sequence listing and four tables submitted on a compact disc whose file name is "Tables for Burning", created on February 27, 2003, containing 5 files and herein incorporated by reference in its entirety.
  • The five files are (a) a gene sequence table, Table 32 (403 KB), in Microsoft® Word®; (b) Table 38 (785 KB) in Microsoft Excel®; (c) Table 39 (957 KB) in Excel; (d) Table 40 (992 KB) in Excel; and (e) Table 45 (57 KB) in Excel.
  • This invention is in the field of toxicology. More specifically, it relates to kidney toxicity predictive genes and methods of using such genes to predict kidney toxicity.
  • Molecular biology and genomics technologies have the potential to create dramatic advances in the science of toxicology, as in other biological sciences. See, for example, MacGregor et al., Fund. Appl. Tox. 26:156-173, 1995; Rodi et al., Tox. Pathology 27:107-110, 1999; Cunningham et al., Ann. N.Y. Acad. Sci. 919:52-67, 2000; Pritchard et al., Proc. Natl. Acad. Sci. USA 98:13266-13271, 2001; and Fielden and Zacharewski, Tox.
  • The invention provides kidney toxicity predictive genes and predictive models that are useful for predicting toxic responses to one or more agents.
  • The invention provides methods of predicting kidney toxicity in an individual exposed to an agent, which include the steps of: (a) obtaining a biological sample from an individual treated with the agent, treating a biological sample obtained from an individual with the agent, or treating in vitro cultured cells or explants with the agent; (b) obtaining a gene expression profile from the biological sample or the in vitro cultured cells or explants; and (c) using the gene expression profile from the treated sample or cells as a test set, a database of gene expression profiles and toxicity classifications as a training set, and kidney toxicity predictive genes and a Predictive Model to determine whether the agent will induce kidney toxicity in the individual or would be predicted to produce kidney toxicity following in vivo exposure.
  • In one embodiment, the predictive model uses expression profiles from a set of one or more kidney toxicity predictive genes selected from Combination 6, infra. In other embodiments, the set of one or more kidney toxicity predictive genes is selected from Combination 5, 4, 3, 2, or 1.
  • The invention also provides methods for determining the presence or absence of a no-observable effect level (NOEL) of an agent by the steps of: (a) obtaining biological samples from individuals treated with the agent at different dose levels, treating a biological sample obtained from an individual with different dose levels of the agent, or treating in vitro cultured cells or explants with different dose levels of the agent; (b) obtaining gene expression profiles of the samples; and (c) using the gene expression profiles from the biological samples as a test set, a database of gene expression profiles and toxicity classifications as a training set, and kidney toxicity predictive genes and a Predictive Model to determine or predict whether, and at which dose levels, the agent will induce kidney toxicity.
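The NOEL method ends with per-dose toxicity calls rather than a NOEL value. One minimal way to turn those calls into a NOEL estimate, assuming the NOEL is taken as the highest tested dose below the lowest dose that the Predictive Model calls toxic, is sketched here in Python. The function name `estimate_noel` and the dose values are hypothetical illustrations, not part of the patent.

```python
from typing import Dict, Optional


def estimate_noel(predicted_toxic: Dict[float, bool]) -> Optional[float]:
    """Return the highest tested dose below the lowest dose predicted to be toxic.

    predicted_toxic maps each tested dose level to the model's call
    (assumed interface): True if kidney toxicity is predicted, else False.
    Returns None if the lowest tested dose is already predicted toxic,
    i.e. no NOEL was reached within the tested range.
    """
    noel: Optional[float] = None
    for dose in sorted(predicted_toxic):
        if predicted_toxic[dose]:
            break  # lowest toxic dose found; no higher dose can be the NOEL
        noel = dose
    return noel


# Hypothetical dose levels and model calls, for illustration only.
calls = {10.0: False, 30.0: False, 100.0: True, 300.0: True}
print(estimate_noel(calls))  # 30.0
```

If no tested dose is predicted toxic, the function returns the highest tested dose, meaning the true NOEL lies at or above the tested range.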
  • The predictive genes and models may also be used to identify in vitro systems that accurately predict in vivo toxicity, and the identified in vitro systems may then be used to predict in vivo toxicity.
  • The invention provides methods of identifying a kidney toxicity predictive gene in an individual, including the steps of: (a) providing a set of candidate toxicity predictive genes; (b) evaluating said genes for their predictive performance with at least one training and test set of data in a predictive model to identify genes that are predictive of kidney toxicity; and (c) testing the ability of the predictive genes to predict kidney toxicity for different training and test sets of data, including comparing prediction with accurate classifications against prediction with random classifications, and predicting test data external to the data used to derive the predictive genes. In one embodiment, the candidate toxicity predictive genes are rat toxicity genes.
  • The invention provides a computer program product that includes a set of kidney toxicity predictive genes derived from mining a database having a plurality of gene expression profiles indicative of toxicity.
  • The set of kidney toxicity predictive genes includes at least one toxicity predictive gene from the Combination 6, 5, 4, 3, 2, or 1 list.
  • The invention provides a library of information about kidney toxicity predictive genes produced by the methods disclosed herein.
  • The invention provides an integrated system for predicting kidney toxicity comprising an array reader modified to read gene expression profiles from biological samples exposed to a test agent, operably linked to a computer comprising a database file having a plurality of kidney toxicity predictive genes.
  • FIG. 1 is a flow diagram illustrating the identification of kidney toxicity predictive genes.
  • The pathway for discovering kidney toxicity predictive genes uses the database of expression array data (Rat CT array) and toxicity data for kidney samples from rats treated with various compounds (see Table 1).
  • Genes whose expression correlates with pathology were determined using a variety of correlation statistics (see, for example, Tables 2 and 3).
  • The predictive model used was the GeneSpring Predict Parameter Value model, which employs a k-nearest neighbor algorithm.
  • Figure 2 is a graph showing the percent of overall correct calls as a function of the number of kidney toxicity predictive genes, using histopathology-correlating genes (Pearson measure) as the input gene list with Training and Test Set A.
  • The input gene list consisted of 66 genes that showed a statistically significant correlation with the histopathology scores using Pearson's correlation measure (r > 0.4). Training and Test Set A was used, with the model set to 10 nearest neighbors and a p-value ratio cutoff of 0.5. For this case, the optimum gene number (the lowest number of genes giving the highest percent of overall correct calls) was 49.
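The optimum gene number in Figure 2 is defined as the lowest number of genes giving the highest percent of overall correct calls. A minimal sketch of that selection rule follows; the function name and the accuracy values are hypothetical, not data from the patent.

```python
from typing import Dict


def optimum_gene_number(percent_correct: Dict[int, float]) -> int:
    """Return the lowest gene count that achieves the highest percent of overall correct calls.

    percent_correct maps the number of predictive genes used in the model
    to the percent of overall correct calls obtained with that many genes.
    """
    best = max(percent_correct.values())
    return min(n for n, pct in percent_correct.items() if pct == best)


# Hypothetical accuracy curve; not data from the patent.
curve = {10: 80.0, 30: 88.0, 49: 92.5, 60: 92.5, 66: 90.0}
print(optimum_gene_number(curve))  # 49
```

Preferring the smallest gene count among ties keeps the model as compact as possible without giving up accuracy.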
  • FIG. 3 is a flow diagram illustrating how kidney toxicity predictive genes are evaluated for performance. The performance of the predictive model is evaluated using 6 sets of training and test data (Rat CT expression array data). The training and test sets have either accurate classification assignments (histopathology "yes" or "no" for each sample) or random classification assignments ("yes" and "no" randomly assigned to samples). The k-nearest neighbor model is run with the indicated lists of predictive genes and the training and test set data as input. Four different measures of prediction are considered, as indicated.
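The comparison between accurate and random classification assignments can be illustrated with a plain majority-vote k-nearest neighbor classifier. This is a simplified sketch, not the GeneSpring Predict Parameter Value model: it assumes Euclidean distance, ignores the p-value ratio cutoff, and randomizes classifications by permuting the existing yes/no labels (which keeps their proportions). All function names and data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_predict(train_x: Sequence[Point], train_y: Sequence[str], x: Point, k: int) -> str:
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def percent_correct(train_x, train_y, test_x, test_y, k: int) -> float:
    """Percent of test samples whose predicted class matches the assigned class."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return 100.0 * hits / len(test_y)


def random_label_accuracy(train_x, train_y, test_x, test_y, k: int,
                          n_rounds: int = 100, seed: int = 0) -> float:
    """Mean percent correct when the yes/no labels of both sets are randomly permuted."""
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(n_rounds):
        shuffled_train = list(train_y)
        shuffled_test = list(test_y)
        rng.shuffle(shuffled_train)
        rng.shuffle(shuffled_test)
        scores.append(percent_correct(train_x, shuffled_train, test_x, shuffled_test, k))
    return sum(scores) / len(scores)


# Hypothetical two-gene expression profiles with well-separated classes.
train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["no", "no", "no", "yes", "yes", "yes"]
test_x = [(0.5, 0.5), (5.5, 5.5)]
test_y = ["no", "yes"]
print(percent_correct(train_x, train_y, test_x, test_y, k=3))  # 100.0
print(random_label_accuracy(train_x, train_y, test_x, test_y, k=3))
```

Genes that are genuinely predictive should score well above the random-label baseline; genes that score about the same under both assignments carry little predictive information.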
  • Figure 4 is a graph that shows the cumulative predictive performance of Combo 6 genes. The mean, minimum and maximum percent accuracy for 6 training and test sets are presented for Combo 6 genes used cumulatively in the order given in Table 14.
  • Figure 5 is a graph that shows the cumulative predictive performance of Combo 5 genes. The mean, minimum and maximum percent accuracy for 6 training and test sets are presented for Combo 5 genes used cumulatively in the order given in Table 14.
  • Figure 6 is a graph that shows the cumulative predictive performance of Combo 4 genes. The mean, minimum and maximum percent accuracy for 6 training and test sets are presented for Combo 4 genes used cumulatively in the order given in Table 14.
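The cumulative analysis in Figures 4 to 6 adds genes one at a time in a fixed order (Table 14) and summarizes accuracy over the six training and test sets. A minimal sketch of that bookkeeping follows; the `evaluate` callback is a hypothetical stand-in for training and scoring the predictive model on one training/test set, and the toy numbers are made up.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# evaluate(genes, set_index) -> percent accuracy on one training/test set (assumed interface)
Evaluator = Callable[[Sequence[str], int], float]


def cumulative_performance(ordered_genes: Sequence[str], n_sets: int,
                           evaluate: Evaluator) -> List[Tuple[int, float, float, float]]:
    """For the first 1, 2, ..., N genes in the given order, return
    (number of genes, mean, minimum, maximum) percent accuracy across n_sets sets."""
    summary = []
    for n in range(1, len(ordered_genes) + 1):
        subset = ordered_genes[:n]  # genes are added cumulatively in the fixed order
        scores = [evaluate(subset, s) for s in range(n_sets)]
        summary.append((n, mean(scores), min(scores), max(scores)))
    return summary


# Stand-in evaluator with made-up numbers; a real one would train the kNN
# model on training set s using `genes` and score test set s.
def toy_evaluate(genes: Sequence[str], s: int) -> float:
    return 50.0 + 10.0 * len(genes) + s


for row in cumulative_performance(["gene_a", "gene_b"], n_sets=6, evaluate=toy_evaluate):
    print(row)
```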
  • Figure 7 shows the k-means and tree cluster analysis of Combo 6 genes.
  • Figure 8 shows the Ward's cluster analysis of the Combo 6 gene set.
  • Figure 9 shows a scanned autoradiogram of a Western blot of serum samples from 8 animals probed with antibodies to clusterin and insulin-like growth factor binding protein 1. Sample information is indicated in the figure. The figure also presents transcriptional differential expression levels of the insulin-like growth factor binding protein 1 gene observed in kidney samples from these animals.
  • Table 1 lists the compounds, dose levels, kidney pathology and abbreviations in the database.
  • Table 2 lists genes whose expression at 24h directly correlates with kidney tubular necrosis at 72h, ranked by Pearson correlation coefficient.
  • Table 3 lists genes whose expression at 24h inversely correlates with kidney tubular necrosis at 72h, ranked by Spearman correlation coefficient.
  • Table 4 lists the distribution of compounds in individual training and test sets for 24 hour kidney data.
  • Table 5 lists the predictive genes for 24 hour expression data.
  • Table 7 lists the randomly selected gene subsets from 24 h combo 6 gene set (28 genes).
  • Table 8 lists the randomly selected gene subsets from 24 h combo 5 gene set (25 genes).
  • Table 9 lists the randomly selected gene subsets from 24 h combo 4 gene set (23 genes).
  • Table 10 lists the randomly selected gene subsets from array genes excluding combo all set.
  • Table 11 lists the kidney toxicity individual sample prediction values for 24 hour data predictive genes (combined list and subsets).
  • Table 12 lists the kidney toxicity compound-dose prediction values for 24 hour data predictive genes (combined list and subsets).
  • Table 13 lists the kidney toxicity compound prediction values for 24 hour data predictive genes (combined list and subsets).
  • Table 14 lists the order of genes used for cumulative analysis of predictive performance of predictive combo gene sets.
  • Table 15 lists the individual gene predictions for combo 6.
  • Table 16 lists the individual gene predictions for combo 5.
  • Table 17 lists kidney toxicity individual sample prediction values for 24 hour data with random gene subsets.
  • Table 18 lists the comparison of predictivity for true kidney toxicity classification and random classification using combo gene sets and random subsets and 24 hour data.
  • Table 19 lists the distribution of compounds in individual training and test sets for 6 hour kidney data.
  • Table 20 lists the genes whose expression at 6 hours directly correlates with kidney tubular necrosis at 72 hours, ranked by Pearson correlation coefficient.
  • Table 21 lists the genes whose expression at 6 hours inversely correlates with kidney tubular necrosis at 72 hours, ranked by Spearman correlation coefficient.
  • Table 22 lists the genes whose expression at 6 hours is predictive of kidney toxicity at 72 hours.
  • Table 23 lists the kidney toxicity compound-dose prediction values for 6 hour data predictive genes (combined list and subsets).
  • Table 24 lists the distribution of compounds in individual training and test sets for the 72 hour kidney data.
  • Table 25 lists the genes whose expression at 72 hours directly correlates with kidney tubular necrosis at 72 hours, ranked by Pearson correlation coefficient.
  • Table 26 lists the genes whose expression at 72 hours inversely correlates with kidney tubular necrosis at 72 hours, ranked by Spearman correlation coefficient.
  • Table 27 lists the genes whose expression at 72 hours is predictive of kidney toxicity at 72 hours.
  • Table 28 lists the kidney toxicity compound-dose prediction values for 72 hour data predictive genes (combined list and subsets).
  • Table 29 lists the predictive performance of various models.
  • Table 30 lists the logistic discrimination coefficients.
  • Table 31 lists the prediction of kidney toxicity for samples external to database.
  • Table 32 lists the genes predictive for kidney tubular necrosis, sequences, and accession numbers.
  • Table 33 lists the kidney predictive genes (376 genes) organized by time point and combo category.
  • Table 34 lists the RCT genes (ESTs) predictive for kidney tubular necrosis: best homology matches.
  • Table 35 lists the genes that are predictive at all three time points.
  • Table 36 lists the genes that are the most predictive across the time points.
  • Table 37 lists the kidney toxicity predictive genes whose protein products are known to be secreted. The genes are taken from the table listing all kidney predictive genes at the three time points (6, 24 and 72 hours). Because these protein products are secreted into body fluids, they are easier to access and more amenable to quantification. These proteins could therefore be monitored in body fluids of subjects such as humans, and toxicity predictions could be made from them.
  • Table 38 lists the expression data for the 6 hour timepoint.
  • Table 39 lists the expression data for the 24 hour timepoint.
  • Table 40 lists the expression data for the 72 hour timepoint.
  • Table 41 lists the predictive performance of predictive genes organized by occurrence on training/test set lists (combo number) and time point.
  • Table 42 lists the summary output of the predictive computer software product.
  • Table 43 lists the detailed output of the predictive computer software product.
  • Table 44 lists protein marker candidate identification information, including the gene name, % correct calls, average fold induction for negative histopathology samples, and average fold induction for positive histopathology samples.
  • Table 45 lists input data used for the predictive computer program product.
  • This invention relates to methods of predicting whether an agent or other stimulus is capable of inducing kidney toxicity in a recipient organism using predictive molecular toxicology analysis.
  • The invention provides methods of predicting kidney toxicity that comprise analyzing gene and/or protein expression across a number of kidney toxicity biomarkers disclosed herein for patterns of expression that correlate with and are predictive of kidney tubule necrosis in the recipient organism. This endpoint is significant because mortality is high in patients with acute renal failure, and tubular necrosis is associated with many causes, such as ischemia, endotoxemia, or exposure to nephrotoxins (Ueda et al., Am. J. Med. 108:403-415, 2000).
  • The invention is based, in part, upon the discovery that modulated transcriptional regulation of relatively small sets of certain genes in response to a test agent can accurately predict the occurrence of kidney toxicity observed at later time points.
  • The invention also provides kidney toxicity biomarkers that are useful in practicing the kidney toxicity prediction methods of the invention.
  • Applicants have identified 376 kidney toxicity biomarkers that demonstrate utility in predicting kidney toxicity outcomes. These biomarkers have been thoroughly characterized for their predictive performance, individually as well as in various combinations or subsets.
  • Various optimized subsets of the kidney toxicity biomarkers of the invention are also disclosed; these subsets have also been thoroughly characterized for predictive performance using the methods of the invention.
  • Among the subsets of kidney toxicity genes provided herein, several demonstrate prediction accuracies of approximately 95%.
  • The methods of the invention are capable of distinguishing agent dose levels that induce toxicity (typically higher doses) from doses that are non-toxic. This capability is an essential component of meaningful toxicological evaluation.
  • "Toxic" or "toxicity" refers to the adverse effects caused by an agent, usually a xenobiotic agent administered at a dose level high enough to cause those effects.
  • "Kidney toxicity biomarker" and "kidney toxicity predictive gene" are used interchangeably and refer to a gene whose expression, measured at the RNA or protein level, can predict the likelihood of a kidney toxicity response with accuracy significantly better than chance.
  • The kidney toxicity response may be tubular necrosis.
  • The kidney toxicity response may also be another toxicity manifestation that elicits similar detectable gene expression changes, such as other forms of tubular injury, glomerular toxicity, or papillary injury.
  • A "toxicological response" refers to a cellular, tissue, organ or system level response to exposure to an agent. At the molecular level, this can include, but is not limited to, the differential expression of genes, encompassing both up- and down-regulation of expression at the RNA and/or protein level; the up- or down-regulation of genes that encode proteins associated with the response to and mitigation of damage, or with the repair or regulation of cell damage; or changes in gene expression due to changes in the populations of cells in the affected tissue or organ in response to toxic damage.
  • An "agent" or "compound" is any element to which an individual can be exposed and can include, without limitation, drugs, pharmaceutical compounds, household chemicals, industrial chemicals, environmental chemicals, other chemicals, and physical elements such as electromagnetic radiation.
  • "Biological sample" refers to substances obtained from an individual.
  • The samples may comprise cells, tissues, parts of tissues, organs, parts of organs, or fluids (e.g., blood, urine or serum).
  • Biological samples include, but are not limited to, those of eukaryotic, mammalian or human origin.
  • For the purposes of prediction, a "sample" is defined as a biological sample together with the gene expression data for that sample. Each sample comes from an individual animal. A toxicity classification may also be associated with the sample.
  • "Gene expression" refers to the relative levels of expression and/or pattern of expression of a gene. In some embodiments, the expression is of a toxicity gene or toxic response gene. In other embodiments, the expression is of a toxicity predictive gene.
  • "Gene expression profile" refers to the relative levels of expression of multiple different genes measured for the same sample. Gene expression profiles may be measured in samples comprising a variety of cell types, different tissues, different organs, or fluids (e.g., blood, urine, spinal fluid, sweat, saliva or serum) by various methods, including but not limited to microarray technologies, quantitative and semi-quantitative RT-PCR (e.g., TaqMan™) techniques, and techniques for measuring expression of proteins.
  • "Individual" refers to a vertebrate, including, but not limited to, a human, non-human primate, mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and dog.
  • The terms "hybridize", "hybridizing", "hybridizes" and the like, used in the context of polynucleotides, refer to conventional hybridization conditions, such as hybridization in 50% formamide/6X SSC/0.1% SDS/100 µg/ml ssDNA, in which temperatures for hybridization are above 37 degrees Celsius and temperatures for washing in 0.1X SSC/0.1% SDS are above 55 degrees Celsius, and preferably refer to stringent hybridization conditions.
  • Whether nucleic acids will hybridize depends upon factors such as their degree of complementarity and the stringency of the hybridization reaction conditions. Stringent conditions can be used to identify nucleic acid duplexes with a high degree of complementarity. Means for adjusting the stringency of a hybridization reaction are well known to those of skill in the art. See, for example, Sambrook et al., "Molecular Cloning: A Laboratory Manual," Second Edition, Cold Spring Harbor Laboratory Press, 1989; Ausubel et al., "Current Protocols in Molecular Biology," John Wiley & Sons, 1996 and periodic updates; and Hames et al., "Nucleic Acid Hybridization: A Practical Approach," IRL Press, Ltd., 1985.
  • Conditions that increase stringency include higher temperature, lower ionic strength, and the presence or absence of solvents; lower stringency is favored by lower temperature, higher ionic strength, and lower or higher concentrations of solvents.
  • The term "identity" is used to express the percentage of amino acid residues at the same relative positions that are the same.
  • The term "homology" is used to express the percentage of amino acid residues at the same relative positions that are either identical or similar, using the conserved amino acid criteria of BLAST analysis, as is generally understood in the art. Further details regarding amino acid substitutions that are considered conservative under such criteria are provided.
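The difference between identity and homology can be made concrete with a small calculation over two already-aligned sequences. The residue groups used here to decide "similar" are a hypothetical simplification; BLAST itself counts positives from a substitution matrix such as BLOSUM62, so its percentages can differ from this sketch.

```python
from typing import Tuple

GAP = "-"
# Simplified, hypothetical groups of mutually similar residues; BLAST instead
# counts positives from substitution-matrix scores (e.g. BLOSUM62).
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "NQ", "ST", "AG")


def _identical(a: str, b: str) -> bool:
    return a == b and a != GAP


def _similar(a: str, b: str) -> bool:
    return _identical(a, b) or any(a in g and b in g for g in SIMILAR_GROUPS)


def identity_and_homology(aligned1: str, aligned2: str) -> Tuple[float, float]:
    """Percent identity and percent homology (identical or similar) over an alignment.

    Both strings must already be aligned to the same length; gap positions ('-')
    count as neither identical nor similar.
    """
    if len(aligned1) != len(aligned2) or not aligned1:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    pairs = list(zip(aligned1.upper(), aligned2.upper()))
    identity = 100.0 * sum(_identical(a, b) for a, b in pairs) / len(pairs)
    homology = 100.0 * sum(_similar(a, b) for a, b in pairs) / len(pairs)
    return identity, homology


print(identity_and_homology("ACDEFK", "ACDDYR"))  # (50.0, 100.0)
```

Here three of six positions are identical (A, C, D), and the remaining three pairs (E/D, F/Y, K/R) fall in the same similarity group, so homology reaches 100% while identity is 50%.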
  • A. Generation of Toxicology Gene Expression Biomarkers
  • The kidney toxicity biomarkers described herein were initially identified using a database generated from a large number of in vivo experiments, in which the differential expression of approximately 700 rat genes was quantified at various time points in response to multiple toxic compounds inducing various specific toxic responses, as visualized through microscopic histopathological analysis, as described in a pending United States patent application filed January 29, 2002 (serial number not yet assigned). These quantitative gene expression data, together with the corresponding histopathological information, were then subjected to an analytical approach designed to identify genes that not only correlated with the observed histopathology but could also be used in a model capable of accurately predicting the occurrence of the toxic response associated with that histopathology. A complete description of this identification process is presented in the Examples. A flow diagram illustrating how the kidney toxicity biomarkers of the invention were identified is presented in Figure 1.
  • Kidney toxicity gene expression databases may also be generated using techniques well known in the art and used to identify additional kidney toxicity biomarkers, which may also be employed in the practice of the kidney toxicity prediction methods of the invention.
  • Such databases may be generated with test compounds capable of inducing various pathologies indicative of a toxic response in the kidney and/or other organs or systems, including without limitation kidney tubule necrosis, glomerular necrosis, glomerular sclerosis and papillary injury, over different time periods and under different administration and/or dosing conditions.
  • An example of compounds, dose levels, kidney toxicity classifications and histopathology scores used in the Examples which follow is provided in Table 1.
  • Such databases may be generated using organisms other than the rat, including without limitation, animals of canine, murine, or non-human primate species. In addition, such databases may incorporate data derived from human clinical trials and post-approval human clinical experiences.
  • Various methods for detecting and quantitating the expression of genes and/or proteins in response to toxic stimuli may be employed in the generation of such databases, as are generally known in the art. For example, microarrays comprising multiple cDNAs or oligonucleotide probes capable of hybridizing to corresponding transcripts of genes of interest may be used to generate gene expression profiles. A number of other methods for detecting and quantitating the expression of gene transcripts are also known in the art and may be employed, including without limitation RT-PCR techniques such as TaqMan®, RNase protection, branched chain, etc.
  • Databases comprising quantitative gene expression information preferably include qualitative, quantitative and/or semi-quantitative information regarding the observed toxicological responses and other conventional toxicology endpoints, such as body and organ weights, serum chemistry, histopathology observations, histopathology scores and/or similar parameters.
  • The database preferably includes histopathology scores for each animal that has been exposed to one or more agents. These scores can be assigned based on actual histopathology observations for the tissue and animal, or on the basis of effects observed for other animals treated with the same agent and dose level.
  • The scores are numerical scores that reflect the occurrence and severity of histopathological changes. These scores can be adjusted to have a range similar to that of gene expression changes. For example, a score of 1 could be assigned to samples with no changes and scores of 2-8 assigned to increasingly severe changes. Because the scores are numerical, they are suitable for use with a variety of statistical correlation and similarity measures.
  • An example of a histopathology scoring system is provided in Example 1.
  • Histopathology scores may be used to identify genes that correlate with the observed toxicological response, using any number of statistical correlation and similarity analysis techniques, including without limitation those described or employed in Example 1 (e.g., Pearson, Spearman, change, smooth, distance, etc.). Such correlating genes may be used as predictive gene candidates. Examples of genes whose expression at 24 hours after treatment correlates with histopathology observed at 72 hours are detailed in Tables 2 and 3. In one embodiment, the correlating gene lists as well as the entire array gene list are used as input gene lists in the GeneSpring™ Predictive Model (hereafter the "Predictive Model").
  • (C) Class Prediction and Classification
  • Statistical analysis of the database of gene expression profiles can be performed using commercially available software programs.
  • One such program is GeneSpring™ Ver. 4.1 (Silicon Genetics, Redwood City, CA).
  • Other software programs that can be used for statistical analysis include, without limitation, SAS software packages (SAS Institute Inc., Cary, NC) and S-PLUS® software (Insightful Corporation, Seattle, WA).
  • Class predictions can be made from the genes in the database, as detailed in Example 1, using one or more training and test sets.
  • Six training sets and six test sets are obtained, as shown in Example 1 (Table 4).
  • Kidney toxicological classifications are entered for the samples in each training and test set.
  • Toxicological classifications can be defined by various pathologies.
  • the toxicity is defined as kidney tubular necrosis observed 72 hours after treatment with an agent. However, toxicity can manifest in other nephropathologies such as glomerular necrosis or papillary injury.
  • Predicted classifications of the test set samples are obtained using a k-nearest neighbor (knn) voting procedure.
  • The class of each of the k nearest neighbors is determined, and the test sample is assigned to the class with the largest representation after adjusting for the proportion of each class in the training set. In one embodiment, adjustments are made to account for different proportions of classes in the training set.
  • Toxicity can also be observed at various time points after exposure to an agent and is not limited to 72 hours after treatment.
  • A skilled toxicologist can determine the optimal time after exposure to an agent for observing pathology. This can be based on what has been disclosed in the art or on stepwise experimentation with time increments, for example 2, 4, 6, 12, 18, 24, 36 or 48 hours post-exposure. Longer increments, such as days, weeks, or months after exposure to the agent, can also be used.
  • Figure 1 describes the overall process used to identify kidney toxicity predictive genes. In one embodiment, this process was run independently for each time point.
  • the number of genes that are to be used in the Predictive Model can be varied, for example 50, 40, 30, 20, 10, 5, 2, or 1 gene(s) can be used. In a preferred embodiment, at least 50 genes are used.
  • optimum gene lists for all input gene lists are combined for each training and test set and then these combined lists for all six training and test sets are merged to create an aggregate list of predictive genes.
  • the aggregate list can then be subdivided to smaller lists of genes based on the number of times that the genes occurred on the predictive gene lists for each individual training or test set. These are designated herein as Combo 6, 5, 4, 3, 2, or 1 lists.
  • the genes that were predictive in all 6 training and test sets are designated as Combo 6 and the genes that were predictive in 5 of 6 training and test sets are designated as Combo 5 and so forth.
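  • The Combo assignment amounts to counting, for each gene, the number of the six training and test sets in which it was predictive. The sketch below assumes that each set's merged optimum gene list is given as a list of gene identifiers. The function name and gene names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

def combo_lists(per_set_genes: List[Iterable[str]]) -> Dict[int, List[str]]:
    """Map n -> genes predictive in exactly n of the training/test sets (Combo n)."""
    counts: Counter = Counter()
    for genes in per_set_genes:
        counts.update(set(genes))  # count a gene at most once per set
    combos: Dict[int, List[str]] = {n: [] for n in range(1, len(per_set_genes) + 1)}
    for gene, n in counts.items():
        combos[n].append(gene)
    return {n: sorted(genes) for n, genes in combos.items()}

# Six hypothetical per-set lists: geneA appears in all six, geneB in five, geneC in one.
sets = [["geneA", "geneB"]] * 5 + [["geneA", "geneC"]]
combos = combo_lists(sets)
print(combos[6], combos[5], combos[1])  # ['geneA'] ['geneB'] ['geneC']
```

The aggregate list of predictive genes corresponds to the union of all Combo lists.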
  • Table 32 presents gene names, accession numbers and sequence information for the kidney toxicity predictive genes found by analysis of the database in the manner described above.
  • Table 33 lists the kidney toxicity predictive genes organized by time point and Combo Class.
  • Table 34 lists homologous genes for the RCT sequences that were identified by BLAST search using the GenBank NR database as the target database. The predictive genes can also be categorized by their occurrence as predictive at different time points.
  • Table 35 lists 53 genes that are on the combined predictive lists of all three time points tested. This list is derived from the list of all the predictive genes measured at 6, 24 and 72 hours that predicted kidney tubular necrosis at 72 hours. Genes that are predictive at multiple time points can be further grouped by their Combo ranking.
  • Table 36 lists 23 genes that are the most predictive across the three time points tested.
  • This list is a subset of the list of 53 genes that are predictive across all three time points 6, 24 and 72 hours.
  • The criteria for inclusion in this table were that the gene be a member of the highest Combo classes, viz., Combo 6, 5 or 4, in at least two of the three time points.
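  • The Table 36 selection rule can be expressed as a filter. The sketch below assumes that the Combo rank of each gene is known for each time point. A gene must appear on the predictive lists at every time point (the Table 35 step) and must be in Combo 4 or higher at two or more time points. The data layout and gene names are hypothetical.

```python
from typing import Dict, List

def most_predictive(combo_by_time: Dict[str, Dict[str, int]],
                    min_combo: int = 4, min_timepoints: int = 2) -> List[str]:
    """Genes predictive at every time point and in Combo >= min_combo at >= min_timepoints."""
    time_maps = list(combo_by_time.values())
    # Start from genes on the predictive lists of all time points.
    shared = set(time_maps[0]).intersection(*time_maps[1:])
    return sorted(
        g for g in shared
        if sum(1 for m in time_maps if m[g] >= min_combo) >= min_timepoints
    )

combo_by_time = {
    "6h":  {"geneA": 6, "geneB": 2, "geneC": 5},
    "24h": {"geneA": 5, "geneB": 4, "geneC": 1},
    "72h": {"geneA": 1, "geneB": 3},
}
print(most_predictive(combo_by_time))  # ['geneA']
```
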
  • the gene expression data of the genes in Table 36 could be expected to be very highly predictive of kidney tubular necrosis. Further, since the predictive strength of these genes is very high across the 3 time points tested, it could be expected that gene expression data derived from these genes even at time points not tested such as any time points falling between 6 and 72 hours or any other time point would be very highly predictive of tubular necrosis.
  • These specific genes could be useful in cases where the dose route or pharmacokinetic properties of a compound may alter the kinetics of predictive gene expression changes.
  • The predictive gene list of Example 1, or subsets thereof, was used as input into the Predictive Model.
  • Example 2 describes the evaluation of the predictive performance of the kidney toxicity predictive genes.
  • Predictive performance may also be assessed using data from different time points after exposure to the agent.
  • 24 hour expression data is used.
  • 6 hour expression data is used, as described in Examples 3 and 4.
  • 72-hour expression data are used, as described in Examples 5 and 6.
  • As shown in Table 41, the predictive capability for 24-hour expression data has a high accuracy rate (90% accuracy) when the entire predictive gene list is used.
  • Predictive performance may also be assessed using subsets of genes from the different Combo lists. As indicated in Examples 2, 4 and 6, randomly selected subsets of the Combo gene lists had very good predictive performance (accuracy better than 80% and approaching 90%). Even individual genes had significant mean predictive accuracies (for example, greater than 80%). Cumulative performance of subsets of 24-hour data is presented in Figures 4-6. In one embodiment, using 3 genes from the Combo 6 list yields about 90% accuracy. Other Combo lists may require more genes to reach the same accuracy level, e.g., 8 genes from the Combo 5 list or 13 genes from the Combo 4 list.
  • The kidney toxicity predictive genes disclosed herein, and kidney toxicity predictive genes identified using the methods disclosed herein, are useful for predicting kidney toxicity in response to exposure to one or more agents.
  • the use of larger numbers of predictive genes provides for redundancy and consequent greater accuracy and precision. Applications using larger numbers of predictive genes might be tests of candidates at later stages of commercial development. An example would be later stages of preclinical development of a therapeutic candidate where in vivo samples can be obtained and more comprehensive methods such as microarray measurement of gene expression are appropriate.
  • the larger gene sets can also include different subsets of genes which may offer more insight into potential mechanisms of toxicity and the ability to have refined predictions of long term toxic consequences such as chronic, irreversible toxicity or carcinogenicity.
  • kidney toxicity predictive genes may also be suitable for prediction of toxicity in other organs or may be preferable for predicting toxicity for wider ranges of timepoints or treatment routes or regimens. As an example of the latter, some of the predictive genes are observed at three different timepoints after treatment. These genes may be useful for prediction in cases where the samples come from treatment protocols that have different measurement timepoints or routes of administration than those employed for the database or where the toxicokinetics for a particular agent are known or suspected to be different from those in the database.
  • the agent is an agent for which no expression profile has been assessed or stored in the database or library.
  • An animal, e.g., a rat.
  • the gene expression profile(s) is the test set for the Predictive Model.
  • the training set which is used in the Predictive Model in this case can be the entire database of sample array data because the test set data is not present in the database.
  • the prediction can be made with accuracy without requiring the use of histopathology scores for the test set as part of the input into the Predictive Model.
  • the agent is an agent present in the database but is used at a different dose level or with a different treatment protocol than used in the database.
  • the training set which is used in the Predictive Model in this case can be the entire database of sample array data because the test set data is not present in the database.
  • the prediction can be made with accuracy without requiring the use of histopathology scores for the test set as part of the input into the Predictive Model.
  • the exposure time of the agent is not 6, 24, or 72 hours or repeat dosing protocols are used.
  • The skilled artisan can use the toxicity predictive genes from the surrounding time points to extrapolate the predicted toxicity without undue experimentation. For example, if the individual has been exposed to the agent for 12 hours, predictive genes from the 6-hour and 24-hour time points are used as guidelines for extrapolating the possible predicted toxicity.
  • the kidney predictive genes and predictive model can be used to determine the presence or absence of a no-observable toxicity effect level (NOEL).
  • An agent can be used at different treatment levels and expression profiles obtained for each treatment level.
  • the predictive genes and predictive model can be used to determine which dose levels elicit a response that is predicted to be toxic and which dose levels are not toxic.
  • the use of expression data, predictive genes and predictive models applies a number of quantitative endpoints and criteria instead of subjective endpoints and criteria. This permits more rigorous and precisely defined determination of no effect levels.
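  • One way to turn per-dose predictions into a no-effect level is sketched below. The sketch assumes that the Predictive Model has already produced a toxic or non-toxic call for each dose level, for example a majority call across the animals in a dose group. It takes the NOEL as the highest dose predicted non-toxic that lies below the lowest dose predicted toxic. That rule, the function name and the dose units are assumptions of this sketch.

```python
import math
from typing import Dict, Optional

def noel_from_predictions(predicted_toxic: Dict[float, bool]) -> Optional[float]:
    """Highest dose predicted non-toxic that lies below the lowest dose predicted toxic."""
    toxic = [dose for dose, is_toxic in predicted_toxic.items() if is_toxic]
    lowest_toxic = min(toxic) if toxic else math.inf
    clean = [dose for dose, is_toxic in predicted_toxic.items()
             if not is_toxic and dose < lowest_toxic]
    return max(clean) if clean else None  # None: even the lowest dose is predicted toxic

print(noel_from_predictions({10: False, 30: False, 100: True, 300: True}))  # 30
```
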
  • kidney toxicity predictive genes can be used to detect toxic effects that may be manifested as long lasting or chronic consequences such as irreversible toxicity or carcinogenesis.
  • the predictive genes and model can be applied to databases where classifications of training and test set samples are made with respect to actual or putative endpoints such as irreversible toxicity or carcinogenicity.
  • the predictive genes can be used in a variety of alternative models to predict kidney toxicity. Some of these models do not require the direct use of data in a database but use functions or coefficients derived from the database.
  • The predictive genes and models may be used to evaluate in vitro systems for their ability to reflect in vivo toxic events, and such in vitro systems may then be used to predict in vivo toxicity. Expression profiles for predictive genes can be created from candidate in vitro assays. These assays use treatments with agents of known in vivo toxicity for which in vivo gene expression data are available. The expression data and predictive models of this invention can be used to determine whether the in vitro assay system has predictive gene expression responses that accurately reflect the in vivo situation. Large sets of predictive genes as described in this invention can be tested in such models for their suitability and performance with the candidate in vitro systems. This is a superior and novel tool for evaluating and optimizing in vitro systems for their ability to reflect and accurately predict in vivo responses.
  • measurement of the expression levels of the proteins coded for by the predictive genes can be used in conjunction with predictive models to predict kidney toxicity.
  • Among the kidney toxicity predictive genes are various genes known to encode cell surface, secreted and/or shed proteins. This enables the development of methods for predicting toxicity using protein biomarkers.
  • Example 11 presents a process by which candidate protein biomarker genes may be selected from biomarker genes identified from transcription expression. For example, as disclosed in Table 37, there are 23 genes in the master predictive set which are known to encode secreted proteins. As disclosed in Table 43, predictive protein marker candidates may also be selected by categorizing a number of other parameters related to the predictive performance and potential use as protein markers.
  • In Example 11, the utility of this concept has been demonstrated by testing serum protein levels of one of the identified biomarkers, insulin-like growth factor binding protein 1.
  • the serum protein levels of this biomarker parallel the kidney transcription levels and distinguish kidney toxic from non-toxic treatments.
  • kidney toxicity predictive assays which detect the expression of one or more of said predictive proteins may be developed. Such assays may have several advantages, such as:
  • the identified predictive genes can be considered as potential therapeutic targets when the genes are involved in toxic damage or repair responses whose expression or functional modification may attenuate, ameliorate or eliminate disease conditions or adverse symptoms of disease conditions.
  • The predictive genes can be organized into clusters of genes that exhibit similar patterns of expression. This can be done with any of a variety of statistical procedures commonly used to identify such coordinate expression patterns.
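  • The source does not name a specific clustering procedure. As one illustrative choice, the sketch below performs single-linkage clustering, in which two genes are joined whenever the Pearson correlation of their expression profiles across samples reaches a threshold. The 0.8 threshold, the function names and the toy profiles are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0

def cluster_genes(expr: Dict[str, Sequence[float]], threshold: float = 0.8) -> List[List[str]]:
    """Single-linkage clusters: genes are linked when their profiles correlate >= threshold."""
    genes = sorted(expr)
    parent = {g: g for g in genes}  # union-find forest

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(expr[g1], expr[g2]) >= threshold:
                parent[find(g2)] = find(g1)
    clusters: Dict[str, List[str]] = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(clusters.values())

profiles = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
print(cluster_genes(profiles))  # [['a', 'b'], ['c']]
```

Hierarchical clustering with average linkage or k-means would be equally reasonable alternatives.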
  • Common functional properties of these clustered genes can be used to provide insight into the functional relationship of the response of these genes to toxic effects.
  • Common genetic properties of these genes, e.g., common regulatory sequences or the presence of common known or novel signal transduction systems that regulate expression of the genes, can also lead to insight into the functional properties of the genes.
  • the presence of common known or novel regulatory sequences in the identified predictive genes can also be used to identify toxicity predictive genes that are not present in the current Rat CT array. This can be accomplished by someone skilled in the art who can analyze sequence databases for common regulatory sequences.
  • the kidney toxicity predictive genes can be used to predict toxicity responses in other species, for example, human, non-human primate, mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and dog. Some members of the kidney toxicity predictive genes may also be more suitable for prediction of toxicity in species other than the species used to derive the database (rat in the case of the examples provided).
  • One method for identifying such genes that would be available to someone skilled in the art is to examine DNA sequence databases. The aim is to determine whether sequences orthologous to the predictive genes exist in the target species and how closely the orthologous sequences resemble the predictive gene sequences.
  • One of skill in the art can examine the orthologous sequences for similarity in amino acid coding regions and motifs as well as for similarities in regulatory regions and motifs of the gene.
  • Kidney toxicity predictive genes or gene sequences are used to screen for other potential toxicity predictive genes or gene sequences, in other species or even within the same species, using methods known in the art (see, for example, Sambrook, supra). Gene sequences that hybridize under stringent conditions to the kidney toxicity predictive gene sequences disclosed herein are selected as potential toxicity predictive genes. Such hybridizing sequences can show homology to the kidney toxicity predictive genes, preferably being at least about 50%, 60%, 70%, 80%, or 90% identical to the kidney toxicity predictive genes disclosed herein. It is understood that conservative amino acid substitutions are possible in gene sequences that share some percentage of homology with the kidney toxicity predictive gene sequences of this invention.
  • a conservative substitution in a protein is a substitution of one amino acid with an amino acid with similar size and charge.
  • Groups of amino acids normally regarded as equivalent are: (a) Ala, Ser, Thr, Pro, and Gly; (b) Asn, Asp, Glu, and Gln; (c) His, Arg, and Lys; (d) Met, Leu, Ile, and Val; and (e) Phe, Tyr, and Trp.
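  • Percent identity and conservative substitutions can be tallied column by column once two protein sequences are aligned. The sketch below assumes the alignment already exists (for example, from a BLAST search) and that gap columns are skipped, which is only one of several identity conventions. The substitution groups are encoded in one-letter code as they are conventionally stated: Ala/Ser/Thr/Pro/Gly; Asn/Asp/Glu/Gln; His/Arg/Lys; Met/Leu/Ile/Val; Phe/Tyr/Trp. The function name and sequences are hypothetical.

```python
from typing import Tuple

# Equivalence groups in one-letter code: A S T P G | N D E Q | H R K | M L I V | F Y W
GROUPS = ("ASTPG", "NDEQ", "HRK", "MLIV", "FYW")

def compare_aligned(seq1: str, seq2: str) -> Tuple[float, int]:
    """Percent identity and count of conservative substitutions for two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    identical = conservative = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are skipped in this sketch
        compared += 1
        if x == y:
            identical += 1
        elif any(x in g and y in g for g in GROUPS):
            conservative += 1
    percent = 100.0 * identical / compared if compared else 0.0
    return percent, conservative

print(compare_aligned("MKV-L", "MRVAI"))  # (50.0, 2)
```
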
  • The toxicity predictive genes can be used as guides for predicting toxicity of agents that have been administered by routes (e.g., intravenous, oral, dermal, inhalation, etc.) other than those used to generate the database or to identify the toxicity predictive genes.
  • Likewise, the invention is not limited to agents administered at the dosages used to generate the database or to identify the toxicity predictive genes; agents administered at different dosages are also encompassed.
  • Rats (Charles River, Raleigh, NC) were divided into two groups. Treated rats received a specific concentration of the compound (see Table 1). Control rats received only the vehicle in which the compound was mixed (e.g., saline).
  • Kidney tissue was weighed out and placed in a sterile container. To preserve RNA integrity, all tissues were kept on dry ice while other samples were being weighed. RLT buffer (Qiagen®) was added to the sample to aid homogenization.
  • The tissue was homogenized using a commercially available homogenizer (IKA Ultra Turrax T25) with the 7 mm microfine sawtooth shaft and generator (195 mm long, with a processing range of 0.25 ml to 20 ml; item #372718). After homogenization, samples were stored on ice until all samples were homogenized. The homogenized tissue sample was spun to remove nuclei, thus reducing DNA contamination.
  • Rat 700 CT chip. Gene expression data were generated from a microarray chip carrying a set of toxicologically relevant rat genes that are used to predict toxicological responses.
  • The Rat 700 CT gene array is disclosed in U.S. applications 60/264,933 and 60/308,161, and in a pending application filed on January 29, 2002 that claims priority to 60/264,933 and 60/308,161 [Attorney docket 40074-2000600].
  • Microarray RT reaction. Fluorescence-labeled first-strand cDNA probe was made from the total RNA or mRNA isolated from kidneys of control and treated rats. This probe was hybridized to microarray slides spotted with DNA specific for toxicologically relevant genes. The materials needed are: total or messenger RNA, primer, Superscript II buffer, dithiothreitol (DTT), nucleotide mix, Cy3 or Cy5 dye, Superscript II (RT), ammonium acetate, 70% EtOH, PCR machine, and ice.
  • The volume of each sample that would contain 20 µg of total RNA (or 2 µg of mRNA) was calculated.
  • The amount of DEPC water needed to bring the total volume of each RNA sample to 14 µl was also calculated. If the RNA was too dilute, the samples were concentrated to a volume of less than 14 µl in a speedvac without heat. The speedvac must be capable of generating a vacuum of 0 Milli-Torr so that samples can freeze-dry under these conditions. Sufficient DEPC water was then added to bring the total volume of each RNA sample to 14 µl.
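  • The volume arithmetic for this setup can be sketched as follows. The sketch assumes the RNA concentration has been measured in µg/µl; the concentrations in the example are hypothetical. For mRNA, the target is 2 µg instead of 20 µg.

```python
from typing import Tuple

def rt_volumes(conc_ug_per_ul: float, target_ug: float = 20.0,
               final_ul: float = 14.0) -> Tuple[float, float, bool]:
    """Return (RNA volume in µl, DEPC water in µl, needs_concentration)."""
    rna_ul = target_ug / conc_ug_per_ul
    if rna_ul > final_ul:
        # Too dilute: concentrate below 14 µl in the speedvac first; the water
        # needed is then determined from the concentrated volume (reported as 0.0 here).
        return rna_ul, 0.0, True
    return rna_ul, final_ul - rna_ul, False

print(rt_volumes(2.0))  # (10.0, 4.0, False)
print(rt_volumes(1.0))  # (20.0, 0.0, True)
```
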
  • Each PCR tube was labeled with the name of the sample or control reaction. The appropriate volume of DEPC water and 8 µl of anchored oligo dT mix (stored at -20°C) were added to each tube.
  • The samples were mixed by pipetting, and the tubes were kept on ice until all samples were ready for the next step. The samples were then incubated in a PCR machine for 10 minutes at 70°C, followed by a 4°C hold until the tubes were retrieved. The tubes were left at 4°C for at least 2 minutes.
  • Cy dyes are light sensitive, so from this point on, any solutions or samples containing Cy dyes should be kept out of light as much as possible (e.g., covered with foil). Sufficient Cy3 and Cy5 reverse transcription mix was prepared for one to two more reactions than would actually be run by scaling up the following:
  • the completed RT reaction contained impurities that must be removed. These impurities included excess primers, nucleotides, and dyes.
  • the primary method of removing the impurities was by following the instructions in the QIAquick PCR purification kit (Qiagen cat#120016).
  • the completed RT reactions were cleaned of impurities by ethanol precipitation and resin bead binding.
  • The samples from the DNA Engine were transferred to Eppendorf tubes containing 600 µl of ethanol precipitation mixture and placed in a -80°C freezer for at least 20-30 minutes. These samples were centrifuged for 15 minutes at 20800 x g (14000 rpm in an Eppendorf model 5417C), and the supernatant was carefully decanted. A visible pellet was seen (pink/red for Cy3, blue for Cy5). Ice-cold 70% EtOH (about 1 ml per tube) was added, and the tubes were inverted to wash the tube and pellet.
  • The tubes were centrifuged for 10 minutes at 20800 x g (14000 rpm in an Eppendorf model 5417C), and the supernatant was carefully decanted. The tubes were air dried for about 5 to 10 minutes, protected from light. When the pellets were dry, they were resuspended in 80 µl nanopure water. The cDNA/mRNA hybrid was denatured by heating for 5 minutes at 95°C in a heat block and flash spun. The lid of a "Millipore MAHV N45" 96-well plate was then labeled with the appropriate sample numbers, and a blue gasket and waste plate (v-bottom 96-well) were attached.
  • The filter plate was placed on a clean collection plate (v-bottom 96 well), and 80 µl of Nanopure water, pH 8.0-8.5, was added. The pH was adjusted with NaOH. The filter plate was secured to the collection plate and, after 5 minutes, was centrifuged for 7 minutes at 2500 rpm.
  • Probes (80 µl cDNA samples) were added to the appropriate wells containing the Binding Resin.
  • The reaction was mixed by pipetting up and down about 10 times. Regular, unfiltered pipette tips are preferable for this step.
  • The plates were centrifuged at 2500 rpm for 5 minutes (Beckman GS-6 or equivalent), and the filtrate was decanted. About 200 µl of 80% isopropanol was added, the plates were spun for 5 minutes at 2500 rpm, and the filtrate was discarded. The 80% isopropanol wash and spin step was then repeated.
  • The filter plate was placed on a clean collection plate (v-bottom 96 well), and 80 µl of Nanopure water, pH 8.0-8.5, was added.
  • the pH was adjusted with NaOH.
  • the filter plate was secured to the collection plate with tape to ensure that the plate did not slide during the final spin.
  • the plate sat for 5 minutes and was centrifuged for 7 minutes at 2500 rpm. Replicates of samples should be pooled.
  • (G) Dry-down Process. Concentrating the cDNA probes is preferable so that they can be resuspended in hybridization buffer at the appropriate volume.
  • the volume of the control cDNA (Cy-5) was measured and divided by the number of samples to determine the appropriate amount to add to each test cDNA (Cy-3).
  • Eppendorf tubes were labeled for each test sample and the appropriate amount of control cDNA was allocated into each tube.
  • the test samples (Cy-3) were added to the appropriate tubes. These tubes were placed in a speed-vac to dry down, with foil covering any windows on the speed vac. At this point, heat (45°C) may be used to expedite the drying process. Samples may be saved in dried form at -20°C for up to 14 days.
  • Hybridization Buffer for 100 µl:
  • the hybridization buffer was made up as:
  • Hybridization Buffer for 101 µl:
  • the slides were then moved to 2X SSC, 0.1% SDS and soaked for 5 minutes.
  • the slides were transferred into 0.1X SSC and 0.1% SDS for 5 minutes.
  • The slides were transferred to 0.1X SSC for 5 minutes.
  • The slides, still in the slide carrier, were transferred into nanopure water (18 megohm) for 1 second.
  • the stainless steel slide carriers were placed on micro-carrier plates and spun in a centrifuge (Beckman GS-6 or equivalent) for 5 minutes at 1000 rpm.
  • GenePix files were used as above. For the Set A training set compounds (see Table 4), data from one microarray were used per animal. For the Set A test set compounds (see Table 4), replicate arrays for each animal were combined into one GenePix file. The data loaded into GeneSpring™ software included gene name, GenBank ID, control channel mean fluorescence and signal channel mean fluorescence. Expression ratio data (ratio of signal to control fluorescence) were normalized using the 50th percentile of the distribution of all genes and the control channel. Ratio data were excluded from analysis if the control channel value was ≤ 0. For analysis of correlations and predictive values, gene expression ratios were transformed to the log of the ratio.
  • Histopathology scores for each animal were entered with the gene expression data using the GeneSpring™ 'Drawn Gene' function.
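  • A minimal sketch of this per-chip preprocessing follows. It divides each signal/control ratio by the chip's median (50th percentile) ratio and then takes the log. The source does not give the log base, so log2 is an assumption here. Skipping non-positive signal values is also an added assumption, since their log is undefined. Gene names and intensities are hypothetical.

```python
import math
import statistics
from typing import Dict

def normalized_log_ratios(signal: Dict[str, float], control: Dict[str, float]) -> Dict[str, float]:
    """Signal/control ratios, divided by the chip's median ratio, then log2-transformed."""
    ratios = {
        gene: signal[gene] / control[gene]
        for gene in signal
        # Exclude genes whose control value is <= 0; non-positive signals are
        # skipped too (an assumption of this sketch, since log is undefined).
        if control[gene] > 0 and signal[gene] > 0
    }
    median = statistics.median(ratios.values())  # 50th percentile of all ratios
    return {gene: math.log2(r / median) for gene, r in ratios.items()}

signal = {"a": 200.0, "b": 100.0, "c": 400.0, "d": 50.0}
control = {"a": 100.0, "b": 100.0, "c": 100.0, "d": 0.0}
print(normalized_log_ratios(signal, control))  # {'a': 0.0, 'b': -1.0, 'c': 1.0}
```
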
  • the first step is variable selection of genes to be used for prediction. This entails taking a single gene and a single class (e.g., kidney toxicity) and creating a contingency table.
  • columns 1 through N of the table each represent one possible cutoff point based on the gene expression level (ratio of signal/control) for that class.
  • the number of possible cutoffs is less than or equal to the total number of samples for the class (e.g., A). It is possibly less than the total number, since there may be ties in gene expression level.
  • N, M, and X may or may not be distinct.
  • An n-class problem is illustrated, where the a and b entries are the class counts at that gene expression cutoff level, for that specific gene and class, either above ("a") or below ("b") the cutoff.
  • Class1 is the set of all samples in Class1 (above or below the cutoff).
  • !Class1 is the set of all samples not in Class1 (above or below the cutoff), and similarly for the other classes.
  • the class totals in the training set are the total class marginals used to compute Fisher's exact test.
  • the genes per class are rank ordered by the most discriminating (highest) score.
  • The predictivity list is composed of the most discriminating genes per class: genes that best discriminate class 1 are combined with those that best discriminate class 2, and so on. Genes are selected in rotation, taking the highest-scoring remaining gene for each class in turn. Duplicate genes are skipped rather than added to the list, and the gene with the next highest score is taken instead.
  • For a 50-gene list in a two-class problem, each sample is a vector of 50 normalized expression ratios. Because genes are selected in rotation, the list contains 25 genes for one class and 25 for the other.
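  • The selection steps above can be sketched in Python. This is an illustrative reimplementation, not GeneSpring™'s code. It scores each gene for each class by the smallest two-sided Fisher's exact test p-value over all cutoffs; using -log10 p as the score is an assumption, since the source says only that the highest score wins. It then ranks genes per class and picks genes in rotation, skipping duplicates. Function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = math.comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)

def gene_score(values: Sequence[float], labels: Sequence[str], cls: str) -> float:
    """-log10 of the best Fisher p-value over all cutoffs for `cls` versus the rest."""
    pairs = list(zip(values, labels))
    best_p = 1.0
    for cutoff in sorted(set(values)):  # ties collapse, so at most len(values) cutoffs
        above_in = sum(1 for v, lab in pairs if v > cutoff and lab == cls)
        below_in = sum(1 for v, lab in pairs if v <= cutoff and lab == cls)
        above_out = sum(1 for v, lab in pairs if v > cutoff and lab != cls)
        below_out = sum(1 for v, lab in pairs if v <= cutoff and lab != cls)
        best_p = min(best_p, fisher_two_sided(above_in, below_in, above_out, below_out))
    return -math.log10(best_p)

def select_genes(expr: Dict[str, Sequence[float]], labels: Sequence[str], n_genes: int) -> List[str]:
    """Pick genes in rotation across classes, best score first, skipping duplicates."""
    classes = sorted(set(labels))
    ranked = {c: sorted(expr, key=lambda g: gene_score(expr[g], labels, c), reverse=True)
              for c in classes}
    target = min(n_genes, len(expr))
    chosen: List[str] = []
    pos = {c: 0 for c in classes}
    while len(chosen) < target:
        for c in classes:
            while ranked[c][pos[c]] in chosen:  # duplicate: take the next-best gene
                pos[c] += 1
            chosen.append(ranked[c][pos[c]])
            pos[c] += 1
            if len(chosen) == target:
                break
    return chosen

labels = ["yes"] * 4 + ["no"] * 4
expr = {"g2": [1, 5, 2, 6, 3, 7, 4, 8], "g1": [5, 6, 7, 8, 1, 2, 3, 4]}
print(select_genes(expr, labels, 2))  # ['g1', 'g2']
```
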
  • the matrix below illustrates the basic features of this gene selection process.
  • The test set is classified using the k-nearest neighbor (knn) voting procedure. Only the genes in the gene list are used. For each sample in the test set, the k nearest neighbors in the training set are found using Euclidean distance. The class of each of the k nearest neighbors is determined. The test set sample is then assigned to the class with the largest representation among the k nearest neighbors, after adjusting for the proportion of classes in the training set.
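  • The neighbor search and vote can be sketched as follows. The sketch weights each class's share of the k neighbors by the inverse of its share of the training set. That weighting is a simple stand-in for the proportion adjustment; GeneSpring™ makes this adjustment with a Fisher's exact test p-value, as described next. Function names and toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (expression ratios for the selected genes, class)

def knn_classify(train: List[Sample], x: Sequence[float], k: int) -> Optional[str]:
    """Assign x by k-nearest-neighbor vote, adjusted for training-set class proportions."""
    # Euclidean distance using only the genes in the predictive list.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    nb_counts = Counter(label for _, label in neighbors)
    train_counts = Counter(label for _, label in train)
    # Share of the neighbors in each class relative to that class's share of training.
    score = {c: (nb_counts[c] / k) / (train_counts[c] / len(train)) for c in train_counts}
    best = max(score.values())
    winners = [c for c, s in score.items() if s == best]
    return winners[0] if len(winners) == 1 else None  # exact tie: no call

train = [([0, 0], "no"), ([0, 1], "no"), ([1, 0], "no"), ([5, 5], "yes"), ([5, 6], "yes")]
print(knn_classify(train, [5, 5.5], k=3))  # yes
print(knn_classify(train, [0, 0.5], k=3))  # no
```
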
  • the decision threshold is a mechanism to help clearly define the class into which the sample will fall, and can be set to reject classification if the voting is very close or tied. (Thus, k can be even for two-class problems without worrying about the tie problem.) A p-value is calculated for the proportion of neighbors in each class against the proportions found in the training set, again using Fisher's exact test, but now a one-sided test.
  • A p-value ratio cutoff sets the level of confidence required for individual sample predictions. It is based on the ratio of the p-value for the best class (lowest p-value) to the p-value for the second-best class (second-lowest p-value). For example, if the p-value ratio cutoff is set at 0.5 and the ratio of p-values for a particular sample is 0.6, the predictive model will not make a call for that sample.
  • Training and Test Data Sets. The data were separated into six training and test sets. The first training and test set was created by allocating one set of data as a training set (Set A training set) and another set of data as a test set (Set A test set). Other training and test sets were created by randomly distributing the compounds into the sets. Random numbers were assigned to lists of compounds that are negative and positive for histopathology, the lists were sorted by random number, and the sorted lists were divided into a specific number of training and test sets. The training and test set assignments are presented in Table 4.
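  • The confidence rule can be sketched as follows. The source does not state how GeneSpring™ lays out its one-sided Fisher's exact test. This sketch assumes a 2x2 table that compares the neighbors with the training set (rows) by membership in the class (columns). The upper-tail p-value then measures over-representation of the class among the neighbors. Function names and counts are hypothetical.

```python
import math
from typing import Dict, Optional

def fisher_upper_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher p-value: P(top-left cell >= a) with the table margins fixed."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = math.comb(n, row1)
    hi = min(row1, col1)
    return sum(math.comb(col1, x) * math.comb(n - col1, row1 - x)
               for x in range(a, hi + 1)) / denom

def call_with_ratio(neighbor_counts: Dict[str, int], train_counts: Dict[str, int],
                    ratio_cutoff: float) -> Optional[str]:
    """Best class by p-value, or None when p(best) / p(second) exceeds the cutoff."""
    k = sum(neighbor_counts.values())
    n_train = sum(train_counts.values())
    pvals = {}
    for cls, n_cls in train_counts.items():
        in_nb = neighbor_counts.get(cls, 0)
        # Rows: neighbors vs training set; columns: in class vs not in class.
        pvals[cls] = fisher_upper_tail(in_nb, k - in_nb, n_cls, n_train - n_cls)
    best, second = sorted(pvals, key=pvals.get)[:2]
    return None if pvals[best] / pvals[second] > ratio_cutoff else best

print(call_with_ratio({"yes": 5, "no": 0}, {"yes": 10, "no": 10}, 0.5))  # yes
print(call_with_ratio({"yes": 3, "no": 2}, {"yes": 10, "no": 10}, 0.5))  # None
```
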
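  • The randomized allocation can be sketched as below, assuming compounds are given as separate positive and negative lists. The sketch deals the randomly sorted lists round-robin into six groups and uses each group once as the test set, with the rest as the training set. That scheme is an assumption of this sketch; the actual assignments are those in Table 4. Compound names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

def stratified_splits(positives: Sequence[str], negatives: Sequence[str],
                      n_sets: int = 6, seed: int = 0) -> List[Tuple[List[str], List[str]]]:
    """Return n_sets (training, test) pairs with positives and negatives spread evenly."""
    rng = random.Random(seed)
    groups: List[List[str]] = [[] for _ in range(n_sets)]
    for compounds in (positives, negatives):
        # Attach a random number to each compound, sort by it, then deal into groups.
        keyed = sorted((rng.random(), c) for c in compounds)
        for i, (_, compound) in enumerate(keyed):
            groups[i % n_sets].append(compound)
    return [([c for j, g in enumerate(groups) if j != i for c in g], groups[i])
            for i in range(n_sets)]

pos = [f"tox{i}" for i in range(6)]
neg = [f"ctl{i}" for i in range(12)]
for train, test in stratified_splits(pos, neg):
    print(len(train), len(test))  # 15 3 for each of the six sets
```
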
  • Kidney Toxicology Classification. Kidney toxicity classifications were entered for the training and test sets as a parameter column. Toxicity, defined as observation of kidney tubular necrosis at 72 hours after treatment, was entered as "yes" or "no" for each animal in a compound-dose group. Additionally, a parameter column for random histopathology classification was designated by randomly assigning the same number of "yes" and "no" calls to the individual animals.
  • The kidney toxicology classifications used are described in Table 1. In this analysis, randomized classifications (the same number of "yes" and "no" classifications distributed randomly among the samples) were used.
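  • Randomizing the calls while keeping the numbers of "yes" and "no" unchanged is a permutation of the real labels. A minimal sketch, with a hypothetical function name and toy labels:

```python
import random
from collections import Counter
from typing import List, Sequence

def randomized_labels(labels: Sequence[str], seed: int = 0) -> List[str]:
    """Permute the real yes/no calls across animals, so the counts are unchanged."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled

real = ["yes"] * 4 + ["no"] * 8
fake = randomized_labels(real)
print(Counter(fake) == Counter(real))  # True
```
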
  • The false positive rate, the proportion of negative cases that are incorrectly classified as positive, is calculated as b/(a+b).
  • The false negative rate, the proportion of positive cases that are incorrectly classified as negative, is calculated as c/(c+d).
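  • These rates can be computed directly from the two-class matrix. The sketch assumes the layout implied by the formulas. Actual negatives are split into a (predicted negative) and b (predicted positive), and actual positives into c (predicted negative) and d (predicted positive). Overall accuracy is included as the standard (a+d)/n measure. The counts are hypothetical.

```python
from typing import Dict

def prediction_measures(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """a: true negatives, b: false positives, c: false negatives, d: true positives."""
    return {
        "false_positive_rate": b / (a + b),  # negatives wrongly called positive
        "false_negative_rate": c / (c + d),  # positives wrongly called negative
        "accuracy": (a + d) / (a + b + c + d),
    }

print(prediction_measures(a=45, b=5, c=10, d=40))
# {'false_positive_rate': 0.1, 'false_negative_rate': 0.2, 'accuracy': 0.85}
```
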
  • One noteworthy feature of the predictive ability is the ability to distinguish between effects of a compound at different dose levels.
  • Combo 6 and Combo 5 are the top Combo subsets for the 24-hour kidney data. They have the highest levels of predictive accuracy on an individual-sample basis, 92.1% and 89.6%, respectively.
  • (A) Materials and Methods: The compounds and treatments used to construct the kidney database are listed in Example 1 (Table 1). This table also provides the evaluation of kidney toxicity, observed as kidney tubular necrosis in samples collected 72 hours after treatment. The database is described in detail in Example 1. This Example analyzes expression data from samples collected 6 hours after treatment. Array data, normalization and transformation procedures were as described in Example 1. Procedures and methods for obtaining gene lists correlating with histopathology scores were as described in Example 1, using the scores defined there. The Predict Parameter Values tool in GeneSpring™ software, used for kidney toxicity class prediction, is described in detail in the Materials and Methods of Example 1. (B) Training and Test Data Sets: The data were separated into six training and test sets.
  • the first training and test set was created by allocating one set of data as a training set (Set A training set) and another set of data as a test set (Set A test set). Other training and test sets were created by randomly distributing the compounds into the sets. This was accomplished by assigning random numbers to lists of compounds that are negative and positive for histopathology, sorting by random number, and then dividing the sorted lists into a specific number of training and test sets.
  • the training and test set assignments are presented in Table 19.
  • Kidney toxicity classifications were entered for the training and test sets as a parameter column. Toxicity, defined as observation of kidney tubular necrosis at 72 hours after treatment, was entered as "yes" or "no" for each animal in a compound-dose group. Additionally, a parameter column for random histopathology classification was designated by randomly assigning "yes" and "no" calls to the individual animals. The total numbers of "yes" and "no" calls were kept the same as in the correct classification, so that the proportion of "yes" and "no" calls was the same in all the training and test sets.
  • Input genes for the Predict Parameter Values feature included all 700 genes in the GenePix file (the Rat CT array), as well as smaller lists of genes whose expression correlated with histopathology by the correlation measures described previously.
  • The number of genes used for prediction was varied, using standard values of 50, 40, 30, 20, 10, 5, 2 and 1 genes.
  • the specified number of predictive genes was varied to obtain an optimum number of predictive genes.
  • Hour Expression Data (A) Materials and Methods: The database used was as described in Example 1. Array data, normalization procedures and transformations used in these analyses were as described in Example 1. Table 38 presents 6 hour gene expression data for the predictive genes. These data can be used with a k-means nearest neighbor prediction model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example. The Predict Parameter Values tool in GeneSpring™ software was used for kidney toxicity class prediction; a description of this tool and the statistical procedures used is provided in Example 1.
  • Kidney Toxicology Classification: The kidney toxicology classifications used are described in Example 1. In this analysis, randomized classifications (the same number of "yes" and "no" classifications distributed randomly among the samples) were used.
  • (E) Prediction Measures: The prediction measures used for these analyses are generally accepted measures summarizing actual versus predicted classifications made by a classification system (Venables and Ripley, ibid; Kubat and Matwin, ibid). Results from predictions in a two-class case can be described by a two-class matrix, as described above.
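A minimal sketch of the two-class matrix, assuming calls are coded as "yes"/"no" with None for a non-call. Keeping non-calls outside the four matrix cells is an assumption of this sketch; TP, FN, FP and TN are the usual true/false positive/negative counts, with "yes" (toxic) as the positive class.

```python
def two_class_matrix(actual, predicted, positive="yes"):
    """Tabulate a two-class (confusion) matrix plus a non-call count.

    actual: true calls; predicted: predicted calls, None marking a non-call.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0, "no_call": 0}
    for truth, call in zip(actual, predicted):
        if call is None:
            counts["no_call"] += 1
        elif truth == positive:
            counts["TP" if call == positive else "FN"] += 1
        else:
            counts["FP" if call == positive else "TN"] += 1
    return counts
```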
  • Non-calls are cases where no prediction was made because the p-value ratio exceeded the specified p-value ratio cutoff. Calculations were made for overall percent correct calls (number of correct classifications/number of samples), percent correct calls of called samples (number of correct classifications/number of samples with calls) and percent of called samples (number of samples with calls/number of samples).
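The three call statistics defined above can be computed directly from actual and predicted calls. A sketch, assuming predictions use None for a non-call; the function and key names are illustrative.

```python
def call_statistics(actual, predicted):
    """Percent correct, percent correct of called samples, percent called."""
    n = len(actual)
    called = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    correct = sum(1 for a, p in called if a == p)
    return {
        # non-calls count against overall percent correct
        "percent_correct": 100.0 * correct / n,
        "percent_correct_of_called": (
            100.0 * correct / len(called) if called else float("nan")
        ),
        "percent_called": 100.0 * len(called) / n,
    }
```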
  • The correlating gene lists, as well as the entire array gene list, were provided as input lists to the GeneSpring Predict Parameter Values tool (described in Materials and Methods), which employs a k-means nearest neighbor (knn) predictive model. Each list was used with each of the six training and test sets defined in Materials and Methods to generate predictions of histopathology classifications for the test sets.
  • Each gene on this aggregate list has predictive value for at least one of the training and test sets, because it was observed to contribute to optimum predictivity for a specific training/test set.
  • The aggregate list was subdivided into smaller lists of genes based on the number of training and test sets for which a gene was predictive. For example, with six training and test sets, genes that were predictive in all six sets were designated Combo (combination) 6, genes that were predictive in only five of the six sets were designated Combo 5, and so on.
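The Combo designation amounts to counting, for each gene, how many of the optimal training/test set gene lists it appears on. A sketch with hypothetical function names; a gene listed twice within one set is counted once for that set.

```python
from collections import Counter

def combo_categories(optimal_gene_lists):
    """Map each gene to the number of optimal training/test set lists it is on."""
    counts = Counter()
    for genes in optimal_gene_lists:
        counts.update(set(genes))  # count a gene at most once per set
    return dict(counts)

def genes_in_combo(categories, n):
    """Genes designated Combo n (e.g. n=6: predictive in all six sets)."""
    return sorted(gene for gene, count in categories.items() if count == n)
```

The keys of the returned mapping form the aggregate list; filtering by count gives the Combo 6, Combo 5, ... sublists.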
  • Hour Expression Data (A) Materials and Methods: The database used was as described in Example 1. Array data, normalization procedures and transformations used in these analyses were as described in Example 1. Table 40 presents 72 hour gene expression data for the predictive genes. These data can be used with a k-means nearest neighbor prediction model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example. The Predict Parameter Values tool in GeneSpring™ software was used for kidney toxicity class prediction; a description of this tool and the statistical procedures used is provided in Example 1. The training and test data sets used are those described in Example 1.
  • (D) Prediction Measures: The prediction measures used for these analyses are generally accepted measures summarizing actual versus predicted classifications made by a classification system (Venables and Ripley, ibid; Kubat and Matwin, ibid). Results from predictions in a two-class case can be described by a two-class matrix, as described above.
  • The database used for evaluation of these models was the 24 hour expression data for the kidney samples described above. Expression data were for the Combo 6 set of predictive genes as described herein. Because gene expression ratio data are heteroscedastic (i.e., the variance increases proportionately more than the mean), a log transformation of the data is often considered. In general, untransformed data were used, but for some models log-transformed data were used for comparison. Six training and test sets were used, the same as those described in Example 1.
  • A discrimination function is derived from a training set. This function is cross-validated with a testing set, often repeatedly, to quantify the mean and variation of the classification error. There are numerous common discrimination functions, and a comparative study of their performance is useful in determining the best classifier. Additional measures are then used to compare the performance of the classifiers. Because the classes are of significantly uneven sizes, a geometric mean measure (GMM) was used to compare models, namely, the square root of the product of the true positive rate and the true negative rate.
  • GMM: geometric mean measure
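For reference, the geometric mean measure as usually defined (Kubat and Matwin) is the square root of sensitivity times specificity. This is why it suits uneven class sizes: a model that calls every sample negative scores 0 even though its raw accuracy can be high. A sketch; the function and argument names are illustrative.

```python
import math

def geometric_mean_measure(tp, fn, tn, fp):
    """GMM = sqrt(sensitivity * specificity).

    sensitivity (true positive rate) = tp / (tp + fn)
    specificity (true negative rate) = tn / (tn + fp)
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)
```

For example, 8 of 10 toxic and 45 of 50 non-toxic samples called correctly gives sqrt(0.8 × 0.9) ≈ 0.85, whereas calling all 60 samples non-toxic gives 83% accuracy but a GMM of 0.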
  • (C) Classifier Models: A variety of common classification techniques were evaluated. As an extension of the k-means nearest neighbor (knn) model, a simple hybrid classifier was designed and tested that uses the knn results to transform the knn model into a database-independent model, termed a centroid model. The centroid model takes the test samples that knn identified correctly and, for each such sample, locates the centroid of the subset of its k nearest neighbors that are of the same class. The centroid is assigned that class, and a new test sample is assigned the class of its nearest centroid.
  • knn: k-means nearest neighbor
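The centroid model can be sketched as below. Assumptions: samples are numeric expression vectors, distance is Euclidean (`math.dist`), the knn predictions for the test set are already available, and a correctly identified test sample with no same-class neighbor contributes no centroid. Function names are hypothetical. Once built, the labeled centroids replace the training database at prediction time, which is what makes the model database independent.

```python
import math

def k_nearest(train_X, x, k):
    """Indices of the k training samples closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return order[:k]

def build_centroids(train_X, train_y, test_X, test_y, knn_pred, k):
    """Build labeled centroids from test samples that knn classified correctly.

    For each such sample, the neighbors among its k nearest training samples
    that share its true class are averaged into one centroid of that class.
    """
    centroids = []
    for x, truth, pred in zip(test_X, test_y, knn_pred):
        if pred != truth:
            continue  # only correctly identified test samples are used
        same = [train_X[i] for i in k_nearest(train_X, x, k) if train_y[i] == truth]
        if not same:
            continue
        centroid = [sum(col) / len(same) for col in zip(*same)]
        centroids.append((centroid, truth))
    return centroids

def predict_centroid(centroids, x):
    """Assign x the class of its nearest centroid."""
    return min(centroids, key=lambda c: math.dist(c[0], x))[1]
```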
  • Trees were pruned via ten-fold internal cross-validation (i.e., using subsets of the training set) for each training set, and each pruned tree was then used to predict the corresponding testing set.
  • A GMM was thus calculated for each testing set.
  • Trees perform gene selection via pruning; one to five genes were selected for each tree.
  • The centroid model is five-fold cross-validated using random subsets of the testing set, and the mean GMM of the validation runs is used as the performance measure.
  • The top five discriminating genes are used in the centroid models.
  • The logistic discrimination uses a stepwise backward selection process to determine the gene set during the training phase; three to six genes are typically selected by this process. A single performance value is then obtained using the corresponding testing set.
  • A neural network is trained on each training set and then validated on the corresponding testing set. All 28 genes in the data set are used with the neural network model.
  • Table 30 presents logistic discrimination coefficients derived from this analysis. These coefficients may be used in a logistic discriminant model to obtain predictions of kidney toxicity when expression values for the indicated genes are determined using appropriate samples and an appropriate microarray expression detection system, such as the Rat CT array used to develop the Database.
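Applying a logistic discriminant with a coefficient table amounts to a linear predictor passed through the logistic function. The sketch below does not reproduce Table 30: the gene names and coefficient values in the usage are placeholders, and the 0.5 probability cutoff for a "yes" call is an assumption. The probability is computed in a numerically stable form so that large negative predictors do not overflow `math.exp`.

```python
import math

def logistic_toxicity_probability(expression, intercept, coefficients):
    """P(toxic) = 1 / (1 + exp(-eta)), eta = intercept + sum(coef * value).

    expression and coefficients are dicts keyed by gene name.
    """
    eta = intercept + sum(coef * expression[gene] for gene, coef in coefficients.items())
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # equivalent form that cannot overflow for negative eta
    return z / (1.0 + z)

def logistic_call(expression, intercept, coefficients, threshold=0.5):
    """'yes' (toxic) when the probability reaches the assumed threshold."""
    p = logistic_toxicity_probability(expression, intercept, coefficients)
    return "yes" if p >= threshold else "no"
```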
  • The classification model for all of the data, built using a classification tree in S-Plus software, provided the following rule for predicting toxicity: if Gadd45 < 1.474 AND tissue inhibitor of metalloproteinases 1 < 1.786, then "No" (not toxic); otherwise "Yes" (toxic).
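The two-split tree rule can be written as a small function. Assumptions: inputs are expression values on the same scale used to grow the tree, and both splits are treated as strict "less than" comparisons, since the handling of a value exactly at a threshold is not stated. The function name is hypothetical.

```python
def tree_rule_prediction(gadd45, timp1):
    """Classification-tree rule for kidney toxicity.

    gadd45: expression value for Gadd45.
    timp1: expression value for tissue inhibitor of metalloproteinases 1.
    """
    if gadd45 < 1.474 and timp1 < 1.786:
        return "No"   # not toxic
    return "Yes"      # toxic
```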
  • (A)(1) Animal Treatment and Tissue Harvest: Male Sprague-Dawley rats in groups of 3 were treated by intraperitoneal injection with test compounds (cephaloridine, 1500 mg/kg, and cisplatin, 20 mg/kg) or with only the vehicle in which the compound was mixed. At specified timepoints (6 h and 24 h) the rats were euthanized and tissues collected. Kidney tissues were immediately placed into liquid nitrogen and frozen within 3 minutes of the death of the animal to ensure that mRNA did not degrade. The tissues were sent blinded for evaluation. The organs/tissues were then packaged into well-labeled, freezer-quality plastic bags and stored at -80 °C until needed for isolation of mRNA from a portion of the organ/tissue sample.
  • (C) Results: Table 31 presents predictions for samples that were external to the database used to derive the predictive genes.
  • The samples were kidney samples from replicate animals treated with cephaloridine and cisplatin.
  • One of these compounds (cisplatin) is also represented in the database (at a different dose level) and the other compound, cephaloridine, is not in the database. Histopathology conducted on the kidney samples verified that these treatments induced kidney tubular necrosis.
  • FIG. 7 presents combined results of K-means and gene-tree hierarchical clustering analysis.
  • The 28 Combo 6 genes were clustered using K-means (number of clusters 10, maximum iterations 100, similarity measure Pearson) and Gene Tree (separation ratio 0.5, minimum distance 0.001, similarity measure Pearson).
  • The k-means clusters are colored according to the corresponding set (set 1 to set 10).
  • The gene names on the display, from top to bottom, correspond to the cluster bars from left to right.
  • A computer program product produces a prediction of the occurrence of kidney toxicity using input gene expression data from test samples.
  • The model and data for the computer program have been primarily validated using Phase-1 Rat CT arrays and Phase-
  • Rat CT expression data in the Phase-1 TOXBank database as described in previous examples may also be used in the computer program product.
  • Expression platforms such as TaqMan using SYBR Green technology
  • Those skilled in the art are capable of developing and validating scaling factors to adjust for differences in differential gene expression sensitivity and responsiveness among different platforms used in the computer program product.
  • The computer program product uses the Predictive Model as described in the previous examples.
  • The computer program product contains an encrypted training data set that includes differential gene expression values and an endpoint classification for each sample in the training set.
  • In the computer program product, samples are from the same timepoint (e.g., gene expression measured 24 hours after dosing), and the classification is binary for the specific endpoint (e.g., kidney tubular necrosis or no kidney tubular necrosis).
  • The computer program product also contains encrypted lists of the Combo sets of predictive genes (also called Predictagen sets).
  • Inputs to the Predictive Model of the computer program product are the k value (number of nearest neighbors) and the type of distance measure to be used in the model.
  • Data inputs for the Predictive Model include the Combo list(s) of predictive genes and the training set, as encrypted "plug-in" files, and specification of the test data file(s) containing expression data.
  • The initial prediction is made after calculating, for each classification, the probability that the tabulated votes differ from the proportion of that classification in the training set.
  • A statistical test based on the hypergeometric distribution is run for each classification, and p-values are calculated.
  • The classification prediction would be the class that has the highest p-value.
  • A classification cutoff procedure uses the p-value ratio (1 - po/pi, where po is the p-value for the class not predicted and pi is the p-value for the predicted class). If the p-value ratio does not exceed a specified cutoff value (input to the computer program product by the user), then a prediction is not made.
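The voting and cutoff steps can be sketched for the two-class case as below. This is not GeneSpring's or the program's implementation. The text does not fully define the p-value, so this sketch assumes that each class's score is 1 − P(X ≥ votes), with X hypergeometric (k neighbors drawn at random from the training set); under that convention the "highest value wins" rule and the ratio 1 − po/pi behave as described. Distance is Euclidean, and all function names are hypothetical.

```python
import math
from math import comb

def hypergeom_sf(x, population, successes, draws):
    """P(X >= x) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(x, 0), min(successes, draws) + 1)
    )
    return hits / total

def score_votes(neighbor_labels, train_labels, cutoff):
    """Score a two-class knn vote and apply a p-value-ratio cutoff.

    Assumed convention: a class's score is 1 - P(X >= votes), the chance that
    k randomly drawn training samples would give that class fewer votes than
    observed. The highest-scoring class is predicted; the call is kept only
    if 1 - score_other / score_predicted exceeds the cutoff.
    """
    k, n = len(neighbor_labels), len(train_labels)
    classes = sorted(set(train_labels))
    scores = {
        c: 1.0 - hypergeom_sf(neighbor_labels.count(c), n, train_labels.count(c), k)
        for c in classes
    }
    predicted = max(classes, key=lambda c: scores[c])
    if scores[predicted] == 0.0:
        return None, scores, 0.0  # no class received more votes than expected
    other = max((s for c, s in scores.items() if c != predicted), default=0.0)
    ratio = 1.0 - other / scores[predicted]
    return (predicted if ratio > cutoff else None), scores, ratio

def predict_sample(train_X, train_labels, x, k=5, cutoff=0.5):
    """Euclidean k-nearest-neighbor vote for one test sample."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    neighbor_labels = [train_labels[i] for i in order[:k]]
    return score_votes(neighbor_labels, train_labels, cutoff)
```

A split vote (for example 3 "yes" and 2 "no" neighbors) yields a lower ratio than a unanimous one, so raising the cutoff turns such samples into non-calls.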
  • The Prediction Machine can be used with multiple Predictagen sets, with the classifications, p-values and p-value ratios calculated as above. In this case, an overall prediction is made by combining the predictions of the individual Predictagen sets, each weighted by a performance number. The overall certainty of this combined prediction is calculated by a paired t-test using the p-value ratio and (1 - p-value ratio) for each Predictagen set as a pair of values. The certitude is 1 - p, where p is the p-value for the paired t-test.
  • Encrypted training data are included as a plug-in module for the software.
  • User input includes specification of the encrypted Predictagen gene lists and of the samples for prediction (files with gene expression data). Additional specifications are the distance measure to be used in the knn model (currently Euclidean), the number of neighbors and a certitude cutoff (p-value ratio cutoff).
  • The 'Load Predictagens' button is clicked to load the desired Predictagen(s).
  • In this example, the 24 hour kidney Predictagen is loaded.
  • A Predictagen in the Predictagen sets list box (in this example, 24 hour kidney) is highlighted and the 'Make Predictor' button is clicked. If necessary, the predictor is highlighted and the 'Configure' button is clicked to set parameter values.
  • The 'Load Samples' button is clicked. Sample data are loaded as text files in the format shown in Table 44. Samples are then selected from the Samples list box with the left mouse button; holding down the CTRL key allows multiple selections.
  • In this example, 3 kidney samples from rats treated with 25 mg/kg paraquat and 3 kidney samples from rats treated with 80 mg/kg phenobarbital are selected.
  • The samples were treated and processed for gene expression analysis as described in the previous examples.
  • The 'Add to predictor' button is then clicked, followed by the 'Predict' button, to generate the program's output.
  • The 'Summary', 'Detail', or 'Full' radio button is selected to control the amount of information displayed about the prediction.
  • The 'Tabular Report' checkbox is checked to format the output as tab-delimited text that can be loaded into Excel.
  • The 'Save', 'Copy', 'Print', and 'Clear' buttons save the output, copy it to the clipboard, print it, or clear the output window before another prediction.
  • The summary view displays sample information, the call (kidney tubular necrosis or negative), and the overall certitude.
  • The detail view presents, in addition to the summary view information, the individual calls and the 1 - p-value ratio for each Predictagen.
  • The full view presents, for each sample and Predictagen gene list, the specific nearest neighbors and their classifications (votes), along with the hypergeometric p-values for each classification, followed by the detail view information.
  • Table 43 displays the test set of gene expression data used to generate predictions. The table shows the correct classification of kidney samples that have histopathology (kidney tubular necrosis) or no histopathology.
  • Table 42 displays the summary output of the computer program after loading. Two of the three paraquat samples (sample #s 16477 and 16479) were correctly predicted positive for rat kidney tubular necrosis (with certitudes of 0.472 and 0.796), and all three phenobarbital samples were correctly predicted negative for kidney tubular necrosis.
  • Table 43 displays the detailed output of the computer program, which shows the individual performances of the 24 hour kidney Combo sets and the overall certitude score.
  • Protein marker candidates can be selected from biomarker genes using a number of parameters. Table 44 presents biomarker genes sorted by mean individual-gene predictive performance (percent correct calls) for all genes exhibiting ≥60% correct calls. Each gene was then evaluated for evidence that it codes for a protein, clearly a key criterion for a protein marker. The next parameter evaluated was the relative transcriptional response in toxic versus non-toxic samples. If protein levels are proportional to RNA levels, then these columns indicate the relative potential magnitude of the protein marker in toxic and non-toxic samples, and the better marker candidates should be those genes exhibiting the larger differences in RNA expression. A number of additional criteria can be considered, including protein molecular weight (MW), occurrence of the protein in tissues other than the target tissue, and availability of antibodies that recognize the protein.
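The screening steps can be sketched as a filter and sort. The record field names, the use of the absolute difference between mean toxic and mean non-toxic expression as the secondary sort key, and applying the 60% threshold as "at least 60%" are assumptions for illustration; the function name is hypothetical.

```python
def rank_marker_candidates(genes, min_percent_correct=60.0):
    """Filter and rank candidate protein markers.

    genes: list of dicts with keys 'name', 'percent_correct', 'codes_protein',
    'mean_toxic' and 'mean_nontoxic'. Keeps protein-coding genes meeting the
    performance threshold, sorted by percent correct calls (descending), then
    by the toxic vs non-toxic expression gap (largest first).
    """
    kept = [
        g for g in genes
        if g["codes_protein"] and g["percent_correct"] >= min_percent_correct
    ]
    return sorted(
        kept,
        key=lambda g: (-g["percent_correct"], -abs(g["mean_toxic"] - g["mean_nontoxic"])),
    )
```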
  • One important criterion may also be whether the protein is secreted.
  • The last column in Table 44 indicates that 3 of the proteins are known to be secreted.
  • Table 37 lists proteins, derived from the total list of predictive genes, that are known to be secreted. The property of secretion may be useful in identifying proteins that could serve as biomarkers in serum or possibly other matrices such as urine or saliva.
  • Protein markers can be rapidly evaluated by testing for levels of the identified marker candidates using any of a number of analytical techniques for measuring specific protein levels such as Western blots or ELISA assays.
  • Samples for analysis may be selected from a tissue bank such as that described in Example 1. Selection for analysis would include samples from toxic treatments and samples from non-toxic treatments.
  • Quantitative protein marker data can be analyzed using the same approaches described in Example 2 for evaluation and validation of predictive performance of the protein markers.
  • Combination category is the number of training/test set gene list occurrences.
  • Table 6: Randomly Selected Gene Subsets from 24 H Combo All (216 Genes)*
  • Genes were randomly selected from the entire array list of genes excluding the Combo All 216 predictive genes by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes.
  • Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 24 hour array data and gene lists. Unit of prediction was the animal and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment.
  • False positive rate: proportion of negative cases that are incorrectly classified as positive.
  • Geometric mean: performance measure that takes into account the proportions of positive and negative cases.
  • Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 24 hour array data and gene lists. Unit of prediction was the compound-dose level, and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment. Prediction for a compound-dose level was based on a majority of individual animal calls. Where there was an equal number of opposing calls, or no calls, a no-call was assigned to the compound-dose level.
  • Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 24 hour array data and gene lists. Unit of prediction was the compound and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment. Compounds were considered toxic if any compound-dose level for that compound was predicted as toxic.
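The roll-up from animal calls to compound-dose and compound calls can be sketched as follows. Treating a compound with no toxic dose but at least one negative dose call as "no", and a compound whose doses are all no-calls as a no-call, are assumptions the text does not address; function names are hypothetical.

```python
from collections import Counter

def dose_call(animal_calls):
    """Majority of individual animal calls ('yes'/'no', None = no-call).

    A tie between opposing calls, or no calls at all, gives a no-call (None).
    """
    counts = Counter(c for c in animal_calls if c is not None)
    yes, no = counts["yes"], counts["no"]
    if yes == no:
        return None
    return "yes" if yes > no else "no"

def compound_call(dose_calls):
    """Toxic if any compound-dose level is predicted toxic."""
    if "yes" in dose_calls:
        return "yes"
    if "no" in dose_calls:
        return "no"  # assumption: no toxic dose and at least one negative call
    return None      # assumption: every dose level is a no-call
```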
  • Beta-actin sequence 2
  • Pancreatic secretory trypsin inhibitor type II PSTI-II
  • Preproalbumin sequence 2 (alternate clone 1)
  • Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 24 hour array data and random subsets of genes. Unit of prediction was the animal and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment.
  • Accuracy: proportion of the total number of predictions that are correct; non-calls are counted as incorrect predictions. Accuracy was calculated for the correct classifications of kidney toxicity assigned to the samples and for randomized classifications in the same proportions as the correct classifications. Values presented are the mean accuracy values for 6 training/test sets, with minimum and maximum accuracy values.
  • Combination category is the number of training/test set gene list occurrences.
  • Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 72 hour array data and gene lists. Unit of prediction was the animal and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment.
  • A Combo entry number indicates that the gene was on the predictive list for that time point and gives the number of occurrences of that gene on optimal combined training/test set lists. "Not Found" indicates that the gene was not on the optimal combined list for that time point.
  • PC3 NGF-inducible anti-proliferative putative secreted protein
  • Pancreatic secretory trypsin inhibitor type II PSTI-II
  • Alpha prothymosin 1 104 118 106 -108 1
  • Beta-actin sequence 2 -117 -115 -102 102 105 108
  • Bile salt export pump (sister of p-glycoprotein) 115 12 115 108 127 105
  • Carbonic anhydrase III sequence 2 -114 -107 -121 123 105 123
  • NCLK Cdc2 related protein kinase
  • CNBP Cellular nucleic acid binding protein
  • Complement factor I CFI
  • Contrapsin-like protease inhibitor (CPi-21) -105 103 101 -102 -102 -106
  • Disulfide isomerase related protein ERp72 117 102 111 11 101 112
  • Enoyl CoA hydratase (mitochondrial) 122 117 117 127 114 118
  • Epithelial sodium channel alpha subunit (alpha-ENaC) 1 1 -104 -104 101 109
  • Fetuin-like protein (IRL685) 104 101 104 -127 -104 -117
  • Hypoxia-inducible factor 1 alpha 116 106 112 111 108 116
  • Insulin-like growth factor I exon 6 -106 108 ⁇ 115 -104 -127 -111
  • Interferon inducible protein 10 104 -107 -109 122 108 -108
  • Interferon related developmental regulator IFRD1 (PC4) 133 109 117 -109 103 -107
  • Peroxisome proliferator activated receptor alpha 114 114 108 109 125 128
  • Peroxisome proliferator activated receptor gamma 1 103 102 111 -105 -11
  • Phase-1 RCT-110 1 -117 -128 -106 106 113
  • Phase-1 RCT-141 138 134 121 -102 103 104
  • Phase-1 RCT-H9 177 131 115 -109 -109 -107
  • Phase-1 RCT-15 106 102 103 111 106
  • Phase-1 RCT-150 -114 -104 -117 105 125 107
  • Phase-1 RCT-151 -12 -11 ⁇ 121 107 102 108
  • Phase-1 RCT-152 -104 -111 12 -104 -106 -104
  • Phase-1 RCT-153 -144 129 131 -106 -102 -109
  • Phase-1 RCT-154 -107 -109 -102 103 -109 108
  • Phase-1 RCT-155 -11 -104 -115 -104 -103 -107
  • Phase-1 RCT-156 104 -103 -1 -107 -125 -108
  • Phase-1 RCT-158 196 135 256 -104 165 106
  • Phase-1 RCT-160 103 -111 -106 1 104 106
  • Phase-1 RCT-161 -13 -139 -129 -
  • Phase-1 RCT-282 -115 -118 -117 -107 -104 -104
  • Phase-1 RCT-283 -104 -103 -108 -103 -107 -103
  • Phase-1 RCT-284 103 -101 11 -111 105 -103
  • Phase-1 RCT-285 -103 -103 -107 -101 105 108
  • Phase-1 RCT-286 104 -105 -102 105 -1 ⁇ 101
  • Phase-1 RCT-287 -101 -101 -102 -106 103 -108
  • Phase-1 RCT-288 101 104 -108 103 102 -101
  • Phase-1 RCT-289 -103 101 -1 -101 105 114
  • Phase-1 RCT-29 -108 -116 -118 -1 -104 -103
  • Phase-1 RCT-290 124 115 103 106 119 105
  • Phase-1 RCT-291 -102 -106 -1 108 123 108
  • Phase-1 RCT-292 -11 -108 -106 104
  • Phase-1 RCT-74 102 -111 -114 -102 -101 -106

Abstract

The invention provides kidney toxicity predictive genes which can be used to predict kidney toxicity in response to one or more agents.

Description

SPECIFICATION
KIDNEY TOXICITY PREDICTIVE GENES
Cross Reference to Other Patent Applications
[01] This application claims priority from U.S. provisional application Serial No. 60/361,128, titled "Kidney Toxicity Predictive Genes," filed on February 27, 2002, which is hereby incorporated by reference in its entirety.
Reference to a Sequence Listing and Tables
[02] This application contains a gene sequence listing and four tables submitted on a compact disc whose file name is "Tables for Burning," created on February 27, 2003, and containing five files. The compact disc is herein incorporated by reference in its entirety. The five files are (a) a gene sequence listing, Table 32 (403 KB), in Microsoft® Word®; (b) Table 38 (785 KB) in Microsoft Excel®; (c) Table 39 (957 KB) in Excel; (d) Table 40 (992 KB) in Excel; and (e) Table 45 (57 KB) in Excel.
Background of the Invention
[03] This invention is in the field of toxicology. More specifically, it relates to kidney toxicity predictive genes and methods of using such genes to predict kidney toxicity. Molecular biology and genomics technologies have the potential to create dramatic advances and improvements in the science of toxicology, as in other biological sciences. See, for example, MacGregor et al., Fund. Appl. Tox. 26:156-173, 1995; Rodi et al., Tox. Pathology 27:107-110, 1999; Cunningham et al., Ann. N.Y. Acad. Sci. 919:52-67, 2000; Pritchard et al., Proc. Natl. Acad. Sci. USA 98:13266-13271, 2001; and Fielden and Zacharewski, Tox. Sciences 60:6-10, 2001. The advantage of these technologies is that they can provide massive amounts of parallel information about processes and events occurring at the molecular level. This level of information contrasts dramatically with conventional safety assessment toxicology, which to a large extent currently relies on subjective evaluation (e.g., in-life observations of behavior, observations of gross abnormalities at necropsy and histopathological examination of stained tissue slides using a microscope). These current methodologies may be largely subjective, and in some cases, such as histopathological evaluation, they require someone with a high degree of training, experience and skill to make competent evaluations. Furthermore, many of the methodologies require access to organs and tissues, which necessitates either killing laboratory animals or performing surgery to obtain tissue specimens.
[04] Recently, there have been some initial efforts to apply molecular biology and genomics technologies to toxicology, some involving gene expression measurements. See, for example, U.S. Patent 6,228,589 and WO 01/05804. Analysis of the data has yielded interesting observations of gene expression that appears to correlate with some toxic effects or mechanisms. See, for example, Mueller et al., Environmental Health Perspectives 106(5): 277-230 (1998). However, very little published work in toxicology so far applies rigorous analytical and statistical techniques to the massive amounts of data available from genomics technologies. The observations so far have tended to be phenomenological and focused on individual gene responses rather than on determining the generally applicable capability of gene expression patterns to predict toxic effects (see, for example, studies of gene expression altered by exposure to kidney toxicants in Bartosiewicz et al., J. Pharm. Exp. Ther. 297:895-905, 2001; Lieberthal, Curr. Opin. Nephrol. Hypertens. 7:289-295, 1998; Huang et al., Tox. Sciences 63:196-207, 2001). Even in the larger field of biological sciences, these types of analyses are only beginning to appear in the literature (e.g., Golub et al., Science 286:531-537, 1999).
[05] What is needed are genes and predictive models capable of predicting toxic responses.
Brief Summary of the Invention
[06] The invention provides kidney toxicity predictive genes and predictive models which are useful to predict toxic responses to one or more agents.
[07] In one aspect, the invention provides methods of predicting kidney toxicity in an individual exposed to an agent which include the steps of: (a) obtaining a biological sample from an individual treated with the agent or treating a biological sample obtained from an individual with the agent or treating in vitro cultured cells or explants with the agent; (b) obtaining a gene expression profile from the biological sample or in vitro cultured cells or explants; and (c) using the gene expression profile from the biological sample or cells treated with the agent as a test set and a database of gene expression profiles and toxicity classifications as a training set and using kidney toxicity predictive genes and a Predictive Model to determine whether the agent will induce kidney toxicity in the individual or would be predicted to produce kidney toxicity following in vivo exposure.
[08] In one embodiment, the predictive model utilizes expression profiles from a set of one or more kidney toxicity predictive genes selected from Combination 6, infra. In other embodiments, the predictive model utilizes expression profiles from a set of one or more kidney toxicity predictive genes selected from Combination 5, 4, 3, 2, or 1.
[09] In another aspect, the invention provides methods for determining the presence or absence of a no-observable effect level (NOEL) of an agent by the steps of: (a) obtaining biological samples from individuals treated with the agent at different dose levels, or treating a biological sample obtained from an individual with different dose levels of the agent, or treating in vitro cultured cells or explants with different dose levels of the agent; (b) obtaining gene expression profiles of the samples; and (c) using the gene expression profiles from the biological samples as a test set and a database of gene expression profiles and toxicity classifications as a training set, and using kidney toxicity predictive genes and a Predictive Model to determine or predict whether, and at which dose levels, the agent will induce kidney toxicity. In one embodiment, the predictive model utilizes expression profiles from a set of one or more kidney toxicity predictive genes selected from Combination 6, infra. In other embodiments, the predictive model utilizes expression profiles from a set of one or more kidney toxicity predictive genes selected from Combination 5, 4, 3, 2, or 1.
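Once the Predictive Model has produced a toxic or non-toxic call for each dose level, the NOEL determination reduces to a simple rule. The sketch below assumes that rule is "the highest tested dose called non-toxic that lies below the lowest dose called toxic"; the function name and the example calls are hypothetical and are not taken from the Examples.

```python
def estimate_noel(dose_calls):
    """Estimate a no-observable effect level (NOEL) from per-dose model calls.

    dose_calls maps each tested dose level (e.g. mg/kg) to True if the
    predictive model calls that dose toxic, False otherwise. Returns the
    highest dose called non-toxic that lies below every dose called toxic,
    or None if even the lowest tested dose is called toxic. If no dose is
    called toxic, the highest tested dose is returned (the NOEL is then at
    least that dose). Hypothetical helper, for illustration only.
    """
    toxic = [dose for dose, is_toxic in dose_calls.items() if is_toxic]
    lowest_toxic = min(toxic) if toxic else float("inf")
    safe = [dose for dose, is_toxic in dose_calls.items()
            if not is_toxic and dose < lowest_toxic]
    return max(safe) if safe else None


# Example with made-up calls for four dose levels:
print(estimate_noel({10: False, 30: False, 100: True, 300: True}))  # 30
```

Because the rule only looks at doses below the lowest toxic call, an isolated non-toxic call at a high dose (for example, one above a toxic dose) does not raise the estimated NOEL.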
[10] In another embodiment, the predictive genes and models may be used to identify in vitro systems that accurately predict in vivo toxicity, and the identified in vitro systems may then be used to predict in vivo toxicity.
[11] In another aspect, the invention provides methods of identifying a kidney toxicity predictive gene in an individual, including the steps of: (a) providing a set of candidate toxicity predictive genes; (b) evaluating said genes for their predictive performance with at least one training and test set of data in a predictive model to identify genes that are predictive of kidney toxicity; and (c) testing the performance of the predictive genes for their ability to predict kidney toxicity with different training and test sets of data, including prediction with accurate compared to random classifications and prediction of test data external to the data used to derive the predictive genes. In one embodiment, the candidate toxicity predictive genes are rat toxicity genes.
[13] In another aspect, the invention provides a computer program product which includes a set of kidney toxicity predictive genes derived from mining a database having a plurality of gene expression profiles indicative of toxicity. In one embodiment, the set of kidney toxicity predictive genes includes at least one toxicity predictive gene from the Combination 6, 5, 4, 3, 2, or 1 lists.
[14] In another aspect, the invention provides a library of information about kidney toxicity predictive genes produced by the methods disclosed herein.
[15] In another aspect, the invention provides an integrated system for predicting kidney toxicity comprising: an array reader modified to read gene expression profiles from biological samples exposed to a test agent, operably linked to a computer comprising a database file having a plurality of kidney toxicity predictive genes.
BRIEF DESCRIPTION OF THE DRAWINGS
[16] Figure 1 is a flow diagram illustrating the identification of kidney toxicity predictive genes. It shows the pathway for discovering kidney toxicity predictive genes using the database of expression array data (Rat CT array) and toxicity data for kidney samples from rats treated with various compounds (see Table 1). Genes whose expression correlated with pathology were determined using a variety of correlation statistics (see, for example, Tables 2 and 3). The predictive model used was the GeneSpring Predict Parameter Value model, which employs a K-nearest neighbor model.
[17] Figure 2 is a graph which shows the percent of overall correct calls as a function of the number of kidney toxicity predictive genes, using histopathology-correlating genes (Pearson measure) as the input gene list with Training and Test Set A. The input gene list consisted of 66 genes that showed a statistically significant correlation with the histopathology scores using Pearson's correlation measure (r-value > 0.4). Training and Test Set A was used with model parameters of 10 nearest neighbors and a p-value ratio cutoff of 0.5. An optimum gene number of 49 (the lowest number of genes giving the highest percent of overall correct calls) was observed for this case.
[18] Figure 3 is a flow diagram illustrating how kidney toxicity predictive genes are evaluated for performance. The performance of the predictive model is evaluated using 6 sets of training and test data (Rat CT expression array data). The training and test sets have either accurate classification assignments (histopathology "yes" or "no" for each sample) or random classification assignments ("yes" and "no" randomly assigned to samples). The K-nearest neighbor model is used with input consisting of lists of predictive genes, as indicated, and the training and test set data. Four different measures of prediction are considered, as indicated.
[19] Figure 4 is a graph that shows the cumulative predictive performance of Combo 6 genes. The mean, minimum and maximum percent accuracy for 6 training and test sets are presented for Combo 6 genes that were used cumulatively in the order given in Table 14.
[20] Figure 5 is a graph that shows the cumulative predictive performance of Combo 5 genes. The mean, minimum and maximum percent accuracy for 6 training and test sets are presented for Combo 5 genes that were used cumulatively in the order given in Table 14.
[21] Figure 6 is a graph that shows the cumulative predictive performance of Combo 4 genes. The mean, minimum and maximum percent accuracy for 6 training and test sets are presented for Combo 4 genes that were used cumulatively in the order given in Table 14.
[22] Figure 7 shows the k-means and tree cluster analysis of Combo 6 genes.
[23] Figure 8 shows the Ward's cluster analysis of the Combo 6 gene set.
[24] Figure 9 shows a scanned autoradiogram of a Western blot of serum samples from 8 animals probed with antibodies to clusterin and insulin-like growth factor binding protein 1. Sample information is indicated in the figure. The figure also presents transcriptional differential expression levels of the insulin-like growth factor binding protein 1 gene observed in kidney samples from these animals.
BRIEF DESCRIPTION OF THE TABLES
[25] Table 1 lists the compounds, dose levels, kidney pathology and abbreviations in the database.
[26] Table 2 lists genes whose expression at 24h directly correlates with kidney tubular necrosis at 72h, ranked by Pearson correlation coefficient.
[27] Table 3 lists genes whose expression at 24h inversely correlates with kidney tubular necrosis at 72h, ranked by Spearman correlation coefficient.
[28] Table 4 lists the distribution of compounds in individual training and test sets for 24 hour kidney data.
[29] Table 5 lists the predictive genes for 24 hour expression data.
[30] Table 6 lists the randomly selected gene subsets from the 24 hour combo all set (216 genes).
[31] Table 7 lists the randomly selected gene subsets from 24 h combo 6 gene set (28 genes).
[32] Table 8 lists the randomly selected gene subsets from 24 h combo 5 gene set (25 genes).
[33] Table 9 lists the randomly selected gene subsets from 24 h combo 4 gene set (23 genes).
[34] Table 10 lists the randomly selected gene subsets from array genes excluding combo all set.
[35] Table 11 lists the kidney toxicity individual sample prediction values for 24 hour data predictive genes (combined list and subsets).
[36] Table 12 lists the kidney toxicity compound-dose prediction values for 24 hour data predictive genes (combined list and subsets).
[37] Table 13 lists the kidney toxicity compound prediction values for 24 hour data predictive genes (combined list and subsets).
[38] Table 14 lists the order of genes used for cumulative analysis of predictive performance of predictive combo gene sets.
[39] Table 15 lists the individual gene predictions for combo 6.
[40] Table 16 lists the individual gene predictions for combo 5.
[41] Table 17 lists kidney toxicity individual sample prediction values for 24 hour data with random gene subsets.
[42] Table 18 lists the comparison of predictivity for true kidney toxicity classification and random classification using combo gene sets and random subsets and 24 hour data.
[43] Table 19 lists the distribution of compounds in individual training and test sets for 6 hour kidney data.
[44] Table 20 lists the genes whose expression at 6 hours directly correlates with kidney tubular necrosis at 72 hours, ranked by Pearson correlation coefficient.
[45] Table 21 lists the genes whose expression at 6 hours inversely correlates with kidney tubular necrosis at 72 hours, ranked by Spearman correlation coefficient.
[46] Table 22 lists the genes whose expression at 6 hours is predictive of kidney toxicity at 72 hours.
[47] Table 23 lists the kidney toxicity compound-dose prediction values for 6 hour data predictive genes (combined list and subsets).
[48] Table 24 lists the distribution of compounds in individual training and test sets for the 72 hour kidney data.
[49] Table 25 lists the genes whose expression at 72 hours directly correlates with kidney tubular necrosis at 72 hours, ranked by Pearson correlation coefficient.
[50] Table 26 lists the genes whose expression at 72 hours inversely correlates with kidney tubular necrosis at 72 hours, ranked by Spearman correlation coefficient.
[51] Table 27 lists the genes whose expression at 72 hours is predictive of kidney toxicity at 72 hours.
[52] Table 28 lists the kidney toxicity compound-dose prediction values for 72 hour data predictive genes (combined list and subsets).
[53] Table 29 lists the predictive performance of various models.
[54] Table 30 lists the logistic discrimination coefficients.
[55] Table 31 lists the prediction of kidney toxicity for samples external to database.
[56] Table 32 lists the genes predictive for kidney tubular necrosis, sequences, and accession numbers.
[57] Table 33 lists the kidney predictive genes (376 genes) organized by time point and combo category.
[58] Table 34 lists the RCT genes (ESTs) predictive for kidney tubular necrosis: best homology matches.
[59] Table 35 lists the genes that are predictive at all three time points.
[60] Table 36 lists the genes that are the most predictive across the time points.
[61] Table 37 lists the kidney toxicity predictive genes whose protein products are known to be secreted. The genes are drawn from the table listing all the kidney predictive genes at the three time points (6, 24 and 72 hours). Because these protein products are secreted into body fluids, they are easier to access and more amenable to quantification. These proteins could therefore be monitored in body fluids of subjects such as humans, and toxicity predictions could be made from them.
[62] Table 38 lists the expression data for the 6 hour timepoint.
[63] Table 39 lists the expression data for the 24 hour timepoint.
[64] Table 40 lists the expression data for the 72 hour timepoint.
[65] Table 41 lists the predictive performance of predictive genes organized by occurrence on training/test set lists (combo number) and time point.
[66] Table 42 lists the summary output of the predictive computer software product.
[67] Table 43 lists the detailed output of the predictive computer software product.
[68] Table 44 lists protein marker candidate identification information that includes the gene name, % correct calls, average fold induction for negative histopathology samples, and average fold induction for positive histopathology samples.
[69] Table 45 lists input data used for the predictive computer program product.
DETAILED DESCRIPTION OF THE INVENTION
[70] This invention relates to methods of predicting whether an agent or other stimulus is capable of inducing kidney toxicity in a recipient organism using predictive molecular toxicology analysis. In particular, the invention provides methods of predicting kidney toxicity that comprise analyzing gene and/or protein expression across a number of kidney toxicity biomarkers disclosed herein for patterns of expression that correlate with, and are predictive of, kidney tubule necrosis in the recipient organism. This endpoint is significant because mortality is high among patients with acute renal failure, and tubular necrosis is associated with many causes such as ischemia, endotoxemia or exposure to nephrotoxins (Ueda et al., Am. J. Med. 108: 403-415, 2000).
[71] The invention is based, in part, upon the discovery that modulated transcriptional regulation of relatively small sets of certain genes in response to a test agent can accurately predict the occurrence of kidney toxicity observed at later time points.
[72] Provided herein are multiple sets of kidney toxicity biomarkers which are useful in the practice of the kidney toxicity prediction methods of the invention. In particular, applicants have identified 376 kidney toxicity biomarkers that demonstrate utility in predicting kidney toxicity outcomes. These biomarkers have been thoroughly characterized for their predictive performance, individually as well as in various combinations or subsets thereof. In addition, various optimized subsets of the kidney toxicity biomarkers of the invention are disclosed, which sets have also been thoroughly characterized for predictive performance using the methods of the invention. Among the subsets of kidney toxicity genes provided herein are several which demonstrate prediction accuracies in the vicinity of 95%.
[73] The invention is further described by way of the experimental examples provided herein. These examples demonstrate that small sets of genes (i.e., in some instances, as few as 2 or 3 biomarker genes) may be used to accurately predict kidney toxicity. For example, as further described in the Examples, analysis of mRNA expression of only a few genes can provide an accurate indication of whether a test agent will or will not induce kidney toxicity.
[74] The predictive capacity of the methods of the invention has been verified by (a) comparisons with random classifications and (b) predictions using data external to the database used to identify the kidney toxicity biomarkers. Moreover, the methods of the invention are capable of distinguishing between agent dose levels that induce toxicity (typically higher doses) and those that are non-toxic. This latter feature is an essential component of meaningful toxicological evaluation.
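The comparison with random classifications can be illustrated with a short sketch: accuracy obtained with the true "yes"/"no" histopathology classes is set against the mean accuracy obtained when the classes are randomly reassigned. The classifier interface, the toy 1-nearest-neighbour stand-in, the use of label shuffling (which preserves class proportions) and the number of random rounds are all assumptions for illustration; they are not the GeneSpring procedure itself.

```python
import random
from math import dist
from statistics import mean


def accuracy(predicted, truth):
    """Fraction of samples whose predicted class matches the reference class."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def one_nn(train_X, train_y, test_X):
    """Toy 1-nearest-neighbour classifier, standing in for the Predictive Model."""
    return [min(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[1]
            for x in test_X]


def true_vs_random_accuracy(classify, train_X, train_y, test_X, test_y,
                            n_random=20, seed=0):
    """Accuracy with the true classes versus mean accuracy with random classes.

    Random classifications are made by shuffling the "yes"/"no" labels of the
    training and test samples (an assumption: this keeps the class proportions).
    """
    rng = random.Random(seed)
    true_acc = accuracy(classify(train_X, train_y, test_X), test_y)
    random_accs = []
    for _ in range(n_random):
        shuffled_train = rng.sample(train_y, len(train_y))
        shuffled_test = rng.sample(test_y, len(test_y))
        predicted = classify(train_X, shuffled_train, test_X)
        random_accs.append(accuracy(predicted, shuffled_test))
    return true_acc, mean(random_accs)


# Made-up two-gene expression values for four training and two test samples.
train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["no", "no", "yes", "yes"]
test_X = [[0.0, 0.5], [5.0, 5.5]]
test_y = ["no", "yes"]
true_acc, random_acc = true_vs_random_accuracy(one_nn, train_X, train_y, test_X, test_y)
print(true_acc, random_acc)
```

A gene set is only informative if its accuracy with true classes clearly exceeds the random-class baseline; on real data the baseline would typically be averaged over many more random assignments.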
[75] I. General Techniques: The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, biochemistry, nucleic acid chemistry, and immunology, which are well known to those skilled in the art. Such techniques are explained fully in the literature, such as Molecular Cloning: A Laboratory Manual, second edition (Sambrook et al., 1989) and Molecular Cloning: A Laboratory Manual, third edition (Sambrook and Russell, 2001) (jointly referred to herein as "Sambrook"); Current Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987, including supplements through 2001); PCR: The Polymerase Chain Reaction (Mullis et al., eds., 1994); Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York; Harlow and Lane (1999) Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (jointly referred to herein as "Harlow and Lane"); Current Protocols in Nucleic Acid Chemistry (Beaucage et al., eds., John Wiley & Sons, Inc., New York, 2000); and Casarett and Doull's Toxicology: The Basic Science of Poisons, C. Klaassen, ed., 6th edition (2001).
[76] II. Definitions: Unless otherwise defined, all terms of art, notations and other scientific terminology used herein are intended to have the meanings commonly understood by those of skill in the art to which this invention pertains. In some cases, terms with commonly understood meanings are defined herein for clarity and/or for ready reference, and the inclusion of such definitions herein should not necessarily be construed to represent a substantial difference over what is generally understood in the art. The techniques and procedures described or referenced herein are generally well understood and commonly employed using conventional methodology by those skilled in the art, such as, for example, the widely utilized molecular cloning methodologies described in Sambrook et al., Molecular Cloning: A Laboratory Manual 2nd edition (1989) Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. As appropriate, procedures involving the use of commercially available kits and reagents are generally carried out in accordance with manufacturer defined protocols and/or parameters unless otherwise noted.
[77] "Toxic" or "toxicity" refers to the result of an agent causing adverse effects, usually by a xenobiotic agent administered at a sufficiently high dose level to cause the adverse effects.
[78] As used herein, the terms "kidney toxicity biomarker" and "kidney toxicity predictive gene" are used interchangeably and refer to a gene whose expression, measured at the RNA or protein level, can predict the likelihood of a kidney toxicity response with accuracy significantly better than would occur by chance. In one embodiment, the kidney toxicity response is tubular necrosis. In other embodiments, the kidney toxicity response can be another toxicity manifestation that elicits similar detectable gene expression changes, such as other forms of tubular injury, glomerular toxicity or papillary injury.
[79] A "toxicological response" refers to a cellular, tissue, organ or system level response to exposure to an agent. At the molecular level, this can include, but is not limited to, the differential expression of genes encompassing both the up- and down- regulation of expression of such genes at the RNA and/or protein level; the up- or down-regulation of expression of genes which encode proteins associated with response to and mitigation of damage, the repair or regulation of cell damage; or changes in gene expression due to changes in populations of cells in the tissue or organ affected in response to toxic damage.
[80] An "agent" or "compound" is any element to which an individual can be exposed and can include, without limitation, drugs, pharmaceutical compounds, household chemicals, industrial chemicals, environmental chemicals, other chemicals, and physical elements such as electromagnetic radiation.
[81] The term "biological sample" as used herein refers to substances obtained from an individual. The samples may comprise cells, tissue, parts of tissues, organs, parts of organs, or fluids (e.g., blood, urine or serum). Biological samples include, but are not limited to, those of eukaryotic, mammalian or human origin.
[82] "Sample" is defined for the purposes of prediction as a biological sample and the gene expression data for that sample. Each sample comes from an individual animal. A toxicity classification may also be associated with the sample.
[83] "Gene expression" as used herein refers to the relative levels of expression and/or pattern of expression of a gene. In some embodiments, the expression refers to a toxicity gene or toxic response gene. In other embodiments, the expression is of a toxicity predictive gene.
[84] "Gene expression profile" refers to the relative levels of expression of multiple different genes measured for the same sample. Gene expression profiles may be measured in a sample, such as samples comprising a variety of cell types, different tissues, different organs, or fluids (e.g., blood, urine, spinal fluid, sweat, saliva or serum) by various methods including but not limited to microarray technologies and quantitative and semi-quantitative RT-PCR (e.g., Taqman™) techniques, as well as techniques for measuring expression of proteins.
[85] "Individual" refers to a vertebrate, including, but not limited to, a human, non-human primate, mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and dog.
[86] As used herein, the terms "hybridize", "hybridizing", "hybridizes" and the like, used in the context of polynucleotides, are meant to refer to conventional hybridization conditions, such as hybridization in 50% formamide/6X SSC/0.1% SDS/100 μg/ml ssDNA, in which temperatures for hybridization are above 37 degrees Celsius and temperatures for washing in 0.1X SSC/0.1% SDS are above 55 degrees Celsius, and preferably to stringent hybridization conditions. Whether nucleic acids will hybridize depends upon factors such as their degree of complementarity as well as the stringency of the hybridization reaction conditions. Stringent conditions can be used to identify nucleic acid duplexes with a high degree of complementarity. Means for adjusting the stringency of a hybridization reaction are well known to those of skill in the art. See, for example, Sambrook, et al., "Molecular Cloning: A Laboratory Manual," Second Edition, Cold Spring Harbor Laboratory Press, 1989; Ausubel, et al., "Current Protocols In Molecular Biology," John Wiley & Sons, 1996 and periodic updates; and Hames et al., "Nucleic Acid Hybridization: A Practical Approach," IRL Press, Ltd., 1985. In general, conditions that increase stringency (i.e., select for the formation of more closely matched duplexes) include higher temperature, lower ionic strength and presence or absence of solvents; lower stringency is favored by lower temperature, higher ionic strength, and lower or higher concentrations of solvents.
[87] In the context of amino acid sequence comparisons, the term "identity" is used to express the percentage of amino acid residues at the same relative position which are the same. Also in this context, the term "homology" is used to express the percentage of amino acid residues at the same relative positions which are either identical or are similar, using the conserved amino acid criteria of BLAST analysis, as is generally understood in the art. Further details regarding amino acid substitutions, which are considered conservative under such criteria, are provided.
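The two percentages defined above can be computed directly from a pair of already-aligned sequences. In the sketch below, the treatment of gaps (they count toward the alignment length but never match) and the fixed groups of "similar" residues are assumptions made for illustration; BLAST itself derives its "positives" from substitution-matrix scores such as BLOSUM62 rather than from fixed groups.

```python
# Illustrative conservative-substitution groups. This is an assumption for the
# sketch: BLAST itself counts "positives" from substitution-matrix scores
# (e.g. BLOSUM62), not from fixed groups like these.
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("ST"), set("NQ")]


def is_similar(a, b):
    """True if residues are identical or fall in the same illustrative group."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_homology(aligned1, aligned2):
    """Percent identity and percent homology for two pre-aligned sequences.

    Gap characters ("-") count toward the alignment length but never match.
    """
    if len(aligned1) != len(aligned2):
        raise ValueError("aligned sequences must have the same length")
    identical = similar = 0
    for a, b in zip(aligned1.upper(), aligned2.upper()):
        if a == "-" or b == "-":
            continue
        identical += a == b
        similar += is_similar(a, b)
    length = len(aligned1)
    return 100.0 * identical / length, 100.0 * similar / length


# 3 of 6 columns are identical (M, V, L); 5 of 6 are identical or similar.
ident, homol = identity_and_homology("MKV-LE", "MRVILD")
print(ident, homol)
```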
[88] III. Identification of Kidney Toxicity Biomarkers
A. Generation of Toxicology Gene Expression Biomarkers: The kidney toxicity biomarkers described herein were initially identified utilizing a database generated from large numbers of in vivo experiments, wherein the differential expression of approximately 700 rat genes, measured at various time points, in response to multiple toxic compounds inducing various specific toxic responses, as visualized through microscopic histopathological analysis, was quantified, as described in pending United States Patent Application filed January 29, 2002 (serial number not yet assigned). This quantitative gene expression data, as well as corresponding histopathological information, was then subjected to an analytical approach specifically designed to identify genes which not only correlated with the observed histopathology, but also demonstrated an ability to be used in a model capable of accurately predicting the occurrence of the toxic response associated with the observed histopathology. A complete description of this identification process is presented in the Examples. A flow diagram illustrating how the kidney toxicity biomarkers of the invention were identified is presented in Figure 1.
[89] In addition to the database described and utilized herein, other toxicology gene expression databases may be generated using techniques well known in the art, and used to identify additional kidney toxicity biomarkers, which may also be employed in the practice of the kidney toxicity prediction methods of the invention. Such databases may be generated with test compounds capable of inducing various pathologies indicative of a toxic response in the kidney and/or other organs or systems, over different time periods and under different administration and/or dosing conditions, including without limitation kidney tubule necrosis, glomerular necrosis, glomerular sclerosis and papillary injury. An example of compounds, dose levels, kidney toxicity classifications and histopathology scores used in the Examples which follow is provided in Table 1.
[90] Such databases may be generated using organisms other than the rat, including without limitation, animals of canine, murine, or non-human primate species. In addition, such databases may incorporate data derived from human clinical trials and post-approval human clinical experiences. Various methods for detecting and quantitating the expression of genes and/or proteins in response to toxic stimuli may be employed in the generation of such databases, as are generally known in the art. For example, microarrays comprising multiple cDNAs or oligonucleotide probes capable of hybridizing to corresponding transcripts of genes of interest may be used to generate gene expression profiles. Additionally, a number of other methods for detecting and quantitating the expression of gene transcripts are known in the art and may be employed, including without limitation, RT-PCR techniques such as TaqMan®, RNAse protection, branched chain, etc.
[91] Databases comprising quantitative gene expression information preferably include qualitative and quantitative and/or semi-quantitative information respecting the observed toxicological responses and other conventional toxicology endpoints, such as for example, body and organ weights, serum chemistry and histopathology observations, histopathology scores and/or similar parameters.
[92] B. Identification of Correlating Genes: For the purpose of identifying candidate predictive genes, the database preferably includes histopathology scores for each animal that has been exposed to one or more agent(s). These scores can be assigned based on actual histopathology observations for the tissue and animal, or on the basis of effects observed for other animals treated with the same agent and dose level. The scores are numerical scores that reflect the occurrence and severity of histopathological changes, and they can be adjusted to have a range similar to that of gene expression changes. For example, a score of 1 could be assigned to samples with no changes and scores of 2 to 8 assigned to increasingly severe changes. Because the scores are numerical, they are suitable for use with a variety of statistical correlation and similarity measures.
[93] An example of a histopathology scoring system is provided in Example 1. Referring to Figure 1, histopathology scores may be utilized to identify genes that correlate with the observed toxicological response, using any number of statistical correlation and similarity analysis techniques, including without limitation those techniques described or employed in Example 1 (e.g., Pearson, Spearman, change, smooth, distance, etc.). Such correlating genes may be used as predictive gene candidates. Examples of genes whose expression 24 hours after treatment correlates with histopathology observed at 72 hours are detailed in Tables 2 and 3. In one embodiment, the correlating gene lists as well as the entire array gene list are used as input gene lists in the GeneSpring™ Predictive Model (hereafter the "Predictive Model").
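The correlation step can be sketched as ranking genes by the correlation of their expression with the per-sample histopathology scores, keeping those above a threshold. The r > 0.4 cutoff is the value quoted for Figure 2; splitting the result into directly and inversely correlating lists mirrors Tables 2 and 3. The function names, the tie-averaged ranking used for Spearman correlation, and the toy data are assumptions for illustration, not the GeneSpring implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlating_genes(expression, scores, threshold=0.4, method=pearson):
    """Split genes into directly and inversely correlating lists.

    expression maps gene -> per-sample expression values; scores holds the
    histopathology score of each sample in the same order.
    """
    r = {gene: method(values, scores) for gene, values in expression.items()}
    direct = sorted((g for g in r if r[g] > threshold), key=lambda g: -r[g])
    inverse = sorted((g for g in r if r[g] < -threshold), key=lambda g: r[g])
    return direct, inverse


# Toy data: four samples with made-up expression values and scores.
scores = [1, 1, 3, 5]
expression = {
    "geneA": [1.0, 1.1, 2.0, 3.0],   # rises with lesion severity
    "geneB": [3.0, 2.9, 1.5, 0.5],   # falls with lesion severity
    "geneC": [1.0, 1.0, 1.0, 1.0],   # unchanged
}
print(correlating_genes(expression, scores))
```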
[94] (C) Class Prediction and Classification: Statistical analysis of the database of gene expression profiles can be performed using commercially available software programs. In one embodiment, GeneSpring™ (Version 4.1, Silicon Genetics, Redwood City, CA) is used. Other software programs that can be used for statistical analysis include, without limitation, SAS software packages (SAS Institute Inc., Cary, NC) and S-PLUS® software (Insightful Corporation, Seattle, WA).
[95] Using GeneSpring™ software, class predictions can be made from the genes in the database, as detailed in Example 1, using one or more training and test sets. In one embodiment, six training sets and six test sets are obtained, as shown in Example 1 (Table 4). Kidney toxicological classifications are entered for the samples in each training and test set. Toxicological classifications can be defined by various pathologies. In one embodiment, the toxicity is defined as kidney tubular necrosis observed 72 hours after treatment with an agent. However, toxicity can manifest as other nephropathologies, such as glomerular necrosis or papillary injury.
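One way to build several training and test sets is sketched below. It assumes, as the compound-level distribution in Table 4 suggests but does not spell out, that whole compounds rather than individual samples are assigned to the test side, so that no compound contributes samples to both sides of the same split. The one-third test fraction, the random seed and the function name are hypothetical choices.

```python
import random


def compound_splits(sample_compounds, n_sets=6, test_fraction=1 / 3, seed=0):
    """Build n_sets (train_indices, test_indices) pairs by compound.

    sample_compounds lists the compound given to each sample. In every split,
    all samples from a given compound fall on the same side (an assumption).
    """
    rng = random.Random(seed)
    compounds = sorted(set(sample_compounds))
    n_test = max(1, round(len(compounds) * test_fraction))
    splits = []
    for _ in range(n_sets):
        test_compounds = set(rng.sample(compounds, n_test))
        train = [i for i, c in enumerate(sample_compounds) if c not in test_compounds]
        test = [i for i, c in enumerate(sample_compounds) if c in test_compounds]
        splits.append((train, test))
    return splits


samples = ["A", "A", "B", "B", "C", "C"]
splits = compound_splits(samples)
```

Splitting by compound tests whether the genes generalize to compounds the model has not seen, which is a stricter check than splitting individual animals at random.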
[96] Once the training sets have been selected, predicted classifications of the test set samples are obtained using a k-nearest neighbor (kNN) voting procedure. For each test sample, the class of each of its k nearest neighbors in the training set is determined, and the test sample is assigned to the class with the largest representation among those neighbors. In one embodiment, this representation is adjusted to account for the different proportions of classes in the training set.
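The voting procedure can be sketched as follows. The default of 10 neighbors is the value quoted for Figure 2; dividing each class's vote count by that class's share of the training set is one plausible form of the proportion adjustment and is an assumption, as is the use of Euclidean distance. GeneSpring's p-value ratio cutoff is not modeled here.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=10):
    """Classify expression profile x by proportion-adjusted k-nearest-neighbor voting.

    train_X holds training expression profiles (one list of gene values per
    sample) and train_y their classes, e.g. "yes"/"no" for tubular necrosis.
    """
    neighbors = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in neighbors)
    class_freq = Counter(train_y)
    n = len(train_y)
    # Divide raw votes by each class's share of the training set so that a
    # class does not win simply because it is more common in the training data.
    adjusted = {c: votes[c] / (class_freq[c] / n) for c in votes}
    return max(adjusted, key=adjusted.get)


# Toy one-gene training set: six "no" samples and two "yes" samples.
train_X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [2.5], [2.6]]
train_y = ["no"] * 6 + ["yes"] * 2
print(knn_predict(train_X, train_y, [2.55], k=4))  # yes
```

In the example the four nearest neighbors split 2 to 2, but the two "yes" votes carry more weight because "yes" samples make up only a quarter of the training set.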
[97] Toxicity can also be observed at various time points after exposure to an agent and is not limited to 72 hours after treatment. A skilled toxicologist can determine the optimal time after exposure to an agent at which to observe pathology, either from what has been disclosed in the art or by stepwise experimentation with time increments, for example 2, 4, 6, 12, 18, 24, 36 or 48 hours post-exposure, or longer increments such as days, weeks or months after exposure to the agent.
[98] (D) Identification of Predictive Genes: Figure 1 describes the overall process used to identify kidney toxicity predictive genes. In one embodiment, this process was run independently for each time point.
[99] The number of genes that are to be used in the Predictive Model can be varied, for example 50, 40, 30, 20, 10, 5, 2, or 1 gene(s) can be used. In a preferred embodiment, at least 50 genes are used.
[100] An optimal gene list is the list that gives the best predictive accuracy with the lowest number of genes. Figure 2 shows an exemplary profile used to identify an optimal gene list.
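Selecting an optimal gene list, as in Figure 2, amounts to evaluating the top-ranked n genes of an input list for several values of n and keeping the smallest n that reaches the best accuracy. In this sketch, `evaluate` is a hypothetical callable standing in for a run of the Predictive Model on one training and test set; the toy evaluator and gene names are made up.

```python
def optimal_gene_list(ranked_genes, evaluate, sizes):
    """Return the shortest prefix of ranked_genes with the best accuracy.

    evaluate(genes) -> percent overall correct calls for that gene list
    (a hypothetical stand-in for one Predictive Model run).
    """
    scores = {n: evaluate(ranked_genes[:n]) for n in sizes}
    best = max(scores.values())
    n_opt = min(n for n, score in scores.items() if score == best)
    return ranked_genes[:n_opt], scores


# Toy evaluator: accuracy rises with gene count up to 3 genes, then plateaus.
genes = ["g1", "g2", "g3", "g4", "g5"]
best_list, curve = optimal_gene_list(genes, lambda gs: min(len(gs), 3) * 10.0, range(1, 6))
print(best_list)  # ['g1', 'g2', 'g3']
```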
[101] In one embodiment, the optimum gene lists for all input gene lists are combined for each training and test set, and these combined lists for all six training and test sets are then merged to create an aggregate list of predictive genes. The aggregate list can then be subdivided into smaller lists of genes based on the number of times that the genes occurred on the predictive gene lists for each individual training and test set. These are designated herein as Combo 6, 5, 4, 3, 2, or 1 lists. The genes that were predictive in all 6 training and test sets are designated as Combo 6, the genes that were predictive in 5 of 6 training and test sets are designated as Combo 5, and so forth. Table 32 presents gene names, accession numbers and sequence information for the kidney toxicity predictive genes found by analysis of the database in the manner described above. Each of these genes has been demonstrated to contribute to predictive performance for at least one input gene list, one training/test set and one time point. Table 33 lists the kidney toxicity predictive genes organized by time point and Combo class. Table 34 lists homologous genes for the RCT sequences that were identified by BLAST search using the GenBank NR database as the target database.

[102] The predictive genes can also be categorized by their occurrence as predictive at different time points. Table 35 lists 53 genes that are on the combined predictive lists of all three time points tested. This list is derived from the list of all the predictive genes measured at 6, 24 and 72 hours that predicted kidney tubular necrosis at 72 hours. Genes that are predictive at multiple time points can be further grouped by their Combo ranking. Table 36 lists 23 genes that are the most predictive across the three time points tested. This list is a subset of the list of 53 genes that are predictive across all three time points (6, 24 and 72 hours).
The criterion for inclusion in this table was that the gene be a member of the highest Combo classes, viz., Combo 6, 5 or 4, in at least two of the three time points. The gene expression data of the genes in Table 36 would be expected to be very highly predictive of kidney tubular necrosis. Further, since the predictive strength of these genes is very high across the three time points tested, gene expression data derived from these genes would also be expected to be very highly predictive of tubular necrosis at time points not tested, such as any time point falling between 6 and 72 hours or any other time point. These specific genes could be useful in cases where the dosing route or the pharmacokinetic properties of a compound may alter the kinetics of predictive gene expression changes.
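The Combo classification and the Table 36 inclusion rule can be sketched as follows. Gene identifiers and the input dictionaries are hypothetical; the code simply counts, for each gene, how many of the six training/test set optimal lists contain it, and then keeps genes reaching Combo 4 or higher in at least two of the three time points.

```python
from collections import Counter


def combo_classes(optimal_lists):
    """Map each gene to its Combo number: the count of training/test set
    optimal lists on which it appears (Combo 6 = all six sets)."""
    counts = Counter()
    for genes in optimal_lists:
        counts.update(set(genes))  # count a gene at most once per set
    return dict(counts)


def cross_timepoint_genes(combo_by_timepoint, min_combo=4, min_timepoints=2):
    """Genes with Combo >= min_combo at >= min_timepoints time points."""
    hits = Counter()
    for combos in combo_by_timepoint.values():
        hits.update(g for g, c in combos.items() if c >= min_combo)
    return {g for g, n in hits.items() if n >= min_timepoints}
```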
[103] IV. Evaluation of Predictive Genes for Kidney Toxicity: The predictive genes are evaluated for predictive performance as shown in Figure 3. For each gene list prediction, a table of data was generated using the Predictive Model which included: the test set containing information about the actual call (i.e., "yes" or "no" for kidney toxicity), the predicted call (i.e., "yes" or "no" for kidney toxicity), and the P-value cutoff ratio. Expression data that can be used with the K-nearest neighbor model and predictive genes to enable one skilled in the art to make predictions are given in Tables 38-40.
[104] The combined list of predictive genes or, alternatively, the Combo 6, 5, 4, 3, 2, or 1 list or subsets thereof was used as input into the Predictive Model. As another verification of the predictive abilities of the genes found to be predictive for kidney toxicity, random lists of genes were generated and also used as input into the Predictive Model. Example 2 describes the evaluation of the predictive performance of the kidney toxicity predictive genes.
Predictive performance may also be assessed using data from different time points after exposure to the agent. In one embodiment, 24 hour expression data is used. In another embodiment, 6 hour expression data is used, as described in Examples 3 and 4. In another embodiment, 72 hour expression data is used, as described in Examples 5 and 6. As shown in Table 41, predictions from 24 hour expression data have a high accuracy rate (i.e., 90% accuracy) when the entire predictive gene list is used.
Table 41: Predictive Performance of Predictive Genes Organized by Occurrence on Training/Test Set Lists (Combo number) and Time Point
[Table 41 is reproduced as an image in the original publication.]
** Means and ranges are given for the 6 training and test sets. The unit of prediction was the animal, and the predictive classification was for kidney tubular necrosis observed 72 hours after treatment. Standard prediction measures were used as defined in Materials and Methods of Example 1. These include:
Accuracy = proportion of the total number of predictions that are correct
Geometric mean = performance measure that takes into account the proportions of positive and negative cases
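The two measures can be computed from paired actual and predicted calls as below. The source defines the geometric mean only informally; this sketch assumes the common definition, the square root of sensitivity times specificity, which may differ from the exact formula in Example 1.

```python
import math


def prediction_measures(actual, predicted, positive="yes"):
    """Return (accuracy, geometric mean) for paired yes/no toxicity calls."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    n_pos = sum(a == positive for a, _ in pairs)
    n_neg = len(pairs) - n_pos
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / n_pos if n_pos else 0.0
    specificity = tn / n_neg if n_neg else 0.0
    # Assumed definition: sqrt(sensitivity * specificity).
    return accuracy, math.sqrt(sensitivity * specificity)
```

The geometric mean matters when classes are unbalanced: if 9 of 10 animals are non-toxic and every call is "no", accuracy is 0.9 but the geometric mean is 0.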
[106] Somewhat lower predictive accuracies were observed for the 6 h and 72 h data, but the predictions were still significant. In general, selecting genes from the Combo 6 list for use in prediction of kidney toxicity yields higher average accuracy than using genes from the Combo 5 list, which in turn yields higher average accuracy than Combo 4, and so forth for Combo lists 3, 2, and 1. All of the Combo lists, as well as the Combo All list, had significantly higher accuracy than random classifications.
[107] Predictive performance may also be assessed using subsets of genes from the different Combo lists. As indicated in Examples 2, 4 and 6, randomly selected subsets of the Combo gene lists had very good predictive performance (accuracy better than 80% and approaching 90%), and even individual genes had significant mean predictive accuracies (for example, greater than 80%). Cumulative performance of subsets of 24 h data is presented in Figures 4-6. In one embodiment, using 3 genes from the Combo 6 list yields about 90% accuracy. However, using other Combo lists may require more genes to reach the same accuracy level, e.g., 8 genes from the Combo 5 list or 13 genes from the Combo 4 list.
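The subset analysis can be outlined as repeated random sampling from a Combo list. In this sketch, `score_fn` is a hypothetical callable that runs the Predictive Model on a gene subset and returns its accuracy; the number of draws and the fixed seed are assumptions for reproducibility.

```python
import random
import statistics


def subset_performance(combo_genes, size, score_fn, n_draws=100, seed=0):
    """Mean, minimum and maximum accuracy over random gene subsets.

    score_fn: hypothetical callable mapping a list of genes to an accuracy.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    scores = [score_fn(rng.sample(combo_genes, size)) for _ in range(n_draws)]
    return statistics.mean(scores), min(scores), max(scores)
```

Calling this for sizes 1, 2, 3, ... on the same Combo list gives a cumulative performance curve of the kind described for Figures 4-6.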
[108] V. Use of kidney toxicity predictive genes: The kidney toxicity predictive genes disclosed herein and kidney toxicity predictive genes identified by using methods disclosed herein are useful for predicting kidney toxicity in response to exposure to one or more agents.
[109] The discovery that relatively small sets of different genes have predictive value permits flexible application of these discoveries. The choice of how many and which genes to use can be tailored to a variety of different purposes. Very good predictivity is observed for sets of a few genes (for example as few as three genes of the 24 hour Combo 6 set have mean prediction accuracy of about 90%). These small sets may be particularly advantageous in applications where measurement of only a few RNA species has considerable advantages in terms of sample processing logistics, speed and cost. These applications would include relatively high throughput screens for predictive capability. An example of this would be an early screen using small samples of primary cells or cultured cell lines that can be processed with automated robotic equipment for treatment and isolation of RNA followed by efficient technologies for measuring expression of a few RNA species such as branched chain technology or RT-PCR. The use of larger numbers of predictive genes provides for redundancy and consequent greater accuracy and precision. Applications using larger numbers of predictive genes might be tests of candidates at later stages of commercial development. An example would be later stages of preclinical development of a therapeutic candidate where in vivo samples can be obtained and more comprehensive methods such as microarray measurement of gene expression are appropriate. The larger gene sets can also include different subsets of genes which may offer more insight into potential mechanisms of toxicity and the ability to have refined predictions of long term toxic consequences such as chronic, irreversible toxicity or carcinogenicity.
[110] Some members of the kidney toxicity predictive genes may also be suitable for prediction of toxicity in other organs or may be preferable for predicting toxicity for wider ranges of timepoints or treatment routes or regimens. As an example of the latter, some of the predictive genes are observed at three different timepoints after treatment. These genes may be useful for prediction in cases where the samples come from treatment protocols that have different measurement timepoints or routes of administration than those employed for the database or where the toxicokinetics for a particular agent are known or suspected to be different from those in the database.
[111] In one embodiment, the agent is an agent for which no expression profile has been assessed or stored in the database or library. An animal, e.g., rat, is dosed with such an agent and the gene expression profile(s) is the test set for the Predictive Model. The training set which is used in the Predictive Model in this case can be the entire database of sample array data because the test set data is not present in the database. As described in Example 8, the prediction can be made with accuracy without requiring the use of histopathology scores for the test set as part of the input into the Predictive Model.
[112] In another embodiment the agent is an agent present in the database but is used at a different dose level or with a different treatment protocol than used in the database. The training set which is used in the Predictive Model in this case can be the entire database of sample array data because the test set data is not present in the database. As described in Example 8, the prediction can be made with accuracy without requiring the use of histopathology scores for the test set as part of the input into the Predictive Model.
[113] In another embodiment, the exposure time of the agent is not 6, 24, or 72 hours, or repeat-dosing protocols are used. In this case, the skilled artisan can use the toxicity predictive genes from the surrounding time points to extrapolate the predicted toxicity without undue experimentation. For example, if the individual has been exposed to the agent for 12 hours, the predictive genes from the 6 and 24 hour time points are used as guidelines for extrapolating the possible predicted toxicity.
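Choosing which tested time points bracket an untested exposure time is a simple lookup; the extrapolation itself remains a matter of the skilled artisan's judgment. The sketch below assumes the three tested time points (6, 24 and 72 hours) and returns only the nearest tested time point when the exposure time lies outside that range.

```python
import bisect


def bracketing_timepoints(hours, tested=(6, 24, 72)):
    """Tested time points (in hours) surrounding an untested exposure time."""
    points = sorted(tested)
    if hours in points:
        return (hours,)
    i = bisect.bisect_left(points, hours)
    if i == 0:
        return (points[0],)  # earlier than any tested time point
    if i == len(points):
        return (points[-1],)  # later than any tested time point
    return (points[i - 1], points[i])


print(bracketing_timepoints(12))  # (6, 24)
```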
[114] In another embodiment, the kidney toxicity predictive genes and predictive model can be used to determine the presence or absence of a no-observed-effect level (NOEL) for toxicity. An agent can be used at different treatment levels and expression profiles obtained for each treatment level. The predictive genes and predictive model can be used to determine which dose levels elicit a response that is predicted to be toxic and which dose levels are not toxic. In contrast to conventional endpoints for determining no-effect levels, the use of expression data, predictive genes and predictive models applies a number of quantitative endpoints and criteria instead of subjective endpoints and criteria. This permits more rigorous and precisely defined determination of no-effect levels.
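Given a toxic/non-toxic call from the predictive model at each dose level, a predicted no-effect level can be read off as the highest dose that is predicted non-toxic and lies below every dose predicted toxic. This rule and the input format are assumptions for illustration; the source does not specify how the per-dose calls are combined.

```python
def predicted_noel(calls):
    """Highest non-toxic dose below the lowest toxic dose, or None.

    calls: hypothetical mapping {dose: True if predicted toxic, else False}.
    """
    toxic_doses = [d for d, toxic in calls.items() if toxic]
    lowest_toxic = min(toxic_doses) if toxic_doses else float("inf")
    safe = [d for d, toxic in calls.items() if not toxic and d < lowest_toxic]
    return max(safe) if safe else None


print(predicted_noel({10: False, 30: False, 100: True, 300: True}))  # 30
```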
[115] In another embodiment, the kidney toxicity predictive genes can be used to detect toxic effects that may be manifested as long lasting or chronic consequences such as irreversible toxicity or carcinogenesis. The predictive genes and model can be applied to databases where classifications of training and test set samples are made with respect to actual or putative endpoints such as irreversible toxicity or carcinogenicity.
[116] In another embodiment, the predictive genes can be used in a variety of alternative models to predict kidney toxicity. Some of these models do not require the direct use of data in a database but use functions or coefficients derived from the database. In another embodiment, the predictive genes and models may be used to evaluate in vitro systems for their ability to reflect in vivo toxic events, and such in vitro systems may then be used for predicting in vivo toxicity. Expression profiles for predictive genes can be created from candidate in vitro assays using treatments with agents of known in vivo toxicity and for which in vivo data on gene expression are available. The expression data and predictive models of this invention can be used to determine whether the in vitro assay system has predictive gene expression responses that accurately reflect the in vivo situation. Large sets of predictive genes as described in this invention can be tested in such models for their suitability and performance with the candidate in vitro systems. This is a superior and novel tool for evaluating and optimizing in vitro systems for their ability to reflect and accurately predict in vivo responses.
[117] In another embodiment, measurement of the expression levels of the proteins coded for by the predictive genes can be used in conjunction with predictive models to predict kidney toxicity. Among the full set of kidney toxicity predictive genes are various genes known to encode cell surface, secreted and/or shed proteins. This enables the development of methods for predicting toxicity using protein biomarkers. Example 11 presents a process by which candidate protein biomarker genes may be selected from biomarker genes identified from transcriptional expression data. For example, as disclosed in Table 37, there are 23 genes in the master predictive set which are known to encode secreted proteins. As disclosed in Table 43, predictive protein marker candidates may also be selected by categorizing a number of other parameters related to the predictive performance and potential use as protein markers. In Example 11, the utility of this concept is demonstrated by testing for serum protein levels of one of the identified biomarkers, insulin-like growth factor binding protein 1. The serum protein levels of this biomarker parallel the kidney transcription levels and distinguish kidney toxic from non-toxic treatments. Thus, in another aspect of the present invention, kidney toxicity predictive assays which detect the expression of one or more of said predictive proteins may be developed. Such assays may have several advantages, such as:
(1) Ability to use archived tissue specimens, such as preserved or embedded tissues, that are not suitable for measurement of RNA expression.
(2) Ability to examine predictive protein expression in tissue slides using in situ labeling and microscopic observation. This is useful for detecting toxicity predictive signals occurring in very small subpopulations of cells.
(3) Ability to detect protein markers in specimens that can be readily obtained with little or no invasiveness (e.g., blood, urine, sweat, saliva).
(4) Reduced animal use in laboratory studies, since animals need not be sacrificed to obtain tissue specimens when toxicity predictions can be made with specimens obtained without animal sacrifice or surgery.
(5) Application for human use where tissue specimens cannot be obtained or are only obtained with great difficulty.
[118] In another embodiment, the identified predictive genes can be considered as potential therapeutic targets when the genes are involved in toxic damage or repair responses whose expression or functional modification may attenuate, ameliorate or eliminate disease conditions or adverse symptoms of disease conditions.
[119] In another embodiment, the predictive genes can be organized into clusters of genes that exhibit similar patterns of expression by a variety of statistical procedures commonly used to identify such coordinated expression patterns. Common functional properties of these clustered genes can be used to provide insight into the functional relationship of the response of these genes to toxic effects. Common genetic properties of these genes (e.g., common regulatory sequences) may provide insight into functional aspects by revealing known or novel similarities in the coding region of the genes. The presence of common known or novel signal transduction systems that regulate expression of the genes can also lead to insight as to the functional properties of the genes. The presence of common known or novel regulatory sequences in the identified predictive genes can also be used to identify toxicity predictive genes that are not present in the current Rat CT array. This can be accomplished by someone skilled in the art who can analyze sequence databases for common regulatory sequences.
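One of the many possible clustering procedures is sketched below: genes are linked when the Pearson correlation of their expression profiles meets a threshold, and linked genes are merged into clusters (single linkage). The threshold, the correlation measure and the profile format are assumptions, not the procedure used in the source.

```python
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb) if sa and sb else 0.0  # flat profile: no correlation


def correlation_clusters(profiles, threshold=0.9):
    """Single-linkage clusters of genes whose profiles correlate >= threshold."""
    genes = list(profiles)
    parent = {g: g for g in genes}  # union-find forest

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= threshold:
                parent[find(g)] = find(h)

    clusters = {}
    for g in genes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())
```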
[120] In yet another embodiment, the kidney toxicity predictive genes can be used to predict toxicity responses in other species, for example, human, non-human primate, mouse, hamster, guinea pig, rabbit, cattle, sheep, pig, chicken, and dog. Some members of the kidney toxicity predictive genes may also be more suitable for prediction of toxicity in species other than the species used to derive the database (rat in the case of the examples provided). One method for identifying such genes that would be available to someone skilled in the art is to examine DNA sequence databases to determine whether orthologous sequences to the predictive genes exist in the target species and how close the orthologous sequences are to the predictive gene sequences. One of skill in the art can examine the orthologous sequences for similarity in amino acid coding regions and motifs as well as for similarities in regulatory regions and motifs of the gene.
[121] In another embodiment, kidney toxicity predictive genes or gene sequences are used for screening other potential toxicity predictive genes or gene sequences in other species, or even within the same species, using methods known in the art. See, for example, Sambrook, supra. Gene sequences which hybridize under stringent conditions to the kidney toxicity predictive gene sequences disclosed herein are selected as potential toxicity predictive genes. Gene sequences which hybridize to the kidney toxicity predictive genes of this invention can show homology to the kidney toxicity predictive genes, preferably being at least about 50%, 60%, 70%, 80%, or 90% identical to the kidney toxicity predictive genes disclosed herein. It is understood that conservative substitutions of amino acids are possible for gene sequences which have some percentage homology with the kidney toxicity predictive gene sequences of this invention. A conservative substitution in a protein is a substitution of one amino acid with an amino acid of similar size and charge. Groups of amino acids normally known to be equivalent are: (a) Ala, Ser, Thr, Pro, and Gly; (b) Asn, Asp, Glu, and Gln; (c) His, Arg, and Lys; (d) Met, Leu, Ile, and Val; and (e) Phe, Tyr, and Trp.
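Two of the quantities mentioned above, percent identity between aligned sequences and whether an amino acid change is conservative, can be computed as in the sketch below. It assumes the sequences have already been aligned (for example by BLAST) to equal length with '-' marking gaps, counts identity over all alignment columns (one of several conventions), and encodes the five equivalence groups with group (d) read as Met, Leu, Ile and Val.

```python
CONSERVATIVE_GROUPS = [
    {"Ala", "Ser", "Thr", "Pro", "Gly"},
    {"Asn", "Asp", "Glu", "Gln"},
    {"His", "Arg", "Lys"},
    {"Met", "Leu", "Ile", "Val"},
    {"Phe", "Tyr", "Trp"},
]


def is_conservative(a, b):
    """True if residues a and b differ but fall in the same group."""
    return a != b and any(a in grp and b in grp for grp in CONSERVATIVE_GROUPS)


def percent_identity(seq1, seq2):
    """Percent identical, non-gap columns in two pre-aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y and x != "-" for x, y in zip(seq1, seq2))
    return 100.0 * matches / len(seq1)
```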
[122] It is also understood that the toxicity predictive genes can be used as guides to predicting toxicity for agents that have been administered via different routes (e.g., intravenous, oral, dermal, inhalation, etc.) from the routes that were used to generate the database or to identify the toxicity predictive genes. Furthermore, the invention is not intended to be limited to agents administered at the same dosages as the agents that were used to generate the database or to identify the toxicity predictive genes.
[123] Data described in the Examples were generated using the microarray technology disclosed in the Examples. However, the invention is not dependent on using this particular platform. Other similar gene expression analysis technologies may be incorporated in the practice of this invention. These can include, but are not limited to, other arrays containing the predictive genes, RT-PCR (e.g., TaqMan®), branched chain technology, RNase protection or any other method which quantitatively detects the expression of RNA polynucleotides. The invention can be practiced using these other technologies by generating a database of expression measurements for the predictive genes using samples such as those used in the database described in Example 1. This database can then be used in a model such as the K-nearest neighbor model or can be used to develop any of a number of other models.
[124] The following Examples are provided to illustrate but not to limit the invention in any manner.
EXAMPLES
[125] Example 1: Discovery of Kidney Toxicity Predictive Genes from 24 Hour Expression Data
Materials and Methods:
(A) Database of Compounds and Kidney Toxicity: The compounds and treatments used to construct the kidney database are listed in Table 1. This table also provides the evaluation of the kidney toxicity observed as kidney tubular necrosis in samples collected 72 hours after treatment.
[126] (B) Database of Animal Experiments: Sprague Dawley rats (Crl:CD) from Charles River, Raleigh, NC were divided into treated rats that received a specific concentration of the compound (see Table 1) and control rats that received only the vehicle in which the compound was mixed (e.g., saline).
[127] At specified timepoints (6h, 24h and 72h) after administration (intraperitoneal route) of the compound, a set number of rats (usually 3 control and 3 treated) were euthanized and tissues collected. Each rat was heavily sedated with an overdose of CO2 by inhalation and the maximum amount of blood was drawn. Exsanguination of the rat by this drawing of blood kills the rat. The method of collecting the tissues is very important for preserving the quality of the mRNA in the tissues. The body of the rat was then opened and prosectors rapidly removed the tissues (including kidney) and immediately placed them into liquid nitrogen. All of the organs/tissues were completely frozen within 3 minutes of the death of the animal to ensure that mRNA did not degrade. The organs/tissues were then packaged into well-labeled, freezer-quality plastic bags and stored at -80 °C until needed for isolation of the mRNA from a portion of the organ/tissue sample.
[128] (C) Isolating DNA/RNA from animal tissues or cells: Total RNA was isolated from kidney tissue samples using the following materials: Qiagen RNeasy midi kits, 2-mercaptoethanol, liquid N2, tissue homogenizer, dry ice. Samples were kept on ice when specified.

[129] If a tissue needed to be broken up, the tissue sample was placed on a double layer of aluminum foil, which was then placed within a weigh boat containing a small amount of liquid nitrogen. The aluminum foil was folded around the tissue, which was then struck with a small foil-wrapped hammer to break it up mechanically.
[130] About 0.15-0.20 g of kidney tissue was weighed out and placed in a sterile container. To preserve the integrity of the RNA, all tissues were kept on dry ice while other samples were being weighed. RLT buffer (Qiagen®) was added to the sample to aid in the homogenization process. The tissue was homogenized using a commercially available homogenizer (IKA Ultra Turrax T25) with the 7 mm microfine sawtooth shaft and generator (195 mm long with a processing range of 0.25 ml to 20 ml, item # 372718). After homogenization, samples were stored on ice until all samples were homogenized. The homogenized tissue sample was spun to remove nuclei, thus reducing DNA contamination. The supernatant of the lysate was then transferred to a clean container containing an equal volume of 70% EtOH in DEPC-treated H2O and mixed. The RNA was isolated by passing the supernatant through an RNeasy spin column; the column was washed and the RNA subsequently eluted. Small quantities of remaining DNA were removed using DNase enzyme during the RNA isolation procedure, following the instructions provided by Qiagen, or alternatively by lithium chloride (LiCl) precipitation following the RNA isolation. The isolated RNA pellet was stored in RNase-free water or in an RNA storage buffer (10 mM sodium citrate; Ambion Cat #7000). The RNA amount was then quantitated using a spectrophotometer.
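The spectrophotometric quantitation can be made explicit using the conventional factor for RNA, under which an A260 of 1.0 in a 1 cm path corresponds to about 40 μg/ml. The source does not state the factor or dilution used, so both are assumptions in this sketch.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # conventional factor for single-stranded RNA, 1 cm path


def rna_concentration_ug_per_ul(a260, dilution_factor=1.0):
    """RNA concentration (μg/μl) from an A260 reading of a diluted aliquot."""
    ug_per_ml = a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor
    return ug_per_ml / 1000.0  # μg/ml -> μg/μl


print(rna_concentration_ug_per_ul(0.25, dilution_factor=100))  # 1.0
```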
[131] (D) Rat 700 CT chip: Gene expression data was generated from a microarray chip that has a set of toxicologically relevant rat genes which are used to predict toxicological responses. The rat 700 CT gene array is disclosed in U.S. applications 60/264,933; 60/308,161; and pending application filed on January 29, 2002 that claims priority to 60/264,933 and 60/308,161 [Attorney docket 40074-2000600].
[132] (E) Microarray RT reaction: Fluorescence-labeled first strand cDNA probe was made from the total RNA or mRNA isolated from kidneys of control and treated rats. This probe was hybridized to microarray slides spotted with DNA specific for toxicologically relevant genes. The materials needed are: total or messenger RNA, primer, Superscript II buffer, dithiothreitol (DTT), nucleotide mix, Cy3 or Cy5 dye, Superscript II (RT), ammonium acetate, 70% EtOH, PCR machine, and ice.
[133] The volume of each sample that would contain 20 μg of total RNA (or 2 μg of mRNA) was calculated. The amount of DEPC water needed to bring the total volume of each RNA sample to 14 μl was also calculated. If the RNA was too dilute, the samples were concentrated to a volume of less than 14 μl in a speedvac without heat. The speedvac must be capable of generating a vacuum of 0 Milli-Torr so that samples can freeze-dry under these conditions. Sufficient DEPC water was added to bring the total volume of each RNA sample to 14 μl. Each PCR tube was labeled with the name of the sample or control reaction. The appropriate volume of DEPC water and 8 μl of anchored oligo dT mix (stored at -20°C) were added to each tube.
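The two volume calculations in this step can be sketched as follows, using the stated targets of 20 μg of total RNA (or 2 μg of mRNA) in 14 μl. The function name and the error raised for samples that need concentrating are illustrative assumptions.

```python
def rt_setup_volumes(conc_ug_per_ul, target_ug=20.0, final_ul=14.0):
    """Return (RNA volume, DEPC water volume) in μl for one RT reaction."""
    rna_ul = target_ug / conc_ug_per_ul
    if rna_ul > final_ul:
        # Too dilute: concentrate in a speedvac without heat before setup.
        raise ValueError(f"sample needs {rna_ul:.1f} μl; concentrate it first")
    return rna_ul, final_ul - rna_ul


print(rt_setup_volumes(2.0))                 # (10.0, 4.0)
print(rt_setup_volumes(0.5, target_ug=2.0))  # (4.0, 10.0)
```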
[134] Then the appropriate volume of each RNA sample was added to the labeled PCR tube. The samples were mixed by pipetting. The tubes were kept on ice until all samples were ready for the next step. It is preferable for the tubes to be kept on ice until the next step is ready to proceed. The samples were incubated in a PCR machine for 10 minutes at 70°C, followed by a 4°C hold until the sample tubes were ready to be retrieved. The sample tubes were left at 4°C for at least 2 minutes.
[135] The Cy dyes are light sensitive, so any solutions or samples containing Cy dyes should be kept out of light as much as possible (e.g., covered with foil) after this point in the process. Sufficient amounts of Cy3 and Cy5 reverse transcription mix were prepared for one to two more reactions than would actually be run by scaling up the following:
[136] For labeling with Cy3
8 μl 5x First Strand Buffer for Superscript II
4 μl 0.1 M DTT
2 μl Nucleotide Mix
2 μl of 1:8 dilution of Cy3 (e.g., 0.125 mM Cy3-dCTP)
2 μl Superscript II
[137] For labeling with Cy5
8 μl 5x First Strand Buffer for Superscript II
4 μl 0.1 M DTT
2 μl Nucleotide Mix
2 μl of 1:10 dilution of Cy5 (e.g., 0.1 mM Cy5-dCTP)
2 μl Superscript II
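Scaling these recipes for a run is simple arithmetic. The sketch below encodes the per-reaction volumes listed above and multiplies them by the number of reactions plus one or two spares; the dictionary keys and function name are illustrative. Each recipe totals 18 μl, matching the volume added to each sample.

```python
CY3_MIX_UL = {
    "5x First Strand Buffer": 8,
    "0.1 M DTT": 4,
    "Nucleotide Mix": 2,
    "Cy3-dCTP, 1:8 dilution": 2,
    "Superscript II": 2,
}
CY5_MIX_UL = {
    "5x First Strand Buffer": 8,
    "0.1 M DTT": 4,
    "Nucleotide Mix": 2,
    "Cy5-dCTP, 1:10 dilution": 2,
    "Superscript II": 2,
}


def master_mix(recipe_ul, n_reactions, spares=1):
    """Volumes (μl) of each reagent for n_reactions plus spare reactions."""
    n = n_reactions + spares
    return {reagent: vol * n for reagent, vol in recipe_ul.items()}


print(master_mix(CY3_MIX_UL, 4)["Superscript II"])  # 10
```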
[138] About 18 μl of the pink Cy3 mix was added to each treated sample and 18 μl of the blue Cy5 mix was added to each control sample. Each sample was mixed by pipetting. The samples were placed in a DNA engine (PTC-200 Peltier Thermal Cycler, MJ Research) for 2 hours at 45°C, followed by a 4°C hold until the sample tubes were ready to be retrieved.
[139] In addition to the desired cDNA product, the completed RT reaction contained impurities that must be removed. These impurities included excess primers, nucleotides, and dyes. The primary method of removing the impurities was by following the instructions in the QIAquick PCR purification kit (Qiagen cat#120016).
[140] Alternatively, the completed RT reactions were cleaned of impurities by ethanol precipitation and resin bead binding. The samples from the DNA engine were transferred to Eppendorf tubes containing 600 μl of ethanol precipitation mixture and placed in a -80°C freezer for at least 20-30 minutes. These samples were centrifuged for 15 minutes at 20800 x g (14000 rpm in Eppendorf model 5417C), and the supernatant was carefully decanted. A visible pellet was seen (pink/red for Cy3, blue for Cy5). Ice-cold 70% EtOH (about 1 ml per tube) was added, and the tubes were inverted to wash the tube walls and pellet. The tubes were centrifuged for 10 minutes at 20800 x g (14000 rpm in Eppendorf model 5417C), then the supernatant was carefully decanted. The tubes were air dried for about 5 to 10 minutes, protected from light. When the pellets were dry, they were resuspended in 80 μl nanopure water. The cDNA/mRNA hybrid was denatured by heating for 5 minutes at 95°C in a heat block and flash spun. Then the lid of a "Millipore MAHV N45" 96 well plate was labeled with the appropriate sample numbers. A blue gasket and waste plate (v-bottom 96 well) were attached. About 160 μl of Wizard DNA Binding Resin (Promega cat#A1151) was added to each well of the filter plate that was used. Probes (80 μl cDNA samples) were added to the appropriate wells containing the Binding Resin. The reaction was mixed by pipetting up and down ~10 times. The plates were centrifuged at 2500 rpm for 5 minutes (Beckman GS-6 or equivalent) and then the filtrate was decanted. About 200 μl of 80% isopropanol was added, the plates were spun for 5 minutes at 2500 rpm, and the filtrate was discarded. Then the 80% isopropanol wash and spin step was repeated. The filter plate was placed on a clean collection plate (v-bottom 96 well) and 80 μl of nanopure water, pH 8.0-8.5 (adjusted with NaOH), was added. The filter plate was secured to the collection plate and, after 5 minutes, was centrifuged for 7 minutes at 2500 rpm.
[141] (F) Purification of Cy-Dye Labeled cDNA: To purify fluorescence-labeled first strand cDNA probes, the following materials were used: Millipore MAHV N45 96 well plate, v-bottom 96 well plate (Costar), Wizard DNA binding Resin, wide orifice pipette tips for 200 to 300 μl volumes, isopropanol, nanopure water. It is highly preferable to keep the plates aligned at all times during centrifugation. Misaligned plates lead to sample cross contamination and/or sample loss. It is also important that plate carriers are seated properly in the centrifuge rotor.
[142] The lid of a "Millipore MAHV N45" 96 well plate was labeled with the appropriate sample numbers. A blue gasket and waste plate (v-bottom 96 well) were attached. Wizard DNA Binding Resin (Promega cat#A1151) was shaken immediately prior to use for thorough resuspension. About 160 μl of Wizard DNA Binding Resin was added to each well of the filter plate that was used. If a multichannel pipette is used for this step, wide orifice pipette tips should be used to prevent clogging. It is highly preferable not to touch or puncture the membrane of the filter plate with a pipette tip. Probes (80 μl cDNA samples) were added to the appropriate wells containing the Binding Resin. The reaction was mixed by pipetting up and down ~10 times. It is preferable to use regular, unfiltered pipette tips for this step. The plates were centrifuged at 2500 rpm for 5 minutes (Beckman GS-6 or equivalent) and then the filtrate was decanted. About 200 μl of 80% isopropanol was added, the plates were spun for 5 minutes at 2500 rpm, and the filtrate was discarded. Then the 80% isopropanol wash and spin step was repeated. The filter plate was placed on a clean collection plate (v-bottom 96 well) and 80 μl of nanopure water, pH 8.0-8.5 (adjusted with NaOH), was added. The filter plate was secured to the collection plate with tape to ensure that the plate did not slide during the final spin. The plate sat for 5 minutes and was centrifuged for 7 minutes at 2500 rpm. Replicates of samples should be pooled.
[143] (G) Dry-down Process: Concentration of the cDNA probes is preferable so that they can be resuspended in hybridization buffer at the appropriate volume. The volume of the control cDNA (Cy-5) was measured and divided by the number of samples to determine the appropriate amount to add to each test cDNA (Cy-3). Eppendorf tubes were labeled for each test sample and the appropriate amount of control cDNA was allocated into each tube. The test samples (Cy-3) were added to the appropriate tubes. These tubes were placed in a speed-vac to dry down, with foil covering any windows on the speed vac. At this point, heat (45°C) may be used to expedite the drying process. Samples may be saved in dried form at -20°C for up to 14 days.
[144] (H) Microarray Hybridization: To hybridize labeled cDNA probes to single stranded, covalently bound DNA target genes on glass slide microarrays, the following materials were used: formamide, SSC, SDS, 0.2 μm syringe filter, salmon sperm DNA (Sigma, cat # D-7656), human Cot-1 DNA (Life Technologies, cat # 15279-011), poly A (40 mer: Life Technologies, custom synthesized), yeast tRNA (Life Technologies, cat # 15401-04), hybridization chambers, incubator, coverslips, parafilm and heat blocks. It is preferable that the array be completely covered to ensure proper hybridization.
[145] About 30 μl of hybridization buffer was prepared per cDNA sample (control rat cDNA plus treated rat cDNA). Slightly more than is needed should be made, since about 100 μl of the total volume made for all hybridizations can be lost during filtration.
Hybridization Buffer, for 100 μl (final concentration: volume of stock):
• 50% Formamide: 50 μl formamide
• 5X SSC: 25 μl 20X SSC
• 0.1% SDS: 25 μl 0.4% SDS
[146] The solution was filtered through a 0.2 μm syringe filter, and then the volume was measured. About 1 μl of salmon sperm DNA (10 mg/ml) was added per 100 μl of buffer.
[147] Alternatively, the hybridization buffer was made up as:
Hybridization Buffer, for 101 μl (final concentration: volume of stock):
• 50% Formamide: 50 μl formamide
• 10X SSC: 50 μl 20X SSC
• 0.2% SDS: 1 μl 20% SDS
[148] The solution was filtered through a 0.2 μm syringe filter, and then the volume was measured. One microliter of salmon sperm DNA (9.7 mg/ml), 0.5 μl Human Cot-1 DNA (5 μg/μl), 0.5 μl poly A (5 μg/μl) and 0.25 μl Yeast tRNA (10 μg/μl) were added per 100 μl of buffer. The two hybridization buffers were compared in validation studies, and they gave no change in differential gene expression data.
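Both buffer recipes follow the usual dilution relation, C_final = C_stock × V_stock / V_total. The short Python sketch below checks both recipes with that relation and estimates how much buffer to prepare. The helper names are illustrative only. The estimate treats the ~100 μl filtration loss from paragraph [145] as a fixed overage on top of 30 μl per sample, which is an assumption.

```python
def final_conc(stock_conc, stock_vol_ul, total_vol_ul):
    """Final concentration after diluting stock_vol_ul of a stock into total_vol_ul."""
    return stock_conc * stock_vol_ul / total_vol_ul


# First recipe, 100 ul total: 50 ul formamide, 25 ul 20X SSC, 25 ul 0.4% SDS
recipe_100 = {
    "formamide_pct": final_conc(100.0, 50, 100),
    "ssc_x": final_conc(20.0, 25, 100),
    "sds_pct": final_conc(0.4, 25, 100),
}

# Alternative recipe, 101 ul total: 50 ul formamide, 50 ul 20X SSC, 1 ul 20% SDS
recipe_101 = {
    "formamide_pct": final_conc(100.0, 50, 101),
    "ssc_x": final_conc(20.0, 50, 101),
    "sds_pct": final_conc(20.0, 1, 101),
}


def buffer_to_prepare(n_samples, per_sample_ul=30.0, filtration_loss_ul=100.0):
    """Hypothetical helper: buffer volume to make, assuming a fixed filtration loss."""
    return n_samples * per_sample_ul + filtration_loss_ul


def additive_ul(buffer_ul, ul_per_100_ul):
    """Volume of an additive dosed per 100 ul of filtered buffer (e.g. salmon sperm DNA)."""
    return buffer_ul * ul_per_100_ul / 100.0


for name, recipe in (("100 ul recipe", recipe_100), ("101 ul recipe", recipe_101)):
    print(name, {key: round(value, 2) for key, value in recipe.items()})
```

The alternative recipe works out to about 49.5% formamide, 9.9X SSC and 0.2% SDS, which agrees with its nominal 50%, 10X and 0.2% labels.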
[149] Materials used for hybridization were: 2 Eppendorf tube racks, hybridization chambers (2 arrays per chamber), slides, coverslips, and parafilm. About 30 μl of nanopure water was added to each hybridization chamber. Slides and coverslips were cleaned using a stream of N2. About 30 μl of hybridization buffer was added to the dried probe and vortexed gently for 5 seconds. The probe was kept in the dark for 10-15 minutes at room temperature, gently vortexed for several seconds, and then flash spun in the microfuge. The probes were boiled or placed in a 95 °C heat block for 5 minutes and centrifuged for 3 min at 20800 x g (14000 rpm, Eppendorf model 5417C). The probes were then placed in a 70 °C heat block, where each probe remained until it was ready for hybridization.
[150] About 25 μl was pipetted onto a coverslip. It is highly preferable to avoid the material at the bottom of the tube and to avoid generating air bubbles; this may mean leaving about 1 μl in the pipette tip. The slide was gently lowered, face side down, onto the sample so that the coverslip covered the portion of the slide containing the array. Slides were placed in a hybridization chamber (2 per chamber). The lid of the chamber was wrapped with parafilm, and the slides were placed in a 42°C humidity chamber in a 42°C incubator. It is preferable not to let probes or slides sit at room temperature for long periods. The slides were incubated for 18-24 hours.
[151] (I) Post-Hybridization Washing: To retain only single stranded cDNA probes tightly bound to the sense strand of target cDNA on the array, all non-specifically bound cDNA probe should be removed from the array. This was accomplished by washing the array, using the following materials: slide holder, glass washing dish, SSC, SDS, and nanopure water. Six glass buffer chambers and glass slide holders were set up. 2X SSC buffer heated to 30-34°C was used to fill the glass dish to 3/4 of its volume, or enough to submerge the microarrays. The slides were placed in 2X SSC buffer for 2 to 4 minutes while the coverslips fell off. The slides were then moved to 2X SSC, 0.1% SDS and soaked for 5 minutes. The slides were transferred into 0.1X SSC, 0.1% SDS for 5 minutes, and then into 0.1X SSC for 5 minutes. The slides, still in the slide carrier, were transferred into nanopure water (18 megaohms) for 1 second. To dry the slides, the stainless steel slide carriers were placed on micro-carrier plates and spun in a centrifuge (Beckman GS-6 or equivalent) for 5 minutes at 1000 rpm.
[152] (J) Scanning Slides: The washed and dried hybridized slides were scanned on an Axon Instruments Inc. GenePix 4000A MicroArray Scanner. The fluorescence readings from this scanner were converted into quantitation files (.gpr) on a computer using GenePix software.
[153] II. Array Data, Normalization and Transformation: GeneSpring™ software (Version 4.1, Silicon Genetics) was used for statistical analyses. These included identification of genes whose expression correlated with histopathology scores, K-means and tree cluster analysis, and predictive modeling using the k-nearest neighbor method (Predict Parameter Values tool).
[154] Microarray data were loaded into GeneSpring™ software for analysis as GenePix files, as described above. Initially, for the Set A training set compounds (see Table 4), data from one microarray were used per animal. Next, for the Set A test set compounds (see Table 4), replicate arrays for each animal were combined into one GenePix file. Specific data loaded into GeneSpring™ software included gene name, GenBank ID, control channel mean fluorescence and signal channel mean fluorescence. Expression ratio data (ratio of signal to control fluorescence) were normalized using the 50th percentile of the distribution of all genes and the control channel. Ratio data were excluded from analysis if the control channel value was <0. For analysis of correlations and predictive values, gene expression ratios were log-transformed.
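The Python sketch below shows the ratio, normalization and log steps for a single array. It makes three assumptions that the text does not specify:

- "50th percentile" normalization means dividing each ratio by the median ratio on the array.
- The log is the natural log.
- Genes with a non-positive control or signal value are dropped so that the ratio and its log are defined. This is slightly stricter than the "<0" rule stated above.

```python
import math
from statistics import median


def normalize_chip(control, signal):
    """Log-transformed, median-normalized expression ratios for one array.

    control, signal: mean fluorescence per gene, in the same gene order.
    Returns one value per gene, or None where the gene is excluded.
    """
    # Raw ratio = signal / control; genes with a non-positive control or
    # signal value are skipped (assumption, see lead-in).
    ratios = [s / c if c > 0 and s > 0 else None for c, s in zip(control, signal)]
    # Per-chip normalization: divide by the median (50th percentile) ratio.
    chip_median = median(r for r in ratios if r is not None)
    # Log transform (natural log assumed; the base is not specified).
    return [math.log(r / chip_median) if r is not None else None for r in ratios]


print(normalize_chip([100, 200, 50, 0], [100, 400, 25, 10]))
```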
[155] Correlation with Histopathology Scores: Histopathology scores for each animal (assigned on a compound-dose basis as indicated in Table 1) were entered with gene expression data by using the GeneSpring™ 'Drawn Gene' function.
Correlations between the histopathology scores and gene expression were conducted with the distance measures listed below:
• standard positive and negative correlation
• smooth positive and negative correlation
• change positive correlation
• upregulated positive correlation
• Pearson positive and negative correlation
• Spearman positive and negative correlation
• distance positive correlation
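Of the measures listed, Pearson and Spearman correlation are standard and can be sketched directly. The Python example below correlates each gene's log ratios with the histopathology scores across animals. It is not GeneSpring's implementation. The hypothetical `correlating_genes` helper uses a fixed correlation threshold for illustration only; the procedure described here selected genes by adjusting a probability cutoff (paragraph [171]).

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either series is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(v):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            r[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlating_genes(expr, scores, threshold, method=pearson):
    """expr: gene -> log ratios (one per animal); scores: histopathology scores.
    Returns (positively correlating genes, negatively correlating genes)."""
    pos, neg = [], []
    for gene, values in expr.items():
        r = method(values, scores)
        if r >= threshold:
            pos.append(gene)
        elif r <= -threshold:
            neg.append(gene)
    return pos, neg


expr = {"gene_a": [0.1, 0.4, 0.9, 1.5], "gene_b": [1.2, 0.8, 0.3, -0.2]}
print(correlating_genes(expr, [0, 1, 2, 3], 0.9, method=spearman))
```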
[156] These correlation or similarity measures are standard statistical correlation measures described in the GeneSpring Advanced Analysis Techniques Manual (Release Date March 13, 2001, Silicon Genetics). Where both positive and negative correlations were obtained, combined positive and negative correlating gene lists were also created.
[157] IV. Class Prediction: The Predict Parameter Values tool in GeneSpring™ software was used for kidney toxicity class prediction. The following is a summary of the procedure used in the GeneSpring predictive software. The procedure is described in the GeneSpring Advanced Analysis Techniques Manual (Release Date March 13, 2001, Silicon Genetics), with additional information supplied by Silicon Genetics and a statistical expert. The prediction tool relies on standard statistical procedures that can be implemented in a variety of statistical software packages.
[158] (IV)(A) Gene Selection: The first step is variable selection, that is, selection of the genes to be used for prediction. This entails taking a single gene and a single class (e.g., kidney toxicity) and creating a contingency table. In the table below, columns 1 through N each represent one possible cutoff point based on the gene expression level (ratio of signal/control) for that class. The number of possible cutoffs is less than or equal to the total number of samples in the class (e.g., A). It may be less because there may be ties in gene expression level. Hence, N, M, and X may or may not be distinct. The example illustrates an n-class problem. The a and b entries are the class counts at that gene expression cutoff level, for that specific gene and class, either above ("a") or below ("b") the cutoff. "Class1" is the set of all samples in Class1 (above or below the cutoff), and "!Class1" is the set of all samples not in Class1 (above or below the cutoff); the other classes are treated similarly. The class totals in the training set are the marginal totals used to compute Fisher's exact test.
[159] For a specific gene and for each class, Fisher's Exact Test for independence is calculated between one of the pairs of columns (e.g., 1a and 1b) and the actual class totals (e.g., A). The best p-value is used to score the gene for that class, with -ln(p) as the score. Thus, there are N (or M, Q, etc.) contingency tables, and the best score of the N tables is used for that class and gene. Because this is a two-sided Fisher's Exact Test, a wider disparity between the above and below counts in either the a or b column gives a smaller p-value and a higher score.
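The scoring step for one gene and one class can be sketched in Python as follows. For each cutoff, taken from the class's own expression values, the sketch builds a 2 × 2 table of above/below the cutoff × in/not in the class. It computes the two-sided Fisher's exact p-value from the hypergeometric distribution, summing all tables with the same margins that are no more probable than the observed one, which is a common definition. It then keeps the best -ln(p). Treating "above" as "at or above the cutoff" is an assumption, and this is not GeneSpring's own code.

```python
import math


def hypergeom_pmf(k, row1, col1, n):
    """Probability that the top-left cell equals k, given fixed 2x2 margins."""
    return math.comb(col1, k) * math.comb(n - col1, row1 - k) / math.comb(n, row1)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = (hypergeom_pmf(k, row1, col1, n) for k in range(lo, hi + 1))
    # Sum every table no more probable than the observed one (small float tolerance).
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def gene_class_score(values, labels, cls):
    """Best -ln(p) for one gene and one class over all candidate cutoffs.

    values: the gene's normalized expression ratio in each training sample.
    labels: the class label of each sample.
    """
    in_cls = [v for v, y in zip(values, labels) if y == cls]
    out_cls = [v for v, y in zip(values, labels) if y != cls]
    best_p = 1.0
    for cut in sorted(set(in_cls)):
        a_in = sum(v >= cut for v in in_cls)    # class samples at/above the cutoff
        a_out = sum(v >= cut for v in out_cls)  # other samples at/above the cutoff
        table = (a_in, a_out, len(in_cls) - a_in, len(out_cls) - a_out)
        best_p = min(best_p, fisher_two_sided(*table))
    return -math.log(best_p)


values = [1, 2, 3, 10, 11, 12]
labels = ["no"] * 3 + ["yes"] * 3
print(round(gene_class_score(values, labels, "yes"), 3))
```

For this gene, which perfectly separates three "yes" samples from three "no" samples, the best cutoff gives p = 0.1 and a score of ln 10 ≈ 2.30.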
[160] The genes for each class are rank ordered by score, with the most discriminating (highest scoring) genes first. The predictivity list is composed of the most discriminating genes for each class: the genes that best discriminate class 1 are combined with those that best discriminate class 2, and so on. Genes are selected in rotation, taking the highest remaining score from each class in turn. A duplicate gene is not added to the list again; instead, the gene with the next highest score for that class is taken.
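The rotation rule can be sketched in Python as below. The input format (a dict of per-class gene scores) and the function name are assumptions for illustration.

```python
def select_genes_in_rotation(scores, n_genes):
    """Pick up to n_genes genes, taking the best unused gene from each class in turn.

    scores: class -> {gene: score}. A gene already on the list is skipped and
    the next highest scoring gene for that class is taken instead.
    """
    ranked = {cls: sorted(g, key=g.get, reverse=True) for cls, g in scores.items()}
    pointers = {cls: 0 for cls in ranked}
    selected, seen = [], set()
    while len(selected) < n_genes:
        progressed = False
        for cls, genes in ranked.items():
            i = pointers[cls]
            while i < len(genes) and genes[i] in seen:  # skip duplicates
                i += 1
            if i < len(genes) and len(selected) < n_genes:
                selected.append(genes[i])
                seen.add(genes[i])
                i += 1
                progressed = True
            pointers[cls] = i
        if not progressed:  # every class list is exhausted
            break
    return selected


scores = {"yes": {"g1": 5, "g2": 3, "g3": 1}, "no": {"g1": 4, "g4": 2, "g5": 1}}
print(select_genes_in_rotation(scores, 4))  # g1 is taken once; "no" falls through to g4
```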
[161] The training samples now retain only the genes on the list produced by the above procedure. For example, training samples that initially had 200 genes per sample now have only a subset of, for example, 50 genes (the number of predictive genes specified), selected from the initial list by the gene selection procedure. Each sample is thus a vector of 50 normalized expression ratios. Because genes are selected in rotation, in a two-class problem the list contains 25 genes for one class and 25 for the other. The matrix below illustrates the basic features of this gene selection process.
[Figure: matrix illustrating the gene selection process]
[162] After the genes to be used in the training set have been selected, the test set is classified by a k-nearest neighbor (kNN) voting procedure. Using only the genes in the gene list, the k nearest neighbors in the training set are found for each test set sample by Euclidean distance. The class of each of the k nearest neighbors is determined. The test set sample is then assigned to the class with the largest representation among its k nearest neighbors, after adjusting for the proportion of classes in the training set.
[163] For example, in a two-class problem, let there be 30 samples of class 1 and 60 samples of class 2 in the training set. With k = 9, suppose that 7 of the nearest neighbors to a sample from the test set are in class 1. That sample can then be classified as a member of class 1. Another sample from the test set may have only 4 of its 9 nearest neighbors in class 1, so a simple majority vote suggests class 2. After adjusting for the class proportions, however, this sample would be assigned to class 1 rather than class 2. Class 1 makes up only one third of the training set, so its expected share of 9 neighbors is 3, and 4 neighbors exceeds that share.
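The voting rule and the proportion adjustment can be sketched as a plain k-nearest neighbor classifier in Python. The adjustment divides each class's neighbor count by that class's training-set size, which is one simple choice and an assumption here; GeneSpring's exact rule is not given in this text. This choice reproduces the example above, since 4/30 for class 1 exceeds 5/60 for class 2.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by its k nearest training samples (Euclidean distance).

    Each class's neighbor count is divided by that class's training-set size,
    an assumed form of the proportion adjustment.
    """
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))[:k]
    votes = Counter(y for _, y in nearest)
    class_sizes = Counter(train_y)
    return max(class_sizes, key=lambda c: votes[c] / class_sizes[c])


# The 4-of-9 case from the example: 30 class 1 and 60 class 2 training samples,
# laid out in one dimension so that the 9 nearest are 4 class 1 and 5 class 2.
train_X = [[1.0]] * 4 + [[100.0]] * 26 + [[2.0]] * 5 + [[200.0]] * 55
train_y = ["class1"] * 30 + ["class2"] * 60
print(knn_predict(train_X, train_y, [0.0], k=9))  # -> class1
```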
[164] VI. Decision Threshold: The decision threshold is a mechanism to help clearly define the class into which the sample will fall, and can be set to reject classification if the voting is very close or tied. (Thus, k can be even for two-class problems without worrying about the tie problem.) A p-value is calculated for the proportion of neighbors in each class against the proportions found in the training set, again using Fisher's exact test, but now a one-sided test.
[165] For example, let k = 11. Suppose the proportion of class 1 among the neighbors of a test set sample is 6/11 and the proportion of class 1 in a 100-sample training set is 0.4. The calculated p-value is then 0.29 (half the two-sided test). If the proportion in the training set is 0.1, the p-value is 0.004. The smaller the p-value, the greater the likelihood that the test set sample belongs to that class.
[166] The p-value ratio cutoff (P-value) sets the level of confidence required for individual sample predictions. It is applied to the ratio of the p-value for the best class (lowest p-value) to that for the second best class (second lowest p-value). For example, if the P-value is set at 0.5 and the ratio of p-values for a particular sample is 0.6, the predictive model will not make a call for that sample.
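The decision threshold and the p-value ratio rule can be sketched together in Python. For each class, a one-sided Fisher's exact test compares the class count among the k neighbors (row 1) with the class count in the training set (row 2). The sample is called for the class with the lowest p-value only if the ratio of the best to the second best p-value does not exceed the cutoff. This construction is an assumption based on the description above. It is not claimed to reproduce the exact p-values in the worked example.

```python
import math


def one_sided_fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value, P(X >= a), for the table [[a, b], [c, d]].

    Row 1 holds the k nearest neighbors and row 2 the training set; column 1 is
    the class being tested, column 2 all other classes.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    hi = min(row1, col1)
    tail = sum(math.comb(col1, x) * math.comb(n - col1, row1 - x)
               for x in range(a, hi + 1))
    return tail / math.comb(n, row1)


def call_with_threshold(neighbor_labels, train_labels, ratio_cutoff=0.5):
    """Return (call or None, per-class p-values) for one test sample."""
    k, n_train = len(neighbor_labels), len(train_labels)
    pvals = {}
    for cls in sorted(set(train_labels)):
        in_nbrs = sum(y == cls for y in neighbor_labels)
        in_train = sum(y == cls for y in train_labels)
        pvals[cls] = one_sided_fisher_greater(in_nbrs, k - in_nbrs,
                                              in_train, n_train - in_train)
    best, second = sorted(pvals, key=pvals.get)[:2]
    # No call when the best class is not clearly better than the runner-up.
    ratio = pvals[best] / pvals[second]
    return (best if ratio <= ratio_cutoff else None), pvals


# k = 10 neighbors split 8/2 against a balanced 40-sample training set
print(call_with_threshold(["yes"] * 8 + ["no"] * 2, ["yes"] * 20 + ["no"] * 20)[0])
```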
[167] VII. Training and Test Data Sets: The data were separated into 6 training and test sets. The first training and test set was created by allocating one set of data as a training set (Set A training set) and another set of data as a test set (Set A test set). The other training and test sets were created by randomly distributing the compounds into the sets. To do this, random numbers were assigned to the lists of compounds that are negative and positive for histopathology, the lists were sorted by random number, and the sorted lists were divided into a specific number of training and test sets. The training and test set assignments are presented in Table 4.
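The random allocation can be sketched in Python for a single training/test split. Each histopathology group (negative, positive) is keyed by a random number, sorted, and cut, so that both sets receive compounds of both kinds. The `train_fraction` parameter and the single cut point are assumptions; the text does not say how the sorted lists were divided.

```python
import random


def random_split(negatives, positives, train_fraction=0.5, seed=None):
    """Split compounds into one training set and one test set.

    Each group is keyed by a random number, sorted, and cut at train_fraction
    (a hypothetical parameter).
    """
    rng = random.Random(seed)
    train, test = [], []
    for group in (negatives, positives):
        keyed = sorted((rng.random(), compound) for compound in group)
        cut = round(len(keyed) * train_fraction)
        train += [compound for _, compound in keyed[:cut]]
        test += [compound for _, compound in keyed[cut:]]
    return train, test


print(random_split(["cmpd_a", "cmpd_b", "cmpd_c", "cmpd_d"], ["cmpd_e", "cmpd_f"], seed=1))
```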
[168] VIII. Kidney Toxicology Classification: Kidney toxicity classifications were entered for the training and test sets as a parameter column. Toxicity, defined as observation of kidney tubular necrosis at 72 hours after treatment, was entered as "yes" or "no" for each animal in a compound-dose group. Additionally, a parameter column for random histopathology classification was designated. This was done by randomly assigning the same number of "yes" and "no" calls to the individual animals.
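Random reassignment with the same number of "yes" and "no" calls amounts to shuffling the observed labels. A minimal Python sketch follows; the function name is illustrative.

```python
import random


def randomized_labels(labels, seed=None):
    """Randomly reassign the observed "yes"/"no" calls among the animals.

    Shuffling keeps the number of "yes" and "no" calls unchanged.
    """
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled


print(randomized_labels(["yes", "yes", "no", "no", "no", "no"], seed=0))
```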
[169] IX. Prediction Output and Initial Data Processing: The "Predict Parameter Value" tool of GeneSpring was used with each of the training and test sets to generate predictions of histopathology classifications of the test sets. Unless otherwise specified, a nearest neighbor setting of 10 (default) and a P-value ratio cutoff of 0.5 were used. The number of genes used to predict was varied, with standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes. For each number of genes, the numbers of correct calls, incorrect calls and non-calls were recorded. Non-calls are cases where no prediction was made because the P-value ratio exceeded the specified P-value ratio cutoff. Three measures were calculated:
• overall percent correct calls (number of correct classifications/number of samples)
• percent correct calls of called samples (number of correct classifications/number of samples with calls)
• percent of called samples (samples with calls/number of samples)
[170] For each input list and optimal number of predictive genes (lowest number of genes giving a maximum overall percent of correct calls) additional information was recorded that included the list of specific genes in the optimum predictive set.
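The three percentages and the choice of optimum gene number can be sketched in Python as follows. `None` marks a non-call, and the function names are illustrative.

```python
def call_summary(actual, predicted):
    """Percent measures from paragraph [169]; predicted uses None for a non-call."""
    n = len(actual)
    called = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    correct = sum(a == p for a, p in called)
    return {
        "pct_correct_overall": 100 * correct / n,
        "pct_correct_of_called": 100 * correct / len(called) if called else 0.0,
        "pct_called": 100 * len(called) / n,
    }


def optimum_gene_number(results):
    """results: number of genes -> overall percent correct.
    Returns the lowest number of genes giving the maximum overall percent correct."""
    best = max(results.values())
    return min(n for n, pct in results.items() if pct == best)


print(call_summary(["yes", "no", "no", "yes"], ["yes", "no", None, "no"]))
print(optimum_gene_number({50: 90.0, 20: 90.0, 10: 85.0, 5: 90.0}))  # -> 5
```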
[171] X. Results: Expression array data were first examined for the existence of genes whose expression correlated with histopathology scores. Table 1 presents a list of the compounds and dose levels along with the kidney histopathology classification and histopathology severity scores used for this analysis. For each distance measure the probability was adjusted in increments of 0.05 until at least 50 correlating genes were obtained. Lists of correlating genes were obtained using the distance measures described in Materials and Methods. Example sets of correlating genes are provided in Tables 2 and 3.
[172] The correlating gene lists, as well as the entire array gene list, were provided as input lists to the GeneSpring Predict Parameter Value tool (described in Materials and Methods), which employs a k-nearest neighbor (kNN) predictive model. These lists were used for each of the six training and test sets defined in Materials and Methods to generate predictions of histopathology classifications of the test sets. Input genes for the Predict Parameter Value feature included all 700 genes in the GenePix file (the rat CT Array), which was disclosed in a currently pending application (serial number [Attorney docket no. 40074-2000600]) filed on January 29, 2002. They also included smaller lists of genes whose expression correlated with histopathology by the correlation measures described previously. The number of genes used to predict was varied, with standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes, to obtain an optimum number of predictive genes. Figure 2 presents a typical profile for obtaining an optimum gene list.
[173] After this was done for all 6 training and test sets, all gene lists were merged to create one aggregate list of predictive genes. Each gene on this aggregate list has predictive value for at least one of the training and test sets because it was observed to contribute to an optimum predictivity for a specific training/test set. The aggregate list was subdivided into smaller lists of genes based on the number of times a gene was predictive for an individual training or test set. For example, if 6 training and test sets were used, genes that were predictive in all 6 training and test sets were designated as Combo (combination) 6. Genes that were predictive in only 5 of 6 training and test sets were designated as Combo 5, etc. A list of predictive genes organized by their occurrence in the separate training and test sets is presented in Table 5.
[174] Example 2: Predictive Properties and Evaluation of Predictive Genes from 24 Hour Expression Data
(A) Materials and Methods: The database used was as described in Example 1.
[175] (B) Array data, normalization procedures and transformations used in these analyses are as described in Example 1. Table 39 presents 24 hour gene expression data for the predictive genes. These data can be used with a k-nearest neighbor prediction model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example.
[176] (C) The Predict Parameter Values tool in GeneSpring™ software was used for kidney toxicity class prediction. A description of this tool and the statistical procedures used is provided in Example 1.
[177] (D) The training and test data sets used are those described in Table 4.
[178] (E) Kidney toxicology classifications used are described in Table 1. In this analysis randomized classifications (same number of "yes" and "no" classifications distributed randomly among the samples) were used.
[179] (F) Prediction Output and Initial Data Processing: For each gene list prediction used for evaluation, the table of data generated by the Predict Parameter Values tool in GeneSpring™ software was saved. For each sample in the test set, this table provided the actual call ("yes" or "no" for kidney toxicity), the predicted call ("yes", "no" or no call for kidney toxicity) and the P-value ratio. These data were used to calculate the predictive performance measures provided below.
[180] (G) The measures of prediction used for these analyses are generally accepted measures of the agreement between actual classifications and those predicted by a classification system (Venables and Ripley, Modern Applied Statistics with S-Plus, 3rd edition, Springer, 1994; Kubat and Matwin, Proc. 14th International Conference on Machine Learning, 1997). Results from predictions in a two-class case can be described as a two-class matrix:
                      Predicted negative    Predicted positive
  Actual negative     a                     b
  Actual positive     c                     d
[181] Standard terms used for prediction are as follows. Accuracy, the proportion of the total number of predictions that are correct, is calculated as (a+d)/(a+b+c+d).
[182] False positive rate, the proportion of negative cases that are incorrectly classified as positive, is calculated as b/(a+b).
[183] False negative rate, the proportion of positive cases that are incorrectly classified as negative, is calculated as c/(c+d).
[184] Geometric mean, a performance measure that takes into account the proportion of positive and negative cases (Kubat et al., ibid), is calculated as the square root of TP*TN. Here TP is the true positive rate, d/(c+d), and TN is the true negative rate, a/(a+b). In those cases where no prediction was made because the p-value ratio exceeded the cutoff value (generally 0.5), the non-call was considered to be incorrect.
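The four measures can be computed from actual and predicted calls with the Python sketch below. The a, b, c and d cells follow the two-class matrix above. The text says only that non-calls are incorrect. The sketch therefore assumes a non-call on a positive sample counts toward c and a non-call on a negative sample counts toward b.

```python
import math


def prediction_measures(actual, predicted, positive="yes"):
    """Accuracy, error rates and geometric mean from actual vs. predicted calls.

    predicted uses None for a non-call; non-calls are counted as incorrect
    (c for a positive sample, b for a negative one; assumed bookkeeping).
    """
    a = b = c = d = 0
    for y, p in zip(actual, predicted):
        if y == positive:
            if p == positive:
                d += 1  # true positive
            else:
                c += 1  # false negative or non-call
        elif p is not None and p != positive:
            a += 1      # true negative
        else:
            b += 1      # false positive or non-call
    tp_rate = d / (c + d)
    tn_rate = a / (a + b)
    return {
        "accuracy": (a + d) / (a + b + c + d),
        "false_positive_rate": b / (a + b),
        "false_negative_rate": c / (c + d),
        "geometric_mean": math.sqrt(tp_rate * tn_rate),
    }


print(prediction_measures(["yes", "yes", "no", "no", "no", "no"],
                          ["yes", None, "no", "no", "no", "yes"]))
```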
[185] (H) Subsets of randomly selected genes were prepared from the predictive gene sets to test whether such subsets would have predictive value. Assignments of genes to these subsets are presented in Tables 6-10.
[186] (I) Prediction results for 24 hour expression data using genes identified as predictive are presented in Table 11. These data indicate a very high accuracy in predicting kidney toxicity. Mean accuracy exceeded 0.9 (90% accuracy) for the entire predictive gene list (Combo All) and the Combo 6 gene subset and 0.8 (80% accuracy) for the Combo 5 and 4 subsets. As expected, the predictive performance of the gene sets increased from the lowest occurrence gene list (Combo 1) to the highest occurrence gene list (Combo 6).
[187] Because these predictions were conducted with multiple training/test set combinations, it is possible to obtain an indication of the variability in prediction rates and robustness of the prediction capabilities of these gene sets. For the Combo All and Combo 6, 5 and 4 gene sets there was very good predictivity for all training/test sets of data with over 0.8 accuracy as a minimum value for any one training and test set. False positive prediction rates were generally low with means less than 0.1 for Combo All and Combo 6, 5 and 4. Because the proportion of negative classifications was much higher than the proportion of positive (toxic) classifications in these sample sets the false negative rates would be expected to be higher than the false positive rates and this was observed to be the case. Although the false negative rates were higher than the false positive rates, there was still very good prediction of positive responses with mean false negative rates of about 0.3 for Combo All, Combo 6, Combo 5 and Combo 4 gene sets. The geometric mean was used as an indication of predictive performance that includes consideration of the proportion of positive and negative classifications. All gene sets gave geometric mean measures >0.5 and three gene sets (Combo All, Combo 6 and Combo 5) had mean measures >0.8.
[188] In these analyses, in cases where no prediction was made because the p- value ratio exceeded the cutoff-value (generally 0.5), the non-call was considered to be incorrect.
[189] Prediction results for 24 hour expression data, using the genes identified as predictive with compound-dose as the prediction unit, are presented in Table 12. This prediction unit is probably the most relevant for toxicology prediction. The genes perform even better in predicting compound-dose toxicity than in predictions on an individual animal basis. These data indicate a very high accuracy in predicting kidney toxicity. Mean accuracy exceeded 0.9 (90% accuracy) for the entire predictive gene list (Combo All) and for the Combo 6, 5, 4 and 3 gene lists. As expected, the predictive performance of the gene sets increased from the lowest occurrence gene list (Combo 1) to the highest occurrence gene list (Combo 6). Accuracy was better than 0.8 (80%) for the Combo 2 and Combo 1 lists. Variability in accuracy was low for most of the gene lists. The Combo All and Combo 6, 5, 4 and 3 gene lists had a minimum accuracy >0.8 for any single training and test set. Particularly noteworthy at the compound-dose level is the low false negative rate observed for the Combo All, Combo 6 and Combo 5 gene lists; the mean false negative rate was about 0.2 or less for these gene lists. As observed on an individual animal basis, the false positive rate was very low, with mean rates of <0.12 for all gene sets.
[190] One noteworthy feature of the predictive genes is their ability to distinguish between the effects of a compound at different dose levels. Two compounds, ganciclovir and cyclophosphamide, produced kidney toxicity at the high dose but not at the low dose. The predictive gene sets, particularly the Combo All, Combo 6 and Combo 5 sets, accurately predicted toxicity at the high dose level and the absence of toxicity at the low dose level.
[191] Prediction results for 24 hour expression data, using the genes identified as predictive with compound as the prediction unit, are presented in Table 13. The capability to predict the toxicity of compounds was excellent. No compounds were missed using the Combo 6 and Combo 5 gene sets, and false positive rates were very low for all of the gene sets.
[192] Cumulative performance for the Combo gene lists was examined by adding genes one at a time in an order based on predictive weight as calculated by GeneSpring software. Because this order (and predictive weight) differed for each training set, a mean weight was used to obtain a single gene order for the predictive sets tested. The gene order is presented in Table 14.
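Averaging the per-training-set weights into one gene order, then scoring growing prefixes of that order, can be sketched in Python. The sketch gives a gene weight 0 in any training set where it has no weight and breaks ties by gene name; both are assumptions. The `evaluate` callable stands in for a full train-and-predict run and is hypothetical.

```python
from statistics import mean


def mean_weight_order(weights_per_set):
    """Order genes by mean predictive weight across training sets, highest first.

    weights_per_set: one dict gene -> weight per training set. A gene missing
    from a set is given weight 0 there (assumption); ties are broken by name.
    """
    genes = set().union(*weights_per_set)
    avg = {g: mean(w.get(g, 0.0) for w in weights_per_set) for g in genes}
    return sorted(genes, key=lambda g: (-avg[g], g))


def cumulative_performance(order, evaluate):
    """Score the first 1, 2, ..., n genes of `order` with a caller-supplied
    evaluate(gene_list) -> accuracy function (hypothetical)."""
    return [(i, evaluate(order[:i])) for i in range(1, len(order) + 1)]


order = mean_weight_order([{"g1": 1.0, "g2": 3.0}, {"g1": 2.0, "g3": 4.0}])
print(order)  # -> ['g3', 'g1', 'g2']
```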
[193] Cumulative predictive performance for the Combo 6, Combo 5 and Combo 4 predictive gene sets are presented in Figures 4-6.
[194] The cumulative performance data clearly indicate that very good predictive performance can be achieved with small subsets of the Combo gene sets. For Combo 6, accuracy reached a plateau of about 90% at 3 genes. For Combo 5, a similar plateau was reached with about 8 genes, and for Combo 4 with about 13 genes. This illustrates the greater predictive power of small sets of genes compared with single genes. The number of genes needed to reach a performance plateau rises from one Combo set to the next, which is consistent with the hierarchy of predictive performance among the Combo sets.
[195] Tables 15 and 16 show the predictive accuracy of individual genes of Combo 6 and Combo 5 for 24 hour kidney data. These are the top Combo subsets, with predictive accuracies of 92.1% and 89.6%, respectively, on an individual sample basis.
[196] These tables show that, overall, individual genes of both Combo groups did not perform as well as the whole combination. The average predictive accuracy of individual genes was 67.7% for Combo 6 and 62.7% for Combo 5. Some individual genes of both Combos gave a moderate to good level of predictive accuracy (as high as 79.7% for Combo 6 and 75.6% for Combo 5). Even so, the predictive accuracy of individual genes never exceeded that of the whole combination. The data further support the conclusion from the cumulative gene predictivity analysis that small subsets of genes have superior predictive power compared with individual genes.
[197] In order to assess the performance of subsets of genes, predictive performance was evaluated for subsets of genes randomly selected from the total combined predictive list (Combo All) and the top Combo sets (as defined in Materials and Methods). Prediction results for 24 hour expression data using randomly selected subsets of genes are presented in Table 17.
[198] These data clearly indicate that subsets of the Combo gene lists have predictive power. The predictive performance, as indicated by several measures including accuracy and geometric mean, increased in parallel with the predictive power of the gene set from which the genes were selected. The predictive power also generally increased as the number of randomly selected genes increased. In the case of the Combo 4, 5 and 6 sets, the 15 gene random subset had predictive performance close to that of the entire gene set.
[199] Table 18 compares prediction accuracy for the correct classification of kidney toxicity with prediction accuracy for a random classification. The random classification assigns the same proportion of positive and negative toxicity calls to the samples at random. For each gene set or subset, predictions were made using the same six training/test sets as for the other prediction analyses. Additionally, sets of genes were randomly chosen from the array genes that were not on the list of 216 predictive genes at 24 hours (Example 1, Table 10).
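Both kinds of random gene sets can be drawn with the Python standard library. The first kind is a subset of a predictive (Combo) list. The second kind comes from array genes that are not on the predictive list. The function names and seeding are illustrative only.

```python
import random


def random_subset(genes, size, seed=None):
    """Draw `size` genes at random, without replacement, from a predictive list."""
    return random.Random(seed).sample(list(genes), size)


def non_predictive_subset(array_genes, predictive_genes, size, seed=None):
    """Draw `size` genes from the array genes that are not on the predictive list."""
    pool = sorted(set(array_genes) - set(predictive_genes))
    return random.Random(seed).sample(pool, size)


array_genes = [f"gene_{i}" for i in range(700)]
predictive = array_genes[:216]
print(random_subset(predictive, 15, seed=1))
print(non_predictive_subset(array_genes, predictive, 15, seed=1))
```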
[200] It is clear from these data that predictions with accurate classification are much better than predictions with randomized classification. The predictive results are therefore not simply due to chance and large data sets. They reflect a significant, meaningful predictive association between the expression of the predictive genes and kidney toxicity. Gene sets selected from all array genes minus the predictive genes give much lower accuracy than the Combo predictive lists or the random subsets of those lists. This further verifies the predictive power of the identified predictive genes. Accuracy for these non-predictive subsets is somewhat higher with accurate than with random classification, which likely reflects a small residual predictivity in these genes.
[201] Example 3: Discovery of Kidney Toxicity Predictive Genes from 6 Hour Expression Data
(A) Materials and Methods: The compounds and treatments used to construct the kidney database are listed in Example 1 (Table 1). This table also provides the evaluation of the kidney toxicity observed as kidney tubular necrosis in samples collected 72 hours after treatment. The database is described in detail in Example 1. This Example analyzes expression data from samples collected 6 hours after treatment. Array data, normalization and transformation procedures were as described in Example 1. Procedures and methods for obtaining gene lists correlating with histopathology scores, and the scores themselves, were also as described in Example 1. The Predict Parameter Values tool in GeneSpring™ software used for kidney toxicity class prediction is described in detail in Materials and Methods of Example 1.
[202] (B) Training and Test Data Sets: The data were separated into 6 training and test sets. The first training and test set was created by allocating one set of data as a training set (Set A training set) and another set of data as a test set (Set A test set). The other training and test sets were created by randomly distributing the compounds into the sets. To do this, random numbers were assigned to the lists of compounds that are negative and positive for histopathology, the lists were sorted by random number, and the sorted lists were divided into a specific number of training and test sets. The training and test set assignments are presented in Table 19.
[203] (C) Kidney toxicity classifications were entered for the training and test sets as a parameter column. Toxicity, defined as observation of kidney tubular necrosis at 72 hours after treatment, was entered as "yes" or "no" for each animal in a compound-dose group. Additionally, a parameter column for random histopathology classification was designated by randomly assigning "yes" and "no" calls to the individual animals. The total numbers of "yes" and "no" calls were kept the same as in the correct classification, so the proportion of "yes" and "no" calls was the same in all the training and test sets.
[204] (D) Prediction Output and Initial Data Processing: The "Predict Parameter Value" tool of GeneSpring was used with each of the training and test sets to generate predictions of histopathology classifications of the test sets. Unless otherwise specified, a nearest neighbor setting of 10 (default) and a P-value ratio cutoff of 0.5 were used. The number of genes used to predict was varied, with standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes. For each number of genes, the numbers of correct calls, incorrect calls and non-calls were recorded. Non-calls are cases where no prediction was made because the P-value ratio exceeded the specified P-value ratio cutoff. Three measures were calculated:
• overall percent correct calls (number of correct classifications/number of samples)
• percent correct calls of called samples (number of correct classifications/number of samples with calls)
• percent of called samples (samples with calls/number of samples)
[205] For each input list and optimal number of predictive genes (the lowest number of genes giving the maximum overall percent of correct calls), additional information was recorded, including the list of specific genes in the optimum predictive set.
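Selecting the optimal number of predictive genes, as defined above, amounts to taking the smallest gene count that reaches the maximum overall percent correct. The dictionary layout and the example values below are hypothetical.

```python
def optimal_gene_count(pct_correct_by_n):
    """Smallest number of genes attaining the maximum overall percent correct.

    pct_correct_by_n maps a gene count (e.g. 50, 40, ..., 1) to the overall
    percent correct calls obtained with that many predictive genes.
    """
    best = max(pct_correct_by_n.values())
    return min(n for n, pct in pct_correct_by_n.items() if pct == best)


# Hypothetical results: 40 and 30 genes tie at the maximum, so 30 is chosen.
print(optimal_gene_count({50: 70.0, 40: 75.0, 30: 75.0, 20: 72.0, 10: 60.0}))
```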
[206] (E) Results: Expression array data were first examined for genes whose expression correlated with histopathology scores. Materials and Methods of Example 1 presents a list of the compounds and dose levels, along with the kidney histopathology classifications and histopathology severity scores used for this analysis. Lists of correlating genes were obtained using the distance measures described in Materials and Methods; for each distance measure, the probability was adjusted in increments of 0.05 until at least 50 correlating genes were obtained. Example sets of correlating genes are provided in Tables 20 and 21. The correlating gene lists, as well as the entire array gene list, were provided as input lists to the GeneSpring Predict Parameter Value tool (described in Materials and Methods), which employs a k-nearest neighbor (knn) predictive model, and were used with each of the six training and test sets defined in Materials and Methods to generate predictions of the histopathology classifications of the test sets. Input genes for the Predict Parameter Value feature thus included all 700 genes in the GenePix file (the rat CT Array) as well as smaller lists of genes whose expression correlated with histopathology by the correlation measures described previously. The specified number of predictive genes was varied, using the standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes, to obtain an optimum number of predictive genes.
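The stepwise relaxation of the probability cutoff can be sketched as below. It assumes that each gene already has a correlation p-value for the chosen distance measure and that a gene passes when its p-value is at or below the cutoff; the names and the upper stopping bound are assumptions.

```python
def correlating_genes(p_values, min_genes=50, step=0.05, max_threshold=1.0):
    """Raise the cutoff in `step` increments until at least `min_genes` pass.

    p_values maps gene name -> correlation p-value (hypothetical input).
    Returns (cutoff, list of passing genes).
    """
    k = 1
    while True:
        # Multiply rather than accumulate to avoid floating-point drift.
        cutoff = round(k * step, 10)
        genes = [g for g, p in p_values.items() if p <= cutoff]
        if len(genes) >= min_genes or cutoff >= max_threshold:
            return cutoff, genes
        k += 1
```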
[207] After this was done for all 6 training and test sets, all gene lists were then merged to create one aggregate list of predictive genes. Each gene on this aggregate list has predictive value for at least one of the training and test sets because it was observed to contribute to an optimum predictivity for a specific training/test set. The aggregate list was subdivided into smaller lists of genes based on the number of times a gene was predictive for an individual training or test set. For example, if 6 training and test sets were used, genes that were predictive in all 6 training and test sets were designated as Combo (combination) 6. Genes that were predictive in only 5 of 6 training and test sets were designated as Combo 5, etc.
[208] A list of predictive genes organized by their occurrence in the separate training and test sets is presented in Table 22.
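Assigning genes to Combo sets reduces to counting, for each gene, the number of training/test sets whose optimum predictive set contained it. This sketch uses hypothetical gene lists; the key "All" stands for the aggregate list (Combo All).

```python
from collections import Counter


def combo_sets(optimal_gene_lists):
    """Group genes by the number of training/test sets they were predictive in.

    optimal_gene_lists holds one optimum predictive gene list per set.
    Returns {occurrence_count: set_of_genes, ..., "All": aggregate_set}.
    """
    counts = Counter(g for genes in optimal_gene_lists for g in set(genes))
    combos = {}
    for gene, n in counts.items():
        combos.setdefault(n, set()).add(gene)
    combos["All"] = set(counts)
    return combos


# Gene "a" is predictive in all three hypothetical sets, so it is Combo 3.
print(combo_sets([["a", "b"], ["a", "c"], ["a"]]))
```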
[209] Example 4 Predictive Properties and Evaluation of Predictive Genes from 6 Hour Expression Data: (A) Materials and Methods: The database used was as described in Example 1. Array data, normalization procedures and transformations used in these analyses are as described in Example 1. Table 38 presents 6 hour gene expression data for the predictive genes. These data can be used with a k-nearest neighbor prediction model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example. The Predict Parameter Values tool in GeneSpring™ software was used for kidney toxicity class prediction. A description of this tool and the statistical procedures used is provided in Example 1.
[210] (B) Training and Test Data Sets: The training and test data sets used are those described in Table 19.
[211] (C) Kidney Toxicology Classification: Kidney toxicology classifications used are described in Example 1. In this analysis randomized classifications (same number of "yes" and "no" classifications distributed randomly among the samples) were used.
[212] (D) Prediction Output and Initial Data Processing: For each gene list prediction used for evaluation, the table of data generated by the Predict Parameter Values tool in GeneSpring™ software was saved. For each sample in the test set, this table provided the actual call ("yes" or "no" for kidney toxicity), the predicted call ("yes", "no" or no call for kidney toxicity) and the P-value ratio. These data were used to calculate the predictive performance measures provided below.
[213] (E) Prediction Measures: The measures of prediction used for these analyses are generally accepted measures of the agreement between actual and predicted classifications made by a classification system (Venables and Ripley, ibid; Kubat and Matwin, ibid). Results from predictions in a two-class case can be described as a two-class (confusion) matrix, as described above.
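The two-class measures reported in these Examples (accuracy, false positive and false negative rates, geometric mean) can be computed from confusion-matrix counts as in this sketch. It treats "yes" (toxic) as the positive class and takes the geometric mean as the square root of the product of the true-positive and true-negative rates, the reading that reproduces the 0.961267 reported for the classification tree in Example 7. How non-calls enter the counts is not stated in the text, so this sketch expects counts for called samples only; the function name is illustrative.

```python
import math


def prediction_measures(tp, fn, tn, fp):
    """Two-class prediction measures from confusion-matrix counts.

    tp/fn: toxic samples called toxic / called non-toxic
    tn/fp: non-toxic samples called non-toxic / called toxic
    """
    tpr = tp / (tp + fn)  # true-positive rate (sensitivity)
    tnr = tn / (tn + fp)  # true-negative rate (specificity)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "false_positive_rate": fp / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
        "gmm": math.sqrt(tpr * tnr),  # geometric mean measure
    }


# Counts from the Example 7 tree: 36 of 38 toxic and 198 of 203 non-toxic correct.
print(prediction_measures(tp=36, fn=2, tn=198, fp=5))
```

With those counts the misclassification error (1 - accuracy) is 7/241, about 0.029, and the geometric mean is about 0.961267. The geometric mean is useful here because, with few toxic samples, a classifier that calls everything non-toxic can still reach high accuracy but scores zero on this measure.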
[214] (F) Results: Prediction results for 6 hour expression data, using the genes identified as predictive with compound-dose as the predicting unit, are presented in Table 23. This prediction unit is probably the most relevant for toxicology prediction. The performance of the genes in predicting compound-dose toxicity is even better than that of predictions made on an individual animal basis.
[215] These data indicate moderate accuracy in predicting kidney toxicity. Mean accuracy exceeded 0.7 (70% accuracy) for the entire predictive gene list (Combo All) and for the Combo 6 and Combo 5 gene lists. As expected, the predictive performance of the gene sets generally increased from the lowest occurrence gene list (Combo 1) to the highest occurrence gene list (Combo 6), with the exception of the Combo 5 list. Mean false negative values were in the range of 0.4-0.6, as were the geometric mean measures.
[216] Example 5 Discovery of Kidney Toxicity Predictive Genes from 72 Hour Expression Data: (A) Materials and Methods: The compounds and treatments used to construct the kidney database are listed in Example 1, together with the evaluation of kidney toxicity, observed as kidney tubular necrosis, in samples collected 72 hours after treatment. The database is described in detail in Example 1. This Example analyzes expression data from samples collected 72 hours after treatment. Array data, normalization and transformation procedures were as described in Example 1, as were the procedures, methods and histopathology scores used to obtain gene lists correlating with histopathology. The Predict Parameter Values tool in GeneSpring™ software used for kidney toxicity class prediction is described in detail in Materials and Methods of Example 1.
[217] (B) Training and Test Data Sets: The data were separated into 6 training and test sets. The first training and test set was created by allocating one set of data as a training set (Set A training set) and another set of data as a test set (Set A test set). The other training and test sets were created by randomly distributing the compounds into the sets: random numbers were assigned to the lists of compounds that are negative and positive for histopathology, each list was sorted by random number, and the sorted lists were then divided into the specified number of training and test sets. The training and test set assignments are presented in Table 24.
[218] (C) Kidney Toxicology Classification: Kidney toxicity classifications were entered as a parameter column for each training and test set. Toxicity, defined as observation of kidney tubular necrosis at 72 hours after treatment, was entered as "yes" or "no" for each animal in a compound-dose group. A parameter column for a random histopathology classification was also created by randomly assigning "yes" and "no" calls to the individual animals. The total numbers of "yes" and "no" calls were kept the same as in the correct classification, so that the proportion of "yes" and "no" calls was the same in all the training and test sets.
[219] (D) Prediction Output and Initial Data Processing: The "Predict Parameter Value" tool of GeneSpring was used with each of the training and test sets to generate predictions of the histopathology classifications of the test sets. Unless otherwise specified, a nearest neighbor setting of 10 (the default) and a P-value ratio cutoff of 0.5 were used. The number of genes used to predict was varied, using the standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes. For each number of genes, the numbers of correct calls, incorrect calls and non-calls were recorded. Non-calls are cases where no prediction was made because the P-value ratio exceeded the specified P-value ratio cutoff. Three percentages were calculated: overall percent correct calls (number of correct classifications/number of samples), percent correct calls of called samples (number of correct classifications/number of samples with calls) and percent of called samples (number of samples with calls/number of samples).
[220] For each input list and optimal number of predictive genes (the lowest number of genes giving the maximum overall percent of correct calls), additional information was recorded, including the list of specific genes in the optimum predictive set.
[221] (E) Results: Expression array data were first examined for genes whose expression correlated with histopathology scores. Materials and Methods of Example 1 presents a list of the compounds and dose levels, along with the kidney histopathology classifications and histopathology severity scores used for this analysis. Lists of correlating genes were obtained using the distance measures described in Materials and Methods; for each distance measure, the probability was adjusted in increments of 0.05 until at least 50 correlating genes were obtained. Example sets of correlating genes are provided in Tables 25-26. The correlating gene lists, as well as the entire array gene list, were provided as input lists to the GeneSpring Predict Parameter Value tool (described in Materials and Methods), which employs a k-nearest neighbor (knn) predictive model, and were used with each of the six training and test sets defined in Materials and Methods to generate predictions of the histopathology classifications of the test sets. Input genes for the Predict Parameter Value feature thus included all 700 genes in the GenePix file (the Rat CT Array) as well as smaller lists of genes whose expression correlated with histopathology by the correlation measures described previously. The specified number of predictive genes was varied, using the standard numbers of 50, 40, 30, 20, 10, 5, 2 and 1 genes, to obtain an optimum number of predictive genes.
[222] After this was done for all 6 training and test sets, all gene lists were then merged to create one aggregate list of predictive genes. Each gene on this aggregate list has predictive value for at least one of the training and test sets because it was observed to contribute to an optimum predictivity for a specific training/test set. The aggregate list was subdivided into smaller lists of genes based on the number of times a gene was predictive for an individual training or test set. For example, if 6 training and test sets were used, genes that were predictive in all 6 training and test sets were designated as Combo (combination) 6. Genes that were predictive in only 5 of 6 training and test sets were designated as Combo 5, etc.
[223] A list of predictive genes organized by their occurrence in the separate training and test sets is presented in Table 27.
[224] Example 6: Predictive Properties and Evaluation of Predictive Genes from 72 Hour Expression Data: (A) Materials and Methods: The database used was as described in Example 1. Array data, normalization procedures and transformations used in these analyses are as described in Example 1. Table 40 presents 72 hour gene expression data for the predictive genes. These data can be used with a k-nearest neighbor prediction model (as available in GeneSpring or other statistical software packages) to make predictions as described in this example. The Predict Parameter Values tool in GeneSpring™ software was used for kidney toxicity class prediction. A description of this tool and the statistical procedures used is provided in Example 1. The training and test data sets used are those described in Example 1.
[225] (B) Kidney Toxicology Classification: Kidney toxicology classifications used are described in Example 1. In this analysis randomized classifications (same number of "yes" and "no" classifications distributed randomly among the samples) were used.
[226] (C) Prediction Output and Initial Data Processing: For each gene list prediction used for evaluation, the table of data generated by the Predict Parameter Values tool in GeneSpring™ software was saved. For each sample in the test set, this table provided the actual call ("yes" or "no" for kidney toxicity), the predicted call ("yes", "no" or no call for kidney toxicity) and the P-value ratio. These data were used to calculate the predictive performance measures provided below.
[227] (D) Prediction Measures: The measures of prediction used for these analyses are generally accepted measures of the agreement between actual and predicted classifications made by a classification system (Venables and Ripley, ibid; Kubat and Matwin, ibid). Results from predictions in a two-class case can be described as a two-class (confusion) matrix, as described above.
[228] (E) Results: Prediction results for 72 hour expression data, using the genes identified as predictive with compound-dose as the predicting unit, are presented in Table 28. This prediction unit is probably the most relevant for toxicology prediction. The performance of the genes in predicting compound-dose toxicity is even better than that of predictions made on an individual animal basis.
[229] These data indicate a high accuracy in predicting kidney toxicity. Mean accuracy exceeded 0.85 (85% accuracy) for the entire predictive gene list (Combo All) and 0.8 (80% accuracy) for the Combo 6 and 4 subsets. False positive prediction rates were generally low for Combo All (mean less than 0.1) as well as the other Combos except Combo 2 (means 0.138 - 0.228). Because the proportion of negative classifications was much higher than the proportion of positive (toxic) classifications in these sample sets the false negative rates would be expected to be higher than the false positive rates and this was observed to be the case. The geometric mean was used as an indication of predictive performance that includes consideration of the proportion of positive and negative classifications. Combo All, Combo 6, Combo 5, and Combo 4 gave geometric mean measures >0.6.
[230] Example 7 Alternate Models for Predicting Kidney Toxicity: (A) Materials and Methods: The database used for evaluation of these models was the 24 hour expression data for kidney samples described above. Expression data were for the Combo 6 set of predictive genes as described herein. Because of the heteroscedasticity of the gene expression ratio data (i.e., the variance increases proportionately more than the mean), a log transformation of the data is often considered. In general, untransformed data were used, but for some models log-transformed data were used for comparison. Six training and testing sets were used, the same as those described in Example 1.
[231] (B) Predictive Modeling: The predictive task with the kidney toxicology gene expression data is a two-class classification problem, where the two classes of possible responses are defined by either kidney toxicity histopathology (yes) or absence of kidney toxicity histopathology (no). This is an uneven class problem, in that the class of yes responses is roughly 20 percent of the data or less in the database tested. A discrimination function is used to classify a training set. This function is cross-validated with a testing set, often repeatedly, to quantify the mean and variation of the classification error. There are numerous common discrimination functions, and a comparative study of their performance is useful in determining the best classifier. Additional measures are then used to compare the performance of the classifiers. Since the classes are of significantly uneven sizes, a geometric mean measure (GMM) was used to compare models, namely, the square root of the product of the true positive rate and the true negative rate.
[232] Common discrimination methods are Fisher's linear discriminant, quadratic discriminant (Mahalanobis distance), k-nearest neighbors (knn), logistic discriminant (McLachlan, 1992), classification trees (more generally known as recursive partitioning) (Breiman et al., 1984; Clark and Pregibon, 1993; Quinlan, 1988), and neural network classifiers (Ripley, 1996). Some are formula-based, such as linear and quadratic discriminants, whereas others are rule-based, such as recursive partitioning, or algorithm-based, such as knn. knn is also database dependent, in that a database containing the training set is needed to perform the nearest neighbor search and classification.
[233] (C) Classifier Models: A variety of common classification techniques were evaluated. As an extension of the k-nearest neighbor (knn) model, a simple hybrid classifier was designed and tested, using the knn results, to transform the knn model into a database-independent model. This model is termed a centroid model. For each correctly identified test sample, the centroid model takes the subset of that sample's k nearest neighbors that are of the same class and locates their centroid. The centroid is assigned the correct class, and a new test sample is assigned the class of its nearest centroid.
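The centroid construction can be sketched as follows. This is an illustrative Python version under stated assumptions: Euclidean distance, expression profiles as plain lists of floats, and knn predictions for the test set supplied by the caller; none of the function names come from the original software.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_neighbors(train_X, sample, k):
    """Indices of the k training profiles closest to `sample`."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], sample))
    return order[:k]


def build_centroids(train_X, train_y, test_X, test_y, knn_pred, k):
    """For each correctly predicted test sample, average its k nearest
    training neighbours that share its class; the centroid keeps that class."""
    centroids = []
    for x, y, pred in zip(test_X, test_y, knn_pred):
        if pred != y:
            continue  # only correctly identified test samples contribute
        same = [i for i in knn_neighbors(train_X, x, k) if train_y[i] == y]
        if not same:
            continue
        center = [sum(train_X[i][d] for i in same) / len(same) for d in range(len(x))]
        centroids.append((center, y))
    return centroids


def predict_centroid(centroids, sample):
    """Assign the class of the nearest centroid (no training database needed)."""
    return min(centroids, key=lambda c: euclidean(c[0], sample))[1]
```

Once the centroids are built, only they need to be stored, which is what makes the model database independent.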
[234] In addition to the knn and centroid models described above, classification tree, logistic discriminant and neural network models were employed. The neural network is a simple feed-forward network that allows skip-layer connections and uses an entropy fitting criterion. Linear classifiers performed poorly with this data and quadratic classifiers performed modestly, so their results are not presented.
[235] (D) Cross Validation of the Models: Six training and testing sets were used to cross-validate knn. Gene selection ranking was performed on each training set. A number of different gene sets were used for each of the six sets, and the best GMM value was chosen to represent the performance of the model. Trees were pruned via ten-fold internal cross-validation (i.e., using subsets of the training set) for each training set, and each pruned tree was then used to predict the corresponding testing set, giving a GMM for each testing set. Trees perform gene selection via pruning, and from one to five genes were selected for each tree. The centroid model was five-fold cross-validated using random subsets of the testing set, and the mean GMM of the validation runs was used as the performance measure. The top five discriminating genes were used in the centroid models. The logistic discrimination used a stepwise backward selection process to determine the gene set during the training phase; three to six genes were typically selected via this process, and a single performance was then obtained using the corresponding testing set. A neural network was trained on each training set and then validated on the corresponding testing set. All 28 genes in the data set were used with the neural network model.
[236] (E) Results: Model performance is presented in Table 29. The knn model performed best overall. If the best common gene selection is used, knn is still the best, though its mean performance is then closer to that of the logistic and centroid models. The logistic and centroid models performed next best overall, and either could be used successfully, with a misclassification error of less than 25 percent, if a database-independent solution is preferred. Log transformation of the data produced mixed results when used with the logistic and neural network models, suggesting that such a transformation has little impact. The tree and neural network models performed the poorest on average; however, all of the models performed well for this type of data on at least some of the training and testing pairs, with the equivalent of a misclassification error of less than 20 percent. The knn, centroid and neural network models could be improved by a more thorough gene selection scheme.
[237] Table 30 presents logistic discrimination coefficients derived from this analysis. These coefficients may be used in a logistic discriminant model to obtain predictions of kidney toxicity when expression values for the indicated genes are determined using appropriate samples and an appropriate microarray expression detection system, such as the Rat CT array used to develop the Database.
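Applying a logistic discriminant amounts to evaluating the logistic function of a linear combination of expression values. The coefficient values below are placeholders, not the Table 30 coefficients; treating "yes" (toxic) as the modelled event and calling toxicity above a 0.5 probability are assumptions, as are the function names.

```python
import math


def logistic_toxicity_probability(intercept, coefficients, expression):
    """P(toxic) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j))).

    coefficients maps gene name -> coefficient; expression maps gene name
    -> measured expression value for one sample (same scale as the fit).
    """
    z = intercept + sum(coefficients[g] * expression[g] for g in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


def logistic_call(probability, threshold=0.5):
    """Hypothetical decision rule: "yes" (toxic) above the threshold."""
    return "yes" if probability > threshold else "no"


# Placeholder coefficients for two hypothetical genes.
coef = {"GENE_A": 1.2, "GENE_B": -0.8}
p = logistic_toxicity_probability(-1.0, coef, {"GENE_A": 2.5, "GENE_B": 1.0})
print(round(p, 3), logistic_call(p))
```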
[238] Similarly, a classification model for all of the data using a classification tree in S-Plus software provided the following rule for predicting toxicity: if Gadd45 < 1.474 AND Tissue inhibitor of metalloproteinases 1 < 1.786, then "No" (not toxic); otherwise "Yes" (toxic).
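The tree rule translates directly into a small function. This sketch assumes the two inputs are expression ratios on the same scale as the rule's thresholds; the function name is illustrative.

```python
def tree_rule(gadd45, timp1):
    """Example 7 classification-tree rule.

    gadd45: expression value for Gadd45
    timp1:  expression value for tissue inhibitor of metalloproteinases 1
    """
    if gadd45 < 1.474 and timp1 < 1.786:
        return "No"   # not toxic
    return "Yes"      # toxic: either gene at or above its threshold
```

Note that a value exactly equal to a threshold fails the strict "<" test and therefore leads to a "Yes" call.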
[239] For this model and rule, internal performance with the entire database was as follows: a total of 7 of 241 samples were misclassified, a misclassification error of 0.03. Of the yes class (toxic), 2 of 38 samples were misclassified, and of the no class (not toxic), 5 of 203 samples were misclassified, equivalent to misclassification errors of 0.053 and 0.025, respectively. The geometric mean performance measure is 0.961267. This model rule can be applied to obtain predictions of kidney toxicity when expression values for the indicated genes are determined using appropriate samples and an appropriate microarray expression detection system, such as the Rat CT array used to develop the Database.
References
1. Geoffrey J. McLachlan, Discriminant Analysis and Statistical Pattern Recognition, Wiley Series in Probability and Mathematical Statistics, 1992.
2. L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees, Chapman & Hall, 1984.
3. Linda A. Clark and Daryl Pregibon, "Tree-based Models", Chapter 9 of Statistical Models in S, John M. Chambers and Trevor J. Hastie, eds., Chapman & Hall Computer Science Series, 1993.
4. J. Ross Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1988.
5. B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
[240] Example 8 Use of Predictive Genes to Predict Kidney Toxicity for Samples External to the Database: (A) Materials and Methods: (A)(1) Animal Treatment and Tissue Harvest: Male Sprague-Dawley rats in groups of 3 were treated by intraperitoneal injection with a test compound (cephaloridine, 1500 mg/kg, or cisplatin, 20 mg/kg) or with the vehicle alone. At specified timepoints (6 h and 24 h) the rats were euthanized and tissues were collected. Kidney tissues were immediately placed into liquid nitrogen and frozen within 3 minutes of the death of the animal to ensure that mRNA did not degrade. The tissues were sent blinded for evaluation. The organs/tissues were then packaged into well-labeled, freezer-quality plastic bags and stored at -80 °C until needed for isolation of mRNA from a portion of the organ/tissue sample.
[241] (A)(2) Gene Expression Measurement: Isolation of RNA, preparation of cDNA labeled probes and hybridization procedures were as described in Example 1 Materials and Methods. Probes were hybridized to the rat CT Chip, which is the same array as used for the database.
[242] (B) Data Analysis: Array data from the samples were loaded into GeneSpring software using the same procedures as used for the database. No kidney toxicity parameters were entered for these samples. The Predict Parameter Value tool was used to make toxicity predictions using different Combo gene sets from the 24 hour data, with the entire database as the training set. Other settings used were 10 nearest neighbors and a p-value ratio cutoff of 0.5.
[243] (C) RESULTS: Table 31 presents predictions for samples that were external to the database used to derive the predictive genes. The samples were kidney samples from replicate animals treated with cephaloridine and cisplatin. One of these compounds (cisplatin) is also represented in the database (at a different dose level) and the other compound, cephaloridine, is not in the database. Histopathology conducted on the kidney samples verified that these treatments induced kidney tubular necrosis. Each of the Combo gene sets correctly predicted that these samples had expression patterns indicative of kidney toxicity.
[244] These results clearly demonstrate that the discovered sets of predictive genes, in conjunction with the database and the k-nearest neighbor model, can accurately predict toxicity from microarray data that are external to the database. Because the database consists mostly of non-toxic samples, the prediction of toxicity for these samples is significantly different from what would be expected by chance. It is also noteworthy that three different sets of predictive genes are capable of making accurate predictions.
[245] Example 9 Clustering Analysis to Identify a Coordinately Behaving Subset of Predictive Genes
(A) Materials and Methods
(A)(1) Gene Expression Data: Gene expression data used for cluster analysis were the 24 hour kidney expression data of the 28 genes of the Combo 6 predictive gene set. These data are shown in Table 39.
[246] (B) Cluster Analysis: Cluster analysis tools used in these analyses included the K-means and gene tree features of GeneSpring software and Ward's clustering algorithm in S-Plus statistical analysis software.
[247] (C) Results: Figure 7 presents the combined results of K-means and gene tree hierarchical clustering analyses. Combo 6 (28 genes) was clustered using K-means (number of clusters 10, maximum iterations 100, similarity measure Pearson) and gene tree (separation ratio 0.5, minimum distance 0.001, similarity measure Pearson). The K-means clusters are colored according to the corresponding sets 1 to 10. The gene names on the display, from top to bottom, correspond to the cluster bars from left to right.
[248] Ward's cluster analysis results are shown in Figure 8. The cluster tree for the Combo 6 genes is shown, with the best cut line indicating 7 clusters. Gene names corresponding to the numbers are given in tabular form below the diagram.
[249] Example 10. Use of Expression Profiles of Predictive Genes in a Computer Program Product to Predict Renal Toxicity
(A) Materials and Methods
(A1) Overview of Computer Program Product: A computer program product produces a prediction of the occurrence of kidney toxicity using input gene expression data from test samples. The model and data for the computer program have been primarily validated using Phase-1 Rat CT arrays and Phase-1 Rat CT expression data in the Phase-1 TOXBank database, as described in previous examples. In other embodiments, expression data from other expression platforms (such as TaqMan using SYBR Green technology) may also be used in the computer program product. Those skilled in the art are capable of developing and validating scaling factors to adjust for differences in differential gene expression sensitivity and responsiveness among the different platforms used with the computer program product.
[250] The computer program product uses the Predictive Model described in the previous examples. It contains an encrypted training data set that includes differential gene expression values and an endpoint classification for each sample in the training set. The training samples are from the same timepoint (e.g., gene expression measured at 24 hours after dosing), and the classification is binary for the specific endpoint (e.g., kidney tubular necrosis or no kidney tubular necrosis). The computer program product also contains encrypted lists of the Combo sets of predictive genes (also called Predictagen sets). Inputs to the Predictive Model are the k value (number of nearest neighbors) and the type of distance measure to be used in the model. Data inputs for the Predictive Model include the Combo list(s) of predictive genes and the training set, as encrypted "plug-in" files, and specification of the test data file(s) containing expression data.
[251] The initial prediction is made after calculating, for each classification, the probability that the tabulated votes differ from the proportion of that classification in the training set. A statistical test (based on the hypergeometric distribution) is run for each classification and p-values are calculated. The classification prediction is the class that has the highest p-value. A classification cutoff procedure uses the p-value ratio (1 - p0/p1, where p0 is the p-value for the class not predicted and p1 is the p-value for the predicted class). If the p-value ratio does not exceed a specified cutoff value (input to the computer program product by the user), a prediction is not made. The Prediction Machine can be used with multiple Predictagen sets, with the classifications, p-values and p-value ratios calculated as above for each set. In this case an overall prediction is made by combining the predictions of the individual Predictagen sets, each weighted by a performance number. The overall certainty of this combined prediction is calculated by a paired t-test, using the p-value ratio and (1 - p-value ratio) for each Predictagen set as a pair of values. The certitude is 1 - p, where p is the p-value of the paired t-test.
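The single-Predictagen voting step can be sketched for the two-class case. The text does not state which tail of the hypergeometric distribution is used; this sketch assumes each class's p-value is the lower-tail probability P(X ≤ observed votes), which makes the class with the highest p-value the one whose vote count is high relative to its training-set proportion and keeps 1 - p0/p1 between 0 and 1. The weighted combination across Predictagen sets and the paired t-test certitude are not shown, and all names are hypothetical.

```python
from math import comb


def hypergeom_cdf(x, N, K, n):
    """P(X <= x) for X ~ Hypergeometric(population N, K in class, n draws)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(x + 1)) / comb(N, n)


def knn_vote_call(neighbor_labels, train_labels, cutoff=0.5):
    """Two-class call from the labels of the k nearest training neighbours.

    Returns (predicted class or None for a non-call, p-value ratio).
    """
    N, k = len(train_labels), len(neighbor_labels)
    classes = sorted(set(train_labels))
    # Assumed per-class "p-value": lower-tail probability of the observed votes.
    p = {c: hypergeom_cdf(neighbor_labels.count(c), N, train_labels.count(c), k)
         for c in classes}
    predicted = max(classes, key=p.get)  # class with the highest p-value
    other = next(c for c in classes if c != predicted)
    ratio = 1 - p[other] / p[predicted]  # 1 - p0/p1
    if ratio <= cutoff:
        return None, ratio  # ratio does not exceed the cutoff: no call
    return predicted, ratio
```

With 25 training samples (5 toxic) and k = 10, five toxic neighbours give a "yes" call, while two toxic neighbours, close to the training-set proportion, give a ratio of about 0.02 and therefore no call at the 0.5 cutoff.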
[252] (A2) Computer Program Product Input: Encrypted training data are included as a plug-in module for the software. User input includes specification of the encrypted Predictagen gene lists and of the samples for prediction (files with gene expression data). Additional specifications are the distance measure to be used in the knn model (currently Euclidean), the number of neighbors and a certitude cutoff (p-value ratio cutoff).
[253] (A3) Program Operation: The program is executed as follows. First, on the Prediction tab, the 'Load Predictagens' button is clicked to load the desired Predictagen(s). In this example, the 24 hour kidney Predictagen is loaded. Next, a Predictagen in the Predictagen sets list box is highlighted and the 'Make Predictor' button is clicked (in this example, 24 hour kidney). If necessary, the predictor is highlighted and the 'Configure' button is clicked to set parameter values. Next, the 'Load Samples' button is clicked. Sample data are loaded as text files in the format shown in Table 44. Samples are then selected from the Samples list box with the left mouse button, holding down the CTRL key to make multiple selections. In this example, 3 kidney samples from rats treated with 25 mg/kg paraquat and 3 kidney samples from rats treated with 80 mg/kg phenobarbital are selected. The samples were treated and processed for gene expression analysis as described in the previous examples. The 'Add to predictor' button is then clicked, followed by the 'Predict' button, to generate the program's output.
[254] On the Output tab, the 'Summary', 'Detail', or 'Full' radio button is selected to control the amount of information displayed about the prediction. The 'Tabular Report' checkbox is checked to put the output in a tab-delimited text format that can be loaded into Excel. The 'Save', 'Copy', 'Print', and 'Clear' buttons are used to save the output, copy it to the clipboard, print it, or clear the output window before another prediction.
[255] (A4) Computer Program Product Output: The summary view displays sample information, the call (kidney tubular necrosis or negative), and the overall certitude. The detail view presents, in addition to the summary view information, the individual calls and the 1 - p-value ratio for each Predictagen. The full view presents, for each sample and Predictagen gene list, the specific nearest neighbors and their classifications (votes), along with the hypergeometric p-values for each classification, followed by the detail view information.
[256] (B) Test Data: Table 43 displays the test set of gene expression data used to generate predictions. The table shows the correct classification of kidney samples that have histopathology (kidney tubular necrosis) or no histopathology.
[257] (C) Results: Table 42 displays the summary output of the computer program after loading. Two out of three of the paraquat samples (sample #s 16477 and 16479) were correctly predicted for rat kidney tubular necrosis (with certitudes of 0.472 and 0.796). Three out of three of the phenobarbital samples were correctly predicted as negative for kidney tubular necrosis. Table 43 displays the detailed output of the computer program, which shows the individual performances of the 24 hour kidney Combo sets and the overall certitude score.
[258] Example 11 Selection and Validation of Protein Biomarker Candidates.
Protein marker candidates can be selected from biomarker genes using a number of parameters. Table 44 presents biomarker genes sorted by their mean individual-gene predictive performance (percent correct calls), for all genes exhibiting ≥ 60% correct calls. Each gene was then evaluated for evidence that it codes for a protein, clearly a key criterion for a protein marker. The next parameter evaluated was the relative transcriptional response in toxic versus non-toxic samples. If protein levels are proportional to RNA levels, these columns indicate the relative potential magnitude of the protein marker in toxic and non-toxic samples, so the better marker candidates should be those genes exhibiting the larger differences in RNA expression. A number of additional criteria can be considered, including protein molecular weight (MW), occurrence of the protein in tissues other than the target tissue, and availability of antibodies that recognize the protein. Whether the protein is secreted may also be an important criterion. The last column in Table 44 indicates that 3 of the proteins are known to be secreted, and Table 37 lists the genes from the total list of predictive genes whose protein products are known to be secreted. Secretion may be useful in identifying proteins that could serve as biomarkers in serum or possibly in other matrices such as urine or saliva.
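The screening criteria above can be expressed as a simple filter-and-sort. In this sketch the record fields mirror the Table 44 columns described in the text, but the ordering rule (largest toxic versus non-toxic fold-induction difference first, then higher percent correct) is an illustrative assumption, not the ranking used to build the table; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    gene: str
    percent_correct: float  # mean individual-gene percent correct calls
    neg_fi: float           # mean fold induction, non-toxic treatments (Neg FI)
    pos_fi: float           # mean fold induction, toxic treatments (Pos FI)
    codes_protein: bool     # evidence that the gene codes for a protein
    secreted: bool          # protein known to be secreted


def rank_candidates(cands: List[Candidate], min_correct: float = 60.0) -> List[Candidate]:
    """Keep protein-coding genes with >= min_correct percent correct calls,
    then order by the toxic vs non-toxic expression difference (largest
    first), breaking ties by percent correct (hypothetical ordering)."""
    kept = [c for c in cands if c.codes_protein and c.percent_correct >= min_correct]
    return sorted(
        kept,
        key=lambda c: (abs(c.pos_fi - c.neg_fi), c.percent_correct),
        reverse=True,
    )


def secreted_only(cands: List[Candidate]) -> List[Candidate]:
    """Subset of interest for serum, urine or saliva assays."""
    return [c for c in cands if c.secreted]
```

Criteria such as molecular weight or antibody availability could be added as further fields and filters in the same way.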
[259] Protein markers can be rapidly evaluated by testing for levels of the identified marker candidates using any of a number of analytical techniques for measuring specific protein levels such as Western blots or ELISA assays. Samples for analysis may be selected from a tissue bank such as that described in Example 1. Selection for analysis would include samples from toxic treatments and samples from non-toxic treatments. Quantitative protein marker data can be analyzed using the same approaches described in Example 2 for evaluation and validation of predictive performance of the protein markers.
[260] Experimental data demonstrating this concept, and the identification and validation of a protein marker, were developed using antibodies to clusterin and insulin-like growth factor binding protein 1. These genes were selected from the list of genes in Table 44 based on the availability of antibodies. Insulin-like growth factor binding protein 1 is known to be secreted. Serum protein from four pairs of animals (2 pairs treated with non-toxic compounds and 2 pairs treated with kidney-toxic compounds) was analyzed using Western blot methods known to those skilled in the art. The Western blot was probed with antibodies to insulin-like growth factor binding protein 1 and clusterin.
[261] A scanned autoradiogram of the results is presented in Figure 9. Clusterin appeared to be present at approximately equal abundance in the samples. Insulin-like growth factor binding protein 1 protein levels appeared to be proportional to the gene expression levels observed in the kidneys of these animals and were clearly elevated in the kidney-toxic treatments compared with the non-toxic treatments. Serum insulin-like growth factor binding protein 1 protein levels correlated at the individual-animal level with the transcript signals. These data indicate that predictive markers identified through transcript measurement and analysis can also be predictive protein markers.
[262] It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes to the same extent as if each individual publication, patent or patent application were specifically and individually indicated to be so incorporated by reference.
Figure imgf000069_0001
Figure imgf000070_0001
* Values in parentheses indicate that array data are only available for indicated time points
** Histopathology tubular necrosis severity scores: 1 = not remarkable; 2 and higher indicate histopathology of increasing severity.
Table 2 List of Genes, Whose Expression at 24h Directly Correlates with Kidney Tubular Necrosis at 72h, Ranked by Pearson Correlation Coefficient
Figure imgf000071_0001
Figure imgf000072_0001
Table 3 List of Genes, Whose Expression at 24h Inversely Correlates with Kidney Tubular Necrosis at 72h, Ranked by Spearman Correlation Coefficient
Figure imgf000073_0001
Figure imgf000074_0001
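Tables 2 and 3 rank genes by the correlation between 24 h expression and the 72 h tubular necrosis severity score, using Pearson correlation for directly correlated genes and Spearman correlation for inversely correlated genes. A minimal sketch of such a ranking follows; it assumes one expression value and one severity score per animal, and the function names are hypothetical. Spearman correlation is computed as the Pearson correlation of average ranks, the usual treatment of ties.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def rank_genes(expr: Dict[str, List[float]], severity: List[float]) -> Tuple[List[str], List[str]]:
    """Return (genes by descending Pearson r, genes by ascending Spearman rho)."""
    direct = sorted(expr, key=lambda g: pearson(expr[g], severity), reverse=True)
    inverse = sorted(expr, key=lambda g: spearman(expr[g], severity))
    return direct, inverse
```

Rank-based Spearman correlation is less sensitive than Pearson correlation to a few extreme expression values, which matters when severity is an ordinal score.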
Table 4 Distribution of Compounds* in Individual Training and Test Sets for 24 Hour Kidney Data
Training and Test Set A
Figure imgf000075_0001
Training and Test Set 1
Figure imgf000075_0002
Figure imgf000076_0001
Training and Test Set 2
Figure imgf000076_0002
Training and Test Set 3
Figure imgf000076_0003
Figure imgf000077_0001
Training and Test Set 4
Figure imgf000077_0002
Training and Test Set 5
Figure imgf000077_0003
Figure imgf000078_0001
* For abbreviations please see Table 1 (Compound, Dose, Abbreviation, etc.)
** Negative = Compounds that did not elicit histopathology (score = 1)
Positive = Compounds that did elicit histopathology (score of 2 or greater)
Figure imgf000079_0001
Figure imgf000080_0001
Figure imgf000081_0001
Figure imgf000082_0001
RCT-165
RCT-252
RCT-101
RCT-111
Protein O-mannosyl transferase 1 (Pomt1)
RCT-129
Apoptosis-regulating basic protein
RCT-140
RCT-147
RCT-153
RCT-164
RCT-166
RCT-18
RCT-181
RCT-185
RCT-206
RCT-220
RCT-221
Inositol polyphosphate multikinase (Ipmk)
RCT-268
RCT-276
RCT-279
RCT-31
RCT-36
RCT-43
RCT-61
RCT-72
RCT-76
Renal organic anion transporter
Retinoid X receptor alpha
Retinol dehydrogenase type III
Retinol-binding protein (RBP)
Sarcoplasmic reticulum calcium ATPase
Sulfotransferase K2
Superoxide dismutase Cu/Zn
T-cell cyclophilin
Thiol-specific antioxidant (natural killer cell-enhancing factor B)
Thiopurine methyltransferase
Thrombin receptor (PAR-1)
* Combination category is the number of training/test set gene list occurrences.
Table 6 Randomly Selected Gene Subsets from 24 H Combo All (216 Genes)*
Figure imgf000084_0001
Figure imgf000084_0002
Figure imgf000084_0003
* Genes were randomly selected from the Combo All list of predictive genes (216 genes) by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes.
Table 7 Randomly Selected Gene Subsets from 24 H Combo 6 Gene Set (28 Genes)*
Figure imgf000085_0001
Figure imgf000085_0002
* Genes were randomly selected from the Combo 6 list of predictive genes (28 genes) by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes.
Table 8 Randomly Selected Gene Subsets from 24 H Combo 5 Gene Set (25 genes)*
Figure imgf000087_0001
Figure imgf000087_0002
Figure imgf000087_0003
* Genes were randomly selected from the Combo 5 list of predictive genes (25 genes) by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes.
Table 9 Randomly Selected Gene Subsets from 24 H Combo 4 Gene Set (23 genes)*
Figure imgf000088_0002
Figure imgf000088_0001
Matrix metalloproteinase-1
Multidrug resistant protein-1
Organic cation transporter 3
RCT-180
RCT-240
RCT-38
Pyruvate kinase, muscle
Superoxide dismutase Mn
Ubiquitin conjugating enzyme (RAD 6 homologue)
Figure imgf000089_0001
* Genes were randomly selected from the Combo 4 list of predictive genes (23 genes) by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes.
Table 10 Randomly Selected Gene Subsets from Array Genes Excluding Combo All Set*
Figure imgf000090_0001
Figure imgf000090_0002
Figure imgf000090_0003
Figure imgf000091_0001
* Genes were randomly selected from the entire array list of genes excluding the Combo All 216 predictive genes by assigning a random number to each gene, sorting by the random number and selecting the appropriate number of sorted genes.
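The footnotes to Tables 6 to 10 describe selecting a random subset by assigning a random number to each gene, sorting on that number and taking the first genes in the sorted order. A sketch of that procedure follows; the function name and the optional `exclude` and `seed` arguments are illustrative additions (the seed simply makes a draw reproducible).

```python
import random
from typing import List, Optional, Sequence


def random_gene_subset(
    genes: Sequence[str],
    n: int,
    exclude: Sequence[str] = (),
    seed: Optional[int] = None,
) -> List[str]:
    """Pick n genes by random-number sort, optionally excluding some genes
    (e.g. drawing from the array while excluding a predictive list)."""
    rng = random.Random(seed)
    excluded = set(exclude)
    pool = [g for g in genes if g not in excluded]
    if n > len(pool):
        raise ValueError("subset larger than the available gene pool")
    keyed = [(rng.random(), g) for g in pool]  # assign a random number to each gene
    keyed.sort()                               # sort by the random number
    return [g for _, g in keyed[:n]]           # keep the first n sorted genes
```

Sorting on random keys yields a uniformly random subset, equivalent to `random.sample`, but it follows the procedure as the footnotes describe it.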
Table 11 Kidney Toxicity Individual Sample Prediction Values for 24 Hour Data Predictive Genes (Combined List and Subsets)
Figure imgf000092_0001
* Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 24 hour array data and gene lists. Unit of prediction was the animal and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment.
** Standard prediction measures were used as defined in Materials and Methods. These include:
Accuracy = Proportion of total number of predictions that are correct
False positive rate = Proportion of negative cases that are incorrectly classified as positive
False negative rate = Proportion of positive cases that are incorrectly classified as negative
Geometric mean = Performance measure that takes into account proportion of positive and negative cases
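The measures defined above can be computed as in the sketch below. Two details are assumptions: no-calls are counted as incorrect in the accuracy, following the convention stated in the footnotes to Tables 12 and 13, and the geometric mean is taken as the square root of sensitivity times specificity, a common choice that the text does not state explicitly. The example input reproduces the six paraquat and phenobarbital test samples whose calls are listed in Table 43.

```python
from math import sqrt
from typing import Dict, Optional, Sequence


def prediction_measures(truth: Sequence[bool], calls: Sequence[Optional[bool]]) -> Dict[str, float]:
    """truth: True = toxic (positive). calls: True, False, or None (no-call).

    A no-call is never counted as correct, so it lowers accuracy and the
    sensitivity/specificity used by the geometric mean, but it is not counted
    as a false positive or false negative classification.
    """
    pos_calls = [c for t, c in zip(truth, calls) if t]
    neg_calls = [c for t, c in zip(truth, calls) if not t]
    true_pos = sum(c is True for c in pos_calls)
    true_neg = sum(c is False for c in neg_calls)
    sensitivity = true_pos / len(pos_calls)
    specificity = true_neg / len(neg_calls)
    return {
        "accuracy": (true_pos + true_neg) / len(truth),
        "false_positive_rate": sum(c is True for c in neg_calls) / len(neg_calls),
        "false_negative_rate": sum(c is False for c in pos_calls) / len(pos_calls),
        "geometric_mean": sqrt(sensitivity * specificity),
    }


# Table 43 samples: three paraquat (toxic) then three phenobarbital (negative).
example = prediction_measures(
    truth=[True, True, True, False, False, False],
    calls=[True, False, True, False, False, False],
)
```

For these six samples the accuracy is 5/6, with one false negative (paraquat sample 16478) and no false positives.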
Table 12 Kidney Toxicity Compound-Dose Prediction Values for 24 Hour Data Predictive Genes (Combined List and Subsets)
Figure imgf000093_0001
* Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 24 hour array data and gene lists. Unit of prediction was compound-dose level and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment. Prediction for compound-dose was based on a majority of individual animal calls. Where there was an equal number of opposing calls, or no calls, a no-call was assigned to the compound-dose level.
** Standard prediction measures were used as defined in Materials and Methods. As described in Materials and Methods in cases where no prediction was made because the p-value ratio exceeded the cutoff-value (generally 0.5) the non-call was considered to be incorrect.
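The compound-dose rule in the footnotes above can be sketched as follows. It assumes that the majority is taken over the animal calls actually made (True for toxic, False for negative, None for an animal-level no-call); the text does not say whether animal-level no-calls count toward the majority. The function name is hypothetical.

```python
from typing import Optional, Sequence


def compound_dose_call(animal_calls: Sequence[Optional[bool]]) -> Optional[bool]:
    """Majority of animal calls: True = toxic, False = negative, None = no-call.

    Returns None (a compound-dose no-call) when toxic and negative calls are
    tied, which includes the case where no animal received a call.
    """
    toxic = sum(c is True for c in animal_calls)
    negative = sum(c is False for c in animal_calls)
    if toxic == negative:
        return None
    return toxic > negative
```

Under the scoring convention above, a None returned here would then be counted as an incorrect prediction.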
Table 13 Kidney Toxicity Compound Prediction Values for 24 Hour Data Predictive Genes (Combined List and Subsets)
Figure imgf000094_0001
* Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 24 hour array data and gene lists. Unit of prediction was the compound and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment. Compounds were considered toxic if any compound-dose level for that compound was predicted as toxic.
** Standard prediction measures were used as defined in Materials and Methods. As described in Materials and Methods in cases where no prediction was made because the p-value ratio exceeded the cutoff-value (generally 0.5) the non-call was considered to be incorrect.
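The compound-level rule (toxic if any compound-dose level is predicted toxic) can be sketched as below. How a compound with no toxic calls but some dose-level no-calls should be treated is not stated; this sketch assumes it is called negative if at least one dose level was called negative and is a no-call otherwise. The function name is hypothetical.

```python
from typing import Optional, Sequence


def compound_call(dose_calls: Sequence[Optional[bool]]) -> Optional[bool]:
    """Compound-level call from compound-dose calls (True/False/None)."""
    if any(c is True for c in dose_calls):
        return True   # toxic if any dose level is predicted toxic
    if any(c is False for c in dose_calls):
        return False  # assumption: no toxic dose and at least one negative dose
    return None       # every dose level was a no-call (or the list was empty)
```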
Table 14 Order of Genes Used for Cumulative Analysis of Predictive Performance of Predictive Combo Gene Sets*
Figure imgf000095_0001
Combo 5 Gene Set
RCT-182
Carbonic anhydrase III, sequence 2
RCT-258
60S ribosomal protein L6
RCT-274
Multidrug resistant protein-3
Osteopontin
Beta-actin, sequence 2
Beta-tubulin, class I
Zinc finger protein
Canalicular multispecific organic anion transporter
Keratinocyte growth factor
Alpha-fibrinogen
Ribosomal protein S9
RCT-60
RCT-179
Thymosin beta-10
Proliferating cell nuclear antigen gene
IgE binding protein
RCT-211
RCT-49
RCT-50
Heme binding protein 23
MHC class I antigen RT1.A1(f) alpha-chain
RCT-126
Combo 4 Gene Set
Pancreatic secretory trypsin inhibitor type II (PSTI-II)
RCT-240
Epidermal growth factor
Matrix metalloproteinase-1
RCT-287
Connexin-32
ATP-stimulated glucocorticoid-receptor translocation promoter (Gyk)
Superoxide dismutase Mn
Pyruvate kinase, muscle
Ferritin H-chain
Multidrug resistant protein-1
RCT-293
Interleukin-1 beta
Organic cation transporter 3
Preproalbumin, sequence 2 (alternate clone 1)
CD44 metastasis suppressor gene
Ubiquitin conjugating enzyme (RAD 6 homologue)
RCT-38
Ref-1
Ceruloplasmin
Hypoxanthine-guanine phosphoribosyltransferase
RCT-138
RCT-180
* Genes are listed in the order in which they were used for cumulative analysis of predictive performance
Table 15 Individual Gene Predictions: Combo 6
Figure imgf000098_0001
Table 16 Individual Gene Predictions: Combo 5
Figure imgf000099_0001
Table 17 Kidney Toxicity Individual Sample Prediction Values for 24 Hour Data with Random Gene Subsets
Figure imgf000100_0001
Figure imgf000101_0001
* Randomly selected sets of genes derived from the Combo sets.
* Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 24 hour array data and random subsets of genes. Unit of prediction was the animal and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment.
** Standard prediction measures were used as defined in Materials and Methods. As described in Materials and Methods in cases where no prediction was made because the p-value ratio exceeded the cutoff-value (generally 0.5) the non-call was considered to be incorrect.
Table 18 Comparison of Predictivity for True Kidney Toxicity Classification and Random Classification Using Combo Gene Sets and Random Subsets and 24h data
Figure imgf000102_0001
For Combo lists, either all genes or random subsets of them were used. All-Pred used genes randomly selected from genes that were present on the array but not in the predictive list.
Accuracy = proportion of the total number of predictions that are correct. Non-calls are counted as incorrect predictions. Accuracy was calculated for correct classifications of kidney toxicity assigned to the samples and for randomized classifications in the same proportions as the correct classifications. Values presented are the mean accuracy values for 6 training/test sets with minimum and maximum accuracy values.
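Randomized classifications "in the same proportions as the correct classifications" can be produced by permuting the true labels, which keeps the numbers of positive and negative samples unchanged. The sketch below does this and scores calls with no-calls counted as incorrect; the function names and the `seed` argument are illustrative.

```python
import random
from typing import List, Optional, Sequence


def permuted_labels(labels: Sequence[bool], seed: Optional[int] = None) -> List[bool]:
    """Shuffle the true labels: same class proportions, random assignment."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def accuracy(truth: Sequence[bool], calls: Sequence[Optional[bool]]) -> float:
    """Proportion of correct calls; a no-call (None) counts as incorrect."""
    correct = sum(c is not None and c == t for t, c in zip(truth, calls))
    return correct / len(truth)
```

Comparing accuracy against the true labels with accuracy against several permuted label sets gives a baseline for how much of the predictive performance could arise by chance.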
Table 19 Distribution of Compounds* in Individual Training and Test Sets for 6 Hour Kidney Data
Training and Test Set A
Figure imgf000104_0001
Random Training and Test Set 1 (Randomly assigned)
Figure imgf000104_0002
Figure imgf000105_0001
Random Training and Test Set 2 (Randomly assigned)
Figure imgf000105_0002
Random Training and Test Set 3 (Randomly assigned)
Training Set 3 | Training Set 3 | Test Set 3 | Test Set 3 Positive
Figure imgf000106_0001
Random Training and Test Set 4 (Randomly assigned)
Figure imgf000106_0002
Figure imgf000107_0001
* For abbreviations please see Table 1 (Compound, Dose, Abbreviation, etc.)
** Negative = Compounds that did not elicit histopathology (score = 1)
Positive = Compounds that did elicit histopathology (score of 2 or greater)
Table 20 List of Genes, Whose Expression at 6 h Directly Correlates with Kidney Tubular Necrosis at 72h, Ranked by Pearson Correlation Coefficient
Figure imgf000108_0001
Figure imgf000109_0001
Figure imgf000110_0001
Figure imgf000111_0001
Figure imgf000112_0001
Table 21 List of Genes, Whose Expression at 6 h Inversely Correlates with Kidney Tubular Necrosis at 72h, Ranked by Spearman Correlation Coefficient
Figure imgf000113_0001
Figure imgf000114_0001
Table 22 List of genes whose expression at 6 hours is predictive of kidney toxicity at 72 hours
Figure imgf000115_0001
Figure imgf000116_0001
Figure imgf000117_0001
Figure imgf000118_0001
Figure imgf000119_0001
* Combination category is the number of training/test set gene list occurrences.
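The combination category counts how many of the training/test set gene lists contain a given gene, so a gene in category 6 appears on all six lists. Under this reading the Combo All list corresponds to the union of the lists. A sketch of that tally follows; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def combination_categories(gene_lists: Iterable[Iterable[str]]) -> Dict[int, List[str]]:
    """Map each occurrence count n to the sorted genes found on exactly n lists."""
    counts = Counter(gene for genes in gene_lists for gene in set(genes))
    categories: Dict[int, List[str]] = {}
    for gene, n in counts.items():
        categories.setdefault(n, []).append(gene)
    return {n: sorted(genes) for n, genes in categories.items()}


def combo_all(gene_lists: Iterable[Iterable[str]]) -> List[str]:
    """Union of all lists (genes in any combination category)."""
    return sorted({gene for genes in gene_lists for gene in genes})
```

Each list is converted to a set before counting, so a gene repeated within one list is still counted once for that list.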
Table 23 Kidney Toxicity Compound-Dose Prediction Values for 6 Hour Data Predictive Genes (Combined List and Subsets)
Figure imgf000120_0001
Table 24 Distribution of Compounds* in Individual Training and Test Sets for 72 Hour Kidney Data
Training and Test Set A
Figure imgf000121_0001
Training and Test Set 1
Figure imgf000121_0002
Figure imgf000122_0001
Figure imgf000123_0001
Training and Test Set 5
Figure imgf000124_0001
* For abbreviations please see Table 1 (Compound, Dose, Abbreviation, etc.)
** Negative = Compounds that did not elicit histopathology (score = 1)
Positive = Compounds that did elicit histopathology (score of 2 or greater)
Table 25 List of Genes, Whose Expression at 72 h Directly Correlates with Kidney Tubular Necrosis at 72h, Ranked by Pearson Correlation Coefficient
Figure imgf000125_0001
Figure imgf000126_0001
Table 26 List of Genes, Whose Expression at 72 h Inversely Correlates with Kidney Tubular Necrosis at 72h, Ranked by Spearman Correlation Coefficient
Figure imgf000127_0001
Figure imgf000128_0001
Table 27 List of genes whose expression at 72 hours is predictive of kidney toxicity at 72 hours
Figure imgf000129_0001
Figure imgf000130_0001
Figure imgf000131_0001
Figure imgf000132_0001
Figure imgf000133_0001
Figure imgf000134_0001
Combination category is the number of training/test set gene list occurrences.
Table 28 Kidney Toxicity Compound-Dose Prediction Values for 72 Hour Data Predictive Genes (Combined List and Subsets)
Figure imgf000135_0001
* Prediction measures are given as means and range of values (in parentheses) for six training/test sets using 72 hour array data and gene lists. Unit of prediction was the animal and the predictive classification was for kidney tubular necrosis observed at 72 hours after treatment.
** Standard prediction measures were used as defined in Materials and Methods. As described in Materials and Methods, in these analyses cases where no prediction was made because the p-value ratio exceeded the cutoff value (generally 0.5) were counted as incorrect.
Table 29 Predictive Performance of Various Models
Figure imgf000136_0001
Table 30 Logistic Discrimination Coefficients
Figure imgf000137_0001
Table 31 Prediction of Kidney Toxicity for Samples External to Database
Figure imgf000138_0001
Figure imgf000139_0001
* All genes used for Combo Gene Lists.
** Prediction values are output from the prediction program. Values include the prediction (yes = kidney toxicity predicted, no = no kidney toxicity predicted), the numbers of yes and no votes from the 10 nearest neighbors, the p-values for the no and yes votes, and the p-value ratio for the predicted class over the class not predicted. A p-value ratio cutoff of 0.5 was used.
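The call rule in this footnote can be sketched as follows: the predicted class is the one with the smaller p-value, the ratio is that p-value divided by the other class's p-value, and no call is made when the ratio exceeds the cutoff. Returning the ratio alongside the call lets a caller report (1 - p-value ratio) as in the detail view. The function name is hypothetical, and only two classes are handled.

```python
from typing import Dict, Optional, Tuple


def call_from_p_values(p_values: Dict[str, float], cutoff: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (predicted class or None for a no-call, p-value ratio)."""
    if len(p_values) != 2:
        raise ValueError("expected p-values for exactly two classes")
    (best, p_best), (_, p_other) = sorted(p_values.items(), key=lambda kv: kv[1])
    # If both p-values are zero there is no basis for preferring either class.
    ratio = p_best / p_other if p_other > 0 else 1.0
    if ratio > cutoff:
        return None, ratio
    return best, ratio
```

For example, p-values of 0.01 (yes) and 0.5 (no) give a call of "yes" with a ratio of 0.02, whereas 0.3 and 0.4 give a ratio of 0.75 and therefore a no-call.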
Figure imgf000140_0001
Figure imgf000141_0001
Figure imgf000142_0001
Figure imgf000143_0001
Figure imgf000144_0001
Figure imgf000145_0001
Figure imgf000146_0001
Figure imgf000147_0001
* A Combo entry number indicates that the gene was on the predictive list for that time point and the number of occurrences of that gene on optimal combined training/test set lists. "Not Found" indicates that the gene was not on the optimal combined list for that time point.
Table 34 RCT genes (ESTs) Predictive for Kidney Tubular Necrosis: Best Homology Matches
Figure imgf000149_0001
Figure imgf000150_0001
Figure imgf000151_0001
Figure imgf000152_0001
* Homologies are given from BLAST searches using the Phase 1 RCT sequence as the query sequence and the GenBank NR database as the target sequence database. The best BLAST homology sequence observed is given. In general, "no significant homology" indicates that no BLAST match with a bit score > 100 was observed. BLAST searches in this category were conducted as recently as February 2002.
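The homology rule in this footnote (report the best match, and treat the absence of any match with a bit score above 100 as no significant homology) can be applied to BLAST+ tabular output. The sketch below assumes the default 12-column `-outfmt 6` layout, in which the bit score is the last column; the function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(lines: Iterable[str], min_bits: float = 100.0) -> Dict[str, Tuple[str, float]]:
    """Best subject per query among hits with bit score > min_bits.

    Queries absent from the result had no hit above the threshold, i.e.
    "no significant homology" under the rule described in the text.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if bits <= min_bits:
            continue
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return best
```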
Table 35 Fifty-three Genes that are Predictive at all Three Time Points
Figure imgf000154_0001
Figure imgf000155_0001
Table 36 Twenty-three Genes that are the most predictive across the time points
Figure imgf000156_0001
Table 37 Kidney Toxicity Predictive Genes Whose Protein Products Are Known to be Secreted
Ceruloplasmin
Colony-stimulating factor-1
Complement component C3
Cystatin C
Epidermal growth factor
Ferritin H-chain
Fibrinogen gamma chain
Interleukin-1 beta
Interleukin-10
Interleukin-18
Keratinocyte growth factor
Macrophage inflammatory protein-1 alpha
Macrophage inflammatory protein-2 alpha
Major acute phase protein alpha-1
Mullerian inhibiting substance
NGF-inducible anti-proliferative putative secreted protein (PC3)
Pancreatic secretory trypsin inhibitor type II (PSTI-II)
T-cell cyclophilin
Thioredoxin-1 (Trx1)
Tissue factor
Tissue inhibitor of metalloproteinases- 1
Transferrin
Vascular endothelial growth factor
Figure imgf000157_0001
Figure imgf000158_0001
Table 43 Detailed Output of Predictive Computer Software Product
Sample paraquat 16477 RatKidney 25mg/kg 24h 503r#3132
Predictagen Performance Kidney Tubular Necrosis Negative
24hKidneyCombo1.txt 1.000 0.752
24hKidneyCombo2.txt 1.000
24hKidneyCombo3.txt 1.000 0.752
24hKidneyCombo4.txt 1.000 0.584
24hKidneyCombo5.txt 1.000 0.997
24hKidneyCombo6.txt 1.000 0.977
Prediction: Kidney Tubular Necrosis with certitude 0.472
Sample paraquat 16478 RatKidney 25mg/kg 24h 503r#3133
Predictagen Performance Kidney Tubular Necrosis Negative
24hKidneyCombo1.txt 1.000 0.752
24hKidneyCombo2.txt 1.000 0.752
24hKidneyCombo3.txt 1.000 0.752
24hKidneyCombo4.txt 1.000 0.752
24hKidneyCombo5.txt 1.000 0.752
24hKidneyCombo6.txt 1.000 0.752
Prediction: Negative with certitude 0.999
Sample paraquat 16479 RatKidney 25mg/kg 24h 503r#3134
Predictagen Performance Kidney Tubular Necrosis Negative
24hKidneyCombo1.txt 1.000 0.752
24hKidneyCombo2.txt 1.000
24hKidneyCombo3.txt 1.000
24hKidneyCombo4.txt 1.000 0.882
24hKidneyCombo5.txt 1.000 0.997
24hKidneyCombo6.txt 1.000 0.999
Prediction: Kidney Tubular Necrosis with certitude 0.796
Sample phenobarbital 11494 RatKidney 80mg/kg 24h H375#2634
Predictagen Performance Kidney Tubular Necrosis Negative
24hKidneyCombo1.txt 1.000 0.752
24hKidneyCombo2.txt 1.000 0.752
24hKidneyCombo3.txt 1.000 0.752
24hKidneyCombo4.txt 1.000 0.752
24hKidneyCombo5.txt 1.000 0.752
24hKidneyCombo6.txt 1.000 0.752
Prediction: Negative with certitude 0.999
Sample phenobarbital 11495 RatKidney 80mg/kg 24h H375#2635
Predictagen Performance Kidney Tubular Necrosis Negative
24hKidneyCombo1.txt 1.000 0.752
24hKidneyCombo2.txt 1.000 0.752
24hKidneyCombo3.txt 1.000 0.752
24hKidneyCombo4.txt 1.000 0.752
24hKidneyCombo5.txt 1.000 0.752
24hKidneyCombo6.txt 1.000 0.752
Prediction: Negative with certitude 0.999
Sample phenobarbital 11496 RatKidney 80mg/kg 24h H375#2636
Predictagen Performance Kidney Tubular Necrosis Negative
24hKidneyCombo1.txt 1.000 0.752
24hKidneyCombo2.txt 1.000 0.752
24hKidneyCombo3.txt 1.000 0.752
24hKidneyCombo4.txt 1.000 0.752
24hKidneyCombo5.txt 1.000 0.752
24hKidneyCombo6.txt 1.000 0.752
Prediction: Negative with certitude 0.999
Table 44. Protein Marker Candidate Identification
Figure imgf000159_0001
Figure imgf000160_0001
*Mean Percent Accuracy for six training test sets for individual gene predictive performance.
**Mean fold induction relative to 0 = no induction, for expression in kidney samples treated with non-toxic treatments (Neg FI) or treatments producing kidney toxicity (Pos FI).
Table 45 Input data used for predictive computer program product
Gene | paraquat, 16477, Rat Kidney, 25 mg/kg, 24 h, 503r #3132 | paraquat, 16478, Rat Kidney, 25 mg/kg, 24 h, 503r #3133 | paraquat, 16479, Rat Kidney, 25 mg/kg, 24 h, 503r #3134 | phenobarbital, 11494, Rat Kidney, 80 mg/kg, 24 h, H375 #2634 | phenobarbital, 11495, Rat Kidney, 80 mg/kg, 24 h, H375 #2635 | phenobarbital, 11496, Rat Kidney, 80 mg/kg, 24 h, H375 #2636
14-3-3 zeta 106 108 1.28 111 104 122
17-beta hydroxysteroid dehydrogenase, type 2 166 13 157 -11 -134 -115
22kDa integral peroxisomal membrane protein 105 -104 -108 -11 -124 -12
25-DX 101 124 114 -119 101 108
25-hydroxyvitamin D3 1-alpha-hydroxylase 117 119 119 107 -1 101
3-beta hydroxysteroid dehydrogenase (HSD3B1) -127 -113 -103 105 -101 1
3-hydroxyisobutyrate dehydrogenase -137 -122 -123 102 113 109
3-methyladenine DNA glycosylase 113 126 119 122 117 122
60S ribosomal protein L6 103 -104 116 112 101 102
8-oxoguanine DNA glycosylase -101 -104 -102 108 -1 106
Acetyl-CoA carboxylase -16 -154 -148 -102 -14 -139
Acetylcholine receptor epsilon -142 -113 -148 -107 109 108
Activating transcription factor 3 -102 -1 -103 -1 113 106
Activin receptor type II 11 103 -102 -108 -119 -101
Acyl-CoA dehydrogenase, medium chain -113 102 -107 112 101 105
Adenine nucleotide translocator 1 -14 126 -106 -108 1 103
ADP-ribosylation factor-like protein ARL184 115 11 117 -105 -111 101
Adrenodoxin reductase 101 104 -109 -i 0' 101 101
Adrenomedullin -102 1 103 103 103 -108
Aflatoxin B1 aldehyde reductase -135 -121 105 -128 102 101
Alanine aminotransferase 104 106 109 111 -107 111
Alcohol dehydrogenase 1 131 128 114 105 114 111
Aldehyde dehydrogenase 1 -113 -106 -11 -132 -106 -115
Aldehyde dehydrogenase 2 -105 -1 -102 101 11 11
Aldehyde dehydrogenase, microsomal -109 -102 -109 108 104 111
Alpha-1-inhibitor III -122 109 -104 111 -107 -103
Alpha 1 antitrypsin -125 -113 -112 105 108 103
Alpha 1 acid glycoprotein 107 -102 -111 103 -1 -109
Alpha-1 microglobulin/bikunin precursor (Ambp) 105 -106 -104 104 1 -101
Alpha-1,2-fucosyltransferase -108 1 -104 -107 -107 112
Alpha-2-macroglobulin 103 -1 1 109 103 111
Alpha-2-macroglobulin, sequence 2 12 133 124 -102 105 109
Alpha 2-microglobulin 117 101 107 102 -101 101
Alpha fetoprotein -112 -109 -11 1 -106 -107
Alpha-fibrinogen 473 236 497 102 121 111
Alpha prothymosin 1 104 118 106 -108 1
Alpha tubulin 129 -102 135 -103 A 2 -104
Annexin V -122 -111 -102 124 105 119
Apolipoprotein All 12 115 111 121 11 138
Apolipoprotein C1 115 108 -11 114 104 108
Apolipoprotein CIII 12 118 111 106 118 125
Apolipoprotein E -13β -122 -134 113 11 -1
Aquaporin-2 -183 -15 -189 106 -113 111
Aquaporin-3 (AQP3) -115 -105 -13 122 103 106
Argininosuccinate lyase 112 126 116 117 136 116
Arginosuccinate synthetase 1 -133 -107 -122 106 116 109
Aryl hydrocarbon receptor -103 -105 -105 11 111 102
Aryl sulfotransferase -114 103 -105 139 126 113
Arylsulfatase B -104 -105 -105 107 109 -105
104 -104 114 108 -104 103 -176 •158 -158 -119 -115 1 OS 144 -125 126 -109 111 -1 109 103 111 102 102 105 -116 -114 •113 111 107 -108 108 109^ 107 -13 -111 -114 -102 103 -109 106 101 106
Bcl-2 108 113 102 1 -101 112
Bcl-xL -116 -106 -102 101 1 1
Beta-actin 118 12 125 106 118 155
Beta-actin, sequence 2 -117 -115 -102 102 105 108
Beta-alanine synthase 107 119 -109 111 157 111
Beta-tubulin, class I 138 117 153 101 -109 1
Betaine homocysteine methyltransferase (BHMT) 202 193 117 124 165 118
Bile salt export pump (sister of P-glycoprotein) 115 12 115 108 127 105
Bilirubin UDP-glucuronosyltransferase isozyme 1 132 135 166 -103 103 113
Biliverdin reductase 12 109 108 111 -133 -105
BRCA1 -105 -106 -113 1 -104 -104
c-erb B-2 103 103 -105 108 106 109
c-fos 1 101 101 119 103 102
c-H-ras -101 102 109 -101 106 105
c-jun 131 112 113 106 103 -106
c-myc 139 123 132 -112 -1 108
C reactive protein 106 123 113 -104 -107 102
C4b-binding protein -112 -112 -1 -103 -108 -109
Calbindin-D (9K) 116 102 •1 i 02 102 101
Calcineurin-B 121 114 123 114 151 114
Calnexin -11 -102 -103 123 116 153
Calpactin I heavy chain 162 118 176 113 105 11
Calpain 2 -107 104 101 -102 108 102
Calreticulin 112 116 12 119 115 129
Canalicular multispecific organic anion transporter 156 129 182 -101 104
Carbamoyl phosphate synthetase I -102 -111 -116 -103 -102 1 1
Carbonic anhydrase II 145 13 125 113 124 1 19
Carbonic anhydrase III -159 -164 -193 -123 -212 -115
Carbonic anhydrase III, sequence 2 -114 -107 -121 123 105 123
Carbonyl reductase 103 101 -101 -105 -109 -105
Carnitine palmitoyl CoA transferase 125 132 115 12 12 129
Casein-alpha -107 -113 -116 -104 -107 -105
Caspase 1 101 106 102 122 111 123
Caspase 2 -134 -112 -128 -109 -111 -105
Caspase 3 -114 101 -117 108 106 107
Caspase 6 111 114 11 -108 105 101
Caspase 7 -113 -106 -11 -102 105 1
Catalase 129 144 117 1 133 148
Catechol-O-methyltransferase 121 12 103 105 214 117
Cathepsin B -116 -103 103 119 123 146
Cathepsin L 186 152 285 -108 112 103
Cathepsin L, sequence 2 18 143 242 -116 106 101
Cathepsin S 111 -102 145 104 106 107
Caveolin-3 -103 113 116 -102 101 -113
CCR5 -106 -108 -115 -1 -103 -105
CD44 metastasis suppressor gene 119 111 123 101 1 -103
Cdc2 related protein kinase (NCLK) -108 -107 -106 111 121 116
CDK102 134 -113 -119 -117 -101 -11
CDK108 -178 -155 -145 -108 -111 -102
Cellular nucleic acid binding protein (CNBP) -103 -104 109 -107 -104 104
Cellular retinoic acid binding protein 2 109 -1 -104 108 -132 -111
Ceruloplasmin 104 132 109 123 107 124
Cholesterol 7-alpha-hydroxylase (P450 VII) -101 -111 -113 101 104 -1
Cholesterol esterase 132 122 -11 119 105 -102
Choline kinase 11 108 1 105 103 108
Ciliary neurotrophic factor 101 -102 105 109 102 102
Clusterin 154 125 155 -102 -109 -123
Cofilin 102 -107 117 -103 -105 104
Collagen type II -131 -119 -127 102 -107 101
Colony-stimulating factor-1 -108 -106 101 -104 -105 -109
Complement component C3 123 -101 108 -103 102 -101
Complement factor I (CFI) -109 -109 101 104 -103 104
Connexin-32 -106 -113 -117 102 105 101
Contrapsin-like protease inhibitor (CPi-21) -105 103 101 -102 -102 -106
CTP phosphocholine cytidylyltransferase 58 118 145 1 105 11
CXCR4 109 103 -101 -101 -104 101
Cyclin D1 -16i -128 -139 09 102 -105
Cyclin D3 -102 -1 101 -103 -104 -101
Cyclin dependent kinase 2 -1 104 -103 -103 -104 102
Cyclin dependent kinase 4 -101 109 105 102 107 -108
Cyclin E -101 -103 -105 -1 -101 -106
Cyclin G 105 108 115 -109 -11 112
Cyclin-dependent kinase 4 inhibitor P27kip1 1 104 -104 -ιoa -123 -111
Cyclooxygenase 2 109 -109 -106 101 101 -104
Cystatin C -136 -129 -162 101 -11 -101
Cytochrome c oxidase subunit I -154 -141 -109 -102 -106 -111
Cytochrome c oxidase subunit II -157 -148 -105 -108 -116 -114
Cytochrome c oxidase subunit IV -1 4 -134 -128 103 -108 -102
Cytochrome P-450Md 103 -111 106 -106 106 -104
Cytochrome P45011A1 1 105 -105 145 117 115
Cytochrome P4S014D 1 103 114 11 101 103
Cytochrome P45017A 117 -105 11 102 -106 -108
Cytochrome P4501A1 -102 102 -112 107 101 -103
Cytochrome P4501A2 12 105 134 107 -101 104
Cytochrome P4501B1 104 -105 -109 103 -101 107
Cytochrome P4502A3 -121 -117 -108 -131 -107 -113
Cytochrome P4502B1/2B2 108 -103 -106 119 112 109
Cytochrome P4502C11 137 109 144 102 -101 -111
Cytochrome P4502C12 123 112 116 119 102 102
Cytochrome P4502C23 101 -105 132 -128 -101 -103
Cytochrome P4502C39 109 102 131 1 -1 108
Cytochrome P4502D18 108 113 114 -101 -111 -106
Cytochrome P4502E1 125 126 126 133 154 185
Cytochrome P4503A1 111 -103 -104 -101 -106 -107
Cytochrome P4504A1 153 148 158 113 141 129
Cytochrome P4504A1 , 50-mer 148 153 171 102 108 115
D-dopachrome tautomerase 112 122 107 -101 107 -104
Decorin 11 134 -113 -108 -144 -115
Defender against cell death-1 -113 -102 -102 104 -108 -104
Deoxycytidine kinase -106 -103 101 -101 113 103
Diacylglycerol kinase zeta 112 11 109 107 -101 109
Diazepam binding inhibitor -113 -11 -105 107 -109 106
Dimethylarginine dimethylaminohydrolase -158 -14 -109 -107 -114 -115
Disulfide isomerase related protein (ERp72) 117 102 111 11 101 112
DNA binding protein inhibitor ID2 102 112 108 14 118 139
DNA polymerase beta -102 107 11 -107 -102 106
DNA topoisomerase I -106 -117 101 102 109 103
Dopamine receptor D2 -116 -106 105 109 111 106
Dopamine transporter -106 -102 -107 1 102 -109
Dynamin-1 (D100) 106 102 -106 105 -106 108
Dynein light chain 1 12 121 121 -109 106 104
E-selectin 106 -108 -112 105 105 -102
Ecto-ATPase 167 143 16 -109 -107 -106
eIF-4E -104 107 103 -101 -104 -1
Elongation factor-1 alpha -104 102 11 -103 108 117
Emerin 119 -111 -128 103 104 11
Endogenous retroviral sequence, 5' and 3' LTR 154 181 131 -114 -103 114
Endothelin converting enzyme 11 125 113 1 135 134
Endothelιn-1 104 -1 106 -101 •102 105
Enolase alpha -126 -116 -104 114 101 107
Enoyl CoA hydratase (mitochondnal) 122 117 117 127 114 118
Epidermal growth factor -326 •285 -255 -126 -13 -133
Epithelial sodium channel alpha subunit (alpha-ENaC) 1 1 -104 -104 101 109
Epoxidβ hydrolase #2 112 111 103 -121 -1 104
Equilbrative πitrobeπzytmioinosine-sensitive nucleoside transporter -109 -113 -121 -113 101 11
ERG-2 -214 -166 -169 137 113 13
Estrogen receptor 109 114 104 108 106 -1
Extracellular signal regulated kinase 1 -108 -103 -11 -108 -139 -107
F1 -ATPase beta subunit -15 -1 4 -128 11 112 113
Farnesol receptor 121 12 12 106 12 116
Fas antigen 102 109 107 126 114 137
Fatty aαd synthase -198 -157 -204 104 -466 -465
Fatty acyl-CoA oxidase 184 178 149 106 12 133
Ferritin H-chain 111 123 125 -129 103 -117
Fetuin-like protein (IRL685) 104 101 104 -127 -104 -117
Fibrinogen gamma chain 191 143 188 102 -1 -109
Focal adhesion kinase (pp125FAK) -114 -109 -104 103 103 104
Four repeat ion channel 102 106 114 117 114 101
Gadd153 117 109 115 1 -104 104
Gadd45 122 118 122 -108 103 104
Gamma actin, cytoplasmic 127 114 145 -103 -106 107
Gamma-glutamyl transpeptidase 108 112 103 -101 1 12
Gap lunction membrane channel protein beta 1 (G|b1) -106 -109 -115 -113 -108 -114
Glucokinase -1 102 -102 -102 106 -116
Glucose transporter 1 106 106 -101 -101 103 102
Glucose transporter 2 136 125 151 107 123 101
Glucose-6-phosphate dehydrogenase 148 138 163 101 -107 105
Glucose-regulated protein 78 128 121 149 11 103 126
Glucosylceramide synthase 1 11 102 -128 -122 -1 15
Glutamine synthetase -118 -104 104 107 101 1
Glutathioπe peroxidase 111 126 115 112 128 114
Glutathione reductase 11 112 113 103 107 129
Glutathione S-transferase alpha subunit -119 108 108 148 112 114
Glutathione S-transferase mu-2 127 255 155 125 118 105
Glutathione S-transferase P1 137 143 208 -113 -108 -116
Glutathione S-transferase theta-1 -123 -111 -103 -115 -104 102
Glutathione S-transferase Ya -118 103 101 158 126 129
Glutathione synthetase 114 134 112 111 141 155
Glycβraldehyde 3-phosphate dehydrogenase 125 13 127 -104 11 107
Glycine methyltransferase -106 107 -112 102 113 105
H-rev107 •104 -107 -11 -104 -108 -116
Heme binding protein 23 125 115 133 -126 -101 107
Heme oxygenase -143 -129 -129 -114 104 -109
Hemoglobin alpha 1 chain -102 103 116 -112 -19 -133
Table 45
Hemopexin -1.11 -1.06 -1.03 -1.02 -1.01 1.08
Hepatic lipase -1.35 1.28 -1.32 -1.03 1.08 -3
Hepatocyte growth factor receptor -1.04 -1.11 1.12 -1.11 -1.05 -1.16
Hepatocyte nuclear factor 4 -1.53 -1.44 -1.36 -1.22 -1.24 -1.18
High affinity IgE receptor gamma chain (FcERIgamma) -1.41 -1.25 -1.21 -1.06 1.02 -1.14
Histidine-rich glycoprotein 1.55 1.57 1.34 1.23 1.18 1.02
Histone 2A -1.23 -1.2 -1.23 -1.11 -1.1 -1.22
HMG-CoA reductase 1.08 1.03 1.06 1.11 1.08 1.34
HMG-CoA synthase, cytosolic -1.17 -1.21 -1.21 1.11 -1.06 -1.07
HMG-CoA synthase, mitochondrial 4.6 2.96 1.96 1.09 1.05 -1.08
Hydroxysteroid sulfotransferase a 1.06 1.11 -1.1 1.05 -1.03 -1.05
Hypoxanthine-guanine phosphoribosyltransferase 1.04 1.1 1.08 1.02 1.06 1.05
Hypoxia-inducible factor 1 alpha 1.16 1.06 1.12 1.11 1.08 1.16
ID-1 1.06 1.14 1.09 1.03 1 1.06
IgE binding protein 1.05 -1.05 -1.03 1.06 -1.03 -1.07
IkB-a -1.07 1.03 1.03 1.06 -1.06 -1.05
Insulin-like growth factor binding protein 1 2.2 2.02 1.83 -1.14 1.08 -1.05
Insulin-like growth factor binding protein 3 -1.37 -1.03 -1.37 -1.18 -1.18 1.01
Insulin-like growth factor binding protein 5 1.02 1.05 -1.04 1.18 -1.05 1.09
Insulin-like growth factor binding protein 6 -1.05 -1.22 -1.?9 -1.19 -1.37 -1.?2
Insulin-like growth factor I -1.26 -1.28 1.26 1.05 -1.18 -1.12
Insulin-like growth factor I, exon 6 -1.06 1.08 1.15 -1.04 -1.27 -1.11
Integrin beta-4 1.03 1.02 1.09 -1.08 -1.06 -1.04
Integrin beta1 1.67 1.61 1.8 1.07 1.1 1.2
Inter-alpha-inhibitor H4 heavy chain (Itih4) -1.08 -1.13 1.08 -1.01 -1.04 -1.02
Interferon gamma -1.06 -1.04 -1.04 -1.26 -1.05 -1.07
Interferon inducible protein 10 1.04 -1.07 -1.09 1.22 1.08 -1.08
Interferon related developmental regulator IFRD1 (PC4) 1.33 1.09 1.17 -1.09 1.03 -1.07
Interleukin-1 beta 1.01 1.02 -1.04 -1.1 1.03 -1.04
Interleukin-10 -1.03 1.01 1.02 -1.26 -1.15 -1.1
Interleukin-18 1.08 1.02 1.05 -1.07 -1.05 -1.06
Interleukin-6 1.05 1 1.08 1.07 -1.01 1.1
Intracellular calcium-binding protein (MRP14) 1.03 -1.03 1.07 1.02 -1.02 -1.1
Intracellular calcium-binding protein (MRP8) 1.06 -1.03 -1.01 -1.1 -1.02 -1.1
Iron-responsive element-binding protein 1.94 2.29 1.55 -1 1.3 1
Jagged 1 -1.16 -1.13 -1.17 -1.01 -1.01 1.04
JNK1 stress activated protein kinase 1.22 1.4 1.01 1.47 1.25 1.1
K-cadherin 1.29 1.18 1.5 -1.08 -1.01 1.03
KAI1 metastasis suppressor gene (CD82) -1.07 1.07 -1.16 1.17 1.08 1.17
Keratinocyte growth factor 1.63 1.38 2.07 -1.14 1.04 1.04
L-gulono-gamma-lactone oxidase -1.08 -1.09 -1.05 1.02 1.03 -1.08
Lactate dehydrogenase-3 -1.45 -1.24 -1.24 1.03 1.08 1.01
Lecithin cholesterol acyltransferase 1.02 -1.1 -1.12 1.02 1.03 1.04
Leptin receptor (fatty) 1.12 -1.03 -1.17 1.1 1.02 1.1
Lipopolysaccharide binding protein -1.06 1.06 -1.06 -1 -1.18 -1.34
Lipoprotein lipase 1.3 1.32 1.03 1.18 1 1.09
Liver fatty acid binding protein 1.1 -1.03 1.04 1.03 -1.03 1.05
Low density lipoprotein receptor 1.14 1.04 1.1 1.07 -1.04 1.06
Lysyl hydroxylase -1.19 -1.09 1 1.08 1.03 -1
Lysyl oxidase 1.07 1.19 1.14 1.11 1.1 1.21
Macrophage inflammatory protein-1 alpha -1.03 1 1.06 1.05 1.11 1.05
Macrophage inflammatory protein-2 alpha 1.21 1.11 1.18 1.02 1 -1.03
Macrophage metalloelastase -1.07 -1.11 -1.22 1.05 -1.02 1.02
Major acute phase protein alpha-1 2.04 1.41 1.58 1 1 -1.01
Major basic protein 1 1.02 1.03 -1.03 -1.13 -1.11 -1.13
Malate dehydrogenase, cytosolic -1.59 -1.42 -1.37 1.09 1.08 1.01
Malic enzyme -1.51 -1.4 1.01 -1.1 -1.34 -1.27
Figure imgf000166_0001
Paraoxonase 1 -1.15 -1.03 -1.16 -1.03 -1.09 -1.04
Peroxisomal 3-ketoacyl-CoA thiolase 1 1.45 1.51 1.34 1.02 1.21 1.29
Peroxisomal 3-ketoacyl-CoA thiolase 2 1.61 1.73 1.38 1.1 1.29 1.35
Peroxisomal acyl-CoA oxidase 1.75 1.83 1.61 1.16 1.47 1.64
Peroxisomal multifunctional enzyme type II -1.07 1.1 1.24 1.23 1.26 1.19
Peroxisome assembly factor 1 1.21 1.18 1.1 1.09 1.12 1.05
Peroxisome assembly factor 2 1.25 1.27 1.23 1.03 1.1 1.02
Peroxisome proliferator activated receptor alpha 1.14 1.14 1.08 1.09 1.25 1.28
Peroxisome proliferator activated receptor gamma 1 1.03 1.02 1.11 -1.05 -1.1
Phase-1 RCT-165 -1.13 -1.19 -1.22 1.14 -1.09 1
Phase-1 RCT-252 -1.04 -1.07 -1.1 -1.06 1.02 -1.1
Phase-1 RCT-10 -1.36 -1.2 -1.31 1.03 1.12 1.06
Phase-1 RCT-101 1 1.02 -1.04 1.09 1.16 1.16
Phase-1 RCT-102 -1.5 -1.48 -1.59 1.26 -1 1.12
Phase-1 RCT-103 -1.03 1.02 -1.07 -1.31 -1.69 -1.13
Phase-1 RCT-106 -1.06 -1.06 1.02 -1.07 -1.14 -1.09
Phase-1 RCT-107 1.09 1.14 1.2 -1 1.13 1.03
Phase-1 RCT-108 -1.02 1.04 -1.03 -1.57 -1.38 -1.18
Phase-1 RCT-109 1.02 1 1.15 -1.12 -1.11 -1.1
Phase-1 RCT-110 1 -1.17 -1.28 -1.06 1.06 1.13
Phase-1 RCT-111 1.04 1.02 -1.06 -1.32 -1.54 -1.16
Phase-1 RCT-112 -1 -1.12 -1.19 1.01 -1.09 -1.13
Phase-1 RCT-113 -1.05 1.01 -1.09 1.04 1.02 1.14
Phase-1 RCT-114 -1.09 -1.07 -1.17 -1 -1.07 1.08
Phase-1 RCT-115 1.01 1.01 -1.03 1 1.12 -1.05
Phase-1 RCT-116 1.01 -1.01 -1.01 1.01 1.02 1.05
Phase-1 RCT-117 1.08 1.07 -1.05 1.1 1.46 1.05
Phase-1 RCT-118 1 -1.02 -1.03 1.01 -1.25 -1.17
Phase-1 RCT-119 -1.06 -1.15 -1.26 -1.05 -1.04 -1.12
Phase-1 RCT-12 -1.01 1.09 1.01 -1.11 -1.11 -1.07
Phase-1 RCT-121 -1.02 1.03 -1.1 1.08 1.05 -1.08
Phase-1 RCT-122 1.15 1.03 1 1.04 -1 1.06
Phase-1 RCT-123 -1.06 -1.08 -1.19 -1.11 -1.03 1.07
Phase-1 RCT-125 1.12 1.04 -1.14 -1.01 -1.03 -1.13
Phase-1 RCT-126 1.22 1.2 1.16 1.23 1.07 1.01
Phase-1 RCT-127 1.17 1 -1.09 -1.01 1.11 -1.08
Phase-1 RCT-128 1.04 -1 -1.07 -1 1.08 -1.06
Phase-1 RCT-129 1.01 -1.03 1.05 1.22 -1.07 1.11
Phase-1 RCT-13 -1.04 1.41 1.21 -1.72 1.15 1.29
Phase-1 RCT-130 -1.08 -1.02 -1.11 -1.11 -1.1 -1.06
Phase-1 RCT-131 -2.19 -1.93 -1.79 -1.1 -1.32 -1.16
Phase-1 RCT-132 -1.25 -1.19 -1.15 -1.01 -1.05 1.14
Phase-1 RCT-133 1.1 1.05 1.21 -1.1 -1.03 -1.05
Phase-1 RCT-134 -1.04 -1.03 1.01 1.07 1.12 1.07
Phase-1 RCT-136 -1.08 -1.06 -1.05 -1.06 1.01 1
Phase-1 RCT-137 1.25 1.31 1.49 -1.08 1.02 -1.04
Phase-1 RCT-138 -1.18 -1.17 -1.16 -1.08 1.06 -1.14
Phase-1 RCT-14 1.02 1.08 -1.02 -1.02 -1.15 -1.05
Phase-1 RCT-140 -1.02 1.06 -1.02 -1 -1.03 1.09
Phase-1 RCT-141 1.38 1.34 1.21 -1.02 1.03 1.04
Phase-1 RCT-142 -1.43 -1.27 -1.27 -1.06 -1.11 -1.07
Phase-1 RCT-143 -1.67 -1.48 -1.35 1.02 -1.04 1.01
Phase-1 RCT-144 1.08 -1.06 1.2 -1.14 -1.19 -1.15
Phase-1 RCT-145 1.15 1.03 1.23 1.02 1.03 -1.01
Phase-1 RCT-146 1 -1.07 1.02 1.01 -1.05 -1.01
Phase-1 RCT-147 -1.09 -1.04 -1.06 -1.59 -1.86 -1.39
Phase-1 RCT-148 -1.03 1.09 -1.07 1.03 1.15 1.05
Phase-1 RCT-149 1.77 1.31 1.15 -1.09 -1.09 -1.07
Phase-1 RCT-15 1.06 1.02 1.03 1.11 1.06
Phase-1 RCT-150 -1.14 -1.04 -1.17 1.05 1.25 1.07
Phase-1 RCT-151 -1.2 -1.1 1.21 1.07 1.02 1.08
Phase-1 RCT-152 -1.04 -1.11 1.2 -1.04 -1.06 -1.04
Phase-1 RCT-153 -1.44 1.29 1.31 -1.06 -1.02 -1.09
Phase-1 RCT-154 -1.07 -1.09 -1.02 1.03 -1.09 1.08
Phase-1 RCT-155 -1.1 -1.04 -1.15 -1.04 -1.03 -1.07
Phase-1 RCT-156 1.04 -1.03 -1 -1.07 -1.25 -1.08
Phase-1 RCT-158 1.96 1.35 2.56 -1.04 1.65 1.06
Phase-1 RCT-160 1.03 -1.11 -1.06 1 1.04 1.06
Phase-1 RCT-161 -1.3 -1.39 -1.29 -1.03 -1.07 -1.08
Phase-1 RCT-162 -1.09 -1.13 -1.01 1.02 -1.01 1.02
Phase-1 RCT-164 1.15 1.05 1.01 1.17 1.11 1.39
Phase-1 RCT-166 -1.29 -1.14 1 -1.22 1.16 -1
Phase-1 RCT-168 1.21 1.11 1.26 1.02 1.01 1.09
Phase-1 RCT-169 -1.04 -1.?? -1.19 -1.02 -1.04 -1.07
Phase-1 RCT-17 1.02 1.19 1.01 1.08 1.04 -1.02
Phase-1 RCT-170 1.08 1.03 1.16 -1.1 -1.35 -1.28
Phase-1 RCT-173 -1.37 -1.1 -1.38 -1.05 -1.03 1.11
Phase-1 RCT-174 -1.01 -1.08 -1.01 1.1 1.02 -1.01
Phase-1 RCT-175 -1.46 -1.3 -1.23 -1.05 -1.02 1.13
Phase-1 RCT-176 1.02 1.01 1.04 1.1 1.18 1.11
Phase-1 RCT-177 1.06 1.08 1.15 -1.14 -1.07 1.07
Phase-1 RCT-178 -1.7 -1.13 -1.6 -1.03 1.05 -2.05
Phase-1 RCT-179 1.15 1.1 1.48 1.11 1.01 1.03
Phase-1 RCT-18 -1.14 -1.19 -1.2 -1.09 -1.01 -1.04
Phase-1 RCT-180 1.09 1.11 1.23 -1.01 -1.02 1.03
Phase-1 RCT-181 -1.11 -1.1 -1.13 -1.09 1.09 -1.03
Phase-1 RCT-182 -2.1 1.79 1.67 -1.18 -1.1 -1.2
Phase-1 RCT-184 1.05 -1.05 -1.03 -1.03 1.02 -1.08
Phase-1 RCT-185 1.39 1.24 1.08 -1.03 1.18 -1.3
Phase-1 RCT-187 -1.2 1.14 1.19 1.03 1.03 1.06
Phase-1 RCT-188 -1.13 -1.11 -1.08 -1.03 1.04 1.07
Phase-1 RCT-189 1.31 1.24 1.26 1.26 1.14 1.03
Phase-1 RCT-191 1.25 1.22 1.12 -1.04 1.06 1.01
Phase-1 RCT-192 -1.11 -1.04 1.03 1.08 1.13 1.15
Phase-1 RCT-193 1.12 -1.04 -1.04 -1.04 -1.03 -1.05
Phase-1 RCT-194 -1 -1.1 -1.09 1 1.04 -1.08
Phase-1 RCT-195 1.01 1.05 -1.04 -1.04 -1.04 -1.02
Phase-1 RCT-196 1.05 1.06 1.18 1.01 1.05 1.06
Phase-1 RCT-197 -1.1 -1.09 -1.07 -1.07 1.05 -1.05
Phase-1 RCT-198 -1 -1.02 1.18 -1.01 1.02 -1.02
Phase-1 RCT-199 2.1 1.64 2.01 -1.01 1.01 -1.01
Phase-1 RCT-2 1.05 -1.03 -1.06 -1.06 1.01 -1.15
Phase-1 RCT-20 -1 -1.06 -1.03 ?.03 1.04 -1.01
Phase-1 RCT-202 -1.1 -1.08 -1.01 1.01 1.07 1.01
Phase-1 RCT-204 1.01 -1.07 -1.23 -1.22 -1.09 -1.08
Phase-1 RCT-205 -1.18 -1.15 -1.01 -1.02 1.01 -1.02
Phase-1 RCT-206 -1.03 -1.11 -1.1 -1.03 -1.02 -1.02
Phase-1 RCT-207 1.14 1.16 1.26 -1.05 1.05 -1.02
Phase-1 RCT-208 1.12 1.05 1.02 -1.02 1 1.13
Phase-1 RCT-209 1.21 1.22 1.29 1.02 1 1.11
Phase-1 RCT-21 1.04 -1.04 1.04 -1.01 1.17 -1.19
Phase-1 RCT-211 -1.01 1.02 1.06 -1.17 1.01 -1.14
Phase-1 RCT-212 -1.08 -1.01 -1.04 1 1.04 -1.05
Phase-1 RCT-213 -1.09 -1.02 1.03 -1.06 1.07 1.1
Phase-1 RCT-214 -1.02 -1.01 -1.1 1.09 1.05 -1
Phase-1 RCT-215 1.05 1 1.09 1.11 1.06 1.02
Phase-1 RCT-216 1.06 -1.04 -1.1 -1.02 -1.01 1.06
Phase-1 RCT-218 -1.01 -1.14 -1.26 -1.04 -1.04 -1.04
Phase-1 RCT-219 -1.02 -1.11 -1.13 -1.02 -1 1.02
Phase-1 RCT-22 -1.18 -1.05 -1.12 -1.05 -1.18 -1.04
Phase-1 RCT-220 1.06 -1.02 -1.05 1.09 1.02 1.02
Phase-1 RCT-221 -1.03 1.03 -1.05 -1.24 -1.29 -1.16
Phase-1 RCT-222 -1.02 -1.03 -1.09 1 -1.03 -1.07
Phase-1 RCT-225 1 -1.1 -1.15 1.06 1.09 -1.02
Phase-1 RCT-227 -1.06 -1.15 -1.2 -1.04 -1.06 -1.12
Phase-1 RCT-228 1.03 1.03 -1.08 -1.07 -1.18 -1.05
Phase-1 RCT-229 -1.06 -1.14 -1.25 -1.02 -1.11 -1.05
Phase-1 RCT-230 -1.05 -?.06 -1.22 1.05 -1.11 -1.04
Phase-1 RCT-231 1.11 1.02 1.03 1.03 1.05 1.14
Phase-1 RCT-233 1.07 -1.09 -1.06 1.02 1.02 -1.11
Phase-1 RCT-235 -1.07 1.05 -1.01 -1.18 -1.16 -1.18
Phase-1 RCT-236 -1.02 -1.07 -1.05 1.07 1 1.03
Phase-1 RCT-237 1.13 -1 -1.1 1.05 1.05 1.07
Phase-1 RCT-239 1.12 1.01 -1.04 -1.04 -1.01 -1.13
Phase-1 RCT-24 1.35 1.14 1.39 -1 -1.02 -1
Phase-1 RCT-240 1.04 1.05 1 1.11 1.01 -1.01
Phase-1 RCT-241 1.52 1.25 1.57 -1.01 1.05 1.09
Phase-1 RCT-242 1.82 1.31 1.88 1.04 1.06 -1.12
Phase-1 RCT-243 -1.05 -1 -1.01 -1.02 -1.03 -1.08
Phase-1 RCT-244 -1.14 1.04 -1.04 1.08 -1.01 1.06
Phase-1 RCT-245 1.01 -1.06 -1.13 1.17 1 1.11
Phase-1 RCT-246 1.07 -1.07 -1.1 1 1.04 1.07
Phase-1 RCT-248 1.05 -1.04 -1.1 -1.07 1.06 -1.11
Phase-1 RCT-25 -1.3 -1.15 -1.13 1.05 1.05 1.02
Phase-1 RCT-251 1.04 -1.05 -1.17 -1 1.06 1.12
Phase-1 RCT-253 -1.41 -1.12 -1.22 1.09 1.16 1.2
Phase-1 RCT-255 -1.03 -1.1 -1.15 -1.07 1.01 1.09
Phase-1 RCT-256 1.09 1.04 1.28 -1.01 1.01 1.01
Phase-1 RCT-258 1.13 1.03 1.2 -1.02 1.02 1.02
Phase-1 RCT-259 1.04 -1.01 -1.01 -1 1.01 -1.12
Phase-1 RCT-26 -1.15 -1.12 -1.21 -1.01 1.05 1.08
Phase-1 RCT-260 -1.07 -1.08 -1.17 1.02 1.11 -1.07
Phase-1 RCT-261 1.1 1.02 1.08 1.04 1.03 1.1
Phase-1 RCT-262 -1.01 -1.14 -1.18 -1.01 1.02 -1.04
Phase-1 RCT-263 -1.02 1 -1.02 1.02 1.06 1.05
Phase-1 RCT-264 -1.04 -1.07 -1.08 -1.04 -1.04 -1.08
Phase-1 RCT-266 -1 1.02 -1.04 1.02 -1.03 1.01
Phase-1 RCT-267 1.19 -1.07 1.02 -1.07 1.12 1.18
Phase-1 RCT-268 -1.03 1.09 -1.19 1.02 -1.05 -1.11
Phase-1 RCT-27 1.26 1.13 1.1 -1.13 -1.17 -1.07
Phase-1 RCT-270 -1.16 -1.14 -1.13 -1 1.06 -1.02
Phase-1 RCT-271 -1.38 -1.1 1.26 1.08 1 1.18
Phase-1 RCT-273 -1.09 -1.02 -1.13 1.06 -1 -1
Phase-1 RCT-274 1.53 1.27 1.59 -1.01 -1.08 -1.18
Phase-1 RCT-276 1.01 -1.07 -1.02 -1.12 1.04 -1.07
Phase-1 RCT-277 1.07 -1.03 -1.06 1.05 -1.01 1.03
Phase-1 RCT-278 1.04 -1.06 -1.07 -1.07 1.11 -1.06
Phase-1 RCT-279 -1.06 -1.13 -1.17 -1.05 1.06 -1.08
Phase-1 RCT-28 1.03 1.02 1.07 1.03 1 -1.04
Phase-1 RCT-280 1.05 -1 -1.06 1 1.06 1.12
Phase-1 RCT-281 -1.07 -1.02 -1.1 -1.22 -1.13 1.04
Phase-1 RCT-282 -1.15 -1.18 -1.17 -1.07 -1.04 -1.04
Phase-1 RCT-283 -1.04 -1.03 -1.08 -1.03 -1.07 -1.03
Phase-1 RCT-284 1.03 -1.01 1.1 -1.11 1.05 -1.03
Phase-1 RCT-285 -1.03 -1.03 -1.07 -1.01 1.05 1.08
Phase-1 RCT-286 1.04 -1.05 -1.02 1.05 -1 1.01
Phase-1 RCT-287 -1.01 -1.01 -1.02 -1.06 1.03 -1.08
Phase-1 RCT-288 1.01 1.04 -1.08 1.03 1.02 -1.01
Phase-1 RCT-289 -1.03 1.01 -1 -1.01 1.05 1.14
Phase-1 RCT-29 -1.08 -1.16 -1.18 -1 -1.04 -1.03
Phase-1 RCT-290 1.24 1.15 1.03 1.06 1.19 1.05
Phase-1 RCT-291 -1.02 -1.06 -1 1.08 1.23 1.08
Phase-1 RCT-292 -1.1 -1.08 -1.06 1.04 -1 -1.01
Phase-1 RCT-293 1.44 1.34 1.51 1.14 -1.15 -1.04
Phase-1 RCT-294 -1.08 -1.13 -1.2 1.07 -1.11 -1.05
Phase-1 RCT-295 -1.04 1.01 1.06 1.07 1.05 1.08
Phase-1 RCT-296 1.02 -1.03 -1.03 1.03 -1.08 1.08
Phase-1 RCT-297 1.02 -1.05 -1.06 1.09 -1.08 -1.08
Phase-1 RCT-3 -1.01 -1.05 -1.11 1.01 -1.01 -1.07
Phase-1 RCT-30 1.01 -1.02 -1.13 1.02 1.03 1.03
Phase-1 RCT-31 -1.3 -1.26 -1.15 1.04 1 1.03
Phase-1 RCT-32 1.05 1.02 -1.03 -1.06 -1.07 1
Phase-1 RCT-33 -1.01 -1.06 -1.1 -1.04 1.07 1.01
Phase-1 RCT-34 -1.48 -1.27 -1.18 1.04 1.05 1.25
Phase-1 RCT-35 -1.19 -1.12 -1.17 1.1 -1 1.06
Phase-1 RCT-36 -1.04 -1.05 -1.13 -1.01 1.04 -1.07
Phase-1 RCT-37 -1.45 -1.4 -1.4 -1.16 -1.07 1.01
Phase-1 RCT-38 -1.19 -1.04 -1.24 1.06 1.04 1.13
Phase-1 RCT-39 1.16 1.?8 1.03 -1.06 1.23 1.02
Phase-1 RCT-40 -1.01 -1.17 1.21 -1.16 -1 -1.04
Phase-1 RCT-41 -1.07 -1.06 -1.22 1.01 -1.03 -1.12
Phase-1 RCT-42 -1.3 -1.07 -1.29 1.11 1.13 1.04
Phase-1 RCT-43 -1.02 -1.01 -1.03 -1.01 -1.06 -1.06
Phase-1 RCT-45 1.09 1 1.03 1.07 1.04 1.06
Phase-1 RCT-47 1 1.03 -1.06 -1.03 -1.03 -1.03
Phase-1 RCT-48 -1.23 -1.07 -1.11 1.12 1.07 1.11
Phase-1 RCT-49 1.1 1.1 1.1 -1.14 1.02 1.16
Phase-1 RCT-50 1.96 1.36 1.91 1.03 -1.01 1.01
Phase-1 RCT-51 -1.05 -1.09 -1.14 -1 -1.01 1.04
Phase-1 RCT-52 -1.08 -1.15 -1.14 -1.16 1.05 1.04
Phase-1 RCT-53 -1 1.12 -1.07 -1.11 -1.15 -1.06
Phase-1 RCT-54 -1 -1.04 -1.02 -1.03 1.07 1.05
Phase-1 RCT-55 1.08 1.01 1.11 -1.04 -1.1 1.02
Phase-1 RCT-56 -1.04 -1.08 -1.06 1.04 1.1 1.06
Phase-1 RCT-57 1.07 1.07 1.02 1.02 -1.01 1.02
Phase-1 RCT-58 -1.06 -1.08 -1.04 1.14 1.1 1.19
Phase-1 RCT-59 1.23 1.06 -1.06 1.08 1.21 1.1
Phase-1 RCT-6 -1.12 -1.08 1.1 -1.06 -1.06 -1.03
Phase-1 RCT-60 1.07 1.02 1.09 1.02 -1.01 -1.01
Phase-1 RCT-61 -1.05 -1.11 -1.1 1.03 -1 1.05
Phase-1 RCT-62 -1.38 1.14 -1.14 -1.24 1 1.15
Phase-1 RCT-63 1.04 1.01 -1.03 -1.17 -1.18 -1.14
Phase-1 RCT-64 -1.16 1 -1.1 1.05 -1.08 -1.05
Phase-1 RCT-65 1.2 1.1 1.16 1.25 1.37 1.44
Phase-1 RCT-66 1.12 -1 1.12 1.03 -1.01 -1.02
Phase-1 RCT-67 -1.02 1.01 -1 -1.03 -1.02 -1.11
Phase-1 RCT-68 1.37 1.16 1.42 -1 1.05 1.04
Phase-1 RCT-69 1.1 1.13 1.13 -1.16 -1.13 1.04
Phase-1 RCT-7 -1.04 -1.06 -1.02 1.03 1.09 1.2
Phase-1 RCT-70 1.04 -1 1.04 1.07 1.01
Phase-1 RCT-71 1.02 -1.01 1.04 -1.11 -1.04 -1.1
Phase-1 RCT-72 -1.13 1.04 1.22 1.06 -1.02 1.05
Phase-1 RCT-73 -1.05 -1.1 -1.08 -1.08 -1.09 1.02
Phase-1 RCT-74 1.02 -1.11 -1.14 -1.02 -1.01 -1.06
Phase-1 RCT-75 1.09 1.03 1.1 -1.04 1.14 1.19
Phase-1 RCT-76 1.05 1.01 -1.01 -1.23 -1.18 -1.07
Phase-1 RCT-77 -1.19 -1.15 1.03 1.13 1.1 1.16
Phase-1 RCT-78 -1.4 -1.37 -1.42 1.19 -1 1.05
Phase-1 RCT-79 -1.06 -1.04 -1.08 1.03 1 1.01
Phase-1 RCT-8 1.16 -1.13 -1.1 -1.12 -1.01 -1.01
Phase-1 RCT-80 -1.04 -1.09 -1.13 1.01 1.02 -1.11
Phase-1 RCT-81 1.01 1.04 -1.08 -1.05 -1.05 -1.07
Phase-1 RCT-82 -1.01 -1.14 -1.1 -1.01 1.01 -1.08
Phase-1 RCT-83 -1.23 -1.13 1.1 1 1.05 -1.05
Phase-1 RCT-84 1 -1.05 -1.1 -1.06 -1.09
Phase-1 RCT-85 1.06 1.01 -1.02 -1.05 1.01 1.07
Phase-1 RCT-87 1.15 -1.13 -1.04 1.02 1 -1.04
Phase-1 RCT-88 1.25 1.21 1.08 1.06 1.07 -1.03
Phase-1 RCT-89 -1.13 -1.16 -1.28 -1.09 -1.01 -1
Phase-1 RCT-9 1.02 -1 1.04 -1.07 -1.6 -1.2
Phase-1 RCT-90 1.06 -1.19 1.27 1.07 1.1 1.1
Phase-1 RCT-91 -1.02 -1.05 1.01 -1.09 -1.06 -1.01
Phase-1 RCT-92 1.09 -1.07 -1.03 1.06 1 -1.08
Phase-1 RCT-93 1.13 1.17 1.24 -1.03 1.04 -1.08
Phase-1 RCT-94 1.01 -1.04 -1.11 -1 1.02 -1
Phase-1 RCT-95 -1 1.02 -1.03 -1.11 -1.24 -1.08
Phase-1 RCT-96 1.08 1.06 1.03 -1.03 1.06 1.05
Phase-1 RCT-97 1.08 -1.02 1.01 1.06 1 1.02
Phase-1 RCT-99 1.1 1.16 1.25 1.09 1.15 1.23
Phenylalanine hydroxylase -1.73 -1.46 -1.3 -1.11 1.16 1.07
Phosphatidylethanolamine-binding protein 1.14 1.13 -1.04 1.07 1.14 1.17
Phosphoglycerate kinase -1.22 -1.23 -1.07 -1.01 1.09 1.03
Phospholipase D -1.01 -1.1 -1.15 -1 -1.09 -1.04
Pim1 proto-oncogene 1.04 1.08 1.06 1.01 1.09 -1.09
Poly(ADP-ribose) polymerase 1.04 1.03 1.06 1.18 1.1 1.25
Preproalbumin 1.03 -1.04 -1.01 -1.03 1.01 -1.08
Preproalbumin, sequence 2 1.08 -1.02 -1 -1.12 -1.09 -1.07
Presenilin-1 -1.15 -1.14 -1.11 1.25 -1.12 -1.05
Proliferating cell nuclear antigen gene 1 1.02 1.07 -1.03 -1.03 -1.06
Prostaglandin H synthase -1.05 -1.02 -1.05 1.11 1.09 1.05
Proteasome activator 28 alpha -1.15 -1.05 1.21 -1.07 -1.04 -1.07
Protein disulfide isomerase (PDI) -1.08 -1.05 -1.01 -1.05 1.01 1.02
Protein kinase C alpha -1.09 -1.02 -1 -1.05 1.04 -1.06
Protein kinase C beta1 2.32 1.88 2.51 -1.43 1.28 -1.14
Protein tyrosine phosphatase alpha -1.06 -1.01 1.08 1.02 -1.08 -1.08
Protein tyrosine phosphatase, receptor type, D -1 1.02 -1.09 -1.07 1.08 -1.03
PTEN/MMAC1 -1.15 -1.06 -1.07 -1.2 -1.04 1.02
Putative membrane fatty acid transporter -1.04 1.11 -1.38 -1.03 -1.38 -1.13
Pyruvate kinase muscle 1.25 1.11 1.11 1.16 -1.17 -1.05
RAC protein kinase beta 1.06 1.13 1.04 1.02 -1.02 -1.05
RAD 1 1.03 1.01 1.17 1.04 1.14
Ref-1 1.21 1.1 1.28 1.07 1.06 1.09
Renal organic anion transporter 1.34 1.32 1.38 1.03 1.16 1.03
Retinoid X receptor alpha -1.03 1.04 -1.01 1.01 1.06 -1
Retinol dehydrogenase type III 1.01 -1.05 -1.05 1.29 1.04 1.07
Retinol-binding protein (RBP) -1.1 1.52 1.16 -1.19 1.21 -1.18
Ribosomal protein L13 1.09 -1 1.09 -1.01 -1.06 -1.13
Ribosomal protein L13A 1.09 1.11 1.16 1.08 1.?1 -1.08
Ribosomal protein L27 -1.17 -1.15 1.06 -1.03 -1.11 -1.05
Ribosomal protein S17 -1.11 -1.15 1.02 1.05 -1.04 1.05
Ribosomal protein S8 -1.09 -1.14 1.15 1.01 1.01 -1.03
Ribosomal protein S9 1.05 -1.01 1.06 -1.33 -1.46 -1.23
S-adenosylmethionine decarboxylase -1.15 -1.03 -1.07 1.04 1.08 1.03
S-adenosylmethionine synthetase 1.02 1.06 -1.06 1.36 1.55 1.35
Sarcoplasmic reticulum calcium ATPase -1.02 1.02 -1.09 1.11 1.08 1.14
Scavenger receptor class B type I 1.09 -1.05 1.02 1 -1.04 -1.05
Schlafen-4 -1.03 -1.14 -1.3 -1.05 -1.07 -1.15
Selenoprotein P -1.68 -1.18 -1.34 1.23 1.24 1.12
Senescence marker protein-30 -2.12 -1.47 -1.6 1.12 -1.32 1.08
Serotonin transporter (SERT) -1.25 -1.21 -1.17 1.05 1.05 -1.05
Sodium/bile acid cotransporter -1.11 1.04 1.03 1.56 1.21 1.15
Sodium/glucose cotransporter 1 -1.4 -1.28 1.09 -1.05 -1.12 -1.08
Sorbitol dehydrogenase -1.19 -1.05 -1.05 1.11 1.37 1.39
Stathmin 1.14 1 -1.02 -1.15 -1.1 -1.03
Stearyl-CoA desaturase, liver -5.31 -3.34 -4.36 1.09 -8.31 -4.23
Stem cell factor -1.63 -1.29 -1.?6 1 -1.22 1.01
Sterol carrier protein 2 1.01 -1 1.04 1.04 1.13 1.09
Sulfotransferase K2 -2.34 -1.89 -1.25 1.13 1.38 1.21
Superoxide dismutase Cu/Zn -1.2 -1.13 -1.14 -1.22 -1.23 1.06
Superoxide dismutase Mn 1.04 1.04 1.07 1.09 1.12 1.09
Suppressor of cytokine signaling 3 1.14 1.11 1.1 1.05 1.01 1.02
Syndecan-1 -1.09 -1.11 -1.14 -1.06 -1.04 -1.06
T-cell cyclophilin -1.22 -1.2 -1.02 -1.16 -1.07 1.01
TGF-beta receptor type II -1.06 1.03 1.09 -1.01 1.1 1.11
Thiol-specific antioxidant (natural killer cell-enhancing factor B) -1.01 1 1.06 -1.08 -1.09 -1.03
Thiopurine methyltransferase -1 1.09 1.02 -1.1 -1.21 -1.2
Thioredoxin-1 (Trx1) 1.16 1.18 1.27 -1.12 -1.12 -1.03
Thioredoxin-2 (Trx2) -1.27 -1.13 -1.04 1.14 1.14 1.14
Thrombin receptor (PAR-1) 1.04 1.09 1.1 1.09 1 1.25
Thrombomodulin 1.13 1.14 1.32 1.05 1.03 1.03
Thymidylate synthase -1.02 -1.01 -1.03 1.02 -1.09 -1.04
Thymosin beta-10 1.06 1.04 1.05 -1.22 -1.21 -1.29
Tissue factor 1.08 1.09 1.19 1.16 1.02 1.07
Tissue factor pathway inhibitor -1.08 -1.04 -1.1 1.05 1.01 1.1
Tissue inhibitor of metalloproteinases-1 1.4 1.18 1.33 -1 1.01 -1.05
Tissue inhibitor of metalloproteinases-3 -1.09 1.03 -1.03 1.07 -1.08 1.06
Tissue plasminogen activator 1.13 1.1 1.15 -1.09 -1.1 -1.1
Transferrin 1.83 1.71 1.8 1.?4 -1.09 -1.1
Transforming growth factor-beta3 -1.04 1.06 -1.05 1.04 1.04 -1.09
Transitional endoplasmic reticulum ATPase 1.05 1.02 1.14 -1.02 1.05 1.06
Transthyretin 1.04 1.07 -1 1.05 -1.12 -1.03
Tryptophan hydroxylase -1.15 1.07 -1.18 1.03 1.08 1.01
Tyrosine aminotransferase -1.01 1.04 -1.03 1.01 -1.12 -1
Tyrosine hydroxylase -1.13 -1.04 -1.09 -1.01 1.03 -1.04
Tyrosine protein kinase receptor (UFO) -1.01 -1.02 -1.08 1.08 1.04 -1.01
Ubiquitin conjugating enzyme (RAD 6 homologue) 1.04 -1.08 1.12 -1.23 -1.51 -1.16
UDP-glucuronosyltransferase 1.25 1.07 2.03 -1.41 1.02 -1.25
UDP-glucuronosyltransferase 1A6 1.37 1.41 1.63 -1.03 -1.01 1.14
UDP-glucuronosyltransferase 2B 1.31 1.25 1.94 -1.4 -1.04 -1.19
Uncoupling protein 2 -1.2 -1.09 -1.06 -1.18 -1.24 -1.14
Urate oxidase 1.14 1.17 1.05 1.1 1.16 1.16
Urokinase plasminogen activator receptor -1.02 1.03 -1.08 -1.01 -1.12 -1.05
Vascular cell adhesion molecule 1 (VCAM-1) 1.13 1.04 1.09 -1.02 -1.05 1.02
Vascular endothelial growth factor 1.03 1.04 1.01 1.04 1.11 1.06
Very long-chain acyl-CoA dehydrogenase 1.09 1.04 1.08 -1.0? -1.04 1.03
Very long-chain acyl-CoA synthetase 1.16 1.55 1.53 -1.01 -1.07 1.28
Vesicular monoamine transporter (VMAT) -1.02 1.04 -1.01 1.01 1.02 -1.06
VL30 element 1.4 1.75 1.24 1.71 -1.4 1.04
Waf1 1.12 1.08 1.06 1.08 -1 1.05
Zinc finger protein 1.71 1.42 1.99 1.12 1.13 1.06
(Phase-1 RCT-98) 1.02 -1.17 -1.24
(Phase-1 RCT-100) 1.3 1.2 1.03 -1.08 -1.23 1.09
(Phase-1 RCT-167) 1.06 -1.01 1.09 1.21 1.01 -1.07
(Phase-1 RCT-171) -1.27 -1.05 -1.1 -1.17 -1.03 1.1
(Phase-1 RCT-190) -1.38 -1.37 -1.19 1.07 1.02 -1.05
(Phase-1 RCT-1) 1.1 1.02 1.19 1.01 -1 -1.04
(Phase-1 RCT-200) -1.14 -1.11 -1.1 -1.1 1.01 -1.06
(Phase-1 RCT-247) -1.1 -1.35 -1.62 -1.35 -1.15 -1.62
(Phase-1 RCT-265) 1.17 -1.05 1.04 -1.05 -1.06 -1.14
(Phase-1 RCT-269) 1.05 -1.06 -1.02 -1.17 -1.3 -1.1
(Phase-1 RCT-272) 1.04 1.01 1.02 1.25 -1 -1.02
(Phase-1 RCT-275) 1.01 -1.01 1.32 -1.07 1.13 1.13
(Phase-1 RCT-5) -1.05 1.01 1.15 -1.16 -1.92 -1.36
(PSTI-II) -1.55 -1.29 -1.48 -1.11 1.06 -1.14
(Ribosomal protein L6) 1.04 -1.03 1.18 -1.04 1 -1.02
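The six value columns of Table 45 appear to hold signed fold changes, in which a negative entry such as -1.5 would denote down-regulation by a ratio of 1/1.5. The column headers are not reproduced in this section, so that convention is an assumption. A minimal sketch, assuming it, converts a cleaned row of the table into log2 ratios, which are symmetric around zero and therefore easier to average or compare across genes; `signed_fold_to_log2` and `parse_row` are hypothetical helper names, not part of the original disclosure.

```python
import math


def signed_fold_to_log2(value: float) -> float:
    """Convert a signed fold change to a log2 ratio.

    Assumed convention: values >= 1 are up-regulation ratios and values
    <= -1 are down-regulation written as -1/ratio (so -2.0 means half).
    """
    if value >= 1:
        return math.log2(value)
    if value <= -1:
        return -math.log2(-value)
    raise ValueError(f"expected |value| >= 1, got {value}")


def parse_row(line: str, n_values: int = 6) -> tuple[str, list[float]]:
    """Split a cleaned table row into its label and the trailing numeric
    columns (hypothetical helper; assumes exactly n_values numbers follow
    the label)."""
    tokens = line.split()
    label = " ".join(tokens[:-n_values])
    values = [float(token) for token in tokens[-n_values:]]
    return label, values


# Example input written in the cleaned row format used above.
label, values = parse_row("Epidermal growth factor -3.26 -2.85 -2.55 -1.26 -1.3 -1.33")
log2_values = [signed_fold_to_log2(v) for v in values]
```

Under this convention a two-fold increase (2.0) and a two-fold decrease (-2.0) map to +1 and -1, so they cancel when averaged. Averaging the raw signed values would not have that property.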
Figures imgf000174_0001 to imgf000293_0003 (image-only pages)
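Claim 1 below recites using a test expression profile with a set of reference expression profiles in a Predictive Model, but this section does not specify how that model works. The following is a minimal sketch that assumes a nearest-centroid classifier with Pearson correlation as the similarity measure. This is a generic, swapped-in technique and not necessarily the applicants' model. The class labels, the four-gene profiles, and all numbers in the example are hypothetical.

```python
import math
from statistics import fmean


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # A constant profile has no variance; treat it as uncorrelated.
    return num / den if den else 0.0


def centroid(profiles: list[list[float]]) -> list[float]:
    """Gene-by-gene mean of several reference profiles."""
    return [fmean(column) for column in zip(*profiles)]


def predict_class(
    test_profile: list[float],
    reference_profiles: dict[str, list[list[float]]],
) -> tuple[str, dict[str, float]]:
    """Assign the test profile to the reference class whose centroid it
    correlates with most strongly (hypothetical stand-in for the
    Predictive Model)."""
    scores = {
        label: pearson(test_profile, centroid(profiles))
        for label, profiles in reference_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical log2 expression ratios for four predictive genes, measured
# in samples from agents whose kidney toxicity is already known.
references = {
    "nephrotoxic": [[1.0, -1.2, 0.8, -0.5], [0.9, -1.0, 1.1, -0.4]],
    "non-toxic": [[-0.1, 0.2, 0.0, 0.1], [0.1, -0.1, 0.2, 0.0]],
}
test = [0.8, -0.9, 0.7, -0.3]
label, scores = predict_class(test, references)
```

Correlation is used here because it compares the shape of a profile and is insensitive to overall scale differences between samples. The test profile and every reference profile must list the same genes in the same order, which is the role the claimed set of predictive genes plays.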

Claims

What is claimed is:
[c1] A method of predicting the kidney toxicity of an agent in an individual, comprising the steps of:
obtaining a biological sample from an individual treated with the agent;
measuring the expression of one or more kidney toxicity predictive genes in the sample, wherein the genes are selected from the group consisting of the genes corresponding to the partial gene sequences in Table 32, thereby generating a test expression profile; and
using the test expression profile with a set of reference expression profiles in a Predictive Model to determine whether the agent will induce kidney toxicity in the individual.
[c2] The method according to claim 1, wherein the expression of the kidney toxicity predictive gene is measured at the RNA level.
[c3] The method according to claim 1, wherein the expression of the kidney toxicity predictive gene is measured at the protein level.
[c4] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo All.
[c5] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 6.
[c6] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 5.
[c7] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 4.
[c8] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 3.
[c9] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 2.
[c10] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 1.
[c11] The method according to claim 3, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo All.
[c12] The method according to claim 3, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 6.
[c13] The method according to claim 3, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 5.
[c14] The method according to any of the preceding claims 1-13, wherein the expression of at least one gene is measured.
[c15] The method according to any of the preceding claims 1-13, wherein the expression of at least five genes is measured.
[c16] The method according to any of the preceding claims 1-13, wherein the expression of at least ten genes is measured.
[c17] The method according to any of the preceding claims 1-13, wherein the expression of at least fifteen genes is measured.
[c18] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo All.
[c19] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo 6.
[c20] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo 4.
[c21] The method according to claim 3, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo All.
[c22] The method according to claim 3, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo 6.
[c23] The method according to claim 3, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo 4.
[c24] The method according to any of the preceding claims 18-23, wherein the expression of at least one gene is measured.
[c25] The method according to any of the preceding claims 18-23, wherein the expression of at least five genes is measured.
[c26] The method according to any of the preceding claims 18-23, wherein the expression of at least ten genes is measured.
[c27] The method according to any of the preceding claims 18-23, wherein the expression of at least fifteen genes is measured.
[c28] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo All.
[c29] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 6.
[c30] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 5.
[c31] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 4.
[c32] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 3.
[c33] The method according to claim 2, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 1.
[c34] The method according to claim 3, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo All.
[c35] The method according to claim 3, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 6.
[c36] The method according to claim 3, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 4.
[c37] The method according to any of the preceding claims 28-36, wherein at least one gene is used.
[c38] The method according to any of the preceding claims 28-36, wherein at least five genes are used.
[c39] The method according to any of the preceding claims 28-36, wherein at least ten genes are used.
[c40] The method according to any of the preceding claims 28-36, wherein at least fifteen genes are used.
[c41] The method according to any one of claims 1-13, 18-23, or 28-36, wherein the partial gene sequences correspond to rat genes.
[c42] The method according to any one of claims 1-13, 18-23, or 28-36, wherein the partial gene sequences correspond to dog genes.
[c43] The method according to any one of claims 1-13, 18-23, or 28-36, wherein the partial gene sequences correspond to non-human primate genes.
[c44] The method according to any one of claims 1-13, 18-23, or 28-36, wherein the partial gene sequences correspond to human genes.
[c45] The method according to claim 41, wherein the agent is administered at different dose levels to determine the presence or absence of a no-observable effect level.
[c46] The method according to claim 42, wherein the agent is administered at different dose levels to determine the presence or absence of a no-observable effect level.
[c47] The method according to claim 43, wherein the agent is administered at different dose levels to determine the presence or absence of a no-observable effect level.
[c48] The method according to claim 44, wherein the agent is administered at different dose levels to determine the presence or absence of a no-observable effect level.
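Claims 45-48 leave open how the no-observable effect level (NOEL) is read off the dose series. The following is a minimal sketch that assumes each tested dose already has its own toxicity prediction, supplied as a hypothetical mapping from dose to a boolean "predicted toxic" flag. Under this assumed definition, the NOEL is the highest tested dose below the lowest dose predicted toxic. No NOEL exists if even the lowest tested dose is predicted toxic. If no dose is predicted toxic, the highest tested dose is returned as a lower bound.

```python
from typing import Dict, Optional


def find_noel(predictions: Dict[float, bool]) -> Optional[float]:
    """Return the assumed NOEL for a dose -> predicted-toxic mapping, or None if absent."""
    doses = sorted(predictions)
    if not doses:
        return None
    # Lowest dose at which the model predicts kidney toxicity, if any.
    lowest_toxic = next((d for d in doses if predictions[d]), None)
    if lowest_toxic is None:
        # No toxic call at any tested dose: the NOEL is at least the top dose.
        return doses[-1]
    below = [d for d in doses if d < lowest_toxic]
    # An empty list means even the lowest tested dose was predicted toxic.
    return below[-1] if below else None


noel = find_noel({1.0: False, 10.0: False, 100.0: True})
```

In this hypothetical series, `noel` is 10.0. Dose units, spacing, and how individual animals or replicates are pooled into one call per dose are all left to the study design.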
[c49] A method of predicting the kidney toxicity of an agent using an in vitro system, comprising the steps of: obtaining a biological sample from in vitro cultured cells or explants treated with the agent; measuring the expression of one or more kidney toxicity predictive genes in the sample, wherein the genes are selected from the group consisting of the genes corresponding to the partial gene sequences in Table 32, thereby generating a test expression profile; and using the test expression profile with a set of reference expression profiles in a Predictive Model to determine whether the agent will induce kidney toxicity.
[c50] The method according to claim 49, wherein the expression of the kidney toxicity predictive gene is measured at the RNA level.
[c51] The method according to claim 49, wherein the expression of the kidney toxicity predictive gene is measured at the protein level.
[c52] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo All.
[c53] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 6.
[c54] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 5.
[c55] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 4.
[c56] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 3.
[c57] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 2.
[c58] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 1.
[c59] The method according to claim 51, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo All.
[c60] The method according to claim 51, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 6.
[c61] The method according to claim 51, wherein the genes corresponding to the partial gene sequences are members of 24 hour Combo 4.
[c62] The method according to any of the preceding claims 50-61, wherein the expression of at least one gene is measured.
[c63] The method according to any of the preceding claims 50-61, wherein the expression of at least five genes is measured.
[c64] The method according to any of the preceding claims 50-61, wherein the expression of at least ten genes is measured.
[c65] The method according to any of the preceding claims 50-61, wherein the expression of at least fifteen genes is measured.
[c66] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo All.
[c67] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo 6.
[c68] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo 4.
[c69] The method according to claim 51, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo All.
[c70] The method according to claim 51, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo 6.
[c71] The method according to claim 51, wherein the genes corresponding to the partial gene sequences are members of 6 hour Combo 4.
[c72] The method according to any of the preceding claims 66-71, wherein the expression of at least one gene is measured.
[c73] The method according to any of the preceding claims 66-71, wherein the expression of at least five genes is measured.
[c74] The method according to any of the preceding claims 66-71, wherein the expression of at least ten genes is measured.
[c75] The method according to any of the preceding claims 66-71, wherein the expression of at least fifteen genes is measured.
[c76] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo All.
[c77] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 6.
[c78] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 5.
[c79] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 4.
[c80] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 3.
[c81] The method according to claim 50, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 1.
[c82] The method according to claim 51, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo All.
[c83] The method according to claim 51, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 6.
[c84] The method according to claim 51, wherein the genes corresponding to the partial gene sequences are members of 72 hour Combo 4.
[c85] The method according to any of the preceding claims 76-84, wherein the expression of at least one gene is measured.
[c86] The method according to any of the preceding claims 76-84, wherein the expression of at least five genes is measured.
[c87] The method according to any of the preceding claims 76-84, wherein the expression of at least ten genes is measured.
[c88] The method according to any of the preceding claims 76-84, wherein the expression of at least fifteen genes is measured.
[c89] The method according to any one of claims 50-61, 66-71, or 76-84, wherein the partial gene sequences correspond to rat genes.
[c90] The method according to any one of claims 50-61, 66-71, or 76-84, wherein the partial gene sequences correspond to dog genes.
[c91] The method according to any one of claims 50-61, 66-71, or 76-84, wherein the partial gene sequences correspond to non-human primate genes.
[c92] The method according to any one of claims 50-61, 66-71, or 76-84, wherein the partial gene sequences correspond to human genes.
[c93] The method according to claim 89, wherein the agent is administered at different dose levels to determine the presence or absence of a no-observable effect level.
[c94] The method according to claim 90, wherein the agent is administered at different dose levels to determine the presence or absence of a no-observable effect level.
[c95] The method according to claim 91, wherein the agent is administered at different dose levels to determine the presence or absence of a no-observable effect level.
[c96] The method according to claim 92, wherein the agent is administered at different dose levels to determine the presence or absence of a no-observable effect level.
[c97] A computer program product for predicting kidney toxicity from a test sample expression profile, comprising: an encrypted training data set; encrypted lists of genes selected from the group consisting of the genes corresponding to the partial gene sequences in Table 32, to be used with the training set, and a Predictive Model that uses said training set, said lists of genes, and said test sample expression profile to predict the kidney toxicity of the test sample.
[c98] The computer program product of claim 97, wherein the encrypted lists of genes comprise the 24 hour Combo 6, 24 hour Combo 5, 24 hour Combo 4, 24 hour Combo 3, 24 hour Combo 2, and 24 hour Combo 1 gene lists.
[c99] The computer program product of claim 97, wherein the encrypted lists of genes comprise the 6 hour Combo 6, 6 hour Combo 5, 6 hour Combo 4, 6 hour Combo 3, 6 hour Combo 2, and 6 hour Combo 1 gene lists.
[c100] The computer program product of claim 97, wherein the encrypted lists of genes comprise the 72 hour Combo 6, 72 hour Combo 5, 72 hour Combo 4, 72 hour Combo 3, 72 hour Combo 2, and 72 hour Combo 1 gene lists.
[c101] The computer program product of claim 97, wherein the prediction is made through the calculation of a certitude score.
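Claim 101 does not define the certitude score. The following is purely a hypothetical illustration, not the scoring used in this application. It assumes that the score is the fraction of gene-list models (for example, Combo 1 through Combo 6 for one time point) that vote "toxic" for the test profile. The gene names, list memberships, and toy threshold classifier below are invented placeholders, not content from Table 32.

```python
from typing import Callable, Dict, List

# Hypothetical classifier type: receives the expression values of one gene list
# (in list order) and returns True for a "toxic" call.
Classifier = Callable[[List[float]], bool]


def certitude_score(
    test_profile: Dict[str, float],
    gene_lists: Dict[str, List[str]],
    classify: Classifier,
) -> float:
    """Fraction of gene-list models voting "toxic" (an assumed definition)."""
    if not gene_lists:
        raise ValueError("at least one gene list is required")
    votes = [classify([test_profile[g] for g in genes]) for genes in gene_lists.values()]
    return sum(votes) / len(votes)


def mean_above_one(values: List[float]) -> bool:
    """Toy stand-in for a trained model: toxic if the mean log ratio exceeds 1.0."""
    return sum(values) / len(values) > 1.0


profile = {"gene_a": 2.0, "gene_b": 1.5, "gene_c": -0.5, "gene_d": 0.1}
lists = {
    "Combo 1": ["gene_a"],
    "Combo 2": ["gene_a", "gene_b"],
    "Combo 3": ["gene_c", "gene_d"],
}
score = certitude_score(profile, lists, mean_above_one)
```

Two of the three hypothetical lists vote "toxic," so `score` is 2/3. A score built on calibrated per-model probabilities would be an equally plausible reading of the claim.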
[c102] A method for mining genes predictive for kidney toxicity, comprising the steps of: collecting expression levels of a plurality of candidate toxicity predictive genes among a multiplicity of samples; defining a group of samples to be a training set; defining another group of samples to be a test set; optionally generating additional training and test sets; and selecting a set of genes which are predictive of kidney toxicity based on evaluating the training and test sets in a Predictive Model.
[c103] The method according to claim 102, wherein the expression levels are stored as a database on an electronic medium.
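Claim 102 describes mining predictive genes by splitting samples into training and test sets and then selecting genes based on their performance in a Predictive Model. The following is a minimal sketch under stated assumptions. It assumes that each sample is a hypothetical mapping from gene name to expression value, paired with a known toxic/non-toxic label. It also assumes a nearest-centroid classifier with squared Euclidean distance combined with greedy forward selection. Both choices are swapped-in techniques for illustration, not the procedure described in this application.

```python
from statistics import mean
from typing import Dict, List, Tuple

# (gene -> expression value, is_toxic). Both classes must appear in the training set.
Sample = Tuple[Dict[str, float], bool]


def centroid_accuracy(train: List[Sample], test: List[Sample], genes: List[str]) -> float:
    """Test-set accuracy of a nearest-centroid classifier restricted to `genes`."""

    def centroid(label: bool) -> Dict[str, float]:
        rows = [s for s, y in train if y == label]
        return {g: mean(r[g] for r in rows) for g in genes}

    c_tox, c_non = centroid(True), centroid(False)

    def dist(s: Dict[str, float], c: Dict[str, float]) -> float:
        return sum((s[g] - c[g]) ** 2 for g in genes)

    # A tie in distance is called non-toxic.
    correct = sum((dist(s, c_tox) < dist(s, c_non)) == y for s, y in test)
    return correct / len(test)


def select_genes(
    train: List[Sample], test: List[Sample], candidates: List[str], max_genes: int = 5
) -> Tuple[List[str], float]:
    """Greedily add the gene that most improves test accuracy; stop when nothing helps."""
    selected: List[str] = []
    best = 0.0
    while len(selected) < max_genes:
        scored = [
            (centroid_accuracy(train, test, selected + [g]), g)
            for g in candidates
            if g not in selected
        ]
        if not scored:
            break
        acc, gene = max(scored)
        if acc <= best:
            break
        selected.append(gene)
        best = acc
    return selected, best


train_set = [({"ga": 2.0, "gb": 0.1}, True), ({"ga": 1.8, "gb": -0.2}, True),
             ({"ga": 0.0, "gb": 0.2}, False), ({"ga": 0.1, "gb": -0.1}, False)]
test_set = [({"ga": 1.9, "gb": -0.1}, True), ({"ga": 0.05, "gb": -0.1}, False)]
genes, accuracy = select_genes(train_set, test_set, ["ga", "gb"])
```

With these invented samples, only `ga` is selected. Choosing genes by test-set accuracy makes that same accuracy optimistic. The claim's optional step of generating additional training and test sets is one way to check whether a selected gene set holds up across splits.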
[c104] An integrated system for predicting kidney toxicity, comprising: means for measuring gene expression profiles of kidney predictive genes from biological samples exposed to the test agent; and a computer system operably linked to said means that is capable of implementing a predictive model.
PCT/US2003/006196 2002-02-27 2003-02-27 Kidney toxicity predictive genes Ceased WO2003100030A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP03741753A EP1506396A2 (en) 2002-02-27 2003-02-27 Kidney toxicity predictive genes
AU2003273154A AU2003273154A1 (en) 2002-02-27 2003-02-27 Kidney toxicity predictive genes
CA002477688A CA2477688A1 (en) 2002-02-27 2003-02-27 Kidney toxicity predictive genes

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US36112802P 2002-02-27 2002-02-27
US60/361,128 2002-02-27

Publications (3)

Publication Number Publication Date
WO2003100030A2 WO2003100030A2 (en) 2003-12-04
WO2003100030A3 WO2003100030A3 (en) 2004-12-23
WO2003100030B1 WO2003100030B1 (en) 2005-05-06

Family

ID=29584275

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/006196 Ceased WO2003100030A2 (en) 2002-02-27 2003-02-27 Kidney toxicity predictive genes

Country Status (4)

Country Link
EP (1) EP1506396A2 (en)
AU (1) AU2003273154A1 (en)
CA (1) CA2477688A1 (en)
WO (1) WO2003100030A2 (en)

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DEBOUCK ET AL: 'DNA microarrays in drug discovery and development' NATURE GENETICS SUPPLEMENT vol. 21, January 1999, pages 48 - 50, XP002928673 *
DOOLEY ET AL: 'A method to improve selection of molecular targets by circumventing the ADME pharmacokinetic system utilizing PharmArray DNA microarrays' BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS vol. 303, 11 April 2003, pages 828 - 841, XP002982396 *
GERHOLD ET AL: 'Better therapeutics through microarrays' NATURE GENETICS SUPPLEMENT vol. 32, December 2002, pages 547 - 552, XP002982395 *
HUANG ET AL: 'Assessment of Cisplatin-induced nephrotoxicity by microarray technology' TOXICOLOGICAL SCIENCES vol. 63, 2001, pages 196 - 207, XP001096461 *
KRAMER ET AL: 'Overview of the application of transcription profiling using selected nephrotoxicants for toxicology assessment' ENVIRONMENTAL HEALTH PERSPECTIVES vol. 112, no. 4, March 2004, pages 460 - 464, XP002982394 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7415358B2 (en) 2001-05-22 2008-08-19 Ocimum Biosolutions, Inc. Molecular toxicology modeling
US7426441B2 (en) 2001-05-22 2008-09-16 Ocimum Biosolutions, Inc. Methods for determining renal toxins
US7469185B2 (en) 2002-02-04 2008-12-23 Ocimum Biosolutions, Inc. Primary rat hepatocyte toxicity modeling
US8603752B2 (en) 2005-01-27 2013-12-10 Institute For Systems Biology Methods for identifying and monitoring drug side effects
US9103834B2 (en) 2005-01-27 2015-08-11 Institute For Systems Biology Methods for identifying and monitoring drug side effects
WO2022064506A1 (en) * 2020-09-24 2022-03-31 Quris Technologies Ltd Ai-chip-on-chip, clinical prediction engine

Also Published As

Publication number Publication date
AU2003273154A1 (en) 2003-12-12
AU2003273154A8 (en) 2003-12-12
WO2003100030A3 (en) 2004-12-23
EP1506396A2 (en) 2005-02-16
WO2003100030B1 (en) 2005-05-06
CA2477688A1 (en) 2003-12-04

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application

WWE WIPO information: entry into national phase

Ref document number: 2477688

Country of ref document: CA

WWE WIPO information: entry into national phase

Ref document number: 2003741753

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 2003741753

Country of ref document: EP

B Later publication of amended claims

Effective date: 20041217

NENP Non-entry into the national phase in:

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP

WWW WIPO information: withdrawn in national office

Ref document number: 2003741753

Country of ref document: EP