TITLE OF THE INVENTION Host Markers for Diagnosis of Equine Protozoal Myeloencephalitis and Sarcocystis neurona Infection
FIELD OF THE INVENTION [0001] This invention relates generally to protozoal diseases. More particularly, the present invention relates to disease-associated molecules and assays, which are useful for diagnosing Equine Protozoal Myeloencephalitis (EPM). The invention has practical use in the early diagnosis of disease, in monitoring an animal's immune response to the disease, and in enabling better treatment and management decisions to be made in clinically and sub-clinically affected animals.
BACKGROUND OF THE INVENTION [0002] Equine Protozoal Myeloencephalitis (EPM) is the leading infectious neurologic or abortigenic equine disease in the Western Hemisphere and is caused at least in part by the Apicomplexan parasite Sarcocystis neurona. S. neurona has a host life cycle stage consisting of natural prey species, an intermediate host and a definitive host. The opossum has been determined to be the definitive host species. The feral opossum (Didelphis virginiana) and the South American opossum (D. albiventris) consume the intermediate host's muscle tissue infected with protozoal sarcocysts. Following ingestion, the protozoa undergo sexual reproduction in the intestinal epithelium of the host opossum to form oocysts. Stimulated by the intestinal environment, the oocysts undergo sporulation, producing sporocysts that are eventually shed through the host opossum's feces. The speculated transmission route between opossum and equines is by fecal-oral transfer through contaminated feed or water ingested by horses. [0003] Equines are an aberrant host because the ingested sporocysts mature into the merozoite life cycle stage but do not form sarcocysts in the horse's muscle tissue. Following excystation, the sporozoites penetrate the intestinal mucosa of the horse and undergo a series of replicative cycles in the vascular endothelial cells, and possibly in the white blood cells. The merozoites then migrate to the central nervous system where they continually divide without encysting (i.e., they do not form cysts). The merozoites divide by polygeny and often leave a residual body that gradually destroys the nervous tissue of the infected horse, causing spasticity, hypermetria, ataxia, paralysis, recumbency and death. The life-cycle stage of the protozoa that is found in horses cannot be transmitted to other horses, nor can the tissue of horses, even if eaten by opossums, infect the opossum. Therefore, the horse is a dead end host for the protozoa.
[0004] A horse of any age, breed, or sex may be affected by EPM. The disease has been reported in a horse as young as two months of age, as well as one in its thirties (Gray et al, 2001, Vet. Rec. 149(9):269-273). Clinical signs of a horse with EPM do not develop (and may not develop at all) until the organism has crossed the blood brain barrier and is within the central nervous system. These signs include weakness, muscle atrophy, spinal ataxia, or "wobbling" and/or head tilt with asymmetry of the face (e.g., eyelid, ear, or lip). A severely EPM-affected horse may go down and be unable to rise. Lameness not traceable to orthopedic disease or any combination of the above signs may occur in early or less severe infections. In most cases, an affected horse is bright and alert with a normal appetite, and haematological and biochemical blood values are usually in the normal range. [0005] Neurological clinical signs of EPM are not easily distinguished from other diseases that present with similar clinical signs such as West Nile Virus (WNV) infection, rabies, hind limb lameness, cervical stenotic myelopathy (Wobbler syndrome), botulism, Equine Herpes Virus (EHV-1) infection (neurological strain), Eastern Equine Encephalitis Virus (EEE), Western Equine Encephalitis Virus (WEE), Venezuelan Equine Encephalitis Virus (VEE), migrating parasites and trauma. Also confounding is that other protozoan parasites, including Neospora caninum and Toxoplasma species, have been implicated in the aetiology and pathogenesis of EPM. [0006] Surveys (using a positive serum test to immunoblotted S. neurona antigens to indicate exposure to the parasite) in North America have revealed that approximately fifty percent (50%) of horses in the surveyed areas had been exposed to S. neurona (Blythe et al, 1997, JAVMA 210(4): 525-528; Bentz et al, 1997, JAVMA 210(4): 517-518; Saville et al, 1997, JAVMA 210(4): 519-523; MacKay, R. J. 1997, JAVMA 210(4): 482-483). However, a positive test result on an immunoblot test does not necessarily indicate the presence of an active form of the disease. The incidence of active disease appears to be much lower than the seroprevalence since less than 1% of seropositive horses are clinically affected (Gray et al, 2001, supra). 
Therefore, demonstration of serum antibodies does not necessarily mean the animal has EPM. The epidemiology of the disease suggests that some animals may be resistant to the organism or that some animals may develop an inappropriate immune response to the organism causing disease and clinical symptoms. [0007] Current methods for diagnosis of EPM involve the analysis of cerebrospinal fluid (CSF) for the presence of anti-S. neurona antibodies using an immunoblot test developed by Granstrom, D. E. (1993, Proc. Eleventh ACVIM Forum: 587-590). However, this test has several disadvantages, including: (1) antibody levels to the organism can often only be detected well after clinical symptoms appear; (2) rising antibody levels need to be demonstrated on two consecutive samples; (3) antibodies demonstrate exposure to the organism only; (4) antibody levels do not correlate to disease status; (5) CSF is difficult to obtain, and the procedure is hazardous to the horse and veterinarian; and (6) CSF can easily be contaminated with blood during collection, leading to false positive reactions.
[0008] Other diagnostic assays have recently been developed, which rely on detecting S. neurona antigens or nucleic acid in serum or cerebrospinal fluid (see U.S. Pat. Nos. 6,344,337 and 6,110,665). Although these assays can potentially detect the presence of S. neurona in animals, they do not detect other protozoan parasites, which have been implicated in the aetiology and pathogenesis of EPM. [0009] There is currently no vaccine available for prevention of EPM. Several antiprotozoal agents including sulphonamides, pyrimethamine (see, U.S. Pat. No. 5,747,476) and coccidiostats (e.g., diclazuril and toltrazuril) are available for treatment but are expensive. These drugs do not cure the disease and a typical treatment course lasts for 90 days and costs typically in the range of $US800-$US1200 per horse. Response to treatment is often the most effective method of diagnosis. However, this means that many animals without EPM are treated unnecessarily. [0010] Susceptibility to any disease is dependent upon exposure to conditions that are conducive to developing disease and the ability of the host animal to respond appropriately to these conditions. The host animal response is generally orchestrated by the immune system. The immune response can be influenced by factors (amongst others) such as gene expression, gene alleles or haplotype. For example, United States Patent No. 6,376,176 discloses that susceptibility to Crohn's disease, type I diabetes mellitus, and rheumatoid arthritis can be determined through the presence or absence of a particular haplotype (set of alleles) of the genes for Notch4, hsp40-HOM and MHC Class III. This suggests that these genes are inherited together as a locus through proximity. Regulation of gene expression through promoter elements for these genes is also likely to be inherited along with the genes. 
Therefore, it could be expected that differences in gene expression would be found in patients susceptible to Crohn's disease, type I diabetes mellitus and rheumatoid arthritis. In addition, it has been demonstrated that the haplotype DRB1*1501, DQA1*0102, DQB1*0602 for the MHC locus in humans is associated with multiple sclerosis among both Ashkenazi and non-Ashkenazi Jewish patients (Kwon et al, 1999, Arch. Neurol. 56(5): 555-60). [0011] It is clear that many horses are exposed to the causative organism of EPM because up to 50% of horses in some regions of the USA are seropositive (Fenger CK. Compend. Contin. Educ. Pract. Vet. 19(4): 513-523). However, only a small percentage (less than 1%) develop clinical symptoms associated with infection (USDA, APHIS, National Economic Cost of Equine Lameness, Colic and EPM in the USA. 2001), suggesting that some horses are more susceptible than others. No prior literature exists suggesting an immunological cause for susceptibility to EPM. However, epidemiological studies suggest that Standardbreds and Thoroughbreds are more susceptible to EPM than other breeds. This supposition may be based on an over-representation of these breeds in necropsy studies. Necropsy numbers could be
influenced by a number of other factors including owner interest in post-mortem results. Only one pony with EPM has been reported in the literature suggesting resistance in this breed. The ability to determine susceptibility to EPM would assist in the diagnosis of disease, allow horse owners to implement appropriate management routines, and veterinarians to advise owners on appropriate therapies. [0012] As such, there currently exists a need for more effective modalities for identifying equines susceptible to developing active disease, for diagnosing EPM, and for identifying equines amenable to treatment with antiprotozoal agents.
SUMMARY OF THE INVENTION [0013] Existing technologies for the diagnosis of EPM rely upon the detection of S. neurona antigens or nucleic acid or of host antibodies to S. neurona antigens. These methodologies suffer from a variety of defects, including difficulty of detection in asymptomatic subjects, lack of correlation of antibody levels to disease status, and narrow spectrum detection of the parasites that have been implicated in the aetiology and pathogenesis of EPM. [0014] The present invention represents a significant advance over current technologies for the management of affected animals. In certain advantageous embodiments, it relies upon measuring the level of certain markers in cells, especially circulating leukocytes, of the host rather than detecting protozoal products or anti-protozoan antibodies. As such, these methods are suitable for widespread screening of animals with neurological signs typical of EPM. In certain embodiments where circulating leukocytes are the subject of analysis, it is expected that detection of EPM may be feasible at very early stages of its progression, when there are few or no circulating parasites present in the peripheral blood or CSF. This represents a significant and unexpected advance in the detection, diagnosis and management of EPM. [0015] Thus, the present invention addresses the problem of diagnosing EPM by detecting a response to EPM that may be measured in host cells. Advantageous embodiments involve monitoring the expression of certain genes in peripheral leukocytes of the immune system, which may be reflected in changing patterns of RNA levels or protein production that correlate with the presence of EPM. [0016] Accordingly, in one aspect, the present invention provides methods for diagnosing the presence of EPM or S. neurona infection in a test subject, especially in an equine test subject. 
These methods generally comprise detecting in the test subject aberrant expression of at least one gene (also referred to herein as an "EPM marker gene") selected from the group consisting of: (a) a gene having a polynucleotide expression product comprising a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in
between) sequence identity with the sequence set forth in any one of SEQ ID NO: 1, 3, 5, 7, 8, 10, 12, 14, 16, 17, 18, 20, 22, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 55, 56 or 57, or a complement thereof; (b) a gene having a polynucleotide expression product comprising a nucleotide sequence that encodes a polypeptide comprising the amino acid sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58; (c) a gene having a polynucleotide expression product comprising a nucleotide sequence that encodes a polypeptide that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with at least a portion of the sequence set forth in SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58, wherein the portion comprises at least 15 contiguous amino acid residues of that sequence; and (d) a gene having a polynucleotide expression product comprising a nucleotide sequence that hybridizes to the sequence of (a), (b), (c) or a complement thereof, under at least low, medium or high stringency conditions. Polynucleotide expression products of EPM marker genes are referred to herein as "EPM marker polynucleotides." Polypeptide expression products of the EPM marker genes are referred to herein as "EPM marker polypeptides." In some embodiments, the method broadly described above is used to diagnose acute EPM or acute S. neurona infection. 
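By way of illustration only, the percentage sequence identity recited in (a) above can be sketched for two sequences that have already been aligned. The sketch below is not taken from the specification: the function names `percent_identity` and `meets_identity_threshold` are hypothetical, the input is assumed to be a pre-computed pairwise alignment with `-` as the gap character, and identity is assumed to be counted over ungapped alignment columns (other conventions for the denominator exist and would give different values).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns in which neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are skipped under this convention
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 50.0) -> bool:
    """True if the aligned sequences share at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

The default threshold of 50% mirrors the lowest value recited above; any integer percentage from 51% to 99% could be passed instead.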
[0017] Typically, such aberrant expression is detected by: (1) measuring in a biological sample obtained from the test subject the level or functional activity of an expression product of at least one EPM marker gene and (2) comparing the measured level or functional activity of each expression product to the level or functional activity of a corresponding expression product in a reference sample obtained from one or more normal subjects or from one or more subjects lacking EPM, wherein a difference in the level or functional activity of the expression product in the biological sample, as compared to the level or functional activity of the corresponding expression product in the reference sample, is indicative of the presence of EPM or of S. neurona infection in the test subject. [0018] In some embodiments, the methods comprise detecting aberrant expression of an EPM marker polynucleotide selected from the group consisting of: (a) a polynucleotide comprising a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence identity with the sequence set forth in any one of SEQ ID NO: 1, 3, 5, 7, 8, 10, 12, 14, 16, 17, 18, 20, 22, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 55, 56, 57 or 422, or a complement thereof; (b) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide comprising the amino acid sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58; (c) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with at least a portion of the
sequence set forth in SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58, wherein the portion comprises at least 15 contiguous amino acid residues of that sequence; and (d) a polynucleotide comprising a nucleotide sequence that hybridizes to the sequence of (a), (b), (c) or a complement thereof, under at least low, medium or high stringency conditions. [0019] In other embodiments, the methods comprise detecting aberrant expression of an EPM marker polypeptide selected from the group consisting of: (i) a polypeptide comprising an amino acid sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with the sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58; (ii) a polypeptide comprising a portion of the sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58, wherein the portion comprises at least 5 contiguous amino acid residues of that sequence; (iii) a polypeptide comprising an amino acid sequence that shares at least 30% similarity with at least 15 contiguous amino acid residues of the sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58; and (iv) a polypeptide comprising a portion of the sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58, wherein the portion comprises at least 5 contiguous amino acid residues of that sequence and is immuno-interactive with an antigen-binding molecule that is immuno-interactive with a sequence of (i), (ii) or (iii). [0020] In some embodiments, the methods further comprise diagnosing the presence, stage or degree of EPM or the presence, stage or degree of S. 
neurona infection in the test subject when the measured level or functional activity of the or each expression product is different than the measured level or functional activity of the or each corresponding expression product. In these embodiments, the difference typically represents an at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90%, or even an at least about 100%, 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900% or 1000% increase, or an at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 92%, 94%, 96%, 97%, 98% or 99%, or even an at least about 99.5%, 99.9%, 99.95%, 99.99%, 99.995% or 99.999% decrease in the level or functional activity of an individual expression product as compared to the level or functional activity of an individual corresponding expression product, which is hereafter referred to as "aberrant expression." In illustrative examples of this type, the presence of EPM or S. neurona infection is determined by detecting an increase in the level or functional activity of at least one EPM marker polynucleotide selected from (a) a polynucleotide comprising a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence identity with the sequence set forth in any one of SEQ ID NO: 3, 5, 7, 8, 18, 20, 22, 24, 26, 27, 38, 39, 44, 45, 50, 52, 54, 55, 56 or 57, or a complement thereof; (b) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide comprising the amino acid sequence set forth in any one of SEQ ID NO: 4, 6, 9, 19, 21, 23, 25, 51, 53, or 58;
(c) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with at least a portion of the sequence set forth in SEQ ID NO: 4, 6, 9, 19, 21, 23, 25, 51, 53 or 58, wherein the portion comprises at least 15 contiguous amino acid residues of that sequence; and (d) a polynucleotide comprising a nucleotide sequence that hybridizes to the sequence of (a), (b), (c) or a complement thereof, under at least low, medium, or high stringency conditions. [0021] In other illustrative examples, the presence of EPM or S. neurona infection is determined by detecting a decrease in the level or functional activity of at least one EPM marker polynucleotide selected from (a) a polynucleotide comprising a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence identity with the sequence set forth in any one of SEQ ID NO: 1, 10, 12, 14, 16, 17, 28, 30, 32, 34, 35, 36, 37, 40, 41, 42, 43, 46, 47, 48, 49 or 422, or a complement thereof; (b) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide comprising the amino acid sequence set forth in any one of SEQ ID NO: 2, 11, 13, 15, 29, 31 or 33; (c) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with at least a portion of the sequence set forth in SEQ ID NO: 2, 11, 13, 15, 29, 31 or 33, wherein the portion comprises at least 15 contiguous amino acid residues of that sequence; and (d) a polynucleotide comprising a nucleotide sequence that hybridizes to the sequence of (a), (b), (c) or a complement thereof, under at least low, medium, or high stringency conditions. 
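For illustration, the comparison of a measured level against a reference level, and the classification of the resulting difference as an increase or a decrease, might be sketched as follows. This is a minimal sketch under stated assumptions and not the method of the specification: the function names are hypothetical, the reference level is assumed to be the arithmetic mean of levels measured in one or more subjects lacking EPM, and the 10% cut-off is simply the lowest "at least about" value recited above, exposed as a parameter so that any of the other recited values can be used.

```python
from statistics import mean
from typing import Sequence


def percent_change(test_level: float, reference_levels: Sequence[float]) -> float:
    """Signed percent difference of the test level relative to the mean reference level."""
    reference = mean(reference_levels)
    if reference <= 0:
        raise ValueError("reference level must be positive")
    return 100.0 * (test_level - reference) / reference


def classify_expression(test_level: float,
                        reference_levels: Sequence[float],
                        aberrant_pct: float = 10.0) -> str:
    """Label one expression product as 'increased', 'decreased' or 'unchanged'.

    `aberrant_pct` is an assumed cut-off; the text recites a range of
    possible values rather than a single threshold.
    """
    change = percent_change(test_level, reference_levels)
    if change >= aberrant_pct:
        return "increased"
    if change <= -aberrant_pct:
        return "decreased"
    return "unchanged"
```

Under these assumptions, a marker measured at 150 units against reference levels of 100 and 100 units would be labelled "increased", which would be consistent with the presence of EPM only if that marker belongs to the group for which an increase is indicative.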
[0022] In some embodiments, the method further comprises diagnosing the absence of EPM or the absence of S. neurona infection in the test subject when the measured level or functional activity of the or each expression product is the same as or similar to the measured level or functional activity of the or each corresponding expression product. In these embodiments, the measured level or functional activity of an individual expression product varies from the measured level or functional activity of an individual corresponding expression product by no more than about 20%, 18%, 16%, 14%, 12%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1% or 0.1%, which is hereafter referred to as "normal expression." [0023] In some embodiments, the methods comprise measuring the level or functional activity of individual expression products of at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or 31 EPM marker genes. For example, the methods may comprise measuring the level or functional activity of an EPM marker polynucleotide either alone or in combination with as many as 30, 29, 28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 other EPM marker
polynucleotide(s). In another example, the methods may comprise measuring the level or functional activity of an EPM marker polypeptide either alone or in combination with as many as 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 other EPM marker polypeptide(s). In illustrative examples of this type, the methods comprise measuring the level or functional activity of individual expression products of at least 1, 2, 3, 4, 5 or 6 EPM marker genes that have a very high correlation with the presence or risk of EPM (hereafter referred to as "level one correlation EPM marker genes"), representative examples of which include, but are not limited to, (a) a polynucleotide comprising a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence identity with the sequence set forth in any one of SEQ ID NO: 18, 22, 26, 27, 34, 35, 36, 37, 38, 39, 44 or 45, or a complement thereof; (b) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide comprising the amino acid sequence set forth in any one of SEQ ID NO: 19 or 23; (c) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with at least a portion of the sequence set forth in SEQ ID NO: 19 or 23, wherein the portion comprises at least 15 contiguous amino acid residues of that sequence; and (d) a polynucleotide comprising a nucleotide sequence that hybridizes to the sequence of (a), (b), (c) or a complement thereof, under at least low, medium, or high stringency conditions. 
[0024] In other illustrative examples, the methods comprise measuring the level or functional activity of individual expression products of at least 1, 2, 3, 4, 5, 6, 7 or 8 EPM marker genes that have a high correlation with the presence or risk of EPM (hereafter referred to as "level two correlation EPM marker genes"), representative examples of which include, but are not limited to, (a) a polynucleotide comprising a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence identity with the sequence set forth in any one of SEQ ID NO: 10, 16, 17, 20, 30, 40, 41, 48, 49 or 57, or a complement thereof; (b) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide comprising the amino acid sequence set forth in any one of SEQ ID NO: 11, 21, 31 or 58; (c) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with at least a portion of the sequence set forth in SEQ ID NO: 11, 21, 31 or 58, wherein the portion comprises at least 15 contiguous amino acid residues of that sequence; and (d) a polynucleotide comprising a nucleotide sequence that hybridizes to the sequence of (a), (b), (c) or a complement thereof, under at least low, medium, or high stringency conditions. [0025] In still other illustrative examples, the methods comprise measuring the level or functional activity of individual expression products of at least 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 EPM marker genes that have a medium correlation with the presence or risk of EPM (hereafter referred to as "level three correlation EPM marker genes"), representative examples of which
include, but are not limited to, (a) a polynucleotide comprising a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence identity with the sequence set forth in any one of SEQ ID NO: 1, 3, 12, 14, 24, 28, 42, 43, 46, 47, 52 or 56, or a complement thereof; (b) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide comprising the amino acid sequence set forth in any one of SEQ ID NO: 2, 4, 13, 15, 25, 29 or 53; (c) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with at least a portion of the sequence set forth in SEQ ID NO: 2, 4, 13, 15, 25, 29 or 53, wherein the portion comprises at least 15 contiguous amino acid residues of that sequence; and (d) a polynucleotide comprising a nucleotide sequence that hybridizes to the sequence of (a), (b), (c) or a complement thereof, under at least low, medium, or high stringency conditions. 
[0026] In still other illustrative examples, the methods comprise measuring the level or functional activity of individual expression products of at least 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 EPM marker genes that have a lower correlation with the presence or risk of EPM (hereafter referred to as "level four correlation EPM marker genes"), representative examples of which include, but are not limited to, (a) a polynucleotide comprising a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence identity with the sequence set forth in any one of SEQ ID NO: 5, 7, 8, 32, 50, 54, 55 or 422, or a complement thereof; (b) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide comprising the amino acid sequence set forth in any one of SEQ ID NO: 6, 9, 33 or 51; (c) a polynucleotide comprising a nucleotide sequence that encodes a polypeptide that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with at least a portion of the sequence set forth in SEQ ID NO: 6, 9, 33 or 51, wherein the portion comprises at least 15 contiguous amino acid residues of that sequence; and (d) a polynucleotide comprising a nucleotide sequence that hybridizes to the sequence of (a), (b), (c) or a complement thereof, under at least low, medium, or high stringency conditions. [0027] In some embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene. In other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level one correlation EPM marker genes. 
In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene and the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level one correlation EPM marker genes and the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene. In still other
embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene and the level or functional activity of an expression product of at least 2 level two correlation EPM marker genes. [0028] In some embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene and the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene. In other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level one correlation EPM marker genes and the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene and the level or functional activity of an expression product of at least 2 level three correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene and the level or functional activity of an expression product of at least 3 level three correlation EPM marker genes. [0029] In some embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene and the level or functional activity of an expression product of at least 1 level four correlation EPM marker gene. In other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level one correlation EPM marker genes and the level or functional activity of an expression product of at least 1 level four correlation EPM marker gene. 
In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene and the level or functional activity of an expression product of at least 2 level four correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene and the level or functional activity of an expression product of at least 3 level four correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level one correlation EPM marker gene and the level or functional activity of an expression product of at least 4 level four correlation EPM marker genes.
[0030] In some embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene. In other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level two correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene and the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene. In other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level two correlation EPM marker genes and the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene and the level or functional activity of an expression product of at least 2 level three correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene and the level or functional activity of an expression product of at least 3 level three correlation EPM marker genes. 
In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene and the level or functional activity of an expression product of at least 4 level three correlation EPM marker genes.
[0031] In some embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene and the level or functional activity of an expression product of at least 1 level four correlation EPM marker gene. In other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level two correlation EPM marker genes and the level or functional activity of an expression product of at least 1 level four correlation EPM marker gene. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene and the level or functional activity of an expression product of at least 2 level four correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene and the level or functional activity of an expression product of at least 3 level four
correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene and the level or functional activity of an expression product of at least 4 level four correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level two correlation EPM marker gene and the level or functional activity of an expression product of at least 5 level four correlation EPM marker genes.
[0032] In some embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene. In other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level three correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene and the level or functional activity of an expression product of at least 1 level four correlation EPM marker gene. In other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level three correlation EPM marker genes and the level or functional activity of an expression product of at least 1 level four correlation EPM marker gene. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene and the level or functional activity of an expression product of at least 2 level four correlation EPM marker genes.
In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene and the level or functional activity of an expression product of at least 3 level four correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene and the level or functional activity of an expression product of at least 4 level four correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level three correlation EPM marker gene and the level or functional activity of an expression product of at least 5 level four correlation EPM marker genes.
[0033] In some embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 1 level four correlation EPM marker gene. In other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 2 level four correlation EPM marker genes. In other
embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 3 level four correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 4 level four correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 5 level four correlation EPM marker genes. In still other embodiments, the methods comprise measuring the level or functional activity of an expression product of at least 6 level four correlation EPM marker genes.
[0034] Advantageously, the biological sample comprises blood, especially peripheral blood, which suitably includes leukocytes. Suitably, the expression product is selected from an RNA molecule or a polypeptide. In some embodiments, the expression product is the same as the corresponding expression product. In other embodiments, the expression product is a variant (e.g., an allelic variant) of the corresponding expression product.
[0035] In certain embodiments, the expression product or corresponding expression product is a target RNA (e.g., mRNA) or a DNA copy of the target RNA whose level is measured using at least one nucleic acid probe that hybridizes under at least low, medium or high stringency conditions to the target RNA or to the DNA copy, wherein the nucleic acid probe comprises at least 15 contiguous nucleotides of an EPM marker gene. In these embodiments, the measured level or abundance of the target RNA or its DNA copy is normalized to the level or abundance of a reference RNA or a DNA copy of the reference RNA that is present in the same sample. Suitably, the nucleic acid probe is immobilized on a solid or semi-solid support.
In illustrative examples of this type, the nucleic acid probe forms part of a spatial array of nucleic acid probes. In some embodiments, the level of nucleic acid probe that is bound to the target RNA or to the DNA copy is measured by hybridization (e.g., using a nucleic acid array). In other embodiments, the level of nucleic acid probe that is bound to the target RNA or to the DNA copy is measured by nucleic acid amplification (e.g., using a polymerase chain reaction (PCR)). In still other embodiments, the level of nucleic acid probe that is bound to the target RNA or to the DNA copy is measured by nuclease protection assay.
[0036] In other embodiments, the expression product or corresponding expression product is a target polypeptide whose level is measured using at least one antigen-binding molecule that is immuno-interactive with the target polypeptide. In these embodiments, the measured level of the target polypeptide is normalized to the level of a reference polypeptide that is present in the same sample. Suitably, the antigen-binding molecule is immobilized on a solid or semi-solid support. In illustrative examples of this type, the antigen-binding molecule
forms part of a spatial array of antigen-binding molecules. In some embodiments, the level of antigen-binding molecule that is bound to the target polypeptide is measured by immunoassay (e.g., using an ELISA).
[0037] In still other embodiments, the expression product or corresponding expression product is a target polypeptide whose level is measured using at least one substrate for the target polypeptide with which it reacts to produce a reaction product. In these embodiments, the measured functional activity of the target polypeptide is normalized to the functional activity of a reference polypeptide that is present in the same sample.
[0038] In some embodiments, a system is used to perform the method, which suitably comprises at least one end station coupled to a base station. The base station is suitably caused (a) to receive subject data from the end station via a communications network, wherein the subject data represents parameter values corresponding to the measured or normalized level or functional activity of at least one expression product in the biological sample, and (b) to compare the subject data with predetermined data representing the measured or normalized level or functional activity of at least one corresponding expression product in the reference sample to thereby determine any difference in the level or functional activity of the expression product in the biological sample as compared to the level or functional activity of the corresponding expression product in the reference sample. Desirably, the base station is further caused to provide a diagnosis for the presence or absence of EPM. In these embodiments, the base station may be further caused to transfer an indication of the diagnosis to the end station via the communications network.
[0039] In another aspect, the present invention provides methods for treating, preventing or inhibiting the development of EPM in a subject.
These methods generally comprise detecting aberrant expression of at least one EPM diagnostic marker gene in the subject, and administering to the subject an effective amount of an agent that treats or ameliorates the symptoms or reverses or inhibits the development of EPM in the subject.
[0040] In another aspect, the present invention provides isolated EPM marker polynucleotides, which are generally selected from: (a) a polynucleotide comprising a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence identity with the sequence set forth in any one of SEQ ID NO: 16, 17, 26, 27, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 54, 55, 56 or 422, or a complement thereof; (b) a polynucleotide comprising a portion of the sequence set forth in any one of SEQ ID NO: 16, 17, 26, 27, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 54, 55, 56 or 422, or a complement thereof, wherein the portion comprises at least 15 contiguous nucleotides of that sequence or complement; (c) a polynucleotide that hybridizes to the sequence of (a) or (b) or a complement thereof, under at least low, medium or high
stringency conditions; and (d) a polynucleotide comprising a portion of any one of SEQ ID NO: 16, 17, 26, 27, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 54, 55, 56 or 422, or a complement thereof, wherein the portion comprises at least 15 contiguous nucleotides of that sequence or complement and hybridizes to a sequence of (a), (b) or (c), or a complement thereof, under at least low, medium or high stringency conditions.
[0041] In yet another aspect, the present invention provides a nucleic acid construct comprising a polynucleotide as broadly described above in operable connection with a regulatory element, which is operable in a host cell. In certain embodiments, the construct is in the form of a vector, especially an expression vector.
[0042] In still another aspect, the present invention provides isolated host cells containing a nucleic acid construct or vector as broadly described above. In certain advantageous embodiments, the host cells are selected from bacterial cells, yeast cells and insect cells.
[0043] In still another aspect, the present invention provides probes for detecting the presence of a polynucleotide as broadly described above. These probes generally comprise a nucleotide sequence that hybridizes under at least low, medium or high stringency conditions to a polynucleotide as broadly described above. In some embodiments, the probes consist essentially of a nucleic acid sequence which corresponds or is complementary to at least a portion of a nucleotide sequence encoding the amino acid sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58, wherein the portion is at least 15 nucleotides in length.
In other embodiments, the probes comprise a nucleotide sequence which is capable of hybridizing to at least a portion of a nucleotide sequence encoding the amino acid sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58 under at least low, medium or high stringency conditions, wherein the portion is at least 15 nucleotides in length. In still other embodiments, the probes comprise a nucleotide sequence that is capable of hybridizing to at least a portion of any one of SEQ ID NO: 1, 3, 5, 7, 8, 10, 12, 14, 16, 17, 18, 20, 22, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 55, 56, 57 or 422 under at least low, medium or high stringency conditions, wherein the portion is at least 15 nucleotides in length. Illustrative probes for detecting the presence of a polynucleotide as broadly described above are set forth in SEQ ID NO: 59-421 (see Table 2).
[0044] In a related aspect, the invention provides a solid or semi-solid support comprising at least one nucleic acid probe as broadly described above immobilized thereon. In some embodiments, the solid or semi-solid support comprises a spatial array of nucleic acid probes immobilized thereon.
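The length criterion recited for the probes above (a portion of at least 15 contiguous nucleotides of a marker sequence or its complement) can be illustrated with a short sketch. This is an illustration only and not part of the described methods: the function names and the toy sequences are hypothetical, the check covers sequence overlap only, and it says nothing about hybridization behavior under any particular stringency conditions.

```python
# Illustrative sketch only. All names and sequences here are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_shared_run(probe: str, target: str) -> int:
    """Length of the longest contiguous stretch present in both sequences."""
    probe, target = probe.upper(), target.upper()
    best = 0
    prev = [0] * (len(target) + 1)
    for p in probe:
        cur = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            if p == t:
                # Extend the run that ended at the previous base of each sequence.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def meets_length_criterion(probe: str, marker: str, min_len: int = 15) -> bool:
    """True if the probe shares at least min_len contiguous nucleotides
    with the marker sequence or with its reverse complement."""
    sense = longest_shared_run(probe, marker)
    antisense = longest_shared_run(probe, reverse_complement(marker))
    return max(sense, antisense) >= min_len


# Made-up 30-nucleotide "marker" used purely for demonstration.
marker = "ATGGCTAGCTAGGATCCGATCGATCGGCTA"
probe_sense = marker[5:22]                          # 17 nt taken from the marker
probe_antisense = reverse_complement(marker[5:22])  # 17 nt of the complement
probe_short = marker[5:15]                          # only 10 nt, too short
```

With these toy inputs, `meets_length_criterion` accepts the two 17-nucleotide probes and rejects the 10-nucleotide one.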
[0045] In a further aspect, the present invention provides isolated polypeptides, referred to herein as "EPM marker polypeptides," which are generally selected from:
(i) a polypeptide comprising an amino acid sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with the sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58;
(ii) a polypeptide comprising an amino acid sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence similarity with a polypeptide expression product of an EPM marker gene as broadly described above, especially a gene that comprises a nucleotide sequence that shares at least 50% (and at least 51% to at least 99% and all integer percentages in between) sequence identity with the sequence set forth in any one of SEQ ID NO: 16, 17, 26, 27, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 54, 55, 56 or 422;
(iii) a portion of the polypeptide according to (i) or (ii) wherein the portion comprises at least 5 contiguous amino acid residues of that polypeptide;
(iv) a polypeptide comprising an amino acid sequence that shares at least 30% similarity (and at least 31% to at least 99% and all integer percentages in between) with at least 15 contiguous amino acid residues of the polypeptide according to (i) or (ii); and
(v) a polypeptide comprising a portion of the polypeptide according to (i) or (ii), wherein the portion comprises at least 5 contiguous amino acid residues of the polypeptide according to (i) or (ii) and is immuno-interactive with an antigen-binding molecule that is immuno-interactive with a sequence of (i), (ii) or (iii).
[0046] Still a further aspect of the present invention provides an antigen-binding molecule that is immuno-interactive with an EPM marker polypeptide as broadly described above. In some embodiments, the antigen-binding molecule is immuno-interactive with an EPM diagnostic marker polypeptide as broadly described above.
[0047] In a related aspect, the invention provides a solid or semi-solid support comprising at least one antigen-binding molecule as broadly described above immobilized thereon. In some embodiments, the solid or semi-solid support comprises a spatial array of antigen-binding molecules immobilized thereon.
[0048] Still another aspect of the invention provides the use of one or more EPM marker polynucleotides as broadly described above, or the use of one or more probes as broadly described above, or the use of one or more EPM marker polypeptides as broadly described above, or the use of one or more antigen-binding molecules as broadly described above, in the manufacture of a kit for diagnosing the presence of EPM in a subject.
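The normalization and comparison steps recited in paragraphs [0035] to [0038] can be pictured with a minimal sketch. It assumes, purely for illustration, that normalization is a simple ratio of the target signal to a reference signal measured in the same sample, and that the comparison is a per-marker difference against predetermined reference-sample values. The marker names, signal values and function names are hypothetical and are not taken from the specification.

```python
from typing import Dict


def normalize(target_level: float, reference_level: float) -> float:
    """Express a target level relative to a reference level from the same sample.

    Assumption: a plain ratio is used; other normalization schemes are possible.
    """
    if reference_level <= 0:
        raise ValueError("reference level must be positive")
    return target_level / reference_level


def compare_to_reference(
    subject: Dict[str, float], predetermined: Dict[str, float]
) -> Dict[str, float]:
    """Return (subject - predetermined) for every marker present in both data sets."""
    return {m: subject[m] - predetermined[m] for m in subject if m in predetermined}


# Hypothetical raw signals per marker: (target signal, reference signal).
raw_signals = {"marker_A": (840.0, 420.0), "marker_B": (150.0, 600.0)}

# Subject data as it might be sent from an end station to a base station.
subject_data = {m: normalize(t, r) for m, (t, r) in raw_signals.items()}

# Made-up predetermined values standing in for the reference sample.
predetermined_data = {"marker_A": 1.0, "marker_B": 0.5}

differences = compare_to_reference(subject_data, predetermined_data)
```

In this toy case the normalized level of `marker_A` is above its predetermined value and that of `marker_B` is below it. How such differences translate into a diagnosis is outside the scope of the sketch.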
BRIEF DESCRIPTION OF THE DRAWINGS
[0049] Figure 1 is a graphical representation of a ROC for Day 2 post-infection with S. neurona. The sensitivity and selectivity are 0.917 and 0.857 respectively. The areas under the curve are 0.92 and 0.87 using raw and Lloyd's method respectively. The ROC was calculated by moving a critical threshold along the axis of the discriminant function scores. Both raw empirical ROCs and ROCs smoothed using Lloyd's method (Lloyd, C.J. 1998, Journal of the American Statistical Association 93: 1356-1364) were calculated. The curve was calculated comparing those animals that became diseased to those animals that remained healthy at day 0 (pre-infection). The area under the ROC was calculated by the trapezoidal rule, applied to both the empirical ROC and the smoothed ROC.
[0050] Figure 2 is a graphical representation of a ROC for Day 4 post-infection with S. neurona. The sensitivity and selectivity are 0.750 and 0.714 respectively. The areas under the curve are 0.86 and 0.81 using raw and Lloyd's method respectively. The ROC was calculated as per the ROC described in Figure 1.
[0051] Figure 3 is a graphical representation of a ROC for Day 7 post-infection with
S. neurona. The sensitivity and selectivity are 0.833 and 0.714 respectively. The areas under the curve are 0.89 and 0.84 using raw and Lloyd's method respectively. The ROC was calculated as per the ROC described in Figure 1.
[0052] Figure 4 is a graphical representation of a ROC for Day 9 post-infection with S. neurona. The sensitivity and selectivity are 0.833 and 0.857 respectively. The areas under the curve are 0.86 and 0.82 using raw and Lloyd's method respectively. The ROC was calculated as per the ROC described in Figure 1.
[0053] Figure 5 is a graphical representation of a ROC for Day 11 post-infection with S. neurona. The sensitivity and selectivity are 0.833 and 0.857 respectively. The areas under the curve are 0.93 and 0.86 using raw and Lloyd's method respectively. The ROC was calculated as per the ROC described in Figure 1.
[0054] Figure 6 is a graphical representation of a ROC for Day 14 post-infection with S. neurona. The sensitivity and selectivity are 0.917 and 1.000 respectively. The areas under the curve are 0.94 and 0.91 using raw and Lloyd's method respectively. The ROC was calculated as per the ROC described in Figure 1.
[0055] Figure 7 is a graphical representation of a ROC for Day 17 post-infection with S. neurona. The sensitivity and selectivity are 1.000 and 0.857 respectively. The areas under the curve are 0.98 and 0.92 using raw and Lloyd's method respectively. The ROC was calculated as per the ROC described in Figure 1.
[0056] Figure 8 is a graphical representation of a ROC for Day 21 post-infection with S. neurona. The sensitivity and selectivity are 0.833 and 0.857 respectively. The areas under the curve are 0.94 and 0.86 using raw and Lloyd's method respectively. The ROC was calculated as per the ROC described in Figure 1.
[0057] Figure 9 is a graphical representation of a ROC for Day 24 post-infection with S. neurona. The sensitivity and selectivity are 1.000 and 0.857 respectively. The areas under the curve are 0.94 and 0.91 using raw and Lloyd's method respectively. The ROC was calculated as per the ROC described in Figure 1.
[0058] Figure 10 is a graphical representation of a ROC for Day 28 post-infection with S. neurona. The sensitivity and selectivity are 1 and 0.714 respectively. The areas under the curve are 0.94 and 0.93 using raw and Lloyd's method respectively. The ROC was calculated as per the ROC described in Figure 1.
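The ROC construction described for Figures 1 to 10 (sweeping a critical threshold along the discriminant function scores, then integrating by the trapezoidal rule) can be sketched as follows. The sketch covers the raw empirical ROC only; Lloyd's smoothing is not implemented. It assumes that higher scores indicate disease and that the "selectivity" reported in the figure descriptions corresponds to specificity. The scores and labels are made up and do not reproduce any value reported above.

```python
from typing import List, Sequence, Tuple


def empirical_roc(
    scores: Sequence[float], diseased: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) points for a threshold sweep.

    An animal is called positive when its score is >= the current threshold.
    """
    n_pos = sum(1 for d in diseased if d)
    n_neg = len(diseased) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one diseased and one healthy animal")
    points = [(0.0, 0.0)]
    # Lower the threshold step by step so that more animals are called positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, diseased) if d and s >= threshold)
        fp = sum(1 for s, d in zip(scores, diseased) if not d and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up discriminant scores for six animals (True = became diseased).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [True, True, False, True, False, False]

roc_points = empirical_roc(toy_scores, toy_labels)
auc = trapezoid_auc(roc_points)  # 8/9 for this toy data set
```

For the toy data, eight of the nine diseased/healthy pairs are ranked correctly, which matches the area of 8/9 obtained by the trapezoidal rule.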
DETAILED DESCRIPTION OF THE INVENTION
1. Definitions
[0059] Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the art to which the invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, preferred methods and materials are described. For the purposes of the present invention, the following terms are defined below.
[0060] The articles "a" and "an" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, "an element" means one element or more than one element.
[0061] The term "aberrant expression," as used herein to describe the expression of an EPM marker gene, refers to the overexpression or underexpression of an EPM marker gene relative to the level of expression of the EPM marker gene or variant thereof in cells obtained from a healthy subject or from a subject lacking EPM, and/or to a higher or lower level of an EPM marker gene product (e.g., transcript or polypeptide) than the level in a tissue sample or body fluid obtained from a healthy subject or from a subject lacking EPM. In particular, an EPM marker gene is aberrantly expressed if the level of expression of the EPM marker gene is higher by at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90%, or even at least about 100%, 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900% or 1000%, or lower by at least about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 92%, 94%, 96%, 97%, 98% or 99%, or even at least about 99.5%, 99.9%, 99.95%, 99.99%, 99.995% or 99.999% than the level of expression of the EPM marker gene by cells obtained from a healthy subject or from a subject without EPM, and/or relative to the level of expression of the EPM marker gene in a tissue sample or body fluid obtained from a healthy subject or from a subject without EPM.
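As a worked illustration of the definition of "aberrant expression" above, the following sketch expresses a measured level as a percentage change relative to a healthy reference level and applies a cutoff. The 10% default mirrors the lowest figure recited in the definition; the function names and the example levels are hypothetical, and the choice of cutoff in practice is not prescribed by this sketch.

```python
def percent_change(level: float, healthy_level: float) -> float:
    """Percentage by which a level is higher (+) or lower (-) than the healthy level."""
    if healthy_level <= 0:
        raise ValueError("healthy reference level must be positive")
    return 100.0 * (level - healthy_level) / healthy_level


def classify_expression(
    level: float, healthy_level: float, min_change_pct: float = 10.0
) -> str:
    """Label a marker as overexpressed, underexpressed or not aberrant.

    min_change_pct is an illustrative cutoff, not a value fixed by the text.
    """
    change = percent_change(level, healthy_level)
    if change >= min_change_pct:
        return "overexpressed"
    if change <= -min_change_pct:
        return "underexpressed"
    return "not aberrant"


# Hypothetical expression levels in arbitrary units, healthy reference = 100.
examples = {
    "marker_A": classify_expression(150.0, 100.0),  # 50% higher
    "marker_B": classify_expression(40.0, 100.0),   # 60% lower
    "marker_C": classify_expression(105.0, 100.0),  # 5% higher, below the cutoff
}
```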
[0062] By "about" is meant a quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length that varies by as much as 30, 25, 20, 15, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1% to a reference quantity, level, value, number, frequency, percentage, dimension, size, amount, weight or length.
[0063] The term "amplicon" refers to a target sequence for amplification, and/or the amplification products of a target sequence for amplification. In certain other embodiments an "amplicon" may include the sequence of probes or primers used in amplification.
[0064] By "antigen-binding molecule" is meant a molecule that has binding affinity for a target antigen. It will be understood that this term extends to immunoglobulins,
immunoglobulin fragments and non-immunoglobulin derived protein frameworks that exhibit antigen-binding activity.
[0065] As used herein, the terms "binds specifically," "specifically immuno-interactive" and the like, when referring to an antigen-binding molecule, refer to a binding reaction which is determinative of the presence of an antigen in the presence of a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antigen-binding molecules bind to a particular antigen and do not bind in a significant amount to other proteins or antigens present in the sample. Specific binding to an antigen under such conditions may require an antigen-binding molecule that is selected for its specificity for a particular antigen. For example, antigen-binding molecules can be raised to a selected protein antigen, which bind to that antigen but not to other proteins present in a sample. A variety of immunoassay formats may be used to select antigen-binding molecules specifically immuno-interactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select monoclonal antibodies specifically immuno-interactive with a protein. See Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor
Publications, New York, for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity.
[0066] By "biologically active portion" is meant a portion of a full-length parent peptide or polypeptide which portion retains an activity of the parent molecule. As used herein, the term "biologically active portion" includes deletion mutants and peptides, for example of at least about 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, 300, 400, 500, 600, 700, 800, 900, 1000 contiguous amino acids, which comprise an activity of a parent molecule. Portions of this type may be obtained through the application of standard recombinant nucleic acid techniques or synthesized using conventional liquid or solid phase synthesis techniques. For example, reference may be made to solution synthesis or solid phase synthesis as described, for example, in Chapter 9 entitled "Peptide Synthesis" by Atherton and Shephard which is included in a publication entitled "Synthetic Vaccines" edited by Nicholson and published by Blackwell Scientific Publications. Alternatively, peptides can be produced by digestion of a peptide or polypeptide of the invention with proteinases such as endoLys-C, endoArg-C, endoGlu-C and staphylococcus V8-protease. The digested fragments can be purified by, for example, high performance liquid chromatographic (HPLC) techniques. Recombinant nucleic acid techniques can also be used to produce such portions.
[0067] The term "biological sample" as used herein refers to a sample that may be extracted, untreated, treated, diluted or concentrated from an animal. The biological sample may include a biological fluid such as whole blood, serum, plasma, saliva, urine, sweat, ascitic fluid,
peritoneal fluid, synovial fluid, amniotic fluid, cerebrospinal fluid, tissue biopsy, and the like. In certain embodiments, the biological sample is blood, especially peripheral blood.
[0068] As used herein, the term "cis-acting sequence," "cis-acting element" or "cis-regulatory region" or "regulatory region" or similar term shall be taken to mean any sequence of nucleotides, which when positioned appropriately relative to an expressible genetic sequence, is capable of regulating, at least in part, the expression of the genetic sequence. Those skilled in the art will be aware that a cis-regulatory region may be capable of activating, silencing, enhancing, repressing or otherwise altering the level of expression and/or cell-type-specificity and/or developmental specificity of a gene sequence at the transcriptional or post-transcriptional level. In certain embodiments of the present invention, the cis-acting sequence is an activator sequence that enhances or stimulates the expression of an expressible genetic sequence.
[0069] Throughout this specification, unless the context requires otherwise, the words "comprise," "comprises" and "comprising" will be understood to imply the inclusion of a stated step or element or group of steps or elements but not the exclusion of any other step or element or group of steps or elements.
[0070] By "corresponds to" or "corresponding to" is meant a polynucleotide (a) having a nucleotide sequence that is substantially identical or complementary to all or a portion of a reference polynucleotide sequence or (b) encoding an amino acid sequence identical to an amino acid sequence in a peptide or protein. This phrase also includes within its scope a peptide or polypeptide having an amino acid sequence that is substantially identical to a sequence of amino acids in a reference peptide or protein.
[0071] By "effective amount", in the context of treating or preventing a condition, is meant the administration of that amount of active to an individual in need of such treatment or prophylaxis, either in a single dose or as part of a series, that is effective for the prevention of incurring a symptom, holding in check such symptoms, and/or treating existing symptoms, of that condition. The effective amount will vary depending upon the health and physical condition of the individual to be treated, the taxonomic group of individual to be treated, the formulation of the composition, the assessment of the medical situation, and other relevant factors. It is expected that the amount will fall in a relatively broad range that can be determined through routine trials.
[0072] The terms "expression" or "gene expression" refer to either production of RNA message or translation of RNA message into proteins or polypeptides. Detection of either type of gene expression in the use of any of the methods described herein is part of the invention.
[0073] By "expression vector" is meant any autonomous genetic element capable of directing the transcription of a polynucleotide contained within the vector and suitably the
synthesis of a peptide or polypeptide encoded by the polynucleotide. Such expression vectors are known to practitioners in the art.
[0074] As used herein, the term "functional activity" generally refers to the ability of a molecule (e.g., a transcript or polypeptide) to perform its designated function including a biological, enzymatic, or therapeutic function. In certain embodiments, the functional activity of a molecule corresponds to its specific activity as determined by any suitable assay known in the art.
[0075] The term "gene" as used herein refers to any and all discrete coding regions of the cell's genome, as well as associated non-coding and regulatory regions. The gene is also intended to mean the open reading frame encoding specific polypeptides, introns, and adjacent 5' and 3' non-coding nucleotide sequences involved in the regulation of expression. In this regard, the gene may further comprise control signals such as promoters, enhancers, termination and/or polyadenylation signals that are naturally associated with a given gene, or heterologous control signals. The DNA sequences may be cDNA or genomic DNA or a fragment thereof. The gene may be introduced into an appropriate vector for extrachromosomal maintenance or for integration into the host.
[0076] By "high density polynucleotide arrays" and the like is meant those arrays that contain at least 400 different features per cm².
[0077] The phrase "high discrimination hybridization conditions" refers to hybridization conditions in which a single base mismatch may be determined.
[0078] By "housekeeping gene" is meant a gene that is expressed in virtually all cells since it is fundamental to any cell's functions (e.g., essential proteins and RNA molecules).
[0079] "Hybridization" is used herein to denote the pairing of complementary nucleotide sequences to produce a DNA-DNA hybrid or a DNA-RNA hybrid. Complementary base sequences are those sequences that are related by the base-pairing rules.
In DNA, A pairs with T and C pairs with G. In RNA, U pairs with A and C pairs with G. In this regard, the terms "match" and "mismatch" as used herein refer to the hybridization potential of paired nucleotides in complementary nucleic acid strands. Matched nucleotides hybridize efficiently, such as the classical A-T and G-C base pairs mentioned above. Mismatches are other combinations of nucleotides that do not hybridize efficiently.
[0080] The phrase "hybridizing specifically to" and the like refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA).
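The base-pairing rules and the notions of "match" and "mismatch" defined in paragraph [0079] can be made concrete with a small sketch. It counts mismatched positions between two strands of equal length aligned antiparallel, accepting A-T, A-U and C-G pairs so that both DNA-DNA and DNA-RNA hybrids are covered. The function name and the example strands are hypothetical. The count describes sequence complementarity only and is not a prediction of hybridization under any given stringency conditions.

```python
# Watson-Crick partners for each base. U is included for DNA-RNA hybrids.
PARTNERS = {
    "A": {"T", "U"},
    "T": {"A"},
    "U": {"A"},
    "C": {"G"},
    "G": {"C"},
}


def count_mismatches(strand_5to3: str, other_5to3: str) -> int:
    """Count mismatched positions when two strands are aligned antiparallel.

    Both strands are written 5' to 3', so the second one is reversed before
    it is compared base by base with the first.
    """
    a, b = strand_5to3.upper(), other_5to3.upper()
    if len(a) != len(b):
        raise ValueError("strands must have the same length")
    return sum(
        1 for x, y in zip(a, reversed(b)) if y not in PARTNERS.get(x, set())
    )


dna_dna_perfect = count_mismatches("ATGC", "GCAT")  # fully complementary DNA duplex
dna_rna_perfect = count_mismatches("ATGC", "GCAU")  # DNA strand paired with RNA
single_mismatch = count_mismatches("ATGC", "GCTT")  # one T opposite T
```

A probe with zero mismatches against its target corresponds to a fully matched duplex, while `single_mismatch` shows the kind of single base mismatch that high discrimination hybridization conditions are meant to resolve.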
[0081] Reference herein to "immuno-interactive" includes reference to any interaction, reaction, or other form of association between molecules and in particular where one of the molecules is, or mimics, a component of the immune system. [0082] By "isolated" is meant material that is substantially or essentially free from components that normally accompany it in its native state. For example, an "isolated polynucleotide", as used herein, refers to a polynucleotide, which has been purified from the sequences which flank it in a naturally-occurring state, e.g., a DNA fragment which has been removed from the sequences that are normally adjacent to the fragment. Alternatively, an "isolated peptide" or an "isolated polypeptide" and the like, as used herein, refer to in vitro isolation and/or purification of a peptide or polypeptide molecule from its natural cellular environment, and from association with other components of the cell, i.e., it is not associated with in vivo substances. [0083] By "marker gene" is meant a gene that imparts a distinct phenotype to cells expressing the marker gene and thus allows such transformed cells to be distinguished from cells that do not have the marker. A selectable marker gene confers a trait for which one can 'select' based on resistance to a selective agent (e.g., a herbicide, antibiotic, radiation, heat, or other treatment damaging to untransformed cells). A screenable marker gene (or reporter gene) confers a trait that one can identify through observation or testing, i.e., by 'screening' (e.g., β-glucuronidase, luciferase, or other enzyme activity not present in untransformed cells). [0084] As used herein, a "naturally-occurring" nucleic acid molecule refers to an
RNA or DNA molecule having a nucleotide sequence that occurs in nature. For example a naturally-occurring nucleic acid molecule can encode a protein that occurs in nature. [0085] By "obtained from" is meant that a sample such as, for example, a nucleic acid extract or polypeptide extract is isolated from, or derived from, a particular source. For example, the extract may be isolated directly from fungal pathogens and fungal related plant pathogens including oomycete plant pathogens (e.g., Phytophthora and Pythium). In some embodiments, the polypeptide extract is isolated directly from zoospores, especially zoospore ventral vesicles. [0086] The term "oligonucleotide" as used herein refers to a polymer composed of a multiplicity of nucleotide residues (deoxyribonucleotides or ribonucleotides, or related structural variants or synthetic analogues thereof, including nucleotides with modified or substituted sugar groups and the like) linked via phosphodiester bonds (or related structural variants or synthetic analogues thereof). Thus, while the term "oligonucleotide" typically refers to a nucleotide polymer in which the nucleotide residues and linkages between them are naturally-occurring, it will be understood that the term also includes within its scope various analogues including, but not restricted to, peptide nucleic acids (PNAs), phosphorothioate,
phosphorodithioate, phosphoroselenoate, phosphorodiselenoate, phosphoroanilothioate, phosphoraniladate, phosphoroamidate, methyl phosphonates, 2-O-methyl ribonucleic acids, and the like. The exact size of the molecule can vary depending on the particular application. Oligonucleotides are a polynucleotide subset with 200 bases or fewer in length. Preferably, oligonucleotides are 10 to 60 bases in length and most preferably 12, 13, 14, 15, 16, 17, 18, 19, or 20 to 40 bases in length. Oligonucleotides are usually single stranded, e.g., for probes; although oligonucleotides may be double stranded, e.g., for use in the construction of a variant nucleic acid sequence. Oligonucleotides of the invention can be either sense or antisense oligonucleotides. [0087] The term "oligonucleotide array" refers to a substrate having oligonucleotide probes with different known sequences deposited at discrete known locations associated with its surface. For example, the substrate can be in the form of a two dimensional substrate as described in U.S. Patent No. 5,424,186. Such substrate may be used to synthesize two-dimensional spatially addressed oligonucleotide (matrix) arrays. Alternatively, the substrate may be characterized in that it forms a tubular array in which a two dimensional planar sheet is rolled into a three-dimensional tubular configuration. The substrate may also be in the form of a microsphere or bead connected to the surface of an optic fiber as, for example, disclosed by Chee et al. in WO 00/39587. Oligonucleotide arrays have at least two different features and a density of at least 400 features per cm². In certain embodiments, the arrays can have a density of about 500, at least one thousand, at least 10 thousand, at least 100 thousand, at least one million or at least 10 million features per cm². 
For example, the substrate may be silicon or glass and can have the thickness of a glass microscope slide or a glass cover slip, or may be composed of other synthetic polymers. Substrates that are transparent to light are useful when the method of performing an assay on the substrate involves optical detection. The term also refers to a probe array and the substrate to which it is attached that form part of a wafer. [0088] The term "operably connected" or "operably linked" as used herein means placing a structural gene under the regulatory control of a promoter, which then controls the transcription and optionally translation of the gene. In the construction of heterologous promoter/structural gene combinations, it is generally preferred to position the genetic sequence or promoter at a distance from the gene transcription start site that is approximately the same as the distance between that genetic sequence or promoter and the gene it controls in its natural setting; i.e. the gene from which the genetic sequence or promoter is derived. As is known in the art, some variation in this distance can be accommodated without loss of function. Similarly, the preferred positioning of a regulatory sequence element with respect to a heterologous gene to be placed under its control is defined by the positioning of the element in its natural setting; i.e., the genes from which it is derived.
[0089] The term "pathogen" is used herein in its broadest sense to refer to an organism or an infectious agent whose infection of cells of viable animal tissue elicits a disease response. [0090] The term "polynucleotide" or "nucleic acid" as used herein designates mRNA, RNA, cRNA, cDNA or DNA. The term typically refers to a polymeric form of nucleotides of at least 10 bases in length, either ribonucleotides or deoxynucleotides or a modified form of either type of nucleotide. The term includes single and double stranded forms of DNA. [0091] The terms "polynucleotide variant" and "variant" refer to polynucleotides displaying substantial sequence identity with a reference polynucleotide sequence or polynucleotides that hybridize with a reference sequence under stringent conditions that are defined hereinafter. These terms also encompass polynucleotides in which one or more nucleotides have been added or deleted, or replaced with different nucleotides. In this regard, it is well understood in the art that certain alterations inclusive of mutations, additions, deletions and substitutions can be made to a reference polynucleotide whereby the altered polynucleotide retains a biological function or activity of the reference polynucleotide. The terms "polynucleotide variant" and "variant" also include naturally-occurring allelic variants. [0092] "Polypeptide", "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues and to variants and synthetic analogues of the same. Thus, these terms apply to amino acid polymers in which one or more amino acid residues is a synthetic non-naturally-occurring amino acid, such as a chemical analogue of a corresponding naturally-occurring amino acid, as well as to naturally-occurring amino acid polymers. [0093] The term "polypeptide variant" refers to polypeptides which are distinguished from a reference polypeptide by the addition, deletion or substitution of at least one amino acid residue. 
In certain embodiments, one or more amino acid residues of a reference polypeptide are replaced by different amino acids. It is well understood in the art that some amino acids may be changed to others with broadly similar properties without changing the nature of the activity of the polypeptide (conservative substitutions) as described hereinafter. [0094] By "primer" is meant an oligonucleotide which, when paired with a strand of DNA, is capable of initiating the synthesis of a primer extension product in the presence of a suitable polymerizing agent. The primer is preferably single-stranded for maximum efficiency in amplification but can alternatively be double-stranded. A primer must be sufficiently long to prime the synthesis of extension products in the presence of the polymerization agent. The length of the primer depends on many factors, including application, temperature to be employed, template reaction conditions, other reagents, and source of primers. For example, depending on the complexity of the target sequence, the primer may be at least about 5, 6, 7, 8,
9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500, to one base shorter in length than the template sequence at the 3' end of the primer to allow extension of a nucleic acid chain, though the 5' end of the primer may extend in length beyond the 3' end of the template sequence. In certain embodiments, primers can be large polynucleotides, such as from about 35 nucleotides to several kilobases or more. Primers can be selected to be "substantially complementary" to the sequence on the template to which it is designed to hybridize and serve as a site for the initiation of synthesis. By "substantially complementary", it is meant that the primer is sufficiently complementary to hybridize with a target polynucleotide. Desirably, the primer contains no mismatches with the template to which it is designed to hybridize but this is not essential. For example, non-complementary nucleotide residues can be attached to the 5' end of the primer, with the remainder of the primer sequence being complementary to the template. Alternatively, non-complementary nucleotide residues or a stretch of non-complementary nucleotide residues can be interspersed into a primer, provided that the primer sequence has sufficient complementarity with the sequence of the template to hybridize therewith and thereby form a template for synthesis of the extension product of the primer. [0095] "Probe" refers to a molecule that binds to a specific sequence or subsequence or other moiety of another molecule. Unless otherwise indicated, the term "probe" typically refers to a polynucleotide probe that binds to another polynucleotide, often called the "target polynucleotide", through complementary base pairing. Probes can bind target polynucleotides lacking complete sequence complementarity with the probe, depending on the stringency of the hybridization conditions. 
Probes can be labeled directly or indirectly and include primers within their scope. [0096] The term "recombinant polynucleotide" as used herein refers to a polynucleotide formed in vitro by the manipulation of nucleic acid into a form not normally found in nature. For example, the recombinant polynucleotide may be in the form of an expression vector. Generally, such expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleotide sequence. [0097] By "recombinant polypeptide" is meant a polypeptide made using recombinant techniques, i.e., through the expression of a recombinant or synthetic polynucleotide. [0098] By "regulatory element" or "regulatory sequence" is meant nucleic acid sequences (e.g., DNA) necessary for expression of an operably linked coding sequence in a particular host cell. The regulatory sequences that are suitable for prokaryotic cells for example, include a promoter, and optionally a cis-acting sequence such as an operator sequence and a ribosome binding site. Control sequences that are suitable for eukaryotic cells include
promoters, polyadenylation signals, transcriptional enhancers, translational enhancers, leader or trailing sequences that modulate mRNA stability, as well as targeting sequences that target a product encoded by a transcribed polynucleotide to an intracellular compartment within a cell or to the extracellular environment. [0099] The term "sequence identity" as used herein refers to the extent that sequences are identical on a nucleotide-by-nucleotide basis or an amino acid-by-amino acid basis over a window of comparison. Thus, a "percentage of sequence identity" is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical nucleic acid base (e.g., A, T, C, G, I) or the identical amino acid residue (e.g., Ala, Pro, Ser, Thr, Gly, Val, Leu, Ile, Phe, Tyr, Trp, Lys, Arg, His, Asp, Glu, Asn, Gln, Cys and Met) occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity. For the purposes of the present invention, "sequence identity" will be understood to mean the "match percentage" calculated by the DNASIS computer program (Version 2.5 for Windows; available from Hitachi Software Engineering Co., Ltd., South San Francisco, California, USA) using standard defaults as used in the reference manual accompanying the software. [0100] "Similarity" refers to the percentage number of amino acids that are identical or constitute conservative substitutions as defined in Table A infra. Similarity may be determined using sequence comparison programs such as GAP (Deveraux et al. 1984, Nucleic Acids Research 12, 387-395). 
In this way, sequences of a similar or substantially different length to those cited herein might be compared by insertion of gaps into the alignment, such gaps being determined, for example, by the comparison algorithm used by GAP. [0101] Terms used to describe sequence relationships between two or more polynucleotides or polypeptides include "reference sequence," "comparison window," "sequence identity," "percentage of sequence identity" and "substantial identity". A "reference sequence" is at least 12 but frequently 15 to 18 and often at least 25 monomer units, inclusive of nucleotides and amino acid residues, in length. Because two polynucleotides may each comprise (1) a sequence (i.e., only a portion of the complete polynucleotide sequence) that is similar between the two polynucleotides, and (2) a sequence that is divergent between the two polynucleotides, sequence comparisons between two (or more) polynucleotides are typically performed by comparing sequences of the two polynucleotides over a "comparison window" to identify and compare local regions of sequence similarity. A "comparison window" refers to a conceptual segment of at least 6 contiguous positions, usually about 50 to about 100, more usually about 100 to about 150 in which a sequence is compared to a reference sequence of the
same number of contiguous positions after the two sequences are optimally aligned. The comparison window may comprise additions or deletions (i.e., gaps) of about 20% or less as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Optimal alignment of sequences for aligning a comparison window may be conducted by computerized implementations of algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package Release 7.0, Genetics Computer Group, 575 Science Drive, Madison, WI, USA) or by inspection and the best alignment (i.e., resulting in the highest percentage homology over the comparison window) generated by any of the various methods selected. Reference also may be made to the BLAST family of programs as for example disclosed by Altschul et al., 1997, Nucl. Acids Res. 25:3389. A detailed discussion of sequence analysis can be found in Unit 19.3 of Ausubel et al, "Current Protocols in Molecular Biology", John Wiley & Sons Inc, 1994-1998, Chapter 15. [0102] The terms "subject" or "individual" or "patient", used interchangeably herein, refer to any subject, particularly a vertebrate subject, and even more particularly a mammalian subject, for whom therapy or prophylaxis is desired. Suitable vertebrate animals that fall within the scope of the invention include, but are not restricted to, primates, avians, livestock animals (e.g., sheep, cows, horses, donkeys, pigs), laboratory test animals (e.g., rabbits, mice, rats, guinea pigs, hamsters), companion animals (e.g., cats, dogs) and captive wild animals (e.g., foxes, deer, dingoes). A preferred subject is an equine animal in need of treatment or prophylaxis of EPM. However, it will be understood that the aforementioned terms do not imply that symptoms are present. 
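The arithmetic of the "percentage of sequence identity" defined above (matched positions divided by the window size, multiplied by 100) can be sketched in Python as follows. This is an illustration of the stated formula only, assuming the two sequences have already been optimally aligned and that gaps are written as "-"; it is not the DNASIS "match percentage" routine, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions over a comparison window.

    Both inputs are pre-aligned strings of equal length; "-" marks a gap.
    A gap never counts as a matched position, but every column counts
    toward the window size.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window must not be empty")
    matched = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matched / len(aligned_a)
```

With the hypothetical windows ACGTACGT and ACGTTCGT, seven of eight positions match, giving 87.5%; with AC-T and ACGT, three of four positions match, giving 75.0%.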
[0103] The phrase "substantially similar affinities" refers herein to target sequences having similar strengths of detectable hybridization to their complementary or substantially complementary oligonucleotide probes under a chosen set of stringent conditions. [0104] The term "template" as used herein refers to a nucleic acid that is used in the creation of a complementary nucleic acid strand to the "template" strand. The template may be either RNA and/or DNA, and the complementary strand may also be RNA and/or DNA. In certain embodiments, the complementary strand may comprise all or part of the complementary sequence to the "template," and/or may include mutations so that it is not an exact, complementary strand to the "template". Strands that are not exactly complementary to the template strand may hybridize specifically to the template strand in detection assays described here, as well as other assays known in the art, and such complementary strands that can be used in detection assays are part of the invention. [0105] The term "transformation" means alteration of the genotype of an organism, for example a bacterium, yeast, mammal, avian, reptile, fish or plant, by the introduction of a foreign or endogenous nucleic acid.
[0106] The term "treat" is meant to include both therapeutic and prophylactic treatment. [0107] By "vector" is meant a polynucleotide molecule, suitably a DNA molecule derived, for example, from a plasmid, bacteriophage, yeast, virus, mammal, avian, reptile or fish into which a polynucleotide can be inserted or cloned. A vector preferably contains one or more unique restriction sites and can be capable of autonomous replication in a defined host cell including a target cell or tissue or a progenitor cell or tissue thereof, or be integrable with the genome of the defined host such that the cloned sequence is reproducible. Accordingly, the vector can be an autonomously replicating vector, i.e., a vector that exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a linear or closed circular plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome. The vector can contain any means for assuring self-replication. Alternatively, the vector can be one which, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated. A vector system can comprise a single vector or plasmid, two or more vectors or plasmids, which together contain the total DNA to be introduced into the genome of the host cell, or a transposon. The choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced. The vector can also include a selection marker such as an antibiotic resistance gene that can be used for selection of suitable transformants. Examples of such resistance genes are known to those of skill in the art. [0108] The terms "wild-type" and "normal" are used interchangeably to refer to the phenotype that is characteristic of most of the members of the species occurring naturally and contrast for example with the phenotype of a mutant.
2. Abbreviations [0109] The following abbreviations are used throughout the application:
nt = nucleotide
nts = nucleotides
aa = amino acid(s)
kb = kilobase(s) or kilobase pair(s)
kDa = kilodalton(s)
d = day
h = hour
s = seconds
3. Markers of EPM and uses therefor [0110] The present invention concerns the early detection, diagnosis, or prognosis of EPM or related conditions that present in a clinically similar manner. Markers of EPM, in the form of RNA molecules of specified sequences, or polypeptides expressed from these RNA molecules in cells, especially in blood cells, and more especially in peripheral blood cells, of subjects with or susceptible to EPM, are disclosed. These markers are indicators of EPM and, when differentially expressed as compared to their expression in normal subjects or in subjects lacking EPM, are diagnostic for the presence of EPM in tested subjects. Such markers provide considerable advantages over the prior art in this field. [0111] It will be apparent that the nucleic acid sequences disclosed herein will find utility in a variety of applications in EPM detection, diagnosis, prognosis and treatment. Examples of such applications within the scope of the present disclosure comprise amplification of EPM markers using specific primers, detection of EPM markers by hybridization with oligonucleotide probes, incorporation of isolated nucleic acids into vectors, expression of vector-incorporated nucleic acids as RNA and protein, and development of immunological reagents corresponding to marker encoded products. [0112] The identified EPM markers may in turn be used to design specific oligonucleotide probes and primers. Such probes and primers may be of any length that would specifically hybridize to the identified marker gene sequences and may be at least about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500 nucleotides in length and in the case of probes, up to the full length of the sequences of the marker genes identified herein. Probes may also include additional sequence at their 5' and/or 3' ends so that they extend beyond the target sequence with which they hybridize. 
[0113] When used in combination with nucleic acid amplification procedures, these probes and primers enable the rapid analysis of biological samples (e.g., peripheral blood samples) for detecting marker genes or for detecting or quantifying marker gene transcripts. Such procedures include any method or technique known in the art or described herein for duplicating or increasing the number of copies or amount of a target nucleic acid or its complement. [0114] The identified markers may also be used to identify and isolate full-length gene sequences, including regulatory elements for gene expression, from genomic DNA libraries, which are suitably but not exclusively of equine origin. The cDNA sequences identified in the present disclosure may be used as hybridization probes to screen genomic DNA libraries by conventional techniques. Once partial genomic clones have been identified, full-length genes may be isolated by "chromosomal walking" (also called "overlap hybridization") using, for example, the method disclosed by Chinault & Carbon (1979, Gene 5: 111-126). Once
a partial genomic clone has been isolated using a cDNA hybridization probe, non-repetitive segments at or near the ends of the partial genomic clone may be used as hybridization probes in further genomic library screening, ultimately allowing isolation of entire gene sequences for the EPM markers of interest. It will be recognized that full-length genes may be obtained using the full-length or partial cDNA sequences or short expressed sequence tags (ESTs) described in this disclosure using standard techniques as disclosed for example by Sambrook et al. (MOLECULAR CLONING. A LABORATORY MANUAL, Cold Spring Harbor Press, 1989) and Ausubel et al. (CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, Inc., 1994). In addition, the disclosed sequences may be used to identify and isolate full-length cDNA sequences using standard techniques as disclosed, for example, in the above-referenced texts. Sequences identified and isolated by such means may be useful in the detection of the EPM marker genes using the detection methods described herein, and are part of the invention. [0115] One of ordinary skill in the art could select segments from the identified marker genes for use in the different detection, diagnostic, or prognostic methods, vector constructs, antigen-binding molecule production, kit, and/or any of the embodiments described herein as part of the present invention. Marker gene sequences that are desirable for use in the invention are those set forth in SEQ ID NO: 1, 3, 5, 7, 8, 10, 12, 14, 16, 17, 18, 20, 22, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 55, 56, 57 or 422 (see Table 1).
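As an illustration of selecting segments of a given length from a marker sequence for use as candidate probes or primers, the following Python sketch enumerates every contiguous stretch of a chosen length. The function name and the example sequence are hypothetical; the sketch makes no assessment of specificity or hybridization behavior, which would still need to be evaluated by the practitioner.

```python
def candidate_segments(marker: str, length: int) -> list[tuple[int, str]]:
    """List every contiguous stretch of `length` bases in `marker`.

    Returns (start_index, segment) pairs, with start_index counted from 0.
    A marker shorter than `length` yields an empty list.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    marker = marker.upper()
    return [
        (start, marker[start:start + length])
        for start in range(len(marker) - length + 1)
    ]
```

For the hypothetical six-base sequence ACGTAC and a length of 4, the candidates are ACGT, CGTA and GTAC, starting at positions 0, 1 and 2.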
4. Nucleic acid molecules of the invention [0116] As described in the Examples and in Table 1, the present disclosure provides 31 markers of EPM, identified by GeneChip® analysis of blood obtained from normal horses and from horses with clinical evidence of EPM. Of the 31 markers identified, 18 comprise coding region sequences (see the markers relating to SEQ ID NO: 1, 3, 5, 7, 8, 10, 12, 14, 18, 20, 22, 24, 28, 30, 32, 50, 52 and 57) and 13 comprise 5' and/or 3' untranslated sequences only (see the markers relating to SEQ ID NO: 7, 16, 17; 26, 27; 34, 35; 36, 37; 38, 39; 40, 41; 42, 43; 44, 45; 46, 47; 48, 49; 54, 55, 56 and 422). These sequences, which are presented in Table 1, are diagnostic for the presence, stage or degree of EPM (also referred to herein as "EPM marker polynucleotides"). Sequence analysis has revealed that the EPM marker genes can be classified into subgroups. For example, several EPM marker genes encode membrane associated polypeptides (e.g., SEQ ID NO: 2, 11, 13, 17, 21, 25, 33 and 58), whereas others encode cytoplasm associated polypeptides (e.g., SEQ ID NO: 4, 8, 15, 23, 51 and 53), while still others encode nucleus associated polypeptides (e.g., SEQ ID NO: 19 and 31), whereas still others encode immune-modulating molecules (e.g., SEQ ID NO: 2, 11, 13, 15, 17, 21, 25, 33, 51
and 58) and still others encode house-keeping molecules (e.g., SEQ ID NO: 4, 6, 19, 23, 38, 39 and 42). [0117] In accordance with the present invention, the sequences of isolated nucleic acids disclosed herein find utility inter alia as hybridization probes or amplification primers. These nucleic acids may be used, for example, in diagnostic evaluation of biological samples or employed to clone full-length cDNAs or genomic clones corresponding thereto. In certain embodiments, these probes and primers represent oligonucleotides, which are of sufficient length to provide specific hybridization to an RNA or DNA sample extracted from the biological sample. The sequences typically will be about 10-20 nucleotides, but may be longer. Longer sequences, e.g., of about 30, 40, 50, 100, 500 and even up to full-length, are desirable for certain embodiments. [0118] Nucleic acid molecules having contiguous stretches of about 10, 15, 17, 20, 30, 40, 50, 60, 75, 100 or 500 nucleotides of a sequence set forth in any one of SEQ ID NO: 1, 3, 5, 7, 8, 10, 12, 14, 16, 17, 18, 20, 22, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 55, 56, 57 or 422 are contemplated. Molecules that are complementary to the above mentioned sequences and that bind to these sequences under high stringency conditions are also contemplated. These probes are useful in a variety of hybridization embodiments, such as Southern and northern blotting. In some cases, it is contemplated that probes may be used that hybridize to multiple target sequences without compromising their ability to effectively diagnose EPM. In general, it is contemplated that the hybridization probes described herein are useful both as reagents in solution hybridization, as in PCR, for detection of expression of corresponding genes, as well as in embodiments employing a solid phase. [0119] Various probes and primers may be designed around the disclosed nucleotide sequences. 
For example, in certain embodiments, the sequences used to design probes and primers may include repetitive stretches of adenine nucleotides (poly-A tails) normally attached at the ends of the RNA for the identified marker genes. In other embodiments, probes and primers may be specifically designed to not include these or other segments from the identified marker genes, as one of ordinary skill in the art may deem certain segments more suitable for use in the detection methods disclosed. In any event, the choice of primer or probe sequences for a selected application is within the realm of the ordinary skilled practitioner. Illustrative probe sequences for detection of EPM marker genes are presented in Table 2. [0120] Primers may be provided in double-stranded or single-stranded form, although the single-stranded form is desirable. Probes, while perhaps capable of priming, are designed to bind to a target DNA or RNA and need not be used in an amplification process. In
certain embodiments, the probes or primers are labeled with a radioactive species (e.g., 32P, 14C, 35S, 3H, or other label), with a fluorophore (e.g., rhodamine, fluorescein) or with a chemiluminescent label (e.g., luciferase). [0121] The present invention provides substantially full-length cDNA sequences as well as EST and partial cDNA sequences that are useful as markers of EPM. It will be understood, however, that the present disclosure is not limited to these disclosed sequences and is intended particularly to encompass at least isolated nucleic acids that are hybridizable to nucleic acids comprising the disclosed sequences or that are variants of these nucleic acids. For example, a nucleic acid of partial sequence may be used to identify a structurally-related gene or the full-length genomic or cDNA clone from which it is derived. Methods for generating cDNA and genomic libraries which may be used as a target for the above-described probes are known in the art (see, for example, Sambrook et al, 1989, supra and Ausubel et al, 1994, supra). All such nucleic acids as well as the specific nucleic acid molecules disclosed herein are collectively referred to as "EPM marker polynucleotides." Additionally, the present invention includes within its scope isolated or purified expression products of EPM marker polynucleotides (i.e., RNA transcripts and polypeptides). [0122] Accordingly, the present invention encompasses isolated or substantially purified nucleic acid or protein compositions. An "isolated" or "purified" nucleic acid molecule or protein, or biologically active portion thereof, is substantially or essentially free from components that normally accompany or interact with the nucleic acid molecule or protein as found in its naturally occurring environment. 
Thus, an isolated or purified polynucleotide or polypeptide is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. Suitably, an "isolated" polynucleotide is free of sequences (especially protein encoding sequences) that naturally flank the polynucleotide (i.e., sequences located at the 5' and 3' ends of the polynucleotide) in the genomic DNA of the organism from which the polynucleotide was derived. For example, in various embodiments, an isolated EPM marker polynucleotide can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleotide sequences that naturally flank the polynucleotide in genomic DNA of the cell from which the polynucleotide was derived. A polypeptide that is substantially free of cellular material includes preparations of protein having less than about 30%, 20%, 10%, or 5% (by dry weight) of contaminating protein. When the protein of the invention or biologically active portion thereof is recombinantly produced, culture medium suitably represents less than about 30%, 20%, 10%, or 5% (by dry weight) of chemical precursors or non-protein-of-interest chemicals.
[0123] The present invention also encompasses portions of the full-length or substantially full-length nucleotide sequences of the EPM marker genes or their transcripts or DNA copies of these transcripts. Portions of an EPM marker nucleotide sequence may encode polypeptide portions or segments that retain the biological activity of the native polypeptide. Alternatively, portions of an EPM marker nucleotide sequence that are useful as hybridization probes generally do not encode amino acid sequences retaining such biological activity. Thus, portions of an EPM marker nucleotide sequence may range from at least about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 40, 50, 60, 80, 90, 100 nucleotides, or almost up to the full-length nucleotide sequence encoding the EPM marker polypeptides of the invention. [0124] A portion of an EPM marker nucleotide sequence that encodes a biologically active portion of an EPM marker polypeptide of the invention may encode at least about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, 300, 400, 500, 600, 700, 800, 900 or 1000, or even at least about 2000, 3000, 4000 or 5000 contiguous amino acid residues, or almost up to the total number of amino acids present in a full-length EPM marker polypeptide. Portions of an EPM marker nucleotide sequence that are useful as hybridization probes or PCR primers generally need not encode a biologically active portion of an EPM marker polypeptide. [0125] Thus, a portion of an EPM marker nucleotide sequence may encode a biologically active portion of an EPM marker polypeptide, or it may be a fragment that can be used as a hybridization probe or PCR primer using standard methods known in the art. A biologically active portion of an EPM marker polypeptide can be prepared by isolating a portion of one of the EPM marker nucleotide sequences of the invention, expressing the encoded portion of the EPM marker polypeptide (e.g., by recombinant expression in vitro), and assessing the activity of the encoded portion of the EPM marker polypeptide. Nucleic acid molecules that are portions of an EPM marker nucleotide sequence comprise at least about 15, 16, 17, 18, 19,
20, 25, 30, 50, 75, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, or 650 nucleotides, or almost up to the number of nucleotides present in a full-length EPM marker nucleotide sequence. [0126] The invention also contemplates variants of the EPM marker nucleotide sequences. Nucleic acid variants can be naturally-occurring, such as allelic variants (same locus), homologues (different locus), and orthologues (different organism) or can be non-naturally-occurring. Naturally occurring variants such as these can be identified with the use of well-known molecular biology techniques, as, for example, with polymerase chain reaction (PCR) and hybridization techniques as known in the art. Non-naturally occurring variants can be made by mutagenesis techniques, including those applied to polynucleotides, cells, or organisms. The variants can contain nucleotide substitutions, deletions, inversions and insertions. Variation can
occur in either or both the coding and non-coding regions. The variations can produce both conservative and non-conservative amino acid substitutions (as compared in the encoded product). For nucleotide sequences, conservative variants include those sequences that, because ofthe degeneracy ofthe genetic code, encode the amino acid sequence of one ofthe EPM marker polypeptides ofthe invention. Variant nucleotide sequences also include synthetically derived nucleotide sequences, such as those generated, for example, by using site-directed mutagenesis but which still encode an EPM marker polypeptide ofthe invention. Generally, variants of a particular nucleotide sequence ofthe invention will have at least about 30%, 40% 50%, 55%, 60%, 65%, 70%, generally at least about 75%, 80%, 85%, desirably about 90% to 95% or more, and more suitably about 98% or more sequence identity to that particular nucleotide sequence as determined by sequence alignment programs described elsewhere herein using default parameters. [0127] The EPM marker nucleotide sequences ofthe invention can be used to isolate corresponding sequences and alleles from other organisms, particularly other mammals, especially other equine species. Methods are readily available in the art for the hybridization of nucleic acid sequences. Coding sequences from other organisms may be isolated according to well known techniques based on their sequence identity with the coding sequences set forth herein. In these techniques all or part ofthe known coding sequence is used as a probe which selectively hybridizes to other EPM marker coding sequences present in a population of cloned genomic DNA fragments or cDNA fragments (i.e., genomic or cDNA libraries) from a chosen organism. Accordingly, the present invention also contemplates polynucleotides that hybridize to the EPM marker gene nucleotide sequences, or to their complements, under stringency conditions described below. 
As used herein, the term "hybridizes under low stringency, medium stringency, high stringency, or very high stringency conditions" describes conditions for hybridization and washing. Guidance for performing hybridization reactions can be found in Ausubel et al. (1998, supra), Sections 6.3.1-6.3.6. Aqueous and non-aqueous methods are described in that reference and either can be used. Reference herein to low stringency conditions includes and encompasses from at least about 1% v/v to at least about 15% v/v formamide and from at least about 1 M to at least about 2 M salt for hybridization at 42° C, and at least about 1 M to at least about 2 M salt for washing at 42° C. Low stringency conditions also may include 1% Bovine Serum Albumin (BSA), 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65° C, and (i) 2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 5% SDS for washing at room temperature. One embodiment of low stringency conditions includes hybridization in 6 x sodium chloride/sodium citrate (SSC) at about 45° C, followed by two washes in 0.2 x SSC, 0.1% SDS at least at 50° C (the temperature of the washes can be increased to 55° C for low stringency conditions). Medium stringency conditions include and encompass from at least about 16% v/v to at least about 30% v/v
formamide and from at least about 0.5 M to at least about 0.9 M salt for hybridization at 42° C, and at least about 0.1 M to at least about 0.2 M salt for washing at 55° C. Medium stringency conditions also may include 1% Bovine Serum Albumin (BSA), 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65° C, and (i) 2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 5% SDS for washing at 60-65° C. One embodiment of medium stringency conditions includes hybridizing in 6 x SSC at about 45° C, followed by one or more washes in 0.2 x SSC, 0.1% SDS at 60° C. High stringency conditions include and encompass from at least about 31% v/v to at least about 50% v/v formamide and from about 0.01 M to about 0.15 M salt for hybridization at 42° C, and about 0.01 M to about 0.02 M salt for washing at 55° C. High stringency conditions also may include 1% BSA, 1 mM EDTA, 0.5 M NaHPO4 (pH 7.2), 7% SDS for hybridization at 65° C, and (i) 0.2 x SSC, 0.1% SDS; or (ii) 0.5% BSA, 1 mM EDTA, 40 mM NaHPO4 (pH 7.2), 1% SDS for washing at a temperature in excess of 65° C. One embodiment of high stringency conditions includes hybridizing in 6 x SSC at about 45° C, followed by one or more washes in 0.2 x SSC, 0.1% SDS at 65° C. [0128] In certain embodiments, an antigen-binding molecule ofthe invention is encoded by a polynucleotide that hybridizes to a disclosed nucleotide sequence under very high stringency conditions. One embodiment of very high stringency conditions includes hybridizing 0.5 M sodium phosphate, 7% SDS at 65° C, followed by one or more washes at 0.2 x SSC, 1% SDS at 65° C. [0129] Other stringency conditions are well known in the art and a skilled addressee will recognize that various factors can be manipulated to optimize the specificity ofthe hybridization. Optimization ofthe stringency ofthe final washes can serve to ensure a high degree of hybridization. 
For detailed examples, see Ausubel et al., supra at pages 2.10.1 to 2.10.16 and Sambrook et al. (1989, supra) at sections 1.101 to 1.104. [0130] While stringent washes are typically carried out at temperatures from about 42° C to 68° C, one skilled in the art will appreciate that other temperatures may be suitable for stringent conditions. Maximum hybridization rate typically occurs at about 20° C to 25° C below the Tm for formation of a DNA-DNA hybrid. It is well known in the art that the Tm is the melting temperature, or temperature at which two complementary polynucleotide sequences dissociate. Methods for estimating Tm are well known in the art (see Ausubel et al., supra at page 2.10.8). In general, the Tm of a perfectly matched duplex of DNA may be predicted as an approximation by the formula:
[0131] $T_m = 81.5 + 16.6\,\log_{10} M + 0.41\,(\%\,\mathrm{G{+}C}) - 0.63\,(\%\,\mathrm{formamide}) - 600/\mathrm{length}$
[0132] wherein: M is the concentration of Na+, preferably in the range of 0.01 molar to 0.4 molar; %G+C is the sum of guanosine and cytosine bases as a percentage of the total number of bases, within the range between 30% and 75% G+C; % formamide is the percent formamide concentration by volume; length is the number of base pairs in the DNA duplex. The Tm of a duplex DNA decreases by approximately 1° C with every increase of 1% in the number of randomly mismatched base pairs. Washing is generally carried out at Tm - 15° C for high stringency, or Tm - 30° C for moderate stringency. [0133] In one example of a hybridization procedure, a membrane (e.g., a nitrocellulose membrane or a nylon membrane) containing immobilized DNA is hybridized overnight at 42° C in a hybridization buffer (50% deionised formamide, 5 x SSC, 5 x Denhardt's solution (0.1% ficoll, 0.1% polyvinylpyrrolidone and 0.1% bovine serum albumin), 0.1% SDS and 200 mg/mL denatured salmon sperm DNA) containing labeled probe. The membrane is then subjected to two sequential medium stringency washes (i.e., 2 x SSC, 0.1% SDS for 15 min at 45° C, followed by 2 x SSC, 0.1% SDS for 15 min at 50° C), followed by two sequential higher stringency washes (i.e., 0.2 x SSC, 0.1% SDS for 12 min at 55° C, followed by 0.2 x SSC, 0.1% SDS for 12 min at 65-68° C).
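By way of illustration only, the Tm approximation of paragraph [0131] and the wash-temperature guidance of paragraph [0132] can be sketched in Python as follows. The function names (estimate_tm, adjust_for_mismatch, wash_temperature) are hypothetical and are not part of the disclosure; the sketch assumes the inputs fall within the ranges stated above (Na+ of 0.01 M to 0.4 M, G+C of 30% to 75%) and does not enforce them.

```python
import math


def estimate_tm(na_molar, gc_percent, formamide_percent, length_bp):
    """Approximate Tm (deg C) of a perfectly matched DNA duplex.

    Implements: 81.5 + 16.6*log10(M) + 0.41*(%G+C)
                - 0.63*(% formamide) - 600/length
    """
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length_bp)


def adjust_for_mismatch(tm, mismatch_percent):
    """Lower Tm by about 1 deg C per 1% randomly mismatched base pairs."""
    return tm - 1.0 * mismatch_percent


def wash_temperature(tm, stringency="high"):
    """Wash at Tm - 15 deg C (high) or Tm - 30 deg C (moderate)."""
    offsets = {"high": 15.0, "moderate": 30.0}
    return tm - offsets[stringency]


if __name__ == "__main__":
    # Hypothetical duplex: 100 bp, 50% G+C, 0.1 M Na+, no formamide.
    tm = estimate_tm(0.1, 50.0, 0.0, 100)
    print(f"Estimated Tm: {tm:.1f} C")  # approximately 79.4
    print(f"High stringency wash: {wash_temperature(tm, 'high'):.1f} C")
    print(f"Moderate stringency wash: {wash_temperature(tm, 'moderate'):.1f} C")
```

The values used in the example are illustrative inputs chosen for the arithmetic, not experimental conditions taken from the specification.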
5. Polypeptides of the invention [0134] The present invention also contemplates full-length polypeptides encoded by the EPM marker genes of the invention as well as the biologically active portions of those polypeptides, which are referred to collectively herein as "EPM marker polypeptides."
Biologically active portions of full-length EPM marker polypeptides include portions with immuno-interactive activity of at least about 6, 8, 10, 12, 14, 16, 18, 20, 25, 30, 40, 50, 60 amino acid residues in length. For example, immuno-interactive fragments contemplated by the present invention are at least 6 and desirably at least 8 amino acid residues in length, which can elicit an immune response in an animal for the production of antigen-binding molecules that are immuno-interactive with an EPM marker polypeptide of the invention. Such antigen-binding molecules can be used to screen other mammals, especially equine mammals, for structurally and/or functionally related EPM marker polypeptides. Typically, portions of a full-length EPM marker polypeptide may participate in an interaction, for example, an intramolecular or an inter-molecular interaction. An inter-molecular interaction can be a specific binding interaction or an enzymatic interaction (e.g., the interaction can be transient and a covalent bond is formed or broken). Biologically active portions of a full-length EPM marker polypeptide include peptides comprising amino acid sequences sufficiently similar to or derived from the amino acid sequences of a (putative) full-length EPM marker polypeptide, for example, the amino acid sequences shown in SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58, which include fewer amino acids than a full-length EPM marker polypeptide, and exhibit at least
one activity of that polypeptide. Typically, biologically active portions comprise a domain or motif with at least one activity of a full-length EPM marker polypeptide. A biologically active portion of a full-length EPM marker polypeptide can be a polypeptide which is, for example, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150, 300, 400, 500, 600, 700, 800, 900 or 1000, or even at least about 2000 or 3000, or more amino acid residues in length. Suitably, the portion is a "biologically-active portion" having no less than about 1%, 10%, 25% or 50% of the activity of the full-length polypeptide from which it is derived. [0135] The present invention also contemplates variant EPM marker polypeptides. "Variant" polypeptides include proteins derived from the native protein by deletion (so-called truncation) or addition of one or more amino acids to the N-terminal and/or C-terminal end of the native protein; deletion or addition of one or more amino acids at one or more sites in the native protein; or substitution of one or more amino acids at one or more sites in the native protein. Variant proteins encompassed by the present invention are biologically active, that is, they continue to possess the desired biological activity of the native protein. Such variants may result from, for example, genetic polymorphism or from human manipulation. Biologically active variants of a native EPM marker polypeptide of the invention will have at least 40%, 50%, 60%, 70%, generally at least 75%, 80%, 85%, preferably about 90% to 95% or more, and more preferably about 98% or more sequence similarity with the amino acid sequence for the native protein as determined by sequence alignment programs described elsewhere herein using default parameters. 
A biologically active variant of a protein of the invention may differ from that protein generally by as much as 1000, 500, 400, 300, 200, 100, 50 or 20 amino acid residues or suitably by as few as 1-15 amino acid residues, as few as 1-10, such as 6-10, as few as 5, as few as 4, 3, 2, or even 1 amino acid residue. [0136] An EPM marker polypeptide of the invention may be altered in various ways including amino acid substitutions, deletions, truncations, and insertions. Methods for such manipulations are generally known in the art. For example, amino acid sequence variants of an EPM marker protein can be prepared by mutations in the DNA. Methods for mutagenesis and nucleotide sequence alterations are well known in the art. See, for example, Kunkel (1985, Proc. Natl. Acad. Sci. USA 82:488-492), Kunkel et al. (1987, Methods in Enzymol. 154:367-382), U.S. Pat. No. 4,873,192, Watson, J. D. et al. ("Molecular Biology of the Gene", Fourth Edition, Benjamin/Cummings, Menlo Park, Calif., 1987) and the references cited therein. Guidance as to appropriate amino acid substitutions that do not affect biological activity of the protein of interest may be found in the model of Dayhoff et al. (1978) Atlas of Protein Sequence and Structure (Natl. Biomed. Res. Found., Washington, D.C.). Methods for screening gene products of combinatorial libraries made by point mutations or truncation, and for screening cDNA libraries for gene products having a selected property are known in the art. Such methods
are adaptable for rapid screening of the gene libraries generated by combinatorial mutagenesis of EPM marker polypeptides. Recursive ensemble mutagenesis (REM), a technique which enhances the frequency of functional mutants in the libraries, can be used in combination with the screening assays to identify EPM marker polypeptide variants (Arkin and Youvan (1992) Proc. Natl. Acad. Sci. USA 89:7811-7815; Delagrave et al. (1993) Protein Engineering 6:327-331). Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be desirable as discussed in more detail below. [0137] Variant EPM marker polypeptides may contain conservative amino acid substitutions at various locations along their sequence, as compared to the parent EPM marker amino acid sequence. A "conservative amino acid substitution" is one in which the amino acid residue is replaced with an amino acid residue having a similar side chain. Families of amino acid residues having similar side chains have been defined in the art, which can be generally sub-classified as follows: [0138] Acidic: The residue has a negative charge due to loss of H ion at physiological pH and the residue is attracted by aqueous solution so as to seek the surface positions in the conformation of a peptide in which it is contained when the peptide is in aqueous medium at physiological pH. Amino acids having an acidic side chain include glutamic acid and aspartic acid. [0139] Basic: The residue has a positive charge due to association with H ion at physiological pH or within one or two pH units thereof (e.g., histidine) and the residue is attracted by aqueous solution so as to seek the surface positions in the conformation of a peptide in which it is contained when the peptide is in aqueous medium at physiological pH. Amino acids having a basic side chain include arginine, lysine and histidine. 
[0140] Charged: The residues are charged at physiological pH and, therefore, include amino acids having acidic or basic side chains (i.e., glutamic acid, aspartic acid, arginine, lysine and histidine). [0141] Hydrophobic: The residues are not charged at physiological pH and the residue is repelled by aqueous solution so as to seek the inner positions in the conformation of a peptide in which it is contained when the peptide is in aqueous medium. Amino acids having a hydrophobic side chain include tyrosine, valine, isoleucine, leucine, methionine, phenylalanine and tryptophan. [0142] Neutral/polar: The residues are not charged at physiological pH, but the residue is not sufficiently repelled by aqueous solutions so that it would seek inner positions in the conformation of a peptide in which it is contained when the peptide is in aqueous medium.
Amino acids having a neutral/polar side chain include asparagine, glutamine, cysteine, histidine, serine and threonine. [0143] This description also characterizes certain amino acids as "small" since their side chains are not sufficiently large, even if polar groups are lacking, to confer hydrophobicity. With the exception of proline, "small" amino acids are those with four carbons or less when at least one polar group is on the side chain and three carbons or less when not. Amino acids having a small side chain include glycine, serine, alanine and threonine. The gene-encoded secondary amino acid proline is a special case due to its known effects on the secondary conformation of peptide chains. The structure of proline differs from all the other naturally-occurring amino acids in that its side chain is bonded to the nitrogen of the α-amino group, as well as the α-carbon. Several amino acid similarity matrices (e.g., PAM120 matrix and PAM250 matrix as disclosed for example by Dayhoff et al. (1978) A model of evolutionary change in proteins. Matrices for determining distance relationships. In M. O. Dayhoff (ed.), Atlas of protein sequence and structure, Vol. 5, pp. 345-358, National Biomedical Research Foundation, Washington DC; and by Gonnet et al., 1992, Science 256(5062): 1443-1445), however, include proline in the same group as glycine, serine, alanine and threonine. Accordingly, for the purposes of the present invention, proline is classified as a "small" amino acid. [0144] The degree of attraction or repulsion required for classification as polar or nonpolar is arbitrary and, therefore, amino acids specifically contemplated by the invention have been classified as one or the other. Most amino acids not specifically named can be classified on the basis of known behavior. 
[0145] Amino acid residues can be further sub-classified as cyclic or noncyclic, and aromatic or nonaromatic, self-explanatory classifications with respect to the side-chain substituent groups of the residues, and as small or large. The residue is considered small if it contains a total of four carbon atoms or less, inclusive of the carboxyl carbon, provided an additional polar substituent is present; three or less if not. Small residues are, of course, always nonaromatic. Dependent on their structural properties, amino acid residues may fall in two or more classes. For the naturally-occurring protein amino acids, sub-classification according to this scheme is presented in Table 3. [0146] Conservative amino acid substitution also includes groupings based on side chains. For example, a group of amino acids having aliphatic side chains is glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains is serine and threonine; a group of amino acids having amide-containing side chains is asparagine and glutamine; a group of amino acids having aromatic side chains is phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains is lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains is cysteine and
methionine. For example, it is reasonable to expect that replacement of a leucine with an isoleucine or valine, an aspartate with a glutamate, a threonine with a serine, or a similar replacement of an amino acid with a structurally related amino acid will not have a major effect on the properties ofthe resulting variant polypeptide. Whether an amino acid change results in a functional EPM marker polypeptide can readily be determined by assaying its activity. Conservative substitutions are shown in Table 4 below under the heading of exemplary substitutions. More preferred substitutions are shown under the heading of preferred substitutions. Amino acid substitutions falling within the scope ofthe invention, are, in general, accomplished by selecting substitutions that do not differ significantly in their effect on maintaining (a) the structure ofthe peptide backbone in the area ofthe substitution, (b) the charge or hydrophobicity ofthe molecule at the target site, or (c) the bulk ofthe side chain. After the substitutions are introduced, the variants are screened for biological activity. [0147] Alternatively, similar amino acids for making conservative substitutions can be grouped into three categories based on the identity ofthe side chains. The first group includes glutamic acid, aspartic acid, arginine, lysine, histidine, which all have charged side chains; the second group includes glycine, serine, threonine, cysteine, tyrosine, glutamine, asparagine; and the third group includes leucine, isoleucine, valine, alanine, proline, phenylalanine, tryptophan, methionine, as described in Zubay, G., Biochemistry, third edition, Wm.C. Brown Publishers (1993). [0148] Thus, a predicted non-essential amino acid residue in an EPM marker polypeptide is typically replaced with another amino acid residue from the same side chain family. 
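As a non-limiting sketch, the side-chain groupings recited in paragraph [0146] can be encoded as a lookup that tests whether a proposed replacement stays within one side-chain family. The names SIDE_CHAIN_GROUPS, shared_groups and is_conservative are hypothetical. The "acidic" family is an assumption added here from the Acidic class of paragraph [0138], so that the aspartate-to-glutamate replacement mentioned in paragraph [0146] is covered; residues outside every listed group (e.g., proline) are simply reported as non-conservative by this sketch.

```python
# One-letter codes grouped by side-chain family, per paragraph [0146].
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulfur-containing": set("CM"),
    # Assumption: acidic family taken from paragraph [0138].
    "acidic": set("DE"),
}


def shared_groups(residue_a, residue_b):
    """Return the names of all families containing both residues."""
    a, b = residue_a.upper(), residue_b.upper()
    return [name for name, members in SIDE_CHAIN_GROUPS.items()
            if a in members and b in members]


def is_conservative(original, replacement):
    """True if the replacement differs from the original residue
    but belongs to the same side-chain family."""
    if original.upper() == replacement.upper():
        return False  # no substitution took place
    return bool(shared_groups(original, replacement))


if __name__ == "__main__":
    for pair in [("L", "I"), ("D", "E"), ("T", "S"), ("L", "D")]:
        print(pair, is_conservative(*pair))
```

Such a check only flags candidate substitutions; as stated above, whether a change yields a functional EPM marker polypeptide is determined by assaying its activity.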
Alternatively, mutations can be introduced randomly along all or part of an EPM marker gene coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for an activity ofthe parent polypeptide to identify mutants which retain that activity. Following mutagenesis ofthe coding sequences, the encoded peptide can be expressed recombinantly and the activity ofthe peptide can be determined. [0149] Accordingly, the present invention also contemplates variants ofthe naturally-occurring EPM marker polypeptide sequences or their biologically-active fragments, wherein the variants are distinguished from the naturally-occurring sequence by the addition, deletion, or substitution of one or more amino acid residues. In general, variants will display at least about 30, 40, 50, 55, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 % similarity to a parent EPM marker polypeptide sequence as, for example, set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58. Desirably, variants will have at least 30, 40, 50, 55, 60, 65, 70, 75, 80, 85, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 % sequence identity to a parent EPM marker polypeptide sequence as, for example, set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58. Moreover,
sequences differing from the native or parent sequences by the addition, deletion, or substitution of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 300, 500 or more amino acids but which retain the properties of the parent EPM marker polypeptide are contemplated. EPM marker polypeptides also include polypeptides that are encoded by polynucleotides that hybridize under stringency conditions as defined herein, especially high stringency conditions, to the EPM marker polynucleotide sequences of the invention, or the non-coding strand thereof, as described above. [0150] In one embodiment, variant polypeptides differ from an EPM marker sequence by at least one but by less than 50, 40, 30, 20, 15, 10, 8, 6, 5, 4, 3 or 2 amino acid residue(s). In another, variant polypeptides differ from the corresponding sequence in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58 by at least 1% but less than 20%, 15%, 10% or 5% of the residues. (If this comparison requires alignment, the sequences should be aligned for maximum similarity. "Looped" out sequences from deletions or insertions, or mismatches, are considered differences.) The differences are, suitably, differences or changes at a non-essential residue or a conservative substitution. [0151] A "non-essential" amino acid residue is a residue that can be altered from the wild-type sequence of an embodiment polypeptide without abolishing or substantially altering one or more of its activities. Suitably, the alteration does not substantially alter one of these activities, for example, the activity is at least 20%, 40%, 60%, 70% or 80% of wild-type. An "essential" amino acid residue is a residue that, when altered from the wild-type sequence of an EPM marker polypeptide of the invention, results in abolition of an activity of the parent molecule such that less than 20% of the wild-type activity is present. 
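The residue-difference comparison of paragraph [0150] can be illustrated with the following sketch, which operates on two sequences that have already been aligned (for example, by an alignment program of the kind referred to elsewhere herein). The function names are hypothetical, the gap character "-" is an assumed convention, and expressing the result as a percentage of the parent residues is one possible reading of "% of the residues". Consistent with paragraph [0150], each gapped ("looped out") position and each mismatch is counted as one difference.

```python
def count_differences(aligned_variant, aligned_parent, gap="-"):
    """Count mismatches and gapped positions between two aligned sequences."""
    if len(aligned_variant) != len(aligned_parent):
        raise ValueError("aligned sequences must have equal length")
    differences = 0
    for v, p in zip(aligned_variant, aligned_parent):
        if v == gap and p == gap:
            continue  # column carries no residue in either sequence
        if v != p:
            differences += 1  # mismatch, insertion or deletion
    return differences


def percent_difference(aligned_variant, aligned_parent, gap="-"):
    """Differences as a percentage of the residues in the parent sequence."""
    parent_length = sum(1 for p in aligned_parent if p != gap)
    if parent_length == 0:
        raise ValueError("parent sequence contains no residues")
    diffs = count_differences(aligned_variant, aligned_parent, gap)
    return 100.0 * diffs / parent_length


if __name__ == "__main__":
    # Hypothetical 10-residue parent and a variant with one substitution.
    parent = "MKTAYIAKQR"
    variant = "MKTAYLAKQR"
    pct = percent_difference(variant, parent)
    print(count_differences(variant, parent), pct)
    # Range test from paragraph [0150]: at least 1% but less than 20%.
    print(1.0 <= pct < 20.0)
```

The sequences in the example are invented for illustration and do not correspond to any SEQ ID NO of the disclosure.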
[0152] In other embodiments, a variant polypeptide includes an amino acid sequence having at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94% 95%, 96%, 97%, 98% or more similarity to a corresponding sequence of an EPM marker polypeptide as, for example, set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58, and has the activity of that EPM marker polypeptide. [0153] EPM marker polypeptides ofthe invention may be prepared by any suitable procedure known to those of skill in the art. For example, the polypeptides may be prepared by a procedure including the steps of: (a) preparing a chimeric construct comprising a nucleotide sequence that encodes at least a portion of an EPM marker polynucleotide and that is operably linked to a regulatory element; (b) introducing the chimeric construct into a host cell; (c) culturing the host cell to express the EPM marker polypeptide; and (d) isolating the EPM marker polypeptide from the host cell. In illustrative examples, the nucleotide sequence encodes at least a portion ofthe sequence set forth in any one of SEQ ID NO: 2, 4, 6, 9, 11, 13, 15, 19, 21, 23, 25, 29, 31, 33, 51, 53 or 58 or a variant thereof.
[0154] The chimeric construct is typically in the form of an expression vector, which is suitably selected from self-replicating extra-chromosomal vectors (e.g., plasmids) and vectors that integrate into a host genome. [0155] The regulatory element will generally be appropriate for the host cell employed for expression of the EPM marker polynucleotide. Numerous types of expression vectors and regulatory elements are known in the art for a variety of host cells. Illustrative elements of this type include, but are not restricted to, promoter sequences (e.g., constitutive or inducible promoters which may be naturally occurring or combine elements of more than one promoter), leader or signal sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and termination sequences, and enhancer or activator sequences. [0156] In some embodiments, the expression vector comprises a selectable marker gene to permit the selection of transformed host cells. Selectable marker genes are well known in the art and will vary with the host cell employed. [0157] The expression vector may also include a fusion partner (typically provided by the expression vector) so that the EPM marker polypeptide is produced as a fusion polypeptide with the fusion partner. The main advantage of fusion partners is that they assist identification and/or purification of the fusion polypeptide. In order to produce the fusion polypeptide, it is necessary to ligate the EPM marker polynucleotide into an expression vector so that the translational reading frames of the fusion partner and the EPM marker polynucleotide coincide. Well known examples of fusion partners include, but are not limited to, glutathione-S-transferase (GST), Fc portion of human IgG, maltose binding protein (MBP) and hexahistidine (HIS6), which are particularly useful for isolation of the fusion polypeptide by affinity chromatography. 
In some embodiments, fusion polypeptides are purified by affinity chromatography using matrices to which the fusion partners bind such as but not limited to glutathione-, amylose-, and nickel- or cobalt-conjugated resins. Many such matrices are available in "kit" form, such as the QIAexpress™ system (Qiagen) useful with (HIS6) fusion partners and the Pharmacia GST purification system. Other fusion partners known in the art are light-emitting proteins such as green fluorescent protein (GFP) and luciferase, which serve as fluorescent "tags" that permit the identification and/or isolation of fusion polypeptides by fluorescence microscopy or by flow cytometry. Flow cytometric methods such as fluorescence activated cell sorting (FACS) are particularly useful in this latter application. [0158] Desirably, the fusion partners also possess protease cleavage sites, such as for Factor Xa or Thrombin, which permit the relevant protease to partially digest the fusion polypeptide and thereby liberate the EPM marker polypeptide from the fusion construct. The liberated polypeptide can then be isolated from the fusion partner by subsequent chromatographic separation.
[0159] Fusion partners also include within their scope "epitope tags," which are usually short peptide sequences for which a specific antibody is available. Well known examples of epitope tags for which specific monoclonal antibodies are readily available include c-Myc, influenza virus hemagglutinin and FLAG tags. [0160] The chimeric constructs of the invention are introduced into a host by any suitable means including "transduction" and "transfection", which are art recognized as meaning the introduction of a nucleic acid, for example, an expression vector, into a recipient cell by nucleic acid-mediated gene transfer. "Transformation," however, refers to a process in which a host's genotype is changed as a result of the cellular uptake of exogenous DNA or RNA, and, for example, the transformed cell comprises the expression system of the invention. There are many methods for introducing chimeric constructs into cells. Typically, the method employed will depend on the choice of host cell. Technology for introduction of chimeric constructs into host cells is well known to those of skill in the art. Four general classes of methods for delivering nucleic acid molecules into cells have been described: (1) chemical methods such as calcium phosphate precipitation, polyethylene glycol (PEG)-mediated precipitation and lipofection; (2) physical methods such as microinjection, electroporation, acceleration methods and vacuum infiltration; (3) vector based methods such as bacterial and viral vector-mediated transformation; and (4) receptor-mediated. Transformation techniques that fall within these and other classes are well known to workers in the art, and new techniques are continually becoming known. The particular choice of a transformation technology will be determined by its efficiency to transform certain host species as well as the experience and preference of the person practicing the invention with a particular methodology of choice. 
It will be apparent to the skilled person that the particular choice of a transformation system to introduce a chimeric construct into cells is not essential to or a limitation of the invention, provided it achieves an acceptable level of nucleic acid transfer. [0161] Recombinant EPM marker polypeptides may be produced by culturing a host cell transformed with a chimeric construct. The conditions appropriate for expression of the EPM marker polynucleotide will vary with the choice of expression vector and the host cell and are easily ascertained by one skilled in the art through routine experimentation. Suitable host cells for expression may be prokaryotic or eukaryotic. An illustrative host cell for expression of a polypeptide of the invention is a bacterium. The bacterium used may be Escherichia coli. Alternatively, the host cell may be a yeast cell or an insect cell such as, for example, SF9 cells that may be utilized with a baculovirus expression system. [0162] Recombinant EPM marker polypeptides can be conveniently prepared using standard protocols as described for example in Sambrook, et al, (1989, supra), in particular
Sections 16 and 17; Ausubel et al, (1994, supra), in particular Chapters 10 and 16; and Coligan
et al, CURRENT PROTOCOLS IN PROTEIN SCIENCE (John Wiley & Sons, Inc. 1995-1997), in particular Chapters 1, 5 and 6. Alternatively, the EPM marker polypeptides may be synthesized by chemical synthesis, e.g., using solution synthesis or solid phase synthesis as described, for example, in Chapter 9 of Atherton and Shephard (supra) and in Roberge et al (1995, Science 269: 202).
6. Antigen-binding molecules [0163] The invention also provides antigen-binding molecules that are specifically immuno-interactive with an EPM marker polypeptide of the invention. In one embodiment, the antigen-binding molecule comprises whole polyclonal antibodies. Such antibodies may be prepared, for example, by injecting an EPM marker polypeptide of the invention into a production species, which may include mice or rabbits, to obtain polyclonal antisera. Methods of producing polyclonal antibodies are well known to those skilled in the art. Exemplary protocols which may be used are described for example in Coligan et al, CURRENT PROTOCOLS IN IMMUNOLOGY, (John Wiley & Sons, Inc, 1991), and Ausubel et al, (1994-1998, supra), in particular Section III of Chapter 11. [0164] In lieu of polyclonal antisera obtained in a production species, monoclonal antibodies may be produced using the standard method as described, for example, by Köhler and Milstein (1975, Nature 256, 495-497), or by more recent modifications thereof as described, for example, in Coligan et al, (1991, supra) by immortalizing spleen or other antibody-producing cells derived from a production species which has been inoculated with one or more of the EPM marker polypeptides of the invention. [0165] The invention also contemplates as antigen-binding molecules Fv, Fab, Fab' and F(ab')2 immunoglobulin fragments. Alternatively, the antigen-binding molecule may comprise a synthetic stabilized Fv fragment. Exemplary fragments of this type include single chain Fv fragments (sFv, frequently termed scFv) in which a peptide linker is used to bridge the N-terminus or C-terminus of a VH domain with the C-terminus or N-terminus, respectively, of a VL domain. ScFvs lack all constant parts of whole antibodies and are not able to activate complement. ScFvs may be prepared, for example, in accordance with methods outlined in Krebber et al (1997, J. Immunol. Methods 201(1): 35-55).
Alternatively, they may be prepared by methods described in U.S. Patent No 5,091,513, European Patent No 239,400 or the articles by Winter and Milstein (1991, Nature 349:293) and Plückthun et al (1996, In Antibody engineering: A practical approach. 203-252). In another embodiment, the synthetic stabilized Fv fragment comprises a disulfide stabilized Fv (dsFv) in which cysteine residues are introduced into the VH and VL domains such that in the fully folded Fv molecule the two residues will form a disulfide bond between them. Suitable methods of producing dsFv are described for example in (Glockshuber et al. Biochem. 29: 1363-1367; Reiter et al. 1994, J. Biol. Chem. 269: 18327-18331; Reiter et al. 1994, Biochem. 33: 5451-5459; Reiter et al. 1994,
Cancer Res. 54: 2714-2718; Webber et al. 1995, Mol. Immunol 32: 249-258). [0166] Phage display and combinatorial methods for generating anti-EPM marker polypeptide antigen-binding molecules are known in the art (as described in, e.g., Ladner et al. U.S. Patent No. 5,223,409; Kang et al. International Publication No. WO 92/18619; Dower et al. International Publication No. WO 91/17271; Winter et al. International Publication WO 92/20791; Markland et al. International Publication No. WO 92/15679; Breitling et al. International Publication WO 93/01288; McCafferty et al. International Publication No. WO 92/01047; Garrard et al. International Publication No. WO 92/09690; Ladner et al. International Publication No. WO 90/02809; Fuchs et al. (1991) Bio/Technology 9: 1370-1372; Hay et al. (1992) Hum Antibod Hybridomas 3:81-85; Huse et al. (1989) Science 246:1275-1281; Griffiths et al. (1993) EMBO J 12:725-734; Hawkins et al. (1992) J Mol Biol 226:889-896; Clackson et al (1991) Nature 352:624-628; Gram et al. (1992) PNAS 89:3576-3580; Garrard et al. (1991) Bio/Technology 9:1373-1377; Hoogenboom et al. (1991) Nuc Acid Res 19:4133-4137; and Barbas et al. (1991) PNAS 88:1918-1982). The antigen-binding molecules can be used to screen expression libraries for variant EPM marker polypeptides. They can also be used to detect and/or isolate the EPM marker polypeptides of the invention. Thus, the invention also contemplates the use of antigen-binding molecules to isolate EPM marker polypeptides using, for example, any suitable immunoaffinity-based method including, but not limited to, immunochromatography and immunoprecipitation. A suitable method utilises solid phase adsorption in which anti-EPM marker polypeptide antigen-binding molecules are attached to a suitable resin, the resin is contacted with a sample suspected of containing an EPM marker polypeptide, and the EPM marker polypeptide, if any, is subsequently eluted from the resin.
Illustrative resins include: Sepharose® (Pharmacia), Poros® resins (Roche Molecular Biochemicals, Indianapolis), Actigel Superflow™ resins (Sterogene Bioseparations Inc., Carlsbad Calif.), and Dynabeads™ (Dynal Inc., Lake Success, N.Y.). [0167] The antigen-binding molecule can be coupled to a compound, e.g., a label such as a radioactive nucleus, or an imaging agent, e.g., a radioactive, enzymatic, or other imaging agent such as an NMR contrast agent. Labels which produce detectable radioactive emissions or fluorescence are preferred. An anti-EPM marker polypeptide antigen-binding molecule (e.g., monoclonal antibody) can be used to detect EPM marker polypeptides (e.g., in a cellular lysate or cell supernatant) in order to evaluate the abundance and pattern of expression of the protein. In certain advantageous applications in accordance with the present invention, such antigen-binding molecules can be used to monitor EPM marker polypeptide levels in biological samples (including whole cells and fluids) for diagnosing the presence, absence, degree, or stage of development of EPM. Detection can be facilitated by coupling (i.e.,
physically linking) the antibody to a detectable substance (i.e., antibody labeling). Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin; and examples of suitable radioactive material include 125I, 131I, 35S or 3H. The label may be selected from a group including a chromogen, a catalyst, an enzyme, a fluorophore, a chemiluminescent molecule, a lanthanide ion such as Europium (Eu3+), a radioisotope and a direct visual label. In the case of a direct visual label, use may be made of a colloidal metallic or non-metallic particle, a dye particle, an enzyme or a substrate, an organic polymer, a latex particle, a liposome, or other vesicle containing a signal producing substance and the like. [0168] A large number of enzymes useful as labels is disclosed in United States Patent Specifications U.S. 4,366,241, U.S. 4,843,000, and U.S. 4,849,338. Enzyme labels useful in the present invention include alkaline phosphatase, horseradish peroxidase, luciferase, β-galactosidase, glucose oxidase, lysozyme, malate dehydrogenase and the like. The enzyme label may be used alone or in combination with a second enzyme in solution.
7. Methods of detecting aberrant EPM marker gene expression or the presence of EPM marker polynucleotides [0169] The present invention is predicated in part on the discovery that: [0170] (1) horses with clinical evidence of EPM have aberrant expression of certain genes (referred to herein as "EPM marker genes") whose transcripts include, but are not limited to, SEQ ID NO: 1, 3, 5, 7, 8, 10, 12, 14, 16, 17, 18, 20, 22, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 55, 56, 57 or 422, as compared to normal horses or to horses lacking EPM; [0171] Accordingly, in certain embodiments, the invention features a method for diagnosing the presence, absence, degree or stage of EPM in a subject, which is typically of equine origin, by detecting aberrant expression of an EPM diagnostic marker gene in a biological sample obtained from the subject. Accordingly, in order to make such diagnoses, it will be desirable to qualitatively or quantitatively determine the levels of EPM marker gene transcripts or the level or functional activity of EPM marker polypeptides. In some embodiments, the presence, degree, or stage of development of EPM is diagnosed when an
EPM marker gene product is expressed at a detectably lower level in the biological sample as compared to the level at which that gene is expressed in a reference sample obtained from normal subjects or from subjects lacking EPM. In other embodiments, the presence, degree, or stage of development of EPM is diagnosed when an EPM marker gene product is expressed at a detectably higher level in the biological sample as compared to the level at which that gene is expressed in a reference sample obtained from normal subjects or from subjects lacking EPM. Generally, such diagnoses are made when the level or functional activity of an EPM marker gene product in the biological sample varies from the level or functional activity of a corresponding EPM marker gene product in the reference sample by at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 92%, 94%, 96%, 97%, 98% or 99%, or even by at least about 99.5%, 99.9%, 99.95%, 99.99%, 99.995% or 99.999%, or even by at least about 100%, 200%, 300%, 400%, 500%, 600%, 700%, 800%, 900% or 1000%. The corresponding gene product is generally selected from the same gene product that is present in the biological sample, a gene product expressed from a variant gene (e.g., a homologous or orthologous gene) including an allelic variant, or a splice variant or protein product thereof. In some embodiments, the method comprises measuring the level or functional activity of individual expression products of at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 EPM marker genes. [0172] Generally, the biological sample contains blood, especially peripheral blood, or a fraction or extract thereof.
Typically, the biological sample comprises blood cells such as mature, immature and developing leukocytes, including lymphocytes, polymorphonuclear leukocytes, neutrophils, monocytes, reticulocytes, basophils, coelomocytes, hemocytes, eosinophils, megakaryocytes, macrophages, dendritic cells, natural killer cells, or a fraction of such cells (e.g., a nucleic acid or protein fraction). In specific embodiments, the biological sample comprises leukocytes including peripheral blood mononuclear cells (PBMC).
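The percentage-variation criterion of paragraph [0171] can be illustrated with a short sketch. The marker names, expression values and the 10% threshold below are hypothetical placeholders chosen only for illustration; the sketch assumes that each marker has already been measured in the test sample and in a panel of reference samples from subjects lacking EPM, and it is not intended as a validated diagnostic procedure.

```python
from statistics import mean


def percent_difference(sample_level, reference_level):
    """Percent change of the sample relative to the reference (positive = higher)."""
    if reference_level <= 0:
        raise ValueError("reference level must be positive")
    return 100.0 * (sample_level - reference_level) / reference_level


def flag_aberrant_markers(sample, reference, threshold_pct=10.0):
    """Return {marker: percent difference} for every marker whose level differs
    from the mean reference level by at least threshold_pct in either direction."""
    flagged = {}
    for marker, level in sample.items():
        reference_mean = mean(reference[marker])
        difference = percent_difference(level, reference_mean)
        if abs(difference) >= threshold_pct:
            flagged[marker] = difference
    return flagged


# Hypothetical expression levels (arbitrary units) for three illustrative markers.
sample = {"marker_A": 150.0, "marker_B": 98.0, "marker_C": 40.0}
reference = {
    "marker_A": [100.0, 110.0, 90.0],
    "marker_B": [100.0, 95.0, 105.0],
    "marker_C": [80.0, 85.0, 75.0],
}

flagged = flag_aberrant_markers(sample, reference, threshold_pct=10.0)
for marker, difference in sorted(flagged.items()):
    print(f"{marker}: {difference:+.1f}% relative to reference")
```

In this sketch both higher and lower expression are flagged, mirroring the two alternative embodiments described above; a real assay would additionally need replicate measurements and a statistical test, which are outside the scope of this illustration.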
7.1 Nucleic acid-based diagnostics [0173] Nucleic acid used in polynucleotide-based assays can be isolated from cells contained in the biological sample, according to standard methodologies (Sambrook, et al, 1989, supra; and Ausubel et al, 1994, supra). The nucleic acid is typically fractionated (e.g., poly A+ RNA) or whole cell RNA. Where RNA is used as the subject of detection, it may be desired to convert the RNA to a complementary DNA. In some embodiments, the nucleic acid is amplified by a template-dependent nucleic acid amplification technique. A number of template-dependent processes are available to amplify the EPM marker sequences present in a given template sample. An exemplary nucleic acid amplification technique is the polymerase chain reaction (referred to as PCR) which is described in detail in U.S. Pat. Nos. 4,683,195, 4,683,202 and 4,800,159, Ausubel et al. (supra), and in Innis et al, ("PCR Protocols", Academic Press,
Inc., San Diego Calif., 1990). Briefly, in PCR, two primer sequences are prepared that are complementary to regions on opposite complementary strands of the marker sequence. An excess of deoxynucleoside triphosphates is added to a reaction mixture along with a DNA polymerase, e.g., Taq polymerase. If a cognate EPM marker sequence is present in a sample, the primers will bind to the marker and the polymerase will cause the primers to be extended along the marker sequence by adding on nucleotides. By raising and lowering the temperature of the reaction mixture, the extended primers will dissociate from the marker to form reaction products, excess primers will bind to the marker and to the reaction products and the process is repeated. A reverse transcriptase PCR amplification procedure may be performed in order to quantify the amount of mRNA amplified. Methods of reverse transcribing RNA into cDNA are well known and described in Sambrook et al, 1989, supra. Alternative methods for reverse transcription utilize thermostable, RNA-dependent DNA polymerases. These methods are described in WO 90/07641. Polymerase chain reaction methodologies are well known in the art. [0174] In certain advantageous embodiments, the template-dependent amplification involves the quantification of transcripts in real-time. For example, RNA or DNA may be quantified using the Real-Time PCR technique (Higuchi et al, 1992, Biotechnology 10: 413-417). By determining the concentration of the amplified products of the target DNA in PCR reactions that have completed the same number of cycles and are in their linear ranges, it is possible to determine the relative concentrations of the specific target sequence in the original DNA mixture. If the DNA mixtures are cDNAs synthesized from RNAs isolated from different tissues or cells, the relative abundance of the specific mRNA from which the target sequence was derived can be determined for the respective tissues or cells.
This direct proportionality between the concentration of the PCR products and the relative mRNA abundance is only true in the linear range of the PCR reaction. The final concentration of the target DNA in the plateau portion of the curve is determined by the availability of reagents in the reaction mix and is independent of the original concentration of target DNA. [0175] Another method for amplification is the ligase chain reaction ("LCR"), disclosed in EPO No. 320 308. In LCR, two complementary probe pairs are prepared, and in the presence of the target sequence, each pair will bind to opposite complementary strands of the target such that they abut. In the presence of a ligase, the two probe pairs will link to form a single unit. By temperature cycling, as in PCR, bound ligated units dissociate from the target and then serve as "target sequences" for ligation of excess probe pairs. U.S. Pat. No. 4,883,750 describes a method similar to LCR for binding probe pairs to a target sequence. [0176] Qβ Replicase, described in PCT Application No. PCT/US87/00880, may also be used. In this method, a replicative sequence of RNA that has a region complementary to
that of a target is added to a sample in the presence of an RNA polymerase. The polymerase will copy the replicative sequence that can then be detected. [0177] An isothermal amplification method, in which restriction endonucleases and ligases are used to achieve the amplification of target molecules that contain nucleotide 5'-[α-thio]-triphosphates in one strand of a restriction site, may also be useful in the amplification of nucleic acids in the present invention (Walker et al, 1992, Proc. Natl. Acad. Sci. U.S.A 89: 392-396). [0178] Strand Displacement Amplification (SDA) is another method of carrying out isothermal amplification of nucleic acids which involves multiple rounds of strand displacement and synthesis, i.e., nick translation. A similar method, called Repair Chain Reaction (RCR), involves annealing several probes throughout a region targeted for amplification, followed by a repair reaction in which only two of the four bases are present. The other two bases can be added as biotinylated derivatives for easy detection. A similar approach is used in SDA. Target specific sequences can also be detected using a cyclic probe reaction (CPR). In CPR, a probe having 3' and 5' sequences of non-specific DNA and a middle sequence of specific RNA is hybridized to DNA that is present in a sample. Upon hybridization, the reaction is treated with RNase H, and the products of the probe are identified as distinctive products that are released after digestion. The original template is annealed to another cycling probe and the reaction is repeated. [0179] Still another amplification method described in GB Application No. 2 202 328, and in PCT Application No. PCT/US89/01025, may be used. In the former application, "modified" primers are used in a PCR-like, template- and enzyme-dependent synthesis. The primers may be modified by labeling with a capture moiety (e.g., biotin) and/or a detector moiety (e.g., enzyme).
In the latter application, an excess of labeled probes is added to a sample. In the presence of the target sequence, the probe binds and is cleaved catalytically. After cleavage, the target sequence is released intact to be bound by excess probe. Cleavage of the labeled probe signals the presence of the target sequence. [0180] Other nucleic acid amplification procedures include transcription-based amplification systems (TAS), including nucleic acid sequence based amplification (NASBA) and 3SR (Kwoh et al, 1989, Proc. Natl. Acad. Sci. U.S.A., 86: 1173; Gingeras et al, PCT Application WO 88/10315). In NASBA, the nucleic acids can be prepared for amplification by standard phenol/chloroform extraction, heat denaturation of a clinical sample, treatment with lysis buffer and minispin columns for isolation of DNA and RNA or guanidinium chloride extraction of RNA. These amplification techniques involve annealing a primer which has target specific sequences. Following polymerization, DNA/RNA hybrids are digested with RNase H while double stranded DNA molecules are heat denatured again. In either case the single stranded DNA is made fully double stranded by addition of a second target specific primer,
followed by polymerisation. The double-stranded DNA molecules are then multiply transcribed by an RNA polymerase such as T7 or SP6. In an isothermal cyclic reaction, the RNAs are reverse transcribed into single stranded DNA, which is then converted to double stranded DNA, and then transcribed once again with an RNA polymerase such as T7 or SP6. The resulting products, whether truncated or complete, indicate target specific sequences. [0181] Davey et al, EPO No. 329 822 disclose a nucleic acid amplification process involving cyclically synthesizing single-stranded RNA ("ssRNA"), ssDNA, and double-stranded DNA (dsDNA), which may be used in accordance with the present invention. The ssRNA is a template for a first primer oligonucleotide, which is elongated by reverse transcriptase (RNA-dependent DNA polymerase). The RNA is then removed from the resulting DNA:RNA duplex by the action of ribonuclease H (RNase H, an RNase specific for RNA in duplex with either DNA or RNA). The resultant ssDNA is a template for a second primer, which also includes the sequences of an RNA polymerase promoter (exemplified by T7 RNA polymerase) 5' to its homology to the template. This primer is then extended by DNA polymerase (exemplified by the large "Klenow" fragment of E. coli DNA polymerase I), resulting in a double-stranded DNA ("dsDNA") molecule, having a sequence identical to that of the original RNA between the primers and having additionally, at one end, a promoter sequence. This promoter sequence can be used by the appropriate RNA polymerase to make many RNA copies of the DNA. These copies can then re-enter the cycle leading to very swift amplification. With proper choice of enzymes, this amplification can be done isothermally without addition of enzymes at each cycle. Because of the cyclical nature of this process, the starting sequence can be chosen to be in the form of either DNA or RNA. [0182] Miller et al.
in PCT Application WO 89/06700 disclose a nucleic acid sequence amplification scheme based on the hybridization of a promoter/primer sequence to a target single-stranded DNA ("ssDNA") followed by transcription of many RNA copies of the sequence. This scheme is not cyclic, i.e., new templates are not produced from the resultant RNA transcripts. Other amplification methods include "RACE" and "one-sided PCR" (Frohman, M. A., In: "PCR Protocols: A Guide to Methods and Applications", Academic Press, N.Y., 1990; Ohara et al, 1989, Proc. Natl. Acad. Sci. U.S.A., 86: 5673-567). [0183] Methods based on ligation of two (or more) oligonucleotides in the presence of nucleic acid having the sequence of the resulting "di-oligonucleotide", thereby amplifying the di-oligonucleotide, may also be used for amplifying target nucleic acid sequences (Wu et al, 1989, Genomics 4: 560). [0184] Depending on the format, the EPM marker nucleic acid of interest is identified in the sample directly using a template-dependent amplification as described, for example, above, or with a second, known nucleic acid following amplification. Next, the
identified product is detected. In certain applications, the detection may be performed by visual means (e.g., ethidium bromide staining of a gel). Alternatively, the detection may involve indirect identification of the product via chemiluminescence, radioactive scintigraphy of radiolabel or fluorescent label or even via a system using electrical or thermal impulse signals (Affymax Technology; Bellus, 1994, J. Macromol. Sci. Pure Appl. Chem., A31(1): 1355-1376). [0185] In some embodiments, amplification products or "amplicons" are visualized in order to confirm amplification of the EPM marker sequences. One typical visualization method involves staining of a gel with ethidium bromide and visualization under UV light. Alternatively, if the amplification products are integrally labeled with radio- or fluorometrically-labelled nucleotides, the amplification products can then be exposed to x-ray film or visualized under the appropriate stimulating spectra, following separation. In some embodiments, visualization is achieved indirectly. Following separation of amplification products, a labeled nucleic acid probe is brought into contact with the amplified EPM marker sequence. The probe is suitably conjugated to a chromophore but may be radiolabeled. Alternatively, the probe is conjugated to a binding partner, such as an antigen-binding molecule, or biotin, and the other member of the binding pair carries a detectable moiety or reporter molecule. The techniques involved are well known to those of skill in the art and can be found in many standard texts on molecular protocols (e.g., see Sambrook et al, 1989, supra and Ausubel et al. 1994, supra). For example, chromophore or radiolabel probes or primers identify the target during or following amplification. [0186] In certain embodiments, target nucleic acids are quantified using blotting techniques, which are well known to those of skill in the art.
Southern blotting involves the use of DNA as a target, whereas Northern blotting involves the use of RNA as a target. Each provides different types of information, although cDNA blotting is analogous, in many aspects, to blotting of RNA species. Briefly, a probe is used to target a DNA or RNA species that has been immobilized on a suitable matrix, often a filter of nitrocellulose. The different species should be spatially separated to facilitate analysis. This often is accomplished by gel electrophoresis of nucleic acid species followed by "blotting" on to the filter. Subsequently, the blotted target is incubated with a probe (usually labeled) under conditions that promote denaturation and rehybridization. Because the probe is designed to base pair with the target, the probe will bind a portion of the target sequence under renaturing conditions. Unbound probe is then removed, and detection is accomplished as described above. [0187] Following detection and quantification, one may compare the results seen in a given subject with a control reaction or a statistically significant reference group of normal subjects or of subjects lacking EPM. In this way, it is possible to correlate the amount of an EPM marker nucleic acid detected with the progression or severity of the disease.
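The relationship described in paragraph [0174], in which product concentration reflects starting template only before the plateau, can be sketched numerically. The model below is a simplified assumption introduced for illustration (constant per-cycle efficiency and a hard plateau cap standing in for reagent exhaustion); the function names, copy numbers and plateau value are hypothetical and are not taken from the specification.

```python
def amplicon_copies(initial_copies, cycles, efficiency=1.0, plateau=None):
    """Idealized PCR yield: each cycle multiplies the product by (1 + efficiency).

    plateau, if given, caps the yield to mimic reagent exhaustion.
    """
    copies = initial_copies * (1.0 + efficiency) ** cycles
    if plateau is not None:
        copies = min(copies, plateau)
    return copies


def relative_abundance(product_a, product_b):
    """Ratio of two product amounts measured after the same number of cycles."""
    return product_a / product_b


# Hypothetical starting template amounts: sample A has four times more target than B.
start_a, start_b = 1000.0, 250.0
plateau = 1e12  # assumed maximum product the reagents can support

# Before the plateau the product ratio equals the starting ratio.
early = relative_abundance(
    amplicon_copies(start_a, 20, plateau=plateau),
    amplicon_copies(start_b, 20, plateau=plateau),
)

# At the plateau both reactions converge and the ratio is no longer informative.
late = relative_abundance(
    amplicon_copies(start_a, 40, plateau=plateau),
    amplicon_copies(start_b, 40, plateau=plateau),
)

print(f"ratio after 20 cycles: {early}")
print(f"ratio after 40 cycles: {late}")
```

The early ratio recovers the fourfold difference in starting template, whereas the late ratio collapses to one, which is why comparisons against a control reaction or reference group should use measurements taken at the same cycle number and before the plateau.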
[0188] Also contemplated are genotyping methods and allelic discrimination methods and technologies such as those described by Kristensen et al. (Biotechniques 30(2): 318-322), including the use of single nucleotide polymorphism analysis, high performance liquid chromatography, TaqMan®, liquid chromatography, and mass spectrometry. [0189] Also contemplated are biochip-based technologies such as those described by Hacia et al. (1996, Nature Genetics 14: 441-447) and Shoemaker et al. (1996, Nature Genetics 14: 450-456). Briefly, these techniques involve quantitative methods for analysing large numbers of genes rapidly and accurately. By tagging genes with oligonucleotides or using fixed probe arrays, one can employ biochip technology to segregate target molecules as high density arrays and screen these molecules on the basis of hybridization. See also Pease et al.
(1994, Proc. Natl. Acad. Sci. U.S.A. 91: 5022-5026); Fodor et al. (1991, Science 251: 767-773). Briefly, nucleic acid probes to EPM marker polynucleotides are made and attached to biochips to be used in screening and diagnostic methods, as outlined herein. The nucleic acid probes attached to the biochip are designed to be substantially complementary to specific expressed EPM marker nucleic acids, i.e., the target sequence (either the target sequence of the sample or to other probe sequences, for example in sandwich assays), such that hybridization of the target sequence and the probes of the present invention occurs. This complementarity need not be perfect; there may be any number of base pair mismatches which will interfere with hybridization between the target sequence and the nucleic acid probes of the present invention. However, if the number of mismatches is so great that no hybridization can occur under even the least stringent of hybridization conditions, the sequence is not a complementary target sequence. In certain embodiments, more than one probe per sequence is used, with either overlapping probes or probes to different sections of the target being used. That is, two, three, four or more probes, with three being desirable, are used to build in a redundancy for a particular target. The probes can be overlapping (i.e., have some sequence in common), or separate. [0190] As will be appreciated by those of ordinary skill in the art, nucleic acids can be attached to or immobilized on a solid support in a wide variety of ways. By "immobilized" and grammatical equivalents herein is meant the association or binding between the nucleic acid probe and the solid support is sufficient to be stable under the conditions of binding, washing, analysis, and removal as outlined below. The binding can be covalent or non-covalent. By "non-covalent binding" and grammatical equivalents herein is meant one or more of either electrostatic, hydrophilic, and hydrophobic interactions.
Included in non-covalent binding is the covalent attachment of a molecule, such as streptavidin, to the support and the non-covalent binding of the biotinylated probe to the streptavidin. By "covalent binding" and grammatical equivalents herein is meant that the two moieties, the solid support and the probe, are attached by at least one bond, including sigma bonds, pi bonds and coordination bonds. Covalent bonds
can be formed directly between the probe and the solid support or can be formed by a cross linker or by inclusion of a specific reactive group on either the solid support or the probe or both molecules. Immobilization may also involve a combination of covalent and non-covalent interactions. [0191] In general, the probes are attached to the biochip in a wide variety of ways, as will be appreciated by those in the art. As described herein, the nucleic acids can either be synthesized first, with subsequent attachment to the biochip, or can be directly synthesized on the biochip. [0192] The biochip comprises a suitable solid or semi-solid substrate or solid support. By "substrate" or "solid support" is meant any material that can be modified to contain discrete individual sites appropriate for the attachment or association of the nucleic acid probes and is amenable to at least one detection method. As will be appreciated by practitioners in the art, the number of possible substrates is very large, and includes, but is not limited to, glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes,
Teflon™, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses, plastics, etc. In general, the substrates allow optical detection and do not appreciably fluoresce. [0193] Generally the substrate is planar, although as will be appreciated by those of skill in the art, other configurations of substrates may be used as well. For example, the probes may be placed on the inside surface of a tube, for flow-through sample analysis to minimize sample volume. Similarly, the substrate may be flexible, such as a flexible foam, including closed cell foams made of particular plastics. [0194] In certain embodiments, oligonucleotide probes are synthesized on the substrate, as is known in the art. For example, photoactivation techniques utilizing photopolymerisation compounds and techniques can be used. In an illustrative example, the nucleic acids are synthesized in situ, using well known photolithographic techniques, such as those described in WO 95/25116; WO 95/35505; U.S. Pat. Nos. 5,700,637 and 5,445,934; and references cited within; these methods of attachment form the basis of the Affymetrix GeneChip™ technology. [0195] In an illustrative biochip analysis, oligonucleotide probes on the biochip are exposed to or contacted with a nucleic acid sample suspected of containing one or more EPM polynucleotides under conditions favoring specific hybridization. Sample extracts of DNA or
RNA, either single or double-stranded, may be prepared from fluid suspensions of biological materials, or by grinding biological materials, or following a cell lysis step which includes, but is not limited to, lysis effected by treatment with SDS (or other detergents), osmotic shock,
guanidinium isothiocyanate and lysozyme. Suitable DNA, which may be used in the method of the invention, includes cDNA. Such DNA may be prepared by any one of a number of commonly used protocols as for example described in Ausubel, et al, 1994, supra, and Sambrook, et al, 1989, supra. [0196] Suitable RNA, which may be used in the method of the invention, includes messenger RNA, complementary RNA transcribed from DNA (cRNA) or genomic or subgenomic RNA. Such RNA may be prepared using standard protocols as for example described in the relevant sections of Ausubel, et al, 1994, supra and Sambrook, et al, 1989, supra. [0197] cDNA may be fragmented, for example, by sonication or by treatment with restriction endonucleases. Suitably, cDNA is fragmented such that resultant DNA fragments are of a length greater than the length of the immobilized oligonucleotide probe(s) but small enough to allow rapid access thereto under suitable hybridization conditions. Alternatively, fragments of cDNA may be selected and amplified using a suitable nucleotide amplification technique, as described for example above, involving appropriate random or specific primers. [0198] Usually the target EPM marker polynucleotides are detectably labeled so that their hybridization to individual probes can be determined. The target polynucleotides are typically detectably labeled with a reporter molecule, illustrative examples of which include chromogens, catalysts, enzymes, fluorochromes, chemiluminescent molecules, bioluminescent molecules, lanthanide ions (e.g., Eu3+), a radioisotope and a direct visual label. In the case of a direct visual label, use may be made of a colloidal metallic or non-metallic particle, a dye particle, an enzyme or a substrate, an organic polymer, a latex particle, a liposome, or other vesicle containing a signal producing substance and the like. 
Illustrative labels of this type include large colloids, for example, metal colloids such as those from gold, selenium, silver, tin and titanium oxide. In some embodiments in which an enzyme is used as a direct visual label, biotinylated bases are incorporated into a target polynucleotide. Hybridization is detected by incubation with streptavidin-reporter molecules. [0199] Suitable fluorochromes include, but are not limited to, fluorescein isothiocyanate (FITC), tetramethylrhodamine isothiocyanate (TRITC), R-Phycoerythrin (RPE), and Texas Red. Other exemplary fluorochromes include those discussed by Dower et al.
(International Publication WO 93/06121). Reference also may be made to the fluorochromes described in U.S. Patents 5,573,909 (Singer et al) and 5,326,692 (Brinkley et al). Alternatively, reference may be made to the fluorochromes described in U.S. Patent Nos. 5,227,487, 5,274,113, 5,405,975, 5,433,896, 5,442,045, 5,451,663, 5,453,517, 5,459,276, 5,516,864, 5,648,270 and 5,723,218. Commercially available fluorescent labels include, for example,
fluorescein phosphoramidites such as Fluoreprime™ (Pharmacia), Fluoredite™ (Millipore) and FAM (Applied Biosystems International). [0200] Radioactive reporter molecules include, for example, 32P, which can be detected by X-ray or phosphoimager techniques. [0201] The hybrid-forming step can be performed under suitable conditions for hybridizing oligonucleotide probes to test nucleic acid including DNA or RNA. In this regard, reference may be made, for example, to NUCLEIC ACID HYBRIDIZATION, A PRACTICAL APPROACH (Hames and Higgins, eds.) (IRL Press, Washington D.C., 1985). In general, whether hybridization takes place is influenced by the length of the oligonucleotide probe and the polynucleotide sequence under test, the pH, the temperature, the concentration of mono- and divalent cations, the proportion of G and C nucleotides in the hybrid-forming region, the viscosity of the medium and the possible presence of denaturants. Such variables also influence the time required for hybridization. The preferred conditions will therefore depend upon the particular application. Such empirical conditions, however, can be routinely determined without undue experimentation. [0202] In certain advantageous embodiments, high discrimination hybridization conditions are used. For example, reference may be made to Wallace et al. (1979, Nucl. Acids Res. 6: 3543) who describe conditions that differentiate the hybridization of 11 to 17 base long oligonucleotide probes that match perfectly and are completely homologous to a target sequence as compared to similar oligonucleotide probes that contain a single internal base pair mismatch. Reference also may be made to Wood et al. (1985, Proc. Natl. Acad. Sci. USA 82: 1585) who describe conditions for hybridization of 11 to 20 base long oligonucleotides using 3M tetramethyl ammonium chloride wherein the melting point of the hybrid depends only on the length of the oligonucleotide probe, regardless of its GC content. 
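The dependence of hybrid stability on probe length and on the proportion of G and C nucleotides can be illustrated with a short calculation. The sketch below uses a widely quoted rule of thumb for short oligonucleotides, Tm = 2(A+T) + 4(G+C) in degrees Celsius, often called the Wallace rule. That formula is an assumption introduced here for illustration and is not stated in this specification; the function name `wallace_tm` and the example probes are likewise hypothetical. The estimate applies to standard salt conditions, not to the tetramethyl ammonium chloride conditions of Wood et al., under which GC content does not affect the melting point.

```python
def wallace_tm(probe: str) -> int:
    """Rough melting temperature (deg C) of a short oligonucleotide probe.

    Assumed rule of thumb: Tm = 2 * (A + T) + 4 * (G + C).
    Intended only for probes of roughly 14-20 bases; illustrative, not
    a formula taken from the specification.
    """
    seq = probe.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("probe must be a non-empty string of A, C, G and T")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


if __name__ == "__main__":
    # Three hypothetical 16-mers of equal length but different GC content.
    for probe in ("ATATATATATATATAT", "ATGCATGCATGCATGC", "GCGCGCGCGCGCGCGC"):
        print(probe, wallace_tm(probe))
```

For probes of equal length the estimate rises with GC content, which is why hybridization and wash temperatures are normally chosen per probe set unless an isostabilising agent is used.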
In addition, Drmanac et al (supra) describe hybridization conditions that allow stringent hybridization of 6-10 nucleotide long oligomers, and similar conditions may be obtained most readily by using nucleotide analogues such as 'locked nucleic acids' (Christensen et al, 2001 Biochem J 354: 481-4). [0203] Generally, a hybridization reaction can be performed in the presence of a hybridization buffer that optionally includes a hybridization-optimizing agent, such as an isostabilising agent, a denaturing agent and/or a renaturation accelerant. Examples of isostabilising agents include, but are not restricted to, betaines and lower tetraalkyl ammonium salts. Denaturing agents are compositions that lower the melting temperature of double stranded nucleic acid molecules by interfering with hydrogen bonding between bases in a double stranded nucleic acid or the hydration of nucleic acid molecules. Denaturing agents include, but are not restricted to, formamide, formaldehyde, dimethylsulfoxide, tetraethyl acetate, urea, guanidinium isothiocyanate, glycerol and chaotropic salts. Hybridization accelerants include
heterogeneous nuclear ribonucleoprotein (hnRNP) A1 and cationic detergents such as cetyltrimethylammonium bromide (CTAB) and dodecyl trimethylammonium bromide (DTAB), polylysine, spermine, spermidine, single stranded binding protein (SSB), phage T4 gene 32 protein and a mixture of ammonium acetate and ethanol. Hybridization buffers may include target polynucleotides at a concentration between about 0.005 nM and about 50 nM, preferably between about 0.5 nM and 5 nM, more preferably between about 1 nM and 2 nM. [0204] A hybridization mixture containing the target EPM marker polynucleotides is placed in contact with the array of probes and incubated at a temperature and for a time appropriate to permit hybridization between the target sequences in the target polynucleotides and any complementary probes. Contact can take place in any suitable container, for example, a dish or a cell designed to hold the solid support on which the probes are bound. Generally, incubation will be at temperatures normally used for hybridization of nucleic acids, for example, between about 20° C and about 75° C, for example, about 25° C, about 30° C, about 35° C, about 40° C, about 45° C, about 50° C, about 55° C, about 60° C, or about 65° C. For probes longer than 14 nucleotides, 20° C to 50° C is desirable. For shorter probes, lower temperatures are preferred. A sample of target polynucleotides is incubated with the probes for a time sufficient to allow the desired level of hybridization between the target sequences in the target polynucleotides and any complementary probes. For example, the hybridization may be carried out at about 45° C +/-10° C in formamide for 1-2 days. [0205] After the hybrid-forming step, the probes are washed to remove any unbound nucleic acid with a hybridization buffer, which can typically comprise a hybridization optimizing agent in the same range of concentrations as for the hybridization step. This washing step leaves only bound target polynucleotides. 
The probes are then examined to identify which probes have hybridized to a target polynucleotide. [0206] The hybridization reactions are then detected to determine which of the probes has hybridized to a corresponding target sequence. Depending on the nature of the reporter molecule associated with a target polynucleotide, a signal may be instrumentally detected by irradiating a fluorescent label with light and detecting fluorescence in a fluorimeter; by providing for an enzyme system to produce a dye which could be detected using a spectrophotometer; by detecting a dye particle or a colored colloidal metallic or non-metallic particle using a reflectometer; or, in the case of a radioactive label or chemiluminescent molecule, by employing a radiation counter or autoradiography. Accordingly, a detection means may be adapted to detect or scan light associated with the label which light may include fluorescent, luminescent, focussed beam or laser light. In such a case, a charge-coupled device (CCD) or a photocell can be used to scan for emission of light from a probe:target polynucleotide hybrid from each location in the micro-array and record the data directly in a
digital computer. In some cases, electronic detection of the signal may not be necessary. For example, with enzymatically generated color spots associated with nucleic acid array format, visual examination of the array will allow interpretation of the pattern on the array. In the case of a nucleic acid array, the detection means is suitably interfaced with pattern recognition software to convert the pattern of signals from the array into a plain language genetic profile. In certain embodiments, oligonucleotide probes specific for different EPM marker gene products are in the form of a nucleic acid array and detection of a signal generated from a reporter molecule on the array is performed using a 'chip reader'. A detection system that can be used by a 'chip reader' is described for example by Pirrung et al (U.S. Patent No. 5,143,854). The chip reader will typically also incorporate some signal processing to determine whether the signal at a particular array position or feature is a true positive or may be a spurious signal. Exemplary chip readers are described for example by Fodor et al (U.S. Patent No. 5,925,525). Alternatively, when the array is made using a mixture of individually addressable kinds of labeled microbeads, the reaction may be detected using flow cytometry. 7.2 Protein-based diagnostics [0207] Consistent with the present invention, the presence of an aberrant concentration of an EPM marker protein is indicative of the presence, degree, or stage of development of EPM. EPM marker protein levels in biological samples can be assayed using any suitable method known in the art. For example, when an EPM marker protein is an enzyme, the protein can be quantified based upon its catalytic activity or based upon the number of molecules of the protein contained in a sample. Antibody-based techniques may be employed, such as, for example, immunohistological and immunohistochemical methods for measuring the level of a protein of interest in a tissue sample. 
For example, specific recognition is provided by a primary antibody (polyclonal or monoclonal) and a secondary detection system is used to detect presence (or binding) of the primary antibody. Detectable labels can be conjugated to the secondary antibody, such as a fluorescent label, a radiolabel, or an enzyme (e.g., alkaline phosphatase, horseradish peroxidase) which produces a quantifiable, e.g., coloured, product. In another suitable method, the primary antibody itself can be detectably labeled. As a result, immunohistological labeling of a tissue section is provided. In some embodiments, a protein extract is produced from a biological sample (e.g., tissue, cells) for analysis. Such an extract (e.g., a detergent extract) can be subjected to western-blot or dot/slot assay of the level of the protein of interest, using routine immunoblotting methods (Jalkanen et al, 1985, J. Cell. Biol. 101: 976-985; Jalkanen et al, 1987, J. Cell. Biol. 105: 3087-3096). [0208] Other useful antibody-based methods include immunoassays, such as the enzyme-linked immunosorbent assay (ELISA) and the radioimmunoassay (RIA). For example, a protein-specific monoclonal antibody can be used both as an immunoadsorbent and as an
enzyme-labeled probe to detect and quantify an EPM marker protein of interest. The amount of such protein present in a sample can be calculated by reference to the amount present in a standard preparation using a linear regression computer algorithm (see Iacobelli et al, 1988, Breast Cancer Research and Treatment 11: 19-30). In other embodiments, two different monoclonal antibodies to the protein of interest can be employed, one as the immunoadsorbent and the other as an enzyme-labeled probe. [0209] Additionally, recent developments in the field of protein capture arrays permit the simultaneous detection and/or quantification of a large number of proteins. For example, low-density protein arrays on filter membranes, such as the universal protein array system (Ge, 2000 Nucleic Acids Res. 28(2):e3) allow imaging of arrayed antigens using standard ELISA techniques and a scanning charge-coupled device (CCD) detector. Immunosensor arrays have also been developed that enable the simultaneous detection of clinical analytes. It is now possible, using protein arrays, to profile protein expression in bodily fluids, such as in sera of healthy or diseased subjects, as well as in subjects pre- and post-drug treatment. [0210] Protein capture arrays typically comprise a plurality of protein-capture agents each of which defines a spatially distinct feature of the array. The protein-capture agent can be any molecule or complex of molecules which has the ability to bind a protein and immobilize it to the site of the protein-capture agent on the array. The protein-capture agent may be a protein whose natural function in a cell is to specifically bind another protein, such as an antibody or a receptor. Alternatively, the protein-capture agent may instead be a partially or wholly synthetic or recombinant protein which specifically binds a protein. 
Alternatively, the protein-capture agent may be a protein which has been selected in vitro from a mutagenized, randomized, or completely random and synthetic library by its binding affinity to a specific protein or peptide target. The selection method used may optionally have been a display method such as ribosome display or phage display, as known in the art. Alternatively, the protein-capture agent obtained via in vitro selection may be a DNA or RNA aptamer which specifically binds a protein target (see, e.g., Potyrailo et al, 1998 Anal. Chem. 70:3419-3425; Cohen et al, 1998, Proc. Natl. Acad. Sci. USA 95:14272-14277; Fukuda, et al, 1997 Nucleic Acids Symp. Ser. 37:237-238; available from SomaLogic). For example, aptamers are selected from libraries of oligonucleotides by the Selex™ process and their interaction with protein can be enhanced by covalent attachment, through incorporation of brominated deoxyuridine and UV-activated crosslinking (photoaptamers). Aptamers have the advantages of ease of production by automated oligonucleotide synthesis and the stability and robustness of DNA; universal fluorescent protein stains can be used to detect binding. Alternatively, the in vitro selected protein-capture agent may be a polypeptide (e.g., an antigen) (see, e.g., Roberts and Szostak,
1997 Proc. Natl. Acad. Sci. USA, 94:12297-12302).
[0211] An alternative to an array of capture molecules is one made through
'molecular imprinting' technology, in which peptides (e.g., from the C-terminal regions of proteins) are used as templates to generate structurally complementary, sequence-specific cavities in a polymerisable matrix; the cavities can then specifically capture (denatured) proteins which have the appropriate primary amino acid sequence (e.g., available from ProteinPrint™ and Aspira Biosystems). [0212] Exemplary protein capture arrays include arrays comprising spatially addressed antigen-binding molecules, commonly referred to as antibody arrays, which can facilitate extensive parallel analysis of numerous proteins defining a proteome or subproteome. Antibody arrays have been shown to have the required properties of specificity and acceptable background, and some are available commercially (e.g., BD Biosciences, Clontech, BioRad and Sigma). Various methods for the preparation of antibody arrays have been reported (see, e.g., Lopez et al, 2003 J. Chromatogr. B 787:19-27; Cahill, 2000 Trends in Biotechnology 7:47-51; U.S. Pat. App. Pub. 2002/0055186; U.S. Pat. App. Pub. 2003/0003599; PCT publication WO 03/062444; PCT publication WO 03/077851; PCT publication WO 02/59601; PCT publication WO 02/39120; PCT publication WO 01/79849; PCT publication WO 99/39210). 
The antigen-binding molecules of such arrays may recognise at least a subset of proteins expressed by a cell or population of cells, illustrative examples of which include growth factor receptors, hormone receptors, neurotransmitter receptors, catecholamine receptors, amino acid derivative receptors, cytokine receptors, extracellular matrix receptors, antibodies, lectins, cytokines, serpins, proteases, kinases, phosphatases, ras-like GTPases, hydrolases, steroid hormone receptors, transcription factors, heat-shock transcription factors, DNA-binding proteins, zinc-finger proteins, leucine-zipper proteins, homeodomain proteins, intracellular signal transduction modulators and effectors, apoptosis-related factors, DNA synthesis factors, DNA repair factors, DNA recombination factors, cell-surface antigens, hepatitis C virus (HCV) proteases and HIV proteases. [0213] Antigen-binding molecules for antibody arrays are made either by conventional immunization (e.g., polyclonal sera and hybridomas), or as recombinant fragments, usually expressed in E. coli, after selection from phage display or ribosome display libraries (e.g., available from Cambridge Antibody Technology, BioInvent, Affitech and Biosite). Alternatively, 'combibodies' comprising non-covalent associations of VH and VL domains, can be produced in a matrix format created from combinations of diabody-producing bacterial clones (e.g., available from Domantis). Exemplary antigen-binding molecules for use as protein-capture agents include monoclonal antibodies, polyclonal antibodies, Fv, Fab, Fab' and F(ab')2 immunoglobulin fragments, synthetic stabilized Fv fragments, e.g., single chain Fv fragments (scFv), disulfide stabilized Fv fragments (dsFv), single variable region domains
(dAbs), minibodies, combibodies and multivalent antibodies such as diabodies and multi-scFv, single domains from camelids or engineered human equivalents. [0214] Individual spatially distinct protein-capture agents are typically attached to a support surface, which is generally planar or contoured. Common physical supports include glass slides, silicon, microwells, nitrocellulose or PVDF membranes, and magnetic and other microbeads. [0215] While microdrops of protein delivered onto planar surfaces are widely used, related alternative architectures include CD centrifugation devices based on developments in microfluidics (e.g., available from Gyros) and specialized chip designs, such as engineered microchannels in a plate (e.g., The Living Chip™, available from BioTrove) and tiny 3D posts on a silicon surface (e.g., available from Zyomyx). [0216] Particles in suspension can also be used as the basis of arrays, providing they are coded for identification; systems include color coding for microbeads (e.g., available from Luminex, Bio-Rad and Nanomics Biosystems) and semiconductor nanocrystals (e.g., QDots™, available from Quantum Dot), and barcoding for beads (UltraPlex™, available from
Smartbeads) and multimetal microrods (Nanobarcodes™ particles, available from Surromed). Beads can also be assembled into planar arrays on semiconductor chips (e.g., available from LEAPS technology and BioArray Solutions). Where particles are used, individual protein-capture agents are typically attached to an individual particle to provide the spatial definition or separation of the array. The particles may then be assayed separately, but in parallel, in a compartmentalized way, for example in the wells of a microtiter plate or in separate test tubes. [0217] In operation, a protein sample, which is optionally fragmented to form peptide fragments (see, e.g., U.S. Pat. App. Pub. 2002/0055186), is delivered to a protein-capture array under conditions suitable for protein or peptide binding, and the array is washed to remove unbound or non-specifically bound components of the sample from the array. Next, the presence or amount of protein or peptide bound to each feature of the array is detected using a suitable detection system. The amount of protein bound to a feature of the array may be determined relative to the amount of a second protein bound to a second feature of the array. In certain embodiments, the amount of the second protein in the sample is already known or known to be invariant. [0218] For analyzing differential expression of proteins between two cells or cell populations, a protein sample of a first cell or population of cells is delivered to the array under conditions suitable for protein binding. In an analogous manner, a protein sample of a second cell or population of cells is delivered to a second array which is identical to the first array. Both arrays are then washed to remove unbound or non-specifically bound
components of the sample from the arrays. In a final step, the amounts of protein remaining bound to the features of the first array are compared to the amounts of protein remaining bound to the corresponding features of the second array. To determine the differential protein expression pattern of the two cells or populations of cells, the amount of protein bound to individual features of the first array is subtracted from the amount of protein bound to the corresponding features of the second array. [0219] In an illustrative example, fluorescence labeling can be used for detecting protein bound to the array. The same instrumentation as used for reading DNA microarrays is applicable to protein-capture arrays. For differential display, capture arrays (e.g. antibody arrays) can be probed with fluorescently labeled proteins from two different cell states, in which cell lysates are labeled with different fluorophores (e.g., Cy-3 and Cy-5) and mixed, such that the color acts as a readout for changes in target abundance. Fluorescent readout sensitivity can be amplified 10-100 fold by tyramide signal amplification (TSA) (e.g., available from PerkinElmer Lifesciences). Planar waveguide technology (e.g., available from Zeptosens) enables ultrasensitive fluorescence detection, with the additional advantage of no washing procedures. High sensitivity can also be achieved with suspension beads and particles, using phycoerythrin as label (e.g., available from Luminex) or the properties of semiconductor nanocrystals (e.g., available from Quantum Dot). Fluorescence resonance energy transfer has been adapted to detect binding of unlabelled ligands, which may be useful on arrays (e.g., available from Affibody). 
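The two comparisons described above, per-feature subtraction between two identical arrays and a two-colour readout on a single array, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the dictionary layout (feature name mapped to background-corrected signal), the `floor` guard against zero signal and the use of a log2 ratio for the Cy-3/Cy-5 readout are all introduced here and are not taken from the specification.

```python
import math


def differential_expression(first_array, second_array):
    """Per-feature difference: second-array signal minus first-array signal.

    Both arguments map a feature name to the amount of bound protein.
    The arrays are assumed identical, so they must share the same features.
    """
    if first_array.keys() != second_array.keys():
        raise ValueError("arrays must contain the same features")
    return {f: second_array[f] - first_array[f] for f in first_array}


def two_colour_log2_ratio(cy3, cy5, floor=1.0):
    """log2(Cy-5 / Cy-3) per feature for a mixed, two-colour hybridization.

    `floor` is a hypothetical guard that avoids dividing by, or taking the
    logarithm of, zero or negative background-corrected signals.
    """
    if cy3.keys() != cy5.keys():
        raise ValueError("channels must contain the same features")
    return {f: math.log2(max(cy5[f], floor) / max(cy3[f], floor)) for f in cy3}


if __name__ == "__main__":
    # Hypothetical signals for two features on each array or channel.
    state_1 = {"marker_A": 100.0, "marker_B": 50.0}
    state_2 = {"marker_A": 150.0, "marker_B": 20.0}
    print(differential_expression(state_1, state_2))
    print(two_colour_log2_ratio(state_1, state_2))
```

A positive value indicates more bound protein in the second sample (or the Cy-5 channel); a log ratio is used for the two-colour case so that equal fold increases and decreases are symmetric around zero.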
Several alternative readouts have been developed, including adaptations of surface plasmon resonance (e.g., available from HTS Biosystems and Intrinsic Bioprobes), rolling circle DNA amplification (e.g., available from Molecular Staging), mass spectrometry (e.g., available from Sense Proteomic, Ciphergen and Intrinsic Bioprobes), resonance light scattering (e.g., available from Genicon Sciences) and atomic force microscopy (e.g., available from BioForce Laboratories). A microfluidics system for automated sample incubation with arrays on glass slides and washing has been co-developed by NextGen and Perkin Elmer Life Sciences. [0220] In certain embodiments, the techniques used for detection of EPM marker expression products will include internal or external standards to permit quantitative or semi-quantitative determination of those products, to thereby enable a valid comparison of the level or functional activity of these expression products in a biological sample with the corresponding expression products in a reference sample or samples. Such standards can be determined by the skilled practitioner using standard protocols. In specific examples, absolute values for the level or functional activity of individual expression products are determined. [0221] In specific embodiments, the diagnostic method is implemented using a system as disclosed, for example, in International Publication No. WO 02/090579 and in
copending PCT Application No. PCT/AU03/01517 filed November 14, 2003, comprising at least one end station coupled to a base station. The base station is typically coupled to one or more databases comprising predetermined data from a number of individuals representing the level or functional activity of EPM marker expression products, together with indications of the actual status of the individuals (e.g., presence, absence, degree, or stage of development of EPM) when the predetermined data were collected. In operation, the base station is adapted to receive from the end station, typically via a communications network, subject data representing a measured or normalized level or functional activity of at least one expression product in a biological sample obtained from a test subject and to compare the subject data to the predetermined data stored in the database(s). Comparing the subject and predetermined data allows the base station to determine the status of the subject in accordance with the results of the comparison. Thus, the base station attempts to identify individuals having similar parameter values to the test subject and once the status has been determined on the basis of that identification, the base station provides an indication of the diagnosis to the end station. 7.3 Kits [0222] All the essential materials and reagents required for detecting and quantifying EPM marker gene expression products may be assembled together in a kit. The kits may also optionally include appropriate reagents for detection of labels, positive and negative controls, washing solutions, blotting membranes, microtiter plates, dilution buffers and the like. For example, a nucleic acid-based detection kit may include (i) an EPM marker polynucleotide (which may be used as a positive control), (ii) a primer or probe that specifically hybridizes to an EPM marker polynucleotide. 
Also included may be enzymes suitable for amplifying nucleic acids including various polymerases (Reverse Transcriptase, Taq, Sequenase™, DNA ligase etc. depending on the nucleic acid amplification technique employed), deoxynucleotides and buffers to provide the necessary reaction mixture for amplification. Such kits also generally will comprise, in suitable means, distinct containers for each individual reagent and enzyme as well as for each primer or probe. Alternatively, a protein-based detection kit may include (i) an EPM marker polypeptide (which may be used as a positive control), (ii) an antigen-binding molecule that is immuno-interactive with an EPM marker polypeptide. The kit can also feature various devices and reagents for performing one of the assays described herein; and/or printed instructions for using the kit to quantify the expression of an EPM marker gene.
8. Methods of treatment or prophylaxis [0223] The present invention also extends to the treatment or prevention of EPM in subjects following positive diagnosis for the presence, or stage of development of EPM in the subjects. Generally, the treatment will include administering to a positively diagnosed subject an effective amount of an agent, typically an antiprotozoal agent, that ameliorates the symptoms
or reverses the development of EPM or that reduces or abrogates a pathogenic infection underlying EPM or that reduces the potential of the animal to develop EPM or that inhibits the growth of a protozoal organism (e.g., from the family Sarcocystidae, especially Sarcocystis neurona). Current antiprotozoal agents suitable for treating EPM include, but are not limited to: inhibitors of diaminopyrimidine dihydrofolate reductase (DHFR), illustrative examples of which include sulphonamides, 2,4-diaminopyrimidine (pyrimethamine) and analogues of para-aminobenzoic acid or combinations thereof as disclosed, for example, in U.S. Pat. Nos. 6,255,308 and 6,448,252; anti-coccidial agents illustrative examples of which include triazine-based anticoccidials such as diclazuril, toltrazuril, ponazuril and sulphonotoltrazuril or combinations thereof as disclosed, for example, in U.S. Pat. No.
5,830,893 and U.S. Pat. Appl. Pub. No. 20030096815; lipophilic nitrothiazole compounds such as nitazoxanide as described, for example, in U.S. Pat. Appl. Pub. No. 20030108596; formulations comprising transfer factor, zinc and at least one essential fatty acid as disclosed, for example, in U.S. Pat. No. 6,506,413; and formulations comprising peroxidic species or reaction products resulting from oxidation of an alkene, such as geraniol, by an oxygen-containing oxidizing agent, such as ozone; a penetrating solvent, such as dimethyl sulfoxide; a dye containing a chelated metal, such as hematoporphyrin; and an aromatic redox compound, such as benzoquinone as described, for example, in U.S. Pat. Appl. Pub. No. 20030032677; and praziquantel compounds as described, for example, in U.S. Pat. Appl. Pub. No. 20020143018. However, it will be understood that the present invention encompasses any agent that is useful for treating or preventing EPM and is not limited to the aforementioned illustrative compounds and formulations. [0224] Typically, the agents will be administered in pharmaceutical (or veterinary) compositions together with a pharmaceutically acceptable carrier and in an effective amount to achieve their intended purpose. The dose of active compounds administered to a subject should be sufficient to achieve a beneficial response in the subject over time such as a reduction in, or relief from, the symptoms of EPM. The quantity of the pharmaceutically active compound(s) to be administered may depend on the subject to be treated inclusive of the age, sex, weight and general health condition thereof. In this regard, precise amounts of the active compound(s) for administration will depend on the judgement of the practitioner. 
In determining the effective amount of the active compound(s) to be administered in the treatment or prevention of EPM, the veterinarian may evaluate the severity of any symptom associated with the presence of EPM including asymmetric neurological symptoms, lameness, muscle wasting, stumbling, incoordination, head tilt, inability to walk backwards or in a tight circle, nerve paralysis, soreness, sudden recumbency or sleep, seizures, spasticity, hypermetria, ataxia, paralysis and recumbency. In any event, those of skill in the art may readily determine suitable dosages of the antiprotozoal agents and suitable treatment regimens without undue experimentation.
[0225] The antiprotozoal agents may be administered in concert with adjunctive therapies to reduce inflammation of the nervous tissue in the affected subject. Illustrative examples of such adjunctive therapies include non-steroidal anti-inflammatory drugs (NSAIDs) and dimethylsulfoxide (DMSO). Optionally, folic acid supplementation may also be administered to affected subjects to prevent haematological and fetal development side effects. [0226] In order that the invention may be readily understood and put into practical effect, particular preferred embodiments will now be described by way of the following non-limiting examples.
EXAMPLES
EXAMPLE 1
SPECIFIC DIAGNOSTIC GENES FOR EPM
[0227] Animals were exposed to large numbers of S. neurona spores. A proportion of those animals developed clinical signs of EPM (including the presence of antibodies to S. neurona in cerebrospinal fluid, and characteristic lesions in the central nervous system revealed at autopsy). Blood samples obtained from exposed animals were analyzed using GeneChips™ (the method of use is described below in detail in "Generation of Gene Expression Data") containing thousands of genes expressed in white blood cells of horses. Analysis of these data (see "Identification of Responding Genes and Demonstration of Diagnostic Potential" below) reveals a number of specific genes that differ in expression between animals with and without clinical evidence of EPM from day 2 following infection with S. neurona. Other genes are differentially expressed early in infection (Days 4-17) and others are differentially expressed later in infection (Day 21 onward). Despite these varying patterns of differential gene expression over the course of infection, it is possible to design an assay that measures the RNA level in the sample from the expression of at least one and desirably at least two EPM diagnostic marker genes, representative transcript sequences of which are set forth in SEQ ID NO: 1, 3, 5, 7, 8, 10, 12, 14, 16, 17, 18, 20, 22, 24, 26, 27, 28, 30, 32, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 54, 55, 56, 57 and 422. This provides a level of specificity and sensitivity both equal to 94%. Alternatively, any combination of at least these two polynucleotides with any of the other 31 EPM diagnostic marker polynucleotides listed in Table 1 provides strong diagnostic capacity.
Materials and Methods
Blood Collection
[0228] Blood is collected from a horse (in a non-agitated state) for the purpose of extraction of high quality RNA or protein. Suitable blood collection tubes for the collection, preservation, transport and isolation of RNA include PAXgene™ tubes (PreAnalytiX Inc., Valencia, CA, USA). Alternatively, blood can be collected into tubes containing solutions designed for the preservation of nucleic acids (available from Roche, Ambion, Invitrogen and ABI). For the determination of protein levels, 50 mL of blood is prevented from clotting by collection into a tube containing 4 mL of 4% sodium citrate. White blood cells and plasma are isolated and stored frozen for later analysis and detection of specific proteins. PAXgene tubes can be kept at room temperature prior to RNA extraction. Clinical signs are recorded in a standard format.
Total RNA Extraction
[0229] A kit available from Qiagen Inc (Valencia, CA, USA) has the reagents and instructions for the isolation of total RNA from 2.5 mL blood collected in the PAXgene Blood RNA Tube. Isolation begins with a centrifugation step to pellet nucleic acids in the PAXgene blood RNA tube. The pellet is washed, resuspended and incubated in optimized buffers together with Proteinase K to bring about protein digestion. An additional centrifugation is carried out to remove residual cell debris and the supernatant is transferred to a fresh microcentrifuge tube. Ethanol is added to adjust binding conditions, and the lysate is applied to the PAXgene RNA spin column. During brief centrifugation, RNA is selectively bound to the silica-gel membrane as contaminants pass through. Remaining contaminants are removed in three efficient wash steps and RNA is then eluted in Buffer BR5.
[0230] Determination of RNA quantity and quality is necessary prior to proceeding and can be achieved using an Agilent Bioanalyzer and the absorbance 260/280 ratio measured on a spectrophotometer.
Generation of Gene Expression Data
Choice of Method
[0231] Measurement of specific RNA levels in a tissue sample can be achieved using a variety of technologies. Two common and readily available technologies that are well known in the art are:
[0232] • GeneChip® analysis using Affymetrix technology.
[0233] • Real-Time Polymerase Chain Reaction (TaqMan™ from Applied Biosystems for example).
[0234] GeneChips® quantitate RNA by detection of labeled cRNA hybridized to short oligonucleotides built on a silicon substrate. Details on the technology and methodology can be found at www.affymetrix.com.
[0235] Real-Time Polymerase Chain Reaction (RT-PCR) quantitates RNA using two PCR primers, a labeled probe and a thermostable DNA polymerase. As PCR product is generated, a dye is released into solution and detected. Internal controls such as 18S RNA probes are often used to determine starting levels of total RNA in the sample. Each gene and the internal control are run separately. Details on the technology and methods can be found at www.appliedbiosystems.com or www.qiagen.com or www.biorad.com. Applied Biosystems offers a service whereby the customer provides DNA sequence information and payment and is supplied in return with all of the reagents required to perform RT-PCR analysis on individual genes.
[0236] GeneChip® analysis has the advantage of being able to analyze thousands of genes at a time. However, it is expensive and takes over 3 days to perform a single assay. RT-PCR generally only analyzes one gene at a time, but is inexpensive and can be completed within a single day.
[0237] RT-PCR is the method of choice for gene expression analysis if the number of specific genes to be analyzed is less than 20. GeneChip® or other gene expression analysis technologies (such as Illumina Bead Arrays) are the method of choice when many genes need to be analyzed simultaneously.
[0238] The methodology for GeneChip® data generation and analysis and Real-Time PCR is presented below in brief.
GeneChip® Data Generation
cDNA & cRNA Generation
[0239] The following method for cDNA and cRNA generation from total RNA has been adapted from the protocol provided and recommended by Affymetrix (www.affymetrix.com).
[0240] The steps are:
[0241] • A total of 3 μg of total RNA is used as a template to generate double stranded cDNA.
[0242] • cRNA is generated and labeled using biotinylated Uracil (dUTP).
[0243] • Biotin-labeled cRNA is cleaned and the quantity determined using a spectrophotometer and MOPS gel analysis.
[0244] • Labeled cRNA is fragmented to ~300 bp in size.
[0245] • RNA quantity is determined on an Agilent "Lab-on-a-Chip" system (Agilent Technologies).
Hybridization, Washing & Staining
[0246] The steps are:
[0247] • A hybridization cocktail is prepared containing 0.05 μg/μL of labeled and fragmented cRNA, spike-in positive hybridization controls, and the Affymetrix oligonucleotides B2, bioB, bioC, bioD and cre.
[0248] • The final volume (80 μL) of the hybridization cocktail is added to the GeneChip® cartridge.
[0249] • The cartridge is placed in a hybridization oven at constant rotation for 16 hours.
[0250] • The fluid is removed from the GeneChip® and stored.
[0251] • The GeneChip® is placed in the fluidics station.
[0252] • The experimental conditions for each GeneChip® are recorded as an .EXP file.
[0253] • All washing and staining procedures are carried out by the Affymetrix fluidics station with an attendant providing the appropriate solutions.
[0254] • The GeneChip® is washed, stained with streptavidin-phycoerythrin dye and then washed again using low salt solutions.
[0255] • After the wash protocols are completed, the dye on the probe array is 'excited' by laser and the image captured by a CCD camera using an Affymetrix Scanner (manufactured by Agilent).
Scanning & Data File Generation
[0256] The scanner and MAS 5 software generate an image file, called a .DAT file, from a single GeneChip® (see figure overleaf).
[0257] The .DAT file is then pre-processed prior to any statistical analysis.
[0258] Data pre-processing steps (prior to any statistical analysis) include:
[0259] • .DAT File Quality Control (QC).
[0260] • .CEL File Generation.
[0261] • Scaling and Normalization.
.DAT File Quality Control
[0262] The .DAT file is an image. The image is inspected manually for artifacts (e.g. high or low intensity spots, scratches, high regional or overall background). The B2 oligonucleotide hybridization performance is easily identified by an alternating pattern of intensities creating a border and array name. The MAS 5 software uses the B2 oligonucleotide border to align a grid over the image so that each square of oligonucleotides is centered and identified.
[0263] The other spiked hybridization controls (bioB, bioC, bioD and cre) are used to evaluate sample hybridization efficiency by reading "present" gene detection calls with increasing signal values, reflecting their relative concentrations. If the .DAT file is of suitable quality, it is converted to an intensity data file (.CEL file) by Affymetrix MAS 5 software.
.CEL File Generation
[0264] The .CEL files generated by the MAS 5 software from .DAT files contain calculated raw intensities for the probe sets. Gene expression data is obtained by subtracting a calculated background from each cell value. To eliminate negative intensity values, a noise correction fraction, based on a local noise value derived from the standard deviation of the lowest 2% of the background, is applied.
[0265] All .CEL files generated from the GeneChips® are subjected to specific quality metrics parameters.
[0266] Some metrics are routinely recommended by Affymetrix and can be determined from Affymetrix internal controls provided as part of the GeneChip®. Other metrics are based on experience and the processing of many GeneChips®.
Analysis of GeneChip® Data
[0267] Two illustrative approaches to normalising data may be used:
[0268] • Affymetrix MAS 5 Algorithm.
[0269] • Robust Multi-chip Analysis (RMA) algorithm of Irizarry (Irizarry et al., 2002, Biostatistics (in print)).
[0270] Those of skill in the art will recognise that many other approaches might be adopted, without materially affecting the invention.
Affymetrix MAS 5 Algorithm
[0271] .CEL files are used by Affymetrix MAS 5 software to normalize or scale the data. Scaled data from one chip are compared to similarly scaled data from other chips.
[0272] Affymetrix MAS 5 normalization is achieved by applying the default "Global Scaling" option of the MAS 5 algorithm to the .CEL files. This procedure subtracts a robust estimate of the center of the distribution of probe values, and divides by a robust estimate of the probe variability. This produces a set of chips with common location and scale at the probe level.
[0273] Gene expression indices are generated by a robust averaging procedure on all the probe pairs for a given gene. The results are constrained to be non-negative.
[0274] Given that scaling takes place at the level of the probe, rather than at the level of the gene, it is possible that even after normalization there may be chip-to-chip
differences in overall gene expression level. Following standard MAS5 normalization, values for each gene were de-trended with respect to median chip intensity. That is, values for each gene were regressed on the median chip intensity, and residuals were calculated. These residuals were taken as the de-trended estimates of expression for each gene.
[0275] Median chip intensity was calculated using the Affymetrix MAS5 algorithm, but with a scale factor fixed at one.
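The de-trending step described above can be sketched as an ordinary least-squares regression of one gene's values on the per-chip median intensity, keeping the residuals. This is a minimal illustration only: the function name `detrend` and the choice of a simple straight-line fit with an intercept are assumptions, since the text does not specify the exact regression model that was used.

```python
from statistics import mean

def detrend(values, chip_medians):
    """Return residuals from regressing one gene's values on median chip intensity.

    values: expression values for a single gene, one per chip.
    chip_medians: median intensity of each chip, in the same order.
    Assumes a straight-line least-squares fit (an illustrative choice).
    """
    x_bar, y_bar = mean(chip_medians), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in chip_medians)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(chip_medians, values))
    slope = sxy / sxx if sxx else 0.0  # guard against identical chip medians
    intercept = y_bar - slope * x_bar
    # The residual is what remains of each value after removing the chip-level trend.
    return [y - (intercept + slope * x) for x, y in zip(chip_medians, values)]
```

A gene whose values track chip brightness exactly yields residuals of zero, so only variation not explained by overall chip intensity is carried forward.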
RMA Algorithm
[0276] This algorithm quantifies the expression of a set of chips, rather than of a single chip. It estimates background intensities using a robust statistical model applied to perfect match probe data. It does not make use of mis-match probe data. Following implicit background correction, chips are processed using quantile normalization (Irizarry et al., 2002, Biostatistics (in print)).
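The normalization step named above can be illustrated with a generic quantile normalization, in which every chip is forced to share the same intensity distribution. This sketch covers that one step only, not the background correction or probe summarization of RMA; the function name `quantile_normalize` and the simple tie handling (ties keep their original order) are assumptions made for brevity.

```python
def quantile_normalize(chips):
    """Quantile-normalize a set of chips.

    chips: list of equal-length lists, one list of probe intensities per chip.
    Returns new lists in which the value at each rank is the mean, across
    chips, of the values holding that rank.
    """
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Mean intensity across chips at each rank position.
    rank_means = [sum(sc[i] for sc in sorted_chips) / len(chips) for i in range(n)]
    normalized = []
    for chip in chips:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After this step each chip contains exactly the same set of values, differing only in which probe holds which value.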
DNA Extraction
[0277] A kit available from Qiagen Inc (Valencia, CA, USA) has the reagents and instructions for the isolation of total DNA from 8.5 mL blood collected in the PAXgene Blood DNA Tube. Isolation begins with the addition of further lysis solution followed by a centrifugation step. The pellet is washed, resuspended and incubated in optimized buffers together with Proteinase K to bring about protein digestion. DNA is precipitated using alcohol and an additional centrifugation is carried out to pellet the nucleic acid. Remaining contaminants are removed in a wash step and the DNA is then resuspended in Buffer BG4.
[0278] Determination of DNA quantity and quality is necessary prior to proceeding and can be achieved using a spectrophotometer or agarose gel electrophoresis.
Genotyping Analysis
[0279] Many methods are available to genotype DNA. A review of allelic discrimination methods can be found in Kristensen et al. (Biotechniques 30(2): 318-322 (2001)). An illustrative method for genotyping using allele-specific PCR is described here.
Primer Design
[0280] Upstream and downstream PCR primers specific for particular alleles can be designed using freely available computer programs, such as Primer3 (http://frodo.wi.mit.edu/primer3/primer3 code.htm). Alternatively, the DNA sequences of the various alleles can be aligned using a program such as ClustalW (http://www.ebi.ac.uk/clustalw) and specific primers designed to areas where DNA sequence differences exist, while retaining enough specificity to ensure amplification of the correct amplicon. Preferably, a PCR amplicon is designed to have a restriction enzyme site in one allele but not the other. Primers are generally 18-25 base pairs in length with similar melting temperatures.
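As a rough illustration of the length and melting-temperature criteria above, a primer pair can be screened as follows. The Tm estimate uses the Wallace rule (2° C per A/T, 4° C per G/C), which is a common rule of thumb and not a method named in this specification; the 5° C tolerance for "similar" melting temperatures and the function names are likewise assumptions, and programs such as Primer3 use more refined models.

```python
def wallace_tm(primer):
    """Approximate melting temperature (deg C) by the Wallace rule.

    This is a rule-of-thumb estimate for short oligonucleotides only.
    """
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def primer_pair_ok(forward, reverse, max_tm_gap=5):
    """Check both primers are 18-25 long and have similar estimated Tm.

    max_tm_gap is a hypothetical tolerance, not a value from the text.
    """
    lengths_ok = all(18 <= len(p) <= 25 for p in (forward, reverse))
    tm_gap = abs(wallace_tm(forward) - wallace_tm(reverse))
    return lengths_ok and tm_gap <= max_tm_gap
```

A check of this kind is only a first filter; specificity for the intended allele still has to be confirmed against the aligned sequences.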
PCR Amplification
[0281] The composition of PCR reactions has been described elsewhere (Clinical Applications of PCR, Dennis Lo (Editor), Blackwell Publishing, 1998). Briefly, a reaction contains primers, DNA, buffers and a thermostable polymerase enzyme. The reaction is cycled (up to 50 times) through temperature steps of denaturation, hybridization and DNA extension on a thermocycler such as the MJ Research Thermocycler model PTC-96V.
DNA Analysis
[0282] PCR products can be analyzed using a variety of methods including size differentiation using mass spectrometry, capillary gel electrophoresis and agarose gel electrophoresis. If the PCR amplicons have been designed to contain differential restriction enzyme sites, the DNA in the PCR reaction is purified using DNA-binding columns or precipitation and re-suspended in water, and then restricted using the appropriate restriction enzyme. The restricted DNA can then be run on an agarose gel where DNA is separated by size using electric current. Various alleles of a gene will have different sizes depending on whether they contain restriction sites. Thus, homozygotes and heterozygotes can be determined.
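The way homozygotes and heterozygotes follow from the band pattern can be sketched as below, assuming an amplicon with a single differential restriction site. The function name, the size tolerance and the fragment sizes in the usage note are all hypothetical; real gels also need controls for incomplete digestion, which this sketch ignores.

```python
def call_genotype(fragment_sizes, uncut_size, cut_sizes, tolerance=5):
    """Call a genotype from observed band sizes (bp) after restriction digestion.

    uncut_size: amplicon length for the allele lacking the restriction site.
    cut_sizes: fragment lengths expected for the allele carrying the site.
    tolerance: hypothetical allowance (bp) for gel sizing error.
    """
    def present(size):
        return any(abs(f - size) <= tolerance for f in fragment_sizes)

    has_uncut = present(uncut_size)
    has_cut = all(present(s) for s in cut_sizes)
    if has_uncut and has_cut:
        return "heterozygote"
    if has_cut:
        return "homozygote (site present)"
    if has_uncut:
        return "homozygote (site absent)"
    return "no call"
```

For example, with a hypothetical 300 bp amplicon cut into 200 bp and 100 bp fragments, observing all three bands gives `call_genotype([300, 200, 100], 300, (200, 100))` a heterozygote call.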
Real-Time PCR Data Generation
[0283] Background information for conducting Real-time PCR may be obtained, for example, at http://dorakmt.tripod.com/genetics/realtime.html and in a review by Bustin SA (2000, J Mol Endocrinol 25:169-193).
TaqMan™ Primer and Probe Design Guidelines:
[0284] 1. The Primer Express™ (ABI) software designs primers with a melting temperature (Tm) of 58-60° C, and probes with a Tm 10° C higher. The Tm of both primers should be equal.
[0285] 2. Primers should be 15-30 bases in length.
[0286] 3. The G+C content should ideally be 30-80%. If a higher G+C content is unavoidable, the use of high annealing and melting temperatures, or of cosolvents such as glycerol, DMSO, or 7-deaza-dGTP, may be necessary.
[0287] 4. Runs of an identical nucleotide should be avoided. This is especially true for G, where runs of four or more Gs are not allowed.
[0288] 5. The total number of Gs and Cs in the last five nucleotides at the 3' end of the primer should not exceed two (the newer version of the software has an option to do this automatically). This helps to introduce relative instability to the 3' end of primers to reduce nonspecific priming. The primer conditions are the same for SYBR Green assays.
[0289] 6. Maximum amplicon size should not exceed 400 bp (ideally 50-150 bases). Smaller amplicons give more consistent results because PCR is more efficient and more tolerant of reaction conditions (the short length requirement has nothing to do with the efficiency of 5' nuclease activity).
[0290] 7. The probes should not have runs of identical nucleotides (especially four or more consecutive Gs), G+C content should be 30-80%, there should be more Cs than Gs, and there should not be a G at the 5' end. The higher number of Cs produces a higher ΔRn. The choice of probe should be made first.
[0291] 8. To avoid false-positive results due to amplification of contaminating genomic DNA in the cDNA preparation, it is preferable to have primers spanning exon-exon junctions. This way, genomic DNA will not be amplified (the PDAR kit for human GAPDH amplification has such primers).
[0292] 9. If a TaqMan™ probe is designed for allelic discrimination, the mismatching nucleotide (the polymorphic site) should be in the middle of the probe rather than at the ends.
[0293] 10. Use primers that contain dA nucleotides near the 3' ends so that any primer-dimer generated is efficiently degraded by AmpErase™ UNG (mentioned on p. 9 of the manual for the EZ RT-PCR kit; P/N 402877). If primers cannot be selected with dA nucleotides near the ends, the use of primers with 3' terminal dU-nucleotides should be considered.
[0294] (See also the general principles of PCR Primer Design by InVitroGen.)
General Method:
[0295] 1. Reverse transcription of total RNA to cDNA should be done with random hexamers (not with oligo-dT). If oligo-dT has to be used, long mRNA transcripts or amplicons greater than two kilobases upstream should be avoided, and 18S RNA cannot be used as normalizer.
[0296] 2. Multiplex PCR will only work properly if the control primers are limiting (ABI control reagents do not have their primers limited).
[0297] 3. The range of target cDNA used is 10 ng to 1 μg. If DNA is used (mainly for allelic discrimination studies), the optimum amount is 100 ng to 1 μg.
[0298] 4. It is ideal to treat each RNA preparation with RNase-free DNase to avoid genomic DNA contamination. Even the best RNA extraction methods yield some genomic DNA. Of course, it is ideal to have primers that do not amplify genomic DNA at all, but sometimes this may not be possible.
[0299] 5. For optimal results, the reagents (before the preparation of the PCR mix) and the PCR mixture itself (before loading) should be vortexed and mixed well. Otherwise there may be shifting Rn values during the early (0-5) cycles of PCR. It is also important to add probe to the buffer component and allow it to equilibrate at room temperature prior to reagent mix formulation.
TaqMan™ Primers and Probes:
[0300] The TaqMan™ probes ordered from ABI at midi-scale arrive already resuspended at 100 μM. If a 1/20 dilution is made, this gives a 5 μM solution. This stock solution should be aliquoted, frozen and kept in the dark. Using 1 μL of this in a 50 μL reaction gives the recommended 100 nM final concentration.
[0301] The primers arrive lyophilized with the amount given on the tube in pmol (such as 150,000 pmol, which is equal to 150 nmol). If X nmol of primer is resuspended in X μL of H2O, the resulting solution is 1 mM. It is best to freeze this stock solution in aliquots. When the 1 mM stock solution is diluted 1/100, the resulting working solution will be 10 μM. To get the recommended 50-900 nM final primer concentration in a 50 μL reaction volume, 0.25-4.50 μL should be used per reaction (2.5 μL for 500 nM final concentration).
[0302] The PDAR primers and probes are supplied as a mix in one tube. They are used at 2.5 μL per 50 μL reaction volume.
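The dilution arithmetic above follows from C1 x V1 = C2 x V2 and can be checked with a few lines of code. The function names are illustrative only; the concentrations used in the note after the block are the ones quoted in the text.

```python
def stock_conc_mm(amount_nmol, volume_ul):
    """Concentration (mM) after resuspending amount_nmol in volume_ul of water.

    1 nmol per 1 uL equals 1 mM.
    """
    return amount_nmol / volume_ul

def volume_for_final_conc(stock_um, final_nm, reaction_ul=50.0):
    """Volume (uL) of a working stock needed to reach final_nm in the reaction.

    stock_um: working stock concentration in micromolar.
    final_nm: desired final concentration in nanomolar.
    """
    # C1 * V1 = C2 * V2, with the stock converted from uM to nM.
    return final_nm * reaction_ul / (stock_um * 1000.0)
```

With a 10 μM primer working solution, `volume_for_final_conc(10, 500)` gives 2.5 μL, and the 50 nM and 900 nM limits give 0.25 μL and 4.5 μL; with the 5 μM probe stock, 100 nM final requires 1 μL.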
Setting up One-step TaqMan™ Reaction:
[0303] One-step real-time PCR uses RNA (as opposed to cDNA) as a template. This is the preferred method if the RNA solution has a low concentration, but only if singleplex reactions are run. The disadvantage is that the carryover prevention enzyme AmpErase cannot be used in the one-step reaction format. In this method, both reverse transcription and real-time PCR take place in the same tube. The downstream PCR primer also acts as the primer for reverse transcriptase (random hexamers or oligo-dT cannot be used for reverse transcription in one-step RT-PCR). The one-step reaction requires a higher dNTP concentration (greater than or equal to 300 μM vs 200 μM) as it combines two reactions needing dNTPs in one. A typical reaction mix for one-step PCR by Gold RT-PCR kit is as follows:

* If a PDAR is used, 2.5 μL of primer + probe mix is used.
[0304] Ideally 10 pg - 100 ng RNA should be used in this reaction. Note that decreasing the amount of template from 100 ng to 50 ng will increase the CT value by 1. To decrease a CT value by 3, the initial amount of template should be increased 8-fold. ABI claims that 2 picograms of RNA can be detected by this system and the maximum amount of RNA that can be used is 1 microgram. For routine analysis, 10 pg - 100 ng RNA and 100 pg - 1 μg genomic DNA can be used.
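The relationship between template amount and CT quoted above is a base-2 logarithm, provided the product doubles every cycle. A minimal sketch, in which the function name and the assumption of 100% amplification efficiency are illustrative:

```python
import math

def ct_shift(template_fold_change):
    """Expected change in CT when the template amount is multiplied by a factor.

    Assumes perfect doubling per cycle (100% efficiency), so each halving of
    template costs one extra cycle and each doubling saves one.
    """
    return -math.log2(template_fold_change)
```

Halving the template (100 ng to 50 ng) gives `ct_shift(0.5)` of +1 cycle, and an 8-fold increase gives `ct_shift(8)` of -3 cycles, matching the figures in the text. Real assays with lower efficiency will show somewhat larger shifts.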
Cycling Parameters for One-step PCR:
[0305] Reverse transcription (by MuLV): 48° C for 30 min.
[0306] AmpliTaq activation: 95° C for 10 min.
[0307] PCR: denaturation at 95° C for 15 sec and annealing/extension at 60° C for 1 min (repeated 40 times). (On the ABI 7700, the minimum holding time is 15 seconds.)
[0308] The recently introduced EZ one-step™ RT-PCR kit allows the use of UNG, as the incubation temperature for reverse transcription is 60° C thanks to the use of a thermostable reverse transcriptase. This temperature is also a better option for avoiding the primer dimers and nonspecific binding that occur at 48° C.
Operating the ABI 7700:
[0309] Verify the following before starting a run:
[0310] 1. Cycle parameters are correct for the run.
[0311] 2. Choice of spectral compensation is correct (off for singleplex, on for multiplex reactions).
[0312] 3. Choice of "Number of PCR Stages" is correct in the Analysis Options box (Analysis/Options). This may have to be manually assigned after a run if the data is absent in the amplification plot but visible in the plate view, and the X-axis of the amplification is displaying a range of 0-1 cycles.
[0313] 4. No Template Control is labeled as such (for accurate ΔRn calculations).
[0314] 5. The choice of dye component should be made correctly before data analysis.
[0315] 6. You must save the run before it starts by giving it a name (not leaving it as untitled). Also, at the end of the run, first save the data before starting to analyze.
[0316] 7. The ABI software requires extreme caution. Do not attempt to stop a run after clicking on the Run button. You will have problems and, if you need to switch the machine off and on, you have to wait for at least an hour to restart the run.
[0317] When analyzing the data, remember that the default setting for baseline is 3-15. If any CT value is < 15, the baseline should be changed accordingly (the baseline stop value should be 1-2 smaller than the smallest CT value). For a useful discussion of this matter, see the ABI Tutorial on Setting Baselines and Thresholds. (Interestingly, this issue is best discussed in the manual for TaqMan™ Human Endogenous Control Plate.)
[0318] If the results do not make sense, check the raw spectra for possible CCD camera saturation during the run. Saturation of the CCD camera may be prevented by using optical caps rather than an optical adhesive cover. It is also more likely to happen when SYBR Green I is used, when multiplexing and when a high concentration of probe is used.
Interpretation of Results:
[0319] At the end of each reaction, the recorded fluorescence intensity is used for the following calculations:
[0320] Rn+ is the Rn value of a reaction containing all components; Rn- is the Rn value of an unreacted sample (baseline value or the value detected in NTC). ΔRn is the difference between Rn+ and Rn-. It is an indicator of the magnitude of the signal generated by the PCR.
[0321] There are three illustrative methods to quantitate the amount of template:
[0322] 1. Absolute standard method: In this method, a known amount of standard such as in vitro transcribed RNA (cRNA) is used.
[0323] 2. Relative standard: Known amounts of the target nucleic acid are included in the assay design in each run.
[0324] 3. Comparative CT method: This method uses no known amount of standard but compares the relative amount of the target sequence to any of the reference values chosen, and the result is given as relative to the reference value (such as the expression level of resting lymphocytes or a standard cell line).
The Comparative CT Method (ΔΔCT) for Relative Quantitation of Gene Expression:
[0325] This method enables relative quantitation of template and increases sample throughput by eliminating the need for standard curves when looking at expression levels relative to an active reference control (normalizer). For this method to be successful, the dynamic range of both the target and reference should be similar. A sensitive method to control this is to look at how ΔCT (the difference between the two CT values of two PCRs for the same initial template amount) varies with template dilution. If the efficiencies of the two amplicons are approximately equal, the plot of log input amount versus ΔCT will have a nearly horizontal line (a slope of <0.10). This means that both PCRs perform equally efficiently across the range of initial template amounts. If the plot shows unequal efficiency, the standard curve method should be used for quantitation of gene expression. The dynamic range should be determined for both (1) minimum and maximum concentrations of the targets for which the results are accurate and (2) minimum and maximum ratios of two gene quantities for which the results are accurate. In conventional competitive RT-PCR, the dynamic range is limited to a target-to-competitor ratio of about 10:1 to 1:10 (the best accuracy is obtained for a 1:1 ratio). Real-time PCR is able to achieve a much wider dynamic range.
[0326] Running the target and endogenous control amplifications in separate tubes and using the standard curve method requires the least amount of optimization and validation. The advantage of using the comparative CT method is that the need for a standard curve is eliminated (more wells are available for samples). It also eliminates the adverse effect of any dilution errors made in creating the standard curve samples.
[0327] As long as the target and normalizer have similar dynamic ranges, the comparative CT method (ΔΔCT method) is the most practical method.
It is expected that the normalizer will have a higher expression level than the target (and thus a smaller CT value). The calculations for the quantitation start with taking the difference (ΔCT) between the CT values of the target and the normalizer:
[0328] ΔCT = CT (target) - CT (normalizer)
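The subtraction above, together with the comparative calculation that builds on it, can be sketched as follows. The sketch uses the standard 2^(-ΔΔCT) transform of the comparative CT method and assumes the target and normalizer amplify with equal, near-100% efficiency; the function names and the CT values in the note after the block are hypothetical.

```python
def delta_ct(ct_target, ct_normalizer):
    """Difference between target and normalizer CT for one sample."""
    return ct_target - ct_normalizer

def relative_expression(sample, baseline):
    """Expression of a sample relative to a chosen baseline sample.

    sample, baseline: (ct_target, ct_normalizer) tuples.
    Returns 2 ** (-ddCT), where ddCT is the sample's delta CT minus the
    baseline's delta CT.
    """
    ddct = delta_ct(*sample) - delta_ct(*baseline)
    return 2.0 ** (-ddct)
```

For a hypothetical sample with CT values (25, 18) against a baseline of (28, 18), ΔCT is 7 versus 10, ΔΔCT is -3, and `relative_expression((25, 18), (28, 18))` returns 8.0, an eight-fold increase over the baseline.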
[0329] This value is calculated for each sample to be quantitated (unless the target is expressed at a higher level than the normalizer, this should be a positive value; a negative value does no harm). One of these samples should be chosen as the reference (baseline) for each comparison to be made. The comparative ΔΔCT calculation involves finding the difference between each sample's ΔCT and the baseline's ΔCT. If the baseline value represents the minimum level of expression, the ΔΔCT values are expected to be negative (because the ΔCT for the baseline sample will be the largest, as it will have the greatest CT value). If the expression is increased in some samples and decreased in others, the ΔΔCT values will be a mixture of negative and positive ones. The last step in quantitation is to transform these values to absolute values. The formula for this is:
[0330] comparative expression level = 2^(-ΔΔCT)
[0331] For expression increased compared to the baseline level this will be something like 2^3 = 8 times increase, and for decreased expression it will be something like 2^-3 = 1/8 of the reference level. Microsoft Excel can be used to do these calculations by simply entering the CT values (there is an online ABI tutorial at http://www.appliedbiosystems.com/support/tutorials/7700amp/ on the use of spreadsheet programs to produce amplification plots; the TaqMan™ Human Endogenous Control Plate protocol also contains detailed instructions on using MS Excel for real-time PCR data analysis).
[0332] The other (absolute) quantification methods are outlined in the ABI User Bulletins
(http://docs.appliedbiosystems.com/search.taf?_UserReference=A8658327189850A13A0C598 E). Bulletins #2 and #5 are most useful for the general understanding of real-time PCR and quantification.
[0333] Recommendations on Procedures:
[0334] 1. Use positive-displacement pipettes to avoid inaccuracies in pipetting.
[0335] 2. The sensitivity of real-time PCR allows detection of the target in 2 pg of total RNA. The amount of total RNA used in the reaction should ideally be enough to give a signal by 25-30 cycles (preferably less than 100 ng). The amount used should be decreased or increased to achieve this.
[0336] 3. The optimal concentrations of the reagents are as follows:
[0337] i. Magnesium chloride concentration should be between 4 and 7 mM. It is optimized as 5.5 mM for the primers/probes designed using the Primer Express software.
[0338] ii. Concentrations of dNTPs should be balanced with the exception of dUTP (if used). Substitution of dUTP for dTTP for control of PCR product carryover requires dUTP at twice the concentration of the other dNTPs. While the optimal range for dNTPs is 500 μM to 1 mM (for one-step RT-PCR), for a typical TaqMan reaction (PCR only), 200 μM of each dNTP (400 μM of dUTP) is used.
[0339] iii. Typically 0.25 μL (1.25 U) AmpliTaq DNA Polymerase (5.0 U/μL) is added into each 50 μL reaction. This is the minimum requirement. If necessary, optimization can be done by increasing this amount in 0.25 U increments.
[0340] iv. The optimal probe concentration is 50-200 nM, and the primer concentration is 100-900 nM. Ideally, each primer pair should be optimized at three different temperatures (58, 60 and 62° C for TaqMan primers) and at each combination of three concentrations (50, 300, 900 nM). This means setting up three different sets (for three temperatures) with nine reactions in each (50/50, 50/300, 50/900, 300/50, 300/300, 300/900, 900/50, 900/300 and 900/900 nM) using a fixed amount of target template. If necessary, a second round of optimization may improve the results. Optimal performance is achieved by selecting the primer concentrations that provide the lowest CT and highest ΔRn. Similarly, the probe concentration should be optimized for 25-225 nM.
[0341] 4. If AmpliTaq Gold DNA Polymerase is being used, there has to be a 9-12 min pre-PCR heat step at 92-95° C to activate it. If AmpliTaq Gold DNA Polymerase is used, there is no need to set up the reaction on ice. A typical TaqMan reaction consists of 2 min at 50° C for UNG (see below) incubation, 10 min at 95° C for Polymerase activation, and 40 cycles of 15 sec at 95° C (denaturation) and 1 min at 60° C (annealing and extension).
A typical reverse transcription cycle (for cDNA synthesis), which should precede the TaqMan reaction if the starting material is total RNA, consists of 10 min at 25° C (primer incubation), 30 min at 48° C (reverse transcription with conventional reverse transcriptase) and 5 min at 95° C (reverse transcriptase inactivation).
[0342] 5. AmpErase uracil-N-glycosylase (UNG) is added in the reaction to prevent the reamplification of carry-over PCR products by removing any uracil incorporated into amplicons. This is why dUTP is used rather than dTTP in the PCR reaction. UNG does not function above 55° C and does not cut single-stranded DNA with terminal dU nucleotides. UNG-containing master mix should not be used with one-step RT-PCR unless rTth DNA polymerase is being used for reverse transcription and PCR (TaqMan EZ RT-PCR kit).
[0343] 6. It is necessary to include at least three No Amplification Controls (NAC) as well as three No Template Controls (NTC) in each reaction plate (to achieve a 99.7% confidence level in the definition of +/- thresholds for the target amplification, six replicates of NTCs must be run). The NAC contains sample but no enzyme. It is necessary to rule out the presence of fluorescent contaminants in the sample or in the heat block of the thermal cycler (these would cause false positives). If the absolute fluorescence of the NAC is greater than that of the NTC after PCR, fluorescent contaminants may be present in the sample or in the heating block of the thermal cycler.
[0344] 7. The dynamic range of a primer/probe system and its normalizer should be examined if the ΔΔCT method is going to be used for relative quantitation. This is done by running (in triplicate) reactions of five RNA concentrations (for example, 0, 80 pg/μL, 400 pg/μL, 2 ng/μL and 50 ng/μL). The resulting plot of log of the initial amount vs CT values (standard curve) should be a (near) straight line for both the target and normalizer real-time RT-PCRs for the same range of total RNA concentrations.
[0345] 8. The passive reference is a dye (ROX) included in the reaction (present in the TaqMan universal PCR master mix). It does not participate in the 5' nuclease reaction. It provides an internal reference for background fluorescence emission. This is used to normalize the reporter-dye signal. This normalization is for non-PCR-related fluorescence fluctuations occurring well-to-well (concentration or volume differences) or over time, and is different from the normalization for the amount of cDNA or efficiency of the PCR. Normalization is achieved by dividing the emission intensity of the reporter dye by the emission intensity of the passive reference. This gives the ratio defined as Rn.
[0346] 9.
If multiplexing is done, the more abundant of the targets will use up all the ingredients of the reaction before the other target gets a chance to amplify. To avoid this, the primer concentrations for the more abundant target should be limited. [0347] 10. TaqMan Universal PCR master mix should be stored at 2 to 8° C (not at -20° C). [0348] 11. The GAPDH probe supplied with the TaqMan Gold RT-PCR kit is labeled with a JOE reporter dye; the same probe provided within the Pre-Developed TaqMan™ Assay Reagents (PDAR) kit is labeled with VIC. Primers for these human GAPDH assays are designed not to amplify genomic DNA. [0349] 12. The carryover prevention enzyme, AmpErase UNG, cannot be used with one-step RT-PCR, which requires incubation at 48° C, but may be used with the EZ RT-PCR kit.
[0350] 13. One-step RT-PCR can only be used for singleplex reactions, and the only choice for reverse transcription is the downstream primer (not random hexamers or oligo-dT). [0351] 14. It is ideal to run duplicates to control pipetting errors but this inevitably increases the cost. [0352] 15. If multiplexing, the spectral compensation option (in Advanced Options) should be checked before the run. [0353] 16. Normalization for the fluorescent fluctuation by using a passive reference (ROX) in the reaction and for the amount of cDNA/PCR efficiency by using an endogenous control (such as GAPDH, active reference) are different processes. [0354] 17. The ABI 7700 can be used not only for quantitative RT-PCR but also for end-point PCR. The latter includes presence/absence assays or allelic discrimination assays (such as SNP typing). [0355] 18. Shifting Rn values during the early cycles (cycle 0-5) of PCR indicate initial disequilibrium of the reaction components and do not affect the final results as long as the lower value of the baseline range is reset. [0356] 19. If an abnormal amplification plot has been noted (CT value <15 cycles with amplification signal detected in early cycles), the upper value of the baseline range should be lowered and the samples should be diluted to increase the CT value (a high CT value may also be due to contamination). [0357] 20. A small ΔRn value (or greater than expected CT value) indicates either poor PCR efficiency or low copy number of the target. [0358] 21. A standard deviation >0.16 for the CT value indicates inaccurate pipetting. [0359] 22. The SYBR Green entry in the Pure Dye Setup should be abbreviated as "SYBR" in capitals. Any other abbreviation or lower case letters will cause problems. [0360] 23. The SDS software for the ABI 7700 has conflicts with the Macintosh
Operating System version 8.1. The data should not be analyzed on such computers. [0361] 24. The ABI 7700 should not be deactivated for extended periods of time. If it has ever been shut down, it should be allowed to warm up for at least one hour before a run. Leaving the instrument on at all times is recommended and is beneficial for the laser. If the machine has been switched on just before a run, an error box stating a firmware version conflict may appear. If this happens, choose the "Auto Download" option. [0362] 25. The ABI 7700 is only one of the real-time PCR systems available; others include systems from BioRad, Cepheid, Corbett Research, Roche and Stratagene.
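By way of illustration only, the Rn normalization of note 8 and the standard-curve check of note 7 may be sketched as follows. The function names and the example values are hypothetical and are not taken from the SDS software or from any data of this specification; the zero-concentration reaction is assumed to be excluded from the fit because its logarithm is undefined.

```python
import math


def normalized_reporter(reporter_emission, passive_reference_emission):
    """Rn: reporter-dye emission divided by passive-reference (ROX) emission."""
    if passive_reference_emission <= 0:
        raise ValueError("passive reference emission must be positive")
    return reporter_emission / passive_reference_emission


def standard_curve(concentrations, ct_values):
    """Least-squares line of CT against log10(initial amount).

    Returns (slope, intercept, r_squared). An r_squared close to 1 means
    the plot is a near straight line over the tested range.
    Concentrations must be positive (the 0 control is left out).
    """
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ct_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

In such a sketch the same fit would be run once for the target and once for the normalizer, and the two slopes compared over the same range of total RNA concentrations.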
EXAMPLE 2 IDENTIFICATION OF DIAGNOSTIC MARKER GENES [0363] Differences in gene expression between animals with and without clinical evidence of EPM were analyzed using the empirical Bayes approach of Lonnstedt and Speed (Lonnstedt and Speed, 2002, Statistica Sinica 12: 31-46). A general linear model was fitted to each gene, with a term for clinical status (with or without clinical evidence of EPM). Genes were ranked according to their posterior odds of differential expression between clinical status groups. Only those genes with statistically significant changes (assessed using the t statistic based on the empirical Bayes shrunken standard deviations) were recorded. Strong control of the type I error rate was maintained, using Holm's adjustment to the p values (Holm, S. 1979, Scandinavian Journal of Statistics 6: 65-70). Genes which showed statistically significant differences between clinically positive horses (i.e., with EPM symptoms) and clinically negative horses (i.e., without clinical symptoms) were tabulated for each day post infection.
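For illustration, Holm's step-down adjustment of a set of p values may be sketched as below. This is a minimal sketch of the general procedure only; the function name is hypothetical and the code is not the software used to generate the results reported herein.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p values, in the input order.

    The smallest p value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each value
    is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

A gene would then be recorded as significant when its adjusted p value falls below the chosen family-wise error level.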
EXAMPLE 3 DEMONSTRATION OF DIAGNOSTIC POTENTIAL [0364] In addition, the diagnostic potential of the entire set of genes was assessed using discriminant analysis (Venables and Ripley, 2002, Modern Applied Statistics in S, Springer) on the principal component scores (Jolliffe, I.T., Principal Component Analysis, Springer-Verlag, 1986) calculated from gene expression. The entire process was cross-validated. Sensitivity and specificity were calculated for a uniform prior. This may be interpreted as a form of shrinkage regularization, where the estimates are shrunken to lie in a reduced space. [0365] Cross-validated discriminant function scores were used to estimate a receiver operating characteristic (ROC) curve. The ROC curve was calculated by moving a critical threshold along the axis of the discriminant function scores. Both raw empirical ROCs and smoothed ROCs were calculated, the latter using Lloyd's method (Lloyd, C.J. 1998, Journal of the American Statistical Association 93: 1356-1364). Curves were calculated for the comparison of clinically negative and clinically positive animals. Separate curves were calculated, using gene expression at each day post-infection. The area under the ROC curve was calculated by the trapezoidal rule, applied to both the empirical ROC and the smoothed ROC. [0366] The ROC curve provides a useful summary of the diagnostic potential of an assay. A perfect diagnostic assay has a ROC curve which is a horizontal line passing through the point with sensitivity and specificity both equal to one. The area under the ROC curve for such a perfect diagnostic is 1. A useless diagnostic assay has a ROC curve which is given by a 45 degree line through the origin. The area for such an uninformative diagnostic is 0.5.
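The construction of an empirical ROC curve by moving a threshold along the score axis, and the calculation of its area by the trapezoidal rule, may be sketched as follows. The function names and the example scores are hypothetical; the sketch covers only the raw empirical curve and does not implement the smoothing of Lloyd's method.

```python
def empirical_roc(scores, labels):
    """Return ROC points (false positive rate, true positive rate).

    labels are True for clinically positive animals. A sample is called
    positive when its score is at or above the threshold, and the
    threshold is moved from the highest score to the lowest.
    """
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if l and s >= threshold)
        fp = sum(1 for s, l in zip(scores, labels) if not l and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With perfectly separated scores the area is 1, and with scores that carry no information about clinical status it is close to 0.5, matching the two limiting cases described above.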
[0367] Sensitivity, selectivity and the areas under the ROC curve are shown in Table 5, for samples taken 2, 4, 7, 9, 11, 14, 17, 21, 24 and 28 days after infection. [0368] Although it appears that the EPM marker genes have little diagnostic ability at 2 and 4 days post-infection, it is clearly apparent that they have some diagnostic ability from day 9 onwards. Clinical signs first appeared in the experimental animals in the study on Day 9 and persisted through to Day 28. Most importantly, there is a correlation between the area under the ROC curve and neurological clinical signs. One of the principal objectives of the present invention is to disclose the method for a practical diagnostic test for EPM that distinguishes between exposure to the causative organism and active disease or aberrant host response. Active disease, or an aberrant host response to parasitic infection, is required for clinical signs to be evident in EPM. These results demonstrate that specific and sensitive changes can be measured in active disease in animals suffering from pathogenic protozoal infection. [0369] ROC curves calculated in this way, based on shrinkage estimates over the entire set of genes on the chip, are conservative; that is, they tend to underestimate the diagnostic potential. Better diagnostic performance should be obtained in operational diagnostics, based on a selected subset of the genes. [0370] As an illustration, Table 6 below shows the sensitivity, selectivity and area under the ROC curve for an analysis based on 31 genes, selected because of their large differential expression following infection.
[0371] The sensitivities, specificities and ROC areas are greater than those for the shrinkage based analysis. [0372] The ROC curves for the analysis based on 31 genes for days 2, 4, 7, 9, 11, 14, 17, 21, 24 and 28 are shown in Figures 1-10, respectively. The diagnostic capability is very high.
EXAMPLE 4
PRIORITY RANKING OF GENES [0373] Genes were ranked according to an empirical Bayes approach (Lonnstedt and Speed, 2002, Statistica Sinica 12: 31-46), based on a comparison of clinically positive and clinically negative animals at days 20 and above. The empirical Bayes approach was used to provide a shrinkage estimator of the within-groups variance for each gene.
[0374] Individual p values were based on a t test using this shrinkage estimator. The p values of the t test were adjusted using Holm's method to maintain strong control of the family-wise type I error rate. [0375] The genes listed in Table 7 are ranked in order of their t statistic, which may be interpreted as a signal-to-noise ratio. The tabulation also displays the log2 fold change (M value), and the adjusted p values. Genes with a negative t value (and hence a negative M value) are down-regulated. [0376] Genes with positive t and M values are up-regulated.
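The relationship between the M value, the t statistic and the direction of regulation may be illustrated with the simplified sketch below. It is an assumption-laden stand-in: an ordinary pooled within-groups variance is used in place of the empirical Bayes shrinkage estimator described above, and the function names and input values are hypothetical.

```python
import math
from statistics import mean, variance


def m_value(log2_positive, log2_negative):
    """Log2 fold change: mean log2 expression in clinically positive
    animals minus mean log2 expression in clinically negative animals."""
    return mean(log2_positive) - mean(log2_negative)


def pooled_t(log2_positive, log2_negative):
    """Two-sample t statistic with an ordinary pooled variance.

    NOTE: the specification uses a shrinkage (empirical Bayes) variance;
    the plain pooled variance here is a simplification for illustration.
    """
    n1, n2 = len(log2_positive), len(log2_negative)
    pooled_var = (
        (n1 - 1) * variance(log2_positive) + (n2 - 1) * variance(log2_negative)
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return m_value(log2_positive, log2_negative) / standard_error


def direction(t_statistic):
    """Classify a gene from the sign of its t statistic."""
    if t_statistic > 0:
        return "up-regulated"
    if t_statistic < 0:
        return "down-regulated"
    return "unchanged"
```

Because the t statistic is the M value divided by a positive standard error, the two always share the same sign, which is why a negative t value implies a negative M value.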
EXAMPLE 5 MINIMALLY PREDICTIVE GENE SETS [0377] Although about 30 genes have been identified as having diagnostic potential, far fewer are generally required for acceptable diagnostic performance. [0378] Table 8 shows the cross-validated classification success, sensitivity and specificity obtained from a linear discriminant analysis, based on two genes selected from the set of potential diagnostic genes. The pairs presented are those producing the highest prediction success; many other pairs of genes produce acceptable classification success. The identification of alternate pairs of genes would be readily apparent to those skilled in the art. Techniques for identifying pairs include (but are not limited to) forward variable selection (Venables W.N. and Ripley B.D. Modern Applied Statistics in S 4th Edition 2002. Springer), best subsets selection, backwards elimination (Venables W.N. and Ripley B.D., 2002, supra), stepwise selection
(Venables W.N. and Ripley B.D., 2002, supra) and stochastic variable elimination (Figueiredo M.A. Adaptive Sparseness for Supervised Learning). [0379] Table 9 shows the cross-validated classification success obtained from a linear discriminant analysis based on three genes selected from the diagnostic set. Only twenty sets of three genes are presented. It will be readily apparent to those of skill in the art that other suitable diagnostic selections based on three EPM marker genes can be made. [0380] Table 10 shows the cross-validated classification success obtained from a linear discriminant analysis based on four genes selected from the diagnostic set. Only twenty sets of four genes are presented. It will be readily apparent to practitioners in the art that other suitable diagnostic selections based on four EPM marker genes can be made. [0381] Table 11 shows the cross-validated classification success obtained from a linear discriminant analysis based on five genes selected from the diagnostic set. Only twenty sets of five genes are presented. It will be readily apparent to practitioners in the art that other suitable diagnostic selections based on five EPM marker genes can be made.
[0382] Table 12 shows the cross-validated classification success obtained from a linear discriminant analysis based on six genes selected from the diagnostic set. Only twenty sets of six genes are presented. It will be readily apparent to practitioners in the art that other suitable diagnostic selections based on six EPM marker genes can be made. [0383] Table 13 shows the cross-validated classification success obtained from a linear discriminant analysis based on seven genes selected from the diagnostic set. Only twenty sets of seven genes are presented. It will be readily apparent to practitioners in the art that other suitable diagnostic selections based on seven EPM marker genes can be made. [0384] Table 14 shows the cross-validated classification success obtained from a linear discriminant analysis based on eight genes selected from the diagnostic set. Only twenty sets of eight genes are presented. It will be readily apparent to practitioners in the art that other suitable diagnostic selections based on eight EPM marker genes can be made. [0385] Table 15 shows the cross-validated classification success obtained from a linear discriminant analysis based on nine genes selected from the diagnostic set. Only twenty sets of nine genes are presented. It will be readily apparent to practitioners in the art that other suitable diagnostic selections based on nine EPM marker genes can be made. [0386] Table 16 shows the cross-validated classification success obtained from a linear discriminant analysis based on ten genes selected from the diagnostic set. Only twenty sets of ten genes are presented. It will be readily apparent to practitioners in the art that other suitable diagnostic selections based on ten EPM marker genes can be made. [0387] Table 17 shows the cross-validated classification success obtained from a linear discriminant analysis based on twenty genes selected from the diagnostic set. Only twenty sets of twenty genes are presented. 
It will be readily apparent to practitioners in the art that other suitable diagnostic selections based on twenty EPM marker genes can be made. EXAMPLE 6
DEMONSTRATION OF SPECIFICITY [0388] The 30 EPM diagnostic genes were used as a training set against a gene expression database of over 850 GeneChips®. Gene expression results in the database were obtained from samples from horses with various diseases and conditions, including: induced EPM in the acute stage of disease, induced EPM in the chronic stage of disease, clinical cases of EPM, herpes virus infection, degenerative osteoarthritis, stress, Rhodococcus infection, endotoxaemia, laminitis, gastric ulcer syndrome, animals in athletic training and clinically normal animals.
[0389] An EPM index score was calculated for each GeneChip®, using the 30 genes in the training set. The score was calculated from a regularized discriminant function, so that large values would be associated with a high probability of EPM, and the variance of the score should be approximately 1. GeneChips® were ranked on this score, from the largest to the smallest. [0390] Specificity was investigated by varying a threshold value for a positive diagnosis. At each value of the threshold, specificity was defined as the proportion of positive results (i.e. GeneChip® index score greater than the threshold) which were true positives. A threshold value of two (i.e. two standard deviations) was adopted. [0391] Only nine animals from the database had conditions that were not related to acute, chronic or clinical EPM and were two standard deviations above zero on the discriminant function. A further 120 animals with acute EPM were two standard deviations above zero on the discriminant function. No animals with chronic or clinical EPM were two standard deviations above zero on the discriminant function. Using this method and a gene signature of 30 genes, a specificity of 93% for acute EPM was obtained from a population sample size of over 850.
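The 93% figure follows directly from the counts given above and the definition in paragraph [0390], as the short sketch below shows. The function name is hypothetical; the two counts (120 and 9) are those stated in paragraph [0391].

```python
def proportion_true_positives(true_positive_count, false_positive_count):
    """Proportion of above-threshold results that are true positives,
    i.e. specificity as defined in paragraph [0390]."""
    total_positive = true_positive_count + false_positive_count
    if total_positive == 0:
        raise ValueError("no results above the threshold")
    return true_positive_count / total_positive


# 120 acute-EPM animals and 9 unrelated animals exceeded the threshold of two.
acute_epm_specificity = proportion_true_positives(120, 9)
```

Here 120 / (120 + 9) = 120 / 129, which is approximately 0.930, consistent with the reported value.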
EXAMPLE 7 GENE ONTOLOGY [0392] Gene sequences were compared against the GenBank database using the BLAST algorithm (Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215:403-410), and gene homology and gene ontology searches were performed in order to group genes based on function, metabolic processes or cellular component. Table 18 lists and groups the genes based on these criteria. See also Table 1, which contains sequence information for each gene. [0393] The disclosure of every patent, patent application, and publication cited herein is hereby incorporated herein by reference in its entirety. [0394] The citation of any reference herein should not be construed as an admission that such reference is available as "Prior Art" to the instant application. [0395] Throughout the specification the aim has been to describe the preferred embodiments of the invention without limiting the invention to any one embodiment or specific collection of features. Those of skill in the art will therefore appreciate that, in light of the instant disclosure, various modifications and changes can be made in the particular embodiments exemplified without departing from the scope of the present invention. All such modifications and changes are intended to be included within the scope of the appended claims.
TABLE 1
Gene Name | GenBank Homology | DNA SEQUENCE / DEDUCED AMINO ACID SEQUENCE | SEQUENCE IDENTIFIER
81 S V A D L L F V I T L P F W A V D A V A
241 TCAGTGGCCGACCTCCTCTTTGTCATCACGCTTCCCTTCTGGGCAGTTGATGCCGTGGCA
101 N W Y F G N F L C K A V H V I Y T V N L
301 AACTGGTACTTTGGGAACTTCCTATGCAAGGCAGTCCATGTCATCTACACAGTCAACCTC
121 Y S S V L I L A F I S L D R Y L A I V H
361 TACAGCAGTGTCCTCATCCTGGCCTTCATCAGTCTGGACCGCTACCTGGCCATCGTCCAC
141 A T N S Q R P R K L L A E K V V Y V G V
421 GCCACCAACAGTCAGAGGCCAAGGAAGCTGTTGGCTGAAAAGGTGGTCTATGXTGGCGTC
161 W I P A L L L T I P D F I F A N V S E A
481 TGGATCCCTGCCCTCCTGCTGACTATTCCCGACTTCATCTTTGCCAACGTCAGTGAGGCA
181 D D R Y I C D R F Y P N D L W V V V F Q
541 GATGACAGATATATCTGTGACCGCTTCTACCCCAATGACTTGTGGGTGGTTGTGTTCCAG
201 F Q H I M V G L I L P G I V I L S C Y C
601 TTTCAGCACATCATGGTTGGCCTTATCCTGCCTGGTATTGTCATCCTGTCCTGCTATTGC
221 I I I S K L S H S K G H Q K R K A L K T
661 ATTATCATCTCCAAGCTGTCACACTCCAAGGGCCACCAGAAGCGCAAGGCCCTCAAGACC
241 T V I L I L A F F A C W L P Y Y I G I S
721 ACAGTCATCCTCATCCTGGCTTTCTTCGCCTGTTGGCTGCCTTACTACATTGGGATCAGC
261 I D S F I L L E I I K Q G C E F E N T V
781 ATCGACTCCTTCATCCTCCTGGAAATCATCAAGCAAGGGTGTGAGTTTGAGAACACTGTG
281 H K W I S I T E A L A F F H C C L N P I
841 CACAAGTGGATTTCCATCACCGAGGCCCTAGCTTTCTTCCACTGTTGTCTGAACCCCATC
301 L Y A F L G A K F K T S A Q H A L T S V
901 CTCTATGCTTTCCTTGGAGCCAAATTTAAAACCTCTGCCCAGCATGCACTCACCTCTGTG
321 S R G S S L K I L S K G K R G G H S S V
961 AGCAGAGGCTCCAGCCTCAAGATCCTCTCCAAGGGCAAGCGGGGTGGACATTCTTCTGTT
341 S T E S E S S S F H S S -
1021 TCAACTGAGTCTGAGTCTTCAAGTTTTCACTCCAGCTAA
61567 (165) | Leucine aminopeptidase 3 / Cytoplasm, leucyl aminopeptidase activity. | SEQ ID NO: 3
1 GTCTGGCCGTGAGACGTTTCGGGAGCCGGAGTCTCTCCACCGCAGACATGACGAAGGGCC
61 TTGTTTTAGGAATCTATTCCAAAGAAAAAGAAGATGATGTGCCACAGTTCACAAGTGCAG
121 GAGAGAATTTTGATAAATTGTTAGCTGGAAAGCTGAGAGAGACTTTGAACATATCTGGAC
181 CACCTCTGAAGGCAGGGAAGACTCGAACCTTTTATGGTCTGCATCAGGACTTCCCCAGCG
Gene Name GenBank "DNA1SEQUENCE / DEDUCED AMINO ACID1- SEQUENCE SEQUENCE Homology , I it* *A IDENTIFIER: 241 TGGTGCTAGTTGGCCTCGGCAAAAAGGCAGCTGGAATCGACGAACAGGAAAACTGGCATG 301 AAGGCAAAGAAAACATCAGAGCTGCTGTTGCAGCGGGGTGCAGGCAGATTCAAGACCTGG 361 AGCTCTCGTCTGTGGAGGTGGATCCCTGTGGAGACGCTCAGGCTGCTGCGGAGGGAGCGG 421 TGCTTGGTCTCTATGAATACGATGACCTAAAGCAAAAAAAGAAGATGGCTGTGTCGGCAA 481 AGCTCTATGGAAGTGGGGATCAGGAGGCCTGGCAGAAAGGAGTCCTGTTTGCTTCTGGGC 541 AGAACTTGGCACGCCAATTGATGGAGACGCCAGCCAATGAGATGACGCCAACCAGATTTG 601 CCGAAATTATTGAGAAGAATCTCAAAAGTGCTAGTAGTAAAACCGAGGTCCATATCAGAC 661 CCAAGTCTTGGATTGAGGAACAGGCAATGGGATCATTCCTCAGTGTGGCCAAAGGATCTG 721 ACGAGCCCCCAGTCTTCTTGGAAATTCACTACAAAGGCAGCCCCAATGCAAACGAACCAC 781 CCCTGGTGTTTGTTGGGAAAGGAATTACCTTTGACAGTGGTGGTATCTCCATCAAGGCTT 841 CTGCAAATATGGACCTCATGAGGGCTGACATGGGAGGAGCTGCAACTATATGCTCAGCCA 901 TCGTGTCTGCTGCAAAGCTTAATTTGCCCATTAATATTATAGG CTGGCCCCTCTTTGTG 961 AAAATATGCCCAGCGGCAAGGCCAACAAGCCGGGGGATGTTGTTAGAGCCAAAAACGGGA 1021 AGACCATCCAGGTTGATAACACTGATGCTGAGGGGAGGCTCATACTGGCΓGATGCGCΓCT 1081 GTTACGCACACACGTΓTAACCCGAAGGTCATCCTCAATGCCGCCACCTTAACAGGTGCCA 1141 TGGATGTAGCΓΓΓGGGATCAGGTGCCACTGGGGTCΓTΓACCAATTCATCCTGGCTCTGGA 1201 ACAAACΓCΓTCGAGGCCAGCATTGAAACAGGGGACCGTGTCTGGAGGATGCCTCTCΓΓCG 1261 ACATTATACAAGACAGGTTGTAGATTGCCAGCTTGCTGATGTTAACAACATTGGAAAAT 1321 ACAGATCTGCAGGAGCATGTACAGCΓGCAGCATTCCTGAAAGAATTCGTAACTCATCCTA 1381 AGTGGGCACATTTAGACATAGCAGGCGTGA GACCAACAAAGATGAAGTTCCCTATCTAC 1441 GGAAAGGCATGACTGGGAGGCCCACAAGGACTCTCATTGAGTTCTTACTTCGTTTCAGTC 1501 AAGACAATGCTTAGTTCAGATACTCAAAAATGTCTTCACTCTGTCTTAAATTGGACAGTT 1561 GAACTTAAAAGGTTTTTGAATAAATGGATGAAAATCTTTTAACGGAGACAAAGGATGGTA 1621 TTTAAAAATGTAGAACACAATGAAATTTGTATGCCTTGATTTΑTRRTTCATTTCACACAA 1681 AGAΓΓΓATAAAGGTAAAGTΓAATATCΓΓACTTGATAAGGATTTTTAAGATACTCTATAAA 1741 TGATTAAAATTTTTAGAACTTCCTAATCACTTTTCAGAGTATATGTTTTTCATTGAGAAG 1801 CAAAATTGTAACTCAGATTTGTGATGCTAGGAACATGAGCAAACTGAAAATTACTATGCA 1861 CTTGΓCAGAAACAATAAATGCAACTΓGT GΓGCTCAAAAAAAAAAAAAAAAAAAAAAAAA 1921 AAAAAAAAAAAAAAAAAA
1 L A V R R F G S R S L S T A D M T K G L SEQ ID NO: 4 3 CTGGCCGTGAGACGTTTCGGGAGCCGGAGTCTCTCCACCGCAGACATGACGAAGGGCCTT 21 V L G I Y S K E K E D D V P Q F T S A G 63 GTTTTAGGAATCTATTCCAAAGAAAAAGAAGATGATGTGCCACAGTTCACAAGTGCAGGA 41 E K F D K L L A G K L R E T L N I S G P 123 GAGAATTTTGATAAATTGTTAGCTGGAAAGCTGAGAGAGACTTTGAACATATCTGGACCA 61 P L K A G K T R T F Y G L H Q D F P S V 183 CCTCTGAAGGCAGGGAAGACTCGAACCTTTTATGGTCTGCATCAGGACTTCCCCAGCGTG 81 V L V G k G K K A A G I D E Q E N H E 243 GTGCTAGTTGGCCTCGGCAAAAAGGCAGCTGGAATCGACGAACAGGAAAACTGGCATGAA 101 G K E N I R A A V A A G C R Q I Q D L E 303 GGCAAAGAAAACATCAGAGCTGCTGTTGCAGCGGGGTGCAGGCAGATTCAAGACCTGGAG
61 M C Q Q S M Q K S S L E F H K A N E C Q
181 ATGTGTCAGCAGAGCATGCAGAAGTCCTCGCTGGAGTTTCATAAGGCCAATGAGTGCCAG
81 E R P V E C K F C K L D M Q L S K L E L
241 GAGCGCCCTGTTGAGTGTAAGTTCTGCAAACTGGACATGCAGCTCAGCAAGCTGGAGCTC
101 H E S Y C G S R T E L C Q G C G Q F I M
301 CACGAGTCCTACTGTGGCAGCCGGACAGAGCTCTGCCAAGGCTGTGGCCAGTTCATCATG
121 H R M L A Q H R D V C R S E Q A Q L G K
361 CACCGCATGCTCGCCCAGCACAGAGATGTCTGTCGGAGTGAACAGGCCCAGCTCGGGAAA
141 G E R I S A P E R E I Y C H Y C N Q M I
421 GGGGAAAGAATTTCAGCTCCTGAAAGGGAAATCTACTGTCATTATTGCAACCAAATGATT
161 P E N K Y F H H M G K C C P D S E F K K
481 CCAGAAAATAAGTATTTCCACCATATGGGTAAATGTTGTCCAGACTCAGAGTTTAAGAAA
181 H F P V G N P E I L P S S L P S Q A A E
541 CACTTTCCTGTTGGAAATCCAGAAATTCTTCCTTCATCTCTTCCAAGTCAAGCTGCTGAA
201 N Q T S T M E K D V R P K T R S I N R F
601 AATCAAACTTCCACGATGGAGAAAGATGTTCGTCCAAAGACAAGAAGTATAAACAGATTT
221 P L H S E S S S K K A P R S K N K T L D
661 CCTCTTCATTCTGAAAGTTCATCAAAGAAAGCACCAAGAAGCAAAAACAAAACCTTGGAT
241 P L L M S E P K P R T S S P R G D K A A
721 CCACTTTTGATGTCAGAGCCCAAGCCCAGGACCAGCTCCCCTAGAGGAGATAAAGCAGCC
261 Y D I L R R C S Q C G I L L P L P I L N
781 TATGACATTCTGAGGAGATGTTCTCAGTGTGGCATCCTGCTTCCCCTGCCGATCCTAAAT
281 Q H Q E K C R W L A S S K R K T S E K F
841 CAACATCAGGAGAAATGCCGGTGGTTAGCTTCATCAAAAAGGAAAACAAGTGAGAAATTT
301 Q L D L E K E R Y Y K F K R F H F
901 CAGCTAGATTTGGAAAAGGAAAGGTACTACAAATTCAAAAGATTTCACTTT
BM735363 (592) Galectin 3 binding protein / 1 CGCTTCGTGGCCCACGTCGCTGATTTCAAGGGCTCGAAGGCCGTGATCCCCAGCGCCCTG SEQ ID NO: 7 Membrane, 61 GGCACCAACAGTTCCAGGAGCGCCTCTCTCTTTCCCTGCCAGGCAGGGTCCTTCAGTGGC extracellular 121 TTCCAGGTGGTCATCCGCCCCTTCTACCTGACCAACCCCTCGGCGGAGGACTAGACGGGA space and 181 GGCTGGGTGAGCCGAGGGGGCGAGGGACAGGAGCACAGAGAAGCGAGGCGCCTCCCAGGA membrane, signal 241 TGCCCCCCGCCCCCAGCTGAGCCTCTCGCATCCTTCCTTCCTCTGCATGCACCTCCAGCA transduction, 301 GCTGCCACCAGATGTCCCCCCTGCTTCCACTGAGTGCTCTGAGCTTGGAGAAATTACTGG cellular defence 361 AAGGTTTCACCTAGTGCTCACCAGGGTGGTGAGAATTCCTGTTCTCCCACTGCCTGGCTG
Gene Name GenBank DNA SEQUENCE / DEDUCED AMINO ACID SEQUENCE SEQUENCE Homology "1lι*i* ,,, IDENTIFIER: response, cell 421 GTTTGTCGGGAAGCAACAGGCAGAGCATGACACTGTGAGATTGTCCACACTGCCTGTCTC adhesion, 481 TGTCTCAGTCCTCTGTTGTCCTTTGTCTGAGATCATTAAAATTGCATCGTGGTTCCC scavenger receptor activity.35546 (677) Tryptophanyl- RNA synthetase / 1 AAGAAACCCTTAGCTGAATGCAGGGTGGGGAGAACGAAAGACAAAAGCATCTTTTTTCAG SEQ ID NO : 8 Cytoplasm, 61 AAGGGAAACTGAAAGAAAGAGGGGAAGAGTATTAAAGACCATTACTGGCTGGGCAGGGCA Typtophanyl t-RNA 121 CTCTCAGCAGCTCAACTGCCCAGCGTGACCAGTGGCCACCTCTGCAGTGTCTTCCACAAC aminoacylation, 181 CTGGTCTTGACTCGTCTGCTGAACAAATCCTCTGACCTCAGGCCGGCTGTGAACGTAGTT tryptophan tRNA 241 CCTGAGAGATAGCAAACATGCCCAACAGTGAGCCCGCATCTCTGCTGGAGCTGTTCAACA ligase activity. 301 GCATCGCCACACAAGGGGAGCTCGTAAGGTCCCTCAAAGCGGGAAATGCGTCAAAGGATG 361 AAATTGATTCTGCAGTAAAGATGTTGGTGTCATTAAAAATGAGCTACAAAGCTGCCGCGG 421 GGGAGGATTACAAGGCTGACTGTCCTCCAGGGAACCCAGCACCTACCAGTAATCATGGCC 481 CAGATGCCACAGAAGCTGAAGAGGATTTTGTGGACCCATGGACAGTACAGACAAGCAGTG 541 CAAAAGGCATAGACTACGATAAGCTCATTGTTCGGTTTGGAAGTAGTAAAATTGACAAAG 601 AGCTAATAAACCGAATAGAGAGAGCCACCGGCCAAAGACCACACCACTTCCTGCGCAGAG 661 GCATCTTCTTCTCACACAGAGATATGAATCAGGTTCTTGATGCCTATGAAAATAAGAAGC 721 CATTTTATCTGTACACGGGCCGGGGCCCCTCTTCTGAAGCAATGCATGTAGGTCACCTCA 781 TTCCATTTATTTTCACAAAGTGGCTCCAGGATGTATTTAACGTGCCCTTGGTCATCCAGA 841 TGACGGATGACGAGAAGTATCTGTGGAAGGACCTGACCCTGGACCAGGCCTATAGCTATG 901 CTGTGGAGAATGCCAAGGACATCATCGCCTGTGGCTTTGACATCAACAAGACTTTCATAT 961 TCTCTGACCTGGACTACATGGGGATGAGCTCAGGTTTCTACAAAAATGTGGTGAAGATTC 1021 AAAAGCATGTTACCTTCAACCAAGTGAAAGGCATTTTCGGCTTCACTGACAGCGACTGCA 1081 TTGGGAAGATCAGTTTTCCTGCCATCCAGGCTGCTCCCTCCTTCAGCAACTCATTCCCAC 1141 AGATCTTCCGAGACAGGACGGATATCCAGTGCCTTATCCCATGTGCCATTGACCAGGATC 1201 CTTACTTTAGAATGACAAGGGACGTCGCCCCCAGGATCGGCTATCCTAAACCAGCCCTGC 1261 TGCACTCCACCTTCTTCCCAGCCCTGCAGGGCGCCCAGACCAAAATGAGTGCCAGCGACC 1321 CCAACTCCTCCATCTTCCTCACCGACACGGCCAAGCAGATCAAAACCAAGGTCAATAAGC 1381 ATGCGTTTTCTGGAGGGAGAGACACCATCGAGGAGCACAGGCAGTTTGGGGGCAACTGTG 1441 
ATGTGGACGTGTCTTTCATGTACCTGACCTTCTTCCTCGAGGACGACGACAAGCTCGAGC 1501 AGATCAGGAAGGATTACACCAGCGGAGCCATGCTCACCGGTGAGCTCAAGAAGGCACTCA 1561 TAGAGGTTCTGCAGCCCTTGATCGCAGAGCACCAGGCCCGGCGCAAGGAGGTCACGGATG 1621 AGATAGTGAAAGAGTTCATGACTCCCCGGAAGCTGTCCTTCGACTTTCAGTAGCACTCGT 1681 TTTACATATGCTTATAAAAGAAGTGATGTATCAGTAATGTATCAATAATCCCAGCCCAGT 1741 CAAAGCACCGCCACCTGTAGGCTTCTGTCTCATGGTAATTACTGGGCCTGGCCTCTGTAA 1801 GCCTGTGTATGTTATCAATACTGTTTCTTCCTGTGAGTTCCATTATTTCTATCTCTTATG 1861 GGCAAAGCATTGTGGGTAATTGGTGCTGGCTAACATTGCATGGTCGGATAGAGAAGTCCA 1921 GCTGTGAGTCTCTCCCCAAAGCAGCCCCACAGTGGAGCCTTTGGCTGGAAGTCCATGGGC 1981 CACCCTGTTCTTGTCCATGGAGGACTCCGAGGGTTCCAAGTATACTCTTAAGACCCACTC 2041 TGTTTAAAAATATATATTCTATGTATGCGTATATGGAATTGAAATGTCATTATTGTAACC 2101 TAGAAAGTGCTTTGAAATATTGATGTGGGGAGGTTTATTGAGCACAAGATGTATTTCAGC 2161 CCATGCCCCCTCCCAAAAAGAAATTGATAAGTAAAAGCTTCGTTATACATTTGACTAAGA 2221 AATCACCCAGCTTTAAAGCTGCTTTTAACAATGAAGATTGAACAGAGTTCAGCAATTTTG 2281 ATTAAATTAAGACTTGGGGGTGAAACTTTCCAGTTTACTGAACTCCAGACCATGCATGTA 2341 GTCCACTCCAGAAATCATGCTCGCTTCCCTTGGCACACCAGTGTTCTCCTGCCAAATGAC 2401 CCTAGACCCTCTGTCCTGCAGAGTCAGGGTGGCTTTTCCCCTGACTGTGTCCGATGCCAA 2461 GGAGTCCTGGCCTCCGCAGATGCTTCATTTTGACCCTTGGCTGCAGTGGAAGTCAGCACA
Gene Name GenBank DNA- SEQUENCE- / DEDUCEES*ΑMINO ACID SEQUENCE SEQUENCE Homology *Sfe>. IDENTIFIER: 781 AACCAAGTGAAAGGCATTTTCGGCTTCACTGACAGCGACTGCATTGGGAAGATCAGTTTT 281 P A I Q A A P S F S N S F P Q I F R D R 841 CCXGCCATCCAGGCTGCTCCCTCCTTCAGCAACTCATTCCCACAGATCTTCCGAGACAGG 301 T D I Q C L I P C A I D Q D P Y F R M T 901 ACGGATATCCAGTGCCTTATCCCATGTGCCATTGACCAGGATCCTTACTTTAGAATGACA 321 R D V A P R I G Y P K P A L L H S T F F 961 AGGGACGTCGCCCCCAGGATCGGCTATCCTAAACCAGCCCTGCTGCACTCCACCTTCTTC 341 P A L Q G A Q T K M S A S D P N S S I F 1021 CCAGCCCTGCAGGGCGCCCAGACCAAAATGAGTGCCAGCGACCCCAACTCCTCCATCTTC 361 L T D T A K Q I K T K V N K H A F S G G 1081 CTCACCGACACGGCCAAGCAGATCAAAACCAAGGTCAATAAGCATGCGTTTTCTGGAGGG 381 R D T I E E H R Q F G G N C D V D V S F 1141 AGAGACACCATCGAGGAGCACAGGCAGTTTGGGGGCAACTGTGATGTGGACGTGTCTTTC 401 M Y L T F F L E D D D K L E Q I R K D Y 1201 ATGTACCTGACCTTCTTCCTCGAGGACGACGACAAGCTCGAGCAGATCAGGAAGGATTAC
U 421 T S G A M L T G E L K K A L I E V L Q P 1261 ACCAGCGGAGCCATGCTCACCGGTGAGCTCAAGAAGGCACTCATAGAGGTTCTGCAGCCC 441 L I A E H Q A R R K E V T D E I V K E F 1321 TTGATCGCAGAGCACCAGGCCCGGCGCAAGGAGGTCACGGATGAGATAGTGAAAGAGTTC 461 M T P R K L S F D F Q - 1381 ATGACTCCCCGGAAGCTGTCCTTCGACTTTCAGTAG BC008C11 Notch2 Homolog / Plasma membrane, 1 AGGCTGCTTCGTTGCACACCCGAGAAAGTTTCAGCCAAACTTCGGGCGGCGGCTGAGGCG SEQ ID NO: 10 nucleus , notch 61 GCGGCCGAGGAGCGGCGGACTCGGGGCGCGGGGAGTCGAGGCATTTGCGCCTGGGCTTCG signalling 121 GAGCGTAGCGCCAGGGCCTGAGCCTTTGAAGCAGGAGGAGGGGAGGAGAGAGTGGGGCTC pathway, ligand 181 CTCTATCGGGACCCCCTCCCCATGTGGATCTGCCCAGGCGGCGGCGGCGGCGGCGGAGGA regulated 241 GGAGGCGACCGAGAAGATGCCCGCCCTGCGCCCCGCTCTGCTGTGGGCGCTGCTGGCGCT transcription 301 CTGGCTGTGCTGCGCGGCCCCCGCGCATGCATTGCAGTGTCGAGATGGCTATGAACCCTG factor activity. 361 TGTAAATGAAGGAATGTGTGTTACCTACCACAATGGCACAGGATACTGCAAATGTCCAGA 421 AGGCTTCTTGGGGGAATATTGTCAACATCGAGACCCCTGTGAGAAGAACCGCTGCCAGAA 481 TGGTGGGACTTGTGTGGCCCAGGCCATGCTGGGGAAAGCCACGTGCCGATGTGCCTCAGG 541 GTTTACAGGAGAGGACTGCCAGTACTCAACATCTCATCCATGCTTTGTGTCTCGACCCTG 601 CCTGAATGGCGGCACATGCCATATGCTCAGCCGGGATACCTATGAGTGCACCTGTCAAGT 661 CGGGTTTACAGGTAAGGAGTGCCAATGGACGGATGCCTGCCTGTCTCATCCCTGTGCAAA 721 TGGAAGTACCTGTACCACTGTGGCCAACCAGTTCTCCTGCAAATGCCTCACAGGCTTCAC 781 AGGGCAGAAATGTGAGACTGATGTCAATGAGTGTGACATTCCAGGACACTGCCAGCATGG
Gene Name GenBank DNA SEQUENCE / DEDUCED AMINOf ACID SEQUENCE ff SEQUENCE* Homology **" ' * * ,, -** -. A ■» V"* _ IDENTIFfER: 841 TGGCACCTGCCTCAACCTGCCTGGTTCCTACCAGTGCCAGTGCCCTCAGGGCTTCACAGG 901 CCAGTACTGTGACAGCCTGTATGTGCCCTGTGCACCCTCACCTTGTGTCAATGGAGGCAC 961 CTGTCGGCAGACTGGTGACTTCACTTTTGAGTGCAACTGCCTTCCAGGTTTTGAAGGGAG 1021 CACCTGTGAGAGGAATATTGATGACTGCCCTAACCACAGGTGTCAGAATGGAGGGGTTTG 1081 TGTGGATGGGGTCAACACTTACAACTGCCGCTGTCCCCCACAATGGACAGGACAGTTCTG 1141 CACAGAGGATGTGGATGAATGCCTGCTGCAGCCCAATGCCTGTCAAAATGGGGGCACCTG 1201 TGCCAACCGCAATGGAGGCTATGGCTGTGTATGTGTCAACGGCTGGAGTGGAGATGACTG 1261 CAGTGAGAACATTGATGATTGTGCCTTCGCCTCCTGTACTCCAGGCTCCACCTGCATCGA 1321 CCGTGTGGCCTCCTTCTCTTGCATGTGCCCAGAGGGGAAGGCAGGTCTCCTGTGTCATCT 1381 GGATGATGCATGCATCAGCAATCCTTGCCACAAGGGGGCACTGTGTGACACCAACCCCCT 1441 AAATGGGCAATATATTTGCACCTGCCCACAAGGCTACAAAGGGGCTGACTGCACAGAAGA 1501 TGTGGATGAATGTGCCATGGCCAATAGCAATCCTTGTGAGCATGCAGGAAAATGTGTGAA 1561 CACGGATGGCGCCTTCCACTGTGAGTGTCTGAAGGGTTATGCAGGACCTCGTTGTGAGAT 1621 GGACATCAATGAGTGCCATTCAGACCCCTGCCAGAATGATGCTACCTGTCTGGATAAGAT 1681 TGGAGGCTTCACATGTCTGTGCATGCCAGGTTTCAAAGGTGTGCATTGTGAATTAGAAAT 1741 AAATGAATGTCAGAGCAACCCTTGTGTGAACAATGGGCAGTGTGTGGATAAAGTCAATCG 1801 TTTCCAGTGCCTGTGTCCTCCTGGTTTCACTGGGCCAGTTTGCCAGATTGATATTGATGA 1861 CTGTTCCAGTACTCCGTGTCTGAATGGGGCAAAGTGTATCGATCACCCGAATGGCTATGA 1921 ATGCCAGTGTGCCACAGGTTTCACTGGTGTGTTGTGTGAGGAGAACATTGACAACTGTGA 1981 CCCCGATCCTTGCCACCATGGTCAGTGTCAGGATGGTATTGATTCCTACACCTGCATCTG 2041 CAATCCCGGGTACATGGGCGCCATCTGCAGTGACCAGATTGATGAATGTTACAGCAGCCC 2101 TTGCCTGAACGATGGTCGCTGCATTGACCTGGTCAATGGCTACCAGTGCAACTGCCAGCC 2161 AGGCACGTCAGGGGTTAATTGTGAAATTAATTTTGATGACTGTGCAAGTAACCCTTGTAT 2221 CCATGGAATCTGTATGGATGGCATTAATCGCTACAGTTGTGTCTGCTCACCAGGATTCAC 2281 AGGGCAGAGATGTAACATTGACATTGATGAGTGTGCCTCCAATCCCTGTCGCAAGGGTGC 2341 AACATGTATCAACGGTGTGAATGGTTTCCGCTGTATATGCCCCGAGGGACCCCATCACCC 2401 CAGCTGCTACTCACAGGTGAACGAATGCCTGAGCAATCCCTGCATCCATGGAAACTGTAC 2461 TGGAGGTCTCAGTGGATATAAGTGTCTCTGTGATGCAGGCTGGGTTGGCATCAACTGTGA 2521 
AGTGGACAAAAATGAATGCCTTTCGAATCCATGCCAGAATGGAGGAACTTGTGACAATCT 2581 GGTGAATGGATACAGGTGTACTTGCAAGAAGGGCTTTAAAGGCTATAACTGCCAGGTGAA 2641 TATTGATGAATGTGCCTCAAATCCATGCCTGAACCAAGGAACCTGCTTTGATGACATAAG 2701 TGGCTACACTTGCCACTGTGTGCTGCCATACACAGGCAAGAATTGTCAGACAGTATTGGC 2761 TCCCTGTTCCCCAAACCCTTGTGAGAATGCTGCTGTTTGCAAAGAGTCACCAAATTTTGA 2821 GAGTTATACTTGCTTGTGTGCTCCTGGCTGGCAAGGTCAGCGGTGTACCATTGACATTGA 2881 CGAGTGTATCTCCAAGCCCTGCATGAACCATGGTCTCTGCCATAACACCCAGGGCAGCTA 2941 CATGTGTGAATGTCCACCAGGCTTCAGTGGTATGGACTGTGAGGAGGACATTGATGACTG 3001 CCTTGCCAATCCTTGCCAGAATGGAGGTTCCTGTATGGATGGAGTGAATACTTTCTCCTG 3061 CCTCTGCCTTCCGGGTTTCACTGGGGATAAGTGCCAGACAGACATGAATGAGTGTCTGAG 3121 TGAACCCTGTAAGAATGGAGGGACCTGCTCTGACTACGTCAACAGTTACACTTGCAAGTG 3181 CCAGGCAGGATTTGATGGAGTCCATTGTGAGAACAACATCAATGAGTGCACTGAGAGCTC 3241 CTGTTTCAATGGTGGCACATGTGTTGATGGGATTAACTCCTTCTCTTGCTTGTGCCCTGT 3301 GGGTTTCACTGGATCCTTCTGCCTCCATGAGATCAATGAATGCAGCTCTCATCCATGCCT 3361 GAATGAGGGAACGTGTGTTGATGGCCTGGGTACCTACCGCTGCAGCTGCCCCCTGGGCTA 3421 CACTGGGAAAAACTGTCAGACCCTGGTGAATCTCTGCAGTCGGTCTCCATGTAAAAACAA 3481 AGGTACTTGCGTTCAGAAAAAAGCAGAGTCCCAGTGCCTATGTCCATCTGGATGGGCTGG 3541 TGCCTATTGTGACGTGCCCAATGTCTCTTGTGACATAGCAGCCTCCAGGAGAGGTGTGCT 3601 TGTTGAACACTTGTGCCAGCACTCAGGTGTCTGCATCAATGCTGGCAACACGCATTACTG 3661 TCAGTGCCCCCTGGGCTATACTGGGAGCTACTGTGAGGAGCAACTCGATGAGTGTGCGTC
Gene Name GenBank DN.SFSEQUENCE / DEDUCED AMINQilACID SEQUENCE SEQUENCE Homology »»** ϊ-Stt BSΓ- IDENTIFIERS 481 AATGTCTGTAAGAAATTGTCCAGTCCTTCTAACTACAAGTTGGAGTGTCCTGAGACTGAC 541 TGTGAGAAAGGCTGGGCACTCTTGAAATTTGGAGGAAAGTATTATCAAAAGGCTAAAGCG 601 GCTTTTGAGAAGGCTCTGGAAGTGGAGCCTGACAATCCAGAATTTAACATCGGCTATGCT 661 ATCACAGTGTATCGGCTGGATGATTCTGATAGAGAAGGGTCTGTAAAGAGCTTTTCTCTG 721 GGGCCTTTGAGAAAGGCTGTTACCCTGAACCCAGATAACAGCTATATTAAGGTTTTTCTG 781 GCACTGAAGCTTCAAGATGTACATGCAGAAGCTGAAGGGGAAAAGTATATTGAAGAAATC 841 CTGGACCAAATATCATCCCAGCCTTACGTCCTTCGTTATGCAGCCAAGTTCTATAGGAGA 901 AAAAATTCCTGGAACAAAGCTCTCGAACTTTTAAAAAAGGCCTTGGAGGTGACACCAACT 961 TCTTCTTTCCTGCATCACCAGATGGGACTTTGCTACAGGGCACAAATGATCCAAATCAAG 1021 AAGGCCACACACAACAGACCTAAAGGAAAGGATAAACTAAAGGTTGATGAGCTGATTTCA 1081 TCTGCTATATTTCATTTCAAAGCAGCCATGGAACGAGACTCTATGTTTGCATTTGCCTAC 1141 ACAGACCTGGCCAACATGTATGCTGAAGGAGGCCAGTATAGCAATGCTGAGGACATTTTC 1201 CGGAAAGCTCTTCGTCTGGAGAACATAACCGATGATCACAAACATCAGATCCATTACCAC 1261 TATGGCCGCTTTCAGGAATTTCACCGTAAATCAGAAAATACTGCCATCCATCATTATTTA 1321 GAAGCCTTAAAGGTCAAAGACAGATCACCCCTTCGCACCAAACTGACAAGTGCTCTGAAG 1381 AAATTGTCTACCAAGAGACTTTGTCACAATGCTTTAGATGTGCAGAGTTTAAGTGCCCTA 1441 GGGTTTGTTTACAAGCTGGAAGGAGAAAAGAGGCAAGCTGCTGAGTACTATGAGAAGGCA 1501 CAAAAGATAGATCCAGAAAATGCAGAATTCCTGACTGCTCTCTGTGAGCTCCGACTTTCC 1561 ATTTAAATACATACTCTAGGAAATTAGCTCTAAGTTTTTCCCTTCATTTTGGGTTCTCCT 1621 GTTTGTTTTTTTTTTATTATTTTAATCCCTTGTTTATTATAGAGCTAATATTTATTGAAT 1681 AGTTATTGTGTACCAAGCATTGTGCTAAATACTTTATATGCATTATGATGAATCTTGTGC 1741 GGTTTTCTTTCTTTTTTTCTTTTTAATTAAAATACTATAATCCATTGAGAAATAGCAATA 1801 TTCTAGCTATTGTAACTTCTAAAAATGGTATGGCCATTAGATCTGTGCTTTTTATCTCTG 1861 CTCTTTGAATTTCTCATATTATATAGTAAATATATTCCTACGTAAACCTTTGATACCTAG 1921 ATCAGGAATACTCTTCCAGGAGTACAAAATTACATTATTGATAGTTAAGCTCTTAATTGT 1981 GTAGCTTGCAAAAGACAGCACTTTTTAGTTACAGATGTTTTGACTTTGATGAGGATATTT 2041 AGCTATCAATCTAATAGTCACCTAAAATATCTTTTTTGTTGGAAAAAAGTTTATAATAAA 2101 AAAGTTTGTCATCTCTAGTGACTTCAATAAAGAAAAAACTAGAAGAGGAGAAAAAGGATT 2161 
TCCTCAAATTTTAAATATGTAACTTCAGGGATTCAATCCCCAAATGTTTATTAAGTAGCT 2221 AGAAATAATTATGTGGAAAAAAATGAATAATGGAAAATAGTGAGTCTCAAATTGTTTTTT 2281 TTTAACTAAAATCTGCAATGAATCTAGATGCAATTAATTTTATTCCTTCCAACTAAAATT 2341 ACAATATTTTTAGGTTAAAATTATTGAGATATAAAGCAGCCATTGGGAAATTGGGAGAAA 2401 TGATAAACAAATGGAAAAAGAAGATGTCCCTAACCTACACCCATAGATTACCAAGGTTTC 2461 AGTGTACTAGTTTTGAATCTGTTCTGAATGGAGTTTTTATACCCTCAATTTCTGGCCTTT 2521 GGCTATTTTAGCATTTCAAAGTGACTTCTATGAAGCTTTTTTTTTAATGTGAAATTTTCA 2641 TATCCACCTTGAGGGGTCGCTGCTTGAGGGCTCTTATCCCAGGGGACTTTTTAATTCGGA 2701 TGTTACTTAATGTGGCTTCTCTAATGTAGTTTCTTTGATTACCGACTACACAATTATGTA 2761 CCATCACAGTATTAGTGGAAAAGTACCATGTGATTTAATTCTCCATTCCTCCAATGTAAC 2821 TCTTAAAATTATTATGTATGTGTATGTGTTTTACTTTTTGTTTTTTATCATCTTTAAAAT 2881 TTCTATTATGGTTTGATTATTATAAAAATAATGAATTCTCACTGTAAATTTCAAAAAAAA 2941 AATTACAAAAGTATGTGAATTTAAAAATGAGAGCAGTCCCCTCACCCTACCACAGTTCCA 3001 CACCCTCAAGGTAAACTTATAACTTATAATTTGATATGTAAACTTCCAGATCTTTTTTCT 3061 ATGCGTAATCAGACATACATATATACTGCAGTGTATCTCACGTATTAATTTTTAAAAATC 3121 TTTTGTTTTACTTAATTCTGTTTTTATTATTATTATTATTTTGTTTGATCTATTAAGGAA 3181 GAACAAGGAAGGGAATGATCTTTACTCAAGAATTTCAGAAAGTCAGCACTGAAGTCCTGA 3241 CCTATCAGTAGACACATTTGTCCCTTTCAGATATTTTAGGATATTCTAGCAAAGCAGGCC 3301 ATTTCTCCCACCTGAAAGTACATAACTTCTATCACTTGCCACATAATTAAAAGAACTCAC
Gene Name GenBank Homology DNA SEQUENCE / DEDUCED AMINO ACID SEQUENCE SEQUENCE IDENTIFIER:
1441 TAGCAGTTTCCAGTTTTAGTGATTCTCCGTTTTTTTCCTTGTCGTGTAAATATTTATGGG 1501 ATGATCATTTTGTGTACATACCGGTTATTGCCTTTTTATTTGAATTCTTTTAGTGTTTAG 1561 TTCCATGAGACACGTCAGTTTAAATTGATGGAATAAATGTTCTATGACAAATTACATCTT 1621 CCTTGTCAGATGTCAAATGTGTGGAATTTAAACATGAAACTTTTTCAAAAAAAAAAAAAA 1681 AAAAAAAAAAA
1 M E G N R D E A E K C V E I A R E A L N SEQ ID NO: 19 1 ATGGAGGGGAACAGGGATGAGGCGGAGAAGTGTGTGGAGATCGCCCGGGAGGCCCTGAAC 21 A G D R E R A Q R F L H K A Q K L Y P L 61 GCCGGCGACCGCGAGAGGGCCCAGCGCTTCCTGCACAAGGCCCAGAAGCTGTACCCGCTG 41 P A A R A L L E I I M K N G S T A G S S 121 CCCGCGGCCCGCGCACTGTTGGAAATAATTATGAAAAATGGAAGCACTGCTGGAAGTAGC 61 P H C R K P A T S G D Q G R P N C T K D 181 CCTCATTGCCGCAAACCCGCCACAAGTGGAGATCAGGGCAGGCCTAACTGCACGAAGGAC 81 S T T A G G E G G K G Y T K D Q V D G V 241 AGCACGACTGCCGGTGGGGAAGGTGGGAAAGGCTACACCAAAGACCAGGTGGATGGAGTT 101 L S I N K C K N Y Y E V L G V T K D A G 301 CTCAGCATAAACAAATGTAAAAATTACTATGAAGTACTTGGAGTTACCAAAGATGCTGGT 121 D E D L K K A Y R K L A L K F H P D K N 361 GATGAAGATTTGAAAAAAGCTTATAGAAAGCTTGCTTTGAAGTTTCATCCAGACAAAAAC 141 H A P G A T D A F K K I G N A Y A I L S 421 CATGCGCCTGGAGCAACAGATGCTTTTAAAAAGATTGGAAATGCTTATGCTATTTTAAGC 161 N P E K R K Q Y D L T G N E E Q A C N H 481 AATCCAGAAAAACGAAAACAGTATGACCTCACAGGCAATGAAGAACAAGCATGTAATCAC 181 Q N N G R F N F H R G C E A D I T P E D 541 CAAAACAATGGCAGATTTAATTTCCATAGAGGTTGTGAAGCTGATATAACTCCAGAAGAC 201 L F N I F F G G G F P S G S V H S F S N 601 TTGTTTAATATATTCTTTGGTGGTGGATTTCCTTCAGGTAGTGTACATTCATTTTCAAAT 221 G R A G Y S H Q H Q H R H S G H E R E E 661 GGACGAGCTGGTTATAGCCATCAACATCAGCATCGACATAGTGGACATGAAAGAGAAGAG 241 E R G D G G F S V F I Q L M P I I V L I 721 GAAAGAGGAGATGGAGGTTTTTCTGTGTTTATCCAGCTGATGCCCATTATCGTATTGATC 261 L V S L L S Q L M V S S P P Y S L Y P R 781 CTCGTGTCGTTATTAAGCCAGTTGATGGTCTCTAGTCCTCCTTATTCCCTATATCCCAGA
Gene Name GenBank Homology DNA SEQUENCE / DEDUCED AMINO ACID SEQUENCE SEQUENCE IDENTIFIER:
121 TCCTTGGTGAATCCTGCCTTCAGATGCTTACCCAGTGGTCCCTGCATCTAG
WBC025E03 (2274) v-ets erythroblastosis virus E26 oncogene homologue 2 (avian) - ETS2 / Nucleus, regulation of transcription, DNA dependent, transcription factor activity. SEQ ID NO: 30
1 ATGCCAGGGAGCAGGGACCTCATTCCCCAGTGGTTCTGCCCCTTGGGGACACAGTGCCCC 61 TACCATAGGTACTCAAAGGTACTCAGAGGTACTCAAAAGGTCCTCCTGCGGACCTTGTGT 121 AGGTGTCAAGTTCTCTCTAGAGTGAACATGCCTCAGAATCATAATCAGGGAGGAATGTCA 181 TTCACTTTTTCTTCATTGACAAATTGAGTTTAACTCTTTTCCATCCATGTTCACCAAAGG 241 TGGCCCGCCGGTGGGGAAAGAGGAAAAATAAGCCCAAGATGAACTACGAGAAGCTGAGCC 301 GGGGCTTACGCTACTATTACGACAAGAACATCATCCACAAGACGTCGGGGAAGCGCTACG 361 TGTACCGCTTCGTGTGCGACCTCCAGAACTTGCTGGGGTTCACGCCCGAGGAACTGCACG 421 CCATCCTGGGCGTCCAGCCCGACACGGAGGACTGAGGTCGCCGGGACCACCCTGAGCCGG 481 CCCCAGGCTCGTGGACTGAGTGGGAAGCCCATCCTGACCAGCTGCTCCGAGGACCCAGGA 541 AAGGCAGGATTGAAAATGTCCAGGAAAGTGGCCAAGAAGCAGTGGCCTTATTGCATCCCA 601 AACCACGCCTCTTGACCAGGCTGCCTCCCTTGTGGCAGCAACGGCACAGCTAATTCTACT 661 CACAGTGCTTTTAAGTGAAAATGGTCGAGAAAGAGGCACCGGGAAGCCGTCCTGGCGCCT 721 GGCAGTCCGTGGGACGGGATGGTTCTGGCTGTTTGAGATTCTCAAAGGAGCGAGCATGTC 781 GTGGACACACACAGACTATTTTTAGATTTTCTTTTGCCTTTTGCAACCAGGAACAGCAAA 841 TGCAAAAACTCTTTGAGAGGGTAGGAGGGTGGGAAGGAAACAACCATGTCATTTCAGAAG 901 TTAGTTTGTATATATTATAATAATCTTATAATTGTTCTCAGAATCCCTTAACAGTTGTAT 961 TTAACAGAAATTGTATATTGTAATTTAAAATAATTATATAACTGTATTTGAAATAAGAAT 1021 TCAGACATCTGAGGTTTTATTTCATTTTTCAATAGCACATATGGAATTTTGCAAAGATTT 1081 AATCTGCCAAGGGCCGACTAAGAGACGTTGTAAAGTATGTATTATTCACATTTAATAGAC 1141 TTACAGGGATAAGGCCTGTGGGGGGTAATCCCTGCTTTTTGTGTTTTTTTTGTTTGTTTG 1201 TTTGTTTGTTTTTGGGGGGTTTTCTTGCCTTGGTTGTCTGGCAAGGACTTTGTACATTTG 1261 GGAGTTTTTATGAGAAACTTAAATGTTATTATCTGGGCTTATATCTGGCCTCTGCTTTCT 1321 CCTTTAATTGTAAAGTAAAAGCTATAAAGCAGTATTTTTCTTGACAAATGGCATATGTTT 1381 TCCACTTCTTTGCATGCGTTTAAGTCAGTTTATACACAAAATGGATTTTATTTTTTAGTT 1441 TAACTGTGTTTCTCCGACAGCTCACCTCTCCCTGACCAGCCAACCATTTCCTTTCTGTGC 1501 TCCACGTTCTTCTGTGTGATTAAAATAAGAATATTATTTTTGGAAATATGCAACTCCTTT 1561 TCAGAGATCAGGAGGGATTTATGTAGCAGCTATTTTTATGCAAAAGTAATTCACTGGAAA 1621 
AAAAATGAAATTTGTAAGAAAGCTTTATTTTTATCTCAGCTCTATGTAAAGTTAAAGTTA 1681 CTGTACAGAGCTGAAGGACGGGGGGCGGTAGGGGTCTTGATGAAACCTCTTGAACGAAGC 1741 ACAGTTTGTCCCATCTTTGTTCACTCGTGTGTCTCAACCATCTTAATAGCATGCTGCTCC 1801 TTTTTGCTCAGTGTCCACAGCAAGATGACGTGATTCTTATTTTCTTGGACACAGACTATT 1861 CTGAGGCACAGAGCGGGGACTTAAGATGGGAAAGGGAAAGCATCGGAGCCATTCATTCGG 1921 AGAAAACGTTTTGATCAAAATGGAGACTTTTGTAGTCGTTTCAAAAGAGCACCTGAGTCA 1981 TGTGTATTCCCGGCCTTCTTTATAAATGACCCGGTCAAGTTGGTTTCAAAGTTCGACAGG 2041 CTTGTCTGTTTACTAGCTGCGTGGCCTTGGACGGGTGGCTGACATCTGTAAAGAATCCTC 2101 CTGTGATGAAACTGAGGAATCGGGTGGCCGGGCAAGCTGGGAAGAGCAAAGCCAGAGCTG 2161 CGCTGCCTCAATACCCACAAAAGACCATTCCCAGTATACATAAGCACAGGATGTTTTTCT 2221 CAAGAGGGATGTATTTATCACTTGGACATCTGTTTATAATATAAACAGACATGTGACTGG 2281 GAACATCTTGCTGCCAAAAGAATCCTAGGCAGTGGCTCATTGTATGTGAGGTTGAACCAC 2341 GTCAAATTGCCAATATTAGGCTGGCTTTTATCTACAAAGAAGGAGTTTCATGGGGTCAGC 2401 CTAACAGTTATGGAAACTACAGTCCTTATAAACCATTGGCATGGTAATAAACAGATCTTA 2461 AGTATAAAAATTTTGTAATTGGGCCTTTACTCTCTCAATAATAAAGTATTTTGTTTATAT 2521 AAATTCTTTGTGATAGTCCTCGTTCTTCCTCTCCACACCCAGCATGAAGGAGTTGGAGGA 2581 AGGATGTTAACCCCAGATCCATTCTCTACTCAAAACATTCCATCATCAAGTGGCAAGTCT
Gene Name GenBank Homology DNA SEQUENCE / DEDUCED AMINO ACID SEQUENCE SEQUENCE IDENTIFIER:
721 TGTTTCTTCATTCATTCAACAAAATTTGGCTGGAAAAAAAAAAAAAAAAAAAAAAAAAAA 781 AAA
1 M E L C R S L A L L G G S L G L M F C L SEQ ID NO: 58 1 ATGGAGCTCTGCCGGTCCCTGGCCCTGCTGGGGGGCTCCCTGGGCCTGATGTTCTGCCTG 21 I A L S T D F W F E A V G P T H S A H S 61 ATTGCTTTGAGCACCGATTTCTGGTTTGAGGCTGTGGGTCCCACCCACTCAGCTCACTCG 41 G L W P K E H Q D P V A G Y I H V T Q A 121 GGCCTTTGGCCAAAAGAGCATCAGGACCCAGTAGCAGGCTACATCCACGTGACTCAGGCC 61 F S I L A S L G G L V S V S F L V L S C 181 TTCAGCATTCTGGCTTCCCTGGGGGGTTTGGTGTCCGTGAGCTTCCTGGTCCTGTCCTGC 81 I P S L F P P G H G P L V S T T A A F A 241 ATCCCCTCACTGTTCCCCCCAGGCCACGGCCCGCTTGTCTCAACCACCGCAGCCTTTGCT 101 A A L F V M V A M M V Y T I E R W N Q A 301 GCAGCCCTCTTCGTGATGGTGGCCATGATGGTCTATACCATTGAGCGGTGGAACCAGGCT
121 L N P Q I Q T F F S W S F Y L G W V S A
361 CTAAATCCCCAGATCCAGACGTTCTTCTCCTGGTCCTTCTACCTGGGCTGGGTTTCAGCC 141 L F W V C A G A L S L G A H C S A P L P 421 CTCTTCTGGGTCTGTGCAGGTGCCCTGAGCCTGGGTGCTCACTGCAGCGCCCCCCTTCCT 161 G Y E A V - 481 GGCTACGAAGCCGTGTGA
BM734502 No Homology 1 tcaaaggctc ctgacatccc tgatacccag agccgcttag gactttcatg agctggaggc SEQ ID NO: 22 61 acttctacct ttgtgggtcc cttattctat aaaaagtagt aaaaattata ctttgcaact 121 gtattgatat aaacatgaat gtaactgagg ctgtattata ttcctttttt tagtgtaaaa 181 gaaactattt ttgtggactt ctaaaagctc atgggcccta gccactgtgc ctactgtgcc 241 tgctggctag gtgggccctg ccattacgtg gatactgtgc ctttagctgt aacaccgagc 301 ctgtatcctt taatctcctc ggcctctaga gggtcaaagt atttatttac taattgattt 361 cattcaggca tgctcatctg agaaatgtag gaggaaatgg gacccagctc tcttggtgac 421 tgacagcctc cccctttcac tatcttttct caccttccca ctcccccttc aaagtgagcc 481 aggatctctt tacgtttaga aatccattca tattgtttta actctcttcc caggtctcca 541 catggcaaag accatgcctg taactgaatt aaatgccttg cttgagaaca gctttcaaac 601 ttttctgatc ccagcctaca ctaagaaaaa caacatacag tgaattaagc ctgttaaaaa 661 aaaaaaaaaa aa
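The deduced amino acid rows listed alongside the DNA rows above can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and a reading frame that starts at the first listed base; the function name `translate` and the table construction are illustrative and are not part of the disclosure. The input strings are the first and last DNA rows listed for SEQ ID NO: 58.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order for each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon.

    Whitespace is ignored; codons containing unexpected characters give 'X'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":  # stop codon ends the deduced sequence
            break
        protein.append(aa)
    return "".join(protein)


# First and last DNA rows of SEQ ID NO: 58 as listed above.
first_row = "ATGGAGCTCTGCCGGTCCCTGGCCCTGCTGGGGGGCTCCCTGGGCCTGATGTTCTGCCTG"
last_row = "GGCTACGAAGCCGTGTGA"

print(translate(first_row))  # MELCRSLALLGGSLGLMFCL
print(translate(last_row))   # GYEAV
```

Both results agree with the amino acid rows printed above the corresponding DNA rows, including termination at the TGA codon.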
TABLE 2
Probe Set Name / Probe Interrogation Position / PROBE SEQUENCE / Sequence Identifier
B1961946.V1.3_at 535 TTCTTCTATGAGATGGCGGCCACGA SEQ ID NO:331
B1961946.V1.3_at 594 AAAGGATGTGTAGCAGACCCCTGCC SEQ ID NO: 332
B1961946.V1.3_at 633 TGGCAGGTACTCAGTTGATCGTCGA SEQ ID NO: 333
No Homology
WBC027E08_V1.3_at 113 TGATATCTTGTTTCTTTCTCCTGTT SEQ ID NO:334
WBC027E08_V1.3_at 169 TCAGATATGAGGAGCCCCATCCTTT SEQ ID NO:335
WBC027E08_V1.3_at 230 CATCTCTGCTACCTGTTTTATGTCA SEQ ID NO: 336
WBC027E08_V1.3_at 280 AAAGCTAAGCAGACTCTCTCGGGAC SEQ ID NO: 337
WBC027E08_V1.3_at 312 CCAAGCCCCTATGGTGGTTCATAAA SEQ ID NO:338
WBC027E08_V1.3_at 483 ACCACCCATTTCTAATTTTTCTCTG SEQ ID NO:339
WBC027E08_V1.3_at 507 GGATCAATGGTTCTCTACCCAAGGT SEQ ID NO: 340
WBC027E08_V1.3_at 562 AGTGTCTCTGGTCATTTGGTTGTCA SEQ ID NO: 341
WBC027E08_V1.3_at 594 GTGGCAGGTGCTACTAGCGTCTAAT SEQ ID NO: 342
WBC027E08_V1.3_at 641 TAAACATCCTACAACCTACAGCAAG SEQ ID NO:343
WBC027E08_V1.3_at 657 TACAGCAAGAATTGTCCAGCCCAAA SEQ ID NO:344
No Homology
BM734502.V1.3_at 60 ATGAGCTGGAGGCACTTCTACCTTT SEQ ID NO:345
BM734502.V1.3_at 78 TACCTTTGTGGGTCCCTTATTCTAT SEQ ID NO: 346
BM734502.V1.3_at 208 GACTTCTAAAAGCTCATGGGCCCTA SEQ ID NO: 347
BM734502.V1.3_at 266 GGCCCTGCCATTACGTGGATACTGT SEQ ID NO:348
BM734502.V1.3_at 282 GGATACTGTGCCTTTAGCTGTAACA SEQ ID NO:349
BM734502.V1.3_at 302 TAACACCGAGCCTGTATCCTTTAAT SEQ ID NO:350
BM734502.V1.3_at 367 TGATTTCATTCAGGCATGCTCATCT SEQ ID NO:351
BM734502.V1.3_at 407 AAATGGGACCCAGCTCTCTTGGTGA SEQ ID NO: 352
BM734502.V1.3_at 486 GTGAGCCAGGATCTCTTTACGTTTA SEQ ID NO: 353
BM734502.V1.3_at 523 TATTGTTTTAACTCTCTTCCCAGGT SEQ ID NO: 354
BM734502.V1.3_at 606 TTCAAACTTTTCTGATCCCAGCCTA SEQ ID NO: 355
No Homology
WBC013B08_V1.3_at 167 GAAGAACGTACTCACCGACCAAAAG SEQ ID NO:356
WBC013B08_V1.3_at 235 AGGGTTACCCTGAGCAACGCATCGT SEQ ID NO: 357
WBC013B08_V1.3_at 249 CAACGCATCGTTAGGCAATTTCATC SEQ ID NO:358
WBC013B08_V1.3_at 279 GTGAACATCACAGCGTGCACTTACA SEQ ID NO:359
WBC013B08_V1.3_at 305 AAACCTAGATGGTACAGCCCACTTC SEQ ID NO:360
WBC013B08_V1.3_at 334 CTACGCTGTATGGTACTGATCCTGT SEQ ID NO: 361
WBC013B08_V1.3_at 351 GATCCTGTGGGACCACTGTCATATA SEQ ID NO: 362
WBC013B08_V1.3_at 374 TATGAAGCCCATCGTTGACCAAAAT SEQ ID NO: 363
WBC013B08_V1.3_at 402 ATCAAGCAGTGCCTGACTGTGCTAA SEQ ID NO: 364
WBC013B08_V1.3_at 448 GACAACCTTGTACCCAGTTAGCATG SEQ ID NO:365
WBC013B08_V1.3_at 580 TTTTCCGACAACGTGGCTGGGCCTT SEQ ID NO:366
No Homology
WBC016A05_V1.3_at 95 GAAGAAACTTCATTACCTCCCAGTC SEQ ID NO:367
WBC016A05_V1.3_at 133 TGATGCTTGAATCCCCTGACAACAC SEQ ID NO:368
WBC016A05_V1.3_at 194 CCCTCAGCCCTGTTTTTAGATAGTG SEQ ID NO: 369
WBC016A05_V1.3_at 236 AAATGTCTAATCCTTACCCAGCTGT SEQ ID NO:370
WBC016A05_V1.3_at 313 GCTGCCCCTGGGTTGAATGCTGAAA SEQ ID NO: 371
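The relationship between a probe sequence and its listed interrogation position can be illustrated with the BM734502 entries. The sketch below assumes that the interrogation position is the 1-based coordinate, within the target sequence, of the central (13th) base of a 25-mer probe; this convention is an assumption that happens to reproduce the listed values for the two probes tested, and the function name `interrogation_position` is hypothetical.

```python
def interrogation_position(target, probe):
    """Return the 1-based target coordinate of the probe's central base.

    Returns None when the probe is not found in the target. Non-ACGT
    characters (spaces, row numbers) in the target are ignored.
    """
    target = "".join(ch for ch in target.upper() if ch in "ACGT")
    start = target.find(probe.upper())  # 0-based index of the probe's first base
    if start < 0:
        return None
    return start + 1 + len(probe) // 2


# First 120 bases of BM734502 (SEQ ID NO: 22) as listed in the sequence table.
bm734502_start = (
    "tcaaaggctc ctgacatccc tgatacccag agccgcttag gactttcatg agctggaggc "
    "acttctacct ttgtgggtcc cttattctat aaaaagtagt aaaaattata ctttgcaact"
)

print(interrogation_position(bm734502_start, "ATGAGCTGGAGGCACTTCTACCTTT"))  # 60
print(interrogation_position(bm734502_start, "TACCTTTGTGGGTCCCTTATTCTAT"))  # 78
```

The two values match the positions 60 and 78 listed for SEQ ID NO: 345 and SEQ ID NO: 346.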
TABLE 3 AMINO ACID SUB-CLASSIFICATION
TABLE 4 EXEMPLARY AND PREFERRED AMINO ACID SUBSTITUTIONS
TABLE 6 DIAGNOSTIC PERFORMANCE BASED ON SELECTED GENES
TABLE 7 RANKING OF GENES BASED ON T VALUE
TABLE 8 TWO GENES SELECTED
TABLE 9 THREE GENES SELECTED
TABLE 10 FOUR GENES SELECTED
TABLE 11 FIVE GENES SELECTED
TABLE 12 SIX GENES SELECTED
Genes Sensitivity Specificity Success
B1961023 BM781262 WBC010G03 WBC028D07 WBC041C01 WBC567
BM735363 BM735402 WBC001D03 WBC013B08 WBC041G01 WBC567
BM781262 WBC013B08 WBC016A05 WBC023F10 WBC028D07 WBC041C01
B1961567 BM781262 WBC010G03 WBC019A11 WBC041C01 WBC193
BM735363 WBC013B08 WBC025E03 WBC041B04 WBC041G01 WBC567
B1961946 BM735546 WBC008C11 WBC013B08 WBC019A11 WBC041C01
B1961946 WBC013B08 WBC016A05 WBC025E03 WBC028D07 WBC041C01
BM735402 BM781262 WBC013B08 WBC025E03 WBC041C01 WBC193
BM781262 gi576646 WBC041B04 WBC041C01 WBC041G01 WBC193
B1961946 WBC008C11 WBC013B08 WBC016A05 WBC041C01 WBC567
BM781262 WBC008C11 WBC019A11 WBC023F10 WBC041C01 WBC567
BM734829 BM735363 BM781262 WBC013B08 WBC041G01 WBC567
BM734607 BM734829 WBC028D07 WBC041G01 WBC193 WBC567
B1961946 WBC008C11 WBC016A05 WBC025E03 WBC027E08 WBC041C01
B1961567 B1961946 WBC008C11 WBC013B08 WBC025E03 WBC041C01
B1961567 BM734829 WBC023F10 WBC041G01 WBC193 WBC567
BM734607 BM781262 WBC019A11 WBC041C01 WBC193 WBC567
BM735402 WBC013B08 WBC019A11 WBC025E03 WBC041G01 WBC567
BM781262 gi576646 WBC001D03 WBC041C01 WBC193 WBC422
B1961023 BM735546 WBC013B08 WBC025E03 WBC041C01 WBC567
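The Sensitivity, Specificity and Success columns of the gene-selection tables can be understood with a short calculation. The sketch below uses the conventional definitions of sensitivity and specificity and assumes that "Success" denotes the overall proportion of animals classified correctly; that reading, the function name `diagnostic_performance` and the example data are all hypothetical and are not taken from the tables.

```python
def diagnostic_performance(true_status, predicted_status):
    """Compute sensitivity, specificity and overall success rate.

    Both arguments are equal-length sequences of booleans, where True
    means disease-positive. Undefined ratios are returned as NaN.
    """
    if len(true_status) != len(predicted_status):
        raise ValueError("inputs must have the same length")

    pairs = list(zip(true_status, predicted_status))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "success": ratio(tp + tn, len(pairs)),
    }


# Hypothetical example: 4 affected and 6 unaffected animals.
true_status = [True] * 4 + [False] * 6
predicted_status = [True, True, True, False,
                    False, False, False, False, True, False]

result = diagnostic_performance(true_status, predicted_status)
print(result)
```

With these invented data, three of four affected animals and five of six unaffected animals are classified correctly, giving a sensitivity of 0.75 and an overall success rate of 0.8.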
TABLE 13 SEVEN GENES SELECTED
TABLE 14 EIGHT GENES SELECTED
TABLE 15 NINE GENES SELECTED
TABLE 16 TEN GENES SELECTED
TABLE 17 TWENTY GENES SELECTED
TABLE 18 EPM MARKER GENE ONTOLOGY