WO2009063249A2 - Early detection of sepsis - Google Patents

Early detection of sepsis

Info

Publication number
WO2009063249A2
WO2009063249A2 PCT/GB2008/051069 GB2008051069W WO2009063249A2 WO 2009063249 A2 WO2009063249 A2 WO 2009063249A2 GB 2008051069 W GB2008051069 W GB 2008051069W WO 2009063249 A2 WO2009063249 A2 WO 2009063249A2
Authority
WO
WIPO (PCT)
Prior art keywords
sepsis
data
analysis
clinical
biomarkers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/GB2008/051069
Other languages
French (fr)
Other versions
WO2009063249A3 (en)
Inventor
Timothy John Gilby Brooks
Matthew Christopher Jackson
Roman Antoni Lukaszewski
Martin Julian Pearce
Carrie Jane Turner
Amanda Marie Yates
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
UK Secretary of State for Defence
Original Assignee
UK Secretary of State for Defence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by UK Secretary of State for Defence
Publication of WO2009063249A2
Publication of WO2009063249A3
Anticipated expiration
Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809 Methods for determination or identification of nucleic acids involving differential detection
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/5005 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
    • G01N33/5091 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing the pathological state of an organism

Definitions

  • bacteraemia leads to the rapid (within 30-90 minutes) onset of pyrexia and release of inflammatory cytokines such as interleukin-1 (IL-1) and tumour necrosis factor-α (TNF-α) triggered by the detection of bacterial toxins, long before the development of a specific, antigen-driven immune response.
  • IL-1 interleukin-1
  • TNF-α tumour necrosis factor-α
  • In Gram-negative bacteraemia due to infections such as typhoid, plague, tularaemia and brucellosis, or peritonitis from Gram-negative gut organisms such as Escherichia coli, Klebsiella, Proteus or Pseudomonas, this is largely a response to lipopolysaccharide (LPS) and other components derived from bacterial cell walls. Circulating LPS and, in particular, its constituent lipid A, provokes a wide range of systemic reactions. It is probably contact with Kupffer cells in the liver that first leads to IL-1 release and the onset of pyrexia.
  • LPS lipopolysaccharide
  • cytokines such as IL-6, IL-12, IL-15, IL-18, TNF-α, macrophage migration inhibitory factor (MIF), and cytokine-like molecules such as high mobility group B1 (HMGB1), which, in turn, activate neutrophils, lymphocytes and vascular endothelium, up-regulate cell adhesion molecules, and induce prostaglandins, nitric oxide synthase and acute-phase proteins.
  • PAF platelet activating factor
  • prostaglandins prostaglandins
  • leukotrienes and thromboxane activates vascular endothelium, regulates vascular tone and activates the extrinsic coagulation cascade.
  • Dysregulation of these responses results in the complications of sepsis and septic shock in terms of peripheral vasodilation leading to hypotension, and abnormal clotting and fibrinolysis producing thrombosis and intravascular coagulation (Cohen, 2002, Nature 420: 885-891).
  • LPS primarily acts on cells by binding to a serum LPS-binding protein (LBP) and CD14 expressed on monocytes and macrophages.
  • LBP serum LPS-binding protein
  • CD14 acts with a co-receptor, Toll-like receptor 4 (TLR-4) and a further component, MD-2, to form a signalling complex and initiate activation of macrophages and release of cytokines
  • the Toll-like receptor family is a group of cell surface receptors involved in recognising a range of bacterial and fungal ligands that act as triggers for the innate immune system, including Gram-positive cell wall structures, flagellin, and CpG repeats characteristic of bacterial DNA.
  • septic shock In the case of infection with Gram-positive pathogens, septic shock is associated with the production of exotoxins.
  • toxic shock syndrome a particularly acute form of septic shock that often affects otherwise healthy individuals is due to infection with a particular strain of Staphylococcus aureus, which produces an exotoxin known as toxic shock syndrome toxin-1 (TSST-1).
  • TSST-1 toxic shock syndrome toxin-1
  • a similar syndrome is caused by invasive infection with certain group A Streptococcus pyogenes strains, and is often associated with streptococcal pyogenic enterotoxin A (SPE-A).
  • SPE-A streptococcal pyogenic enterotoxin A
  • T cell receptor TCR
  • MHC Major Histocompatibility Complex
  • VEEV Venezuelan Equine Encephalitis Virus
  • SIRS Consensus Conference of the American College of Chest Physicians (ACCP) and Society of Critical Care Medicine (SCCM) "SIRS" is considered to be present when patients have more than one of the following: a body temperature of greater than 38 °C or less than 36 °C, a heart rate of greater than 90/min, hyperventilation involving a respiratory rate higher than 20/min or PaCO2 lower than 32 mmHg, a white blood cell count of greater than 12000 cells/µl or less than 4000 cells/µl (Bone et al, 1992, Crit Care Med 20: 864-874).
  • SIRS systemic inflammatory response syndrome
  • infection was defined as a pathological process caused by invasion of a normally sterile tissue, fluid or body cavity by pathogenic or potentially pathogenic micro-organisms.
  • Septic shock refers (in adults) to sepsis plus a state of acute circulatory failure characterised by a persistent arterial hypotension unexplained by other causes.
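The consensus SIRS definition quoted above (Bone et al, 1992) is a simple counting rule, and can be sketched as follows. This is only an illustration of the rule as stated in the text; the function and parameter names are hypothetical, and the two respiratory findings are treated as a single criterion, which is an assumption about how the "hyperventilation" item is counted.

```python
def sirs_criteria_met(temp_c, heart_rate, resp_rate, paco2_mmhg, wbc_per_ul):
    """Count how many of the four SIRS criteria quoted in the text are met."""
    criteria = [
        temp_c > 38.0 or temp_c < 36.0,            # body temperature
        heart_rate > 90,                           # beats per minute
        resp_rate > 20 or paco2_mmhg < 32,         # hyperventilation (assumed one criterion)
        wbc_per_ul > 12000 or wbc_per_ul < 4000,   # white blood cell count per microlitre
    ]
    return sum(criteria)


def sirs_present(temp_c, heart_rate, resp_rate, paco2_mmhg, wbc_per_ul):
    """SIRS is considered present when more than one criterion is met."""
    return sirs_criteria_met(temp_c, heart_rate, resp_rate, paco2_mmhg, wbc_per_ul) > 1


if __name__ == "__main__":
    # Hypothetical observations: fever plus tachycardia gives two criteria.
    print(sirs_present(38.6, 104, 18, 40, 9000))
```

A single abnormal value, such as an isolated fever, does not satisfy the rule on its own.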
  • the first generally accepted system was the Acute Physiology and Chronic Health Evaluation score (APACHE, and its refinements APACHE II and III) (Knaus et al, 1985, Crit Care Med 13: 818-829; Knaus et al, 1991, Chest 100: 1619-1636), with the Mortality Prediction Model (MPM) (Lemeshow et al, 1993, JAMA 270: 2957-2963) and the Simplified Acute Physiology (SAPS) score (Le Gall et al, 1984, Crit Care Med 12: 975-977) also being widely used general predictive models.
  • MPM Mortality Prediction Model
  • SAPS Simplified Acute Physiology
  • CRP C-reactive protein
  • VLDL very low density lipoprotein
  • LCCRP lipoprotein complexed C-reactive protein
  • TNF-α and IL-1 are archetypal acute inflammatory cytokines long known to be elevated in sepsis (Damas et al, 1989, Critical Care Med 17: 975-978) and have been reported to be useful predictors of organ failure in adult respiratory distress syndrome, a serious complication of sepsis (Meduri et al, 1995, Chest 107: 1062-1073).
  • C3a Activated complement product C3
  • IL-6 Activated complement product C3 (C3a) and IL-6 have been proposed as useful indicators of host response to microbial invasion, and superior to pyrexia and white blood cell counts (Groeneveld et al, 2001, Clin Diagn Lab Immunol 8: 1189-1195). Secretory phospholipase A2 was found to be a less reliable marker in the same study.
  • Procalcitonin is the propeptide precursor of calcitonin, serum concentrations of which are known to rise in response to LPS and correlate with IL-6 and TNF-α levels. Its use as a predictor of sepsis has been evaluated (Al-Nawas et al, 1996, Eur J Med Res 1: 331-333). Using a threshold of 0.1 ng/ml, it correctly identified 39% of sepsis patients. However, other reports suggest that it is less reliable than the use of serial CRP measurements (Neely et al, 2004, J Burn Care Rehab 25: 76-80), although superior to IL-6 or IL-8 (Harbarth et al, Am J Resp Crit Care Med 164: 396-402).
  • Anasthesie und Intensivophil JJ.: 40-43) provides an example of a neural network model being used for sepsis prediction.
  • This model (MEDAN) analysed a range of standard clinical measures and compared its results with those obtained by using the APACHE II, SOFA, SAPS II, and MODS models. The study concluded that, of the markers available, the most informative were systolic and diastolic blood pressure, and platelet count.
  • Neural networks are non-linear functions that are capable of identifying patterns in complex data systems. This is achieved by using a number of mathematical functions that make it possible for the network to identify structure within a noisy data set. This is because data from a system may produce patterns based upon the relationships between the variables within the data.
  • neural networks are able to predict or classify future examples by modelling the patterns present within the data they have seen.
  • the performance of the network is then assessed by its ability to correctly predict or classify test data, with high accuracy scores indicating that the network has successfully identified true patterns within the data.
  • the parallel processing ability of neural networks is dependent on the architecture of their processing elements, which are arranged to interact according to the model of biological neurones.
  • One or more inputs are regulated by the connection weights to change the stimulation level within the processing element.
  • the output of the processing element is related to its activation level and this output may be non-linear or discontinuous.
  • Training of a neural network therefore comprises an adjustment of interconnected weights depending on the transfer function of the elements, the details of the interconnected structure and the rules of learning that the system follows (Place et al, 1995, Clinical Biochemistry 28: 373-389).
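The processing element and weight adjustment described above can be sketched in a few lines. This is a minimal illustration only: the logistic (sigmoid) transfer function, the delta-rule update and the learning rate are assumptions chosen for the example, since the text does not specify which transfer function or learning rule is used.

```python
import math


def processing_element(inputs, weights, bias=0.0):
    """Weighted inputs set the activation level; the output is a non-linear function of it."""
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-activation))  # assumed sigmoid transfer function


def train_step(inputs, weights, bias, target, rate=0.5):
    """One delta-rule adjustment of the connection weights towards the target output."""
    out = processing_element(inputs, weights, bias)
    delta = (target - out) * out * (1.0 - out)  # error scaled by the sigmoid slope
    new_weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, bias + rate * delta


if __name__ == "__main__":
    weights, bias = [0.0, 0.0], 0.0
    sample, target = [1.0, 0.5], 1.0  # hypothetical input pattern and desired output
    for _ in range(200):
        weights, bias = train_step(sample, weights, bias, target)
    print(processing_element(sample, weights, bias))
```

Repeating the update moves the output of the element towards the target, which is the sense in which training "adjusts the interconnected weights".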
  • Such systems have been applied to a number of clinical situations, including health outcomes models of trauma patients (Marble & Healy (1999) Art Intell Med 15: 299-307).
  • Dybowski et al (1996, Lancet 347: 1146-1150) use Classification and Regression Trees (CART) to select inputs from 157 possible sepsis prediction criteria and then use a neural network running a genetic algorithm to select the best combination of predictive markers. These include many routine clinical values and proxy indicators rather than serum or cell surface biomarkers. However, the problem being addressed is the prognosis of patients who already have a clear diagnosis of sepsis and are already critically ill.
  • a further refinement of the genetic algorithm approach involves the use of Artificial Immune Systems, of which one version is the Artificial Immune Recognition System (AIRS) (Timmis et al, An overview of Artificial Immune Systems.
  • AIRS Immunologically speaking, AIRS is inspired by the clonal selection theory of the immune system (F. Burnett. The Clonal Selection Theory of Acquired Immunity. Cambridge University Press, 1959).
  • the clonal selection theory attempts to explain how, through a process of matching, cloning, mutation and selection, antibodies are created that are capable of identifying infectious agents.
  • AIRS is specifically designed for use in classification, more specifically one-shot supervised learning.
  • US patent application 2002/0052557 describes a method of predicting the onset of a number of catastrophic illnesses based on the variability of the heart-rate of the patient. Again, a neural network is among the possible methods of modelling and analysing the data.
  • the outputs from both multiple logistic regression models and neural networks are continuously variable quantities but the likelihoods calculated by neural network models usually fall at one extreme or the other, with few values in the middle range. In a clinical situation this is often helpful and can give clearer decisions (Flanagan et al, 1996, Clinical Performance & Quality Health Care 4: 96-103).
  • the ability to detect the earliest signs of infection and / or sepsis has clear benefits in terms of allowing treatment as soon as possible. Indications of the severity of the condition and likely outcome if untreated inform decisions about treatment options. This is relevant both in vulnerable hospital populations, such as those in intensive care, or who are burned or immunocompromised, and in other groups in which there is an increased risk of serious infection and subsequent sepsis.
  • the use or suspected use of biological weapons in both battlefield and civilian settings is an example where a rapid and reliable means of testing for the earliest signs of infection in individuals exposed would be advantageous.
  • WO 2006/061644 discloses a method for detecting early signs of infection based on measurement of expression levels of particular combinations of cytokines and/or cellular activation markers. Expression was measured by either cell surface expression as detected by FACS, or at a transcriptional level by RT-PCR, optionally combined with the use of predictive algorithms.
  • CD40 is a TNF-receptor superfamily member expressed on T and B lymphocytes, among other cells, and is required for a wide variety of immune and inflammatory responses, in particular B cell immunoglobulin production and isotype switching, and development of memory B cells (Grewel & Flavell, 1998, Annu Rev Immunol 16: 111). Its ligand is another leukocyte cell surface molecule, CD154. Two alternatively spliced isoforms are known, the longer isoform (1) being encoded by transcript variant 1 (NCBI accession number NM_001250, SEQ ID NO:1).
  • CD5 is also a cell surface receptor expressed on T and B lymphocytes where it interacts with its ligand CD72 and has a role in modulating the immune response (Berland & Wortis, 2002, 20: 253).
  • the cDNA sequence encoding human CD5 has the NCBI accession number NM_014207 (SEQ ID NO:2).
  • CD79A, previously known as MB-1 or Ig-α, is part of the B cell antigen receptor complex together with another similar molecule, CD79B (B29 or Ig-β), and the surface immunoglobulin chains.
  • CD79A and B are involved in signal transduction and B cell surface immunoglobulin expression (Jumaa et al, 2005, Annu Rev Immunol 23: 415).
  • There are two known transcript variants, and the longer transcript sequence is listed at NCBI accession number NM_001783 (SEQ ID NO:3).
  • CRX is the gene for cone-rod homeobox, a homeodomain transcription factor that controls differentiation in photoreceptor cells and is required for normal cone and rod cell function. Mutations in this gene are associated with photoreceptor degeneration (Leber congenital amaurosis type III and autosomal dominant cone-rod dystrophy 2), but no immunological functions are known (Chen et al, 2002, Human Molecular Genetics, JJ.: 873).
  • the cDNA sequence is available at NM_000554 (SEQ ID NO:4).
  • CTNND1 is the gene encoding catenin (cadherin-associated protein) delta-1, a member of the armadillo family of proteins (previously known as p120 cas and p120 catenin). It is one of a number of proteins (others being β-catenin and plakoglobin) that bind to the cytoplasmic region of cadherins, modulating cell adhesion and linking cadherins to the cytoskeleton (Franze & Ridley, 2004, J Biol Chem 279: 6588). Such molecules may also have a role in signal transduction through rho family GTPases.
  • the cDNA sequence is available at NM_001331 (SEQ ID NO:5).
  • CX3CL1 encodes chemokine (C-X3-C motif) ligand 1, an unusual chemokine (previously known as fractalkine) characterised by the unique spacing of the first 2 cysteines in its chemokine cysteine motif and its dual role as a chemoattractant and cell adhesion molecule involved in the inflammatory response. It is expressed as a cell surface molecule but a soluble form is generated by juxtamembrane proteolytic cleavage (Umehara et al, 2004, Arterioscler Thromb Vasc Biol 24: 34). The cDNA sequence is available at NM_002996 (SEQ ID NO:6).
  • ENTPD2 is the gene for ectonucleoside triphosphate diphosphohydrolase 2 (otherwise known as CD39L or NTPDase-2).
  • ENTPD5 is the related ectonucleoside triphosphate diphosphohydrolase 5 (CD39L4 or NTPDase-5). These molecules are cell surface ATP-hydrolyzing enzymes responsible for the breakdown of extracellular nucleotides, thus regulating a complex system of cell signalling via large families of purine and pyrimidine receptors.
  • ENTPD2 exists in a number of splice variants, which may have distinct functions (Wang et al, 2005, Biochem J 385: 729).
  • a long isoform is encoded by the cDNA sequence of NM_203468 (SEQ ID NO:7).
  • NM_001246 encodes a shorter isoform with a truncated C-terminus.
  • the ENTPD5 sequence is available at NM_001249 (SEQ ID NO:8).
  • EPHA8 is a gene encoding the ephrin A8 receptor, a member of the ephrin receptor subfamily of receptor tyrosine kinases.
  • the ephrin A8 receptor functions as a receptor for ephrin A2, A3 and A5 and is involved in short-range contact-mediated axonal guidance during development of the nervous system (Gu et al, 2005, Oncogene 24: 4243).
  • GPR44 encodes G protein-coupled receptor 44, more widely known as chemoattractant receptor-homologous molecule expressed on Th2 cells (CRTH2).
  • the sequence is available at NM_004778 (SEQ ID NO:10).
  • HDAC5 is histone deacetylase 5, a class II histone deacetylase that represses transcription when tethered to a promoter. Histone acetylation/deacetylation alters chromatin structure and is a major factor controlling gene expression. HDAC5 is thought to interact with MEF2 family proteins and may play a role in myogenesis (Zhang et al, 2002, Mol Cell Biol 22: 7302). There are two known isoforms encoded by two splice variants. NM_001015053 relates to the longer transcript (SEQ ID NO:11).
  • HMMR hyaluronan-mediated motility receptor
  • RHAMM hyaluronan-mediated motility receptor
  • NM_012484 represents the longest transcript (SEQ ID NO:12).
  • IL-8 is very widely known as a member of the CXC family of chemokines and is a prime mediator of the inflammatory response, being a potent chemotactic and angiogenic factor. It has been reported to be a relatively poor predictor of sepsis (Harbarth et al, Am J Resp Crit Care Med 164: 396). The sequence is available at NM_000584 (SEQ ID NO:13).
  • MAP1A encodes microtubule-associated protein 1A, a member of a family of microtubule-associated proteins involved in microtubule assembly. MAP1A is expressed predominantly in the brain.
  • the functional protein comprises light and heavy chains resulting from proteolytic processing of a single propeptide encoded by the sequence of NM_002373 (SEQ ID NO:14).
  • MAPK7 is the gene encoding mitogen-activated protein kinase 7 (MAP kinase 7 or ERK5).
  • the MAP kinases occupy a central role in the intracellular signalling cascades from a number of receptor tyrosine kinases and G protein-coupled receptors but MAPK7 differs from the others in that it not only has protein kinase activity but is also capable of translocating to the nucleus where it appears to be able to phosphorylate and activate transcription factors directly (Buschbeck & Ullrich, 2005, J Biol Chem 280: 2659).
  • Four alternative transcripts encoding two distinct isoforms have been reported. The longest transcript is represented by the sequence of NM_002749 (SEQ ID NO:15).
  • MEF2D is the gene for MADS box transcription enhancer factor 2, polypeptide D (myocyte enhancer factor 2D). Originally described as a muscle-specific transcription factor, MEF2 is now known to exist as four alternatively spliced isoforms (A-D) that are differentially expressed in a range of tissues (Zhu et al, J Biol Chem, 2005, 280: 28749). MEF2D appears to be involved in leukocyte activation and chromosomal translocations resulting in MEF2D fusion proteins contribute to the development of some acute lymphoblastic leukaemias (Prima et al, 2005, Leukemia 19: 806). The MEF2D sequence is available as NM_005920 (SEQ ID NO:16).
  • ODF1 is outer dense fibre of sperm tails 1 and encodes the major protein of the outer dense fibre layer surrounding the axoneme of sperm tails. Defects in the outer dense fibres lead to abnormal sperm morphology and infertility. There is no known connection with genes involved with the inflammatory response.
  • the sequence is available as NM_022410 (SEQ ID NO:17).
  • SAA3P denotes the serum amyloid A3 pseudogene.
  • the serum amyloid A (SAA) superfamily consists of two acute phase genes, SAA1 and SAA2 and a constitutively expressed gene, SAA4.
  • SAA3P appears to be a non-expressed pseudogene.
  • the predicted open reading frame contains an insertion causing a frameshift, which generates a premature stop codon.
  • the resultant hypothetical protein has been expressed.
  • the genomic sequence is available as NG_002634 (SEQ ID NO:18).
  • SLC6A9 is solute carrier family 6 (neurotransmitter transporter, glycine) member 9 (GLYT1).
  • GLYT1 neurotransmitter transporter, glycine
  • a member of a large superfamily of transporter proteins, SLC6A9 is a sodium:glycine symporter, which may be involved in inhibitory glycinergic neurotransmission.
  • SPN is the gene for CD43 (leukosialin, sialophorin).
  • Leukosialin is a major sialoglycoprotein of most leukocytes. It appears to play a part in modulating cell-cell interactions, including T cell activation (Daniels et al, 2002, Nature Immunol 3: 903).
  • the cDNA sequence is available at NM_003114 (SEQ ID NO:20).
  • TDGF1 is teratocarcinoma-derived growth factor 1 (previously known as Cripto). It is a cell surface, glycosyl phosphatidylinositol (GPI)-anchored molecule, a member of the EGF-CFC family of growth factor-like molecules (Shen, 2003, J Clin Invest 112: 500). It is over-expressed in a wide range of carcinomas but is not known to have a role in inflammation or the immune response.
  • the cDNA sequence is at NM_003212 (SEQ ID NO:21).
  • TSC22D1 is TSC22 domain family member 1. It is the founding member of the TSC22 family of early response gene transcription factors and is particularly involved in the TGF-β signalling pathway (and was formerly known as TGF-β1-induced transcript 4, TGFB1I4) (Gupta et al, 2003, J Biol Chem 278: 7331). The accession number is NM_006022 (SEQ ID NO:22).
  • the invention describes a system and methods of detecting early signs of infection, SIRS or sepsis several days before clinical signs become apparent. It also provides methods capable of predicting the timing of the clinical course of the condition.
  • the system comprises analysing the results of one or more sets of tests based on biological samples, preferably blood samples. Optionally, other routine clinical measurements may be included for analysis.
  • the method comprises determining the level of expression of biomarkers shown herein to be positively correlated to developing sepsis.
  • genes there is a diverse group of genes, many of which have no established connection to the development of acute inflammation or the initiation of an immune response, including CRX, CTNND1, CX3CL1, ENTPD2, ENTPD5, EPHA8, GPR44, HDAC5, HMMR, MAP1A, MAPK7, MEF2D, ODF1, SAA3P, SLC6A9, TDGF1 and TSC22D1. Measuring the expression levels of these genes in combination surprisingly provides early predictive and prognostic information as to the likelihood of sepsis developing in an individual exposed to infective agents.
  • a further group of biomarkers are chemokines or cytokines expressed in blood leukocytes, or are leukocyte surface receptors with an established role in immune function.
  • This group includes CD178 (FAS-L), MCP-1 (monocyte chemotactic protein-1), TNF-α, IL-1 ⁇ , IL-6, IL-8, IL-10, INF- ⁇ , INF- ⁇ , CD5, CD79A.
  • CD178 is encoded and expressed as a type-II membrane protein, but may be considered as a cytokine since it is cleaved by a metalloprotease to release a soluble homotrimer, soluble FasL or sFasL.
  • expression of biomarkers is by specific amplification of mRNA by reverse transcription polymerase chain reaction (RT-PCR).
  • RT-PCR reverse transcription polymerase chain reaction
  • the method involves screening a biological sample to detect early stages of infection, SIRS or sepsis comprising the steps of: a. detecting expression of a first set of informative biomarkers by RT-PCR and detecting expression of a second set of informative biomarkers by RT-PCR; b. analysing the results of detection; and c.
  • the first set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 1 selected from the list consisting of CD40, CD5, CD79A, CRX, CTNND1, CX3CL1, ENTPD2, ENTPD5, EPHA8, GPR44, HMMR, IL-8, MAP1A, MAPK7, MEF2D, ODF1, SAA3P, SLC6A9, SPN, TDGF1, TSC22D1 and HDAC5 and wherein the second set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 1 selected from the list consisting of CD178, MCP-1, TNFα, IL-1 ⁇ , IL-6, IL-10, INF- ⁇ , INF- ⁇ .
  • the first set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 2, more preferably 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, or 22 selected from the list. It will be understood that, in general, the greater the number of markers, the greater the accuracy of the prediction. Set against this is the greater complexity and time taken for the analysis.
  • markers are SAA3P, MAP1A, EPHA8, CD40, IL-8, CRX, SPN, MEF2D, MAPK7, HMMR, ENTPD2 and TSC22D1.
  • markers providing high levels of prediction of onset of sepsis are as follows: a. SPN, CD40, SAA3P, IL-8, MEF2D, EPHA8, MAP1A and CRX. b. MEF2D, ENTPD2, CD40, EPHA8, SAA3P, HMMR, IL-8, MAPK7, SPN and TSC22D1. c. SPN, EPHA8, CD40, CRX, TSC22D1, MAPK7, HMMR, MEF2D and ENTPD2. d. HMMR, SAA3P, ENTPD2, MAP1A, MAPK7, TSC22D1 and IL-8. e.
  • biomarkers from the second set of informative biomarkers are detected.
  • 3, 4, 5, 6, 7 or 8 biomarkers from the second set of biomarkers are detected in combination with at least one biomarker from the list consisting of CD40, CD5, CD79A, CRX, CTNND1, CX3CL1, ENTPD2, ENTPD5, EPHA8, GPR44, HMMR, IL-8, MAP1A, MAPK7, MEF2D, ODF1, SAA3P, SLC6A9, SPN, TDGF1, TSC22D1 and HDAC5.
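The requirement that a panel draws at least one marker from each of the two lists can be checked mechanically. The sketch below is illustrative only: `check_panel` and both constant names are hypothetical, gene symbols are written in normalised form (for example MAP1A), and `SECOND_SET_SUBSET` deliberately holds only part of the second list rather than a full transcription of it.

```python
# The 22 markers of the first list, as named in the text.
FIRST_SET = frozenset({
    "CD40", "CD5", "CD79A", "CRX", "CTNND1", "CX3CL1", "ENTPD2", "ENTPD5",
    "EPHA8", "GPR44", "HMMR", "IL-8", "MAP1A", "MAPK7", "MEF2D", "ODF1",
    "SAA3P", "SLC6A9", "SPN", "TDGF1", "TSC22D1", "HDAC5",
})

# Assumption: an illustrative subset of the second list, not the complete list.
SECOND_SET_SUBSET = frozenset({"CD178", "MCP-1", "IL-6", "IL-10"})


def check_panel(panel, first_set=FIRST_SET, second_set=SECOND_SET_SUBSET):
    """Split a proposed panel by list and require at least one marker from each."""
    chosen = set(panel)
    from_first = sorted(chosen & first_set)
    from_second = sorted(chosen & second_set)
    unknown = sorted(chosen - first_set - second_set)
    valid = bool(from_first) and bool(from_second) and not unknown
    return {"valid": valid, "first": from_first,
            "second": from_second, "unknown": unknown}


if __name__ == "__main__":
    print(check_panel(["SPN", "CD40", "IL-6"]))
```

Larger panels pass the same check; the trade-off noted in the text between accuracy and analysis time is a matter of how many names are supplied.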
  • analysis of the results yields a prediction of a probability of clinical SIRS or sepsis developing.
  • the analysis may be expressed as a binary yes/no prediction of clinical SIRS or sepsis developing.
  • the results are subjected to a second analysis to determine the likely timing and/or severity of the clinical disease.
  • results of one or more sets of tests are analysed, preferably by means of a neural network program enabling a yes/no prediction of the patient from whom the sample was taken developing sepsis to be calculated.
  • both analyses may be performed by multivariate logistic regression.
  • Analysis of the test groups can be performed individually or simultaneously.
  • further clinical data are entered into the neural net as supplementary data to the PCR data.
  • flow cytometry data can be processed by the neural network. Only one set of data is required for processing through the neural net although there are advantages in inputting one, two or all three data sets as these additional examples help "train" the neural net and improve confidence in the output from the program.
  • the neural network is used to process pre-recorded clinical data or a database of such data may be used to train the neural network and improve its predictive power.
  • Suitable clinical data include at least one, preferably at least three, more preferably at least five selected from the list consisting of temperature, heart rate, total and differential white blood cell count (monocytes, lymphocytes, granulocytes, neutrophils), platelet count, serum creatinine, urea, lactate, base excess, pO2, HCO3-, and C-reactive protein.
  • the method may be used as part of routine monitoring for intensive care patients, where regular blood samples are taken for other purposes.
  • Other hospital patients who may be predisposed to infections and/or sepsis may also be monitored.
  • predisposing conditions include inherited or acquired immunodeficiencies (including HIV/AIDS) or immunosuppression (such as general surgery patients, transplant recipients or patients receiving steroid treatment), diabetes, lymphoma, leukaemia or other malignancy, penetrating or contaminated trauma, burns or peritonitis.
  • the method of the invention may be used to screen individuals during an outbreak of infectious disease or alternatively individuals who have been, or who are suspected of having been, exposed to infectious pathogens, whether accidentally or deliberately as the result of bioterrorism or of use of a biological weapon during an armed conflict.
  • this is expressed as a probability. In an alternative embodiment it is expressed as a binary yes/no result.
  • the first analysis suggests that SIRS or sepsis are probable (as defined as exceeding a predetermined arbitrary threshold probability, or a 'yes' prediction)
  • said results are subjected to a second analysis to determine the likely time to development of overt clinical signs, or to give an indication of probable severity of the clinical disease.
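The two-stage flow described above, a first analysis giving a probability or yes/no result followed by a second analysis only when the first is positive, can be sketched as below. The names `first_analysis` and `screen`, the default threshold of 0.5 and the placeholder second analysis are all assumptions for illustration; the text only says the threshold is predetermined and arbitrary.

```python
def first_analysis(probability, threshold=0.5):
    """Turn a predicted probability into a binary yes/no prediction."""
    return probability > threshold


def screen(probability, second_analysis, threshold=0.5):
    """Run the second analysis only when the first suggests SIRS or sepsis is probable."""
    if not first_analysis(probability, threshold):
        return {"prediction": "no", "detail": None}
    # second_analysis is any callable estimating timing and/or severity.
    return {"prediction": "yes", "detail": second_analysis()}


if __name__ == "__main__":
    # Hypothetical stand-in for a timing/severity model.
    def estimate():
        return {"days_to_onset": 2}

    print(screen(0.82, estimate))
    print(screen(0.10, estimate))
```

Passing the second analysis as a callable keeps it from running at all for negative first-stage results.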
  • said analysis is by means of a neural network.
  • a neural network is a multilayered perceptron neural network
  • such a neural network is capable of correctly predicting SIRS or sepsis in greater than 70% of cases (determined in trials where such development is not prevented by prophylactic treatment in a control group), more preferably in at least 80% of cases, even more preferably in at least 85% of cases and most preferably in at least 95% of cases.
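The accuracy levels listed above are simple proportions of correctly predicted cases. As a small worked sketch, with hypothetical function names and made-up example data, the proportion and the highest preference level it reaches could be computed like this:

```python
def accuracy(predictions, outcomes):
    """Fraction of cases in which the prediction matched the observed outcome."""
    if len(predictions) != len(outcomes) or not outcomes:
        raise ValueError("need two equal-length, non-empty sequences")
    correct = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    return correct / len(outcomes)


def preference_level(acc):
    """Highest level from the text that is met: >70%, >=80%, >=85% or >=95%."""
    for level in (0.95, 0.85, 0.80):
        if acc >= level:
            return level
    return 0.70 if acc > 0.70 else None


if __name__ == "__main__":
    predicted = [1, 1, 0, 0, 1]   # hypothetical yes/no predictions
    observed = [1, 0, 0, 0, 1]    # hypothetical clinical outcomes
    acc = accuracy(predicted, observed)
    print(acc, preference_level(acc))
```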
  • SIRS or sepsis can be predicted at least one day before the onset of overt clinical signs, more preferably at least two days, still more preferably at least three days and most preferably more than three days before SIRS or sepsis is diagnosed.
  • analysis is by means of multivariate statistical analysis, preferably comprising principal component analysis and/or discriminant function analysis. It is more preferred that the multivariate statistical analysis comprises discriminant function analysis.
  • the invention provides a system for screening a biological sample to detect early stages of infection, SIRS or sepsis comprising: a means of extracting and purifying RNA from cells obtained from said sample, a thermal cycler or other means to amplify selected RNA sequences by means of reverse transcription polymerase chain reaction (RT-PCR), a means of detecting and quantifying the results of said RT-PCR, a computer-based neural network trained so as to be able to analyse such results and a display means whereby the conclusion of the neural network analysis may be communicated to an operator. Note: in this aspect the results of the RT-PCR may be analysed using discriminant function analysis, but the neural network is the preferred embodiment.
  • RT-PCR reverse transcription polymerase chain reaction
  • the invention provides analysis according to any embodiment of the method described above for the preparation of a diagnostic means for the diagnosis of SIRS, sepsis or infection, or the use of the system described above for the preparation of a diagnostic means for the diagnosis of infection.
  • these consist of at least one, preferably at least three, more preferably at least five selected from the list consisting of temperature, heart rate, total and differential white blood cell count (monocytes, lymphocytes, granulocytes, neutrophils), platelet count, serum creatinine, urea, lactate, base excess, pO2, HCO3-, and C-reactive protein
  • Figure 2 A plot of the CD31 expression measured on granulocytes by flow cytometry.
  • Figure 3 Design of neural network analysing clinical data according to Table 4, model
  • WCC white cell count
  • CRP C-reactive protein
  • Figure 4 Change in cytokine profile obtained following in vitro blood infection with S. aureus. Data from blood taken from three volunteers as detailed in Example 8.
  • Figure 5 Results of neural network analysis of S. aureus in vitro sepsis model.
  • Example 1 Prediction of sepsis by neural network analysis of cytokine expression, cell surface markers and clinical measures.
  • RT-PCR was performed according to commonly-used laboratory techniques. Briefly, in the case of a blood sample, whole blood was taken and cells then lysed in the presence of an RNA stabilising reagent. RNA was separated by affinity binding of beads, which were isolated by centrifugation (or magnetically, as appropriate), contaminating DNA removed by DNase digestion and the RNA subjected to RT-PCR.
  • Fluorescence activated cell sorting flow cytometry is very well-known in the art and any standard technique may be used.
  • Table 2 shows an example of a successful model that classified or "scored" 29/35 (or 82.9%) test patients correctly.
  • Table 2. Classification readout using cytokine mRNA variables (Days 1 to 4)
  • Table 4 lists the averaged prediction accuracy values for a range of networks constructed using differing combinations of variables.
  • the most successful model was constructed using cytokine mRNA expression combined with CD31 % expression from the flow cytometry data (average 81.0% accuracy, Table 3, Model 1) with clinical data also scoring highly (80.4%, Table 3, Model 2).
  • Table 6 shows the results of testing a group of volunteers by cytokine RT-PCR, none of whom developed signs of SIRS or sepsis.
  • Example 3 Neural network sepsis prediction of more than 90% accuracy using Clinical Data.
  • the initial AIRS system (A. Watkins. An Artificial Immune Recognition System. Mississippi State University: MSc Thesis, 2001) employed simple real-value shape space. Recently, others have extended the representation to Hamming shape space (J. Hanamaker and L. Boggess. The effect of distance metrics on AIRS. In Proc. of Congress on Evolutionary Computation (CEC). IEEE, 2004) and natural language (D. Goodman, L. Boggess and A. Watkins. "An investigation into the source of power for AIRS, an artificial immune classification system". In Proc. Int. Joint Conference on Neural Networks, pp. 1678-1683. IEEE, 2003).
  • AIRS maintains a set of Artificial Recognition Balls (ARBs) that contain a vector of the data being learnt, a stimulation level and a number of resources.
  • ARBs Artificial Recognition Balls
  • the stimulation level is calculated by assessing the affinity of the data vector in the ARB against a training item, the stronger the match, the greater the stimulation. This stimulation level is used to dictate how many clones the ARB will produce, and affects survival of the ARB.
  • AIRS evolves with two populations, a memory pool M and an ARB pool C. It has a separate training and test phase, with the test phase being akin to a k-nearest neighbour classifier.
  • The memory set M can be seeded randomly, and experimental evidence suggests that AIRS is insensitive to the initial starting point.
  • the training item is matched against all memory cells in the set M, and a single cell is identified as the highest match, MCmatch. This MCmatch is then cloned and mutated. Cloning is performed in proportion to stimulation (the higher the stimulation, the higher the clonal rate), and mutation is inversely proportional (the higher the stimulation, the lower the mutation rate).
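By way of illustration only, the cloning and mutation step described above may be sketched as follows. The sketch assumes features scaled to the range 0 to 1, a stimulation defined as one minus the normalised Euclidean distance, and a hypothetical `clonal_rate` parameter; none of these specifics is stated in the source, and the function names are illustrative.

```python
import math
import random

def stimulation(cell, item):
    """Assumed affinity measure: 1 minus normalised Euclidean distance (features in [0, 1])."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(cell, item)))
    return 1.0 - distance / math.sqrt(len(cell))

def clone_and_mutate(cell, item, clonal_rate=10, seed=0):
    """Clone in proportion to stimulation; mutate in inverse proportion to it."""
    rng = random.Random(seed)
    stim = stimulation(cell, item)
    n_clones = max(1, round(clonal_rate * stim))   # more clones for a stronger match
    mutation_range = 1.0 - stim                    # smaller mutations for a stronger match
    clones = []
    for _ in range(n_clones):
        clone = [min(1.0, max(0.0, v + rng.uniform(-mutation_range, mutation_range)))
                 for v in cell]
        clones.append(clone)
    return stim, clones
```

A memory cell close to the training item therefore yields many lightly perturbed clones, whereas a distant cell yields fewer, more heavily perturbed ones.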
  • This process is performed for each training item, whereupon the memory set will contain a number of cells capable of being used for classification. Classification of an unseen data item is performed in a k-nearest neighbour fashion.
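The k-nearest neighbour classification of an unseen data item against the memory set may be sketched as below. This is a minimal illustration, assuming Euclidean distance and a simple majority vote; the memory cells, labels and value of `k` shown in the usage note are hypothetical.

```python
import math
from collections import Counter

def classify_knn(memory_cells, item, k=3):
    """Label an unseen item by majority vote among its k nearest memory cells.

    memory_cells is a list of (feature_vector, label) pairs.
    """
    ranked = sorted(memory_cells, key=lambda cell: math.dist(cell[0], item))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with two "control" cells near the origin and three "sepsis" cells near (5, 5), an item at (5.2, 5.1) would be assigned to the "sepsis" class when k is 3.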
  • AIRS identifies a high percentage of sepsis cases (being able to outperform the neural network on day 5 and day 6, but again with the comparative caveat).
  • the control group did not fare as well, giving a lower than expected result, significantly lower than the neural network approach. This may be because AIRS is biased towards the sepsis patients, owing to the larger amount of training data available for those than for non-sepsis patients.
  • AIRS appears capable of identifying potential cases of sepsis in advance, and is comparable at a certain level to neural network approaches.
  • Flow Cytometry: Blood was collected into sodium heparin containers (HM&S, Chessington, Surrey) and transported to the laboratory at room temperature. 100 μl aliquots of blood were mixed with immunofluorescent stains using the volumes recommended by the manufacturer (Beckman Coulter Limited, High Wycombe, Buckinghamshire, and Becton Dickinson UK Limited, Cowley, Oxford). T helper cells were identified by co-staining for CD3 and CD4 and T cytotoxic cells were identified by staining for both CD3 and CD8. These cell populations were stained for HLA-DR, CD25, CD54 and CD69. B cells were identified by staining for CD19 and were interrogated with CD80, CD86, CD25 and CD54.
  • Natural killer cells were distinguished by staining with CD56 and interrogated with CD11b, CD25, CD54 and CD69.
  • the monocyte population was selected by staining for CD14; these cells were probed with CD11b, CD54, CD80, CD86 and HLA-DR stains. Gating was used in order to identify the granulocyte population, which was stained for CD11b, CD69, CD31, CD54 and CD62L. The stains were incubated at room temperature for 20 minutes. 500 μl of Optilyse C (Beckman Coulter Limited) was added to each tube and vortex mixed immediately.
  • the samples were incubated at room temperature for 10 minutes to lyse the red blood cells and 500 μl of Isoton (Beckman Coulter Limited) were then added in order to fix the stains.
  • the tubes were vortex mixed immediately and incubated at room temperature for 10 minutes. The cells were then counted on a Beckman Coulter Epics XL System 2 Flow Cytometer.
  • Multivariate data analysis procedures were applied to data collected from patients 1-6 days prior to development of symptoms of sepsis. Measurements included flow cytometry, PCR and classical clinical observations.
  • Principal component analysis (PCA) was applied to the data matrix considering each of the three classes of observations individually and combined as a complete data set.
  • DFA Discriminant Function Analysis
  • PCA is a dimensionality reducing technique which endeavours to decompose a multivariate data matrix into a few latent variables, composed of linear combinations of the variables, which explain the bulk of the variance of the original matrix. In this way correlations (positive or negative) of parameters within the data set can be established.
  • DFA is similar in approach to Analysis of Variance (ANOVA).
  • ANOVA Analysis of Variance
  • the DFA problem can be rephrased as a one-way analysis of variance (ANOVA) problem. Specifically, one can ask whether two groups are significantly different from each other with respect to the mean of a particular variable. It should be clear that, if the means for a variable are significantly different in different groups, then it may be concluded that this variable discriminates between the groups.
  • F is essentially computed as the ratio of the between-groups variance in the data over the pooled (average) within-group variance. If the between-group variance is significantly larger then there must be significant differences between means.
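The F ratio described above may be illustrated with a short sketch for a single variable measured in several groups. This is the standard one-way ANOVA calculation rather than the specific software routine used in the study, and the function name is illustrative.

```python
def one_way_f(groups):
    """F = between-groups mean square / pooled within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means about the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations about their own group mean, pooled over groups.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

A large F indicates that the group means differ by more than would be expected from the scatter within the groups, which is the sense in which a variable "discriminates" between groups.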
  • DFA was performed on clinical, flow cytometry and RT-PCR data using the complete data matrix (including substituted mean values) and by exclusion of data points for which one or more parameters contained substituted mean values.
  • Analogous models were developed to allow analysis of PCA scores from models developed in Model 1. The purpose of the latter was to establish if transformed data matrices (PCA) could be used to classify observations.
  • PCs derived and used in a given model are usually defined as those having an Eigenvalue of >1. Six PCs meet this criterion for clinical data and explain a total of 74.3% of the variance of the data set. Since each of the PCs is orthogonal (uncorrelated) with respect to the other PCs, the association of a clinical parameter with a particular PC defines the PC and illustrates how the parameter influences the variance of the data set.
  • Table 12 summarises the loadings of each parameter with the six derived PCs from the clinical data. Loading values of >0.5 indicate a strong contribution of a particular parameter to a given PC.
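The selection of PCs by the Eigenvalue >1 criterion, and the proportion of variance they explain, may be sketched as follows. The eigenvalues in the example are hypothetical and are not those of the clinical data; for a correlation matrix the eigenvalues sum to the number of variables, which the example respects.

```python
def kaiser_selection(eigenvalues):
    """Return (number of PCs with eigenvalue > 1, fraction of variance they explain)."""
    ordered = sorted(eigenvalues, reverse=True)
    retained = [ev for ev in ordered if ev > 1.0]
    total = sum(ordered)  # equals the number of variables for a correlation matrix
    return len(retained), sum(retained) / total

# Hypothetical eigenvalues for six standardised variables (they sum to 6).
example = [2.5, 1.5, 1.1, 0.5, 0.3, 0.1]
n_pcs, explained = kaiser_selection(example)
print(n_pcs, round(explained, 3))
```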
  • the PCs derived from the data set may be interpreted as follows:
  • PC1 this is dominated by the strong correlation of WCC, monocytes, neutrophils and platelets.
  • BXS and HCO3- are highly correlated and contribute to PC1 and PC2 equally. The latter parameters are contrasted by creatinine and lactate in PC1.
  • PC2 shows a negative correlation between the group composed of WCC, monocytes, neutrophils and platelets and the group composed of BXS and HCO3-
  • PC5: pO2 is contrasted with both urea and MAP in this component.
  • the PC structure may be interpreted as follows:
  • PC1 correlates CD3 CD4 CD25 in CD3 CD4, CD3 CD4 HLA-DR in CD3 CD4, CD3 CD8 CD25 in CD3 CD8, CD19 CD80 in CD19, CD19 CD86 in CD19, CD14 CD80 in
  • CD54 in CD56 in CD56.
  • PC2 contrasts CD19 CD86 in CD19 with CD14 HLA-DR in CD14, CD14 HLA-DR CD11B in CD14, CD14 HLA-DR CD11B CD54 in CD14.
  • CD8 CD3 CD8 CD69 in CD3 CD8
  • CD31 (%), CD54 (%), CD62L (%), CD11B (%), CD69 (%) and CD11B CD69 (%) only be subjected to statistical analysis (fc 24-fc 29).
  • the eigenvalue matrix of the selected flow cytometry variables is shown in Table 17 and loadings of the PCA model constructed summarised in Table 18.
  • the use of 3 PCs explains 76.6% of the variance of the data set.
  • the PC model shows the following:
  • PC1 correlates CD69 (%) and CD11B CD69 (%)
  • PC2 correlates CD31 (%), CD62L (%) and CD11B (%)
  • PC3 is composed of the variance associated with CD54 (%)
  • Table 19 indicates that 72.9% of the variance of the RT-PCR data is explained by only 3 PCs.
  • the loadings for this model are shown in Table 20.
  • the correlation of variables with each PC are shown in Figures 23 and 24 and reveal the following:
  • PC1 correlates Fas-L, MCP-1, TNF-alpha, IL-6 and IL-8
  • PC2 correlates IL-1 and IL-10
  • PC3 contrasts IL-1 and IL-10
  • PCA model based on combined clinical, flow cytometry and RT-PCR data
  • Table 21 summarises the parameters included in this final model and the associated Eigenvalues for the correlation matrix. Table 21 indicates that 9 PCs have an Eigenvalue greater than 1 which explain 68.7% of the data variance. The loadings of the model are shown in Table 22. Analysis allows the following interpretation of the PCA model: PC1 shows positive correlation between WCC, Neutrophils, Monocytes, APTR, HCO3-,
  • TNF-alpha TNF-alpha
  • PC3 strongly correlates the PCR parameters Fas-L, MCP-1, TNF-alpha, IL-6 and IL-8
  • PC4 contrasts CRP and IL-10.
  • PC5 correlates the flow cytometry parameters CD31 (%), CD54 (%), CD62L (%) and CD69
  • PC6 correlates CD62L (%) and HR.
  • PC7 is associated with Temp.
  • PC8 is associated with IL-1
  • PC9 is associated with PO2
  • Model 2 Discriminant Function Analysis (DFA) based on observations and PCA score data
  • DFA Discriminant Function Analysis
  • Model The object of the analysis is to build a "model" of how to best predict to which group a case belongs.
  • the term “in the model” will be used in order to refer to variables that are included in the prediction of group membership, and “not in the model” if they are not included.
  • In stepwise discriminant function analysis, a model of discrimination is constructed step-by-step. Specifically, at each step all variables are reviewed and evaluated to establish which one will contribute most to the discrimination between groups. That variable will then be included in the model.
  • the stepwise procedure is "guided" by the respective F to enter and F to remove values.
  • the F value for a variable indicates its statistical significance in the discrimination between groups, that is, it is a measure of the extent to which a variable makes a unique contribution to the prediction of group membership.
  • the programme continues to choose variables to be included in the model, as long as the respective F values for those variables are larger than the user-specified F to enter; and excludes (removes) variables from the model if their significance is less than the user-specified F to remove.
  • the tolerance value of a variable is computed as 1 - R² of the respective variable with all other variables in the model.
  • the tolerance is a measure of the respective variable's redundancy.
  • a tolerance value of .10 means that the variable is 90% redundant with the other variables in the model.
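The tolerance calculation may be illustrated for the simplest case, in which only one other variable is already in the model, so that R² reduces to the squared Pearson correlation between the two variables. This simplification is an assumption made for brevity; with several variables in the model, R² would come from a multiple regression. The function names are illustrative.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def tolerance_single(candidate, in_model):
    """Tolerance = 1 - R^2, with R^2 = r^2 when a single variable is in the model."""
    r = pearson_r(candidate, in_model)
    return 1.0 - r * r
```

A candidate variable that is an exact linear function of the variable already in the model has a tolerance of essentially zero, that is, it is fully redundant.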
  • Wilks' λ gives a measure of the discriminatory power of the model and can assume values in the range of 0 (perfect discrimination) to 1 (no discrimination).
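The behaviour of this 0-to-1 measure, which in discriminant analysis is Wilks' λ, may be illustrated for a single variable, where it reduces to the ratio of the within-group sum of squares to the total sum of squares. The multivariate form uses a ratio of determinants of the corresponding matrices; the univariate sketch below is an assumption made to keep the example short.

```python
def wilks_lambda_univariate(groups):
    """Within-group sum of squares divided by total sum of squares."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return ss_within / ss_total
```

Groups with no internal scatter and different means give 0 (perfect discrimination), whereas groups with identical means give 1 (no discrimination).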
  • a common misinterpretation of the results of stepwise discriminant analysis is to take statistical significance levels at face value.
  • When the programme decides which variable to include or exclude in the next step of the analysis, it actually computes the significance of the contribution of each variable under consideration. Therefore, by nature, the stepwise procedures will capitalize on chance because they "pick and choose" the variables to be included in the model so as to yield maximum discrimination.
  • When using the stepwise approach, awareness must be maintained that the significance levels do not reflect the true alpha error rate, that is, the probability of erroneously rejecting H0 (the null hypothesis that there is no discrimination between groups).
  • CCA Canonical Correlation Analysis
  • root is used to describe the individual discriminant functions (DFs).
  • the statistical significance of the derived DFs is tested by the χ² test of successive DFs.
  • a report of the step-down test of all canonical roots is obtained: the first line contains the significance of all DFs, the second line reports the significance of the remaining roots after removing the first root, and so on.
  • the number of DFs to interpret is obtained.
  • Raw means that the coefficients can be used in conjunction with the observed data to compute (raw) discriminant function scores.
  • the standardized coefficients are the ones that are customarily used for interpretation, because they pertain to the standardized variables and therefore refer to comparable scales.
  • Table 19 summarises the results of this analysis.
  • the Wilks' λ value of 0.4 indicates a relatively inefficient classification model.
  • the three derived DFs account for a total of 89.9% of the variance of the data set and the DFs are composed mainly of PCs 1, 3 and 4.
  • the factor structure coefficients indicate that:
  • DF1 is composed of the variance explained by PC1 and to a lesser extent with PC4
  • A summary of the classification of this model and its discriminative nature in relation to the PCs is shown in Table 25.
  • the classification matrix for the model is shown in Table 26. Table 26 suggests a good classification can be obtained for control and 6 day data with 80 and 83 % respectively of observations being classified correctly. However the overall classification power of the model is poor with only 48 % of all observations being correctly classified.
  • Table 24 summarises the results of this analysis.
  • the Wilks' λ value of 0.45 indicates a relatively inefficient classification model.
  • the three derived DFs account for a total of 95 % of the variance of the data set and the DFs are composed mainly of PCs 1, 3 and 5.
  • the factor structure coefficients indicate that:
  • DF1 is composed of the variance explained by the negative correlation between PC1 and PC5
  • PCs is shown in Table 25.
  • the classification matrix for the model is shown in Table 26.
  • Table 26 suggests a good classification can be obtained for control and 6 day data with 83 and 67 % respectively of observations being classified correctly. However the overall classification power of the model is poor with only 44 % of all observations being correctly classified, less than that using mean substituted variables.
  • Table 27 summarises the results of this analysis.
  • the Wilks' λ value of 0.22 is an improvement on the PCA scores classification models.
  • the five derived DFs account for a total of 99 % of the variance of the data set and the DFs are composed of BXS, CRP, lactate, urea, temperature, creatinine, neutrophils, pO2 and HCO3-, with the other clinical variables having no influence on the classification of observations.
  • the factor structure coefficients indicate that:
  • A summary of the classification of this model and its discriminative nature in relation to the clinical variables is shown in Table 31.
  • the classification matrix for the model is shown in Table 32. Table 32 suggests a good classification can be obtained for control and 6 day data with 80 and 83 % respectively of observations being classified correctly. Days 1, 2 and 5 are greatly improved compared to the PCA scores models but the overall classification power of the model is poor with 55 % of all observations being correctly classified.
  • Table 33 summarises the results of this analysis.
  • the Wilks' λ value of 0.39 indicates a relatively inefficient classification model.
  • the two derived DFs account for a total of 71% of the variance of the data set and the DFs are composed mainly of PCs 1, 5 and 5.
  • the factor structure coefficients indicate that:
  • A summary of the classification of this model and its discriminative nature in relation to the PCs is shown in Table 34.
  • the classification matrix for the model is shown in Table 35. Table 35 suggests a reasonable classification can be obtained for control and 6 day data with 66% of observations being classified correctly in both groups. However the overall classification power of the model is poor with only 44 % of all observations being correctly classified.
  • iv) DFA model based on flow cytometry data
  • Table 38 suggests a good classification can be obtained for all groups.
  • the overall classification power of the model is impressive with 76.6% of all observations being correctly classified.
  • Table 40 shows an excellent classification can be obtained for all groups with a minimum correct assignment rate of 76%.
  • the overall classification power of the model is impressive with 86.9% of all observations being correctly classified.
  • PCA has highlighted correlations between measured variables for all classes of patients. Many of the correlations are expected from a molecular biology standpoint. Some of the PCA models greatly reduced the dimensionality of the data set but the resulting scores did not spatially separate the groups of patients.
  • Table 18. Loadings for selected flow cytometry data for each PC (associations with Eigenvalues >0.5 shown for 95% CL)
  • Table 19. Eigenvalues of correlation matrix, and related statistics for RT-PCR data
  • Table 21 Eigenvalues of correlation matrix, and related statistics for combined clinical, RT- PCR and flow cytometry variables
  • CD31 (%) -0.02 -0.03 0.42 0.18 0.52 -0.07 -0.40 -0.17 0.03
  • CD54 (%) -0.16 0.04 0.20 -0.09 0.48 0.48 0.06 0.18 -0.19
  • CD62L (%) -0.41 -0.09 0.20 0.28 0.16 -0.21 0.01 -0.01 0.39
  • CD11B (%) -0.13 -0.16 0.22 0.19 0.54 -0.13 -0.09 -0.19 0.05
  • CD69 (%) 0.49 -0.55 0.23 -0.27 0.13 -0.16 0.00 0.40 -0.25
  • Table 25 summary of variable association in the discriminative DFA model based on PCA scores of clinical data containing substituted mean values
  • Table 26 Summary of variable association in the discriminative DFA model based on PCA scores of clinical data containing substituted mean values
  • Table 28 Summary of variable association in the discriminative DFA model based on PCA scores of clinical data without substituted mean values
  • Table 29 Classification matrix of DFA model based on PCA scores of clinical data without substituted mean values
  • Table 39 contd. Summary of DFA model based on combined clinical, RT-PCR and flow cytometry variables
  • Table 40. Classification matrix of DFA model based on combined clinical, RT-PCR and flow cytometry variables
  • Example 7 Binary logistic regression analysis to predict sepsis
  • a binary logistic regression model was used to analyse the RT-PCR, flow cytometry results and clinical data separately, from the ICU patients who went on to develop sepsis and presented positive microbiology results.
  • This model used results gained from an age matched group of ICU patients who were not diagnosed with sepsis as the control group. Although the model identified numerous possible predictors some appeared to be of limited use since the values obtained for the pre-symptomatic sepsis patients were within those obtained for the non sepsis patients.
  • the potential prediction markers that did yield some pre-sepsis data points that differed from the non sepsis data are listed in Table 36. However when combined, these prediction markers could only have identified 8 out of the 24 pre-sepsis patients.
  • Table 41 Summary of potential prediction markers identified by binary logistic regression analysis.
  • Example 8 Sepsis as a model for response to biological weapons
  • in vitro infection of whole blood may be used as a model and the activation marker expression and cytokine response measured.
  • Staphylococcus aureus infection was selected as a model infectious agent directly comparable with the in vivo hospital-acquired infection data.
  • FACS: Dendritic cells: CD54, CD97, CCR6, CCR7; NK cells: CD25, CD44, CD62L, CD69, CD97; Monocytes: CD44, CD54, CD62L, CD69, CD97, CD107a; Neutrophils: CD44, CD62L, CD69, CD107a
  • IL-1β, IL-6, IL-8, IL-10, MCP-1, TNF-α, sFasL
  • Each of these sets of input parameters (i.e. dendritic cell markers, NK cell markers, monocyte markers, neutrophil markers, RT-PCR data at 24h, RT-PCR data at 48h) was used to train its own neural network model. Random selections of infected or non-infected blood samples were used for training (70%) or subsequent testing (30%). The testing phase of the neural network analysis gave a predictive accuracy based on the percentage of times it correctly predicted whether the test set of input parameters was from an infected or non-infected sample. This testing of each set of input parameters was repeated 5 times. Each time the test was conducted, a new neural network was constructed using a newly randomised 70% of the infected and non-infected samples. An average predictive accuracy was derived for each set of input parameters by taking the mean of the 5 predictive accuracies calculated from the 5 neural networks constructed on the 5 randomised sets of input data. The methodology was similar to that used in the sepsis patient study.
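The repeated 70%/30% evaluation scheme described above may be sketched as follows. Note that the classifier used here is a simple nearest-centroid stand-in chosen to keep the example self-contained; it is not the neural network of the study, and all function names, the seed and the sample data are hypothetical.

```python
import random

def fit_centroids(rows, labels):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, row):
    """Assign the class whose centroid is nearest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(row, centroids[label])))

def repeated_split_accuracy(rows, labels, repeats=5, train_frac=0.7, seed=0):
    """Train on a fresh random 70%, test on the remaining 30%, and average over repeats."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    n_train = round(train_frac * len(rows))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)  # new randomisation for every repeat
        train, test = indices[:n_train], indices[n_train:]
        model = fit_centroids([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict_centroid(model, rows[i]) == labels[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies), accuracies
```

Averaging over several random splits reduces the dependence of the reported accuracy on any one partition of a small data set.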
  • Figure 4 shows the data obtained from three subjects, which demonstrates the somewhat heterogeneous patterns of change in the profiles.
  • the algorithm achieved a good level of identification of infected samples over uninfected controls (Figure 5).
  • a custom human immune response array was designed homologous to the DSTL-designed murine immune function array with additional genes that had been identified from the previous sepsis study.
  • a total of 1438 genes were represented by a single 50-mer oligonucleotide designed by MWG Biotech.
  • the array contained 768 oligonucleotides from the MWG Biotech commercially available 'diverse function' genes to act as an inter-microarray slide control. Printing of the oligonucleotides was performed by MWG according to their array layout plan with the entire set of printed spots (2206) triplicated on each slide.
  • RNA isolation: Messenger RNA was isolated from 27.5 ml blood lysate (corresponding to 2.5 ml of stabilised blood) using the mRNA Isolation Kit for Blood/Bone Marrow (Roche) following the manufacturer's guidelines with a few minor changes (volumes for the 55 ml lysate protocol were halved, centrifugation was for 3 minutes, washing of MGP beads was performed using 1 ml MGP washing buffer repeated 3 times and elution was into 20 μl of redistilled water). The entire mRNA preparation was treated with RNase-free DNase from the DNA-free kit (Ambion Inc.) following the manufacturer's guidelines. The final mRNA preparation was quantitated by
  • Microarray slides were prepared for hybridisation by attaching a GeneFrame® (MWG) over the oligo printed area according to the manufacturer's instructions. Fragmented, labelled mRNA (11 μl) was denatured for 3 minutes at 95 °C, snap-cooled on ice for 3 minutes and briefly centrifuged. 240 μl MWG hybridisation solution was added to the sample and mixed before applying to the microarray slide. The slide was covered with a plastic coverslip which attaches to the GeneFrame® and placed within a HC2 hybridisation cassette (CamLab). 500 μl water was added to each well of the cassette to prevent drying. The closed cassette was placed in a 42 °C hybridisation oven for 16 hours.
  • MWG GeneFrame®
  • TIFF files from the Axon scanners were loaded into BlueFuse software (BlueGnome Ltd) and processed to 'fused' data following the manufacturer's instructions.
  • the resultant data files were saved and subsequently analysed in GeneSpring software.
  • Each network was trained with a random 70% selection of balanced sepsis and control data using back propagation algorithms and then tested with the remaining 30% of the data. This process was then repeated, using a different 70% of randomised data, until a total of 5 repeats had been run. The predictive abilities of these 5 models were then averaged to give an overall predictive capability of the network. The most successful network was the one most capable of correctly classifying previously unseen patients as being from either the sepsis or non-sepsis control group.
  • Table 43 shows various sets of genes selected from the 22 most informative genes based on their individual scores. The sets were assigned in such a way as to attempt to establish the relative importance of combinations of genes based on such factors as their individual scores (sets B and G representing the top and bottom ranked genes of the 22), whether or not genes with known immunological or inflammatory functions were included (set E with CD40 and IL-8 excluded, for instance) and the effect of larger or smaller sets.
  • Table 44 shows the ranked scores obtained following the neural network analysis
  • set B, comprising the ten top-scoring genes based on their individual scores, did not give the best overall predictive value.
  • the best predictive set, set F, comprised set B together with two genes not known to have any connection with the immune or inflammatory response, CRX and MAP1A.
  • the values indicate that the inclusion of genes that could not have been predicted to be useful based on their known functions nevertheless resulted in improved predictive scores.

Abstract

The invention relates to a method for screening a biological sample to detect early stages of infection, SIRS or sepsis comprising the steps of detecting expression of a first set of informative biomarkers by RT-PCR and detecting expression of a second set of informative biomarkers by RT-PCR; analysing the results of detection; classifying said sample according to the likelihood and/or timing of the development of overt infection.

Description

Early detection of sepsis
Background
Despite greatly improved diagnosis, treatment and support, serious infection and sepsis remain significant causes of death and often result in chronic ill-health or disability in those who survive acute episodes. Although sudden, overwhelming infection is comparatively rare amongst otherwise healthy adults, it constitutes an increased risk in immunocompromised individuals, seriously ill patients in intensive care, burns patients and young children. In a proportion of cases, an apparently treatable infection leads to the development of sepsis; a dysregulated, inappropriate response to infection characterised by progressive circulatory collapse leading to renal and respiratory failure, abnormalities in coagulation, profound and unresponsive hypotension and, in about 30% of cases death. The incidence of sepsis in the population of North America is about 0.3% of the population annually (about 750,000 cases) with mortality rising to 40% in the elderly and to 50% in cases of the most severe form, septic shock (Angus et al, 2001 , Crit Care Med 29: 1303-1310).
Following infection with infectious micro-organisms, the body reacts with a classical inflammatory response and activation of, first, the innate, non-specific immune response, followed by a specific, acquired immune response. In the case of bacterial infections, bacteraemia leads to the rapid (within 30-90 minutes) onset of pyrexia and release of inflammatory cytokines such as interleukin-1 (IL-1) and tumour necrosis factor-α (TNF-α) triggered by the detection of bacterial toxins, long before the development of a specific, antigen-driven immune response.
In Gram-negative bacteraemia due to infections such as typhoid, plague, tularaemia and brucellosis, or peritonitis from Gram-negative gut organisms such as Escherichia coli, Klebsiella, Proteus or Pseudomonas, this is largely a response to lipopolysaccharide (LPS) and other components derived from bacterial cell walls. Circulating LPS and, in particular, its constituent lipid A, provokes a wide range of systemic reactions. It is probably contact with Kupffer cells in the liver that first leads to IL-1 release and the onset of pyrexia. Activation of circulating monocytes and macrophages leads to release of cytokines such as IL-6, IL-12, IL-15, IL-18, TNF-α, macrophage migration inhibitory factor (MIF), and cytokine-like molecules such as high mobility group B1 (HMGB1), which, in turn, activate neutrophils, lymphocytes and vascular endothelium, up-regulate cell adhesion molecules, and induce prostaglandins, nitric oxide synthase and acute-phase proteins. Release of platelet activating factor (PAF), prostaglandins, leukotrienes and thromboxane activates vascular endothelium, regulates vascular tone and activates the extrinsic coagulation cascade. Dysregulation of these responses results in the complications of sepsis and septic shock in terms of peripheral vasodilation leading to hypotension, and abnormal clotting and fibrinolysis producing thrombosis and intravascular coagulation (Cohen, 2002, Nature 420: 885-891).
LPS primarily acts on cells by binding to a serum LPS-binding protein (LBP) and CD14 expressed on monocytes and macrophages. On binding a complex of LPS and LBP, CD14 acts with a co-receptor, Toll-like receptor 4 (TLR-4) and a further component, MD-2, to form a signalling complex and initiate activation of macrophages and release of cytokines (Palsson-McDermott & O'Neill, 2004, Immunology 113: 153-162). The Toll-like receptor family is a group of cell surface receptors involved in recognising a range of bacterial and fungal ligands that act as triggers for the innate immune system, including Gram-positive cell wall structures, flagellin, and CpG repeats characteristic of bacterial DNA.
In the case of infection with Gram-positive pathogens, septic shock is associated with the production of exotoxins. For instance, toxic shock syndrome, a particularly acute form of septic shock that often affects otherwise healthy individuals, is due to infection with particular strains of Staphylococcus aureus, which produce an exotoxin known as toxic shock syndrome toxin-1 (TSST-1). A similar syndrome is caused by invasive infection with certain group A Streptococcus pyogenes strains, and is often associated with streptococcal pyrogenic exotoxin A (SPE-A). Some Gram-positive exotoxins (including TSST-1) are thought to exert their effects predominantly as a result of their superantigen properties. Superantigens are able to non-specifically stimulate T lymphocytes by cross-linking MHC Class II molecules on antigen presenting cells to certain classes of T cell receptors. Usually, T cell receptor (TCR)-Major Histocompatibility Complex (MHC) interactions are highly specific, with only T cells carrying TCRs that specifically recognise short antigen-derived peptides presented by the MHC able to bind and be activated, ensuring an antigen-specific T cell response. Superantigens bypass this mechanism, resulting in massive and inappropriate activation of T cells. However, SPE-A is not an efficient superantigen and some further mechanism must be implicated.
It should be noted that clinical sepsis may also result from infection with some viruses (for example Venezuelan Equine Encephalitis Virus, VEEV) and fungi, and that other mechanisms are likely to be involved in such cases.
The ability to detect potentially serious infections as early as possible and, especially, to predict the onset of sepsis in susceptible individuals is clearly advantageous. A considerable effort has been expended over many years in attempts to establish clear criteria defining clinical entities such as shock, sepsis, septic shock, toxic shock and systemic inflammatory response syndrome (SIRS). Similarly, many attempts have been made to design robust predictive models based on measuring a range of clinical, chemical, biochemical, immunological and cytometric parameters, and a number of scoring systems, of varying prognostic success and sophistication, have been proposed.
According to the 1991 Consensus Conference of the American College of Chest Physicians (ACCP) and Society of Critical Care Medicine (SCCM), "SIRS" is considered to be present when patients have more than one of the following: a body temperature of greater than 38°C or less than 36°C, a heart rate of greater than 90/min, hyperventilation involving a respiratory rate higher than 20/min or PaCO2 lower than 32 mmHg, a white blood cell count of greater than 12000 cells/μl or less than 4000 cells/μl (Bone et al, 1992, Crit Care Med 20: 864-874).
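As an illustrative sketch only, the consensus criteria quoted above could be expressed as a simple check. The function and parameter names are hypothetical and do not come from the cited consensus document; the thresholds are those stated in the preceding paragraph, and "more than one" is read here as at least two of the four criteria.

```python
def sirs_criteria_count(temp_c, heart_rate, resp_rate, wbc_per_ul, paco2_mmhg=None):
    """Count how many of the four 1991 ACCP/SCCM SIRS criteria are met.

    All names are illustrative assumptions. PaCO2 is optional because it
    is not always measured; when absent, only respiratory rate is used
    for the hyperventilation criterion.
    """
    criteria = [
        temp_c > 38.0 or temp_c < 36.0,                       # temperature
        heart_rate > 90,                                      # heart rate per minute
        resp_rate > 20
        or (paco2_mmhg is not None and paco2_mmhg < 32),      # hyperventilation
        wbc_per_ul > 12000 or wbc_per_ul < 4000,              # white cell count
    ]
    return sum(criteria)


def has_sirs(temp_c, heart_rate, resp_rate, wbc_per_ul, paco2_mmhg=None):
    """SIRS is considered present when more than one criterion is met."""
    return sirs_criteria_count(
        temp_c, heart_rate, resp_rate, wbc_per_ul, paco2_mmhg
    ) > 1
```

For example, a patient with a temperature of 38.6°C and a heart rate of 104/min meets two criteria, whereas a raised heart rate alone does not satisfy the definition.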
"Sepsis" has been defined as SIRS caused by infection. It is accepted that SIRS can occur in the absence of infection in, for example, burns, pancreatitis and other disease states. "Infection" was defined as a pathological process caused by invasion of a normally sterile tissue, fluid or body cavity by pathogenic or potentially pathogenic micro-organisms.
"Severe sepsis" was defined as sepsis complicated by organ dysfunction, itself defined by Marshall et al (1995, Crit Care Med 23: 1638-1652) or the Sequential Organ Failure Assessment (SOFA) score (Ferreira et al, 2002, JAMA 286: 1754-1758).
"Septic shock" refers (in adults) to sepsis plus a state of acute circulatory failure characterised by a persistent arterial hypotension unexplained by other causes.
In order to evaluate the seriousness of sepsis in intensive care patients and to allow rational treatment planning, a large number of clinical severity models have been developed for sepsis, or adapted from more general models. The first generally accepted system was the Acute Physiology and Chronic Health Evaluation score (APACHE, and its refinements APACHE II and III) (Knaus et al, 1985, Crit Care Med 13: 818-829; Knaus et al, 1991, Chest 100: 1619-1636), with the Mortality Prediction Model (MPM) (Lemeshow et al, 1993, JAMA 270: 2957-2963) and the Simplified Acute Physiology (SAPS) score (Le Gall et al, 1984, Crit Care Med 12: 975-977) also being widely used general predictive models. For more severe conditions, including sepsis, more specialised models such as the Multiple Organ Dysfunction Score (MODS) (Marshall et al, 1995, Crit Care Med 23: 1638-1652), the Sequential Organ Failure Assessment (SOFA) score (Ferreira et al, 2002, JAMA 286: 1754-1758) and the Logistical Organ Dysfunction Score (LODS) (Le Gall et al, 1996, JAMA 276: 802-810) were developed. More recently, a specific model, PIRO (Levy et al, 2003, Intensive Care Med 29: 530-538), has been proposed. All of these models use a combination of a wide range of general and specific clinical measures to attempt to derive a useful score reflecting the seriousness of the patient's condition and likely outcome.
In addition to the standard predictive models described above, the correlation of sepsis and a number of specific serum markers has been extensively studied with a view to developing specific diagnostic and prognostic tests, amongst which are the following.
C-reactive protein (CRP) is a liver-derived serum acute phase protein that is well-known as a non-specific marker of inflammation. More recently (Toh et al, 2003, Intensive Care Med 29: 55-61), a calcium-dependent complex of CRP and very low density lipoprotein (VLDL), known as lipoprotein complexed C-reactive protein (LCCRP), has been shown to be involved in affecting the coagulation mechanism during sepsis. In particular, a common test known as the activated partial thromboplastin time develops a particular profile in cases of sepsis, and this has been proposed as the basis for a rapid diagnostic test.
TNF-α and IL-1 are archetypal acute inflammatory cytokines long known to be elevated in sepsis (Damas et al, 1989, Critical Care Med 17: 975-978) and have been reported to be useful predictors of organ failure in adult respiratory distress syndrome, a serious complication of sepsis (Meduni et al, 1995, Chest 107: 1062-1073).
Activated complement product C3 (C3a) and IL-6 have been proposed as useful indicators of host response to microbial invasion, and superior to pyrexia and white blood cell counts (Groeneveld et al, 2001, Clin Diagn Lab Immunol 8: 1189-1195). Secretory phospholipase A2 was found to be a less reliable marker in the same study.
Procalcitonin is the propeptide precursor of calcitonin, serum concentrations of which are known to rise in response to LPS and correlate with IL-6 and TNF-α levels. Its use as a predictor of sepsis has been evaluated (Al-Nawas et al, 1996, Eur J Med Res 1: 331-333). Using a threshold of 0.1 ng/ml, it correctly identified 39% of sepsis patients. However, other reports suggest that it is less reliable than the use of serial CRP measurements (Neely et al, 2004, J Burn Care Rehab 25: 76-80), although superior to IL-6 or IL-8 (Harbarth et al, Am J Resp Crit Care Med 164: 396-402). Changes in neutrophil surface expression of leukocyte activation markers (such as CD11b, CD31, CD35, L-selectin, CD16) have been used as a marker of SIRS and have been found to correlate with IL-6 and subsequent development of organ failure (Rosenbloom et al, 1995, JAMA 274: 58-65). Similarly, expression of platelet surface antigens such as CD63, CD62P, CD36 and CD31 has been examined, but no reliable predictive model has been constructed.
Finally, it has been shown that downregulation of monocyte HLA-DR expression is a predictor of a poor outcome in sepsis and may be an indication of monocyte deactivation, impairing TNF-α production. Treatment with IFN-γ has been shown to be beneficial in such cases (Docke et al, 1997, Nature Med 3: 678-681).
However, although many of these markers correlate with sepsis and some give an indication of the seriousness of the condition, no single marker or combination of markers has yet been shown to be a reliable diagnostic test, much less a predictor of the development of sepsis. The 2001 International Sepsis Definition Conference concluded that "the use of biomarkers for diagnosing sepsis is premature" (Levy et al, 2003, Intensive Care Med 29: 530-538).
Extracting reliable diagnostic patterns and robust prognostic indications from changes over time in complex sets of variables including traditional clinical observations, clinical chemistry, biochemical, immunological and cytometric data requires sophisticated methods of analysis. The use of expert systems and artificial intelligence, including neural networks, for medical diagnostic applications has been under development for some time (Place et al, 1995, Clinical Biochemistry 28: 373-389; Lisboa, 2002, Neural Networks 15: 11-39). Specific systems have been developed in attempts to predict survival of sepsis patients (Flanagan et al, 1996, Clinical Performance & Quality Health Care 4: 96-103) by use of multiple logistic regression and neural network models using APACHE scores and the 1991 ACCP/SCCM SIRS criteria described above (Bone et al, 1992, Crit Care Med 20: 864-874). Such studies suggest that, although both approaches can give good predictive results, neural network systems are less sensitive to preselected threshold values (results of a number of studies reviewed by Rosenberg, 2002, Curr Opin Crit Care 8: 321-330).
Brause et al (2004, Journal für Anästhesie und Intensivbehandlung 11: 40-43) provides an example of a neural network model being used for sepsis prediction. This model (MEDAN) analysed a range of standard clinical measures and compared its results with those obtained by using the APACHE II, SOFA, SAPS II, and MODS models. The study concluded that, of the markers available, the most informative were systolic and diastolic blood pressure, and platelet count.
Neural networks are non-linear functions that are capable of identifying patterns in complex data systems. This is achieved by using a number of mathematical functions that make it possible for the network to identify structure within a noisy data set. Such structure exists because data from a system may produce patterns based upon the relationships between the variables within the data. If a neural network sees sufficient examples of such data points during a period known as "training", it is capable of "learning" this structure and then identifying these patterns in future data points or test data. In this way, neural networks are able to predict or classify future examples by modelling the patterns present within the data they have seen. The performance of the network is then assessed by its ability to correctly predict or classify test data, with high accuracy scores indicating that the network has successfully identified true patterns within the data.
The parallel processing ability of neural networks is dependent on the architecture of their processing elements, which are arranged to interact according to the model of biological neurones. One or more inputs are regulated by the connection weights to change the stimulation level within the processing element. The output of the processing element is related to its activation level and this output may be non-linear or discontinuous.
Training of a neural network therefore comprises an adjustment of interconnected weights depending on the transfer function of the elements, the details of the interconnected structure and the rules of learning that the system follows (Place et al, 1995, Clinical Biochemistry 28: 373-389). Such systems have been applied to a number of clinical situations, including health outcomes models of trauma patients (Marble & Healy, 1999, Art Intell Med 15: 299-307).
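The weight-adjustment idea described above can be illustrated with a minimal sketch of a single processing element. This is not the model used in any of the cited studies: the sigmoid transfer function, the delta-rule update, the learning rate, the number of epochs and the toy logical-AND data are all assumptions chosen only to keep the example small.

```python
import math


def sigmoid(z):
    """Non-linear transfer function mapping activation to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Output of one processing element: weighted inputs passed through sigmoid."""
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return sigmoid(activation)


def train_element(samples, learning_rate=0.5, epochs=5000):
    """Adjust connection weights so outputs move towards the training targets.

    samples is a list of (inputs, target) pairs with targets of 0 or 1.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Delta rule: shift each weight in proportion to its input and the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Toy training data (logical AND), purely illustrative.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_element(AND_SAMPLES)
```

After training, `predict(weights, bias, (1, 1))` lies above 0.5 and the other three inputs lie below it. A practical network would combine many such elements in layers and would be assessed on held-out test data, as described above.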
Warner et al (1996, Ann Clin Lab Sci 26: 471-479) describe a multiparametric model for predicting the outcome of sepsis, using measures of 'septic shock factor' (which appears to be simply whether the patients have signs of septic shock on admission), IL-6, soluble IL-6 receptor (as measured by enzyme-linked immunosorbent assay) and the APACHE II score as components of a four-input algorithm in a multi-layer, feed-forward neural network model. However, this system is not predictive for individuals who do not yet have clinical signs and, arguably, by the time serum levels of cytokines such as IL-6 are raised, the diagnosis, if not the outcome, is clinically obvious.
Dybowski et al (1996, Lancet 347: 1146-1150) use Classification and Regression Trees (CART) to select inputs from 157 possible sepsis prediction criteria and then use a neural network running a genetic algorithm to select the best combination of predictive markers. These include many routine clinical values and proxy indicators rather than serum or cell surface biomarkers. However, the problem being addressed is the prognosis of patients who already have a clear diagnosis of sepsis and are already critically ill. A further refinement of the genetic algorithm approach involves the use of Artificial Immune Systems, of which one version is the Artificial Immune Recognition System (AIRS) (Timmis et al, An overview of Artificial Immune Systems. In: Paton, Bolouri, Holcombe, Parish and Tateson (eds.) "Computation in Cells and Tissues: Perspectives and Tools for Thought", Natural Computation Series, pp 51-86, Springer, 2004; L.N. De Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach, Springer-Verlag, 2002), which are adaptive systems inspired by the clonal selection and affinity maturation processes of biological immune systems as applied to artificial intelligence.
Immunologically speaking, AIRS is inspired by the clonal selection theory of the immune system (F. Burnett. The Clonal Selection Theory of Acquired Immunity. Cambridge University Press, 1959). The clonal selection theory attempts to explain how, through a process of matching, cloning, mutation and selection, antibodies are created that are capable of identifying infectious agents. AIRS capitalises on this process and, through a process of matching, cloning and mutation, evolves a set of memory detectors that are capable of being used as classifiers for unseen data items. Unlike other immune-inspired approaches, such as negative selection, AIRS is specifically designed for use in classification, more specifically one-shot supervised learning.
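The matching, cloning, mutation and selection cycle described above can be sketched in a heavily simplified form. This is not the published AIRS algorithm, which also involves resource competition and affinity thresholds; it is a toy illustration under stated assumptions. All names, the Euclidean affinity measure, the Gaussian mutation, the parameter values and the two-dimensional training data are hypothetical.

```python
import random


def distance(a, b):
    """Euclidean distance; a smaller distance means a higher affinity."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def train_memory_cells(samples, n_clones=10, mutation_scale=0.1, seed=0):
    """Evolve labelled memory cells in a single pass over the training samples."""
    rng = random.Random(seed)
    memory = []  # list of (vector, label) pairs
    for vector, label in samples:
        same_class = [cell for cell in memory if cell[1] == label]
        if not same_class:
            # No detector for this class yet: store the sample itself.
            memory.append((list(vector), label))
            continue
        # Matching: pick the best-matching memory cell of the same class.
        parent = min(same_class, key=lambda cell: distance(cell[0], vector))[0]
        # Cloning and mutation: perturb copies of the matched cell.
        clones = [
            [x + rng.gauss(0.0, mutation_scale) for x in parent]
            for _ in range(n_clones)
        ]
        # Selection: keep the best clone only if it improves on its parent.
        best = min(clones, key=lambda clone: distance(clone, vector))
        if distance(best, vector) < distance(parent, vector):
            memory.append((best, label))
    return memory


def classify(memory, vector):
    """Label an unseen item with the class of its nearest memory cell."""
    return min(memory, key=lambda cell: distance(cell[0], vector))[1]


# Hypothetical, well-separated training data for two classes.
TRAINING = [
    ((0.0, 0.0), "a"), ((10.0, 10.0), "b"),
    ((0.5, 0.2), "a"), ((9.6, 10.3), "b"),
    ((0.1, 0.6), "a"), ((10.2, 9.7), "b"),
]
memory = train_memory_cells(TRAINING)
```

Calling `classify(memory, (0.3, 0.4))` then returns the label of the nearest evolved detector. Each training item is presented once, which loosely mirrors the one-shot supervised learning mentioned above.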
US patent application 2002/0052557 describes a method of predicting the onset of a number of catastrophic illnesses based on the variability of the heart-rate of the patient. Again, a neural network is among the possible methods of modelling and analysing the data.
International patent application WO 00/52472 describes a rapid assay method for use in small children based on the serum or neutrophil surface levels of CD11b or 'CD11b complex' (Mac-1, CR3). The method uses only a single marker, and one which is, arguably, a well-known marker of neutrophil activation in response to inflammation.
The alternative approach to analysing such complex data sets where the data are often qualitative and discrete, rather than quantitative and continuous, is to use sophisticated statistical analysis techniques such as logistic regression. Where logistic regression using qualitative binary dependent variables is insufficiently discriminating in terms of selecting significant variables, multivariate techniques may be used. The outputs from both multiple logistic regression models and neural networks are continuously variable quantities but the likelihoods calculated by neural network models usually fall at one extreme or the other, with few values in the middle range. In a clinical situation this is often helpful and can give clearer decisions (Flanagan et al, 1996, Clinical Performance & Quality Health Care 4: 96-103).
The ability to detect the earliest signs of infection and/or sepsis has clear benefits in terms of allowing treatment as soon as possible. Indications of the severity of the condition and likely outcome if untreated inform decisions about treatment options. This is relevant both in vulnerable hospital populations, such as those in intensive care, or who are burned or immunocompromised, and in other groups in which there is an increased risk of serious infection and subsequent sepsis. The use or suspected use of biological weapons in both battlefield and civilian settings is an example where a rapid and reliable means of testing for the earliest signs of infection in individuals exposed would be advantageous.
The applicant's earlier International application, WO 2006/061644 (incorporated by reference), discloses a method for detecting early signs of infection based on measurement of expression levels of particular combinations of cytokines and/or cellular activation markers. Expression was measured by either cell surface expression as detected by FACS, or at a transcriptional level by RT-PCR, optionally combined with the use of predictive algorithms.
Despite the greater knowledge of both the molecular basis of, and physiological response to, sepsis, a need remains for a method of predicting sepsis as early as possible in the course of an infection, preferably during the therapeutic window of intervention, prior to the onset of clinical symptoms and disease. It is an object of the invention to identify novel markers and combinations of biomarkers, preferably useful for screening by means of micro-array technology.
The approach of the prior art described above may be characterised as the selection of genes known to be in some way related to the processes of the inflammatory or immunological response to infection and testing their usefulness in various types of assay. This is logical but presupposes that the processes involved in the earliest stages of infection are well-characterised and that the earliest genes to be activated are known. It also fails to consider the possibility of informative epiphenomena, that is, genes that are activated incidentally, or as part of a parallel or peripheral response. An alternative approach is to screen a wide range of potentially expressed (and, in some cases, apparently completely unconnected) gene sequences to identify those which, despite this, are nevertheless useful predictors of infection, either alone or in combination.
CD40 is a TNF-receptor superfamily member expressed on T and B lymphocytes, among other cells, and is required for a wide variety of immune and inflammatory responses, in particular B cell immunoglobulin production and isotype switching, and development of memory B cells (Grewel & Flavell, 1998, Annu Rev Immunol 16: 111). Its ligand is another leukocyte cell surface molecule, CD154. Two alternately spliced isoforms are known, the longer isoform (1) being encoded by transcript variant 1 (NCBI accession number NM_001250, SEQ ID NO:1).
CD5 is also a cell surface receptor expressed on T and B lymphocytes where it interacts with its ligand CD72 and has a role in modulating the immune response (Berland & Wortis, 2002, 20: 253). The cDNA sequence encoding human CD5 has the NCBI accession number NM_014207 (SEQ ID NO:2).
CD79A, previously known as MB-1 or Ig-α, is part of the B cell antigen receptor complex together with another similar molecule, CD79B (B29 or Ig-β), and the surface immunoglobulin chains. CD79A and B are involved in signal transduction and B cell surface immunoglobulin expression (Jumaa et al, 2005, Annu Rev Immunol 23: 415). There are two known transcript variants, and the longer transcript sequence is listed at NCBI accession number NM_001783 (SEQ ID NO:3).
CRX is the gene for cone-rod homeobox, a homeodomain transcription factor that controls differentiation in photoreceptor cells and is required for normal cone and rod cell function. Mutations in this gene are associated with photoreceptor degeneration (Leber congenital amaurosis type III and autosomal dominant cone-rod dystrophy 2), but no immunological functions are known (Chen et al, 2002, Human Molecular Genetics 11: 873). The cDNA sequence is available at NM_000554 (SEQ ID NO:4).
CTNND1 is the gene encoding catenin (cadherin-associated protein) delta-1, a member of the armadillo family of proteins (previously known as p120 cas and p120 catenin). It is one of a number of proteins (others being β-catenin and plakoglobin) that bind to the cytoplasmic region of cadherins, modulating cell adhesion and linking cadherins to the cytoskeleton (Franze & Ridley, 2004, J Biol Chem 279: 6588). Such molecules may also have a role in signal transduction through rho family GTPases. The cDNA sequence is available at NM_001331 (SEQ ID NO:5).
CX3CL1 encodes chemokine (C-X3-C motif) ligand 1, an unusual chemokine (previously known as fractalkine) characterised by the unique spacing of the first 2 cysteines in its chemokine cysteine motif and its dual role as a chemoattractant and cell adhesion molecule involved in the inflammatory response. It is expressed as a cell surface molecule but a soluble form is generated by juxtamembrane proteolytic cleavage (Umehara et al, 2004, Arterioscler Thromb Vasc Biol 24: 34). The cDNA sequence is available at NM_002996 (SEQ ID NO:6).
ENTPD2 is the gene for ectonucleoside triphosphate diphosphohydrolase 2 (otherwise known as CD39L or NTPDase-2). ENTPD5 is the related ectonucleoside triphosphate diphosphohydrolase 5 (CD39L4 or NTPDase-5). These molecules are cell surface ATP-hydrolyzing enzymes responsible for the breakdown of extracellular nucleotides, thus regulating a complex system of cell signalling via large families of purine and pyrimidine receptors. ENTPD2 exists in a number of splice variants, which may have distinct functions (Wang et al, 2005, Biochem J 385: 729). A long isoform is encoded by the cDNA sequence of NM_203468 (SEQ ID NO:7). NM_001246 encodes a shorter isoform with a truncated C-terminus. The ENTPD5 sequence is available at NM_001249 (SEQ ID NO:8).
EPHA8 is a gene encoding the ephrin A8 receptor, a member of the ephrin receptor subfamily of receptor tyrosine kinases. The ephrin A8 receptor functions as a receptor for ephrin A2, A3 and A5 and is involved in short-range contact-mediated axonal guidance during development of the nervous system (Gu et al, 2005, Oncogene 24: 4243). There is a splice variant shortened at the C-terminus (not yet detected at the protein level) but the longer isoform is encoded by the sequence of NM_020526 (SEQ ID NO:9).
GPR44 encodes G protein-coupled receptor 44, more widely known as chemoattractant receptor-homologous molecule expressed on Th2 cells (CRTH2). This is the prostaglandin D2 (PGD2) receptor responsible for mediating the inflammatory effect of PGD2 on a variety of leukocytes and other cells (Hata et al, 2005, J Biol Chem 280: 32442). It is implicated in the skewing of the T cell response to a Th2 pattern during sepsis, and low levels of expression of CRTH2 are associated with a poor outcome (Venet et al, 2004, Clin Immunol 113: 278). The sequence is available at NM_004778 (SEQ ID NO:10).
HDAC5 is histone deacetylase 5, a class II histone deacetylase that represses transcription when tethered to a promoter. Histone acetylation/deacetylation alters chromatin structure and is a major factor controlling gene expression. HDAC5 is thought to interact with MEF2 family proteins and may play a role in myogenesis (Zhang et al, 2002, Mol Cell Biol 22: 7302). There are two known isoforms encoded by two splice variants. NM_001015053 relates to the longer transcript (SEQ ID NO:11).
HMMR is the gene for hyaluronan-mediated motility receptor (RHAMM). RHAMM is thought to be involved in invasion and metastasis of tumour cells. Although widely expressed on tumour cells, in normal tissue its expression is limited to testis, placenta and thymus. There is a truncated splice variant lacking an internal segment. NM_012484 represents the longest transcript (SEQ ID NO:12).
IL-8 is very widely known as a member of the CXC family of chemokines and is a prime mediator of the inflammatory response, being a potent chemotactic and angiogenic factor. It has been reported to be a relatively poor predictor of sepsis (Harbarth et al, Am J Resp Crit Care Med 164: 396). The sequence is available at NM_000584 (SEQ ID NO:13).
MAP1A encodes microtubule-associated protein 1A, a member of a family of microtubule-associated proteins involved in microtubule assembly. MAP1A is expressed predominantly in the brain. The functional protein comprises light and heavy chains resulting from proteolytic processing of a single propeptide encoded by the sequence of NM_002373 (SEQ ID NO:14).
MAPK7 is the gene encoding mitogen-activated protein kinase 7 (MAP kinase 7 or ERK5). The MAP kinases occupy a central role in the intracellular signalling cascades from a number of receptor tyrosine kinases and G protein-coupled receptors, but MAPK7 differs from the others in that it has not only protein kinase activity but is also capable of translocating to the nucleus, where it appears to be able to phosphorylate and activate transcription factors directly (Buschbeck & Ullrich, 2005, J Biol Chem 280: 2659). Four alternative transcripts encoding two distinct isoforms have been reported. The longest transcript is represented by the sequence of NM_002749 (SEQ ID NO:15).
MEF2D is the gene for MADS box transcription enhancer factor 2, polypeptide D (myocyte enhancer factor 2D). Originally described as a muscle-specific transcription factor, MEF2 is now known to exist as four alternatively spliced isoforms (A-D) that are differentially expressed in a range of tissues (Zhu et al, J Biol Chem, 2005, 280: 28749). MEF2D appears to be involved in leukocyte activation, and chromosomal translocations resulting in MEF2D fusion proteins contribute to the development of some acute lymphoblastic leukaemias (Prima et al, 2005, Leukemia 19: 806). The MEF2D sequence is available as NM_005920 (SEQ ID NO:16).
ODF1 is outer dense fibre of sperm tails 1 and encodes the major protein of the outer dense fibre layer surrounding the axoneme of sperm tails. Defects in the outer dense fibres lead to abnormal sperm morphology and infertility. There is no known connection with genes involved with the inflammatory response. The sequence is available as NM_022410 (SEQ ID NO:17).
SAA3P denotes the serum amyloid A3 pseudogene. The serum amyloid A (SAA) superfamily consists of two acute phase genes, SAA1 and SAA2, and a constitutively expressed gene, SAA4. SAA3P appears to be a non-expressed pseudogene. The predicted open reading frame contains an insertion causing a frameshift, which generates a premature stop codon. The resultant hypothetical protein has been expressed. The genomic sequence is available as NG_002634 (SEQ ID NO:18).
SLC6A9 is solute carrier family 6 (neurotransmitter transporter, glycine) member 9 (GLYT1). A member of a large superfamily of transporter proteins, SLC6A9 is a sodium:glycine symporter, which may be involved in inhibitory glycinergic neurotransmission. There are a number of splice variants encoding three known isoforms. The longest transcript (giving rise to isoform 2) is available as NM_201649 (SEQ ID NO:19).
SPN is the gene for CD43 (leukosialin, sialophorin). Leukosialin is a major sialoglycoprotein of most leukocytes. It appears to play a part in modulating cell-cell interactions, including T cell activation (Daniels et al, 2002, Nature Immunol 3: 903). The cDNA sequence is available at NM_003114 (SEQ ID NO:20).
TDGF1 is teratocarcinoma-derived growth factor 1 (previously known as Cripto). It is a cell surface, glycosyl phosphatidylinositol (GPI)-anchored molecule, a member of the EGF-CFC family of growth factor-like molecules (Shen, 2003, J Clin Invest 112: 500). It is over-expressed in a wide range of carcinomas but is not known to have a role in inflammation or the immune response. The cDNA sequence is at NM_003212 (SEQ ID NO:21).
TSC22D1 is TSC22 domain family member 1. It is the founding member of the TSC22 family of early response gene transcription factors and is particularly involved in the TGF-β signalling pathway (and was formerly known as TGF-β1-induced transcript 4, TGFB1I4) (Gupta et al, 2003, J Biol Chem 278: 7331). The accession number is NM_006022 (SEQ ID NO:22).
There remains a need for further improvements in the early detection and diagnosis of sepsis, including consideration of genes not obviously connected with inflammation and immunity.
Statement of Invention
In a first aspect, the invention describes a system and methods of detecting early signs of infection, SIRS or sepsis several days before clinical signs become apparent. It also provides methods capable of predicting the timing of the clinical course of the condition. The system comprises analysing the results of one or more sets of tests based on biological samples, preferably blood samples. Optionally, other routine clinical measurements may be included for analysis.
The method comprises determining the level of expression of biomarkers shown herein to be positively correlated to developing sepsis.
First, there is a diverse group of genes, many of which have no established connection to the development of acute inflammation or the initiation of an immune response, including CRX, CTNND1, CX3CL1, ENTPD2, ENTPD5, EPHA8, GPR44, HDAC5, HMMR, MAP1A, MAPK7, MEF2D, ODF1, SAA3P, SLC6A9, TDGF1 and TSC22D1. Measuring the expression levels of these genes in combination surprisingly provides early predictive and prognostic information as to the likelihood of sepsis developing in an individual exposed to infective agents.
A further group of biomarkers are chemokines or cytokines expressed in blood leukocytes, or are leukocyte surface receptors with an established role in immune function. This group includes CD178 (FAS-L), MCP- 1 (monocyte chemotactic protein-1 ), TNF-α, IL-1 β, IL-6, IL-8, IL-10, INF-α, INF-γ, CD5, CD79A. CD178 is encoded and expressed as a type-ll membrane protein, but may be considered as a cytokine since it is cleaved by a metalloprotease to release a soluble homotrimer, soluble FasL or sFasL.
In a preferred embodiment, expression of biomarkers is detected by specific amplification of mRNA by reverse transcription polymerase chain reaction (RT-PCR).
The method involves screening a biological sample to detect early stages of infection, SIRS or sepsis, comprising the steps of:
a. detecting expression of a first set of informative biomarkers by RT-PCR and detecting expression of a second set of informative biomarkers by RT-PCR;
b. analysing the results of detection; and
c. classifying said sample according to the likelihood and/or timing of the development of overt infection,
wherein the first set of informative biomarkers, the expression of which is detected by means of RT-PCR, consists of at least 1 selected from the list consisting of CD40, CD5, CD79A, CRX, CTNND1, CX3CL1, ENTPD2, ENTPD5, EPHA8, GPR44, HMMR, IL-8, MAP1A, MAPK7, MEF2D, ODF1, SAA3P, SLC6A9, SPN, TDGF1, TSC22D1 and HDAC5, and wherein the second set of informative biomarkers, the expression of which is detected by means of RT-PCR, consists of at least 1 selected from the list consisting of CD178, MCP-1, TNF-α, IL-1β, IL-6, IL-10, INF-α and INF-γ.
Preferably, the first set of informative biomarkers, the expression of which is detected by means of RT-PCR, consists of at least 2, more preferably 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or 22 selected from the list. It will be understood that, in general, the greater the number of markers, the greater the accuracy of the prediction. Set against this is the greater complexity and time taken for the analysis.
A preferred selection of markers is SAA3P, MAP1A, EPHA8, CD40, IL-8, CRX, SPN, MEF2D, MAPK7, HMMR, ENTPD2 and TSC22D1.
Further selections of markers providing high levels of prediction of onset of sepsis are as follows:
a. SPN, CD40, SAA3P, IL-8, MEF2D, EPHA8, MAP1A and CRX
b. MEF2D, ENTPD2, CD40, EPHA8, SAA3P, HMMR, IL-8, MAPK7, SPN and TSC22D1
c. SPN, EPHA8, CD40, CRX, TSC22D1, MAPK7, HMMR, MEF2D and ENTPD2
d. HMMR, SAA3P, ENTPD2, MAP1A, MAPK7, TSC22D1 and IL-8
e. TDGF1, HDAC5, ENTPD5, ODF1, GPR44, CD5, CD79A, CTNND1, CX3CL1 and SLC6A9
f. IL-8, SAA3P, CD40, SPN, MEF2D, EPHA8, MAPK7, ENTPD2 and HMMR
g. MEF2D, ENTPD2, EPHA8, SAA3P, HMMR, MAPK7, SPN and TSC22D1
h. CD40, ENTPD2, EPHA8, IL-8, SAA3P, MEF2D and SPN
Preferably, at least two biomarkers from the second set of informative biomarkers are detected. Alternatively, 3, 4, 5, 6, 7 or 8 biomarkers from the second set of biomarkers are detected in combination with at least one biomarker from the list consisting of CD40, CD5, CD79A, CRX, CTNND1, CX3CL1, ENTPD2, ENTPD5, EPHA8, GPR44, HMMR, IL-8, MAP1A, MAPK7, MEF2D, ODF1, SAA3P, SLC6A9, SPN, TDGF1, TSC22D1 and HDAC5. In a preferred embodiment, analysis of the results yields a prediction of a probability of clinical SIRS or sepsis developing. Alternatively the analysis may be expressed as a binary yes/no prediction of clinical SIRS or sepsis developing.
In one alternative, if the development of clinical SIRS or sepsis is predicted, the results are subjected to a second analysis to determine the likely timing and/or severity of the clinical disease.
The results of one or more sets of tests are analysed, preferably by means of a neural network program, enabling a yes/no prediction to be calculated as to whether the patient from whom the sample was taken will develop sepsis.
In the case of a positive prediction, preferably a further analysis is then performed allowing an estimate to be made as to the time to onset of overt clinical signs and symptoms. Alternatively, both analyses may be performed by multivariate logistic regression.
Analysis of the test groups can be performed individually or simultaneously. In one alternative method, further clinical data are entered into the neural net as supplementary data to the PCR data. At the same time flow cytometry data can be processed by the neural network. Only one set of data is required for processing through the neural net although there are advantages in inputting one, two or all three data sets as these additional examples help "train" the neural net and improve confidence in the output from the program.
In a further aspect the neural network is used to process pre-recorded clinical data or a database of such data may be used to train the neural network and improve its predictive power.
Suitable clinical data include at least one, preferably at least three, more preferably at least five selected from the list consisting of temperature, heart rate, total and differential white blood cell count (monocytes, lymphocytes, granulocytes, neutrophils), platelet count, serum creatinine, urea, lactate, base excess, pO2, HCO3⁻, and C-reactive protein.
The method may be used as part of routine monitoring for intensive care patients, where regular blood samples are taken for other purposes. Other hospital patients who may be predisposed to infections and/or sepsis may also be monitored. Such predisposing conditions include inherited or acquired immunodeficiencies (including HIV/AIDS) or immunosuppression (such as general surgery patients, transplant recipients or patients receiving steroid treatment), diabetes, lymphoma, leukaemia or other malignancy, penetrating or contaminated trauma, burns or peritonitis. In another aspect, the method of the invention may be used to screen individuals during an outbreak of infectious disease or alternatively individuals who have been, or who are suspected of having been, exposed to infectious pathogens, whether accidentally or deliberately as the result of bioterrorism or of use of a biological weapon during an armed conflict.
In one alternative embodiment, the result of the analysis is expressed as a probability. In an alternative embodiment it is expressed as a binary yes/no result. Optionally, where the first analysis suggests that SIRS or sepsis is probable (defined as exceeding a predetermined arbitrary threshold probability, or a 'yes' prediction), said results are subjected to a second analysis to determine the likely time to development of overt clinical signs, or to give an indication of probable severity of the clinical disease.
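The two-stage scheme described above can be sketched as follows. This is a minimal illustration only: the function names, the default threshold of 0.5 and the two callables standing in for the first and second analyses are assumptions made for the example and are not taken from the specification.

```python
from typing import Callable, Optional, Sequence, Tuple


def two_stage_screen(
    features: Sequence[float],
    predict_probability: Callable[[Sequence[float]], float],
    estimate_days_to_onset: Callable[[Sequence[float]], float],
    threshold: float = 0.5,  # assumed value for the predetermined threshold
) -> Tuple[bool, float, Optional[float]]:
    """Return (positive prediction, probability, estimated days to onset or None)."""
    # First analysis: probability that SIRS or sepsis will develop.
    probability = predict_probability(features)
    positive = probability > threshold
    # Second analysis is run only when the first analysis is positive.
    days = estimate_days_to_onset(features) if positive else None
    return positive, probability, days
```

Any classifier that returns a probability (for example a trained neural network or a logistic regression) could be passed as the first callable; the binary yes/no result is simply the comparison against the threshold.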
In one highly favoured embodiment, said analysis is by means of a neural network. Most preferably it is a multilayered perceptron neural network. Preferably such a neural network is capable of correctly predicting SIRS or sepsis in greater than 70% of cases (determined in trials where such development is not prevented by prophylactic treatment in a control group), more preferably in at least 80% of cases, even more preferably in at least 85% of cases and most preferably in at least 95% of cases. It is preferred that SIRS or sepsis can be predicted at least one day before the onset of overt clinical signs, more preferably at least two days, still more preferably at least three days and most preferably more than three days before SIRS or sepsis is diagnosed.
In another favoured embodiment analysis is by means of multivariate statistical analysis, preferably comprising principal component analysis and/or discriminant function analysis. It is more preferred that the multivariate statistical analysis comprises discriminant function analysis.
In a further aspect, the invention provides a system for screening a biological sample to detect early stages of infection, SIRS or sepsis comprising: a means of extracting and purifying RNA from cells obtained from said sample, a thermal cycler or other means to amplify selected RNA sequences by means of reverse transcription polymerase chain reaction (RT-PCR), a means of detecting and quantifying the results of said RT-PCR, a computer-based neural network trained so as to be able to analyse such results and a display means whereby the conclusion of the neural network analysis may be communicated to an operator. Note: in this aspect the results of the RT-PCR may be analysed using discriminant function analysis, but the neural network is the preferred embodiment.
In a further aspect the invention provides analysis according to any embodiment of the method described above for the preparation of a diagnostic means for the diagnosis of SIRS, sepsis or infection, or the use of the system described above for the preparation of a diagnostic means for the diagnosis of infection.
Also provided is a method of early diagnosis of SIRS, infection or sepsis according to the method as described above.
Where standard clinical measurements are analysed, these consist of at least one, preferably at least three, more preferably at least five selected from the list consisting of temperature, heart rate, total and differential white blood cell count (monocytes, lymphocytes, granulocytes, neutrophils), platelet count, serum creatinine, urea, lactate, base excess, pO2, HCO3⁻, and C-reactive protein.
Throughout the description and claims of this specification, the words "comprise" and "contain" and variations of the words, for example "comprising" and "comprises", mean "including but not limited to", and are not intended to (and do not) exclude other moieties, additives, components, integers or steps.
Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
Features, integers, characteristics, compounds, chemical moieties or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith.
Detailed description of the invention
The invention will be described in further detail with reference to the following Figures and Examples.
Figure 1: Following infection, cells of the immune system recognise and respond to a pathogen by becoming activated. This results in the production of different messenger proteins (e.g. cytokines and chemokines) and expression of activation markers and adhesion molecules on the cell surface. The production of these facilitates communication between cells and results in a co-ordinated immune response against a particular agent. Since this inflammatory immune response is relatively constant in response to infection, and occurs in the very earliest stages of the disease process, monitoring changes in the expression of such markers can be used to predict the early stages of sepsis development. Ideally this is done during the therapeutic window of intervention, prior to the onset of clinical symptoms and disease.
Figure 2: A plot of CD31 expression measured on granulocytes by flow cytometry in blood samples taken from patients three days before diagnosis of sepsis (n=6), and from ICU patients who did not go on to develop sepsis (n=24). Each symbol represents a measurement from one patient.
Figure 3: Design of neural network analysing clinical data according to Table 4, model 2. WCC = white cell count, CRP = C-reactive protein.
Figure 4: Change in cytokine profile obtained following in vitro blood infection with S. aureus. Data from blood taken from three volunteers as detailed in Example 8.
Figure 5: Results of neural network analysis of S. aureus in vitro sepsis model.
Example 1 : Prediction of sepsis by neural network analysis of cytokine expression, cell surface markers and clinical measures.
Study design and patients
The study into the onset of sepsis in the ICU department of Queen Alexandra Hospital resulted in a cohort of ninety-one patients (Dstl/CR08631). Blood samples were collected daily from these patients throughout their stay in the ICU and in total, twenty-four patients were diagnosed as developing sepsis. Samples taken on the day clinical sepsis was diagnosed (Day 0), back through to six days prior to sepsis diagnosis (Day -6), were analysed by RT-PCR and flow cytometry for the expression of cytokine mRNA and activation markers respectively. In addition, standard hospital data and clinical observations were recorded. Samples from control patients were also processed in the same manner to provide data for traditional statistical analysis.
RT-PCR was performed according to commonly-used laboratory techniques. Briefly, in the case of a blood sample, whole blood was taken and cells then lysed in the presence of an RNA stabilising reagent. RNA was separated by affinity binding of beads, which were isolated by centrifugation (or magnetically, as appropriate), contaminating DNA removed by DNase digestion and the RNA subjected to RT-PCR.
Fluorescence activated cell sorting (FACS) flow cytometry is very well-known in the art and any standard technique may be used.
Data analysis
The complexity of biological systems and intricate relationships between the markers used in this study caused standard linear techniques of data analysis to give inconclusive results. Consequently it was unclear whether any patterns existed in the data and a more powerful technique, capable of non-linear modelling, was sought to cope with the complexity of the data sets.
For analysis, data was collated from patients 1 to 4 days prior to the onset of sepsis and compared with an age/sex matched control group consisting of ICU patients who did not develop sepsis. Individual samples provided data measuring up to 56 different parameters and selective combinations of variables were fed into a multi-layered perceptron neural network (Proforma, Hanon Solutions, Glasgow, Scotland).
Each network was trained with a random 70% selection of balanced sepsis and control data using back propagation algorithms and then tested with the remaining 30% of the data. Five attempts were made at modelling the data within this network, each model differing in its ability to generalise to the data. The most successful model was the one most capable of correctly classifying previously unseen patients as being from either the sepsis or non-sepsis control group.
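The random 70%/30% partition described above can be sketched as follows. This is an illustrative sketch rather than the procedure actually used: it assumes that "balanced" means each class is split separately so that both classes appear in the same proportion in the training and test sets, and the function name and seed parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def balanced_split(
    sepsis: Sequence,
    control: Sequence,
    train_fraction: float = 0.7,
    seed: int = 0,
) -> Tuple[List[tuple], List[tuple]]:
    """Randomly split each class separately into training and test items."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in (("sepsis", sepsis), ("control", control)):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # Number of items from this class that go into the training set.
        cut = round(train_fraction * len(shuffled))
        train += [(s, label) for s in shuffled[:cut]]
        test += [(s, label) for s in shuffled[cut:]]
    return train, test
```

Calling the function with five different seeds would give five different random partitions, corresponding to the five modelling attempts mentioned in the text.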
Results
Table 2 shows an example of a successful model that classified or "scored" 29/35 (or 82.9%) test patients correctly.
Table 2. Classification readout using cytokine mRNA variables (Days 1 to 4)
[Table 2 image not reproduced]
To increase confidence in this model, the procedure was carried out five times, each time using a different random selection of data with which to train and test the network. Once completed, the scores for the individual models were averaged to give an overall indication of the network's ability to classify patients into the correct sepsis or non-sepsis control group.
A series of 5 datasets gives a mean accuracy of prediction of approximately 80%, as shown by Table 3 below.
Table 3: neural network predicting sepsis using RT-PCR data only (Classification Performance Analysis of 5 projects)
[Table 3 image not reproduced]
Overall: 128/157 = 81.5%
Mean of the five models: (79.3 + 85.2 + 80 + 82.9 + 80.8)/5 = 408.2/5 = 81.64%
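The two summary figures above are computed differently: one pools all test patients across the five models, the other averages the five per-model percentages. The short sketch below reproduces both from the numbers quoted in the text; the variable names are illustrative only.

```python
# Per-model test accuracies (percent) as quoted above.
per_model_accuracy = [79.3, 85.2, 80.0, 82.9, 80.8]

# Unweighted mean of the five per-model percentages.
mean_accuracy = sum(per_model_accuracy) / len(per_model_accuracy)

# Pooled accuracy: correctly classified test patients over all test patients.
pooled_accuracy = 100 * 128 / 157

print(f"mean of five models: {mean_accuracy:.2f}%")
print(f"pooled over all test patients: {pooled_accuracy:.1f}%")
```

The two values differ slightly because the five test sets need not be the same size, so the pooled figure weights each model by its number of test patients.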
Table 4 lists the averaged prediction accuracy values for a range of networks constructed using differing combinations of variables.
The most successful model was constructed using cytokine mRNA expression combined with CD31 % expression from the flow cytometry data (average 81.0% accuracy, Table 4, model 1) with clinical data also scoring highly (80.4%, Table 4, model 2).
Table 4. The results from the neural network analysis.
[Table 4 image not reproduced]
To further test our predictive model, we trained the network on up to 100% of the cytokine data obtained from 1 to 4 days prior to the onset of clinical symptoms. We then selected test data comprising "Day 0" sepsis patients and those from Day -5 and Day -6, and also selected 14 control patients from a separate volunteer study, 7 of whom developed symptoms of an Upper Respiratory Tract Infection (URTI) within 9 days of sampling (Dstl/CR08631). The results are shown below in Table 5.
Table 5. Performance of cytokine mRNA model (Days -4 to -1) in prediction of other groups
[Table 5 image not reproduced]
This table shows that our model, built from patterns expressed by sepsis patients up to 4 days before the onset of clinical sepsis, correctly identified, or "scored", 89% of Day 0 sepsis patients, 78% of Day -5 sepsis patients and 83% of Day -6 sepsis patients. Overall, analysis using neural networks has led to the creation of a number of predictive models for sepsis. Models built using only cytokine data have proved consistently capable of distinguishing individuals who will develop sepsis from those who will not.
Example 2: Lack of false positive results from non-sepsis volunteers using neural network model
Table 6 shows the results of testing a group of volunteers by cytokine RT-PCR, none of whom developed signs of SIRS or sepsis.
Table 6
Name     Hits/Occurred  %      Hits/Predicted  %      Chance  Improvement  Ratio
Total    13/13          100.0  N/A             N/A    50.0%   50.0%        2.0:1
Control  13/13          100.0  13/13           100.0  100.0%  0.0%         1.0:1
Sepsis   0/0            N/A    0/0             0.0    0.0%    0.0%         N/A
Example 3: Neural network sepsis prediction of more than 90% accuracy using Clinical Data.
The neural network model was tested using the clinical data set defined in Table 4, model 2, using the parameters described in Table 7 below and further illustrated in Figure 3.
Table 7: Neural network parameters to analyse clinical data
[Table 7 image not reproduced]
Table 8
[Table 8 image not reproduced]
Example 4: Use of Artificial Immune Recognition System
Representation
The initial AIRS system (A. Watkins. An Artificial Immune Recognition System. Mississippi State University: MSc Thesis, 2001) employed simple real-value shape space. Recently, other people have extended the representation to Hamming shape space (J. Hanamaker and L. Boggess. The effect of distance metrics on AIRS. In Proc. of Congress on Evolutionary Computation (CEC). IEEE, 2004) and natural language (D. Goodman, L. Boggess and A. Watkins. "An investigation into the source of power for AIRS, an artificial immune classification system". In Proc. Int. Joint Conference on Neural Networks, pp. 1678-1683. IEEE, 2003). AIRS maintains a set of Artificial Recognition Balls (ARBs) that contain a vector of the data being learnt, a stimulation level and a number of resources. During training, the stimulation level is calculated by assessing the affinity of the data vector in the ARB against a training item; the stronger the match, the greater the stimulation. This stimulation level is used to dictate how many clones the ARB will produce, and affects survival of the ARB.
Affinity Measure
This is dependent on the representation employed. A number of affinity measures for use in AIRS have been proposed, including Hamming distance, Euclidean distance and so on. In this study, both Euclidean and Hamming distance metrics were used, with Euclidean giving the best results.
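The two distance metrics named above can be written down directly. The sketch below is illustrative only; in particular, the conversion of a Euclidean distance into a stimulation value (one minus the distance normalised by the maximum possible distance, with every feature scaled to the range 0 to 1) is an assumed convention for the example and is not stated in the specification.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two real-valued vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming_distance(a: Sequence, b: Sequence) -> int:
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def stimulation(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed convention: features in [0, 1]; closer vectors give higher stimulation."""
    max_distance = math.sqrt(len(a))  # largest distance possible in the unit hypercube
    return 1.0 - euclidean_distance(a, b) / max_distance
```

With this convention an exact match gives a stimulation of 1 and the most distant pair of vectors gives 0.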
Immune Algorithm
Essentially, AIRS evolves with two populations, a memory pool M and an ARB pool C. It has a separate training and test phase, with the test phase being akin to a k-nearest neighbour classifier. During the training phase, a training data item is presented to M. This set can be seeded randomly, and experimental evidence would suggest that AIRS is insensitive to the initial starting point. The training item is matched against all memory cells in the set M, and a single cell is identified as the highest match, MCmatch. This MCmatch is then cloned and mutated. Cloning is performed in proportion to stimulation (the higher the stimulation, the higher the clonal rate), and mutation is inversely proportional (the higher the stimulation, the lower the mutation rate). These clones are inserted into the ARB pool, C. The training item is then presented to the members of the ARB pool, where an iterative procedure is adopted which allows for the cloning and mutation of new candidate memory cells. Through a process of population control, where survival is dictated by the number of resources an ARB can claim, a new candidate memory cell is created. This mechanism is based on the resource allocation algorithm proposed in J. Timmis and M. Neal. A Resource Limited Artificial Immune System. Knowledge Based Systems, 14(3/4): 121-130, 2001. This new candidate is compared against MCmatch with respect to the training item. If the candidate cell has the higher affinity for the training item, then the memory cell is replaced with the candidate cell.
This process is performed for each training item, whereupon the memory set will contain a number of cells capable of being used for classification. Classification of an unseen data item is performed in a k-nearest neighbour fashion.
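The classification step can be sketched as a plain k-nearest neighbour vote over the memory cells. This is a minimal sketch under stated assumptions: Euclidean distance is used as the affinity measure, k = 3 is an arbitrary choice, and the memory cells in the usage note are invented values for illustration, not data from the study.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# A memory cell is represented here as (feature vector, class label).
MemoryCell = Tuple[Sequence[float], str]


def classify(item: Sequence[float], memory_cells: List[MemoryCell], k: int = 3) -> str:
    """Label an unseen item by majority vote of its k nearest memory cells."""
    nearest = sorted(memory_cells, key=lambda cell: math.dist(cell[0], item))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, with hypothetical memory cells `[([0.1, 0.1], "control"), ([0.2, 0.0], "control"), ([0.9, 0.8], "sepsis"), ([1.0, 1.0], "sepsis"), ([0.8, 0.9], "sepsis")]`, the item `[0.85, 0.9]` has three sepsis cells as its nearest neighbours and is labelled accordingly.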
Experimental Setup
An attempt was made to use an experimental procedure that was comparable to the application of neural networks to this data set. For all studies, the markers FasL, MCP-1, TNF-α, IL-1β, IL-6, IL-8 and IL-10 were used. However, it was not possible to reproduce the data set exactly, due to incomplete information regarding the pre-processing of the data during the neural network study.
Experiment One
In the first set of experiments, data collected from patients on days 1 through 4 prior to the onset of sepsis, along with data from a control set of patients, were used as training data. Specifically, the combined data from days 1, 2, 3 and 4 for patients who showed signs of sepsis, together with a random collection of control patients, were used to train AIRS. In total, 59 training data items were used. To test AIRS, a random collection of patients from the control group and combined data from all days (excluding data that had been used in the training process) was used. In total, 34 test data points were used. The settings for AIRS are shown in Table 9:
Table 9: Parameter Settings for AIRS
[Table 9 image not reproduced]
Experiment Two
For our second set of experiments, patients who showed signs of sepsis were classified using data for days 0, 5 and 6, together with control patients. The AIRS system was trained using the same data as for Experiment One, whilst making use of the same parameters.
Results
The results are not directly comparable with the results obtained from the neural network analysis because it was difficult to ascertain from the original report exactly how the data had been first combined over a period of days, and then divided into training and test sets. Therefore, the results obtained should be considered with this in mind.
Experiment One
Ten independent runs of the AIRS algorithm were carried out, then the average and standard deviation calculated. It was found that AIRS was capable of achieving on average 73% (standard deviation 2.96%) classification accuracy. This is approximately 10% lower than the neural network analysis (using the same markers). However, care has to be taken with a direct comparison.
Experiment Two
Again, ten independent runs of the AIRS algorithm were undertaken, and the average and standard deviation taken. This time, data from days 0, 5 and 6 before the onset of sepsis were analysed, together with the control group. Again, AIRS was trained on data taken from days 1 through 4 and the control group. These results are presented in Table 10.
Table 10: Prediction in Other Groups (standard deviation in braces)
[Table 10 image not reproduced]
As can be seen from Table 10, AIRS identifies a high percentage of sepsis cases (outperforming the neural network on day 5 and day 6, but again with the caveat on comparability). The control group did not fare as well, giving a lower than expected result, significantly lower than the neural network approach. This may be because AIRS is biased towards the sepsis patients, owing to the larger amount of training data available for them than for non-sepsis patients.
Conclusions
AIRS appears capable of identifying potential cases of sepsis in advance, and is comparable, at a certain level, to neural network approaches.
Example 5: Use of CD31 expression to predict sepsis
Study design and patients
See Example 1.
Flow Cytometry
Blood was collected into sodium heparin containers (HM&S, Chessington, Surrey) and transported to the laboratory at room temperature. 100 μl aliquots of blood were mixed with immunofluorescent stains using the volumes recommended by the manufacturer (Beckman Coulter Limited, High Wycombe, Buckinghamshire, and Becton Dickinson UK Limited, Cowley, Oxford). T helper cells were identified by co-staining for CD3 and CD4 and T cytotoxic cells were identified by staining for both CD3 and CD8. These cell populations were stained for HLA-DR, CD25, CD54 and CD69. B cells were identified by staining for CD19 and were interrogated with CD80, CD86, CD25 and CD54. Natural killer cells were distinguished by staining with CD56 and interrogated with CD11b, CD25, CD54 and CD69. The monocyte population was selected by staining for CD14; these cells were probed with CD11b, CD54, CD80, CD86 and HLA-DR stains. Gating was used in order to identify the granulocyte population, which was stained for CD11b, CD69, CD31, CD54 and CD62L. The stains were incubated at room temperature for 20 minutes. 500 μl of Optilyse C (Beckman Coulter Limited) was added to each tube and vortex mixed immediately. The samples were incubated at room temperature for 10 minutes to lyse the red blood cells and 500 μl of Isoton (Beckman Coulter Limited) were then added in order to fix the stains. The tubes were vortex mixed immediately and incubated at room temperature for 10 minutes. The cells were then counted on a Beckman Coulter Epics XL System 2 Flow Cytometer.
Statistics
Data was analysed using a Binary Logistic Regression model on the SPSS software package version 11.0. This analysis compared the control group means for immune modulator expression with the means obtained from the sepsis patients at seven time points: 6, 5, 4, 3, 2 and 1 day before diagnosis, and day 0, the day of diagnosis of sepsis. Where data points were missing, averaged values for the group were substituted in order to maintain acceptable n values. Results from the model were only reported if the substituted data points did not involve markers that were highlighted by the model as possible predictors.
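The substitution of group averages for missing data points can be sketched as follows. This is a minimal illustration, assuming missing values are represented as `None` and that the mean is taken over the observed values of the same marker within the same group; the function name is hypothetical.

```python
from statistics import mean
from typing import List, Optional


def substitute_group_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    group_mean = mean(observed)
    return [group_mean if v is None else v for v in values]
```

Mean substitution keeps the group mean unchanged but reduces the apparent variance, which is consistent with the caution in the text about reporting predictors that rely on substituted values.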
Results
Analysis of the data found that there was weak evidence of a predictor effect (p = 0.114).
Decreased expression of CD31 was indicated to be a possible predictor of sepsis three days before diagnosis (p=0.037, n=6). The results obtained for 6 days before diagnosis were inconclusive because of the small sample size for this date (n=4). There were no statistically significant predictors found for 5, 4, 2 or 1 day before diagnosis, or for the day of diagnosis.
Conclusions
The flow cytometry data obtained from patients prior to the development of sepsis, and from patients who did not develop this disease, were collated. Groups were constructed using results from patients in the days before diagnosis of sepsis, with a control group consisting of measurements taken from age matched patients who did not develop sepsis. Bar graphs displaying the medians and 90th and 10th percentiles were difficult to interpret because of the spread of the data and hence statistical analysis was performed.
When the raw data for this is plotted (see Figure 2) it can be seen that 4 out of 6 (66.7%) of the sepsis patients had CD31 expression that was lower than that of the control group. The control group (non-sepsis) data points are distributed between 11.8% and 100%, while four of the six data points in the three days before diagnosis measurements were less than 9%. It is therefore possible that CD31 may be used to predict the onset of sepsis three days prior to the appearance of clinical signs and symptoms. This suggests that CD31 could be a useful predictive marker, particularly in combination with other informative sepsis biomarkers.
Example 6: Multivariate statistical analysis to predict sepsis
Introduction
Multivariate data analysis procedures were applied to data collected from patients 1-6 days prior to development of symptoms of sepsis. Measurements included flow cytometry, PCR and classical clinical observations. Principal component analysis (PCA) was applied to the data matrix considering each of the three classes of observations individually and combined as a complete data set. Discriminant Function Analysis (DFA) was used to determine whether groups differ with regard to the mean of a variable, and then to use that variable to predict group membership (e.g., of new cases). This was performed on the results from PCA and on the three classes of observations individually and combined as a complete data set.
Data description, manipulation and multivariate techniques
Prior to PCA, the data was summarised by producing probability density functions. As normality of distribution is required prior to PCA and DFA, non-normal data were transformed using the Johnson transformation algorithm.
PCA is a dimensionality reducing technique which endeavours to decompose a multivariate data matrix into a few latent variables, composed of linear combinations of the variables, which explain the bulk of the variance of the original matrix. In this way correlations (positive or negative) of parameters within the data set can be established. Essentially, DFA is similar in approach to Analysis of Variance (ANOVA). The DFA problem can be rephrased as a one-way ANOVA problem. Specifically, one can ask whether two groups are significantly different from each other with respect to the mean of a particular variable. It should be clear that, if the means for a variable are significantly different in different groups, then it may be concluded that this variable discriminates between the groups.
In the case of a single variable, the final significance test of whether or not a variable discriminates between groups is the F-test. F is essentially computed as the ratio of the between-groups variance in the data over the pooled (average) within-group variance. If the between-group variance is significantly larger than the within-group variance, then there must be significant differences between means.
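The F ratio described above can be computed for a single variable as in the following sketch. It follows the standard one-way ANOVA definition (between-group mean square divided by pooled within-group mean square); the function name is illustrative and the numbers in the usage note are invented, not study data.

```python
from statistics import mean
from typing import Sequence


def f_statistic(*groups: Sequence[float]) -> float:
    """One-way ANOVA F: between-group variance over pooled within-group variance."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within
```

For two hypothetical groups `[1, 2, 3]` and `[4, 5, 6]` the between-group mean square is 13.5 and the within-group mean square is 1, giving F = 13.5. Whether that value is significant depends on the F distribution with (k - 1, N - k) degrees of freedom, which is not computed here.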
When considering multiple variables, it is possible to establish which of several variables contribute to the discrimination between groups. This results in a matrix of total variances and covariances; and likewise, a matrix of pooled within-group variances and covariances. These matrices are then compared via multivariate F-tests in order to determine whether or not there are any significant differences (with regard to all variables) between groups. This procedure is identical to multivariate analysis of variance or MANOVA. As in MANOVA, the multivariate test is performed, and, if statistically significant, which of the variables have significantly different means across the groups is examined. Thus, even though the computations with multiple variables are more complex, the principal reasoning still applies, namely, that variables that discriminate between groups are sought, as evident in observed mean differences.
DFA was performed on clinical, flow cytometry and RT-PCR data using the complete data matrix (including substituted mean values) and by exclusion of data points for which one or more parameters contained substituted mean values. Analogous models were developed to allow analysis of PCA scores from models developed in Model 1. The purpose of the latter was to establish if transformed data matrices (PCA) could be used to classify observations.
Model 1. Principal Component Analysis (PCA) of observation data
i) PCA model based on clinical data
The number of PCs derived and used in a given model is usually defined as those having an Eigenvalue of >1. Six PCs meet this criterion for clinical data and explain a total of 74.3% of the variance of the data set. Since each of the PCs is orthogonal (uncorrelated) with respect to the other PCs, the association of a clinical parameter with a particular PC defines the PC and illustrates how the parameter influences the variance of the data set.
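The Eigenvalue >1 selection rule can be sketched as follows. The function name `retained_components` and the example eigenvalues are hypothetical; the real eigenvalues are those reported in the tables and are not reproduced here. For a correlation-matrix PCA the eigenvalues sum to the number of variables, which the example respects.

```python
def retained_components(eigenvalues, threshold=1.0):
    """Return how many PCs exceed the threshold and the % variance they explain."""
    total = sum(eigenvalues)
    kept = [ev for ev in sorted(eigenvalues, reverse=True) if ev > threshold]
    explained_pct = 100.0 * sum(kept) / total
    return len(kept), explained_pct


# Hypothetical eigenvalues for a six-variable correlation matrix (they sum to 6)
example_eigenvalues = [2.4, 1.5, 1.1, 0.5, 0.3, 0.2]
n_kept, pct = retained_components(example_eigenvalues)
print(n_kept, round(pct, 1))
```

In this illustration three components pass the threshold; the cumulative percentage plays the same role as the 74.3% figure quoted above for the clinical data.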
Table 12 summarises the loadings of each parameter with the six derived PCs from the clinical data. Loading values of >0.5 indicate a strong contribution of a particular parameter to a given PC. The PCs derived from the data set may be interpreted as follows:
PC1 this is dominated by the strong correlation of WCC, monocytes, neutrophils and platelets. A strong correlation exists between creatinine and lactate. Both these groups have a negative relationship (opposite ends of PC1 scale) and are therefore negatively correlated. BXS and HCO3- are highly correlated and contribute to PC1 and PC2 equally. The latter parameters are contrasted by creatinine and lactate in PC1. These correlations are summarised in Table 2 and Figure 2.
PC2 shows a negative correlation between the group composed of WCC, monocytes, neutrophils and platelets and the group composed of BXS and HCO3-.
PC3 this PC is characterised by the strong relationship between temp, HR and CRP as shown in Figure 3
PC4 although many parameters approach significance for this PC, only CRP is definitively associated with this PC as demonstrated by Figure 4.
PC5 pO2 is contrasted with both urea and MAP in this component.
PC6 this PC exclusively explains the variance introduced into the data set by lymphocytes.
In interpreting the PC loadings, correlated clinical parameters suggest that levels of these species/physical parameters will be elevated or decreased in patients characterised as belonging to PC1, PC2, etc. This will be examined in the discriminant analysis section.
ii) PCA model based on flow cytometry data
The parameters were abbreviated for clarity and the abbreviations are listed in Table 13. Nine PCs account for 80.5% of the variance of the data set, as shown in the eigenvalue matrix in Table 14, and the associated loadings are summarised in Table 15. The correlations between the measured parameters are shown in Table 16, with strong correlations within the derived PCs between the following:
fc 1 and fc 3
fc 5 and fc 6
fc 7 and fc 8
fc 9 with fc 10 and fc 11, fc 12 and fc 13
fc 11 and fc 14
fc 17 with fc 20 and fc 23
fc 21 with fc 17, 20 and 22
fc 23 with fc 20 and fc 22
fc 28 and fc 29
The PC structure may be interpreted as follows:
PC1 correlates CD3 CD4 CD25 in CD3 CD4, CD3 CD4 HLA-DR in CD3 CD4, CD3 CD8 CD25 in CD3 CD8, CD19 CD80 in CD19, CD19 CD86 in CD19, CD14 CD80 in CD14, CD14 CD86 in CD14, CD19 CD54 in CD19, CD19 CD25 in CD19 and CD56 CD54 in CD56.
PC2 contrasts CD19 CD86 in CD19 with CD14 HLA-DR in CD14, CD14 HLA-DR CD11B in CD14, CD14 HLA-DR CD11B CD54 in CD14.
PC3 CD3 CD4 CD54 in CD3 CD4, CD3 CD4 CD69 in CD3 CD4, CD3 CD8 CD54 in CD3 CD8, CD3 CD8 CD69 in CD3 CD8
PC4 CD56 CD69 in CD56, CD69 (%), CD11B CD69 (%)
PC5 CD3 CD8 CD54 in CD3 CD8, CD62L (%)
PC6 CD31 (%)
PC7 no significant components with EV>0.5
PC8 CD54 (%)
PC9 CD14 CD11B in CD14
By considering only one parameter of a pair or group, it would be possible to remove 9 parameters, thus increasing the y component of the data matrix. However, it was decided that only CD31 (%), CD54 (%), CD62L (%), CD11B (%), CD69 (%) and CD11B CD69 (%) (fc 24-fc 29) be subjected to statistical analysis.
The eigenvalue matrix of the selected flow cytometry variables is shown in Table 17 and loadings of the PCA model constructed summarised in Table 18. The use of 3 PCs explains 76.6% of the variance of the data set. The PC model shows the following:
PC1 correlates CD69 (%) and CD11B CD69 (%)
PC2 correlates CD31 (%), CD62L (%) and CD11B (%)
PC3 is composed of the variance associated with CD54 (%)
iii) PCA model based on RT-PCR data
Table 19 indicates that 72.9% of the variance of the RT-PCR data is explained by only 3 PCs. The loadings for this model are shown in Table 20. The correlations of variables with each PC are shown in Figures 23 and 24 and reveal the following:
PC1 correlates Fas-L, MCP-1, TNF-alpha, IL-6 and IL-8
PC2 correlates IL-1 and IL-10
PC3 contrasts IL-1 and IL-10
The correlation of IL-1 and IL-10 in PC2 and subsequent contrast of these variables in PC3 is interesting. It appears that in some patients these variables may be highly correlated or contrasted, possibly providing a powerful means of discriminating patients.
iv) PCA model based on combined clinical, flow cytometry and RT-PCR data
Table 21 summarises the parameters included in this final model and the associated Eigenvalues for the correlation matrix. Table 21 indicates that 9 PCs have an Eigenvalue greater than 1, which together explain 68.7% of the data variance. The loadings of the model are shown in Table 22. Analysis allows the following interpretation of the PCA model:
PC1 shows positive correlation between WCC, Neutrophils, Monocytes, APTR, HCO3-, BXS, Platelets, CD69 (%) and CD11B CD69 (%). This PC also contrasts the above with Lactate and Creatinine, which are correlated.
PC2 correlates CD69 (%), CD11B CD69 (%), WCC, Neutrophils, Monocytes and INR. These parameters are contrasted with TNF-alpha.
PC3 strongly correlates the PCR parameters Fas-L, MCP-1, TNF-alpha, IL-6 and IL-8
PC4 contrasts CRP and IL-10.
PC5 correlates the flow cytometry parameters CD31 (%), CD54 (%), CD62L (%) and CD69 (%).
PC6 correlates CD62L (%) and HR.
PC7 is associated with Temp.
PC8 is associated with IL-1
PC9 is associated with PO2
Model 2: Discriminant Function Analysis (DFA) based on observations and PCA score data
The terminology common to all of the DFA models developed is explained below and numerical values are shown in Table 23.
• Model
The object of the analysis is to build a "model" of how best to predict to which group a case belongs. In the following discussion the term "in the model" will be used in order to refer to variables that are included in the prediction of group membership, and "not in the model" if they are not included.
• Forward stepwise analysis
In stepwise discriminant function analysis, a model of discrimination is constructed step-by-step. Specifically, at each step all variables are reviewed and evaluated to establish which one will contribute most to the discrimination between groups. That variable will then be included in the model.
• Backward stepwise analysis
It is possible to step backwards; in that case the programme first includes all variables in the model and then, at each step, eliminates the variable that contributes least to the prediction of group membership. Thus, as the result of a successful discriminant function analysis, one would only keep the "important" variables in the model, that is, those variables that contribute the most to the discrimination between groups.
• F to enter, F to remove
The stepwise procedure is "guided" by the respective F to enter and F to remove values. The F value for a variable indicates its statistical significance in the discrimination between groups, that is, it is a measure of the extent to which a variable makes a unique contribution to the prediction of group membership. In general, the programme continues to choose variables to be included in the model, as long as the respective F values for those variables are larger than the user-specified F to enter; and excludes (removes) variables from the model if their significance is less than the user-specified F to remove.
• Tolerance
The tolerance value of a variable is computed as 1 - R² of the respective variable with all other variables in the model. Thus, the tolerance is a measure of the respective variable's redundancy. For example, a tolerance value of 0.10 means that the variable is 90% redundant with the other variables in the model.
• Wilks' λ
This parameter gives a measure of the discriminatory power of the model and can assume values in the range of 0 (perfect discrimination) to 1 (no discrimination).
• Partial λ
This is the Wilks' λ associated with the unique contribution (measured orthogonally) of the respective variable to the discriminatory power of the model.
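Two of the quantities defined above can be illustrated with a minimal sketch. It assumes the single-variable case, in which Wilks' λ reduces to the ratio of the within-group sum of squares to the total sum of squares; the multivariate models described here would use determinants of the corresponding matrices instead. The function names and the example values are hypothetical.

```python
def wilks_lambda_univariate(groups):
    """Wilks' lambda for one variable: within-group SS divided by total SS."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return ss_within / ss_total


def tolerance(r_squared):
    """Tolerance = 1 - R^2 of a variable regressed on the other model variables."""
    return 1.0 - r_squared


# Hypothetical values of one variable in two groups (illustration only)
control = [1.0, 2.0, 3.0]
pre_sepsis = [4.0, 5.0, 6.0]
lam = wilks_lambda_univariate([control, pre_sepsis])
tol = tolerance(0.90)  # a variable 90% redundant with the others
print(round(lam, 3), round(tol, 2))
```

Values of λ near 0 correspond to well separated groups and values near 1 to no separation, matching the range given above.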
As a point of note, a common misinterpretation of the results of stepwise discriminant analysis is to take statistical significance levels at face value. When the programme decides which variable to include or exclude in the next step of the analysis, it actually computes the significance of the contribution of each variable under consideration. Therefore, by nature, the stepwise procedures will capitalize on chance because they "pick and choose" the variables to be included in the model so as to yield maximum discrimination. Thus, when using the stepwise approach, it must be borne in mind that the significance levels do not reflect the true alpha error rate, that is, the probability of erroneously rejecting H0 (the null hypothesis that there is no discrimination between groups).
• Canonical Correlation Analysis (CCA)
This is an additional procedure for assessing the relationship between variables. Specifically, this manipulation allows the elucidation of the relationship between two sets of variables. Parameters which characterise this analysis are detailed below.
• Significance of roots (χ2 test)
The term root is used to describe the individual discriminant functions (DFs). The statistical significance of the derived DFs is tested by the χ2 test of successive DFs. A report of the step-down test of all canonical roots is obtained: the first line reports the significance of all DFs together, the second line reports the significance of the remaining roots after removing the first root, and so on. Thus the number of DFs to interpret is obtained.
• Discriminant function coefficients
Two outputs are produced, one for the Raw Coefficients and one for the Standardized Coefficients. Raw here means that the coefficients can be used in conjunction with the observed data to compute (raw) discriminant function scores. The standardized coefficients are the ones that are customarily used for interpretation, because they pertain to the standardized variables and therefore refer to comparable scales.
• Eigenvalues
An Eigenvalue for each DF and the cumulative proportion of explained variance accounted for by each function is obtained. This value is defined in an identical way in PCA and DFA. The larger the value, the greater the amount of variance explained by that DF.
• Factor structure coefficients
These coefficients represent the correlations between the variables and the DFs and are commonly used in order to interpret the "meaning" of discriminant functions. In an analogous way to PCA, the interpretation of factors should be based on the factor structure coefficients.
• Means of canonical variables
When knowledge of how the variables participate in the discrimination between different groups is obtained, the next logical step is to determine the nature of the discrimination for each DF. The first step to answer this question is to look at the canonical means. The larger the canonical mean for a given DF and group of observations, the greater the discriminatory power of that DF.
i) DFA model based on PCA scores of clinical data
a) Containing substituted mean values
Table 24 summarises the results of this analysis. The Wilks' λ value of 0.4 indicates a relatively inefficient classification model. The three derived DFs account for a total of 89.9% of the variance of the data set and the DFs are composed mainly of PCs 1, 3 and 4. The factor structure coefficients indicate that:
• DF1 is composed of the variance explained by PC1 and to a lesser extent with PC4
• DF2 is exclusively composed of the variance explained by PC3
• DF3 is composed of the variance explained by PC4
These correlations are confirmed by the standardised coefficients. The means of canonical variables indicate that:
• DF1 negatively correlates days 1, 2 and 3 with days 5 and 6
• DF2 defines control group observations
• DF3 defines observations for day 2
A summary of the classification of this model and its discriminative nature in relation to the PCs is shown in Table 25. The classification matrix for the model is shown in Table 26. Table 26 suggests a good classification can be obtained for control and 6 day data with 80 and 83 % respectively of observations being classified correctly. However the overall classification power of the model is poor with only 48 % of all observations being correctly classified.
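The per-group and overall percentages quoted from a classification matrix can be derived as in the following minimal sketch. The function name `classification_rates` and the example counts are hypothetical and do not reproduce the classification matrix tables; rows are taken to be the true groups and columns the assigned groups.

```python
def classification_rates(matrix):
    """matrix[i][j] = number of observations from group i assigned to group j."""
    # Percentage of each true group that was assigned to itself (diagonal entry)
    per_group = [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return per_group, 100.0 * correct / total


# Hypothetical counts for three groups (illustration only)
example_matrix = [
    [8, 1, 1],
    [3, 4, 3],
    [1, 0, 5],
]
per_group_pct, overall_pct = classification_rates(example_matrix)
print([round(p, 1) for p in per_group_pct], round(overall_pct, 1))
```

As in the text, individual groups can be classified well while the overall rate stays modest, because the overall figure is weighted by the poorly classified groups.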
b) Excluding substituted mean values
Table 27 summarises the results of this analysis. The Wilks' λ value of 0.45 indicates a relatively inefficient classification model. The three derived DFs account for a total of 95% of the variance of the data set and the DFs are composed mainly of PCs 1, 3 and 5. The factor structure coefficients indicate that:
• DF1 is composed of the variance explained by the negative correlation between PC1 and PC5
• DF2 is composed of the variance explained by PC3
• DF3 is composed of the variance explained by PC3 but to a lesser degree than DF2
These correlations are confirmed by the standardised coefficients. The means of canonical variables indicate that:
• DF1 negatively correlates days 1, 2 and 3 with days 5 and 6
• DF2 negatively correlates days 1 and 6 with the control group
• DF3 defines observations for day 6
A summary of the classification of this model and its discriminative nature in relation to the PCs is shown in Table 28. The classification matrix for the model is shown in Table 29. Table 29 suggests a good classification can be obtained for control and 6 day data, with 83% and 67% respectively of observations being classified correctly. However, the overall classification power of the model is poor, with only 44% of all observations being correctly classified, less than that obtained using mean substituted variables.
The similar prediction efficiency with and without mean substituted values validated the subsequent approach to perform DFA with the inclusion of these values.
ii) DFA model based on transformed values of clinical data
In an effort to improve the classification of observations, the transformed variable values from the original data matrix were subjected to DFA. The thesis was that since PCA is a dimensionality reducing technique, perhaps some data quality is lost and performing DFA on PCA scores leads to a model with less predictive power.
Table 30 summarises the results of this analysis. The Wilks' λ value of 0.22 is an improvement on the PCA scores classification models. The five derived DFs account for a total of 99% of the variance of the data set and the DFs are composed of BXS, CRP, lactate, urea, temperature, creatinine, neutrophils, pO2 and HCO3-, with the other clinical variables having no influence on the classification of observations. The factor structure coefficients indicate that:
• DF1 classifies the correlation between BXS and HCO3-, which are negatively correlated with lactate
• DF2 classifies observations showing a high degree of correlation between BXS, CRP and HCO3-
• DF3 classifies samples with a negative correlation between temperature and creatinine
• DF4 classifies samples with a negative correlation between temperature and pO2
• DF5 classifies samples with a negative correlation between urea and neutrophils
A summary of the classification of this model and its discriminative nature in relation to the clinical variables is shown in Table 31. The classification matrix for the model is shown in Table 32. Table 32 suggests a good classification can be obtained for control and 6 day data with 80 and 83 % respectively of observations being classified correctly. Days 1 , 2 and 5 are greatly improved compared to the PCA scores models but the overall classification power of the model is poor with 55 % of all observations being correctly classified.
iii) DFA model based on PCA scores of flow cytometry
Table 33 summarises the results of this analysis. The Wilks' λ value of 0.39 indicates a relatively inefficient classification model. The two derived DFs account for a total of 71% of the variance of the data set and the DFs are composed mainly of PCs 1, 5 and 8. The factor structure coefficients indicate that:
• DF1 is composed of the variance explained by PC1 and PC8
• DF2 is exclusively composed of the variance explained by PC5
These correlations are confirmed by the standardised coefficients. The means of canonical variables indicate that:
• DF1 negatively correlates day 1 with days 5 and 6
• DF2 defines day 3 observations
A summary of the classification of this model and its discriminative nature in relation to the PCs is shown in Table 34. The classification matrix for the model is shown in Table 35. Table 35 suggests a reasonable classification can be obtained for control and 6 day data, with 66% of observations being classified correctly in both groups. However, the overall classification power of the model is poor, with only 44% of all observations being correctly classified.
iv) DFA model based on flow cytometry data
A summary of the classification of this model and its discriminative nature in relation to the variables is shown in Table 36. The Wilks' λ value of 0.034 indicates an excellent classification model. The three derived DFs account for a total of 74% of the variance of the data set and the DFs are composed mainly of fc7-8, fc11, fc16, fc25, fc28, fc29. The factor structure coefficients indicate that:
• DF1 correlates fc7, 16, 28 and 29 and contrasts these to fc8 and 25
• DF2 correlates fc7, 8, 16 and 25 and contrasts these with fc12
• DF3 is correlated with fc11
Table 37 summarises this information.
These correlations are confirmed by the standardised coefficients. The means of canonical variables indicate that:
• DF1 contrasts day 1 with days 5 and 6
• DF2 contrasts the control group with days 3, 4 and 5 and correlates the control with day 6
• DF3 contrasts days 2 and 3 with day 4
Table 38 suggests a good classification can be obtained for all groups. The overall classification power of the model is impressive, with 76.6% of all observations being correctly classified.
The DFA models for RT-PCR were so poor for both PCA scores and transformed data, with Wilks' λ values >0.8, that they were discarded and will not be considered further.
v) DFA model based on combined clinical, flow cytometry and RT-PCR data
A summary of the classification of this model and its discriminative nature in relation to the variables is shown in Table 39. The Wilks' λ value of 0.0087 indicates an excellent classification model. The four derived DFs account for a total of 89.2% of the variance of the data set and the DF factor structure indicates that: BXS, fc 25, fc 22, fc 11, Temp, CRP, fc 18, fc 6, IL-6, INR, APTR, fc 16, Urea, Lactate, Fas-L, fc 13, fc 24, fc 1, fc 3, MCP-1, fc 28, IL-10, fc 27, fc 26, Neutrophils, fc 14, WCC, fc 29, Platelets, pO2 are included in the model. All other parameters fail to meet the stepwise criteria and hence were eliminated from the model.
The means of canonical variables indicate that:
• DF1 contrasts day 1 with days 5 and 6
• DF2 correlates the control group with day 6 and contrasts these with days 3, 4 and 5
• DF3 correlates the control group with day 5 and contrasts these to days 2, 3 and 6
• DF4 contrasts days 4 and 5
Table 40 shows an excellent classification can be obtained for all groups with a minimum correct assignment rate of 76%. The overall classification power of the model is impressive with 86.9% of all observations being correctly classified.
When each DF is applied to the data using this model, the groups of patients can clearly be seen to cluster and are spatially separated from the other groups.
Conclusions
PCA has highlighted correlations between measured variables for all classes of patients. Many of the correlations are expected from a molecular biology standpoint. Some of the PCA models greatly reduced the dimensionality of the data set but the resulting scores did not spatially separate the groups of patients.
DFA on scores obtained from PCA showed disappointing results. The discriminatory power of the models ranged from 44% to 56% when PCA scores were used. The low discriminatory power of these models may be a result of the reduction in dimensionality of the data set during PCA, with significant detail being lost. Using transformed variables in DFA gave much improved models. The discriminatory power of the clinical and flow cytometry models was 55% and 76% respectively. When DFA was performed on the complete data set (clinical, flow cytometry and RT-PCR variables) a prediction efficiency of 86.9% was observed. Therefore it is recommended that the variables included in this latter model (Table 39) be measured and used to classify new patients suspected of being susceptible to sepsis.
The most impressive feature of the model is its ability to assign patients correctly 6 days before the onset of symptoms. Therefore key discriminatory variables could be monitored and threshold levels established at which medical treatment must be administered. Using the parameters shown in Table 34 it is possible to acquire data from patients and, using transformation algorithms, input the data into the DFA model. This is then capable of classifying patients into the appropriate groups with an efficiency approaching 90%. This could be of great value when used in clinical laboratories.
Table 11. Eigenvalues of correlation matrix, and related statistics for clinical observations
Table 12. Loadings of clinical measurements for each PC. (associations with Eigenvalues >0.5 shown for 95% CL).
Table 13. Abbreviations used in analysis of flow cytometry data
Table 14. Eigenvalues of correlation matrix, and related statistics for flow cytometry data
Table 15. Loadings for all flow cytometry data for each PC. (associations with Eigenvalues >0.5 shown for 95% CL)
Table 16. Correlation matrix of PCA model using flow cytometry variables
Table 16 continued
Table 17. Eigenvalues of correlation matrix, and related statistics for selected flow cytometry variables (fc 24-fc 29)
Table 18. Loadings for selected flow cytometry data for each PC. (associations with Eigenvalues >0.5 shown for 95% CL)
Table 19. Eigenvalues of correlation matrix, and related statistics for RT-PCR data
Table 20. Loadings for RT-PCR data for each PC. (associations with Eigenvalues >0.5 shown for 95% CL)
• borderline significance at 95% CL
Table 21. Eigenvalues of correlation matrix, and related statistics for combined clinical, RT-PCR and flow cytometry variables
Table 21 continued
Table 22. Loadings for combined clinical, RT-PCR and flow cytometry data for each PC. (associations with Eigenvalues >0.5 shown for 95% CL)
parameter | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 | PC9
CD31 (%) | -0.02 | -0.03 | 0.42 | 0.18 | 0.52 | -0.07 | -0.40 | -0.17 | 0.03
CD54 (%) | -0.16 | 0.04 | 0.20 | -0.09 | 0.48 | 0.48 | 0.06 | 0.18 | -0.19
CD62L (%) | -0.41 | -0.09 | 0.20 | 0.28 | 0.16 | -0.21 | 0.01 | -0.01 | 0.39
CD11B (%) | -0.13 | -0.16 | 0.22 | 0.19 | 0.54 | -0.13 | -0.09 | -0.19 | 0.05
CD69 (%) | 0.49 | -0.55 | 0.23 | -0.27 | 0.13 | -0.16 | 0.00 | 0.40 | -0.25
CD11B CD69 (%) | 0.48 | -0.53 | 0.29 | -0.27 | 0.14 | -0.16 | -0.03 | 0.40 | -0.22
Temp | 0.20 | 0.06 | 0.30 | -0.35 | -0.02 | 0.37 | 0.47 | 0.17 | 0.30
HR | 0.15 | -0.12 | 0.13 | -0.35 | 0.20 | 0.65 | -0.03 | -0.32 | 0.00
MAP | 0.19 | 0.19 | -0.15 | 0.10 | -0.44 | 0.31 | -0.30 | 0.04 | -0.35
WCC | 0.57 | -0.64 | 0.14 | 0.38 | 0.01 | 0.06 | 0.02 | -0.08 | 0.08
Neuts | 0.53 | -0.64 | 0.21 | 0.38 | -0.01 | -0.05 | 0.00 | -0.06 | 0.12
Lymphocytes | 0.31 | -0.10 | -0.32 | 0.01 | -0.02 | 0.09 | -0.06 | -0.30 | -0.28
Monocytes | 0.56 | -0.61 | 0.04 | 0.25 | -0.04 | 0.19 | 0.06 | -0.04 | 0.09
Platelets | 0.49 | -0.02 | -0.31 | 0.29 | 0.01 | -0.15 | 0.26 | 0.04 | -0.01
INR | -0.21 | -0.71 | 0.09 | 0.10 | -0.25 | 0.21 | -0.24 | -0.12 | -0.04
APTR | -0.50 | -0.49 | -0.04 | 0.07 | -0.33 | 0.11 | -0.22 | -0.08 | -0.08
CRP | -0.07 | -0.09 | 0.26 | -0.63 | 0.14 | -0.32 | -0.23 | -0.18 | 0.02
PO2 | -0.23 | 0.13 | -0.10 | 0.18 | 0.45 | -0.10 | 0.14 | -0.12 | -0.49
HCO3- | 0.79 | 0.32 | -0.07 | -0.25 | -0.03 | -0.12 | -0.21 | -0.06 | 0.01
BXS | 0.80 | 0.33 | -0.07 | -0.24 | -0.04 | -0.11 | -0.22 | -0.08 | 0.03
Lactate | -0.50 | -0.41 | -0.02 | -0.15 | 0.03 | 0.06 | 0.22 | 0.08 | -0.08
Urea | -0.25 | -0.15 | 0.21 | -0.12 | -0.24 | -0.32 | 0.16 | -0.08 | -0.19
Creatinine | -0.68 | -0.45 | 0.16 | -0.06 | -0.15 | -0.04 | -0.03 | 0.10 | -0.18
Fas-L | -0.05 | 0.12 | 0.77 | -0.08 | 0.01 | 0.11 | -0.16 | -0.05 | -0.05
MCP-1 | 0.12 | 0.19 | 0.61 | 0.07 | -0.36 | -0.09 | 0.20 | -0.32 | -0.13
TNF-alpha | 0.14 | 0.49 | 0.57 | 0.32 | 0.01 | 0.24 | 0.13 | 0.05 | -0.12
IL-1 | -0.22 | 0.28 | 0.20 | 0.20 | -0.12 | 0.11 | -0.45 | 0.49 | 0.16
IL-6 | 0.07 | 0.19 | 0.66 | 0.15 | -0.15 | -0.20 | 0.20 | -0.05 | -0.25
IL-8 | 0.02 | 0.16 | 0.73 | -0.05 | -0.27 | -0.04 | -0.03 | -0.01 | 0.16
IL-10 | 0.14 | 0.44 | 0.02 | 0.51 | 0.09 | 0.03 | -0.05 | 0.21 | -0.19
Table 23. Model definition used in all DFA models
Table 24. Summary of DFA model based on PCA scores of clinical data containing substituted mean values
Table 25. summary of variable association in the discriminative DFA model based on PCA scores of clinical data containing substituted mean values
Table 26. Classification matrix of DFA model based on PCA scores of clinical data containing substituted mean values
Table 27. Summary of DFA model based on PCA scores of clinical data without substituted mean values
Table 28. summary of variable association in the discriminative DFA model based on PCA scores of clinical data without substituted mean values
Table 29. Classification matrix of DFA model based on PCA scores of clinical data without substituted mean values
Table 30. Summary of DFA model based on transformed clinical data with substituted mean values
Table 31. Summary of variable association in the DFA model based on clinical data with substituted mean values
Table 32. Classification matrix of DFA model based on clinical data with substituted mean values
Table 33. Summary of DFA model based on PCA scores of flow cytometry data containing substituted mean values
Table 34. Summary of variable association in the DFA model based on PCA scores of flow cytometry data containing substituted mean values
DF | PC | components of each DF
1 | 1 | fc 1-3, 9-15, 17, 20-23
1 | 8 | fc 19
2 | 5 | fc 7 + 26
Table 35 Classification matrix of DFA model based on PCA scores of flow cytometry data containing substituted mean values
Table 36 Summary of DFA model based on flow cytometry data containing substituted mean values
Table 37 Summary of variable association in the DFA model based on flow cytometry data containing substituted mean values
DF | clinical variables defining DFs
1 | correlates fc7, 16, 28 and 29 and contrasts these to fc8 and 25
2 | correlates fc7, 8, 16 and 25 and contrasts these with fc12
3 | correlated with fc11
Table 38 Classification matrix of DFA model based on clinical data with substituted mean values
Table 39 Summary of DFA model based on combined clinical, RT-PCR and flow cytometry variables
IL-10 | 0.01 | -0.10 | 0.09 | -0.07
Table 39 contd. Summary of DFA model based on combined clinical, RT-PCR and flow cytometry variables
Table 40. Classification matrix of DFA model based on combined clinical, RT-PCR and flow cytometry variables
Example 7: Binary logistic regression analysis to predict sepsis
A binary logistic regression model was used to analyse separately the RT-PCR results, flow cytometry results and clinical data from the ICU patients who went on to develop sepsis and presented positive microbiology results. This model used results gained from an age-matched group of ICU patients who were not diagnosed with sepsis as the control group. Although the model identified numerous possible predictors, some appeared to be of limited use since the values obtained for the pre-symptomatic sepsis patients were within those obtained for the non sepsis patients. The potential prediction markers that did yield some pre-sepsis data points that differed from the non sepsis data are listed in Table 41. However, when combined, these prediction markers could only have identified 8 out of the 24 pre-sepsis patients.
Table 41. Summary of potential prediction markers identified by binary logistic regression analysis.
The discovery of a combination of markers that could possibly predict sepsis in 8 out of 24 patients who later went on to develop SIRS with confirmed infection does not constitute a diagnostic test. Although the prediction capability for CD31 on granulocytes appeared promising (66%), this marker was only effective three days before the appearance of clinical symptoms. A test based on CD31 alone may not constitute a diagnostic test since, to be effective, there would need to be a larger diagnostic window. This could be achieved by the discovery of even more markers. This study may however have found some markers that could form part of a diagnostic test in the future, but caution must be exercised. In the mid 1980s HLA-DR was believed to be prognostic for the development of infections and sepsis (Spittler, A. & Roth, E. 2003, Intensive Care Med, vol. 29, pp. 1211-1213). More recent studies however have shown that post-operative levels of this marker did not predict the onset of SIRS, sepsis or infectious complications (Oczenski, W. et al. 2003, Intensive Care Med, vol. 29, pp. 1253-1257 and Perry, S. et al. 2003, Intensive Care Med, vol. 29, pp. 1245-1252). The conflicting reports in the case of HLA-DR illustrate why caution must be applied to the results of this study. These findings could be due to regional factors such as antibiotic policy, diagnostic criteria, clinical practice, surgical procedures, treatment regimes, environmental factors and the patients' predisposing factors (Angus, D. & Wax, R. 2001, Critical Care Medicine, vol. 29, no. 7 (suppl), pp. 109-116). A larger study that involves more patients from several different hospitals, preferably spanning different health authorities, needs to be conducted to further assess the usefulness of the markers identified for the prediction of sepsis.
Example 8: Sepsis as a model for response to biological weapons
Background and method
Since one of the applications for the claimed invention is the early detection of deliberate infection resulting from exposure to biological weapons, the applicability of sepsis as a model for such infection was examined. Presumptive biological weapons pathogens such as Burkholderia pseudomallei and Francisella tularensis are predicted to produce severe sepsis (see Table 42), which is difficult to model for obvious reasons.
However, in vitro infection of whole blood may be used as a model and the activation marker expression and cytokine response measured. To compare this in vitro infection model with the in vivo situation, Staphylococcus aureus infection was selected as a model infectious agent directly comparable with the in vivo hospital-acquired infection data.
Table 42
Blood from 25 healthy volunteers was infected in vitro with Staphylococcus aureus and the following activation markers and cytokine levels measured at 24 and 48 hours post-infection, as previously described.
FACS Dendritic cells: CD54, CD97, CCR6, CCR7 NK cells: CD25, CD44, CD62L, CD69, CD97 Monocytes: CD44, CD54, CD62L, CD69, CD97, CD107a Neutrophils: CD44, CD62L, CD69, CD107a
Real time RT PCR
IL-1β, IL-6, IL-8, IL-10, MCP-1, TNF-α, sFasL
Each of these sets of input parameters (i.e. dendritic cell markers, NK cell markers, monocyte markers, neutrophil markers, RT PCR data at 24h, RT PCR data at 48h) was used to train its own neural network model. Random selections of infected or non-infected blood samples were used for training (70%) or subsequent testing (30%). The testing phase of the neural network analysis gave a predictive accuracy based on the percentage of times it correctly predicted whether the test set of input parameters was from an infected or a non-infected sample. This testing of each set of input parameters was repeated 5 times. Each time the test was conducted a new neural network was constructed using a newly randomised 70% of the infected and non-infected samples. An average predictive accuracy was derived for each set of input parameters by taking the mean of the 5 predictive accuracies calculated from the 5 neural networks constructed on the 5 randomised sets of input data. The methodology was similar to that used in the sepsis patient study.
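The repeated 70%/30% evaluation described above can be sketched in a few lines of Python. This is only an illustration of the resampling and averaging scheme, not the software used in the study: the nearest-centroid classifier is a deliberately simple stand-in for the neural network, and the function names, the toy data and the fixed random seed are all assumptions introduced here.

```python
import random
from statistics import mean

def nearest_centroid_fit(rows, labels):
    """Stand-in classifier (NOT the study's neural network): per-class feature means."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids

def nearest_centroid_predict(centroids, row):
    """Return the label whose centroid is closest to the row (squared distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def repeated_holdout_accuracy(rows, labels, repeats=5, train_fraction=0.7, seed=0):
    """Train on a fresh random 70%, test on the remaining 30%, and repeat.

    Returns the mean accuracy and the list of per-repeat accuracies.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    indices = list(range(len(rows)))
    n_train = round(train_fraction * len(rows))
    accuracies = []
    for _ in range(repeats):
        rng.shuffle(indices)  # newly randomised split for every repeat
        train, test = indices[:n_train], indices[n_train:]
        model = nearest_centroid_fit([rows[i] for i in train],
                                     [labels[i] for i in train])
        correct = sum(nearest_centroid_predict(model, rows[i]) == labels[i]
                      for i in test)
        accuracies.append(correct / len(test))
    return mean(accuracies), accuracies

def make_toy_samples(n_per_class=10):
    """Hypothetical, well-separated two-feature data; not data from the study."""
    rows, labels = [], []
    for i in range(n_per_class):
        rows.append([0.0 + 0.01 * i, 1.0])
        labels.append("non-infected")
        rows.append([5.0 + 0.01 * i, 6.0])
        labels.append("infected")
    return rows, labels

rows, labels = make_toy_samples()
average, per_repeat = repeated_holdout_accuracy(rows, labels)
print(average, per_repeat)
```

Any classifier with a fit and a predict step could be substituted for the stand-in without changing the resampling loop.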
Results
The most consistent results were obtained from the RT PCR data. Figure 4 shows the data obtained from three subjects, which demonstrates the somewhat heterogeneous patterns of change in the profiles. However, when subjected to the neural network analysis described above, the algorithm achieved a good level of identification of infected samples over uninfected controls (Figure 5).
Example 9:
Microarray design and fabrication
A custom human immune response array was designed homologous to the DSTL-designed murine immune function array with additional genes that had been identified from the previous sepsis study. A total of 1438 genes were represented by a single 50-mer oligonucleotide designed by MWG Biotech. In addition the array contained 768 oligonucleotides from the MWG Biotech commercially available 'diverse function' genes to act as an inter-microarray slide control. Printing of the oligonucleotides was performed by MWG according to their array layout plan with the entire set of printed spots (2206) triplicated on each slide.
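As a quick consistency check on the array layout, the spot counts quoted above can be reproduced with simple arithmetic. The constant and function names below are illustrative only; the three numbers are taken from the text.

```python
# Oligonucleotide counts as stated in the text above.
IMMUNE_RESPONSE_OLIGOS = 1438   # one 50-mer per gene
CONTROL_OLIGOS = 768            # MWG 'diverse function' control set
REPLICATES_PER_SLIDE = 3        # the whole set is printed in triplicate

def spots_in_one_set():
    """Number of distinct printed spots in one copy of the array."""
    return IMMUNE_RESPONSE_OLIGOS + CONTROL_OLIGOS

def spots_per_slide():
    """Total spots on a slide once the set is triplicated."""
    return spots_in_one_set() * REPLICATES_PER_SLIDE

print(spots_in_one_set())   # 2206
print(spots_per_slide())    # 6618
```

The first figure matches the 2206 printed spots given in the text; the second follows from the triplication.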
Blood samples for analysis
Blood samples were taken from intensive care unit (ICU) patients and mixed with blood/bone marrow RNA stabilisation reagent (Roche) in a 1:10 ratio as per the manufacturer's instructions. Stabilised samples were shipped to DSTL frozen (-20°C) and subsequently stored at -70°C prior to mRNA extraction.
RNA isolation
Messenger RNA (mRNA) was isolated from 27.5ml blood lysate (corresponding to 2.5ml of stabilised blood) using the mRNA Isolation Kit for Blood/Bone Marrow (Roche) following the manufacturer's guidelines with a few minor changes (volumes for the 55ml lysate protocol were halved, centrifugation was for 3 minutes, washing of MGP beads was performed using 1ml MGP washing buffer repeated 3 times and elution was into 20μl of redistilled water). The entire mRNA preparation was treated with RNase-free DNase from the DNA-free kit (Ambion Inc.) following the manufacturer's guidelines. The final mRNA preparation was quantitated by A260.
mRNA amplification and fluorescent dye labelling
All amplification and labelling steps were performed with the Amino Allyl MessageAmp™ aRNA kit (Ambion Inc.) following the manufacturer's instructions. Cy3 and Cy5 post-labelling reactive dyes used in the protocol were obtained from Amersham Bioscience. Amplification of mRNA was performed using 50ng purified mRNA. A total of 3μg of amplified mRNA was fluorescently labelled for hybridisation, 1.5μg with Cy3 and 1.5μg with Cy5. Following labelling, the Cy3-labelled and Cy5-labelled portions of the same sample were mixed together and purified using the MessageAmp™ kit. The volume of eluted sample was reduced to 9μl by drying in a vacuum drier. Following this, the size of the labelled amplified mRNA was reduced for hybridisation using the Fragmentation kit (Ambion Inc.).
Microarray hybridisation
Microarray slides were prepared for hybridisation by attaching a GeneFrame® (MWG) over the oligo printed area according to the manufacturer's instructions. Fragmented, labelled mRNA (11μl) was denatured for 3 minutes at 95°C, snap-cooled on ice for 3 minutes and briefly centrifuged. 240μl MWG hybridisation solution was added to the sample and mixed before applying to the microarray slide. The slide was covered with a plastic coverslip which attaches to the GeneFrame® and placed within a HC2 hybridisation cassette (CamLab). 500μl water was added to each well of the cassette to prevent drying. The closed cassette was placed in a 42°C hybridisation oven for 16 hours. After hybridisation, slides were removed from the cassettes and the GeneFrame® and coverslip removed. Slides were washed sequentially using three buffers (1x SSC, 0.2% SDS; 0.5x SSC and 0.25x SSC). Each wash was for 5 minutes with agitation. Slides were centrifuged for 5 minutes at 1500 rpm and dried slides stored in the dark until scanning. Slides were scanned using a GenePix 4000B microarray scanner (Axon Instruments Inc.). PMT voltages for 635 and 532nm channels were adjusted to yield a total pixel intensity ratio of approximately 1:1. Images were saved as single image TIFF files.
Microarray gene expression analysis
TIFF files from the Axon scanners were loaded into BlueFuse software (BlueGnome Ltd) and processed to 'fused' data following the manufacturer's instructions. The resultant data files were saved and subsequently analysed in GeneSpring software.
Neural network
For analysis, data was collated from patients 1 to 6 days prior to the onset of sepsis and compared with a control group consisting of ICU patients who did not develop sepsis. Individual samples provided data measuring up to 22 different parameters and selective combinations of variables were fed into a multi-layered perceptron neural network (Proforma, Hanon Solutions, Glasgow, Scotland).
Each network was trained with a random 70% selection of balanced sepsis and control data using back propagation algorithms and then tested with the remaining 30% of the data. This process was then repeated, using a different 70% of randomised data, until a total of 5 repeats had been run. The predictive abilities of these 5 models were then averaged to give an overall predictive capability of the network. The most successful network was the one most capable of correctly classifying previously unseen patients as being from either the sepsis or non-sepsis control group.
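For readers unfamiliar with the technique, the following is a minimal sketch of a multi-layered perceptron trained by back propagation, written in plain Python. It is not the Proforma software used in the study, whose internals are not described here. The class name, the single hidden layer of three sigmoid units, the learning rate, the number of epochs and the toy data are all assumptions chosen only to keep the example small.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """One hidden layer of sigmoid units and one sigmoid output (illustrative only)."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight vector carries its bias as the last entry.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, row):
        # zip stops at the shorter sequence, so the bias entry is added separately.
        hidden = [sigmoid(sum(w * x for w, x in zip(ws, row)) + ws[-1])
                  for ws in self.w_hidden]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                         + self.w_out[-1])
        return hidden, output

    def train(self, rows, targets, epochs=2000, rate=0.5):
        for _ in range(epochs):
            for row, target in zip(rows, targets):
                hidden, output = self.forward(row)
                # Error signal at the output (squared error, sigmoid unit).
                delta_out = (output - target) * output * (1.0 - output)
                # Propagate the error back using the not-yet-updated output weights.
                delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                                for j, h in enumerate(hidden)]
                for j, h in enumerate(hidden):
                    self.w_out[j] -= rate * delta_out * h
                self.w_out[-1] -= rate * delta_out
                for j, ws in enumerate(self.w_hidden):
                    for i, x in enumerate(row):
                        ws[i] -= rate * delta_hidden[j] * x
                    ws[-1] -= rate * delta_hidden[j]

    def predict(self, row):
        """Return 1 (e.g. sepsis) or 0 (e.g. control) by thresholding the output."""
        return 1 if self.forward(row)[1] >= 0.5 else 0

# Hypothetical, balanced two-feature toy data: 0 = control, 1 = sepsis.
toy_rows = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2],
            [0.9, 1.0], [1.0, 0.9], [0.8, 0.9], [0.9, 0.8]]
toy_targets = [0, 0, 0, 0, 1, 1, 1, 1]

network = TinyMLP(n_inputs=2, n_hidden=3, seed=1)
network.train(toy_rows, toy_targets)
print([network.predict(row) for row in toy_rows])
```

In the procedure described above, a network of this kind would be fitted to each random 70% training selection and scored on the held-out 30%, with the five scores averaged.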
Results
Table 43 shows various sets of genes selected from the 22 most informative genes based on their individual scores. The sets were assigned in such a way as to attempt to establish the relative importance of combinations of genes based on such factors as their individual scores (sets B and G representing the top and bottom ranked genes of the 22), whether or not genes with known immunological or inflammatory functions were included (set E with CD40 and IL-8 excluded, for instance) and the effect of larger or smaller sets.
Table 43
Figure imgf000062_0001
Figure imgf000063_0001
Table 44 shows the ranked scores obtained following the neural network analysis.
Table 44
Figure imgf000063_0002
Surprisingly, set B, comprising the top ten-scoring genes based on their individual scores, did not give the best overall predictive value. Even more surprisingly, the best predictive set, set F, comprised set B together with two genes not known to have any connection with the immune or inflammatory response, CRX and MAP1A. Overall, the values indicate that the inclusion of genes that could not have been predicted to be useful based on their known functions nevertheless resulted in improved predictive scores.
The reader's attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.
All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
The invention is not restricted to the details of any foregoing embodiments. The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.

Claims

1. Method for screening a biological sample to detect early stages of infection, SIRS or sepsis comprising the steps of: a) detecting expression of a first set of informative biomarkers by RT-PCR and detecting expression of a second set of informative biomarkers by RT-PCR; b) analysing the results of detection; c) classifying said sample according to the likelihood and/or timing of the development of overt infection wherein the first set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 1 selected from the list consisting of CD40, CD5, CD79A, CRX, CTNND1, CX3CL1, ENTPD2, ENTPD5, EPHA8, GPR44, HMMR, IL-8, MAP1A, MAPK7, MEF2D, ODF1, SAA3P, SLC6A9, SPN, TDGF1, TSC22D1 and HDAC5 and wherein the second set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 1 selected from the list consisting of CD178, MCP-1, TNFα, IL-1β, IL-6, IL-10, INF-α, INF-γ.
2. Method according to claim 1, wherein the first set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 2 selected from the list.
3. Method according to either claim 1 or claim 2, wherein the first set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 3 selected from the list.
4. Method according to any preceding claim, wherein the first set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 4 selected from the list.
5. Method according to any preceding claim, wherein the first set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 6 selected from the list.
6. Method according to any preceding claim, wherein the first set of informative biomarkers the expression of which is detected by means of RT-PCR consists of at least 12 selected from the list.
7. Method according to claim 6, wherein the set of informative biomarkers consists of CD40, CRX, ENTPD2, EPHA8, IL-8, HMMR, MAP1A, MAPK7, MEF2D, SAA3P, SPN, TSC22D1.
8. Method according to claim 1, wherein at least 2 biomarkers selected from the second list are detected.
9. Method according to claim 8, wherein at least 3 biomarkers selected from the second list are detected.
10. Method according to claim 9, wherein at least 4 biomarkers selected from the second list are detected.
11. Method according to claim 10, wherein at least 6 biomarkers selected from the second list are detected.
12. Method according to claim 11, wherein at least 8 biomarkers selected from the second list are detected.
13. Method according to any preceding claim, wherein analysis of the results yields a prediction of a probability of clinical SIRS or sepsis developing.
14. Method according to any of claims 1 to 12, wherein analysis of the results yields a binary yes/no prediction of clinical SIRS or sepsis developing.
15. Method according to any preceding claim, wherein if the development of clinical SIRS or sepsis is predicted, the results are subjected to a second analysis to determine the likely timing and/or severity of the clinical disease.
16. Method of any preceding claim wherein analysis is by means of a neural network.
17. Method of claim 16 wherein the neural network is a multilayered perceptron neural network.
18. Method of any preceding claim, wherein the analysis is capable of correctly predicting clinical SIRS or sepsis in greater than 80% of cases.
19. Analysis according to the method of any preceding claim for the preparation of a diagnostic means for the diagnosis of infection, SIRS or sepsis.
PCT/GB2008/051069 2007-11-16 2008-11-17 Early detection of sepsis Ceased WO2009063249A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0722582.4 2007-11-16
GBGB0722582.4A GB0722582D0 (en) 2007-11-16 2007-11-16 Early detection of sepsis

Publications (2)

Publication Number Publication Date
WO2009063249A2 true WO2009063249A2 (en) 2009-05-22
WO2009063249A3 WO2009063249A3 (en) 2009-10-15

Family

ID=38896479

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2008/051069 Ceased WO2009063249A2 (en) 2007-11-16 2008-11-17 Early detection of sepsis

Country Status (2)

Country Link
GB (2) GB0722582D0 (en)
WO (1) WO2009063249A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210325380A1 (en) * 2020-04-20 2021-10-21 EnLiSense, LLC Disease diagnostics using a multi-configurable sensing array

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200418992A (en) * 2002-11-12 2004-10-01 Becton Dickinson Co Diagnosis of sepsis or sirs using biomarker profiles
US20040097460A1 (en) * 2002-11-12 2004-05-20 Becton, Dickinson And Company Diagnosis of sepsis or SIRS using biomarker profiles
EP1611255A2 (en) * 2003-04-02 2006-01-04 SIRS-Lab GmbH Method for recognising acute generalised inflammatory conditions (sirs), sepsis, sepsis-like conditions and systemic infections
DE102004015605B4 (en) * 2004-03-30 2012-04-26 Sirs-Lab Gmbh Method for predicting the individual disease course in sepsis
DE102004016437A1 (en) * 2004-04-04 2005-10-20 Oligene Gmbh Method for detecting signatures in complex gene expression profiles
GB0426982D0 (en) * 2004-12-09 2005-01-12 Secr Defence Early detection of sepsis
EP1869463A4 (en) * 2005-04-15 2010-05-05 Becton Dickinson Co Diagnosis of sepsis

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013107826A3 (en) * 2012-01-17 2013-10-03 Institut Pasteur Use of cellular biomarkers expression to diagnose sepsis among intensive care patients
WO2014209238A1 (en) * 2013-06-28 2014-12-31 Acumen Research Laboratories Pte. Ltd. Sepsis biomarkers and uses thereof
JP2016526888A (en) * 2013-06-28 2016-09-08 アキュメン リサーチ ラボラトリーズ プライヴェット リミテッドAcumen Research Laboratories Pte. Ltd. Sepsis biomarkers and their use
EP3013985A4 (en) * 2013-06-28 2017-07-19 Acumen Research Laboratories Pte. Ltd. Sepsis biomarkers and uses thereof
CN110129425A (en) * 2013-06-28 2019-08-16 睿智研究实验室私人有限公司 Pyemia biomarker and its application
CN103882039A (en) * 2013-09-23 2014-06-25 中国农业科学院上海兽医研究所 Fluorescent quantitation RT-PCR (Reverse Transcription-Polymerase Chain Reaction) method for detecting duck MAPK1 (Mitogen-activated Protein Kinase) gene
CN103882039B (en) * 2013-09-23 2016-04-20 中国农业科学院上海兽医研究所 Detect the fluorescent quantitative RT-PCR method of duck MAPK1 gene
CN107840876A (en) * 2016-09-18 2018-03-27 北京奥维亚生物技术有限公司 A kind of mouse Odf1 polypeptides and its preparation method for antibody

Also Published As

Publication number Publication date
GB0722582D0 (en) 2007-12-27
GB0820899D0 (en) 2008-12-24
WO2009063249A3 (en) 2009-10-15
GB2454799A (en) 2009-05-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08848987

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08848987

Country of ref document: EP

Kind code of ref document: A2