T CELL EPITOPES USEFUL IN A SEVERE ACUTE RESPIRATORY SYNDROME (SARS) VIRUS VACCINE AND AS DIAGNOSTIC TOOLS AND METHODS FOR IDENTIFYING SAME
FIELD OF THE INVENTION
The present invention relates to the identification of T-cell epitopes within Severe Acute Respiratory Syndrome (SARS) virus, for diagnostic use as well as for use in prophylactic and therapeutic vaccines. The present invention also relates to subunit vaccines.
BACKGROUND OF THE INVENTION
SARS virus
Severe acute respiratory syndrome (SARS) is a respiratory illness that has recently been reported in Asia, North America, and Europe.
A worldwide outbreak of severe acute respiratory syndrome (SARS) has been associated with exposures originating from a single ill health care worker from Guangdong Province, China. A novel coronavirus has been isolated from patients who met the case definition of SARS.
Prediction of epitopes
We describe an advanced motif sampler method inspired by the Gibbs sampling method described by C. E. Lawrence et al. (Science 1993, 262, 208-214). The method derives weight-matrices describing the binding motif of a given MHC complex using a Monte Carlo Metropolis sampling of the sequence alignment space combined with the advanced techniques of sequence weighting and pseudo-count correction for low counts. The input to the method is amino acid peptide sequences known to bind to an MHC complex. We measure the performance of the method by use of both the Pearson correlation coefficient and ROC curve plots. The binding weight-matrices are calculated from peptides downloaded from the public databases of SYFPEITHI (Database for MHC ligands and peptide motifs, Immunogenetics 50:213-219, 1999) and MHCPEP (V. Brusic, G. Rudy, A.P. Kyne and L.C. Harrison: MHCPEP, a database of MHC-binding peptides: update 1997, Nucleic Acids Research, (1998), Vol. 26, No. 1, pp. 368-371). We estimate weight-matrices for the nine class I supertypes (A1, A2, A3, A24, B7, B27, B44, B58 and B62) described by Sette and Sidney (Immunogenetics 1999 Nov; 50 (3-4): 201-212) and for the class II allele HLA-DR4(B1*0401). The accuracy of the method is benchmarked on data consisting of peptides with associated measured binding affinity to the HLA class I alleles HLA-A0101, HLA-A0201, HLA-A0301/HLA-A1101 and HLA-B0702, as well as on a set of publicly available class II epitopes. We make a detailed comparison of its performance to that of a series of other methods and demonstrate that the present method, for both class I and class II binding prediction, has a performance that is comparable to or higher than that of the other methods. Finally, we use the method (for class I in combination with prediction of C-terminal proteasomal cleavage (C. Kesmir et al, Protein Eng 2002 Apr; 15 (4): 287-96)) to predict T-cell class I and class II epitopes for the genome of Severe Acute Respiratory Syndrome (SARS) virus and discuss possible applications of the prediction method to guide the process of rational vaccine design.
Introduction
The hallmark of the immune system is its ability to recognise and distinguish between self (friend) and non-self (enemy). The T cells do this by recognizing peptides that are bound to Major Histocompatibility Complex (MHC) molecules. A number of methods for predicting the binding of peptides to MHC molecules have been developed (reviewed by Schirle M, Weinschenk T, Stevanovic S. Combining computer algorithms with experimental approaches permits the rapid and accurate identification of T cell epitopes from defined antigens. J Immunol Methods. 2001 Nov 1; 257(1-2): 1-16) since the first motif methods were presented (Rothbard JB, Taylor WR. A sequence pattern common to T cell epitopes. EMBO J. 1988 Jan; 7(1): 93-100; Sette A, Buus S, Appella E, Smith JA, Chesnut R, Miles C, Colon SM, Grey HM. Prediction of major histocompatibility complex binding regions of protein antigens by sequence pattern analysis. Proc Natl Acad Sci U S A. 1989 May; 86(9): 3296-300). The discovery of allele specific motifs (Falk K, Rötzschke O, Stevanovic S, Jung G, Rammensee HG. Allele-specific motifs revealed by sequencing of self-peptides eluted from MHC molecules. Nature. 1991 May 23; 351(6324): 290-6) led to the development of more accurate algorithms (Pamer EG, Davis CE, So M. Expression and deletion analysis of the Trypanosoma brucei rhodesiense cysteine protease in Escherichia coli. Infect Immun. 1991 Mar; 59(3): 1074-8; Rötzschke O, Falk K, Stevanovic S, Jung G, Walden P, Rammensee HG. Exact prediction of a natural T cell epitope. Eur J Immunol. 1991 Nov; 21(11): 2891-4). In the simple types of prediction tools it is assumed that the amino acid at each position along the peptide sequence contributes a given binding energy, and that these contributions can be added up to yield the overall binding energy of the peptide (Meister GE, Roberts CG, Berzofsky JA, De Groot AS. Two novel T cell epitope prediction algorithms based on MHC-binding motifs; comparison of predicted and published epitopes from Mycobacterium tuberculosis and HIV protein sequences. Vaccine. 1995 Apr; 13(6): 581-91). Similar types of approaches are used by the EpiMatrix method (Schafer JR, Jesdale BM, George JA, Kouttab NM, De Groot AS. Prediction of well-conserved HIV-1 ligands using a matrix-based algorithm, EpiMatrix. Vaccine. 1998 Nov; 16(19): 1880-4), the BIMAS method (Parker KC, Bednarek MA, Coligan JE. Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol. 1994 Jan 1; 152(1): 163-75) and the SYFPEITHI method (Rammensee HG, Friede T, Stevanovic S. MHC ligands and peptide motifs: first listing. Immunogenetics. 1995; 41(4): 178-228). Our improved method uses a novel Gibbs sampling algorithm to achieve more accurate prediction of the peptide/MHC binding affinity.
SUMMARY OF THE INVENTION
In a first aspect the present invention consists of T cell epitopes in Severe Acute Respiratory Syndrome (SARS) virus. More specifically there are provided 20 linear peptide epitopes for each of the HLA class I supertypes of A1, A2, A3, A24, B7, B27, B44, B58 and B62, respectively, and 20 linear peptide epitopes for each of the HLA class II alleles of DRB1*0401, DRB1*1501, DRB1*0102, DRB1*0410, DRB1*0301, DRB1*1301, DRB1*0802, DRB1*1101, DRB1*0701 and DRB5*0101, respectively, from the SARS genome. In a second aspect the invention also consists of variants of these sequences. In a third aspect this invention consists of a method to predict these sequences from genome data and to validate the predictions experimentally. In a fourth aspect the present invention provides compositions including these epitopes for use in a vaccine and to induce a T cell response in a subject, or as a diagnostic tool.
BRIEF DESCRIPTION OF THE TABLES AND FIGURES
Table 1. Data for the training and evaluation of the HLA class I binding prediction. The first column gives the supertype names included in the calculation, the second column the number of unique 9mer peptides in the training set for the corresponding supertype, the third column the HLA allele name for the evaluation set data, and the fourth and fifth columns the number of peptides and the number of binding peptides in the evaluation set, respectively.
Table 2. Performance of the Gibbs sampler compared to that of HMMER for the two different types of sequence weighting of Henikoff & Henikoff and clustering, respectively. The table compares the predictive performance in terms of the Pearson correlation coefficient (a) and Aroc (b) for the four supertypes of A1, A2, A3 and B7. For the Henikoff and Henikoff sequence weighting type the performance for the Gibbs sampler is given for pseudo-count weights of 20, 50 and 100, and for clustering for pseudo-count weights of 50, 100 and 200, respectively. The last two columns give the performance of the HMMER package and the SYFPEITHI website predictor, respectively. The lower row in the table gives the average performance over the four evaluation sets.
Table 3. Prediction performance of the Gibbs sampler on the four evaluation datasets for different position specific weight values. The first column gives the supertype name, the second column the motif positions with high importance for peptide binding, the fourth and fifth columns the number of peptides in the training and test sets, respectively, and the last five columns the predictive performance in terms of the Pearson correlation coefficient and Aroc for weights of 1, 2, 3, 5 and 9, respectively. The last row gives the average performance over the 4 supertypes.
Table 4. Comparison of the prediction accuracy of the Gibbs sampler to that of TEPITOPE for the 10 datasets described in the text. The first three columns give the name, the number of peptides and the number of peptides classified as binders for each of the datasets. The last two columns give the Aroc values calculated using the TEPITOPE and Gibbs sampled weight matrices, respectively.
BEST METHOD OF CARRYING OUT THE INVENTION
The following examples describe the best methods for carrying out the invention.
EXAMPLE 1: PREDICTION OF EPITOPES
Materials and Methods
The Gibbs sampler
We have implemented an advanced motif sampler method inspired by the Gibbs sampling method described by C. E. Lawrence et al. (Science 1993, 262, 208-214). The method finds an optimal local alignment of a set of N sequences by means of Monte Carlo Metropolis sampling of the alignment space. The scoring function guiding the Monte Carlo search is defined in terms of fitness (information content) to a log-odds matrix calculated from the alignment. In the implementation presented here only fixed length motif search is described.
The program samples possible alignments of the N sequences. For each alignment a log-odds score matrix is calculated as log(p(i,j)/q(i)), where p(i,j) is the frequency of amino acid i at position j in the alignment and q(i) the background frequency of that amino acid. The values of p(i,j) are estimated using sequence weighting and pseudo-count correction for low counts. Sequence weighting is performed using either the method described by Henikoff and Henikoff (J. Mol. Biol. 1994, 243, 574-578) or a clustering algorithm where sequences in a cluster are assigned a weight corresponding to 1/Nc, where Nc is the cluster size. In the Henikoff and Henikoff scheme an amino acid is assigned a weight w = 1/(r*s), where r is the number of different amino acids at a given position in the alignment and s the number of occurrences of the amino acid at that position. The weight of a sequence is then assigned as the sum of the amino acid weights. The pseudo-count correction for low counts is performed using a Blosum weighting scheme (Altschul et al., Nucleic Acids Research, 1997, 25(17), 3389-3402).
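The Henikoff and Henikoff weighting described above can be illustrated with a minimal sketch. This is not the implementation used in the work described here; the function names are hypothetical, the alignment is assumed to be a list of equal-length ungapped strings, and the pseudo-count correction is deliberately left out.

```python
from collections import Counter

def henikoff_weights(alignment):
    """Sequence weights: each residue contributes 1/(r*s) and a
    sequence's weight is the sum over its positions."""
    weights = [0.0] * len(alignment)
    for j in range(len(alignment[0])):
        column = [seq[j] for seq in alignment]
        counts = Counter(column)
        r = len(counts)                      # different amino acids in column j
        for k, aa in enumerate(column):
            s = counts[aa]                   # occurrences of this amino acid
            weights[k] += 1.0 / (r * s)
    return weights

def weighted_frequencies(alignment, weights):
    """Sequence-weighted frequency of each observed amino acid per position
    (no pseudo-count correction in this sketch)."""
    total = sum(weights)
    freqs = []
    for j in range(len(alignment[0])):
        column = {}
        for seq, w in zip(alignment, weights):
            column[seq[j]] = column.get(seq[j], 0.0) + w / total
        freqs.append(column)
    return freqs

# Toy alignment: two identical sequences and one distinct sequence.
toy = ["AAA", "AAA", "CCC"]
w = henikoff_weights(toy)
p = weighted_frequencies(toy, w)
```

In the toy alignment the two identical sequences share their weight, so the distinct sequence counts as much as the redundant pair together.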
Two distinct Monte Carlo moves are implemented in the algorithm.
1) Single sequence move
2) Phase shift move
In the single sequence move, a new start point for the alignment of a sequence is selected at random. In the phase shift move the entire alignment is shifted a random number of residues to the left or right. This last move allows the program to efficiently escape local minima.
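The two moves can be sketched as follows. This is an illustrative sketch only: the function names, the representation of an alignment as a list of motif start offsets, the `max_shift` bound, and the choice to reject a phase shift that would push any window outside its sequence are all assumptions not stated in the text.

```python
import random

def single_sequence_move(starts, lengths, motif_len, rng):
    """Pick one sequence at random and draw a new random start for its window."""
    new_starts = list(starts)
    k = rng.randrange(len(starts))
    new_starts[k] = rng.randint(0, lengths[k] - motif_len)
    return new_starts

def phase_shift_move(starts, lengths, motif_len, rng, max_shift=2):
    """Shift every window by the same random non-zero offset.
    Returns None when the shift does not fit (assumed rejection rule)."""
    shift = rng.choice([s for s in range(-max_shift, max_shift + 1) if s != 0])
    new_starts = [s + shift for s in starts]
    for s, n in zip(new_starts, lengths):
        if s < 0 or s + motif_len > n:
            return None
    return new_starts

rng = random.Random(0)
starts, lengths = [2, 3, 1], [15, 14, 12]
proposal_a = single_sequence_move(starts, lengths, 9, rng)
proposal_b = phase_shift_move(starts, lengths, 9, rng)
```

A proposal produced by either function would then be accepted or rejected by the Monte Carlo acceptance rule.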
The energy function guiding the Monte Carlo sampling is defined as
E = \sum_{i,j} N(i,j) \cdot \log\left( p^{*}(i,j) / q(i) \right)    (1)
where N(i,j) is the number of occurrences of amino acid i at position j in the alignment, and p*(i,j) is the pseudo-count and sequence weight corrected frequency of amino acid i at position j in the alignment. Finally, q(i) is the background frequency of amino acid i.
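As a worked illustration of the energy function, the sketch below evaluates it for a toy alignment. The function name and the data layout (one dictionary of corrected frequencies per position) are hypothetical, and the corrected frequencies are supplied directly rather than derived with sequence weighting and pseudo-counts.

```python
import math
from collections import Counter

def alignment_energy(alignment, corrected_freq, background):
    """Sum over amino acids i and positions j of N(i,j) * log(p*(i,j) / q(i))."""
    energy = 0.0
    for j in range(len(alignment[0])):
        counts = Counter(seq[j] for seq in alignment)   # N(i,j) for column j
        for aa, n in counts.items():
            energy += n * math.log(corrected_freq[j][aa] / background[aa])
    return energy

# Toy example: column 0 is enriched in A relative to background, column 1 is not.
toy_alignment = ["AC", "AC"]
toy_freq = [{"A": 0.5}, {"C": 0.25}]
toy_background = {"A": 0.25, "C": 0.25}
E = alignment_energy(toy_alignment, toy_freq, toy_background)
```

Only the enriched column contributes here, giving E = 2 * log(2); a column whose corrected frequencies equal the background contributes nothing.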
The probability of accepting a move in the Monte Carlo sampling is defined as
P = min( 1 , exp(dE/T) ),
where dE is the difference in energy between the end and start configurations and T is a scale factor. Note that we seek to maximize the energy function, hence the positive sign for dE in the equation. T is lowered during the calculation.
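The acceptance rule can be written in a few lines. The sketch below is illustrative; the function name and the way the random number source is passed in are assumptions.

```python
import math
import random

def accept_move(delta_e, temperature, rng=random):
    """Accept with probability min(1, exp(dE / T)).
    The energy is maximized, so any non-negative dE is always accepted."""
    if delta_e >= 0:
        return True
    return rng.random() < math.exp(delta_e / temperature)

uphill = accept_move(0.5, 0.1)           # always accepted
downhill = accept_move(-1000.0, 0.001)   # acceptance probability underflows to zero
```

As T is lowered, moves that decrease the energy become progressively less likely to be accepted.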
Prior knowledge of important positions in the sequence motif can be included in the search by allowing for differential weighting of the positions in the motif. The use of position weights can provide an important guide for the Gibbs sampler when searching for a subtle sequence signal.
Weight-matrix calculation
A simple use of the Gibbs sampler is the calculation of weight-matrices from prealigned sequences. Here the matrix is calculated using the Gibbs sampler with zero Monte Carlo moves and zero T steps. The result is a weight-matrix calculated using the prealigned sequences with inclusion of both pseudo-count correction and sequence weighting.
Gibbs sampler parameters
The Gibbs sampler has a series of free parameters defining the manner in which a weight-matrix is calculated from an alignment. The important parameters are
i) Sequence weighting method
ii) Background amino acid distribution
iii) Pseudo-count correction method
iv) Weight on pseudo-count correction
ad i) Two different strategies for sequence weighting were tested: sequence clustering and sequence weighting as described by Henikoff and Henikoff (J. Mol. Biol. 1994, 243, 574-578). For the sequence clustering we use a Hobohm 1 like algorithm (Hobohm U. et al., Protein Sci 1992 Mar; 1(3): 409-17) with ungapped alignment and sequence identity of 62% as cluster threshold. After clustering each peptide in a cluster is assigned a weight equal to 1/Nc, where Nc is the cluster size.
ad ii) Three types of background amino acid frequencies were tested: the Swiss-Prot (Bairoch A. and Apweiler R. J. Mol. Med. 75:312-316 (1997)) amino acid distribution, a flat distribution, and a distribution estimated from the sequence input to the algorithm.
ad iii) Two strategies for pseudo-count correction were tested: Equal and Blosum correction, respectively. In both cases the pseudo-count frequency is estimated as described by Altschul et al. (1997). For the Equal correction a substitution matrix with identical frequencies for all amino acid substitutions is applied. For Blosum correction a Blosum 62 substitution matrix is applied.
ad iv) The effective amino acid frequencies are estimated as described by Altschul et al. (Nucleic Acids Research, 1997, 25(17), 3389-3402). When the sequence weighting is performed using clustering, the effective sequence number is the number of clusters. When sequence weighting as described by Henikoff and Henikoff (J. Mol. Biol. 1994, 243, 574-578) is applied, the mean number of different amino acids in the alignment gives the effective sequence number. In both situations the effective amino acid frequency is calculated as
p* = (a*f + b*g)/(a+b),
where p* is the effective amino acid frequency, f the observed frequency, g the pseudo-count frequency, a the effective sequence number and b the weight on the pseudo-count correction.
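The clustering-based weighting and the effective frequency calculation can be sketched together. This is a minimal sketch under stated assumptions, not the implementation used here: the function names are hypothetical, the clustering is a simple greedy pass in input order over equal-length peptides, a peptide joins a cluster when its identity to the cluster representative is at or above the threshold, and the pseudo-count frequency g is supplied as a number rather than derived from a Blosum matrix.

```python
def identity(a, b):
    """Fraction of identical residues in an ungapped comparison."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster_peptides(peptides, threshold=0.62):
    """Greedy Hobohm-1-like clustering; element 0 of each cluster is its representative."""
    clusters = []
    for pep in peptides:
        for cluster in clusters:
            if identity(pep, cluster[0]) >= threshold:
                cluster.append(pep)
                break
        else:
            clusters.append([pep])
    return clusters

def effective_frequency(f, g, a, b):
    """(a*f + b*g) / (a + b) for one amino acid at one position."""
    return (a * f + b * g) / (a + b)

peptides = ["ALAKAAAAV", "ALAKAAAAL", "GILGFVFTL"]   # toy input
clusters = cluster_peptides(peptides)
weights = {pep: 1.0 / len(c) for c in clusters for pep in c}   # 1/Nc per peptide
a = len(clusters)            # effective sequence number under clustering
example = effective_frequency(f=0.5, g=0.1, a=a, b=200)
```

With few clusters and a large pseudo-count weight b, the effective frequency is pulled strongly towards the pseudo-count frequency g, which is the intended behaviour for small training sets.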
Data
To define optimal parameter settings we have applied the Gibbs sampler to three specific problems
I) MHC class I binding affinity prediction
II) Proteasomal cleavage prediction
III) MHC class II binding affinity prediction
For the first two problems we apply the Gibbs sampler on prealigned sequences to estimate weight matrices that subsequently are applied to predict the binding affinity of a set of peptides in an evaluation set not included in the Gibbs sampling. For the MHC class II binding predictions no efficient and accurate method exists that can perform the binding motif recognition, and we hence apply the Gibbs sampler to estimate both the alignment and the weight-matrix.
MHC class I binding:
We download peptides known to bind to MHC class I molecules from the databases of SYFPEITHI (Database for MHC ligands and peptide motifs, Immunogenetics 50:213-219, 1999) and MHCPEP (V. Brusic, G. Rudy, A.P. Kyne and L.C. Harrison: MHCPEP, a database of MHC-binding peptides: update 1997, Nucleic Acids Research, (1998), Vol. 26, No. 1, pp. 368-371). Only peptides of length 9 were included. The peptides were clustered into the nine supertypes (A1, A2, A3, A24, B7, B27, B44, B58 and B62) described by Sette and Sidney (Immunogenetics 1999 Nov; 50 (3-4): 201-212). In Table 1 the number of unique peptides for each supertype is given. These peptides constitute the training set for the estimation of MHC class I binding weight-matrices. For 4 of the 9 supertypes (A1, A2, A3 and B7), datasets of peptides for which the binding affinity to the MHC molecule has been measured by the method described by Sylvester-Hvid C. et al. (Tissue Antigens 2002 Apr; 59(4): 251-258) and Buus S. et al. (Biochim Biophys Acta 1995, 1243: 453-460) are used to evaluate the prediction accuracy of the corresponding weight-matrices. Also in Table 1 we give the number of peptides in the evaluation set, the corresponding allele name and the number of binding peptides (affinity stronger than 500 nM) for each of the 4 supertypes, respectively.
Proteasomal cleavage:
As training set we use the dataset described by C. Kesmir et al. (Protein Eng 2002 Apr; 15 (4): 287-96). The set consists of 881 peptides extracted from the SYFPEITHI and MHCPEP databases. The set is constructed so that no MHC molecule is represented with more than 5% of the peptides. The peptides are extracted in a window of 17, so that the C terminal of the epitope is at the central position 9. The evaluation set is the HIV data set described by C. Kesmir et al. (Protein Eng 2002 Apr; 15 (4): 287-96).
MHC class II binding:
We download peptides binding to the MHC class II molecule HLA-DR4(B1*0401) from the SYFPEITHI and MHCPEP databases. The dataset consists of 509 unique peptide sequences. We remove peptides that do not allow for a hydrophobic residue at the P1 position in the binding motif (Brusic V., et al. 1998, Bioinformatics 14 (2) 121-130). That is, a peptide is removed if no hydrophobic residues are present at the first N-L+1 positions, where N is the peptide length and L is the motif length. The hydrophobic filter leaves out 27 peptides. The final training set has 482 unique peptides. The length distribution in the training set ranges from 9 to 30 residues.
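The hydrophobic P1 filter can be sketched as below. The sketch is illustrative only: the function name is hypothetical, and the set of residues treated as hydrophobic (F, L, I, M, V) is an assumption borrowed from the amino acid grouping given in Example 2, since the filter's own residue set is not listed here.

```python
HYDROPHOBIC = set("FLIMV")   # assumption: grouping taken from Example 2

def passes_p1_filter(peptide, motif_len=9, hydrophobic=HYDROPHOBIC):
    """Keep a peptide if any of its first N-L+1 positions holds a
    hydrophobic residue, i.e. some motif window can start on one."""
    n_starts = len(peptide) - motif_len + 1
    return any(res in hydrophobic for res in peptide[:max(n_starts, 0)])

kept = [p for p in ["LAAAAAAAA", "AAAAAAAAA", "AAAAAAAAAL", "ALAAAAAAAA"]
        if passes_p1_filter(p)]
```

Note that a hydrophobic residue near the C-terminal end does not rescue a peptide, because no full-length motif window can start there.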
Results
We apply the Gibbs sampler to the MHC class I binding motif problem to estimate optimal settings for the above-described parameters. For a given parameter setting we estimate weight-matrices for the four supertypes A1, A2, A3 and B7 using the peptides in the training sets and next evaluate the predictive performance by calculating the Pearson correlation coefficient between the log-transformed affinities (see M. Nielsen et al., Protein Sci 2003 May; 12(5): 1007-17) and the weight-matrix predictions. By applying the same parameter setting to all 4 supertypes, we minimize the risk of overfitting.
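The evaluation step can be illustrated with a short sketch. The function name and toy numbers are hypothetical, and the transform used below (the negative base-10 logarithm of the affinity in nM) is only a stand-in assumption; the exact transform of the cited work is not restated here.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Undefined (division by zero) if either sequence has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

affinities_nM = [10.0, 100.0, 1000.0]        # toy measured affinities
matrix_scores = [3.0, 2.0, 1.0]              # toy weight-matrix predictions
targets = [-math.log10(a) for a in affinities_nM]   # assumed log transform
r = pearson(targets, matrix_scores)
```

In this toy case stronger binders (lower nM) receive higher scores in exact proportion, so the correlation is 1.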
In all situations the use of Equal correction for estimating the pseudo-counts gave lower predictive performance compared to Blosum correction (data not shown). The background distribution of amino acids estimated from Swiss-Prot gave consistently higher predictive performance compared to the other two schemes of background frequency estimation (data not shown). In the rest of the analysis we hence use the Gibbs sampler with the pseudo-count estimated as Blosum correction and the background amino acid distribution estimated from Swiss-Prot. We performed weight-matrix calculation for a series of pseudo-count weights (b) for both sequence-weighting schemes. Table 2 gives the result of the analysis. The table shows the prediction accuracy estimated in terms of the Pearson correlation coefficient for the two sequence weighting schemes for a series of pseudo-count weights for the four supertypes A1, A2, A3 and B7, respectively. For comparison, the table also shows the prediction accuracy of the weight matrices estimated using the HMMER package (Eddy S. R. 1998, Bioinformatics 14:755-763) program hmmbuild with the command line options --fast --pam /usr/cbs/bio/src/blast-2.1.2/data/BLOSUM62, as well as the prediction accuracy using the SYFPEITHI web interface. It is clear from the results shown that the two sequence weighting schemes have comparable predictive performance and that the optimal performance is found for a value of b close to 50 for the Henikoff and Henikoff sequence weighting and close to 200 for the cluster sequence weighting, respectively. Since the sequence-weighting scheme based on sequence clustering has the best performance, we will in the following, unless otherwise stated, use this sequence-weighting scheme, and we consequently set the pseudo-count weight to the corresponding optimal value found above (close to 200). From the table it is also clear that the predictive performance of the Gibbs sampler is comparable to that of both HMMER and SYFPEITHI. We next apply the Gibbs sampler to estimate a weight-matrix for proteasomal cleavage prediction. The matrix is estimated using the parameter setting described above. For the HIV evaluation set we find performance values of 0.364, 0.370 and 0.370 for the Gibbs sampler, HMMER and NN (value taken from the original work by C. Kesmir), respectively. These numbers thus confirm that the Gibbs sampler has a predictive performance comparable to that of the HMMER package.
Position specific weighting
In many situations prior knowledge about differential importance of the positions in the binding motif exists. This is for example the case for the MHC class I binding motif. Here binding is for most MHC alleles largely determined by the fitness of the peptide to the binding pockets at positions 2 and 9 in the motif (see Database for MHC ligands and peptide motifs, Immunogenetics 50:213-219, 1999). Such prior knowledge can be included in the search for binding motifs. In Table 3 we give the predictive performance of the weight-matrix for class I binding when position specific weighting is included in the motif search.
The second column in the table states which motif positions have high importance for peptide binding for the specific supertype. The residue set is determined as the set of anchor residues defined in the SYFPEITHI database, extended with auxiliary anchors if they occur at position 2 or 9. For the A1 supertype, positions 3 and 9 are specified as anchor positions whereas positions 2 and 7 are auxiliary anchor positions. This means that positions 2, 3 and 9 are included as positions with high weight in the motif search for this supertype. From the results stated in the table it is clear that a position specific weighting of 2-3 gives an improved predictive performance.
Applying the weighting scheme to the proteasomal cleavage prediction with high weight on positions 2 and 9 in the motif, we get the following performance: W1: 0.364, W2: 0.388, confirming that a higher weight on important positions in the motif leads to an improved predictive performance.
MHC class II binding motif and weight-matrix
We apply the Gibbs sampler to estimate the binding motif and corresponding weight-matrix for the HLA-DR4(B1*0401) molecule. We apply the Gibbs sampler with the parameters estimated previously for MHC class I binding. The T scale is set to 0.15 and is lowered to 0.001 in 10 uniform steps. At each T step 5000 Monte Carlo moves are performed. Equation 1 determines the acceptance of a move. The alignment space of a set of sequences has a very large set of local minima with close to identical energy. To get an effective sampling of these local minima, we repeat 100 MC calculations with different start configurations and calculate the final weight-matrix as the average over the 50 highest scoring weight-matrices. From the SYFPEITHI database we find that anchor positions in the binding motif are located at positions 1, 4, 6 and 9, respectively, and we hence use these positions with an increased weight to guide the Gibbs sampling. Anchor positions estimated from a logo-plot of a weight-matrix calculated using the Gibbs sampler with equal weights on all positions gave similar results. We benchmark the predictive performance of the weight-matrix on 10 datasets and compare the performance to that of TEPITOPE (Sturniolo T. et al., Nat Biotechnol 1999 Jun; 17(6): 555-61). The 10 datasets are 8 datasets described by Raghava G.P.S (http://www.imtech.res.in/raghava/mhcbench/index.html), and two experimental datasets described by Southwood et al. (J. Immunol, 1998, 160(7), 3363-3373) and Geluk et al. (Diabetes, 1998 vol 47, 1594-1600). The binding of a peptide is calculated as the score of the highest scoring 9mer sub-peptide.
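Scoring a peptide by its best 9mer can be sketched as follows. The function names and the matrix layout (one dictionary of residue scores per motif position, with unlisted residues scoring zero) are hypothetical choices made for illustration.

```python
def score_core(core, matrix):
    """Sum of position-specific scores for one motif-length sub-peptide."""
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(core))

def score_peptide(peptide, matrix, motif_len=9):
    """Score of the highest scoring sub-peptide, together with that sub-peptide."""
    cores = [peptide[i:i + motif_len]
             for i in range(len(peptide) - motif_len + 1)]
    best = max(cores, key=lambda core: score_core(core, matrix))
    return score_core(best, matrix), best

# Toy matrix that only rewards F at the first motif position.
toy_matrix = [{"F": 2.0}] + [{} for _ in range(8)]
best_score, best_core = score_peptide("AAF" + "A" * 9, toy_matrix)
```

The returned sub-peptide also indicates which register of a longer class II peptide is predicted to occupy the binding groove.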
We use the non-parametric Aroc measure (the area under the ROC curve) (Swets J.A. 1988, Science 240:1285-1293) to compare the accuracy of the two prediction methods. This measure removes the bias due to the different scales implicit in the two prediction methods. To calculate a ROC curve one must classify the data set into binders and non-binders. For the 8 MHCbench datasets we take peptides with an associated binding value of 0 to be non-binding, and all other peptides to be binders. For the Southwood and Geluk datasets we take an affinity of 1000 nM as threshold for peptide binding (Southwood et al., J. Immunol, 1998, 160(7), 3363-3373). In Table 4 we give the results of the benchmark calculation comparing the performance of the Gibbs sampler to that of the TEPITOPE program.
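The Aroc value can be computed without drawing the curve, using the equivalence between the area under the ROC curve and the fraction of correctly ordered binder/non-binder pairs. The sketch below is illustrative; the function name is hypothetical and ties are counted as half, which is a common convention rather than something specified here.

```python
def aroc(scores, is_binder):
    """Area under the ROC curve as the fraction of (binder, non-binder)
    pairs in which the binder has the higher score; ties count one half."""
    pos = [s for s, b in zip(scores, is_binder) if b]
    neg = [s for s, b in zip(scores, is_binder) if not b]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [True, False, True, False]
value = aroc(toy_scores, toy_labels)   # 3 of 4 pairs are ordered correctly
```

Because only the ordering of scores matters, the measure can compare methods whose raw scores are on different scales.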
From the table it is clear that the Gibbs sampled weight-matrix has comparable or better predictive performance than that of TEPITOPE. Only for two of the 10 datasets (MHCbench_5a and b) does the TEPITOPE weight-matrix have a higher Aroc value than the Gibbs matrix.
SARS epitope predictions
We download the genome of SARS from GenBank (Benson, D.A. et al. 2002, Nucleic Acids Research, 30, 17-20. GenBank entry: NC004718) and make binding affinity predictions on all 9mer peptides using the weight matrices estimated as described above, however with the modification that for the supertypes A1, B44, B58 and B62 we use position specific weights of 9 at the anchor positions (see Table 3), and for the supertypes A24, A3, B7 and B27 we use uniform position weights. For the class I weight-matrix calculations the Gibbs sampler is further run using the Henikoff and Henikoff sequence weighting scheme. For the HLA-A2 class I supertype we use an artificial neural network as described earlier by M. Nielsen et al. (2003) to predict the peptide binding affinity. For all class II alleles except DR4(B1*0401) we use the scoring matrices from TEPITOPE. For class I epitopes we include a filter to remove epitopes with poor C-terminal proteasomal cleavage and exclude all peptides with a NetChop (C. Kesmir et al, Protein Eng 2002 Apr; 15 (4): 287-96. www.cbs.dtu.dk/services/Netchop) score below 0.5. We rank the 9mer peptides on descending prediction score and report the top 20 as possible high-binding SARS epitopes.
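The filtering and ranking step can be sketched as below. The function names and the tuple layout are hypothetical, and the binding and cleavage scores are assumed to have been computed beforehand by the respective predictors; no call to any real prediction service is made.

```python
def enumerate_9mers(protein):
    """All overlapping 9mer peptides of a protein with their start offsets."""
    return [(i, protein[i:i + 9]) for i in range(len(protein) - 8)]

def top_epitopes(candidates, cleavage_cutoff=0.5, top_n=20):
    """candidates: iterable of (peptide, binding_score, cleavage_score).
    Drops peptides with a cleavage score below the cutoff, then ranks the
    remainder on descending binding score and keeps the top_n."""
    kept = [c for c in candidates if c[2] >= cleavage_cutoff]
    kept.sort(key=lambda c: c[1], reverse=True)
    return kept[:top_n]

toy_candidates = [
    ("AAAAAAAAA", 0.9, 0.4),   # best binder, but poorly cleaved
    ("CCCCCCCCC", 0.7, 0.6),
    ("DDDDDDDDD", 0.8, 0.5),
]
ranked = top_epitopes(toy_candidates)
```

In the toy data the strongest predicted binder is discarded because its cleavage score falls below 0.5.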
For organisms with a high mutation rate it is important to choose the constant regions as candidate epitopes when it comes to vaccine design. We estimate the sequence variation by aligning the sequence of the GenBank entry NC004718 to all other sequences from the organism found in GenBank (12th of May 2003, 12 complete genomes). We exclude protein sequences shorter than 100 amino acids to disregard non-translated open reading frames. From the set of alignments with an e-value smaller than 10^-3, we estimate the sequence variation as the number of different amino acids found at each position. If no sequence hit is found at a given position, the mutation rate is undefined and we set the sequence variation to infinity. In situations where one or more of the reported top 20 epitopes are located in a region with sequence variation, we extend the epitope listing so that for each allele we report 20 conserved epitopes.
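The conservation check can be sketched as follows. The sketch rests on assumptions that are not specified above: hits are given as strings already aligned to the reference with "-" where a hit does not cover a position, the reference residue is counted among the different amino acids, and an epitope is called conserved when every one of its positions has a variation of exactly 1. The function names are hypothetical.

```python
import math

def sequence_variation(reference, aligned_hits):
    """Number of different amino acids per reference position,
    or infinity where no hit covers the position."""
    variation = []
    for j, ref_aa in enumerate(reference):
        seen = {hit[j] for hit in aligned_hits if hit[j] != "-"}
        variation.append(len(seen | {ref_aa}) if seen else math.inf)
    return variation

def is_conserved(start, variation, length=9):
    """True if no position of the epitope shows sequence variation."""
    return all(v == 1 for v in variation[start:start + length])

reference = "ACDEFGHIKLM"
hits = ["ACDEFGHIKLM", "ACDEFGHIKQ-"]    # toy hits; second differs at position 9
var = sequence_variation(reference, hits)
conserved_starts = [i for i in range(len(reference) - 8) if is_conserved(i, var)]
```

Ranked epitopes that fail this check would be skipped, and the listing extended down the ranking until 20 conserved epitopes are collected.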
The above calculation is given for the 9 class I supertypes, and for the class II alleles of DRB1*0401, DRB1*1501, DRB1*0102, DRB1*0410, DRB1*0301, DRB1*1301, DRB1*0802, DRB1*1101, DRB1*0701 and DRB5*0101. The set of class II alleles is selected so as to obtain efficient and broad allele coverage.
EXAMPLE 2
Selection of variant sequences
Besides the peptides identified by the methods described in Example 1, a number of variants of these peptides may be useful as, for example, a vaccine or a diagnostic tool. These variant peptides may differ in that the amino acids found at one or more positions of the original peptide are replaced by different amino acids. These different amino acids may be selected in a number of ways. A hydrophobic amino acid may for example be replaced by another hydrophobic amino acid. Groups of interchangeable amino acids may for example be selected as polar (N and Q), charged (D, E, K, R and H), acidic (D, E), basic (K, R, H), ambivalent (P, T, S, C, A, G, Y and W) or hydrophobic (F, L, I, M and V). Another way to construct groups of similar amino acids is to group those that have a substitution score above a given threshold such as 0 (zero) according to an amino acid substitution matrix such as Blosum 62 (Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992 Nov 15; 89(22): 10915-9). Yet another way to select variant peptides is to replace one or more amino acids with other amino acids with the aim of creating a variant peptide that is predicted to bind almost as well as or better than the original peptide. The prediction may for example involve the use of a neural network method, a hidden Markov model or a matrix method. Amino acids may also be replaced by amino acids other than the 20 amino acids normally found in proteins. The variant peptides may also contain additional amino acids or fewer amino acids than the original peptide. The variant peptides may for example be one amino acid longer or shorter at either the N or the C terminal end.
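Group-based substitution can be illustrated with a small sketch that enumerates all single-position variants. The function name is hypothetical, and only four of the groups listed above (polar, acidic, basic and hydrophobic) are used, purely to keep the example short.

```python
GROUPS = ["NQ", "DE", "KRH", "FLIMV"]   # polar, acidic, basic, hydrophobic

def single_substitution_variants(peptide, groups=GROUPS):
    """All peptides that differ from the original at exactly one position,
    where the replacement comes from the same group as the original residue."""
    variants = set()
    for pos, aa in enumerate(peptide):
        for group in groups:
            if aa in group:
                for alt in group:
                    if alt != aa:
                        variants.add(peptide[:pos] + alt + peptide[pos + 1:])
    return sorted(variants)

toy_variants = single_substitution_variants("NL")   # two-residue toy peptide
```

Each variant generated this way could then be passed to a binding predictor and retained only if it is predicted to bind about as well as, or better than, the original peptide.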
EXAMPLE 3
Measurement of binding of epitopes to MHC molecules
The binding of peptides to the HLA molecules can be verified experimentally, for example by the method described by Sylvester-Hvid et al. (Sylvester-Hvid C, Kristensen N, Blicher T, Ferre H, Lauemoller SL, Wolf XA, Lamberth K, Nissen MH, Pedersen LO, Buus S. Establishment of a quantitative ELISA capable of determining peptide-MHC class I interaction. Tissue Antigens. 2002 Apr; 59(4): 251-8). This can be used to verify that peptides that are predicted to bind a given HLA molecule actually bind that molecule. The results from such experiments can be used to select a subset of peptides that will undergo further testing to verify their utility in a vaccine.
EXAMPLE 4
Epitope Stimulation of Cytolytic T Cell Function in vitro
The ability of individual peptides to be recognized by naïve, normal human T cells can be measured by antigen stimulation using peptide-pulsed antigen presenting cells. Peptides purified following chemical synthesis will be added to cultures of human antigen-presenting cells (APC), in order to test stimulation of normal human T cells. This is a measurement of the abundance of epitope specific T cells in normal human blood products, and as well a determination that the binding affinity of the peptides for class I HLA is sufficient to trigger T cell activation.
Human monocytes, from peripheral blood of 3 HLA-A2 human donors, will be purified using adherence to glass wool and grown in culture for 2 days prior to peptide loading. Culture conditions will include commercially available lots of GM-CSF and IL-4 at reported concentrations. It has been demonstrated that exogenous peptides can be added in excess to such antigen presenting cells and displace peptides bound on cell surface class I HLA receptors.
These peptide-pulsed cells will then be washed, and incubated with freshly isolated human T cell populations. Samples will be cultured in 96 well microtiter plates for a period of 7 days. Each culture will contain approximately 5 x 10e6 normal human T cells. T cell activation will be measured in two ways: by the induction of T cell activation markers and by monitoring of IL-2 production.
Flow cytometry using FITC-conjugated anti-human CD69 will be performed on duplicate samples of all cultures. CD69 is a proven T cell activation marker used in similar studies, and will be used in double-staining experiments with anti-CD3 antibody to determine T cell numbers.
Controls will consist of T cells incubated in identical conditions but lacking peptide-pulsed dendritic cells, and T cells exposed to a positive control peptide-APC population. A positive control peptide from prior investigation of HIV T cell HLA-A2 peptides will be used. Results will be considered positive if the number of CD69+ T cells generated in test peptide exposures is comparable to that of the positive control, and if baseline numbers remain statistically lower.
In parallel, duplicate wells will be harvested and assayed by ELISA for the induction of IL-2, an interleukin expressed by activated T cells. Controls will be as described above, and individual peptides will be considered positive if the level of IL-2 generated in test peptide exposures is comparable to that of the positive control, and if baseline levels remain statistically lower.
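The positivity rule applied to both the CD69 and the IL-2 readouts can be sketched in Python as follows. This is an illustration only: the function name `is_positive`, the thresholds `min_fraction` and `min_fold_over_baseline`, and the example values are assumptions that do not appear in the protocol, and a simple fold-over-baseline cutoff stands in for the statistical comparison, which the text does not specify.

```python
from statistics import mean


def is_positive(test, positive_control, baseline,
                min_fraction=0.5, min_fold_over_baseline=2.0):
    """Call a peptide positive from replicate readouts.

    test, positive_control, baseline: replicate measurements (for example
    the percentage of CD69+ T cells, or the IL-2 concentration by ELISA).
    min_fraction: hypothetical cutoff; the test mean must reach this
        fraction of the positive control mean to count as comparable.
    min_fold_over_baseline: hypothetical cutoff; the test mean must exceed
        the baseline mean by at least this factor.
    """
    t, p, b = mean(test), mean(positive_control), mean(baseline)
    comparable_to_control = t >= min_fraction * p
    above_baseline = t > b and t >= min_fold_over_baseline * b
    return comparable_to_control and above_baseline


if __name__ == "__main__":
    # Made-up duplicate readings, for illustration only.
    print(is_positive([42.0, 38.0], [50.0, 46.0], [5.0, 7.0]))
    print(is_positive([8.0, 6.0], [50.0, 46.0], [5.0, 7.0]))
```

With duplicate wells only, a formal significance test has little power, which is why the sketch uses fixed cutoffs; any real cutoff would need to be set from assay validation data.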
EXAMPLE 5
Abundance of epitope-specific T cells in normal and patient blood
Diagnostic kits are now available which allow flow cytometric detection of antigen-specific T cells. These assays use the same principles employed in receptor binding assays described above - adding synthesized peptides to recombinant HLA molecules, but then extend this assay by measuring the ability of T cells to bind the HLA/peptide complex. This is accomplished by generating fluorescently conjugated tetrameric complexes of peptide/HLA using avidin-biotin conjugation, and then measuring the number of fluorescently labeled T cells in a blood product.
This is a tool for monitoring quantitative responses to vaccination or in measuring antigen specific T cells in patient blood samples. It is also a surrogate assay to measure potency of a T cell response during healthy donor vaccine trials.
Each peptide will be employed according to protocol, and HLA-A2 donor bloods (3/sample) will be tested for baseline levels of T cell recognition. It is expected that positive numbers in the range of 1:10^4 to 1:10^5 will be obtained from the naïve T cell pool.
Baseline numbers will be generated using non-HLA-A2 donors as well as non-peptide-loaded complexes. Again, HIV T cell epitopes will be tested as controls. Samples which have tested positive in the in vitro T cell stimulation assay will be chosen for testing by this method.
Assays will also be performed on peripheral blood T cells obtained from patients who have recently sero-converted to SARS positivity. It is expected that significantly higher frequencies (1:10^2 to 1:10^3) of epitope-specific T cells will be measured, should specific epitopes be involved in the disease response. This determination would be proof of the utility of specific epitopes for diagnostic purposes.
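The arithmetic behind these frequencies can be sketched in Python as follows. The sketch reads the expected ranges as 1:10^4 to 1:10^5 for the naïve pool and 1:10^2 to 1:10^3 for recently sero-converted patients, which is an interpretation of the figures given in this example. The function names, the category labels and the event counts are hypothetical and serve only to show the calculation.

```python
def as_ratio(tetramer_positive, total_t_cells):
    """Express the count of tetramer-positive T cells as '1:N'."""
    if tetramer_positive <= 0:
        return "not detected"
    return "1:{}".format(round(total_t_cells / tetramer_positive))


def classify(tetramer_positive, total_t_cells):
    """Place a measured frequency relative to the two expected ranges."""
    freq = tetramer_positive / total_t_cells
    if 1e-3 <= freq <= 1e-2:
        return "patient range (1:10^2 to 1:10^3)"
    if 1e-5 <= freq <= 1e-4:
        return "naive range (1:10^4 to 1:10^5)"
    return "outside expected ranges"


if __name__ == "__main__":
    # Made-up flow cytometry event counts, for illustration only:
    # (tetramer-positive events, total CD3+ events)
    for positive, total in [(25, 500_000), (1_000, 200_000)]:
        print(as_ratio(positive, total), classify(positive, total))
```

A frequency falling between the two ranges is reported as outside both rather than being forced into one of them, since the text does not say how such intermediate values should be interpreted.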
EXAMPLE 6
Demonstration of an effective DNA vaccine formulation for stimulating T cell responses in transgenic mice expressing the human HLA-A2 receptor
In order to prove the vaccine utility of predicted T cell epitope peptides, formulation of epitopes in an accepted delivery platform, followed by induction of T cell responses in an animal model, can be demonstrated. Because small animal models for SARS infectivity do not exist, pre-clinical proof of concept is limited to immunogenicity studies.
It has been reported that DNA vaccines can present multiple T cell epitopes in clustered and simple linear array on a vector, without excessive immunodominance of one or few epitopes. In our initial attempts to validate our epitope selection and testing criteria, we will construct and measure the effectiveness of a DNA vaccine to stimulate T cell responses in a transgenic mouse model expressing human HLA-A2.
Peptide epitopes will be chosen based on T cell stimulation indices, followed by generation of a short DNA vaccine construct to express these peptides intracellularly. Genetic constructs successful in this approach have previously been identified, and we will use this strategy in our experiments. A synthetic mini-gene will be constructed by overlapping oligonucleotides and confirmed by DNA sequence analysis. Known mouse proteasome cleavage sites will flank the peptide motifs for enhanced processing. The expression cassette will be driven by a standard CMV promoter in a commercially available expression plasmid.
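The design of such a mini-gene can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from this example: the placeholder peptides, the `AAY` spacer standing in for a cleavage-site flank, the arbitrary one-codon-per-residue table, and the oligonucleotide length and overlap. The sketch joins the epitopes into one open reading frame, back-translates it, and tiles the sequence into overlapping oligonucleotides on alternating strands.

```python
# One codon per amino acid from the standard genetic code (an arbitrary
# choice, not optimized for expression in any host).
CODON = {
    "A": "GCC", "R": "CGG", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def build_minigene(epitopes, spacer):
    """Start codon, then each epitope followed by the spacer, then a stop."""
    protein = "M" + "".join(epitope + spacer for epitope in epitopes)
    return "".join(CODON[aa] for aa in protein) + "TGA"


def tile_oligos(dna, length=40, overlap=15):
    """Split dna into overlapping oligos, alternating sense and antisense."""
    if not 0 < overlap < length:
        raise ValueError("overlap must be positive and shorter than length")
    step = length - overlap
    oligos = []
    for i, start in enumerate(range(0, max(len(dna) - overlap, 1), step)):
        fragment = dna[start:start + length]
        # Odd-numbered oligos are given as the opposite strand so that
        # neighbouring oligos can anneal through their shared overlap.
        oligos.append(fragment if i % 2 == 0 else reverse_complement(fragment))
    return oligos


if __name__ == "__main__":
    # Placeholder 9-mers, NOT predicted SARS epitopes.
    gene = build_minigene(["ACDEFGHIK", "LMNPQRSTV"], spacer="AAY")
    print(gene)
    for oligo in tile_oligos(gene):
        print(oligo)
```

In practice codon choice, spacer sequence and oligonucleotide boundaries would be selected for the expression host and the assembly method; the sketch only shows how the pieces fit together.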
Transgenic mice expressing HLA-A2 molecules in the complete absence of H-2 class I molecules in an H-2Kb, H-2Db double KO context have been created, and used to determine the immunological response to viral T cell peptides known to be HLA-A2 specific.
Experimental groups will be organized as follows:
Control 1: (n=4) animals receiving DNA vaccine cassette lacking peptide epitopes
Control 2: (n=4) animals receiving DNA vaccine cassette with HIV peptide control
Test Group 1: (n=4) animals receiving DNA vaccine with 5 sequential epitopes
Test Group 2: (n=4) animals receiving DNA vaccine with 3 sequential epitopes
Test Group 3: (n=4) animals receiving DNA vaccine with 1 epitope
Animals will be injected i.m. and boosted at 4 week intervals. Peripheral blood cells will be monitored for epitope-specific T cells as described in Example 5, with pre-immunization blood cells measured for baseline levels.
Flow cytometric analysis of collected blood cell populations will be performed as described in Aim 3b, with the exception that secondary staining will be performed using anti-mouse CD3, anti-mouse CD4, and anti-mouse CD8 fluorescent antibodies to gain T cell number and subset information.