
AU721803B2 - Metabolically engineered lactic acid bacteria and means for providing same - Google Patents


Publication number
AU721803B2
Authority
AU
Australia
Prior art keywords
ala
leu
val
lys
gly
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU37659/97A
Other versions
AU3765997A (en
Inventor
Jose Arnau
Hans Israelsen
Flemming Joergensen
Soeren Michael Madsen
Astrid Vrang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Danish Technological Institute
Original Assignee
Danish Technological Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Danish Technological Institute
Publication of AU3765997A
Application granted
Publication of AU721803B2
Anticipated expiration
Legal status: Ceased

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 Oxidoreductases (1.)
    • C12N9/0008 Oxidoreductases (1.) acting on the aldehyde or oxo group of donors (1.2)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 Oxidoreductases (1.)
    • C12N9/0006 Oxidoreductases (1.) acting on CH-OH groups as donors (1.1)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/1025 Acyltransferases (2.3)
    • C12N9/1029 Acyltransferases (2.3) transferring groups other than amino-acyl groups (2.3.1)

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)

Description

WO 98/07867 PCT/DK97/00336
METABOLICALLY ENGINEERED LACTIC ACID BACTERIA AND MEANS FOR PROVIDING SAME
FIELD OF INVENTION
The present invention pertains to the field of lactic acid bacterial starter cultures which are useful in the production of food products, animal feed or aroma compounds, and specifically there is provided means for metabolically engineering such lactic acid bacteria which are thereby modified in their production of metabolic end products including aroma or flavour compounds and/or compounds having antimicrobial effects.
TECHNICAL BACKGROUND AND PRIOR ART
Lactic acid bacteria are used extensively as starter cultures in the food industry in the manufacture of fermented products including milk products such as e.g. yoghurt and cheese, meat products, bakery products, wine and vegetable products. Lactococcus lactis is one of the most commonly used lactic acid bacteria in dairy starter cultures. However, several other lactic acid bacteria such as Leuconostoc species, Lactobacillus species and Streptococcus species are also commonly used in food starter cultures. In the art, species of the obligate anaerobic bacteria belonging to Bifidobacterium which are taxonomically different from the group of bacteria generally referred to as lactic acid bacteria, are frequently included in the group of lactic acid bacteria due to their application as dairy starter cultures. Lactic acid bacteria are also commonly used as inoculants in feedstuffs of plant and animal origin, i.a. for preservation purposes.
When a lactic acid bacterial starter culture is added to a substrate including milk or any other food or feed product starting material under appropriate conditions, the bacteria grow rapidly with concomitant conversion of lactose or other sugars to lactic acid/lactate and minor amounts of acetate, resulting in a pH decrease. In addition, several other metabolites are produced during the growth of lactic acid bacteria. Among these metabolites, diacetyl is one essential flavour compound which is formed during fermentation of the citrate-utilizing species of e.g. Lactococcus, Leuconostoc, and Lactobacillus. Diacetyl is formed by an oxidative decarboxylation (R1, Fig. 1) of α-acetolactate which is formed from two molecules of pyruvate by the action of α-acetolactate synthase (R2, Fig. 1).
Pyruvate is a key intermediate of several lactic acid bacterial metabolic pathways including the citrate metabolism and the degradation of lactose or glucose to lactate. The pool of pyruvate in the cells is critical for the flux through the pathway leading to diacetyl, acetoin and 2,3-butylene glycol because of the affinity of α-acetolactate synthase for pyruvate.
Overproduction of α-acetolactate synthase in Lactococcus lactis as an approach for increased production of diacetyl has been disclosed by Platteuw et al. 1995.
An alternative metabolic engineering approach to providing an increased pool of pyruvate in lactic acid bacteria is to block one or several pyruvate degrading pathways. As an example hereof, a Lactococcus lactis mutant defective in the lactate dehydrogenase (R3, Fig. 1) has been disclosed by Gasson et al. (ref. 8, unpublished data, in Platteuw et al. 1995). Under aerobic conditions pyruvate is accumulated in this mutant leading to the formation of increased levels of acetoin and 2,3-butylene glycol. However, formate and ethanol were the major metabolic end products obtained under anaerobic conditions, but the formation of the latter end products in high amounts is generally undesired in fermented dairy products typically being produced under anaerobic conditions.
The reaction whereby pyruvate is converted to formate and acetyl coenzyme A (acetyl CoA) (R4, Fig. 1) by the action of pyruvate formate-lyase (Pfl) takes place only under anaerobic conditions (Frey et al. 1994). An alternative pathway for the formation of acetyl CoA from pyruvate (R5, Fig. 1) in a lactic acid bacterium is by the activity of the pyruvate dehydrogenase complex (PDC). In contrast to Pfl, the activity of PDC appears to be optimal under aerobic conditions (Snoep et al. 1992).
Consequently, the pyruvate pool will presumably be increased under anaerobic conditions by partially or completely blocking the Pfl activity. As mentioned above, an increased pyruvate pool may in turn lead to an increased flux from pyruvate towards acetoin and diacetyl via the intermediate α-acetolactate. Fermented foods or feed products produced by using a starter culture with reduced Pfl activity therefore may contain an increased amount of diacetyl or other products derived from conversion of α-acetolactate. In contrast, starter cultures with increased Pfl activity should result in enhanced production of the antimicrobially active metabolite formate and the use of such cultures in the production of feed or food products having increased shelf life can therefore be contemplated.
The pfl gene has been isolated from several microorganisms including Escherichia coli, Haemophilus influenzae, Clostridium pasteurianum and Streptococcus mutans. The Pfl enzyme is posttranslationally activated by the Pfl activase via formation of an organic free radical on a glycine residue located at the C-terminal end of Pfl (Frey et al. 1994). This modification of Pfl occurs only in the absence of oxygen. Although the activation gene, act, encoding the Pfl activase, flanks the pfl gene in E. coli, H. influenzae and C. pasteurianum, the act gene is transcribed from its own promoter, and the expression is essentially constitutive (Weidner et al. 1996). In contrast, the pfl expression is induced 12 to 15 fold by anaerobiosis (Sauter and Sawers 1990). The free radical enzyme, i.e. the activated Pfl, is destroyed by oxygen with concomitant fragmentation of the polypeptide chain (ref. 2 in Kessler 1992). However, in E. coli a Pfl deactivase activity has been found which under anaerobic conditions reverts the active radical form to the native nonradical form of Pfl (Kessler et al. 1992). By this activity, Pfl deactivase protects Pfl against being irreversibly destroyed by oxygen.
The AdhE protein of E. coli has acetaldehyde dehydrogenase activity, catalyzing the conversion of acetyl CoA to acetaldehyde (R6, Fig. 1), and ethanol dehydrogenase activity, catalyzing the conversion of acetaldehyde to ethanol (R7, Fig. 1). Additionally, the E. coli AdhE protein is responsible for the Pfl deactivase activity.
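The reactions R1 to R7 named so far can be collected into a small lookup table. The following Python sketch is only an illustration of the pathway topology as described in the text (Fig. 1 itself is not reproduced here); the dictionary layout and the helper name `reachable` are hypothetical, and the sketch ignores fluxes, cofactors and the unlabelled steps towards acetoin and 2,3-butylene glycol.

```python
# Reactions R1-R7 as described in the text; labels follow Fig. 1.
# Each entry: label -> (substrate, products, enzyme or reaction type).
# The data layout is an assumption made for illustration only.
REACTIONS = {
    "R1": ("alpha-acetolactate", ["diacetyl"], "oxidative decarboxylation"),
    "R2": ("pyruvate", ["alpha-acetolactate"], "alpha-acetolactate synthase"),
    "R3": ("pyruvate", ["lactate"], "lactate dehydrogenase"),
    "R4": ("pyruvate", ["formate", "acetyl CoA"], "pyruvate formate-lyase (Pfl)"),
    "R5": ("pyruvate", ["acetyl CoA"], "pyruvate dehydrogenase complex (PDC)"),
    "R6": ("acetyl CoA", ["acetaldehyde"], "acetaldehyde dehydrogenase (AdhE)"),
    "R7": ("acetaldehyde", ["ethanol"], "ethanol dehydrogenase (AdhE)"),
}


def reachable(start, blocked=()):
    """Return the set of compounds that can still be formed from `start`
    when the reactions listed in `blocked` are removed from the map."""
    seen = set()
    frontier = [start]
    while frontier:
        compound = frontier.pop()
        for label, (substrate, products, _enzyme) in REACTIONS.items():
            if label in blocked or substrate != compound:
                continue
            for product in products:
                if product not in seen:
                    seen.add(product)
                    frontier.append(product)
    return seen


# Blocking Pfl (R4) removes formate but leaves acetyl CoA reachable via PDC (R5).
without_pfl = reachable("pyruvate", blocked=("R4",))
```

Such a purely topological view shows which end products remain possible after a block; it says nothing about how much of each is formed, which depends on the pyruvate pool and on aerobic or anaerobic conditions as discussed in the text.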
In the strict anaerobe, Clostridium acetobutylicum an adhE analogue, aad, has been cloned and characterized. However, the presence of Pfl deactivase activity could not be verified for the Aad protein, since no evidence exists for the presence of Pfl in C. acetobutylicum (Nair et al. 1994).
Lactic acid bacteria including Lactococcus lactis species are facultatively anaerobic organisms like E. coli, indicating that the occurrence of Pfl activase and deactivase activities in these organisms is to be expected. Analysis of the expression of adhE in E. coli has shown an eight fold increase under anaerobic growth (Chen and Lin 1991). The facts that the regulation of expression of pfl and adhE under anaerobic conditions is similar and that expression of act in E. coli is constitutive suggest that an equilibrium is formed between activated and deactivated Pfl under anaerobic conditions. If the deactivase activity of the AdhE protein is partially or completely blocked in lactic acid bacteria, an increased Pfl activity is expected to occur while, on the other hand, a reduced Pfl activity is expected to occur if the deactivase activity is overexpressed. If the Pfl activase is blocked, a decreased Pfl activity is contemplated.
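The expected directions of change can be illustrated with a deliberately simple two-state model. The sketch below assumes, purely for illustration, that Pfl interconverts between an inactive and an active form with first-order rate constants `k_act` (activase) and `k_deact` (deactivase); neither the model nor the numerical values come from the source, and the function name is hypothetical.

```python
def active_pfl_fraction(k_act, k_deact):
    """Steady-state fraction of Pfl in the active (radical) form for a
    two-state model, inactive <-> active, with first-order rate constants
    k_act (activase) and k_deact (deactivase). Units are arbitrary."""
    if k_act < 0 or k_deact < 0:
        raise ValueError("rate constants must be non-negative")
    total = k_act + k_deact
    if total == 0:
        raise ValueError("at least one rate constant must be positive")
    return k_act / total


# Illustrative (made-up) rate constants for the four cases discussed above.
baseline = active_pfl_fraction(1.0, 1.0)
deactivase_blocked = active_pfl_fraction(1.0, 0.0)
deactivase_overexpressed = active_pfl_fraction(1.0, 10.0)
activase_blocked = active_pfl_fraction(0.0, 1.0)
```

Under these assumptions the active fraction rises when the deactivase is blocked, and falls when the deactivase is overexpressed or the activase is blocked, which matches the qualitative expectations stated in the text.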
The acetaldehyde dehydrogenase and the ethanol dehydrogenase activities of the AdhE protein are also potential targets for metabolic engineering in lactic acid bacterial food starter cultures and cultures used in feed production or as cultures for the production of aroma compounds or antimicrobially active compounds. Thus, it can be contemplated that a block or modification of the ethanol dehydrogenase activity of such cultures may result in the overproduction of acetaldehyde which is an important flavour compound in yoghurt. Alternatively, a block of the acetaldehyde dehydrogenase activity could give rise to an increased production of acetate which in turn may result in improved preservation of fermented foods or feed products in whose production such modified cultures are used. Additionally, it is contemplated that such modifications of starter cultures would increase the pyruvate pool and consequently, the formation of diacetyl or other compounds derived from the conversion of α-acetolactate. Increasing one or both dehydrogenase activities will most likely direct the conversion of acetyl CoA from acetate to acetaldehyde or ethanol.
Based on the above analysis of the potential means of regulating the size of the pyruvate pool in lactic acid bacteria and the intracellular fluxes from this metabolic intermediate pool towards desirable end products, a novel approach has been developed for metabolically engineering lactic acid bacteria allowing the provision of useful lactic acid bacterial starter cultures either having an enhanced production of desirable flavour compounds or an increased production of antimicrobially active compounds which can be used to increase the shelf life of food or feed products.
In particular, the starting point for the invention is the achievement of the isolation and sequencing of the entire adhE and pfl genes of Lactococcus lactis. Based on these findings, it has become possible, by appropriate modifications of the genes and their expression and/or activity of one or more of the enzyme activities encoded by these genes, to provide in a goal-directed manner lactic acid bacterial starter cultures having the above desirable characteristics, including cultures of strains having reduced or enhanced production of particular metabolites.
SUMMARY OF THE INVENTION
Accordingly, the present invention provides novel means for metabolically engineering lactic acid bacteria, and lactic acid bacteria being modified by such means. Specifically, the invention relates in a first aspect to an isolated DNA sequence comprising a sequence derived from a lactic acid bacterium, said sequence coding for a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
In further aspects, the invention pertains to a recombinant replicon comprising the above DNA sequence and to a recombinant lactic acid bacterial cell comprising such a replicon.
In still further aspects, there is provided an isolated DNA sequence comprising a sequence derived from a lactic acid bacterium, said sequence coding for a polypeptide having pyruvate formate-lyase activity, subject to the limitation that the sequence is not derived from oral Streptococcus species, a recombinant replicon comprising such a DNA sequence and a recombinant lactic acid bacterial cell comprising such a replicon.
In another aspect, the invention relates to a method of producing a lactic acid bacterial metabolite, the method comprising cultivating a lactic acid bacterium comprising a DNA sequence as defined above which is modified so as to inactivate or reduce or enhance the expression of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity, or a lactic acid bacterium comprising a DNA sequence which is modified whereby its production of pyruvate formate-lyase is reduced or inhibited, or whereby the enzyme is expressed in a modified form having a reduced pyruvate formate-lyase activity, or wherein the DNA sequence is modified whereby the expression of pyruvate formate-lyase is enhanced or whereby the enzyme is expressed in a modified form having an increased pyruvate formate-lyase activity, and isolating the metabolite from the culture.
The invention also pertains to methods of producing a food product or an animal feed, the method comprising the step of admixing to the food product or feed starting materials a starter culture of a lactic acid bacterium according to the invention and keeping the mixture under conditions allowing the starter culture to be metabolically active.
There is also provided an isolated DNA sequence derived from a lactic acid bacterium, said sequence coding for a product having a formate transporter activity.
DETAILED DISCLOSURE OF THE INVENTION
The facultative anaerobe Escherichia coli is capable of carrying out mixed-acid fermentation during anaerobic growth in the absence of exogenous electron acceptors. In this connection, a major fermentation product is ethanol which is synthesized from acetyl CoA by two consecutive NADH-dependent reductions catalyzed by a single polypeptide, AdhE, with an acetaldehyde dehydrogenase (ACDH) domain and alcohol dehydrogenase (ADH) domain. It has also been found that this polypeptide is responsible for pyruvate formate-lyase deactivase activity.
It has now been found that a DNA sequence showing significant homology to the E. coli gene adhE, and coding for a polypeptide showing substantial similarity with the above multi-functional E. coli AdhE polypeptide, is present in lactic acid bacteria which are also facultative anaerobes, such as in Lactococcus lactis. It was therefore hypothesized that the gene product of the thus identified and isolated lactic acid bacterial DNA sequence might have enzymatic activities similar to those of the corresponding E. coli gene product. This was found to be the case.
Accordingly, the present invention provides, as mentioned above, in its first aspect an isolated DNA sequence which comprises a sequence derived from a lactic acid bacterium, which sequence codes for a multi-functional polypeptide having at least one of the following enzymatic activities: (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity. The coding sequence for the multifunctional polypeptide is also referred to herein as the adhE gene, and the polypeptide encoded by the gene as the AdhE polypeptide.
In accordance with the invention, the DNA sequence coding for the multi-functional polypeptide may be derived from any lactic acid bacterium. In the present context, the term "lactic acid bacterium" designates gram-positive, microaerophilic or facultatively anaerobic bacteria which ferment sugars with the production of acids including lactic acid as the predominantly produced acid, acetic acid and propionic acid. The industrially most useful lactic acid bacteria are found among Lactococcus species, Streptococcus species, Lactobacillus species, Leuconostoc species and Pediococcus species. Additionally, the strict anaerobic Bifidobacterium species, which are commonly used in the manufacture of dairy products, are included in the group of lactic acid bacteria. The group of lactic acid bacteria comprises so-called mesophilic species which have optimum growth temperatures in the range of 15-30°C and which in many cases do not grow at temperatures exceeding 35-40°C. Other groups of lactic acid bacteria have higher growth temperatures, in particular species for which humans and/or animals are the natural habitat, e.g. Enterococcus species, oral streptococci and pathogenic streptococci.
In certain preferred embodiments, the above DNA sequence is derived from Lactococcus lactis including Lactococcus lactis subspecies lactis, Lactococcus lactis subspecies diacetylactis (also frequently referred to as Lactococcus lactis subspecies lactis biovar diacetylactis) and Lactococcus lactis subspecies cremoris.
In useful embodiments of the invention, the lactic acid bacterium-derived DNA sequence codes for a multifunctional polypeptide that is at least 30% identical with the gene products of the adhE gene of E. coli (FASTA, GCG Wisconsin accession No. P17547) or the aad gene of Clostridium acetobutylicum (FASTA, GCG Wisconsin accession No. P33744) or the gene product of the sequence of Table 1.4 herein (SEQ ID NO:3). In other useful embodiments, the identity to such other gene products is at least 40%, such as at least 50%, such as at least 60% identity or even at least 70% identity. The homology between the above gene products may also be expressed in terms of amino acid similarity in which case the similarity suitably is at least 60%, such as at least 70%, e.g. at least 80% similarity.
In this context, the expression "amino acid similarity" indicates that a particular amino acid in a polypeptide sequence can be replaced by another amino acid having similar physical/chemical characteristics such as charge or polarity characteristics.
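The distinction between percent identity and percent similarity can be made concrete with a short Python sketch operating on two sequences that have already been aligned. It is not the FASTA procedure cited above: the grouping of residues in `SIMILAR_GROUPS` is a hypothetical choice based on charge and polarity, and counting only gap-free columns in the denominator is likewise an assumption.

```python
# Hypothetical grouping of amino acids (one-letter codes) by charge/polarity.
# The source does not specify which residues count as "similar".
SIMILAR_GROUPS = [
    set("KRH"),    # positively charged
    set("DE"),     # negatively charged
    set("STNQ"),   # polar, uncharged
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
]


def _similar(a, b):
    """True if two residues are identical or share one of the groups above."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, percent similarity) over the gap-free
    columns of two pre-aligned sequences of equal length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    identical = sum(x == y for x, y in pairs)
    similar = sum(_similar(x, y) for x, y in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


# Toy alignment: two identical columns and three conservative replacements.
identity, similarity = identity_and_similarity("AKDE-G", "AREDLG")
```

Because every identical pair is also counted as similar, the similarity figure can never be lower than the identity figure, which is consistent with the higher similarity thresholds given in the text.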
The sequence according to the invention which codes for the AdhE protein also includes such a coding sequence of lactic acid bacterial origin which hybridizes to the adhE coding sequence from L. lactis strain DB1341 under the following conditions: hybridization overnight at 65°C followed by washing the filters twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for minutes.
In one specific embodiment, the DNA sequence according to the invention comprises the sequence as shown herein in Table 1.4 (SEQ ID NO:3) or the sequence designated adhemgl363 as shown in the below Table 1.8 (SEQ ID NO:12) or the sequence shown in Table 1.9 (SEQ ID NOS:28/30), or a mutant or variant thereof which codes at least in part for a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
In the present context, the above term "mutant or variant" is used to designate any naturally occurring or constructed nucleotide modification of the above DNA sequence which still allows a polypeptide having at least one of the defined activities to be expressed by the thus modified sequence. Accordingly, the modification may consist in one or more nucleotide substitutions in one or more codons, resulting in the translation of the same or different amino acid(s), or the modification may be in the form of the insertion or deletion of one or more nucleotides/codons. The modifications can be provided by any conventional method including, where appropriate, modifications thereof, such as e.g. the use of restriction enzymes or random or site-directed mutagenesis, e.g. by means of transposable elements. It will be understood that the above DNA sequence according to the invention may also be provided as a synthetically produced sequence or it may be a hybrid sequence comprising in part a native sequence and in part a synthetically prepared sequence. Additionally, the above term "mutant or variant" includes any mutein of the sequence.
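That a nucleotide substitution within a codon may result in the same or in a different amino acid can be shown with a few entries of the standard genetic code. The Python sketch below uses only a partial codon table (Ala, Val, Lys, Gly and Leu codons); the function name and the restriction to these codons are choices made for illustration and are not taken from the source.

```python
# Partial standard genetic code (DNA codons); only a few amino acids are
# included to keep the example short.
CODON_TABLE = {
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "AAA": "Lys", "AAG": "Lys",
}


def classify_substitution(codon, position, new_base):
    """Replace the base at `position` (0, 1 or 2) of `codon` with `new_base`
    and report whether the encoded amino acid stays the same."""
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")
    mutated = codon[:position] + new_base + codon[position + 1:]
    if codon not in CODON_TABLE or mutated not in CODON_TABLE:
        raise ValueError("codon not covered by the partial table")
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "same amino acid" if before == after else "different amino acid"
    return mutated, before, after, kind


# Third-position change GCT -> GCC keeps Ala; second-position change
# GCT -> GTT replaces Ala with Val.
silent = classify_substitution("GCT", 2, "C")
replacement = classify_substitution("GCT", 1, "T")
```

Whether a substitution that changes the amino acid also changes any of the enzymatic activities cannot be read from the codon table and has to be determined experimentally.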
The above lactic acid bacterial DNA sequence whether in its native form or in a modified mutant or variant form may further comprise one or more sequences that regulate the expression of the coding sequence. Such regulatory sequences may be located upstream and/or downstream of the coding sequence or they can be placed on a different replicon, i.e. in trans. The regulatory sequences may be sequences which are natively associated with the coding sequence or they may be inserted or modified promoter sequences not natively associated with the coding sequence, which can be operably linked to the coding sequence.
Such sequences which are not natively associated with the coding sequence may be derived from the bacterial strain which is the source of the coding sequence or from a different organism. In this context, a regulatory sequence includes a promoter/operator sequence, a ribosome binding site, a sequence coding for a gene product which either enhances or inhibits the expression of the coding sequence, such as a repressor or activator substance including e.g. an RNA sequence including an antisense RNA, a terminator sequence or a leader sequence regulating the excretion of the above multifunctional enzyme product. A promoter which is derived from a different organism or from the same organism may, depending on the desired characteristics of the resulting bacterial cell, have a stronger or a weaker promoter activity than the promoter with which the coding sequence is natively associated.
In a useful embodiment, the coding sequence is under the control of a regulatable promoter. As used herein, the term "regulatable promoter" is used to describe a promoter sequence, the activity of which is dependent on physical or chemical factors present in the medium where organisms comprising the above coding sequence and its regulatory sequences are cultivated. Such factors include the cultivation temperature, the pH and/or the arginine content of the medium, a temperature shift eliciting the expression of heat shock genes, the composition of the growth medium including the ionic strength/NaCl content and the growth phase/growth rate of the host cell and stringent response.
A promoter sequence as defined above may further comprise sequences whereby the activity of the promoter becomes regulated. Thus, in lactic acid bacterial cultures for which it is advantageous to have a gradually decreasing activity of the coding sequence under control of the promoter sequence such further sequences may provide a regulation by a stochastic event and may e.g. be sequences, the presence of which results in a recombinational excision of the promoter or of genes coding for substances which are positively needed for the promoter function.
It has been found that in e.g. Lactococcus lactis there may be, upstream of the sequence coding for the above multifunctional polypeptide, DNA sequences coding for one or more open reading frames. Thus, such open reading frames were identified in both L. lactis strain DB1341 and strain MG1363. These open reading frames were designated orfB.
In a further aspect, the invention relates, as it is mentioned above, to a recombinant replicon comprising the above DNA sequence coding for the multifunctional polypeptide. As used herein, the term "replicon" designates a DNA sequence which is capable of autonomous replication in a lactic acid bacterium.
Such a replicon can be selected from a plasmid capable of replicating in a lactic acid bacterium, a lactic acid bacterial chromosome and a bacteriophage derived from a lactic acid bacterium.
The replicon may comprise further sequences including marker sequences and linker sequences for the insertion of genes coding for desirable gene products. Thus, in useful embodiments, the replicon may comprise a gene coding for a lipase, a peptidase, a gene coding for a gene product involved in carbohydrate or citrate metabolism, a gene coding for a gene product involved in bacteriophage resistance or a gene coding for a lytic enzyme or a gene coding for a bacteriocin such as e.g. nisin or pediocin. The gene may also be one which codes for a gene product conferring resistance to an antibiotic.
The gene coding for a desired gene product may be a homologous gene, i.e. a gene isolated from the same species as the host cell for the replicon, or a heterologous gene including a gene isolated from a lactic acid bacterial species which is of a species different from the host cell.
The invention also provides a recombinant lactic acid bacterial cell comprising the above replicon. Such a host cell may be derived from any species of lactic acid bacteria as defined herein, such as a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
The above lactic acid bacterial cell is useful in starter culture compositions for the manufacturing of food products including dairy products, meat products, wine, vegetables and bakery products, or in the preservation of animal feed. In the latter context, the present recombinant lactic acid bacterial cells are particularly useful as inoculants in field crops which are to be ensiled or as preserving agents in feedstuff components of animal origin such as waste products from the slaughtering and fish processing industries.
When the cells are to be used for these purposes they are conveniently provided in the form of freeze-dried or frozen concentrates typically containing 10^9 to 10^12 colony forming units (CFUs) per g of concentrate. Such concentrates may be provided as starter culture compositions comprising further suitable components such as e.g. preserving agents, stabilizing agents, cryoprotectants, nutrients, bacterial growth factors or further active components including enzymes.
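The practical meaning of this concentration range can be shown with a simple calculation. The Python sketch below computes how much concentrate would supply a given number of CFUs; the target of 10^10 CFUs is a made-up figure chosen only for the arithmetic, and the function name is hypothetical.

```python
def concentrate_mass_g(target_cfu, cfu_per_g):
    """Grams of concentrate needed to supply `target_cfu` colony forming
    units, given a concentrate containing `cfu_per_g` CFUs per gram."""
    if target_cfu < 0 or cfu_per_g <= 0:
        raise ValueError("target must be >= 0 and concentration must be > 0")
    return target_cfu / cfu_per_g


# Hypothetical target of 1e10 CFUs at both ends of the stated range.
mass_low_conc = concentrate_mass_g(1e10, 1e9)    # 10 g at 10^9 CFU/g
mass_high_conc = concentrate_mass_g(1e10, 1e12)  # 0.01 g at 10^12 CFU/g
```

Across the stated range the required mass therefore differs by a factor of one thousand for the same number of viable cells.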
An interesting use of the above lactic acid bacterial cell is in the manufacturing of a probiotically active composition. In the present context, the term "probiotically active" indicates that the bacteria selected for this purpose have characteristics which enable them to colonize the gastrointestinal tract and thereby exert a beneficial regulatory effect on the microbial flora in this habitat. Such an effect may be recognizable as an improved food or feed conversion in humans or animals to which the cells are administered, or as an increased resistance against invading pathogenic microorganisms.
The above lactic acid bacterial cell can also be provided in the form of a culture for the production of an aroma or antimicrobially active compound.
In a particularly useful embodiment, the above lactic acid bacterial cell is one wherein the DNA sequence comprising the sequence coding for the multifunctional polypeptide is modified so as to inactivate or reduce the production of or the activity of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
Such a modification can be made by methods which are known per se in the art. Thus, as typical examples, a DNA modification can be in the form of deletion, insertion or substitution of one or more nucleotides in the coding sequence possibly leading to the translation of a polypeptide having a modified amino acid composition. Such a modified polypeptide may have lost one or more of the above enzymatic activities or it/they may be reduced. An inactivation of the coding sequence may also be obtained by random or site-directed mutagenesis, e.g. using a transposable element which is integratable in the replicon comprising the coding sequence. Another useful means of providing inactivated mutants is Campbell-like homologous integration as it is described in the below examples.
The level of production of the multi-functional polypeptide can also be reduced by modifying or regulating regulatory sequences controlling the expression of the gene coding for the polypeptide. Thus, as one example, a native constitutive promoter can be replaced by a regulatable promoter, the function of which can be reduced or inhibited under appropriate conditions such as those physical and chemical promoter regulating factors as mentioned above. Alternatively, a native promoter which is in itself regulatable by certain factors may be replaced by another regulatable promoter which is negatively regulatable by other factors present in the cultivation medium for the recombinant cell.
Generally, the term "metabolic engineering" in relation to lactic acid bacteria covers manipulations of the bacteria themselves or of the conditions under which they are cultivated whereby the production of metabolites from the fermentation of sugars or citrate is modulated quantitatively or qualitatively.
Accordingly, a lactic acid bacterial cell which is modified as described above in one or more of its glycolytic pathways can be characterized as a metabolically engineered cell. Dependent on the type and the site of the DNA modification such a cell will be at least partially blocked in one or more of the above pathways catalyzed by the multi-functional polypeptide (R6/R7 in Fig. 1) and/or the pyruvate formate-lyase deactivase activity will be reduced or blocked. Accordingly, such a metabolically engineered cell may as a result of these modifications produce increased amounts of i.a. acetaldehyde, ethanol and/or acetate.
In a further useful embodiment, the above lactic acid bacterial cell is one wherein the DNA sequence comprising the sequence coding for the multi-functional polypeptide is modified so as to enhance the production of or the activity of at least one of its native enzymatic activities as defined above. It is contemplated that such a modification can be provided by appropriate modifications of the coding sequence itself which result in an enhanced production level of the polypeptide and/or the production of a modified polypeptide having an enhanced activity of at least one of its native activities. Such modification can be made by substitution, deletion or insertion of one or more nucleotides using any conventional methods for such DNA modifications, including random or site-directed mutagenesis followed by selection of the desired mutants.
Alternatively, a lactic acid bacterial cell having enhanced production of and/or enhanced activity of at least one of its native enzymatic activities can be provided by suitable modifications of sequences regulating the production and/or the activity of the multifunctional polypeptide. One suitable manner whereby this can be obtained is by operably linking the coding sequence to a promoter sequence having a stronger promoter activity than the native promoter for the coding sequence.
In suitable embodiments such an inserted promoter is regulatable by a factor as mentioned above and the expression of the polypeptide can then be enhanced by cultivating the cell in the presence of a factor which mediates a strong promoter activity. It is contemplated that an enhanced production of the AdhE polypeptide in a host cell can be obtained by using a replicon which occurs in a high copy number in that host cell.
The aim is that such a metabolically engineered lactic acid bacterial cell, having enhanced production of and/or enhanced activity of at least one of its native enzymatic activities, produces increased amounts of at least one metabolite selected from the group consisting of acetaldehyde, ethanol, formate, acetate, acetoin, diacetyl and 2,3 butylene glycol. Thus, in preferred embodiments, such metabolically engineered cells have a production of one or more of these metabolites which, in comparison with a wild type strain, is at least 2-fold higher such as at least 5-fold higher, e.g. at least 10-fold higher or even at least 20-fold higher.
The present invention relates in a still further aspect to an isolated lactic acid bacterial DNA sequence that comprises a sequence coding for a polypeptide having pyruvate formate-lyase activity, i.e. a pfl gene. In useful embodiments, such a DNA sequence further comprises at least one regulatory sequence operably linked to the coding sequence and regulating the production of the pyruvate formate-lyase polypeptide or coding for a gene product regulating the pyruvate formate-lyase activity of the polypeptide. In the following, the gene product of pfl will also be referred to as a Pfl polypeptide.
Such regulatory sequences may be located upstream and/or downstream of the coding sequence. The regulatory sequences may be sequences which are natively associated with the coding sequence or they may be inserted or modified promoter sequences not natively associated with the coding sequence, but which can be operably linked to the coding sequence. Such sequences which are not natively associated with the coding sequence may be derived from the bacterial strain which is the source of the coding sequence or from a different organism. In this context, regulatory sequences include a promoter sequence, a ribosome binding site, a sequence coding for a gene product which either enhances or inhibits the expression of the coding sequence, such as a repressor or activator substance including e.g. an antisense RNA, a transcription terminator sequence or a leader sequence directing the excretion of the Pfl polypeptide. In a useful embodiment, the coding sequence is under the control of a regulatable promoter as defined hereinbefore and being regulatable as also described above.
The activity of the pyruvate formate-lyase enzyme can be regulated or modulated under anaerobic conditions by the presence or absence of an activase and a deactivase, respectively.
Accordingly, the DNA sequence comprising the sequence coding for the Pfl polypeptide preferably comprises sequences coding for a pyruvate formate-lyase activase (act gene) and/or a pyruvate formate-lyase deactivase. In preferred embodiments, such a deactivase is a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity as defined hereinbefore.
In accordance with the invention, the Pfl-encoding DNA sequence can be derived from any lactic acid bacterium including a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species, a Leuconostoc species and a Lactococcus species such as Lactococcus lactis including Lactococcus lactis subspecies lactis, Lactococcus lactis subspecies lactis biovar diacetylactis and Lactococcus lactis subspecies cremoris.
It has been found that the Pfl polypeptide as encoded by the pfl gene of Lactococcus lactis subspecies lactis biovar diacetylactis strain DB1341 comprises 787 amino acids (Table 3.2 below) (SEQ ID NO:15) and has a deduced molecular weight of 89.1 kDa. This polypeptide shows considerable identity with known pfl gene products (Table Furthermore, it has been found that the corresponding pfl gene in Lactococcus lactis subspecies lactis MG1363 differs from the DB1341 gene in only about 5% of the nucleotides.
In specific embodiments, the DNA sequence comprising a Pfl encoding sequence comprises the coding sequence as shown in Table 3.2 below (SEQ ID NO:15), the sequence designated mg1363pfl as shown in Table 3.6 (SEQ ID NO:22) and the sequence shown in Table 5.3 (SEQ ID NOS:36 and 38), or a DNA sequence which is a mutant or variant hereof which codes for a polypeptide having pyruvate formate-lyase activity, the term "mutant or variant" being used in the same manner as defined hereinbefore.
In accordance with the invention, a pfl gene as defined herein encompasses any of the specific sequences as exemplified in the following and a lactic acid bacterial sequence coding for a polypeptide having the enzymatic activity of the gene products of such isolated sequences which has a DNA homology of at least with the coding sequence of the pfl of L. lactis strains DB1341 or MG1363 such as at least 60% homology including at least 70% homology or at least 80% homology, e.g. at least homology.
In useful embodiments of the invention, the lactic acid bacterium-derived DNA sequence codes for a Pfl protein that is at least 30% identical with the gene products of the pfl gene of Streptococcus mutans (FASTA, GCG Wisconsin, Accession No. D50491) or the pfl gene of Hemophilus influenzae (FASTA, GCG Wisconsin, Accession Nos. U32812 and L42023) or the gene product of the sequence of Table 3.2 herein (SEQ ID NO:15). In other useful embodiments, the identity to such gene products is at least 40%, such as at least 50%, such as at least 60% identity or even at least 70% identity. The homology between the above gene products may also be expressed in terms of amino acid similarity in which case the similarity suitably is at least 60%, such as at least 70%, e.g. at least 80% similarity.
In accordance with the invention, the DNA sequence coding for the Pfl polypeptide may also be a coding sequence of lactic acid bacterial origin that hybridizes to the pfl encoding sequence isolated from L. lactis strain MG1363, under the following conditions: hybridization overnight at 65°C followed by washing the filter twice in 5 x SSC at room temperature for minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for minutes.
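The salt concentrations implied by the two wash buffers can be worked out from the composition of SSC. The following is a minimal sketch only: it assumes the conventional definition of 1 x SSC (0.15 M NaCl, 0.015 M trisodium citrate), which the text itself does not state, and the function name is purely illustrative.

```python
# Assumption (not stated in the text): 1 x SSC = 0.15 M NaCl + 0.015 M trisodium citrate.
SSC_1X = {"NaCl": 0.15, "sodium citrate": 0.015}


def ssc_molarity(fold):
    """Return the molar concentration of each SSC component at the given strength."""
    return {name: round(conc * fold, 4) for name, conc in SSC_1X.items()}


# The two wash strengths named in the hybridization conditions above.
for fold in (5, 3):
    print(f"{fold} x SSC:", ssc_molarity(fold))
```

Lower SSC strength means lower ionic strength, so the 3 x SSC wash at 65°C is the more stringent of the two steps.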
It was found that e.g. in L. lactis open reading frames may be identified upstream of the coding region for the Pfl polypeptide. Such an open reading frame was designated orfA and it was found that its gene product has a function in the transport of formate across cell membranes. Thus, it was found that a mutant strain of L. lactis wherein the open reading frame had been disrupted showed an increased tolerance to the toxic formate analogue, hypophosphite.
In accordance with the invention there is also provided herein a recombinant replicon comprising the above Pfl-encoding DNA sequence. Such a replicon can be derived from a plasmid, a lactic acid bacterial bacteriophage or a lactic acid bacterial chromosome.
In one aspect the invention relates to a recombinant lactic acid bacterial host cell comprising such a replicon. The cell can be selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
The lactic acid bacterial cell may conveniently be provided in the form of a starter culture composition for use in the manufacturing of food products as described above. It is also contemplated that the above cells may be used as probiotically active cultures or as inoculants in animal feed preservation.
In this connection, a particular use is as inoculants in field crops or animal waste materials which are subjected to an ensiling process.
In particularly useful embodiments, the above lactic acid bacterial cell is one wherein the DNA sequence coding for pyruvate formate-lyase activity is modified whereby the production of the pyruvate formate-lyase is reduced or eliminated or whereby the enzyme is produced in a modified form having a reduced pyruvate formate-lyase activity.
Such a modification can, as it has been described above for a cell comprising a sequence coding for the AdhE polypeptide, be made by methods which are known per se in the art. Thus, as typical examples, a DNA modification can e.g. be made by deletion, insertion or substitution of one or more nucleotides in the coding sequence possibly leading to the expression of a polypeptide having a modified amino acid composition. An inactivation of the coding sequence can also be obtained by random or site-directed mutagenesis, e.g. by using a transposable element which is integratable in the replicon comprising the coding sequence. Another possible means of providing Pfl-inactivated (pfl-) mutants is Campbell-like homologous integration.
The level of expression of the Pfl polypeptide can also be reduced by modifying or regulating regulatory sequences controlling the production of the polypeptide. Thus, as one example, a native constitutive promoter can be replaced by a regulatable promoter, the function of which can be reduced or inhibited under appropriate conditions such as those physical and chemical promoter regulating factors as mentioned hereinbefore. Alternatively, a native promoter which is in itself regulatable by certain factors may be replaced by another regulatable promoter which is negatively regulatable by other factors present in the cultivation medium for the recombinant cell.
A cell being modified in this manner will be a metabolically engineered cell, since under conditions where the pyruvate formate-lyase is normally metabolically active as shown in Fig. 1 such a modified cell will lack one of the major pathways whereby the pyruvate pool is normally consumed. This will result in a modification of the metabolic pathways based on pyruvate including an enhanced flux towards α-acetolactate which is a precursor substance for diacetyl, acetoin and 2,3 butylene glycol. Such a cell is particularly useful in dairy starter cultures where such flavour compounds are generally desirable.
In further useful embodiments, the lactic acid bacterial cell according to the invention is a cell wherein the DNA sequence comprising the sequence coding for pyruvate formate-lyase is modified so that the production of the pyruvate formate-lyase is enhanced or so that the enzyme is produced in a modified form having an increased pyruvate formate-lyase activity.
Analogously with what is described above with respect to the modifications leading to an enhanced expression or activity of the AdhE polypeptide, it is contemplated that such a modification can be provided by appropriate modifications of the coding sequence itself which result in an enhanced production level of the Pfl polypeptide and/or the production of a modified polypeptide having an enhanced activity of at least one of its native activities. Such modifications can be made by substitution, deletion or insertion of one or more nucleotides using any conventional methods for such DNA modifications, including random or site-directed mutagenesis followed by selection of the desired mutants.
Alternatively, a lactic acid bacterial cell having enhanced production of and/or enhanced activity of pyruvate formate-lyase can be provided by suitable modifications of sequences regulating the expression of the pfl gene and/or the activity of the enzyme. One suitable manner whereby this can be obtained is by operably linking the coding sequence to a promoter sequence having a stronger promoter activity than the native promoter for the coding sequence. In suitable embodiments such an inserted promoter is regulatable by a factor as mentioned above and the production of the polypeptide can then be enhanced by cultivating the cell in the presence of a factor which confers a strong promoter activity. It is contemplated that a thus modified lactic acid bacterial cell produces increased amounts of formate and/or acetate. Enhanced production of the Pfl polypeptide may also be obtained in a host by using a replicon which occurs in a high copy number in that host cell or by chromosomal amplification.
In accordance with the invention, there is also provided a recombinant lactic acid bacterial cell comprising both the DNA sequence comprising the above sequence coding for an AdhE polypeptide, and the above sequence comprising a sequence coding for pyruvate formate-lyase, in both instances including sequences regulating the production and/or the activity of the enzyme activities. As used herein, the term "recombinant" implies that at least one of the coding sequences or regulatory sequences is not a naturally occurring sequence. The sequences may be located on the same replicon or they may be on separate replicons.
Preferably, at least one of the sequences of the above cell is modified so as to modify the production of the pyruvate formate-lyase or the activity hereof, or the distribution of the amounts of end products resulting from the lactose and/or citrate metabolism of the cell.
It will be understood that a lactic acid bacterium which is metabolically engineered in accordance with the invention so that it has an enhanced production of one or more metabolites is useful in a method of producing such a metabolite or such metabolites. In general, such a method comprises cultivating a lactic acid bacterium which is metabolically engineered in accordance with the invention under conditions where the metabolite is produced, and isolating the metabolite from the culture. The isolation of the metabolite may be carried out according to any conventional methods of recovering the particular substance, such as e.g. distillation.
As it is also mentioned above, the lactic acid bacterial cells according to the invention are useful as food starter cultures.
In accordance herewith, the invention also provides a method of producing a food product, the method comprising the step of admixing to the food product starting materials a starter culture of a lactic acid bacterium as defined above and keeping the mixture under conditions allowing the starter culture to be metabolically active. Such a method where a starter culture which is metabolically engineered in accordance with the invention is used will, dependent on the type of metabolite modifications, result in a food product having an improved flavour and/or a product which has an improved shelf life due to an enhanced production of antimicrobially active metabolites by the starter culture.
The invention will now be further illustrated in the below examples and the drawing wherein:

Fig. 1 illustrates selected metabolic pathways in citrate fermenting lactic acid bacteria;

Fig. 2 shows an overview of the cloned L. lactis DB1341 adhE gene (open arrow), the sequence strategy for clone 1 (box in middle) and the regions covered by the λZAP clones adhEl and adhE3 (bottom). The nucleotide position of relevant restriction sites is shown (top). The position of PCR and sequencing primers is shown as small open arrows. A putative transcription terminator present downstream of the stop codon is shown as a circle. The rbs box shows the position of a consensus lactococcal ribosome binding site. Arrows show the sequencing strategy for clone 1 (middle);

Fig. 3 shows an overview of the cloned L. lactis DB1341 adhE gene fragment (open arrow). The nucleotide position of relevant restriction sites is shown (top). The position of PCR and sequencing primers is shown as small open arrows. A putative transcription terminator present downstream of the stop codon is shown as a circle. The rbs box shows the position of a consensus lactococcal ribosome binding site. The cloned PCR fragments of the L. lactis MG1363 adhE gene are shown as lines (MGadhESTART and MGadhESTOP). The PCR fragments used to clone into pSMA500 for gene inactivation in strain DB1341 are shown as open boxes (pSMAKAS4 and

Fig. 4 is an overview of the cloned Lactococcus lactis DB1341 (strain lactis subspecies lactis biovar diacetylactis) pfl gene (open arrow box). The nucleotide positions of relevant restriction sites are shown (top). The position of PCR and sequencing primers is shown as small open arrows. A putative ribosome binding site (rbs box) and a transcription terminator present downstream of the stop codon is shown as a circle. The plfl (open box) shows the fragment of the λZAP clone of the DB1341 genomic library containing a pfl gene fragment. The cloned PCR fragment of the L. lactis subspecies lactis MG1363 pfl fragment is shown as a line (MGpfll). A Sau3AI fragment used for gene inactivation in strain DB1341 is shown as an open box (pSMAKAS7). The pfl region included in the fragment as obtained by inverse PCR from DB1341 using EcoRI digestion and primers pfll-250 and pfll-390 is shown as a dotted box (pflup-1);

Fig. 5 is a genetic map of the L. lactis MG1363 adhE locus including the orfB open reading frame. In the upper part are indicated primer sequences;

Fig. 6 illustrates the structure of the L. lactis OrfA protein. The shadowed box at the terminal region of OrfA depicts the area covered by the internal orfA fragment used for gene inactivation. The two transmembrane regions were identified using the PredictProtein server at the EMBL, Heidelberg, Germany;

Fig. 7 illustrates expression of orfA in L. lactis. A: genetic map of orfA showing the region covered by the probe (thick line below orfA) used in expression studies and in the construction of a null mutant strain. B: Northern blot analysis. RNA isolated from MG1363 was hybridized to the orfA probe. Lane 1: exponential culture in GM17, aerobic; lane 2: same, anaerobic; lane 3: stationary culture in GM17, aerobic; lane 4: same, anaerobic; lane 5: exponential culture in GalM17, aerobic; lane 6: same, anaerobic. The transcript size is shown in kb to the left. The autoradiogram was exposed for 14 days;

Fig. 8 illustrates inhibition of growth by hypophosphite in strains of L. lactis. Strains were grown anaerobically overnight in GM17 supplemented with different concentrations of hypophosphite. At the end of the incubation period (about 18 hours), OD 600 was measured. Symbols: MG1363; MG1363ΔorfA; MG1363

Fig. 9 shows a genetic map of the L. lactis MG1363 pfl gene, showing the region used as a probe in the identification of pfl homologues in other lactic acid bacteria, including the position of EcoRI sites;

Fig. 10 shows autoradiograms from Southern hybridization of genomic DNA from non-Lactococcus lactic acid bacteria to a L. lactis pfl probe. Lane 1: L. lactis MG1363; lane 2: Streptococcus thermophilus; lane 3: Leuconostoc mesenteroides; lane 4: Lactobacillus acidophilus. Bands are shown in kb. Filters were exposed 2 h or overnight;

Fig. 11 illustrates two Sau3AI fragments including most of the L. lactis strain DB1341 adhE coding sequence used in Southern hybridization experiments with EcoRI-digested genomic DNA from non-Lactococcus lactic acid bacteria;

Fig. 12 illustrates detection of adhE homologues in other lactic acid bacteria by Southern hybridization experiments with EcoRI-digested genomic DNA from non-Lactococcus lactic acid bacteria. Lane 1: L. lactis MG1363; lane 2: S. thermophilus; lane 3: L. mesenteroides; lane 4: L. acidophilus. Bands are shown in kb. Filters were exposed overnight;

EXAMPLE 1

Cloning of the L. lactis adhE gene

1. Construction of a L. lactis ssp. lactis biovar diacetylactis DB1341 genomic library for genetic complementation

A genomic library was constructed by cloning partially Sau3AI-digested chromosomal DNA from strain DB1341 into BamHI-digested pSMA500 (Madsen et al. 1996) and transforming into E. coli MC1000 by electroporation (Sambrook et al., 1989). Strain DB1341 was kindly provided by Chr. Hansen A/S, Horsholm, Denmark. The genomic library consisted of about 10,000 independent recombinant clones with an average insert size of 4 kb. A mixed culture, containing all clones obtained, was grown in LB erythromycin (erm, 50 µg/ml) and plasmid DNA was isolated for genetic complementation.
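As a rough check on how completely a library of this size might represent the genome, the stated figures (about 10,000 clones, 4 kb average insert) can be put into the usual random-library formula P = 1 - (1 - f)^N, where f is the insert size divided by the genome size. The sketch below is illustrative only: the genome size of 2.5 Mb is an assumption that does not come from the text, the formula treats inserts as random and independent, and the function names are hypothetical.

```python
import math

# Figures stated in the text.
N_CLONES = 10_000
INSERT_BP = 4_000
# Assumption: approximate genome size; not given in the text.
GENOME_BP = 2_500_000


def library_coverage(n_clones, insert_bp, genome_bp):
    """Probability that a given locus is present in at least one clone."""
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


def clones_needed(probability, insert_bp, genome_bp):
    """Number of clones required to reach the requested probability."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


fold = N_CLONES * INSERT_BP / GENOME_BP
print(f"fold coverage: {fold:.0f}x")
print(f"P(locus represented): {library_coverage(N_CLONES, INSERT_BP, GENOME_BP):.6f}")
print(f"clones for 99%: {clones_needed(0.99, INSERT_BP, GENOME_BP)}")
```

Under these assumptions the library would be many-fold redundant, although a partial Sau3AI digest is not truly random, so real representation may be somewhat lower.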
2. Genetic complementation in E. coli NZN111 using the pSMA500 library

E. coli strain NZN111 (pfl-; ldh::Tn5; kanR) is unable to grow in the absence of O2 due to the accumulation of NADH derived from the lack of fermentative enzyme activities encoded by the pfl and ldh genes (Mat-Jan et al., 1989).
Genetic complementation was attempted by transformation of NZN111 using 200 ng plasmid DNA from the library (see above).
Transformation mixtures were plated on LB erm (50 µg/ml) kanamycin (kan; 50 µg/ml) and incubated at 37°C in anaerobic jars. As a control, pSMA500-transformed strain NZN111 was used.
After two days, transformation plates were incubated aerobically for another two days to allow weak complementing clones to grow. A clone was identified (clone 1) in the library-transformed plates, and no growth was observed in the pSMA500 control.
In a preliminary screening, protein extracts of clone 1 were used in a modified "Ldh" assay (Crow and Pritchard 1977), where the pyruvate-dependent conversion of NADH to NAD is monitored, to ensure that complementation of the fermentative defects in strain NZN111 had occurred. Protein extraction was carried out by adding 100 µl 100 mM MOPS buffer (pH 2 Triton X-100 to the cell pellet from 1.5 ml stationary cultures grown in LB erm (50 µg/ml) which had been washed in fresh ice cold LB, and frozen at -80°C for 15 min. Pellets were dissolved and transferred to Eppendorf tubes. Lysozyme (5 mg) was added and samples were incubated on ice for 30 min. Subsequently, glass beads (100 µm, Sigma; 100 µl) were added and samples were vortexed for 30 sec and kept on ice for 30 sec. This step was repeated 10-15 times, and samples were centrifuged at maximum speed for 2 min. Supernatants were transferred to a new Eppendorf tube and kept at -80°C until assayed. To measure NADH oxidation, the following components were mixed in a quartz cuvette: 700 µl 100 mM MOPS, pH 6.5; 100 µl 120 mM Na-Pyruvate; µl 2.56 mM NADH and 50 µl H2O. The decrease in OD 340 as a result of the oxidation of NADH to NAD was monitored after the addition of 100 µl sample. As control reaction, pyruvate was omitted. No significant decrease in OD was observed in the control. A relatively high conversion rate (approximately 2-fold as compared to the NZN111::pSMA500 control) was observed in clone 1.
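The conversion from a measured decrease in OD 340 to a rate of NADH oxidation follows the Beer-Lambert law. The sketch below shows one way such a calculation could be done; it is not taken from the source. It assumes the commonly used molar extinction coefficient of NADH at 340 nm (6.22 mM^-1 cm^-1) and a 1 cm light path, and the slope values are hypothetical numbers chosen only to illustrate a 2-fold difference.

```python
# Assumption: widely used extinction coefficient of NADH at 340 nm.
NADH_EXT_COEFF = 6.22  # mM^-1 cm^-1


def nadh_oxidation_rate(delta_a340_per_min, path_cm=1.0):
    """NADH oxidation rate in the cuvette (mM per minute) from the absorbance slope."""
    return delta_a340_per_min / (NADH_EXT_COEFF * path_cm)


def relative_activity(sample_slope, control_slope):
    """Fold difference between the sample and the control absorbance slopes."""
    return sample_slope / control_slope


# Hypothetical slopes (absorbance units per minute); not data from the text.
clone_slope = 0.10
control_slope = 0.05

print(f"clone rate:   {nadh_oxidation_rate(clone_slope):.4f} mM/min")
print(f"control rate: {nadh_oxidation_rate(control_slope):.4f} mM/min")
print(f"fold difference: {relative_activity(clone_slope, control_slope):.1f}")
```

Because both rates are divided by the same coefficient and path length, the fold difference can be read directly from the ratio of the two slopes.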
Plasmid DNA was isolated from clone 1 and used to retransform E. coli NZN111. Duplicate LB erm plates were incubated (i) aerobically for 4 days or (ii) anaerobically for 2 days and then 2 days aerobically at 37°C. A similar number of transformants was obtained in both procedures (see Table 1.1 below). Thus, clone 1 did not result from artifact cloning and can indeed complement the defect in strain NZN111.
Table 1.1. Retransformation of clone 1 into E. coli NZN111

             No. of colonies per 10 ng DNA
Plasmid      anaerobic growth    aerobic growth
clone 1      600                 800
pSMA500      0                   1000

NZN111 competent cells were electroporated with the corresponding plasmid, and one half of the cell mixture was plated onto LB kan erm and incubated without O2 (anaerobic growth), and the other half was plated onto the same medium and incubated with O2 (aerobic growth). Transformants were scored after 4 days (see main text).
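The counts in Table 1.1 can be restated as transformants per µg of DNA and as the fraction of transformants able to grow anaerobically. The following short sketch uses only the four values from the table; the variable and function names are illustrative.

```python
# Colonies per 10 ng DNA from Table 1.1: (anaerobic growth, aerobic growth).
TABLE_1_1 = {"clone 1": (600, 800), "pSMA500": (0, 1000)}


def per_microgram(colonies_per_10_ng):
    """Scale a count obtained with 10 ng DNA to transformants per microgram."""
    return colonies_per_10_ng * 100


def anaerobic_fraction(anaerobic, aerobic):
    """Fraction of transformants that also grow without O2."""
    return anaerobic / aerobic


for plasmid, (anaerobic, aerobic) in TABLE_1_1.items():
    print(
        f"{plasmid}: {per_microgram(anaerobic)} anaerobic and "
        f"{per_microgram(aerobic)} aerobic transformants per µg; "
        f"anaerobic fraction {anaerobic_fraction(anaerobic, aerobic):.2f}"
    )
```

The contrast between the two plasmids (0.75 against 0) is what supports the conclusion that the insert in clone 1, and not the vector, restores anaerobic growth.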
A sample of clone 1 in E. coli was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession No. DSM 11093.
3. Sequence analysis of clone 1 and identification of an adhE fragment

Clone 1 was further characterized by restriction enzyme analysis and included a 2.2 kb insert. Sequence analysis determined that it included a 1.7 kb fragment of an open reading frame (ORF) showing homology to the E. coli adhE gene disclosed by Goodlove et al., 1989. The sequence of the 2.2 kb insert is shown in Table 1.2 below (SEQ ID NO:1).
Table 1.2. Sequence of the insert in clone 1

1 51 101 151 201 251 301 351 401 451 501 551 601 651 701 751 801 851 901 Sau3AI
GATCTGTCCTTAGTACGAGAGGACCGGGATGGACTTACCGCTGGTGTACC
AGTTGTTCCGCCAGAGCACGGCTGGATAGCTATGTAGGGAAGGGATAAGC
GCTGAAAGCATCTAAGTGCGAAGCCACCTCAAGATgAGA
ETACCCATTCG
Sau3AI &G6TTAA CA GG GA AGTTATATGAAA TCTTCTTTCAGCAACGGGA GAG TTGCTCGA GTGGGAT Sau3AI TAACAGAAAGTGATCTGTTGATCGC7XAGCCCTCTCGGTGTACTTGCTG
GTATCGTTCCAACGACTAATCCCATCAACAGCAATCTTTAAATCTTTA
TTGACTGCAAAAACACGTAATGCTATTG7ITTCGCT=TCCACCCTCAAGC
TCAAAAATGTTCAAGCCATGCAGCAAAAATTGTTTACGATGCTGCAATTG
AAGCTGGTGCACCGGAAGACTTTATTCAATGGAITGAAGTACCAAGCCTr GACATGACTACCGCC'rrGATTCAAAACCGTGGACTTGCAACAATCCT'rGC
AACTGGTGGCCCAGGAATGGTAAACGCCGCACTCAAATCTGGTAACCCTT
CACTCGGTGTTGGAGCTGGTAATGGTGCTG'TrATGTTGATGCAACTGCA
AATATTGAACGTGCCGTTGAAGACCTTTTGCTTTCAAAACGTTTTGATA
TGGGATGATTTGTGCCACTGAAAATTCAGCTGTTAT-GATGC'prCATT
SD
ATGATGAATrrATTGCTAAAATGCA4GAACAAGGCGCTTATATGGTTCCT M V P
AAAAAAGACTACAAAGCTATTGAAAGTTTCGTTTTTGTTGAACGTGCTGG
K K D Y K A I E S F V F V E RA G
TGAAGGTTTTGGAGTAACTGGTCCTGTTGCCGGTCGTTCTGGTCAATGGA
E G F G V T G P V A G R S G Q W I
TTCGAAGTGGCAGTCTAGTAGTTCTT
A E Q A G V K V P K D K D V L L
951 TrGAACTTGATAAGAAAAATATTGGTGAAGCAC'1TTCTTCTGAAAAACT
F E L D K K N I G E A L S S E K L
1001 TTCTCCTTTGCTTCAATCTACAAAGCTGAAAr-ACGTGAAGAAGGAATTG
S P L L S I Y K A E T R E E G I E 87
1051 AGATTGTACGTAGCTTACTTGCTTATCAAGGTGCTGGACATAATGCTGCA
I V R S L L A Y Q G A G H N A A 103
Sau3AI
1101 ATTCAAATCGGTGCAATGGATGATCCATTCGTTrAAAGAATATGGCGAAAA
I Q I G A M D D P F V K E Y G E K 120
1151 AGTTGAAGCTTCTCGTATCCTCGTTAACCAACCAGATTCTAITGGTGGGG
V E A S R I L V N Q P D S I G G V 137
1201 TCGGAGATATCTATACTGATGCAATGCGTCCATCACTTACACTTGGAACT
G D I Y T D A M R P S L T L G T 153
Sau3AI
1251 GGTTCATGGGGGAAAAATTCACTTTCACACAA TrGAGTACATACGATCT
G S W G K N S L S H N L S T Y D L 170
1301 ATTGAATGTTAAAACAGTGGCTAAACGTCGTAATCGCCCACAATGGGTTC
L N V K T V A K R R N R P Q W V R 187
1351 GTTTGCCAAAAGAAATTTACTACGAAAAAATGCAATTTCTTACTTACAA
L P K E I Y Y E K N A I S Y L Q 203
1401 GAATTGCCACACGTCCACAAJAGCTTCATCGTTGCTGACCCTGGTATGGT
E L P H V H K A F I V A D P G M V 220
1451 TAAATTTGGTTTCGTTGATAAAGTTTTGGAACAACTTGCTATCCGCCA
K F G F V D K V L E Q L A I R P T 237
1501 CTCAAGTTGAAACAAGCATTATGGCTCTGTTCAACCTGACCCAACTTrG
Q V E T S I Y G S V Q P D P T L 253
1551 AGCGAAGCAATGCAATCGCTCGTCAAATGAAACAATTTGAACCTGACAC
S E A I A I A R Q M K Q F E P D T 270
1601 TGTCATCTGTCTTGGTGGTGGTTCTGCTCTCGATGCCGGTAAGATTGGTC
V I C L G G G S A L D A G K I G R 287
1651 GTTTGATTATGAATATGATGCTCGTGGTGAAGCTGACCTTTCTGATGAT
L I Y E Y D A R G E A D L S D D 303
1701 GCAAGTrTGAAAGAACTTTTCCAAGAATTAGCTCAAAAATTTGTCGATAT
A S L K E L F Q E L A Q K F V D I 320
1751 TCTACTTATATCACACAAAACCATGT
R K R I I K F Y H P H K A Q M V A 337
1801 CAATTCCTACTACTTCTGGTACTGGTTCTGAAGTGACTCCATTTGCAGTT
I P T T S G T G S E V T P F A V 353
1851 ATCACTGATGATGAAACTCATG'TAAGTACCCACTTGCTGACTACATT
I T D D E T H V K Y P L A D Y Q L 370
1901 AACACCACAAGTTGCCATTGTTGACCCTGAGTTTGTTATGACTGTACCAJA
T P Q V A I V D P E F V M T V P K 387
1951 AACGTACTGTTTCTTGGTCTGGTATTGATGCGATGTCACACGCGCTTGAA
R T V S W S G I D A M S H A L E 403
2001 TCTTACGTTTCTGTTATGTCTTCTGACTATACAAAACCAATTTCACTTCA
S Y V S V M S S D Y T K P I S L Q 420
Sau3AI
2051 AGCGATCCCGGGTCTAGATTAGGGTAACTTTGAAAGGA (SEQ ID NO:1)
A I P G L D (SEQ ID NO:2) 426

Sau3AI recognition sites are indicated above the sequence. DNA homology to the E. coli adhE starts at nucleotide position 262 (data not shown). A Sau3AI fragment with 100% homology to the 23S rRNA of L. lactis is shown doubly underlined at the top (positions 1-173). Putative expression signals functional in E. coli are shown: -35, -10 promoter regions (underlined); Shine Dalgarno (SD, doubly underlined) and putative start codon (bold, discontinuous underline). The amino acid sequence of the open reading frame is given in one-letter-code. The open reading frame ends in the multiple cloning site of vector pSMA500 (doubly underlined at bottom) (Madsen et al., 1996).
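The two annotations carried by Table 1.2, the Sau3AI recognition sites and the one-letter translation of the open reading frame, can be reproduced computationally. The sketch below is an illustration, not the authors' procedure. It uses two 50-nucleotide rows copied from the table as it appears in this text (the rows starting at positions 1 and 751), the Sau3AI recognition sequence GATC, and the standard genetic code; the function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_sites(dna, site="GATC"):
    """Return the 1-based start position of every occurrence of the site."""
    dna = dna.upper()
    width = len(site)
    return [i + 1 for i in range(len(dna) - width + 1) if dna[i:i + width] == site]


def translate(dna):
    """Translate complete codons in frame 1 into one-letter amino acid code."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Rows copied from Table 1.2 (positions 1-50 and 751-800).
row_1 = "GATCTGTCCTTAGTACGAGAGGACCGGGATGGACTTACCGCTGGTGTACC"
row_751 = "AAAAAAGACTACAAAGCTATTGAAAGTTTCGTTTTTGTTGAACGTGCTGG"

print("Sau3AI sites in row 1:", find_sites(row_1))
print("Translation of row 751:", translate(row_751))
```

The translation of the second row agrees with the residues printed beneath it in the table (K K D Y K A I E S F V F V E R A), which indicates that this row is in the reading frame given in the table.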
E. coli AdhE is a multifunctional protein of 890 amino acids that catalyzes the conversion of acetyl-CoA into ethanol and has acetaldehyde dehydrogenase (ACDH) and alcohol dehydrogenase (ADH) activities. Additionally, AdhE shows Pfl deactivase activity, which is involved in the inactivation of pyruvate formate-lyase, a key enzyme in anaerobic metabolism (Knappe et al., 1991).
As shown in Table 1.2 above and Table 1.3 below, clone 1 includes the ADH domain of an L. lactis AdhE homologue and contains the signals necessary for expression in E. coli (Shine-Dalgarno sequence and -35 and -10 regions). The putative gene product of 427 amino acids is highly homologous to a number of other iron-dependent ADHs. Comparison at the protein level showed 41.4% identity (78% similarity) with E. coli AdhE, in addition to significant homology to other ADHs of both eukaryotic and prokaryotic origin (Table 1.3).
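The identity figure quoted above is a count of matching residues over aligned positions. The following is a minimal sketch of that calculation, not the FASTA program's own scoring; the function name `percent_identity` is hypothetical, and the two 15-residue fragments are taken from the clone 1 / E. coli AdhE alignment purely as an illustration. Similarity is not computed, since it would require a substitution matrix that is not specified here.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gapped columns are skipped in this simplified sketch
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Short aligned fragments (illustration only, not the full 430 aa overlap).
clone1_fragment = "TLGTGSWGKNSLSHN"
adhe_fragment = "TLGCGSWGGNSISEN"
print(f"{percent_identity(clone1_fragment, adhe_fragment):.1f}% identity")
```

Because only a short, well-conserved stretch is used, the value printed here is higher than the 41.4% reported for the whole overlap.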
Table 1.3. Homology search (FASTA, GCG Wisconsin package version 8, Genetics Computer Group) using the 427 amino acid putative protein encoded by clone 1 (see also Table 1.2)

The region of homology to AdhE corresponds to the central region, where the ADH domain is possibly located. Only homology to the best score is shown.
(Peptide) FASTA of: clone1.pep from: 1 to: 427
TRANSLATE of: clone1.seq check: 2521 from: 792 to: 2072

The best scores are:                                            init1 initn  opt
sw:adhe_ecoli P17547 escherichia coli. alcohol dehydroge..        276   736  768
sw:adhe_cloab P33744 clostridium acetobutylicum. alcoh..          256   600  703
sw:adh1_cloab P13604 clostridium acetobutylicum. nadph..          256   357  279
sw:medh_bacmt P31005 bacillus methanolicus. nad-depend..          169   224  173
sw:adh4_yeast P10127 saccharomyces cerevisiae (baker's..          146   224  165
sw:adhf_schpo Q09669 schizosaccharomyces pombe (fission..         146   219  162
sw:yiay_ecoli P37686 escherichia coli. hypothetical 40..          158   218  187
sw:sucd_cloki P38947 clostridium kluyveri. succinate-s..          132   186  179
sw:adh2_zymmo P06758 zymomonas mobilis. alcohol dehydr..          129   180  169
sw:fuco_ecoli P11549 escherichia coli. lactaldehyde re..          141   175  147
sw:adha_cloab Q04944 clostridium acetobutylicum. nadh-..          136   153  145

clone1.pep
sw:adhe_ecoli

ID ADHE_ECOLI STANDARD; PRT; 890 AA.
AC P17547;
DE ALCOHOL DEHYDROGENASE (EC 1.1.1.1) (ADH) ACETALDEHYDE DEHYDROGENASE...
SCORES Init1: 276 Initn: 736 Opt: 768 41.4% identity in 430 aa overlap
20 clonel IAVPKKDYKAIESPVFVERAGEGFGVTGPVA adhe-e GVICASEQSVVVVDSVYDAVRERFATHGGYLLQGKELKAVQDVIL-
-ALNAAIV
250 260 270 280 290 50 60 70 80 clone 1 GRSGQWIAEQAGVVPKDDLLFELDKKNIGEAISSEKJSPLLSIYKAETREEGIEIVR adhe-e GQPAYKIAELAGFSVPENTKIIJIGEVTVDESEPFAHEKLSPTLAMYRAKDFEDAVEKAE 300 310 320 330 340 350 100 110 120 130 140 149 clone 1 SLLAYQGAGHNA1IQIGAMDDP FVKEYGEKVEASRILVNQPDSIGGVGDIYTDAMPSL adhe-e KLVAMGGIGHTSCLYTDQDNQPARVSYFGQKMTARILINTPASQGGIGDLYNFKLAPSL 360 370 380 390 400 410 Wn QQ PQdV7 'V -lOWlOr o35 PC iuIY7UU.013 150 160 170 180 190 200 clone1 TLGTGSWGKISLSHNLSTYDLLNVKTVAKRRNRPQWVRLPKEIYYEKNAISY-LQE-LPH ill 1111 :1:1 111111 1 :111:11: 1:1 adhe e TLGCGSWGGNSISENVGPKHLINKKTVAKRAENMEWHKLPKSIYFRRGSLPIALDEVITD 420 430 440 450 460 470 210 220 230 240 250 260 clone 1 VHK -AFIVADPGMVKFGFVDKVLEQLAIRPTQVETS
IYGSVQPDPTLSEAIAIARQMKQF
ti 1:1:I: I Ill::: :::IiII: I :I adhee GHKRALIVTDRFLFNNGYADQITSVL
KAAGVETEVFFEVEADPTLSIVRKGAELANSF
480 490 500 510 520 530 270 280 290 300 310 320 clone1 EPDTVICLGGGSALDAGKIGRLIYEYDARGEDLSDDASLKELFQELAQKFVDIRKRI1K adhe e KPDVIIALGGGSPDAAKIMWVMYE
HPETH----------FEELALRFMDIRKRIYK
540 550 560 570 580 330 340 350 360 370 380 clonel FYH-PHKAQMVAIPTTSGTGSEVTPFAVITDDETH
MYPLADYQLTPQVAIVJDPEFVMT
adhe e FPKMGVKAMIAVTTSGTGSEVTPFAVVTDDATGQKYP 590 600 610 620 630 640 390 400 410 420 clonel PKRTVSWSGIDAMSHALESYVSVMSSDYTKPISLQAIPGLD (SEQ ID NO:2) I i ::II:II:III:: :111 I: I adhe e PKSLCAFGGLDAVTHAMEAYVSVLASEFSDGQALQAKLLKEYLPASYHEGSKNPVARER 650 660 670 680 690 700 adhe e VHSAATIAGIAFANAFLGVCHSMAHKIGSQFHI
PHGLANALLICNVIRYNANDNPTKQTA
(corresponding to a.a. residues 43-762 of SEQ ID NO:6) 710 720 730 740 750 760

4. DNA hybridization of the DB1341 λZAP library using an adhE fragment

Sequence comparison of clone 1 with the previously cloned adhE gene indicated that the first 500 bp and the last 600 bp of the putative L. lactis adhE homologue were not present in clone 1. Therefore, a λZAP genomic library of strain DB1341 was constructed according to the manufacturer's instructions (Stratagene). The average insert size was estimated to be approx. 3 kb, with 800 recombinant clones. Approximately 2 x 10^5 pfu were screened using a 0.8 kb Sau3AI fragment (position 1296-2054 in Table 1.2), and 10 positive clones (named adhE-1 to adhE-10) were selected for characterization.
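For orientation, the number of plaques screened can be compared with the number of clones a genomic library needs in order to contain a given locus. The sketch below uses the standard Clarke-Carbon relation N = ln(1 - P) / ln(1 - I/G). The insert size of 3 kb is taken from the text; the genome size of 2.5 Mb is an assumption made only for this illustration and is not stated in the source, and both function names are hypothetical.

```python
import math

# Assumption for illustration only: the genome size is not given in the text.
ASSUMED_GENOME_BP = 2_500_000
AVERAGE_INSERT_BP = 3_000  # approx. 3 kb average insert, as stated in the text


def clones_needed(genome_bp: int, insert_bp: int, probability: float) -> int:
    """Clones required so that a given locus is present with the given probability."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


def coverage_probability(genome_bp: int, insert_bp: int, n_clones: int) -> float:
    """Probability that a given locus is present among n_clones independent clones."""
    fraction = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


n_99 = clones_needed(ASSUMED_GENOME_BP, AVERAGE_INSERT_BP, 0.99)
p_screen = coverage_probability(ASSUMED_GENOME_BP, AVERAGE_INSERT_BP, 200_000)
print(n_99, p_screen)
```

Under these assumptions, screening about 2 x 10^5 pfu exceeds the 99% coverage requirement by a wide margin, which is consistent with recovering several independent positive clones.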
5. Sequencing of positive λZAP adhE clones

Following 'in vivo' excision of the pBK plasmid version (Stratagene) of the clones, restriction mapping and sequencing of clones adhE-1 and adhE-3 were carried out as shown in Fig. 2. Clone adhE-1 included a 1.7 kb insert that was identical to the adhE fragment of clone 1 (position 262-2054 in Table 1.2). Clone adhE-3 contained a 4 kb insert spanning from the Sau3AI site at position 1296 in Table 1.2. This fragment could harbour the 3'-end of the L. lactis adhE gene. Sequence analysis of this clone confirmed that it included the 3'-end of the L. lactis adhE gene, which ends with a double stop codon (TAATAA, position 2854-2859 in Table 1.4 below). Downstream from this position, a possible transcription terminator was found (position 2883-2905 in Table 1.4).
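The reading of an open reading frame that ends in two adjacent stop codons can be illustrated with a short translation routine. This is a minimal sketch using the standard genetic code. The 5' fragment corresponds to the first ten codons of the adhE coding sequence (Met-Ala-Thr-Lys-Lys-Ala-Ala-Pro-Ala-Ala); the 3' fragment is written here to match the stated Asp-Gln-Tyr-End-End ending with TAATAA and is an assumption, since the printed sequence is partly illegible at that position. The helper names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; stop codons are written as '*'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def ends_with_double_stop(dna: str) -> bool:
    """True if the in-frame translation ends with two adjacent stop codons."""
    return translate(dna).endswith("**")


five_prime = "ATGGCAACTAAAAAAGCCGCTCCAGCTGCA"  # start of the adhE coding sequence
three_prime = "GATCAATACTAATAA"  # assumed 3' end: Asp-Gln-Tyr followed by TAATAA

print(translate(five_prime))
print(translate(three_prime), ends_with_double_stop(three_prime))
```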
Samples of clones adhE-1 and adhE-3 in E. coli were deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 25 July 1996 under the accession Nos. DSM 11101 and DSM 11102, respectively.
Table 1.4. Sequence of the L. lactis DB1341 adhE gene (SEQ ID NO:3)

In this Table a putative ribosome binding site is shown in bold (position 127-133), 12 bp upstream of the putative start codon (position 145-147), deduced from homology comparisons (Figs. 2 and ...). Two adjacent stop codons (located at position 2854-2859) are shown (double underline). A putative rho-independent transcription terminator (de Vos and Simons, 1994) is also shown downstream of the stop codons at position 2883-2904 (single and dotted underline show stem and loop sequences, respectively).
AAGCTTGTTACAAAACCG'TrTTCTAAACTTTTGATGAGTGTTT'IGTAAA 1 AACTATCACAATATTGC'GACATCTATAAAAAACTTGTTAAACTA'rrC 51 100
ACGTAAAAGAAAGTGAATGAAGTCACAA).GGAGAACCTACAAATATGGCA
101 150 MetAla
ACTAAAAAGCCGCTCCAGCTGCAAGAAAGTI~TAAGCGCTGAAGAAAA
151 200 ThrLysLysAlaAlaProAlaAlaLysLysValLeuSerAlaGluGluLys. AGCCGCAAAATTCCAAGAAGCTGTTGC'iTATACTGACAAATTAGTCAAAA 201 250 AlaAlaLysPheGlnGluAlaValAlaTyrThrAspLysLeuValLyaLys AAGCACAAGCTGCTGTTCTTAAAgTrTGAAGGATATACACAAACTCAAGTC 251 300 AlaGlnAlaAlaValLeuLysPheGluGlyTyrThrGlnThrGlnVal
GATACTATTGTCGCTGCAATGGCTCTTGCAGCAAGCAAAC-ATTCTCTAGA
301 350 AspThrlleValAlaAlaMetAlaLeuAlaAlaSerLysHisSerLeuGlu
ACTCGCTCATGAAGCCGTTAACGAAACTGGTCGTGGTGTTGTCGAAGACA
351 400 LeuAlaHisGluAlaValAsnGluThrGlyArgGlyValValGluAspLys
AAGATACCAAAAACCACTTTGCTTCTGAATCTGTTATAACGCAATTAAA
401 450 As~pThrLysAsnliisPheAlaSerGluSerValTyrAsnAlalleLys
AATGACAAACTGTTGGTGTCATTTCTGAAAACAAGGTTGCTGGATCTGT
451 500 AsnAspLysThrValGlyVal IleSerGluAsnLysValAlaGlySerVal
TGAATCGCAAGCCCTCTCGGTGTACTGCTGGTATCGTTCCAACGACTA
501 550 GluIleAlaSerProLeuGlyValLeuAlaGlyIleValProThrThrAsn ATCCAACATeA1ACAGCAATCTTTAAATCTTATTGACTGCAAAAACACGT 551 600 ProThrSerThrAlal lePheLysSerLeuLeuThrAlaLysThrArg
AATGCTATTGTTTTCGCTTCCACCCTCAAGCTCAAAAATGTTCAAGCCA
601 650 AsnAlalleValPheAlaPheHisProGlnAlaGlnLysCysSerSerHis TGCAGCAAAAATGTjTrACGATGCTGCAATTGAAGCTGGTGCACCGGAAG 651 700 AlaAlaLysI leValTyrAspAlaAlaIleGluAlaGlyAlaProGluAsp ACTTTATTCAATGGATTGAAGTACCAAGCCTTGACATGACTACCGCC'rTG 701 750 PhelleGlnTrpleGluValProSerLeuAspMetThrThrAlaLeu
ATTCAAAACCGTGGACTTGCAACAATCCTTGCAACTGGTGGCCCAGGAAT
751 800 I leGlnAsnArgGlyLeuAlaThrIleLeuAlaThrGlyGlyProGlyMet 1 'CT/DK97/00336 v CT/DK97/00336 WO QR8fr7RA~7CTD7036 38 GGTAAACGCCGCACTCAAATCTGGTAACCC7TCACTCGGTGTTGGAGCTG 801 850 ValAsnAlaAlaLeuLysSerGlyAsnProSerLeuGlyValGlyAlaGly
GTAATGGTGCTGTTTATGTTGATGCAACTGCAAATATTGAACGTGCCGTT
851 900 AsnGlyAlaValTyrValAspAlaThrAlAslI leGluArgAlaVa1 GAAGACCTTTGCTTTCAAAACGT=rGATAATGGGATGAT=GTGCCAC 901 950 GluAspLeuLeuLeuSerLysArgPheAspAsnGlyMetl leCysAlaThr TGAAAA ITCAGCTGTTATTGATGCTTCAGTTTATGATGAATTTATTGCTA 951 1000 GluAsnSerAlaValIleAspAlaSerValTyrAspGluPheIleAlaLys AAATGCAAGAACA1AGGCGCTTATATGGTTCCTAAAAAAGACTACAAAGCT 1001 1050 MetGlnGluGlnGlyAlaTyrMetValProLysLysAspTyrLysAla ATTGAAAG=rCG7rTTGTTGAACGTGCTGGTGAAGGTTTrrGGAGTAAC 1051 1100 I leGluSerPheValPheValGluArgAlaGlyGluGlyPheGlyValThr
TGGTCCTGTTGCCGGTCGTTCTGGTCAATGGATTGCTGAACAAGCTGGTG
1101 1150 GlyProValAlaGlyArgSerGlyGlnTrpIleAlaGluGlnAlaGlyVal
TCAAAGTTCCTAAAGATAAAGATGTCCTTCTTTTTGAACTTGATAAGAAA
1151 LysValProLysAspLysAspValLeuLeuPheGluLeuAspLysLys AATATTGc4TGAAGCACITTCTTCTGAAAAACTTTCTCCT=GCTTTCAAT 1250 AsnI leGlyGluAlaLeuSerSerGluLysLeuSerProLeuLeuSerlle CTACAAAGCTGAAACACGTGAAGAAGGAATTGAGA'FrGTACGTAGC ITAC 1300 TyrLysAlaGluThrArgGluGluGlyI leGlulleValArgSerLeuLeu-
TTGCTTATCAAGGTGCTGGACATAATGCTGCAATTCAAATCGGTGCAATG
1350 AlaTyrGlnGlyAlaGlyHisAsnAlaAlaI leGlnIleGlyAlaMet
GATGATCCATTCGTTAAAGAATATGGCGAAAAAGTTGAAGCTTCTCGTAT
1400 AspAspProPheValLysGluTyrGlyGluLysValGlu.AlaSerArgIle
CCTCGTTAACCAACCAGATTCTATTGGTGGGGTCGGAGATATCTATACTG
1450 LeuValAsnGlnProAspSerlleGlyGlyValGlyAsplleTyrThrAsp
ATGCAATGCGTCCATCACTTACACTTGGAACTGGTTCA.TGGGGGAAAAAT
1500 AlaMetArgProSerLeumhrLeuGlyThrGlySerTrpGlyLysAsn TCACI-rTCACACAATTGAGTACATACGATCTATTGAATGTTAAAACAGT 1550 SerLeuSerHi sAsnLeuSerThrTyrAspLeuLeuAsnValLysThrVaI GGCTAAACGTCGTAATCGCCCACAATGGGTTCGTTTGCCAAAAGAAATrT 1600 Al aLysArgArgAsnArgProGlnTrpValArgLeuProLysGlul leTyr
ID
CT/DK97/00336 WO 98/0l7867 CTD 7/36 39 ACTACGAAAAAAATGCAATTCTrACTTACAAGAATTGCCACACGTCCAC 1601 1650 TyrGluLysAsniAlaI leSerTyrLeuGlnGluLeuProHisValHis AAAGCTTTCATCGTTGCTGACCCTGGTATGGTTAAATTTGGTTrCGTTGA 1651 1700 LysAlaPhel leValAlaAspProGlyMetValLysPheGlyPheValAsp TAAAG7rTGGACACTGCTATCCGCCCAACTCAAGTGAAACAAGCA 1701 1750 LysValLeuGluGlnLeuAlaIle.ArgProThrGlnValGluThrSerIle
TTTATGGCTCTGTTCAACCTGACCCAACTTTGAGCGAAGCAAITGCAATC
1751 1800 TyrGlySerValGlnProAspProThrLeuSerGluAlaIleAlaI le
GCTCGTCAAATGAAACAATTTGAACCTGACACTGTCATCTGTCTTGGTGG
1801 AlaArgGlnMetLysGlnPheGluProAspmhrVal IleCysLeuGlyGly
TGGTTCTGCTCTCGATGCCGGTAAGATTGGTCGTTTGATTTATGAATATG
1851 1900 GlySerAlaLeuAspAlaGlyLyslleGlyArgLeulleTyrGluTyrAsp
ATGCTCGTGGTGAAGCTGACC'ITCTGATGATGCAAGITGAAAGAACTT
1901 1950 AlaArgGlyGluAlaAspLeuSerAspAspAlaSerLeuLysGluLeu TTCCAAGAATTAGCTCAAAAATTTGTCGATATTCGTAAACGTATTArrAA 1951 PheGlnGluLeuAlaGlnLysPheValAsplleArgLysArgllelleLys ATTCTACCATCCACATAAAGCACAAATGGTGCAATrCCTACTAC'rCTG 2001 2050 PheTyrHi sProHisLysAlaGlniMetValAlalleProThrThrSerGly GTACTGGTTCTGAAGTGACTCCATTTGCAGrrATCACTGATGATGAAACT 2051 2100 ThrGlySerGluValThrProPheAlaVallleThrAspAspGlumhr CATGTTAAGTACCCAC ETGCTGACTACCAATTAACACCACAAGTTGCCAT 2101 2150 Hi sValLysTyrProLeuAlaAspTyrGlnLeuThrProGlnValAl al le
TGTTGACCCTGAGTTTGTTATGACTGTACCAAAACGTACTGTTTCTTGGT
2151 2200 ValAspProGluPheValMetThrValProLysArgThrValSerTrpSer CTGGTA~rGATGCGATGTCACACGCGCTTGAATCTTACGITTCTGTTATG 2201 2250
TCTTCTGACTATACAAACCAATTTCACTCAAGCGATCAAACTTATCITT
2251 2300 SerSerAspTyrThrLysProlleSerLeuGlnAlalleLysLeullePhe
TGAAAACTTGACTGAGTCTTATCATTATGACCCACGCATCCAACTAAAG
2301 2350 GluAsnLeuThrGluSerTyrHisTyrAspProAlaHisPromhrLysGlu
AAGGACAAAAAGCCCGCGAAAACATGC-ACAATGCTGCAACACTCGCTGGT
2 351 2400 GlyGlnLysAlaArgGluAsniMetHi sAsnAlaAlaThrLeuAlaGly WO 98/07867 40 PCT/DK97/00336 ATGGCCTTCGCTAATGCCCTT'1GGAATAACCACTCACTTGCTCATAA 2401 2450 MetAlaPheAlaAsnAlaPheLeuGlyIleAsnHisSerLeuAlaHisLys
AATTGGTGGTGATTGGACTTCCTCATGGTCTTGCCATTGCCATCGCTA
2500 IleGlyGlyGluPheGlyLeuProHisGlyLeuAlaIleAlaIleAlaMet
TGCCACATGTCATTAAATTTAACGCTGTAACAGGAAACGTAAACGTACC
2501 2550 ProHisValIleLysPheAsrAlaValThrGlyAsnValLysArgThr
CCTTACCCACGTTATGAAACATATCGTGCTCAAGAGGACTACGCTGAAAT
2551 2600 ProTyrProArgTyrGluThrTyrArgAlaGlnGluAspTyrrAlaGluIe
TTCACGCTTCATGGGATTTGCTGGTAAAGATGATTCAGATGAAAAAGCTG
2601 2650 SerArgPheMetGlyPheAlaGlyLysAspAspSerAspGluLysAiaVal TGCAAGCTCTGGTrGCTGAACTTAAGAAACTGACTGATAGCATTGATA~r 2651 2700 GlinAaLeuValAlaGluLeuLysLYBLeuThrAspSerlleAsple AATATCACCCr=TCAGGAAATGGTATCGATAAAGCTCACCTTGAACGTGA 2701 2750 AsnhleThrLeuSerGlyAsnGlylleAspLysAlaHisLeuGluArgGlu
ACTTGATAAATTGGCTGACCTTGTTTATGATGATCAATGTACTCCTGCTA
2751 2800 LeuAspLysLeuAlaAspLeuValTyrAspAspGlnCysThrPraAlaAsn ATCCTCGTCAACCAAGAATTGATGAGA TTGTGTTAGATCAA 2801 2850 ProArgGlnProArglleAspGluIleLyGlnLeuLeuLeuAspGln
TACTATAATCTGTTGATAAAATTATTAAAACGCTCTGATGAATTCGTCA
-P7- 2900 TyrEndEnd (SEQ ID NO:4) GAGCATTT=TTATrATAGCTTATACAACTATCAAAAGGTATAAATCAATT 2901 2950
TCGATATAGGCTCTTTTCACTCCATTGATTTATGCATTTCTATAAAAATC
3000
AATAATTAATTAGCGATAGAAGTCGAGTTCATGCATGCTAATAATGAAAT
TGrrrAAATTCTGGTr'FTCTTTATGTrCTTTGCGAACATCTCACAG 3051 3100 TITCTTTGTTCATGAAAATCCTCCTTATATGGTACTA TGAGCCCA 3101 3150 AATAGTTATATAAGAATCCTAAACTTCGGATATCITATCAAAG (SEQ ID NO:3) 3193

The L. lactis adhE gene of strain DB1341 encodes a 903 amino acid long protein, as deduced from the DNA sequence (Table 1.4), with an estimated molecular weight of 98.2 kDa. A putative ribosome binding site (AAAGGAG, position 127-133 in Table 1.4) is found 11 bp upstream of the start codon (de Vos and Simons, 1994).
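The coordinates quoted for Table 1.4 can be cross-checked with simple arithmetic. The sketch below uses only positions and values stated in the text (ribosome binding site 127-133, start codon 145-147, stop codons 2854-2859, 98.2 kDa); the variable and function names are hypothetical.

```python
def span_length(first: int, last: int) -> int:
    """Number of positions in an inclusive, 1-based coordinate range."""
    return last - first + 1


RBS = (127, 133)            # putative ribosome binding site (AAAGGAG)
START_CODON = (145, 147)    # putative start codon
STOP_CODONS = (2854, 2859)  # two adjacent stop codons (TAATAA)
MOLECULAR_WEIGHT_DA = 98_200  # 98.2 kDa, as stated in the text

# Bases lying strictly between the end of the RBS and the start codon.
spacer_bp = START_CODON[0] - RBS[1] - 1

# Coding region runs from the start codon to the base before the first stop codon.
coding_nt = span_length(START_CODON[0], STOP_CODONS[0] - 1)
codons = coding_nt // 3

mean_residue_mass = MOLECULAR_WEIGHT_DA / codons

print(spacer_bp, coding_nt, codons, round(mean_residue_mass, 1))
```

The spacer of 11 bp and the 903 codons agree with the figures given in the paragraph above, and the resulting mean residue mass of roughly 109 Da is in the range typical for proteins.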
Homology comparisons have shown ...% identity (...% similarity) of the L. lactis AdhE to the E. coli protein and 42.4% identity (...% similarity) to the Clostridium acetobutylicum Aad protein throughout an approx. 750 amino acid fragment (Tables 1.4 and ...). A significantly lower homology is observed at the C-terminal region of these three proteins.
Table 1.5. Protein homology search (FASTA, GCG Wisconsin package version 8, Genetics Computer Group) using the deduced sequence of the AdhE protein encoded by the L. lactis DB1341 adhE gene

In this Table only the alignment of the best two scores (E. coli AdhE and C. acetobutylicum Aad) is shown.
(Peptide) FASTA of: adhedb1341.pep from: 1 to: 904
TRANSLATE of: adhedb246.seq check: 3519 from: 145 to: 2856

The best scores are:                                          init1 initn  opt
sw:adhe_ecoli P17547 escherichia coli. alcohol dehydr..         708  1819 1507
sw:adhe_cloab P33744 clostridium acetobutylicum. alcoh..        404  1297 1053
sw:adh1_cloab P13604 clostridium acetobutylicum. nadph..        283   581  434
sw:sucd_cloki P38947 clostridium kluyveri. succinate-s..        290   460  621
sw:medh_bacmt P31005 bacillus methanolicus. nad-depend..        187   389  298
sw:adh2_zymmo P06758 zymomonas mobilis. alcohol dehydr..        170   376  299
sw:adh4_yeast P10127 saccharomyces cerevisiae (baker's..        173   368  295
sw:dhat_citfr P45513 citrobacter freundii. 1,3-propan..         163   329  295
sw:eute_salty P41793 salmonella typhimurium. ethanolam..        150   309  372

adhedb1341.pep
sw:adhe_ecoli

ID ADHE_ECOLI STANDARD; PRT; 890 AA.
AC P17547;
DT 01-AUG-1990 (REL. 15, CREATED)
DT 01-AUG-1990 (REL. 15, LAST SEQUENCE UPDATE)
DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE)
DE ALCOHOL DEHYDROGENASE (EC 1.1.1.1) (ADH) ACETALDEHYDE DEHYDROGENASE...
SCORES Init1: 708 Initn: 1819 Opt: 1507 44.3% identity in 757 aa overlap
20 30 40 50 adhe2 4 MATKKAAPAAKKVLSAEEKAAKFQEAVAYTDKLVKKAQAAVLKFEGQTQVDTIVAM Ii11:::: 1 1:1 1 adhe-e AVTNVAEIJNALVERVKKAQREYASFTQEQVDKI
FRAAA
20 s0 90 100 110 120 adhe2 4 LAASKHSLELAHEAVNETGRGVVEDKDTNHFASESVNAIKNqDKcVGvIENKVAGXJE adhe-e LAAADARIPLAKM~AVAESGMGIVEDKVIKNHFASEYIYNAYKDEKTCGVLSEDDTFGTIT so 60 70 80 130 140 150 160 170 180 adhe2 4 IASPLGVLAGIVPTTNPTSTAIFKSLLTAKRNAIVFAFHPQAQKCSSHAAKCIVYDAAIE adhe-e IAEPIGIICGIVPTTNPTSTAIFKSLISLKTRNAIIFSPHPRAKDATNKAADIVLQAAIA 100 110 120 130 140 150 190 200 210 220 230 240 adhe24 AGAPEDFIQWI EVPSLDMTrALIQNRGLATILATGGPGMVNAALKSGNPSLGVGAGNGAV adhe-e AGAPKDLIGWIDQPSVELSNALMHHPDINLILATGGPGMVKAAYSSGKPAIGVGAGTPV 160 170 180 190 200 210 250 260 270 280 290 300 adhe2 4 YVDATANIERAVEDLLLSKRFDNGMICATENSAVIDASVYDEFIAGQEQGAYMVPKmcy adhe-e VIDETADIKRAVASVLMSKTFDNGVICASEQSVVVWDSVYDAVRERFATHGGYLLQGKEL 220 230 240 250 260 270 310 320 330 340 350 360 adhe2 4 KAIESFVFVERAGEGFGVTGPVAGRSGQWIAEQAGVKVPKDKDVLLFELDKXNIGEALSS adhe-e KAVQDVIL KNG -ALNAIVGQPAYKIAELAGFSVPENTKILIGEVI'VVDESEPFAj{ 280 290 300 310 320 330 370 380 390 400 410 419 adhe2 4 EKLSPLLSIYKAETREEGIEIVRSLLAYQGAGHNAMQIGAADP FVEYGEKVEAzSRI adhe-e EKLS PTIAYRADFEDAVEKAEKLVAMGGIGHTSCLYTDQDNQPVSYFGQTRI 340 350 360 370 380 390 420 430 440 450 460 470 479 adhe2 4 LVNQPDS IGGVGDIYTDAM~RPSLTLGTGSWGKNSLSHNLSTYDLLNVKTVACRRNRPQWV adhe-e LITAQGGLNKASTGGWGNIEVPHIKTAREMW 400 410 420 430 440 450 WO 98/07867 43 PCT/DK97/00336 480 490 500 510 520 530 adhe24 RLPKEIYYEKNAI SY LQE LPHVHK -API VADPGMVKPGFVKVLEQLAIRPTQVETSI adhe-e KLPKS IYFRRGSLPIALDEVITDGHKRALIVDRFLFNNGYADQITSVL KAAGVETEV 460 470 480 490 500 510 540 550 560 570 580 590 adhe2 4 YGSVQPDPTLSEAIAIARQMKQFEPDTVI CLGGGSALDAGKIGRLIYEYDARGEADWLSDD adhe-e FFEVEADPTLSIVRKGAELANSFKPDVIIALGGGSPMAAIMWVMYE- 520 530 540 550 560 600 610 620 630 640 650 adhe24 ASLKELFQELAQKFVIRKRIIKFYH- PHKAQMVAIPTTSGTGSEVrPFAVITDDETHVK adhe-e--------FEELALRFDIRCRIYKFPKdGVKAKNIAVTTSGTGSEVTPFAVVTDDATGQK is 570 580 590 600 610 660 670 680 690 700 710 adhe24 YPLZADYQLTPQVAIVDPEFVMTVPKRTVSWSGID.AMSHALESfVSVMSSDYTKPISLQAI adhe-e YPLADYALTPDMAIVDANLVDMPKSLCAFGGLDAVHMAYVSVLASEFSDGQALQAL 620 630 640 650 660 670 720 
730 740 750 760 770 adhe2 4 KLI FENLTESYHYDPAHPTKEGQKARENHNAATLAGMAFANAPLGINHSLAHKIGGEFG adhe-e KLLKEYLPASYHEGSKNqPVARERVHSAATIAGIAFAN-AFLGVCHSMAHKLGSQFH.i
PHG
680 690 700 710 720 730 780 790 800 810 820 830 adhe2 4 LPHGLAIAIAMPHVIKFNAVrGNVKRTPYPRYETYRAQEDYAEI SRFMGFAGKDDSDEKA adhe-e LANALLICNVIRYN2ANDNPTKQTAFSQYDRPQARRRYAEIADHLGLSAPGDRTAAIIKL 740 750 760 770 780 790 adhe24: SEQ ID NO:5; adh-e: SEQ ID NO:6 adhedbl34l .pep sw:adhe-cloak, ID ADHECLOAB STANDARD; PRT; 862 AA.
AC P33744;
DT 01-FEB-1994 (REL. 28, CREATED)
DT 01-FEB-1994 (REL. 28, LAST SEQUENCE UPDATE)
DT 01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE)
DE ALCOHOL DEHYDROGENASE (EC 1.1.1.1) (ADH) ACETALDEHYDE DEHYDROGENASE...
SCORES Initi: 404 Initn: 1297 Opt: 1053 38.6* identity in 568 aa overlap 20 30 40 50 adhe2 4 MATKKAAPAAKKVLSAEEKAAKFQEAVAYTDKLVKKAQAAVLKFEGYTQTQVDTIVAM i 11:1:111:1 I adhe-c MKVTTVKELDEKLKVIKEAQKKFSCYSQEMVDEI FRNAA 20 WO 98/07867 PCT/DK97/fl336 44 80 90 100 110 120 adhe2 4 LAASKHSLELAHEAVNETGRGVVEDKDTKNHFASESVYNAIKNDKTVGVISENKVAGSVE adhe-c MA IDARIELAKAAVLETGMGLVEDKVIKNHFAGEYIYNKYKDEKTCGI IERNEPYGITK 40 50 60 70 80 130 140 150 160 170 180 adhe2 4 IASPLGVLAGIVPTTNPTSTAI FKSLLTAYTRNAIVFAFHPQAQKCSSHAAKIVYDAAIE adhe-c IAEP IGVVAAI IPVTNPTSTTIFKSLI SLKTRNGI FFSPHPRAKKSTILAAKTILDAAVC 100 110 120 130 140 150 190 200 210 220 230 240 adhe24 AGAPEDFIQWIEVPSLDMTTALIQNRGLATILATGGPc4MVNAALKSGNPSLGVGAGNGAV adhe-c SGAPENIIGWIDEPSIELTQYLMQKADIT- -LATGGPSLVKSAYSSGKPAIGVGPGNTpv 160 170 180 190 200 210 250 260 270 280 290 300 adhe24 YVDATANIERAVEDLLLSKRFDNGMI CATENSAVIDASVYDEFIA]K4QEQGAYMVPKKDY adhec I IDESAHIIK4AVSS I LSKTYDNGVICASEQSVIVLKSIYNKVKDEFQERGAYI IKKNEL 220 230 240 250 260 270 310 320 330 340 350 360 adhe2 4 KAIES FVFVERAGEGFGVGPVAGRSGQWIAEQAGVKPDVLLFELDKKNIGALSS adhec DKVREVIF- -KDG- -SVNPKIVGQSAYTIAAMAGIKVP1CTRILIGEVTSLGEEEPFAH 280 290 300 310 320 330 370 380 390 400 410 419 adhe2 4 EKLSPLLSIYKAETREEGIE IVRSLLAYQGAGHNAAIQIGAMDDP -FVKEYGEKVE.ASRI adhe-c EKLSPVLMYEADNFDDAJJKKAVLINLGGLGHTSGIYADEIKDKIDRFSSMKTVPT 340 350 360 370 380 390 420 430 440 450 460 470 479 adhe24 LVNQPDSIGGVGDIYTDAMRPSLTLGTGSWGKNSLSHNLSTYDLLNVKTV.AKRINRPQWV adhe-c FVNI PTSQGASGDLYINFRI PPSFTLGCGFWGGNSVSENVGPKHLLNIKAEREMWF 400 410 420 430 440 450 480 490 500 510 520 530 adhe2 4 RLPKEIYYEKNAI SY -LQELPHVHK -AFIVADPGMVKFGFVDKVLEQLAIRPTQVETS I 1:1 I I11: I I adhe-c RVPHKVYFKFGCLQFALKDLKDLKKKRAFIVTDSDPYNLNYVDSI IKILE HLDIDFKV 460 470 480 490 500 510 540 550 560 570 580 590 adhe24 YGSVQPDPTLSEAIAIARQMKQFEPDTVICLGGGSALDAGKIGRLIYEYDARGEADLSDD adhec FNKVGREADLKTIKKATEEM4SSFMPDTI IALGGTPEMSSAKLMWVLYEHPEVKFEDLAIK 520 530 540 550 560 570 600 610 620 630 640 650 adhe2 4 ASLKELFQEjLAQKFVDIRKRI 
IKFYHPHKAQMVAIPTTSGTGSEVTPFAVITDDETHVKY adhe-c FMDIRKRIYTFPKLGKKLVAITTSAGSGSEVPFALVDNNTG1YAYEMTrPNMA 580 590 600 610 620 630

adhe24: corresponding to amino acid residues 1-656 of SEQ ID ...; adhe-c: corresponding to amino acid residues 1-630 of SEQ ID NO:11

6. Inverse PCR to obtain sequences upstream of the L. lactis DB1341 adhE coding sequence and cloning of PCR fragments

Inverse PCR was used to obtain additional sequences from the upstream region of the L. lactis DB1341 adhE gene. HindIII-, HpaI- or PvuII-digested genomic DNA of strain DB1341 was ligated at low concentration, and PCR was carried out using primers adhE-350 and adhE-700 (or adhE-1300x) (see Fig. 2).
Sequence analysis of the obtained PCR products, using primers adhE-240 (or adhE-1300x), allowed the identification of the upstream region of the adhE gene. A 0.6 kb PCR product obtained from HindIII inverse PCR amplification was subsequently cloned into pSMA500, resulting in E. coli DH5α strain adhEup-1.
A sample of adhEup-1 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession No. DSM 11091.
Further inverse PCR was carried out using PstI-digested and religated chromosomal DNA of strain DB1341, using primers derived from the above sequence. A PCR product of about 5 kb was obtained which, in addition to the entire coding sequence of the adhE gene, comprises about 1800 bp upstream of the coding sequence. This upstream sequence includes an open reading frame, designated orfB, that encodes a putative 341 aa protein having no homology to sequences in available databases.
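Inverse PCR depends on where the chosen restriction enzymes cut around the known sequence. The following is a minimal sketch of locating recognition sites by plain string search; the recognition sequences are the standard ones for the enzymes named in the text, the function names are hypothetical, and the two test fragments are the first 50 nucleotides of the upstream sequence (Table 1.6) and the first 20 nucleotides of the adhE sequence (Table 1.4).

```python
# Standard recognition sequences of the enzymes mentioned in the text.
RECOGNITION_SITES = {
    "HindIII": "AAGCTT",
    "HpaI": "GTTAAC",
    "PvuII": "CAGCTG",
    "PstI": "CTGCAG",
    "Sau3AI": "GATC",
}


def find_sites(dna: str, site: str) -> list[int]:
    """Return 1-based start positions of every occurrence of a recognition site."""
    dna = dna.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + 1)
        start = dna.find(site, start + 1)  # step by one so overlapping sites are found
    return positions


def map_sites(dna: str) -> dict[str, list[int]]:
    """Map every enzyme in RECOGNITION_SITES to its site positions in dna."""
    return {enzyme: find_sites(dna, site) for enzyme, site in RECOGNITION_SITES.items()}


upstream_start = "CTGCAGCTTGTTTTTTAGTACCAACAAAAAGGACTACTGCACCTTCTTGT"
adhe_start = "AAGCTTGTTACAAAACCGTT"

print(map_sites(upstream_start))
print(map_sites(adhe_start))
```

All five recognition sequences are palindromic, so searching one strand is sufficient. The PstI site at position 1 of the upstream fragment and the HindIII site at position 1 of the adhE sequence correspond to the enzymes used for the respective inverse PCR steps.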
Table 1.6. DNA sequence upstream of the coding sequence of the L. lactis DB1341 adhE gene PstI 1 CTGCAGCTTGTTTTTTAGTACCAACAAAAAGGACTACTGCACCTTCTTGT 51 GAAGCGTTTTTTACATAGTTGTAAGCATCGTCAACAAGTTTACAGTTT 100 101 TTGAAGGTCGATAACGTGGATACCATTACGTTCTGTGAAGATGTATGGTT 150 151 TCATTTTTGGGTTCCAACGACGAGTTTGGTGACCGAAGTGAACACCAGCT 200 201 TCAAGAAGTTGTTTCATTGAAATAACTGACATGTAATGTCTCCTTTTAA 250 251 AATAGTTTTTCCTCTTTCATCTGTCATCCGCAGCCGCAATACTTGCGTAC 300 301 ACTACGACTTTGTCGAGACGAAATGCGAGATGGTTGCATAGCAACTCTAT 350 351 CATTATACATTGTTTGACCTATTTTTGCAAGTATCTATTCATGCTTCTAT 400 PCUDWO-7/00-419 WO 98/07867 C/lrOIn2~ 401 TGITCAGTAAATCTATITITCTAACCACTCCTATrATCTGACAAATrTAA 450 451 TTGTrAATTAGGCTCTATAATCACTAAAAGAGTAAGTTTTTAAATTrT 500 501 ITT AAAAAr A rGCTGAAACCGCT-rrrrGTGATAA 550 551 AATAA7TATAGTAAATAAATTAGTTGTGAGGAGAGAAATATGAAAGAAA 600 OrfB M K E K 601 AAATCCTTTTAGGCGGCTATACAAAACGTGTATCTAAAGGCGTATATAGT 650 I LL G G Y T K R V S K G V Y S 651 G ITCTTTTGGACACTAAAGCTGCTGAATTATCATCATTAAATGAAGTCGC 700 V L L D T K A AE L S S L N EV A 701 TGCGGTTCAAAACCCTACTTATATCACTCTCGATGAAAAGGGACACCTCT 750 A VQ N P T Y I T L D E K G H L Y 751 ATACTTGTGCAGCAGATAGTAATGGTGGAGGAATCGCCGCCTTTGATTT 800 T C A AD S N G G G I A A F D F 801 GATGGCGAAACTGCTACTCATCTCGGAAATGTCACAACCACGGGAGCTCC 850 D G E T AT H L G N VT T T GA P 851 ACTCTGCTATGTrGCCGTGGACGAAGCGCGACAATTAGTT'ACGGAGCGA 900 L C YV AVD E A R Q L VY G A N 901 ACTATCATCTTGGAGAAGTCGTGTTTrATAAGATTCAAGCTAATGGCTA 950 Y H L G E V R V Y K I QA N G S 951 CTCCGATTAACGGATACAGTAAAACATACCGGTTCTGGACCACGTCCTGA 1000 L R L T D T V K H T G S G P R P E 1001 ACAAGCTAGCTCACACGTTCATTATrCTGATTTGACTCCTGACGGACGAC 1050 Q A SS H V H Y S D L T P D G R L 1051 rTGTCACCTGTGAT'ITGGGAAC-AGATGAAGTCACTGTTTATGATGTCATT 1100 V T C DL G T D E V T VY DV I 1101 GGTGAAGGTAAACTCAATATTGCTACAATTTATCGGGCAGAAAAAGGAAT 1150 G E G K L N I A T I Y RA E K G M 1151 GGGTGCTCGTCATATTACTTTCCATCCAAATGGTAAAATCGCTTATTTGG 1200 G AR H IT F H P N G K I A Y L V 1201 TTGGAGAGrrAAA rCAACAATTGAAGTTTTAAGTrACAATGAAGAAAAA 1250 G EL N S T 
I E V L S Y N E E K 1251 GGACGCI=TGCTCGTCTTCAAACAATTAGCACCCTACCTGAAGATTATCA 1300 G R F A R L Q T I S T L P E D Y H 1301 TGGAGCAAATGGTGTTGCTGCCATCCGTATTrCATCTGACGGTAAATrCC 1350 G A N G V AA I R IS S D G K F L 1351 TCTATACTTCTAATCGTGGACATGATTCTTGACAACTACAAAGTAAGT 1400 Y T S N R G H D S L TT Y K V S 1401 CCTCTTGGTACAAAACTTGAAACTATTGc4CTGGACAAATACTGAAGGTCA 1450 P L G T K LE T I G W TN T E G H 1451 TATCCCTCGCGATITTAA'ITTCAACAAAACTGAAGAITATATCATTGTCG 1500 I P R D F N F N K T E D Y I I V A 1501 CTCATCAAGAATCTGATAATTTATCTCTTTrCTGCGAGATAAAAAAACC 1550 H Q E S D N L S L F L R D K K T 1551 GGTACTTTAACTTTGGAACAAAAAGATT'TTACGCTCCTGAAATCACTTG 1600 G T L T L E Q K D F Y AP E I T C 1601 TGTTTTACCACTATAAAAATTATMrrCACAAAGTTTGACTGATAAAC 1650 V L P L Stop (SEQ ID NO:27) 1651 TAAAAAAGATTGCTAAmCTCTCAAAGAATTAGCAATCTTTTTTCT-C 1700 1701 AGTAAAGCTTGTTACAAAACCGT TCTAAAC'TTTrGATGAGTGTTT1-G 1750 1751 TAAAAACTATCACAATATGCTTGACATCTATAACTTrGITTjxACT 1800 1801 ATTCACGTAAAAGAAAGTGAATGAAGTCACAAAGGAGAACCTACAAJAT (SEQ ID NO:26)

7. Sequence of a fragment of the L. lactis strain MG1363 adhE gene

PCR was used to characterize the adhE homologue of strain MG1363. Primers adhE-mg1 and adhE-1697 were used to amplify a 1.5 kb fragment from this strain, named MGadhESTART. Primers adhE-1300x and adhE-mg2 were used to amplify an overlapping kb fragment, named MGadhESTOP (Fig. 3).
The above fragments were subsequently cloned into the plasmid pGEM and transformed into E. coli DH5α, resulting in strains MGadhESTART and MGadhESTOP, respectively. Using the relevant primers, a sequence was obtained that spans position 1306-2775 shown in Table 1.2. An additional primer, adhE-mg3 (5'-...CTTCTTTGGTTGGATGAGC-3') (SEQ ID NO:7), derived from the MG1363 adhE sequence and corresponding to position 2359-2335 of the DB1341 adhE sequence (Table 1.4), was used to fill a sequence gap. A limited sequence variation was found at the DNA level (84 base changes and no insertions/deletions in the 1470 bp MG1363 adhE fragment, corresponding to 5.7% variation; Table 1.7 below), resulting in only 8 amino acid substitutions (or 1.6% variation; Table 1.7).
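The variation figures follow directly from the counts given above. The sketch below reproduces them from the stated positions (1306-2775), the 84 base changes and the 8 amino acid substitutions; the function and variable names are hypothetical, and the number of codons is assumed to be the fragment length divided by three.

```python
def percent_variation(changes: int, total: int) -> float:
    """Changes expressed as a percentage of the total number of positions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * changes / total


# Fragment spanning positions 1306-2775 (inclusive), as stated in the text.
fragment_bp = 2775 - 1306 + 1
# Assumption: the fragment is read in frame, giving one codon per three bases.
fragment_codons = fragment_bp // 3

dna_variation = percent_variation(84, fragment_bp)
protein_variation = percent_variation(8, fragment_codons)

print(fragment_bp, fragment_codons)
print(f"{dna_variation:.1f}% DNA variation, {protein_variation:.1f}% protein variation")
```

The roughly 3.5-fold lower variation at the protein level indicates that most of the base changes are silent.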
Samples of E. coli DH5α strains MGadhESTART and MGadhESTOP were deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession Nos. DSM 11089 and DSM 11090, respectively.
Table 1.7. Multialignment of the deduced L. lactis AdhE protein from strain MG1363 (fragment, adhemg1363) and DB1341 (adhedb1341) with the E. coli (adhe_ec) and C. acetobutylicum (aad_ca) AdhE homologues

The program lineup (GCG Wisconsin package version 8, Genetics Computer Group) was used for the alignment. The consensus sequence (bold type at bottom) shows only residues conserved in all proteins. The differences between the two L. lactis AdhE proteins are shown in bold, underlined type in adhemg1363.
adhemg13 63 adhedb1341 adhe ec aad~ca consensus adhemgI3 63 adhedb134 1 adhe ec aad~ca consensus adhemg13 63 adhedb134l adhe ec aad~ca consensus adhemgl 363 adhedb134l adhe ec aad ca consensus adhemg1363 adhedb1341 adhe ec aad~ca consensus adhemg1363 adhedbl341 adhe ec aad~ca consensus adhemg13 63 adhedb134 1 adhe ec aad~ca consensus adhemg13 63 adhedb134 1 adhe ec aad~ca consensus
MATKKAAPAA
MAVTNWA.. M 51
QTQVDTIVAA
101
NAIKNDKTVG
NAYKDEKTCG
NKYKDEKTCG
151
AKTRNAIVFA
LKTRNAI I S
LKTRNGIFFS
KTRN 201 TTALIQN6GL
SN'ALMHHPDI
TQYLMQKADI
251
ERAVEDLLLS
KRAVASVLMS
KKAVSSIILS
AV..S
301
DYKAIESFVF
ELKAVQDVIL
ELDKVREVIF
KKVLSAEEKA
ELNALVER
ELDEKLKV
MALAASKHSL
AALAAADARI
AAMAAIDARI
.A.AA..
VISENKVAGS
VLSEDDTFGT
IIERNEPYGI
FHPQAQKCSS
PHPRAKDATN
PHPRAKKSTI
ATILATGGPG
NLILATGGPG
T. .LATGGPS
LATGGP.
KRFDNGMICA
KTFDNGVICA
KTYDNGVICA
DNG.ICA
VERAGEGFGV
.KNG. .AL *.KDMG.. .SV AKF QEAVAY
VKKAQREYAS
IKEAQKKFSC
.Q
ELAHEAVNET
PLAKMAVAES
ELAKAAVLET
.LA. .AV.E.
TDKLVK1CAQA
QE
QE
.Q.
GRG VVEDKDT
GMGIVEDKVI
GMGLVEDKVI
G .G .VEDK..
yE IASPLGVL AGIVPTTNPT ITIAEPIGII cGIVP ITNPT TKIAEP 1GV AAIIPVTNPT
.I.P.TNPT
HAA.KIVYDAA
KAADIVLQAA
LAAKTILDAA
MVNAALKSGN
MVKAAYSSGK
LVKSAYSSGK
.VK.A. .SG.
TENSAVIDAS
SEQSVVVVDS
SEQSVIVLKS
S
TGPVAGRSGQ
NAAIVGQPAY
NPKIVGQSAY
IEAGAPEDF'I
IAAGAPKDLI
VKSGAPENII
GAP... I1
PSLGVGAGNG
PAIGVGAGNqT
PAIGVGPGNT
P. IGVG.GN.
VY-DEFIAK4Q
VYDAVRERFA
IYNKVKDEFQ
WIAEQAGVKV
KIAELAGFSV
TIAAMAGIKV
.IA. .AG. .V AVLK. FEGYT
QVDRIFRA..
MVEIFRN. 100
KNHPASESVY
KNHFASEYIY
KNHFAGEYIY
KNHFA.E. .Y 150
STAIFKSLIS
STTIFKSLIS
ST. IFKSLI.
200
QWIEVPSLDM
GWIDQPSVEL
GWIDEPSIEL
.wI. 250
AVYVDATANI
PVVIDETADI
PVIIDESAHI
.A.I
300
EQGAYMVPKK
THGGYLLQGK
ERGAYI IKKN 350
PKDKDVLLFR
PEN'TKILIGR
PKTTRILIGE
P..LIGE
400
QGAGHNAAIQ
QGAGHNAAIQ
GGIGHTSCLY
GGLGHTSGIY
G.GH
LDKKNIGEAL
VTVVDESEPF
VTSLGEEEPF
SSEKLSPLLS IYKAETREEG AHEKLSPTLA MffRAKDFEDA AHEKLSPVLA MYEADN'FDDA .EKLSP.L.
.Y
IE IVRSLLAY
VEKAEKLVAM
LKKAVTLINL
WO 98/07867 WO 9807867PCT/DK9700336 adhemgl3 63 adhedbl34 1 adhe ec aad-ca consensus adhemgl3 63 adhedbl34 1 adhee aad~ca consensus adhemgl3 63 adhedbl34l adhe_ec aad-ca consensus adhemgl3s3 adhedbl341 adhe ec aad~ca consensus adhemgl363 adhedbl34 1 adhe-ec aad-ca consensus adhemgl3 63 adhedbI34l adhe ec aad~ca consensus adhemgl363 adhedbl34 1 adhe ec aadca consensus adhemgl3 63 adhedbl34l.
adhe ec aad~ca consensus adherngl363 adhedbl341 adhe ec aadCa consensus 401 IGAMDDP .FV
TDQDNQPARV
JADEIKARDKI
451
SWGKNSLSHN
SWGKNSLSHN
SWGGNSISEN
FWGGNSVSEN
.WG .NS N
KEYGIKVEAS
KEYGEKVEAS
SYFGQKMKTA
DRFSSAM1KTV
RILVNQPDSI
RILVNQPDSI
RILINTPASQ
RTFVNIPTSQ
.N.P.S.
GGVGDIYTDA
GGVGDIYTDA
GGIGDLYNFK
GASGDLYNFR
G. LSTYDLLNVK TVAKJRRNRPQ LSTYDLIJNVK TVAKRRNRPQ VGPKHLINKK TVAKRAENNIJ VGPKHLLNIK TVAERRENMVL L.N.K TVA.R 501 LPHVHK. A
LPHVHK..A
EVITDGHKRA
DLKDLKKKR1A 551
TLSEAIAIAR
TLSEAIAIAR
TLSIVRKGAE
DLKTIKKATE
601
DDASLKEIFQ
DDASLKELFQ
.FE
.FE
.F.
651
FAVITDDETH
FAVITDDETH
FAVVTDDATG
FALVTDNNTG
PA. .TD. .T.
701 ALESYVSV14S ALESYVS VMS
AMRAYVSVLA
SIRAYvrSVYA
.E.Y.SV..
751
MHNAATLAGM
MHNAATLAGM
VHSAATIAGI
MAHASTMAGM
801
AVTGNVKFTP
AVTGNVKRTP
ANDNPTKQTA
AVDNPVKQAP
FIVAflPGMVK
FIVADPGMVK
LIVTDRFLFN
FIVTDSDPYN
.IV.D QZ4FEPDTV
QMKQFEPDTV
LANSFKPDVI
EMSSFMPDTI
F.PD..
ELAQKFVDIR
ELAQKFVDIR
ELALRFHDIR
DLAIKFMIR
.LA. .F.DIR
VKYPLADYQL
VKYPLAflYQL
QKYPLADYAL
NKYMLADYEM
KY. LADY..
SDYTKPISLQ
SD'rrKPISLQ
SEFSDGQALQ
SEYTNGLALE
S L.
AFANAFLGIN
AF.ANAFLGIN
AFANAFLGVC
ASANAFLGLC
A.ANAFLG..
YPRYETYRAQ
YPRYETYRAQ
FSQYDRPQAR
CPQYKYPNTI
FGFVDKVLEQ
FGFVDKVLEQ
NGYAflQITSV LNYVDS IIKI
ICLGGGSALD
ICLGGGSAAD
IALGGGS PHD
IALGGTPEMS
I .LGG
KRIIKFYH.-P
KRIIKFYH. P
KRIYKFPKMG
KRIYTFPKLG
KRI.
TPQVAIVDPE
TPQVAIVDPE
TPDMAIVDAN
TPNMAIVDAE
TP. .AIVD..
AIKLI FENLT
AIKLIFENLT
AIJKLLKEYLP
AIRLIFKYLP
AI.L. .L.
HSLAIIKIAGE
HSLAHKIGGE
HSMAHKIJGSQ
HSMAIKLSSE
HSM&
EDYAEISRFM
EDYABISRFM
RRYAEIADHL
FRYARIADYI
WVRLPKEIYY
WVRLPKEIYY
WHKLPKSIYF
WFRVPHKVYF
pT...EY LAI RPTQ VET L. .KAAGVET L. .EHLD)IDF
AGKIGIU
1
IYE
AGKIGRLIYE
AAKIMWVMYE
SAKLMWVLYE
.YE
HKAQMVAIPT
HKAQMVAI PT
VKAKMIAVT
KK7AMLVAITT
.T
FVMTVPKRTV
FVMTVPKRTV
LVbMDPKSLC
LMMKMPKGLT
PK
ESYHYDPAHP
ESYHYDPAHP
ASYHEGSKNP
EAYKNGRTNE
FGLPHGLAIA
FHI PHGLAKA
HNIPSGIANA
P.G.A.A
GFAGKEDSDE-
GFAGKDDSDE
GLSAPGDRTA
KLGG1NTDEEK 450
MPSLTLGTG
MRPSLTLGTG
LAPSLTIJGCG
IPPSFTLGCG
.PS.TLG.G
500
EKNA-ISYLQE
EKKAISYLQE
RRGSLPIALD
KFGCLQF.ALK
550
SIYGSVQPDP
S IYGSVQPDP
EVFFEVEADP
KVFNKVGREA
600
YDARGEADLS
YDARGEADLS
HPETH..
HPEVK..
650
TSGTGSEVTP
TSGTGSEVTP
TSGTGSEVTP
SAGSGSEVTP
GSEVTP
700
SWSGIDAMSH
SWSGIDAMSH
AFGGLDAVTH
AYSGIDAIVN
750
TKEGQKAREN
TKEGQKAREN
VARER
FAREK
ARE.
800
IAMPHVIKFN
IAMPHVIKFN
LLICNVIRYN
LLIEEVIKFN
VI..N
850
KAVKAFVAEL
KAVQALVAEL
AKIEKLLAWL
VDLLINKIHE
WO 98/07867 WO 9807867PCTIDK97/00336 adhemgl3 63 adhedb1341 adhe ec aad-ca consensus adhedbl341 adhe ec aad~ca conlsensus 851 900 KKLTDSIDINq ITLSGN. .GV DKAHLERELD KLADLV KKLTDSIDIN ITLSGN. .GI DKAHLERELD KLADLVYDDQ CTP.AIPRQPR ETLKA. .ELG IPKSIREAGV QEADFLANVD KLSEDAFDDQ CTGANPRYPL LKKAL. N IPTSIKDAGV LEENFYSSLD RISELALDDQ CTGANPRFPL DDQ CT.ANPR.P.
901 941 IDEIKQLLLD QY* ISELKQILLD TYYGRDYVEG ETAAKKEAAP AKAEKKAXKS A TSEIKE1MYIN CFKKQP .E.K

adhemg1363: SEQ ID NO:8; adhedb1341: SEQ ID NO:9; adhe_ec: SEQ ID ...; aad_ca: SEQ ID NO:11

Table 1.8. Alignment of the adhE sequences from L. lactis DB1341 and MG1363

The complete sequence of the adhE gene of strain DB1341 is compared to the sequence obtained via PCR amplification of MG1363 adhE fragments (see Fig. 2).
adhemgl363 adhedbl34 1 Consensus adhemgl3 63 adhedbl34l consensus adhemgl3 63 adhedb13 41 consensus adhemgl3 63 adhedbl34 1 consensus adhemgl3 63 adhedbl34l consensus AAGCTrGTrA CAAAACCGTT 51 AACTACACA ATATTGCTTG 101 ACGTAAAAGA AAGTGAATGA 151 ACTAAAAAAG ccGcTccAGc 201 AGCCGCAAAA TTCCAAGAAG 251 AAGCAACAAGC TGCTGTTCTT TTCTAAACTT TTGATGAGTG ACATcTATAA AAAACTTrGT AGTCACAAAG GAGAACCTAC TGCAAAGAAA GTTTTAAGCG CTGTTrGCTTA TACTGACAA1A AAATTTGAAG GATATACACA 100
TAAACTATTC
150
AAATAGCA
200
CTGAAGAAAA
250
TTAGTCAAAA
300
AACTCAAGTC
350
ATTCTCTAGA
adhemgl3 63 adhedbl34l consensus adhemgl363 adhedb134l consensus 301 GATACTATTG TCGCTGCAAT GGcTCTTGCA GCAAGCAAAC WO 98/07867 PCT/DK97/00336 adhemgl363 adhedbl341 consensus adhemgl363 adhedbl341 consensus adhemgl363 adhedbl341 consensus adhemgl363 adhedbl34l consensus adhemgl363 adhedbl341 consensus adhemgl363 adhedbl34l consensus 351 ACTCGCTCAT GAAGCCGTA 401 AAGATACCAA AAACCACTTT 451 AATGACAAAA CTGTTGGTGT 501 TGAAATCGCA AGCCCTCTCG 551 ATCCAACATC AACAGCAATC 601 AATGCTATTG TTTTCGCTT 651 TGCAGCAAAA ATTGTTTACG 701 ACTTATTCA ATGGATTGAA 751 ATTCAAAACC GTGGACTTGC 801 GGTAAACGCC GCACTCAAAT 851 GTAATGGTGC TGTrrATGTT ACGAAACTGG TCGTGGTGTT GCTTCTGAAT CTGTTATAA CAITrCTGAA AACAAGGTTG GTGTACTGC TGGTATCGTT 400
GTCGAAGACA
450 CGCAATrAAA 500
CTGGATCTGT
550
CCAACGACTA
600
AAAAACACGT
650
GTTCAAGCCA
700
GCACCGGAAG
TTTAAATCTT
CCACCCTCAA
ATGCTGCAAT
TATTGACTGC
GCTCAAAAAT
TGAAGCTGGT
adhemgl363 adhedbl341 consensus adhemgl363 adhedb1341 consensus adhemgl363 adhedbl34l consensus adhemgl363 adhedbl341 consensus adhemgl363 adhedbl34l consensus adhemgl363 adhedbl34l consensus adhemgl363 adhedbl341 consensus GTACCAAGCC TTGACATGAC AACAATCCTT GCAACTGGTG CTGGTAACCC TTCACTCGGT 750
TACCGCCTTG
800
GCCCAGGAAT
850
GTTGGAGCTG
900
ACGTGCCGTT
901 GAAGACCTTT TGCT=TCAAA ACGTTTTGAT 951 TGAAAATICA GCTGTTATTG ATGCTTCAGT 950 AATGGGATGA TTTGTGCCAC 1000 TTATGATGAA TTTATTGCTA WO 98/07867 WO 9807867PCT/DK97/00336 1001 1050 adhemgl3 63 adhedb134 1 consensus AAATGCAAGA ACAAGGCGCT TATATGG'rrC CTAAAAAAGA CTACAAAGCT 1051 1100 adhemgl3 63 adhedbl34 1 consensus adhemgl363 adhedbl34l consensus adhemgl363 adhedb1341 consensus adhemgl3 63 adhedbl34 1 consensus ATTGAAAGTT TCGTTTTTGT TGAACGTGCT GGTGAAGG'rT ITGGAGTAAC 1101 1150 6T TCCTG' T GCCGGTCG Fr CTGGTCAATG GA*TG CTGAA CAAGC*TGGTG 1151 1200 TCAAAGTTCC TAAAGATAAA GATGTCCTTC TITI=rGAACT TGATAAGAAA 1201 1250 AATAITGGTG AAGCACTI'TC TTCTGAAAAA CTTrCTCCTT TGCTriCAAT 1251 1300 adhemgl3 63 adhedbl34l consensus adhemgl3 63 adhedb134 1 consensus CTACAAAGCT GAAACACGTG AAGAAGGAAT TGAGATTGTA CGTAGCTTAC adhemgl363 adhedbl34 1 consensus adhemgl3 63 adhedbl34l consensus adhemgl3 63 adhedbl34l consensus 1301
TACCA
TTGCTTATCA
TA.CA
1351
GACGACCCAT
GATGATCCAT
GA.GA.CCAT
1401
CCTCGTTAAC
CCTCGTTAAC
CCTCGTTAkC 1451
ATGCAATGCG
ATGCAATGCG
ATGCAATGCG
1501
TCACTTTCAC
TCACTTTCAC
TCACTTTCAC
1551
GGCTAAACGT
GGCTAAACGT
GGCTA&ACGT
1601
ACTACGAAAA
ACTACGAAAA
ACTACGAAA
AGGAGCTGGT
AGGTGCTGGA
AGG.GCTGG.
CACAACGCTG
CATAATGCTG
CA.AA.GCTG
TTGTCAAAGA ATACGGAA'rr TCGTTAAAGA ATATGGCGAA T.GT.A&AGA ATA.GG....
CAACCTGACT CTATCGGTGG CAACCAGATT CTATTGGTGG CAACC.GA.T CTAT.GGTGG TCCATCATTG ACGCTCGGAA TCCATCACTT ACACTTGGAA TCCATCA.T. AC.CT.GGAA
CAATTCAAAT
CAATTCAAAT
CAATTCAAAT
AAAGTCGAAG
AAAGTTGAAG
AAAGT GILAG
GGTCGGAGAT
GGTCGGAGAT
GGTCGGAGAT
CTGGTTCATG
CTGGTTCATG
CTGGTTCATG
CTATTGAATG
CTATTGAATG
CTATTGAATG
1350
CGGTGCAATG
CGGTGCAATG
CGGTGCAATG
1400 C'FrCTCGTAT
CTTCTCGTAT
CTTCTCGTAT
1450
ATI'TATACTG
ATCTATACTG
AT. TATACTG 1500
GGGGAAAAAT
GGGGAAAAAT
GGGGAILAAAT
1550
TTAAAACAGT
TTAAAACAGT
TTAAAACAGT
adhemgl3 63 adhedb134 1 consensus adhemgl3 63 adhedbl34l consensus
ACAATTTGAG
ACAAITrTGAG
ACAATTTGAG
CGTAATCGCC
CGTAATCGCC
CGTAATCGCC
AAATGCAATT
AAATGCAATT
AAATGCAATT
TACATACGAT
TACATACGAT
TACATACGAT
CTCAATGGGT
CACAIATGGGT
C .CAATGGGT
TCTTACTTAC
TCTTACTTAC
TCTTACTTAC
1600 TCGTTTGCC-A AAAGAAATT TCGTTTGCCA AAAGAAAT TCGTTTGCCA AAAGA&ATTT adhemgl363 adhedbl34l consensus AAGAAI-rGCC
AAGAATTGCC
AAGAATTGCC
1650
ACACGTCCAC
ACACGTCCAC
ACACGTCCAC
WO 98/07867 53 PCT/DK97/00336 adhemgl3 63 adhedbl34.
consensus adhemgl3 63 adhedb134 1 conlsensus adhemgl363 adhedbl34 1 consensus adhemgl363 adhedb1341 consensus adhemgl3 63 adhedbl34l consensus adhemgl3 63 adhedbl341 consensus adhemgl363 adhedbl34 1 consensus 1651 AAAGCTTrCA AAAGCTrrCA
AAAGCTTTCA
1701
TAAAGTITTG
TAAAGTTTrG
TAAAGTTTTG
1751
TTTATGGCTC
TTTATGGCTC
TTTATGGCTC
1801
GCTCGTCAAA
GCTCGTCAAA
GCTCGTCAAA
1851 TGGTTrCTGCT
TGGTTCTGCT
TGGTTCTGCT
1901
ATGCTCGTGG
ATGCTCGTGG
ATGCTCGTGG
1951
TTCCAAGAGT
TTCCAAGAAT
TTCCAAGA. T 2001
ATTCTACCAC
ATrCTACCAT
ATTCTACCA.
2051 GTACTGG ITC
GTACTGGTTC
GTACTGGTTC
2101
CACGTTAAAT
CATGTTAAGT
CA. GTTAA. T 2151 TGTrGACCCT
TGTTGACCCT
TGTTGACCCT
2201 CTGGGA'FrGA
CTGGTATTGA
CTGG .ATTGA 2251
TCTTCTGACT
TCTTCTGACT
TCTTCTGACT
TTGTTGCCGA
TCGTI'GCTGA
T.GTTGC.GA
GAACAAC'ITG
GAACAACTTG
GAACAACTTG
AGTCCAACCT
TGTTCAACCT
.GT.CAACCT
TGAACCA I=
TGAAACAAT
TGAA~. CA. TT
CTCGATGCTG
CTCGATGCCG
CTCGATGC.G
TGAGGCTGAC
TGAAGCTGAC
TGA.GCTGAC
TAGCTCAAAA
TAGCTCAAAA
TAGCTCAAAA
CCACACAAAG
CCACATAAAG
CCACA.AAAG
TGAAGTGACT
TGAAGTGACT
TGAAGTGACT
ATCCACTTGC
ACCCACTTGC
A. CCACTTGC
GAGTTTGTA
GAGTTTGTTA
GAGTTTGTTA
TGCTATGTCA
TGCGATGTCA
TGC .ATGTCA
ATACAAAACC
ATACAAAACC
ATACAAAACC
CCCTGGTATG
CCCTGGTATG
CCCTGGTATG
CTATCCGCCC
CTATCCGCCC
CTATCCGCCC
GACCCAAC7T
GACCCAACTT
GACCCAACTT
TGAACCTGAC
TGAACCTGAC
TGAACCTGAC
GTAALGATTGG
GTAALGATTGG
GTAAGATTGG
CTTTCCGATG
CT'ITCTGATG
CTTTC.GATG
ATTTGT'GAT
ATI'TGTCGAT
ATTTGT .GAT
CACAAATGGT
CACAAATGGT
CACAAATGGT
CCA TrTGCGG
CCATTTGCAG
CCATTTGC.G
TGACTATCAA
TGACTACCAA
TGACTA .CAA
TGACTGTACC
TGACTGTACC
TGACTGTACC
CACGCGCTTG
CACGCGCTTG
CACGCGCTTG
AATTTCAC'rr
AATTTCACTT
AATTTCACTT
GTrAAATTCG
GTTAAATTTG
GTTAAATT. G
AACTCAAGTT
AACTCAAGTT
AACTCAAGTT
TGAGTGAAGC
TGAGCGAAGC
TGAG .GAAGC
ACTGTCATCT
ACTGTCATCT
ACTGTCATCT
TCGTTTGATT
TCGTTTGA'TT
TCGTTTGATT
ACGCAAGTT
ATGCAAGTT
A. GCAAGTTT
ATTCGTAAAC
ATTCGTAAAC
ATTCGTAAAC
TGCTATCCCT
TGCAA7TCCT TGC .AT. CCT
TTATCACTGA
'rrATCACTGA
TTATCACTGA
TTGACACCTC
TTAACACCAC
TT .ACACC .C
AAAACGTACT
AAAACGTACT
AAAACGTACT
AATCTTATGT
AATCTTACGT
AATCTTA. GT
CAAGCCATCA
CAAGCGATCA
CAAGC.ATCA
1700
GTITCGTTGA
GTTCGTTGA
GTTTCGTTGA
1750
GAAACAAGCA
GAAACAA3cA
GAAACAAGCA
1800
AATTGCAATC
AATIGCAATC
AATTGCAATC
1850
GTCTTGGTGG
GTCTTGGTGG
GTCTTGGTGG
1900
TATGAATATG
TATGAATATG
TATGAATATG
1950
GAAAGAGATC
GAAAGAACTT
GAILAGA. T.
2000
GTATTATCAA
GTATTATTAA
GTATTAT .AA 2050
ACTACTTCTG
ACTACTITCTG
ACTACTTCTG
2100
TGATGAAACT
TGATGAAACT
TGATGAAACT
2150
AAGTTGCCAT
AAGTTGCCAT
AAGTTGCCAT
2200
GTTTCTTGGT
GTTTC7TGGT
GTTTCTTGGT
2250
TTCTGTCATG
TICTGTTATG
TTCTGT .ATG 2300
AACTCATCT
AACTT-ATCTT
AACT.ATCTT
adhemgl363 adhedbl34 1 consensus adhemgl3 63 adhedbl34l consensus adhemgl3 63 adhedbl34l consensus adhemgl3 63 adhedbl34 1 consensus adhemgl363 adhedbl34 1 consensus adhemgl363 adhedbl34l consensus WO 98/07867 adhemgl363 adhedbl34l consensus adhemgl363 adhedb134 1 consensus adhemgl363 adhedbl34l consensus adhemgl363 adhedbl34l consensus adhemgl3 63 adhedbl34 1 consensus adhemgl3 63 adhedb134 1 consensus adhemgl3 63 adhedbl34l consensus adhemgl363 adhedbl3 41 consensus adhemgl3 63 adhedbl34l consensus adhemgl3 63 adhedb1341 consensus adhedbl341 consensus adhedbl34 1 consensus adhedbl34l consensus 2301 TGAAAAC ITG ACTGAGTCTT TGAAAACTTG ACTGAGTCTT TGAAAACTTG ACTGAGTCTT 2351 AAGGTCAAAA AGCTCGCGAA AAGGACAAJ4A AGCCCGCGAA AAGG .CAAIAA AGC .CGCGAA PCT/DK97/00336 54 2350 ATCATTATGA CCCAGCTCAT CCAACCAAAG ATCATI'ATGA CCCAGCGCAT CCAACTAAAG ATCATTATGA CCCAGC.CAT CCAAC.AAG 2400 AACATGCACA ATGCTGCAAC ACTCGCTGGT AACATGCACA ATGCTGCAAC ACTCGCTGGT AACATGCACA ATGCTGCAAC ACTCGCTGGT 2401
ATGGCC'ITCG
ATGGCCTTCG
ATGGCCTTCG
2451
AATTGCTGGT
AATTGGTGGT
AATTG .TGGT 2501
TGCCACATGT
TGCCACATGT
TGCCACATGT
2551 CC ITACCCAC
CCTTACCCAC
CCTTACCCAC
2601
TTCACGCTTC
TTCACGCTTC
TTCACGCTTC
2651 TCAAAGC=r
TGCAAGCTCT
T. .AAGCT.T 2701
AATATCACCC
AATATCACCC
AATATCACCC
2751
GCTTGATAAA
ACTTGATAAA
.CTTGATAAA
2801
ATCCTCGTCA
2851
TACTAATAAT
2901 GAG CATT
CCAATGCTI
CTAATGC'FIT
C .AATGCTTT
GAATTGGGC
GAATTTGGAC
GAATTTGG .C
CATTAAATT
CATTAAATT
CATTAAATTT
GTTATGAAAC
G ITATGAAAC
GTTATGAAAC
ATGGGATTTG
ATGGGATTTG
ATGGGATTTG
TGTTGCTGAA
GG'ITGCTGAA
.GTTGCTGAA
TTTCAGGAAA
TTTCAGGAA&
TTTCAGGAAA
TTGGCTGACC
TI'GGCTGACC
TTGGCTGACC
ACCAAGAATT
CTGTTGATAA
TATTATAGCT
CCTTGGAATT
CCTrGGAArr
CCTTGGAATT
TTCCTCATGG
TTCCTCATGG
TTCCTCATGG
AACGCTGTAA
AACGCTGTAA
AACGCTGTAA
TTATCGTGCG
ATATCGTGCT
.TATCGTGC.
CTGGCAAAGA
CTGGTAAAGA
CTGG .A&AGA
CITAAAAAAT
CTTAAGAAAC
CTTAA.AAA.
TGGTGTAGAT
TGGTATCGAT
TGGT. T. GAT TT-GTT (SEQ
TTGTTTATGA
TTGTT
GATGAGATTA
AATTATTAAA
TATACAACTA
AACCACTCAC
AACCACTCAC
AACCACTCAC
TCTTGCCATT
TCTTGCCATT
TCTTGCCATT
CAGGAAACGT
CAGGAAACGT
CAGGAILACGT
CAAGAAGACT
CAAGAGGACT
CAAGA. GACT
AGATTCAGAT
TGATTCAGAT
.GATTCAGAT
TGACTGATAG
TGACTGATAG
TGACTGATAG
AAAGCTCACC
AAAGCTCACC
AAAGCTCACC
ID NO:12)
TGATCAATGT
AACAG IWGTT
ACGCTCTGAT
TCAAAAGGTA
2450
TTGCTCATAA
TTGCTCATAA
TTGCTCATAA
2500
GCTATCGCTA
GCCATCGCTA
GC .ATCGCTA 2550
TAAATACC
TAAACGTACC
TAA.. TACC 2600
ACGCTGAAAT
ACGCTGAAAT
ACGCTGAAAT
2650
GAAAAAGCGG
GAAAAAGCTG
GAAAAAGC. G 2700
TATTGATATT
CATTGATT
ATTGATATT
2750
TTGAACGTGA
TTGAACGTGA
TTGAACGTGA
2800
ACTCCTGCTA
2850
GTTAGATCAA
2900
GAATTCGTCA
2950
TAAATCAATT
WO 98/07867 adhedbl341 consensus adhedbl34l consensus adhedbl341 consensus PCT/DK97/00336 2951 3000 TCGATATAGG CTCTTTTCAC TCCATTGATT TATGCAITTC TATAAAAATC 3001 3050 AATAATTAAT TAGCGATAGA AGTCGAGTTC ATGCATGCTA ATAATGAAAT 3051 3100 TGTTTTAAAT TCTGGTTTTT CTTTATGTTC TTTGCGAACA TCTTTCACAG 3101 3150 TTTCTTTGTT CATGAAAATT CCTCCTTATT ATGGTACTAT T=TGAGCCCA 3151 3193 AATAGTTATA TAAGAATCCT AAACTTCGGA TATCTTATCA AAG (SEQ ID NO:13) adhedbl34l consensus adhedbl34l consensus 8. Obtaining and sequencing the entire achE locus from L.
lactis strain MG1363

Inverse PCR was carried out on digested and religated chromosomal DNA of strain MG1363, using primers adhE-146 and adhE-MG5 (see Fig. ). A PCR fragment was obtained which, in addition to the above fragment of the MG1363 adhE sequence, comprised a sequence of about 2.9 kb upstream of that fragment, including the 5'-end of the adhE coding sequence and an open reading frame, designated orfB, showing high homology with the corresponding open reading frame from strain DB1341.
The entire sequence of the adhE locus of Lactococcus lactis strain MG1363 is shown in Table 1.9 below.
Table 1.9. The adhE locus of strain MG1363 1 TTTGGTGACCGAGTGACACCAGCTrCAAGAAG TTGTTTCArGAAATA 51 AcTGAcATGTrAATGTCTCCTTTTAAAATAGTTTTTCCTCTrrCATcTGT 100 101 CATCCGCAGCCGCAATACTTGCGTACACTACGACTGTcGAGACGAAAT 150 151 GCGAGATGGTTGCATAGCAACTCTCTCATTATACATTGTAAGCTACT 200 201 TTGCAAGCATcTATTCATTATrCITATcAATATGAGTAAATGAAAG 250 251 CTATCCTACCCCCTTTCTTTTTATTCTGTTTFTATATCTCAATGTTGT 300 301 CTGACAAATTTAACGAATATTTTTGCCTATATAATcCCCATAAGGGAGAT 350 351 TTTTACATTTTCTAAGAATAAAATTAATATTGCTGAAAACGcT 400 401 TTTTTGTGATAAAATAATTATAGTAAATAAAATAGTTrGTGAGGAGAGAA 450 451 ATATGAAAGAAAAATCCT=TAGGCGGTTATACTAAACGTGTATCTAA 500 orfE M K E K I L L G G Y T K R V S K 501 GGCGTTTACAGTGTTCTATTAGATAGCAAGAAAGcTGAATTGTcGGCTT 550 GVY S V LL D SK KAE L S AL Sau3AI WO 98/07867 PCT/DK97/00336 56 551 AACTGAAGITGCAGCGGTTCAAAATCCAACTTATATCACTCTTGATCAAA 600 T E V AA V Q N P T Y I T L D Q K 601 AAGGGCACCTCTACAC'FrGTGCTGCTGATGGAAATGGTGGTGGAATTGCT 650 G H L Y T C A AD G N G G G I A 651 GCCTTTrGATTTCGATGGTCAAAATACAACTC-ACCTAGGGAATGTAACGAG 700 A F D F D G Q N T T H L G N V T S 701 TACTGGAGCCCCTTTGTGTTATGTGGCTGTTGATGAAGCACGTCAACTCG 750 T G A PL C Y V AV D EA R Q L V 751 TTTATGGTGCCAACTATCACTTGGGTGAAGTTCGTGTGTACAAAATrCAA 800 Y G ANY H L G EV RV Y K I Q 801 GCTGATGGTTCCCTTAGATTAACCGATACAGTTAAACATAATGGTTCTGG 850 A D G S L R L TD T VK H N G S G 851 CCCTCGACCTGAGCAAGCAAGTTCTCATGTCCATTACTCTGATTTAACTC 900 P R P E Q A S S H V H Y S D L T P 901 CAGATGGTCGTCTTGTTACTTGTGATrAGGTACAGATGAAGTGACTGTrr 950 D G R L V T C D L G T D E V T V 951 TACGATGTTATTGGTGAAGGTAAACTCAATATCGTTACGATTTATCGTGC 1000 Y D V I G E G KL N I V T I Y R A 1001 CGAAAAAGGAATGGGAGCTCGTCACATCAGCTTCCATCCTAATGGAAAAA 1050 E K G M G A R H I S F H P N G K I 1051 TTGCTTATCTCGTCGGAGAATTAAATTCAACTATTGAAGTTCTAAGCTAT 1100 A YL V G E L N S T IE V L S Y 1101 AATGAAGAAAAAGGACGATTCGCTCGTc=TCAAACAATCAGTACI=rACC 1150 N E E K G R F A R L Q T I S T L P 1151 TGAAGACTATCACGGAGCCAATGGAGTAGCTGCTATTCGAATTrCITCTG 1200 E D Y H G A N G V AA I R IS S D 1201 
ATGGTAAGTTCCTCTATGCTTCTAATCGTGGGCACGACTCTTrTAGCAATT 1250 G K FL Y A S N R G H D S L A I 1251 TACAAGGTAAGTCCTCTCGGAACAAAATTAGAATCTATTGGTTGGACAAA 1300 Y K VS P L G T K LE S I G WT K 1301 GACTGAATATCATATTCCACGCGATTTFrAAIT=AATAAAACCGAAGATT 1350 T E Y H I P R D F N F N K T E D Y 1351 ATATCATTGTCGCTCATCAAGAATCTGATAATTAACTCTI-rTCTTGAGA 1400 I I VA H Q E S D N L T L F L R 1401 GATAAAAATACAGGGTCATTAACGTTAGAACAAAAAGACTrTACGCTCC 1450 D K N T G S L T L E Q K D F YA P 1451 TGAAATTACTTGTGTT=TACCTTTGTAAAAJACTAAACTTTAGTAAATCTT 1500 E I T C V L P L Stop (SEQ ID NO:29) 1501 GCTTTGTITTTTCACAAAGTTTTACTAAATCAGACAAAAAAATATTGCC 1550 1551 AAATCTTTAAAAGGATrGGCAATATTTTTrGTCTGAAACCCTrGCTTrAT 1600 1601 AAAGCGA=rCTAATTAGATTTFGTAAATTCATCAC.AAT 1650 1651 ATCGC'ITGACTTCT=AAAAAACTTGTAAACTATCACGTAAAAGAAA 1700 1701 GTGAATGGAATCACAAAGGAGAACGTACACATATGGCAACTAAAAA.AGCC 1750 adhE M A T K K A 1751 GCTCCAGCTGCAAAGAAAG'FrTTAAGCGCTGAAGAAAAAGCCGCAAAATT 1800 A P A A K K V L S A E E K AA K F Sau3AI 1801 CCAAGGAAGTGTCGCTTATACTGATCAATTAGTCAAAAAAGCTCAAGCTG 1850 Q G S VA YT D QL V KKA QA A 1851 CAGTTCTTAAATTGAAGGATACACACAAACTCAAGTTGATACTATTGTT 1900 V LK FER GY T QT Q VD T I V 1901 GCTGC2ATGGCTCTTGCAGCAAGCAAACATTCTCTGGAACTCGCTCACGA 1950 A A M A L A A S K H S L E L AH E 1951 AGCCGTTAATGAAACTGGCCGTGGAGTTGTTAGCAGTCAA 2000 A V NE T G R G VV E D K D T K N 2001 ACCATTTTGCTTCTGATCTGTTTATAATGCATCAATGATAACA 2050 H F A S E S V Y NA I K N D K T 2051 GTTGGCG7TATCGCTGAAAACAAAGTTGCTGGTTCTGTrGA4ATCGCAAG 2100 V G V IA E N K VA G S V E I A S 2101 CCCCCTTGGAGTACTTrGCTGGTATGTCCCACAACTAATCCAACATCAA 2150 P L G VL A G I V PT T N PT S T 2151 CAGCCATCT=TAAATCATTATTAACTGCAAAGACACGTAATGCTA~rGTC 2200 A I F K S L L T A K T R N A I V WO 98/07867 PCT/DK97/00336 57 2201 TTTGCCTTrCACCCACAAGCACAAAAATGCTCAAGCCATGCGGCAAAAAT 2250 F A FH PQ A Q K C S S H A AK I 2251 TGTTTATGATGCTGCGATTGAAGCTGGTGCACCTGAAGACTITATrCAAT 2300 V Y D AA I E A G A P ED F I Q W 2301 GGA'TGAAGTACCCAGTCTTGATATGACGACTGCITTGATTCAAAATAGA 2350 1 EV P S L D M T T A L I Q N R 2351 
GGAATTGCTACAATTCTTGCAACTGGTGGTCCAGGTATGGTCAATGCCGC 2400 G I A T I L A T G G P G MV NA A 2401 GC'TAAGTCTGGTAATCCTTCACTTGGTGTAGGTGCTGGTAATGGTGC-AG 2450 L K S G N P S L GVG A GN GA V Sau3AI Sau3AI 2451 TTTATG ITGATGCAACTGCAAATATCGATCGTGCTGTTGAAGATCTTrTG 2500 Y V DA T AN I D RA VE D L L 2501 UiTCAAAACGTTrGATAACGGAATGATTrGTGCGACTGAAAACTCTGC 2550 L S K R F D N G M I C A T E N S A 2551 AGTTATTGATGCATCAATCTATGATGAATTTGTCGCTAAAATGCCAACGC 2600 V I DA SlIY D E F V A K M P T Q 2601 AAGGCGCTTATATGGTTCCTAAAAAAGATTACAAGGCAATTGAAAGFI 2650 GA Y MV P K K D Y K A I E S F 2651 GTTTTCGTTGAACGTGCTGGTGAAGGTTTTGGTGTAACTGGTCCTGTTGC 2700 V F VE R AG E G F G V T G P V A 2701 TGGTCGTTCTGGTCAATGGATTGCTGAACAAGCTGGTGITAACGTCCCTA 2750 G R S G Q W I A E QA G V NVP K 2751 AAGATAAAGATGTTCTTC~rrTTGAACTGATAAGAAAAATATTGGGGAA 2800 D K D V L L F E L D K K N I G E 2801 GCTCTTTCTrCTGAAAAACTTTCTCCTTGCTTTCAATCTACAAATCAGA 2850 A L S S E K L S P L L S I Y K S E 2851 AACACGTGAAGAAGGAATGAAATGTACGTAGCTTACTTGCTTACCAAG 2900 T R EE G I ElI V R S L L AY Q G 2901 GAGOTGGTCACAACGCTGCCA'ITCAAATCGGTGCAATGGACGACCCATTr 2950 A G H N AA I Q I G AM D D P F 2951 GTCAAAGAATACGGAA ITAAAGTCGAI4GCTTCTCGTATCCTCGTTAACCA 3000 V K E Y G I KV E A S RI L VN Q 3001 ACCTGACTCTATCGGTGGGGTCGGAGATATTTATACTGATGCAATGCGTC 3050 P D S I G G V G D I Y T D A M R P.
3051 CATCATTGACGCTCGGAACTGGTTCATGGGGGAAAAATTCACTTTCACAC 3100 S L T L G T G S W G K N S L S H Sau3AI 3101 AATTTGAGTACATCGATCTATTGAATGTTAAAACAGTGGCTAAACGTCG 3150 N L S TY D L L N V K T VA K R R 3151 TAATCGCCCTCAATGGGTTCGTTTGCCAAAAGAAATTACTACGAAAAAA 3200 N R P Q W V R L P K ElIY Y E K N 3201 ATGCAATTTCTTACITACAAGAATTGCCACACGTCCACAAAGCT17rCMAT 3250 AlI S Y L Q E L P H V H K A F I 3251 GTTGCCGACCCTGGTATGGTTAAATTCGGTrTTCGTTGATAAAGTTTTGGA 3300 VA D P G MV K F G F V D K V L E 3301 ACAACTTGCTATCCGCCCAACTCAAGTTGAAACAzAGCAT=rATGGCTCAG 3350 Q L AI R P T QV E T S I Y GS V 3351 TCCAACCTGACCCAACrI-rGAGTGAAGCAATTGCAATCGCTCGTCAAATG 3400 Q P D P T L SEHA I A I A R Q M 3401 AACCATTTTGAACCTGACACTGTCATCTGTCTTGGTGGTGGTTCTGCTCT 3450 N H F E P D T V I C L G G G SA L 3451 CGATGCTGGTAAGATTGGTCG TrGATTrATGAATATGATGCTCGTGGTG 3500 D A G K I G R L I Y E Y D A R G E Sau3AI 3501 AGGCTGACCTrCCGATGACGCAAGTTTGAAAGAGATCTTCCAAGAGTTA 3550 A D L S D D A S L K ElI F Q E L 3551 GCTCAAAAATTTGTTGATATrCGTAAACGTATTATCAAATTCTACCACCC 3600 A Q K F V D I RK R I IK F Y H P 3601 ACACAAAGCACAAATGGTTGCTATCCCTACTACTrCTGGTACTGGTTCTG 3650 H K A Q MV A I P T T S G T G S E 3651 AAGTGACTCCATTTGCGGTTATCACTGATGATGAAACTCACGTTAAATAT 3700 V T P F A V I T DD E T H V K Y WO 98/07867 PCT/DKO'IICLnttr; WO 8/0867PCTD7/00336 58 3701 CCACTrGCTGAcTATCAATTGACACCTCAAGTTGCcATrGTTGAcccTGA 3750 P LA DY Q L T P Q VA IV D P E 3751 GTTTGTTATGACTGTACCAAAACGTACTGTTTCTTGGTCTGGGATTGATG 3800 F VM TV P KR TV SW S G IDA 3801 CTATGTCACACGCGCTTGAATCITATGTTrCTGTCATGTCTTCTGACTAT 3850 MS HAL ES Y VS VMS SD Y 3851 ACAAAACCAATTTCACTTCAAGCCATCAAACTCATCTrTGAAJACTTGAC 3900 T K PIS L Q A IK LIFE N L T 3901 TGAGTCTrATCATTATGACCCAGCTCATCCAACCAAAGAAGGTCAAAAAG 3950 E S Y H Y D P A H P T K E G Q K A 3951 CTCGCGAAAACATGCACAATGCTGCAACACTCGCTGGTATGGCCTTCGCC 4000 R E N M H N A A T L A G M A F A 4001 AATGCTTTCCTTGGAATTAACCACTCACTTGCTCATAAAATTGCTGGTGA 4050 NAF L GIN H S LAH K IAG E 4051 ATTTGGGCTTCCTCATGGTCTTGCCATTGCTATCGCTATGCCACATGTCA 4100 F G L PH G LA IA I AM PH VI 
4101 TTAAATTTAACGCTGTAACAGGAAACGTTAAATTTACCCCTTACCCACGT 4150 K F NAVT G NV K FTP Y PR 4151 TATGAAACTTATCGTGCGCAAGAAGAACGCTGAAAT T 4200 YET Y RA Q ED YAE IS R FM 4201 GGGATTTGCTGGCAAGAAGATTCAGATGAAAAGCGGTCAGCT'TGG 4250 G FAG K ED SD E KAVKA LV 4251 TrGCTGAACTAAAAATTGTGATAGTATTGATATTAACACCC 4300 A ELK K LT D SI DIN IT L 4301 TCAGGAAATGGTGTAGATAAAGCTCATCTTGAACGTGAGTGATAAATT 4350 S G N G VD KA H L ER E LD K L 4351 GGCTGACCTTGTACGATGACCAATGTACCTGCTAATCCACGTCAAC 4400 AD LVYD D Q CT PAN PR Q P 4401 CAAGAATTGATGAGAT-TAAACAACTCTrTGrAGACCAATATTAATATATT 4450 R I D E I K Q L L L D Q Y Stop (SEQ ID NO:31 4451 AATTATAGTATTTGGAACCGAACGATATCCATGCTCGCTAACCTGCTAAA 4500 4501 GCAGGAAGTCGCAATGGTACGTCAACCAAGAATTGATGAGATTAAACAAC 4550 Sau3AI 4551 TCTTGTTAGATCAATACTAATAATCTGTTGATAAAAATAATTAAAACGCT 4600 4601 CTGATGAATTCGTCAGAGCrTri-irATTATAGCTATACCTATCAAA 4650 4651 AGGTATAAATCAATTTCGATATAGGCTCITrrCACTCCATTGATTTATAT 4700 Sau3AI 4701 TTATATAAAAATCAATAAITAATTAGCGATAGAAGTGATCC 4741 (SEQ ID NOS:28/30)

EXAMPLE 2

1. Construction of L. lactis DB1341 and MG1363 adhE mutant strains by gene inactivation

Inactivation of the adhE gene of strain DB1341 was carried out by Campbell-like integration (Leenhouts et al., 1991) of pSMA500 derivatives into the DB1341 chromosome. The adhE gene of strain DB1341 was inactivated at two different positions by cloning of PCR fragments (see Fig. 2) into the integration vector pSMA500 (Madsen et al., 1996). A 706 bp internal adhE fragment was amplified from the DB1341 chromosome using primer adhP1 (position 1069-1088 in Table 1.4) and primer adhP2 (position 1775-1756 in Table ). These primers contain an XhoI and a BamHI recognition site at the 5' end. The PCR fragment was digested with XhoI and BamHI followed by cloning into pSMA500.
The resulting plasmid, pSMAKAS4 (Fig. ), was introduced into E. coli MC1000 by electroporation (Sambrook et al., 1989).
Plasmid pSMAKAS4 was purified and subsequently introduced into strain DB1341 by electroporation (Holo and Nes, 1989), and transformants were selected on SGM17 plates containing 1 µg/ml erythromycin and 80 µg/ml X-gal (Madsen et al., 1996).
Homologous integration leads to an adhE gene which is interrupted after amino acid residue Asp 543. About 100 blue transformants were obtained, indicating that a transcriptional fusion of the adhE gene to the lacLM reporter gene of pSMA500 had occurred. Eight blue transformants were restreaked and the integration point was verified by PCR analysis. One strain, DBKAS4, was selected for further studies.
Another integration further downstream in the adhE gene was constructed by a similar strategy. A 616 bp adhE fragment was amplified from the DB1341 chromosome using primer orf3P1 (position 2112-2138 in Table 1.4) and primer orf3P2 (position 2728-2708 in Table ). The cloning of this fragment into pSMA500 resulted in plasmid pSMAKAS5 (Fig. ). Introduction of this plasmid into DB1341 and subsequent integration into the adhE gene leads to an adhE gene which is interrupted after amino acid residue Ile 861. About 400 blue transformants were obtained, which again indicated that a transcriptional fusion of the adhE gene to the lacLM reporter gene of pSMA500 had occurred. Eight blue transformants were restreaked and the integration point was verified by PCR analysis. One strain, DBKAS5, was selected for further studies.
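The fragment sizes quoted for the two integration constructs can be cross-checked against the primer coordinates given in the text. The following is only an illustrative sketch: the function name `amplicon_span` is hypothetical, and it counts the template positions between the outermost annealing coordinates inclusively, ignoring the restriction-site tails added at the 5' ends of the primers.

```python
def amplicon_span(forward, reverse):
    """Inclusive number of template positions spanned by a primer pair.

    forward: (start, end) of the top-strand primer, printed ascending.
    reverse: (start, end) of the bottom-strand primer, printed descending,
             as in the text (e.g. 1775-1756).
    """
    f_start, f_end = forward
    r_start, r_end = reverse
    if not (f_start < f_end and r_start > r_end):
        raise ValueError("expected ascending forward and descending reverse coordinates")
    return r_start - f_start + 1


# Coordinates as stated for the DB1341 adhE sequence.
pairs = {
    "adhP1/adhP2": ((1069, 1088), (1775, 1756)),
    "orf3P1/orf3P2": ((2112, 2138), (2728, 2708)),
}

spans = {name: amplicon_span(fwd, rev) for name, (fwd, rev) in pairs.items()}
for name, bp in spans.items():
    print(f"{name}: {bp} template positions spanned")
```

Counted inclusively, the spans come out one position longer than the 706 bp and 616 bp stated in the text, which match the plain difference between the outer coordinates. Which convention was used for the stated sizes is not specified in the source.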
pSMAKAS4 and pSMAKAS5 were also used to inactivate the MG1363 adhE gene. One transformant from each transformation that turned blue on X-gal plates (MGKAS4 and MGKAS5), and therefore contained a translational fusion of the lacLM reporter gene of pSMA500 to the MG1363 adhE gene, was isolated for further studies.
Samples of Lactococcus lactis subspecies lactis biovar diacetylactis strains DBKAS4 and DBKAS5 and of Lactococcus lactis subspecies lactis strains MGKAS4 and MGKAS5 were deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany, on 18 July 1996 under the accession Nos DSM 11084, DSM 11085, DSM 11081 and DSM 11082, respectively.
A further adhE mutant strain was constructed from a PCR fragment amplified with MG1363 DNA as template and primers adhP1-XhoI (sequence GGTTGAACGTGCTGGTGAAGG-3'; spanning position 2657-2676 in the MG1363 adhE sequence) (SEQ ID NO:32) and adhP2-BamHI (sequence 5'-TAGTAGGATCCGGGTCAGGTTGGACTGAGCC-3'; spanning position 3363-3344 in the MG1363 adhE sequence) (SEQ ID NO:33). The resulting 700 bp fragment was digested with XhoI and BamHI, cloned into likewise digested pSMA500 and transformed into E. coli MC1000. The new construct, pSMAKAS14, was introduced into L. lactis MG1363 by electroporation. Integration led to disruption of the resident adhE gene, and one transformant that turned blue on X-gal plates (integration results in a transcriptional fusion to the reporter gene lacLM) was selected for further analysis and named MGKAS14. This integrant should express an AdhE protein truncated at position Asp 53.

A sample of MGKAS14 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany, on 10 July 1997 under the accession No. DSM 11654.
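As a small illustration of how a tailed primer such as adhP2-BamHI is organised, the sketch below locates the BamHI recognition site (GGATCC) in the primer sequence given above and reverse-complements the 20 nt annealing part, which the text places at positions 3363-3344 of the MG1363 adhE sequence. The helper name `reverse_complement` and the variable names are illustrative only; the primer sequence and the coordinates are taken from the text.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


BAMHI_SITE = "GGATCC"
adhP2_bamhi = "TAGTAGGATCCGGGTCAGGTTGGACTGAGCC"  # 5' -> 3', as given in the text

# 0-based offset of the BamHI site within the 5' tail of the primer.
site_offset = adhP2_bamhi.find(BAMHI_SITE)

# The text places the annealing part at positions 3363-3344, i.e. 20 nt.
annealing_length = 3363 - 3344 + 1
annealing = adhP2_bamhi[-annealing_length:]

# Top-strand sequence expected at positions 3344-3363 of the template.
top_strand_target = reverse_complement(annealing)

print("BamHI site offset:", site_offset)
print("annealing part:   ", annealing)
print("top-strand target:", top_strand_target)
```

The printed top-strand target can then be compared by eye with positions 3344-3363 of the MG1363 sequence in Table 1.9.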
2. Physiological characterization of MGKAS14

Physiological studies of MGKAS14 were carried out by cultivating the strain under anaerobic conditions in M17 medium supplemented with either glucose (GM17) or galactose (GalM17). The production under these conditions of the metabolites formate, acetaldehyde and pyruvate was measured and compared to the corresponding measurements for the wild-type strain cultivated under similar conditions. In GM17, the production of formate was reduced in the mutant strain (4.86 in MG1363 vs. 1.67 in MGKAS14) and the production of acetaldehyde was increased (0.52 in MG1363 vs. 0.67 in MGKAS14). No pyruvate was detected with either strain. In GalM17 medium, the production of formate was reduced substantially in the mutant strain (39.11 in MG1363 vs. 4.39 in MGKAS14) and that of acetaldehyde was increased (0.67 in MG1363 vs. 1.12 in MGKAS14). Neither strain produced pyruvate.
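To make the size of these shifts easier to compare across media, the reported values can be expressed as relative changes of the mutant against the wild type. This is a minimal sketch using only the numbers quoted above; the units are not stated in the text, so only ratios are computed, and the function and variable names are illustrative.

```python
# (medium, metabolite) -> (wild type MG1363, mutant MGKAS14), values as quoted in the text.
measurements = {
    ("GM17", "formate"): (4.86, 1.67),
    ("GM17", "acetaldehyde"): (0.52, 0.67),
    ("GalM17", "formate"): (39.11, 4.39),
    ("GalM17", "acetaldehyde"): (0.67, 1.12),
}


def percent_change(wild_type, mutant):
    """Relative change of the mutant value against the wild-type value, in percent."""
    return (mutant - wild_type) / wild_type * 100.0


changes = {
    key: percent_change(wild_type, mutant)
    for key, (wild_type, mutant) in measurements.items()
}

for (medium, metabolite), change in changes.items():
    print(f"{medium:7s} {metabolite:13s} {change:+.1f}%")
```

On these figures the formate reduction is larger in galactose-grown cultures than in glucose-grown ones, which matches the text's description of a substantial reduction in GalM17.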
EXAMPLE 3

Cloning of the L. lactis pfl gene

The sequence of the pfl gene encoding pyruvate formate-lyase, a key enzyme in anaerobic metabolism, has only been reported in a few bacteria. DNA sequence homology between the different bacterial pfl genes is limited, making it difficult to clone this gene from other organisms (Table ). Recently, this gene has been cloned from Streptococcus mutans (Yamamoto et al., 1996). The S. mutans pfl gene encodes a 775 amino acid protein as deduced from the published DNA sequence.
Table 3.1. Homology (DNA and protein level) of the L. lactis pfl with other bacterial pfl genes

                                Homology to the L. lactis Pfl protein
Organism          Pfl protein   Identity    Similarity    DNA homology(a)
E. coli           759 aa        42.2%       73%           55.1%
H. influenzae     769 aa        42.1%       76%           55.4%
C. pasteurianum   740 aa        NA          NA            52.6%
S. mutans         775 aa        NA          NA            71.8%

(a) DNA homology through the L. lactis pfl sequence obtained.
NA: not submitted to the databases; NF: not found in database searches.
1. Construction of Lactococcus lactis λZAP genomic libraries

λZAP genomic libraries of L. lactis strains DB1341 and MG1363 were constructed according to the manufacturer's instructions (Stratagene) using partially Sau3AI-digested chromosomal DNA (average size about 5 kb) cloned into λ vector BamHI arms.
Average insert size was estimated to be 3 kb.
2. Screening of a λZAP genomic library of strain DB1341 with a S. mutans pfl probe

A 1 kb EcoRI fragment from the S. mutans pfl gene, encompassing positions 1190-2213 of the published S. mutans sequence (codons 298-639 of the pfl gene), was randomly labelled and used for screening the λZAP genomic library of strain DB1341 (approximately 2 x 10^5 pfu; Sambrook et al., 1989). Filters were washed at low stringency (2 x 30 min at room temperature in 5 x SSC, then 1 x 30 min at 65°C in 3 x SSC; 0.1% SDS), and two positive clones, pfl1 and pfl2, were identified.
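A rough sense of how thoroughly such a screen samples the genome can be obtained from the standard library-representation formula N = ln(1 - P) / ln(1 - f), where f is the ratio of insert size to genome size and P is the desired probability that a given sequence is present. The sketch below applies it to the figures quoted above (3 kb average insert, about 2 x 10^5 pfu screened). The genome size of 2.5 Mb is an assumption made for illustration and is not taken from the text, and the calculation treats every plaque as an independent clone.

```python
import math


def clones_needed(insert_bp, genome_bp, probability):
    """Independent clones needed so a given sequence is present with the given probability."""
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


def fold_coverage(n_clones, insert_bp, genome_bp):
    """Total cloned DNA expressed in genome equivalents."""
    return n_clones * insert_bp / genome_bp


GENOME_BP = 2_500_000  # ASSUMPTION: approximate L. lactis genome size, not stated in the text
INSERT_BP = 3_000      # average insert size estimated in the text
SCREENED = 200_000     # approximately 2 x 10^5 pfu screened

n99 = clones_needed(INSERT_BP, GENOME_BP, 0.99)
coverage = fold_coverage(SCREENED, INSERT_BP, GENOME_BP)

print("clones needed for P = 0.99:", n99)
print("genome equivalents screened:", coverage)
```

Under these assumptions the number of plaques screened is well above the number needed for 99% representation, so recovering only two positive clones would more plausibly reflect the low homology of the heterologous probe than limited library coverage. That interpretation is a suggestion, not a statement from the source.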
A sample of an E. coli strain transformed with clone pfl1 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany, on 25 July 1996 under the accession No. DSM 11103.
3. Sequencing positive λZAP clones and identification of clones containing a pfl fragment

Following in vivo excision (Stratagene) and plasmid DNA isolation, sequence analysis (ALF sequenator, Pharmacia) was carried out for pfl1 using T7 and T3 primers (Stratagene). Approximately 2.1 kb was sequenced from one end of clone pfl1 (from position 1342 in Table 3.2 below), and a truncated, uninterrupted ORF spanning 1.1 kb was found that showed significant homology to other pfl genes, both at the DNA and protein level (Tables 3.3 and ). A putative rho-independent transcription terminator (de Vos and Simons, 1994) is located 26 bp downstream of the stop codon (positions 2468-2490 in Table 3.2).
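The coordinates of the complete coding sequence given in the caption of Table 3.2 (start at position 80, end at position 2443) can be sanity-checked with a few lines of arithmetic. The sketch below assumes that the end coordinate includes the stop codon, which the text does not state explicitly. It also translates the first seven codons of the sequence as printed in Table 3.2, using a deliberately partial codon table that covers only those codons. All function names are illustrative.

```python
# Partial codon table: only the codons needed for the first seven residues.
CODON_TABLE = {
    "ATG": "Met", "AAA": "Lys", "ACC": "Thr",
    "GAA": "Glu", "GTT": "Val", "ACG": "Thr",
}


def orf_summary(start, end):
    """Return (length in nt, codons, residues) for 1-based inclusive coordinates.

    ASSUMPTION: the end coordinate includes the stop codon.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = length // 3
    return length, codons, codons - 1


def translate(dna):
    """Translate complete codons of dna using the partial table above."""
    usable = len(dna) - len(dna) % 3
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]


length, codons, residues = orf_summary(80, 2443)
first_residues = translate("ATGAAAACCGAAGTTACGGAA")

print(length, "nt,", codons, "codons,", residues, "residues")
print("".join(first_residues))
```

The length is a whole number of codons, and the translated start matches the MetLysThrGluValThrGlu printed under the first coding row of Table 3.2.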
Table 3.2. Sequence of the L. lactis DB1341 pfl gene

The coding sequence starts at position 80 and ends at position 2443. A putative ribosome binding site is shown in bold, double underline (positions 65-71). A putative rho-independent transcriptional terminator (de Vos and Simons, 1994) is found at positions 2468-2490 and is shown in bold, underline (stem) or dotted underline (loop).
EcoRI
GAATTCTGTTTGCTATTCTCAAACTGTATGATATAATGAAGTTGTAATTT
1
GAAACAGAAAGAACAAAGGGATTTCAAAATGAAAACCGAAGTTACGGAA
51 100 MetLysThrGluValThrGlu WO 98/07867 PCT/DK97/00336 64
AATATCITTTGAACAAGCTTGGGATGGTI'-TAAAGGAACCAACTGGCGCGA
101 AsnI lePheGluGlnAlaTrpAspGlyPheLysGlyThrAslTrpArgAsp
TAAAGCAAGCGTTACTCGCTTTGTACAAGAAAACTACAAACCATATGATG
151 200 LysAlaSerValThrArgPheValGlnGluAsnTyrLysProTyrAspGly
GTGATGAAAGCTTTCTTGCTGGGCCAAAGAACGTACACTTAAAGTAAAG
201 250 AspGluSerPheLeuAlaGlyPromhrGluArgThrLeuLysValLys AAAATTATTGAAGATACAAAAAATCACTACGAAGAAGTAGGAI-FrCCCTT 251 LysIlel leGluAspThrLysAsnHisTyrGluGluValGlyPheProPhe CGATACTGACCGCGTAACCTCTA ITGATAAAATCCCTGCTGGATATATCG 301 350 AspThrAspArgValThrSerlleAspLysl leProAlaGlyTyrI leAsp
ATGCTAATGATAAAGAACTTGAACTCATCTATGGGATGCAAAATAGCGAA
351 AlaAsnAspLysGluLeuGluLeulleTyrGlyMetGlnAsnSerGlu CT=TCCGC'rrGAATTTCATGCCAAGAGGTGGACTTCGTG'rrGCTGAAAA 401 450 LeuPheArgLeuAsnPheMetProArgGlyGlyLeuArgValAlaGluLys
GATTTTGACAGAACACGGTCTCTCAGTTGACCCAGGCTTGCATGATGTTT
451 500 I leLeuThrGluHi sGlyLeuSerValAspProGlyLeuli AspValLeu TGTCACAAACAATGACTTCTGTAAATGATGGAATCT'rrCGTGCTTATACT 501 550 SerGlnThrMetThrSerValAsnAspGlyl lePheArgAlaTyrThr
TCAGCJATTCGTAAAGCACGTCATGCTCATACTGTAACAGG=~GCCAGA
551 600 SerAlal leArgLys.AlaArgHisAlaHi sThrValThrGlyLeuProAsp
TGC=TACTCTCGTGGACGTATCA'ITGGTGTCTATGCACGTCITGCCCTT=
601 650 AlaTyrSerAxgGlyArgIleI leGlyvalTyrAlaArgLeuAlaLeuTyr
ACGGTGCTGATTACCTTATGAAGGAAAAAGCAAAAGAATGGGATGCAATC
651 700 GlyklaAspTyrLeuMetLysGluLysAlaLysGluTrpAspAlalle
ACTGAAATTAACGAAGAAAACATTCGTCTTAAAGAAGAAATTAATATGCA
701 ThrGluIleAsnGluGluAsnI leArgLeuLysGluGlulleAsnimetGln
ATACCAAGCTGCAAGAAGTTGTAAACTIGGTGCCTTATATGGTCTTG
751 800 TyrGlnAlaLeuGlnGluValValAsnPheGlyAlaLeuTyrGlyLeuAsp
ATGTTTCACGTCCAGCTATGAACGTAAAAGAAGCAATCCAATGGGTTAAC
801 850 ValSerArgProAlaMetAsnValLysGluzaleGlnTrpValAsn
ATCGCTTATATGGCAGTATGTCGTGTCATTAATGGAGCTGCAACTTC!ACT
851 900 IleAlaTyrMetAlaValCysArgValI leAsnGlyAlaAlaThrSerLeu WO 98/07867 PCT/DK97/00336 TGGACGTGTTCCAATCGTTCTTGATATC I=rGCAGAACGTGACCTTGCTC 901 950 GlyArgValProlleValLeuAspllePheAlaGluA-rgAspLeuAlaArg GTGGAACATTThCTGAACAAGAAATTCAAGAATTGTTrGATGATTCG'rr 951 1000 GlyThrPheThrGluGlnGluIleGlnGluPheValAspAspPheVal TTGAAGCTTCGTACAATGAAA I=TGCTCGTGCAGCTGCTTATGATGAACT 1001 1050 LeuLysLeuArgThrMetLysPheAlaArgAlaAlaAlaryrAspGluLeu
TTATTCTGGTGACCCAACATTCATCACAACATCTATGGCTGGTATGGGTA
1051 1100 TyrSerGlyAspPromhrPhelleThrThrSerMetAlaGlyMetGlyAsn
ATGACGGACGTCACCGTGTCACTAAAATGGACTACCGTTTTTGA
1101 1150 AspGlyArgHi sArgValThrLysMetAspTyrArgPheLeuAsnThr CTTGATACAATCGGAAATGCTCCAGAAC CTGACAGTCcGG 1151 1200 Leu.AspThrIleGlyAsnAlaProGluProAsnLeuThrValLeuTrpAsp
TTCTAAACTTCC'ITACTCATTCAAACGTTATTC.AATGTCTATGAGCCACA
1201 1250 SerLysLeuProTyrSerPheLysArgTyrSerMetSerMetSerHisLys
AGCATTCTTCTATTCAATATGAAGGTGTTGAACAATGGCTAAAGATGGA
1251 1300 HisSerSerIleGlnTyrGluGlyValGluThrMetAlaLysAspGly
Sau3AI
1301 TyrGlyGluMetSerCysIleSerCysCysValSerProLeuAspProGlu
AAATGAAGAAGGACGTCATAACCTCCAATACTTTGGTGCGCGTGTAAACG
1351 1400 AsnGluGluGlyArgHisAsnLeuGlnTyrPheGlyAlaArgValAsnVal TCTTGAAAGCAATGTGACTGGTTGAACGGTG E1ATGATGACGTTAT 1401 1450 LeuLysAlaMetLeuThrGlyLeuAsnGlyGly'ryrAspAspValHi s
AAGATATAAGTATCGACATCGAACCTGTTCGTGACGAAATTCTTGA
1451 LysAspTyrLysValPheAspl leGluProValArgAspGluI leLeuAsp
CTATGATACAGTTATGGAAAAC=TGACAAATCTCTCGACTGGTTACTG
1501 1550 TyrAspmhrValMetGluAsnPheAspLysSerLeuAspTrpLeuThrAsp ATACTTATGTTGATGCATGAATATCATTCATFrACATGACTGATAJA.TAT 1551 1600 ThrTyrValAspAlaMetAsnl leIleHisTyr~etThrAspLysTyr WO 98/07867 PCT/DK97/00336 66
AACTATGAAGCAGTTCAAATGGCCTTC'GCCTACTAAAGTTCGTGCTAA
1601 1650 AsnTyrGluAlaValGlniMetAlaPheLeuProThrLysValArgAlaAsn
CATGGGATTTGGTATCTGTGGATTCGCAAATACAGTTGATTCAC=~CAG
1651 1700 MetGlyPheGlylleCysGlyPheAlaAsnThrValAspSerLeuSerAla
CAATTAAATATGCTAAAGTTAAAACATTGCGTGATGAAAATGGCTATATC
1701 1750 IleLysTyrAlaLysValLysThrLeuArgAspGluAsnGlyTyrIle
Sau3AI
TACGATTACGAAGTAGAAGGTGATTTCCCTCGTTATGGTGAAGATGATGA
1751 1800 TyrAspTyrGluValGluGlyAspPhe ProArgTyrGlyGluAspAspAsp TCGTGCTGATGATA ITGCTAAACTTGTCATGAAAATGTACCATGAAAAAT 1801 1850 ArgAlaAspAsplleAlaLysLeuValMetLysMetTyrHisGluLysLeu TAGCTTCACACAAACTrITACAAAAATGCTGAAGCTACTG CAC'FrITG 1851 1900 AlaSerHisLysLeuTyrLysAsnAlaGluAlaThrValSerLeu~eu ACAATTACATCTAACGTTGCTTACTCTAAACAA6ACTGGTAA!ITCTCCAGT 1901 1950 ThrlleThrSerAsnValAlaTyrSerLysclnThrGlyAsnSerProVaI
ACATAAAGGAGTATTCCTCAATGAAGATGGTACAGTAAATAAATCTAAAC
1951 2000 HisLysGlyValPheLeuAsnGluAspGlyThrValAsnLysSerLysLeu
EcoRI
TTGAATTCTTCTCACCAGGTGCTAACCCATCTAATAAAGCTAAGGGTGGT
2001 2050 GluPhePheSerProGlyAlaAsnProSerAsnLysAlaLysGlyGly
EcoRI
TGG ITGCAAACCTTCGCTCATTGGCTAAGTTGGAATTCAAAGATGCAAA 2051 2100 TrpLeuGlrnAsnLeuArgSerLeuAlaLysLeuGluPheLysAspA1Asn
TGATGGTATTTCATTGACTACTCAAGTTTCACCTCGTGCACTTGGTAAA
2101 2150 AspGlyI leSerLeuThrThrGlnValSerProArgAlaLeuGlyLysThr
CTCGTGATGAACAAGTGGATAACTTGGTTCAAATTCTTGATGGATACTTC
2151 2200 ArgAspGluGlnvalAspAsnLeuValGlnI leLeuAspGlyTyrPhe WO 98/07867 PCT/DK97/00336 67 ACACCAGGTGCTTTGATTAATGGTACTGAATTTGCAGGTCAACACGTrAA 2201---- 2250 ThrProGlyAlaLeuIleAsnGlyThrGluPheAlaGlyGlnHisValAsn CTTGAACGTAATGGACCTTAAAGATGT ACGATAAAATCATGCGTGGTG 2300 LeuAsnValMetAspLeuLysAspValTyrAspLyslleMetArgGlyGlu
AAGATGTTATCGTTCGTATCTCTGGTTACTGTGTCAATACTAAATACCTC
2301---- 2350 AspValIleValArgIleSerGlyTyrCysValAsnThrLysTyrLeu
ACACCAGAACAAAAACAAGAATTAACTGAACGTGTCTTCCATGAAGTTCT
2351---- 2400 ThrProGluGlnLysGlnGluLeuThrGluArgValPheHisGluValLeu
TTCAAACGATGATGAAGAAGTAATGCATACTTCAAACATCTAATTCTTAA
2450 SerAsnAspAspGluGluValMetHisThrSerAsnIleEnd (SEQ ID NO:16)
AATITAATGAATATTCGGTCTGTCAGTrTTACTGACAGACTrrAC
2451 2500
GAAAA4TTAATCATAATAGTTAAAAACTATTGT AGTTTAAGG
2501 2550
TTAAATTTTATGCTAAAATAGATGAATAAAATGGTAATGGATTGACAG
2551---- 2600
GCGGAATTGCGATGGGAAATCAACGGTGGTTGAI'TTGATTCTGAGGG
2601---- 2650
TTATCAAGTGATTGATGCTGACAAAGTTGTCCGTCAATTACAAGAACCT
2651---- 2700
GGCGGAAAACTTACAAGGCAATATTAGAAACTTACGGTTTAGATrTAT
2701---- 2750
TGCTGACAATGGACAGTTAAATCGTGAAAAATTAGGAGCTTTAGTrr
2800
TCTGATTCAAAAGAGCGCGAGAAATTATCAAACTTACAAG
2801 2850
TCGTACAGAATTATATGATAGACGTGATGACTTATTAAAAAAAATGACTG
2900
ACAAGTCTGTCAGTAAAAATTTTGATTCAAAGAGTCAAGGAAAAAATCTG
2901---- 2950
TCAGTAAATAAGCCAATATTTATGGATATTCCGTTATTAATTGAATACAA
2951 3000
TTATACCGGATTGATGAAATATGGTTGGTCAGCTTACCTGAJAJATAC
3001---- 3050
AATAGAAAGACTGATGGCAAGAAATAzGTTTACGGAAGAAGAGCT1A
3051---- 3100
AAACGAATTTCTTCACAAATGCCATTGTCAGAAAAACAAAAAGTCGCTGA
3150
TGTCATTCTGGATAATTCTGGAAAGATTGAAGCACTAAA1JATCC
3151---- 3200
AGCGAGAACTAGCTAGGAAAGAAGATGGTGAATCGCACGA
3201 3250
AAACAGTTAATTGGAAAGGAATTTATTTATAACATGGA TGGCTGCT'rrrI
3251 3300
TTGTAGGTTCATCAT=TTCACTCGTCATGCCT'ITCTCCCCTTGTATATTC
3301 3350
AAGGACTGGGTGAAGCGGTGGGAATTTGACTTTACTCAGGGTACTTT
3351 3400
TCTITTGCCAGCCTTA (SEQ ID
Table 3.3. DNA homology (FASTA, GCG Wisconsin Package Version 8, Genetics Computer Group) using the complete L. lactis DB1341 pfl sequence shown in Table 3.2. Only the two highest scores (S. mutans and H. influenzae pfl genes, designated smpfl and hi3281, respectively) are shown.
(Nucleotide) FASTA of: dbpfl.seq from: 1 to: 3415 July 19, 1996 10:11
The best scores are: init1 initn opt
empro:smpfl D50491 Streptococcus mutans pfl. 4335 5345 4996
empro:hi32812 /rev U32812 Hae. influenzae focA 652 1077 1299
empro:ecpfl X08035 E. coli pfl (pyruvate form. 429 735 1214
empro:cppflact X93463 C.pasteurianum pfl and act 309 487 744
emnew:cefl3bl2 /rev Z70683 Caenorhabditis eleg. 94 168 128
dbpfl.seq
empro:smpfl
ID SMPFL standard; DNA; PRO; 3067 BP.
AC D50491; NI g1129081 DT 23-DEC-1995 (Rel. 46, Created) DE Streptococcus mutans pil gene f or pyruvate formate-lyase SCORES Initi: 4335 Initn: 5345 Opt: 4996 71.8%6 identity in 2608 bp overlap 20 30 40 dbpf 1. GAATTCTGTTTGCTATTCTCAAACTGTATGATATAATGAJAGTTGTATTTGA IlIIIlII III I I till I smpf 1 AAGCAGTTCTTTCGCTTGTGL-TACCGGJVXACTGTATGATAGATATATCGTAAArGT 200 210 220 230 240 250 WO 98/07867 PCT/DK97/00336 69 70 80 dbpf 1. AACAGA AAGAACAAAGGAGA=rCAA-AATGA-AAAC CGAAGTrACG 111111 111II1I111II liii III III I Ii Ii smpf 1 AACAGATTAACTGTTACTAGAATAGAGGGGAACTCAATTATGGCAACTGTCAAJAJACTAAC 260 270 280 290 300 310 100 110 120 130 140 150 dbpf 1. GAA-ATATC=GAACAAGCTTGGGATGGTTTTAAAGGAACCAACTGGCGCGATA-GC I I11HIM liii Hill11 Ii Ill1tl1l1ll Hill, 11 11111 320 330 340 350 360 370 160 170 180 190 200 210 dbpf 1. AGGTCCCTGAAGAATCACAAGTGGTAAC~T I'll 11 Il11 ll 11111111 11111 H11 II 1 111111H SMPf 1 AGCATCTCGCTTTGTTCAAGACAACTACACTCCATATGACGGAGGCG
GTCT
380 390 400 410 420 430 220 230 240 250 260 270 dbpf 1. GCGGCAAACGAATAA
A
!1 11i 11 111IM Hll! I1I1HIII1 IllHilll 111 t1 440 450 460 470 480 490 280 290 300 310 320 330 dbpf 1. TACGAAGAGTAGGA=rCCCTCGATACTGACCGCGTAACCTCTATTGATAAATCCCT littlillIl I I l lll! 1111 1 Illtlill I HI smpf 1 TACGAAGAAACACGTTTTCCAATGGATAC---
-ACGTATTACATCTATTGCTGATATCCCA
500 510 520 530 540 550 340 350 360 370 380 390 dbpf 1. GCGAAACAGTAGTAGACTACCTTTGAGAATG smpf 1 GCAGG ITATAT TGACAAGGAAAATGAATTGATTTTTGGTATCCAAAACGAT 560 570 580 590 600 400 410 420 430 440 450 dbpf 1. GAACTM CCGCTTGAATTTCATGCCGAGGTGGACTTCGTGTTGCTGAAGA'TTG illlIl lIlt IIIIII 111 11 till IlIII Ill smpf 1 GAACTrrTAGCTGAACTrCATGCCAAAAGGCGGTATTCGCATGGCTGAAJAGCTT-TG 610 620 630 640 650 660 460 470 480 490 500 510 dbpf 1. ACAGAACACGGTCTCTCAGTTGACCCAGGCTTGCATGATGTTTGTCACAAACAATG
A
smpf 1 AAAGAACATGGTTATGAACC-AGACCCTGCCGTTCATGAAATCT-
-TTACCAAATATGCAA
670 680 690 700 710 720 520 530 540 550 560 570 dbPf 1. CTCGAAGTGACTCTCTAATCGATCTAGAGCT I 1l llIIIHIIllIllIIIItlllHIM Hll 11 IIIIII srnpf 1 CAACCGTTAATGATGGTATCTTTCGTGCTTACACCAAATTCGCCGTGACGTCATG 730 740 750 760 770 780 580 590 600 610 620 630 dbpf 1. CTCATACTGTAACAGG GCCAGATGCTTACTCTCGTGGACGTATCATTGGTGTCTATG SMpf 1 CCCACACTGTAACTGGTCTCCCAGATGC-ATACTCTCGCGGACGTATTATTGGAGTTTATG 790 800 810 820 830 840 WO 98/07867 WO 9807867PCT/DK97/0033S 640 650 660 670 680 690 dbpf 1. CACGTCTTGCCCTTTACGGTGCTGATTACCTTATGAAGGAAAAAGCAAAAGAATGGGATG II~II 1 1 IjIIIIIIIII11 111I1I111 I ii iii I Smnpf 1 CCCGTCTTGCTCTCTATGGTGCTGACTACTTGATGCAAGAAAAAGTGAACGACTGACT 850 860 870 880 890 900 700 710 720 730 740 750 dbpf 1. CAATCACTGAAATTAACGAAGAAAACATTCGTCTAAAGAAGAAATTAATATGATACC liii IIII1111I1III11I 111111111 Il1l1ll1lt 11 11111 1 smpf 1 CATTGCTGAAArGATGAAGAATCATTCGTCTCGTGAAGAAATCAATCTCATATC 910 920 930 940 950 960 760 770 780 790 800 810 dbpf 1. AAGCTTTGCAGAAGTTGTAACTTTGGTGCOTATATGGTCTTGATG=CACGTCCAG 1I I1 111111H 11 1111 1IIIIIIIIIIIIIII 11 1 sinpf 1 AGGCACTTGGCGAAGTAGTGCGGTTGGGTGATCTGTATGGTCGATGCGAPCCTG 970 980 990 1000 1010 1020 820 830 840 850 860 870 dbpf 1. CTATGAACGTAAAAGAAGCAATCCAATGGGTTAACATCGCTTATATGGCAGTATGTCGTG IiiI11 1iIHIIIII 111111111 111ii 1 I IIIlIlII lIi II1 smpf 1 CTATGAATGTTAAGGAAGCTATCCAATGGATTAATATCGCC~rrATGGCTGTCTGCCGCG 1030 1040 1050 1060 1070 1080 880 890 900 91.0 920 930 dbpf 1. TCAATGGAGCTGCACTTCACTTGGACGTGTCCAATCGTT.CTTGATATCTT.TGCAG 1 I1 11111 IIIIIIIIIII IIIIIIIIIII IIIIIIIIIIIIIIIHIIIIIIiii smpf 1 TATGGTCACTTTGAGTTCAACTCTGTTTTrA 1090 1100 1110 1120 1130 1140 940 950 960 970 980 990 dbpf 1. AAGGCTGTGGAC=CGAAGAATAGAT~rAGT smpf 1 AACGTGACCTTGCTCGTGGCACTTTCACTGAATCAGAAATCCAAGAATTCGTTGATGACT 1150 1160 1170 1180 1190 1200 1000 1010 1020 1030 1040 1050 dbpf 1. TCGTTTTGAAGCTTCGTACAATGAAATTGCTCGTGCAGCTGCTTATGATGC'j.TT 1210 1220 1230 1240 1250 1260 1060 1070 1080 1090 1100 1110 dbpf 1. 
CTGGTGACCCAACATTCATCACAACATCTATGGCTGGTATGGGTAATGACGGACGTCACC smpf 1 CAGGTGACCCAACATTTATTACGACTTCTATGGCTGGTATGGGAGCTGATGGACGTACC 1270 1280 1290 1300 1310 1320 1120 1130 1140 1150 1160 1170 dbpf 1. GTTATALTGCAC=TGAAATGTCACGATCCA lil III 1 111111 1IIII 11i 1111111 II II ifIIIIIIII smpf 1 TATAAGACACTTTAATCCTAATTTGATCCA 1330 1340 1350 1360 1370 1380 1180 1190 1200 1210 1220 1230 dbpf 1. AACACTAATCTGGTCAACTCTCCTCACTATA 11IIIIII1III1I 11II11111 H 1 I IIlIIIIIII I 11111 smpfl 1 CACTACTC GGCATATGCTCTTTC~-TATT 1390 1400 1410 1420 1430 1440 WO 98/07867 PCT/DK97/00336 71 1240 1250 1260 1270 1280 1290 dbpf 1. TGTCTATGAGCCACAAGCATCTCTATTCAATATGAAGGTGTTGAACAATGGTAAG H111111 iHIlllllIIl 111 H I iii III II III 11I IllI smpf 1 TGTCTATGAGCCACAAGCATTCTCAATCATATGAAGGTGTCACAACTATGGCTAAG 1450 1460 1470 1480 1490 1500 1300 1310 1320 1330 1340 1350 dbpf 1. ATGGATATGGCGAAATGTCATGTATCTCTTGTTGTGTCTCACCACTTGATCCAGAAAAATG II 1111Hill HIlllltlIIIlIIi H 11 H II 1 I 11 111111 III smpf 1 AAGGTTATGGTGAAATGTCATGTATCTCATGCTGTGTATCTCCGCTGATCCTGAACG 1510 1520 1530 1540 1550 1560 1360 1370 1380 1390 1400 1410 dbPf 1. AAGAAGGACGTCATAACCTCCAATACTTTGGTGCGCGTGTACGTTAGATGT 1570 1580 1590 1600 1610 1620 1420 1430 1440 1450 14G0 1470 dbpf 1. TGACTGGTTTGAACGGTGGTTATGATGACGTCATAAGATTATAGTATTCGACATCG 1630 1640 1650 1660 1670 1680 1480 1490 1500 1510 1520 1530 dbpf 1. AACGTGGCAATTGCAGAAATAGAACTGCATT Hiiill tIIIIlIill I 11111 lili 1 11lii 111111 smpf 1 AACCTATCCGTGATGAAGTCCTTGATTGAAACGGTTAGCTATrGAAAAGCAC 1690 1700 1710 1720 1730 1740 1540 1550 1560 1570 1580 1590 dbpf 1. TCGACTGGTTGACTGATACTTATGTTGATGCAATGAATATCA'ITCATTACATGACTGATA Ill ~tlllll~lI~lI I I illlillItIIll HIlltIIIItI smpf 1 TTGA TGGTTGACTGATACTTACGTGGACGCAATGAATATCATTCACTATATGACTGATA 1750 1760 1770 1780 1790 1800 1600 1610 1620 1630 1640 1650 dbpf 1. AATATAACTATGAAGCAGTTCATGGCCTTCTTGCCTACTAGTTCCTGCTACTGG smpf 1 AATATAACTATGAAGCCGTTCAAATGGCCTTCTTACCAACACGTGTTAJAAGCATATGG 1810 1820 1830 1840 1850 1860 1660 1670 1680 1690 1700 1710 dbpf 1. 
GA GGTATCTGTGGATTCGCATACAGTTGATTCACTTTCAGCAJATTAAATATGCTA snipfl 1 ATTTGGTATTTGCGGATTCTCTAATACAGTTGATTCATTATCAGCTATTAAATATGCTA 1870 1880 1890 1900 1910 1920 1720 1730 1740 1750 1760 1770 dbpf 1. AAGTAAAACATTGCGTGATGAAATGGCTATATCTACGATACGAGTAGAAGGTGATT 1930 1940 1950 1960 1970 1980 1780 1790 1800 1 810 1820 1830 dbpf 1. TCCCTCGTTATGGTGAAGATGATGATCGTGCTGATGATATTGCTAJA
CTTGTCATGAA
HittlIIi IiliIlttIIHIlii Hi 1illlit Ill I 111 smpf 1 TCCCTCGTTACGGAGAAGATGATGACCGTGTAGACTCA-ATCGCTGAATGGTTG
-CTTGAA
1990 2000 2010 2020 2030 2040 WO 98/07867 PCT/DK97/00336 72 1840 1850 1860 1870 1880 1890 dbpf 1. AATGTACCATGAAAAATTAGCTTCACACAAACTrrACAAAAATGCTGAAGCTACTGI-I-C 1 1 I II 1 11 11111 1IIIIIIIIII1I1 1II smpf 1 GCT TCCATACTCGTCTGCACGTCATAAACTGTACAAAGATrCCGAAGCTACTGTATC 2050 2060 2070 2080 2090 2100 1 900 1910 1920 1930 1940 1950 dbpf 1. ACTTTGC TCTTAGTGTATTACACTGATCCATC I I IHil1 ll 11111111 11i lI~IIIIlIIIijJIIiII smpf 1 GTAATATTTAGTCTTTTACACGTATCCATC 2110 2120 2130 2140 2150 2160 1960 1970 1980 1990 2000 2010 dbpf 1. TAAGGATCCAGAAGTCATATATTACTATCTT 2170 2180 2190 2200 2210 2220 2020 2030 2040 2050 2060 2070 dbpf 1. ACCAGGTGCTAACCCATCTAATAAAGCTAAGGGTGGTTGGTTGCAACCTTCGCTAT 2230 2240 2250 2260 2270 2280 2080 2090 2100 2110 2120 2130 dbpf 1. GGCTAAGTTGGATTCAAAGATGCAAATGATGGTArrTCATTGACTACTCAAGTTI'CC 1 11 1 11 11 1 MIMIIIIII 1111 IIIIIIIIIItiii smpf 1 GAAGAAACTTGACTTTGCTCACGCAAATGATGGTATCTCATTGACAACTCAAGTTTCACC 2290 2300 2310 2320 2330 2340 2140 2150 2160 2170 2180 2190 dbpf 1. TCGTGCACTrGGTAAAACTCGTGATGAACAAGTGGATAACTTGGTTCAAJATTCTTGATGG s~mpf~ 1 AAGCTCTTGGTAAGACATTCGATGAACAAGTTGCTAACTTAGTAACATTCTTGATGG 2350 2360 2370 2380 2390 2400 2200 2210 2220 2230 2240 2249 dbpf 1. ATACTTCACACCAGGTGCT TTGATTAATGGTACTGAATI'TGCAGGTCAACACGTTA srnpf 1 TTACTIGAAGGCGGCGGTCAACACGTTAACTTGAAC
-GTTATGGATCTTAAAGATGTTT
2410 2420 2430 2440 2450 2460 2250 2260 2270 2280 2290 2300 2309 dbpf 1. ACTTGAACGTAATGGACCTAAGATGTTTACGATAAATCATGCGTGGTGAAGATGrrA Smvfl 1 .LTG.C-AAGATCATGAATGGTGAAGATGTTATCGTTCGTATC---TCAGGTTACTGTGTTA 2470 2480 2490 2500 2510 2310 2320 2330 2340 2350 2360 2369 dbpf 1. TCGTTCGTATCTCTGGTTACTGTGTCAATACTAAATACCTCACACCAGAACAAJACAAJG smpf 1 ACACTAAIATACCTTACTAAGAACAAAAGACTGAAT- TGACACAACGTGrrTTTCCATG 2520 2530 2540 2550 2560 2570 2370 2380 2390 2400 2410 2420 dbpf 1. AA- TTAACTGAACGTGTCTTCC-A- -TGAAGTTC'FrCAAACGATGATGAAGAAGTAA-
-T
smpf 1 AAGTTCTCTCAATGGATGATGCAGCTACAGACTTGGTTAACA GTAAG GAAC 2580 2590 2600 2610 2620 2630 WO 98/07867 PCT/DK97/00336 73 2430 2440 2450 2460 2470 dbpf 1. GCATA -CTTCAAACATCTAATrCTTAAAA TTrAATGAATATTCGG -TCTGT smptl G-irAG=rAAAAGACCTCACTCATAAAAGTGAGGTCTTTACTTTGCTTTCGGGTACGAT 2640 2650 2660 2670 2680 2690 2480 2490 2500 2510 2520 2530 dbpf 1. CAGTTTTACTGACAGACTTTTTTTTACGAAAAAATTAATCATAAT -AGTTAAMAACTA'Fr I I I I I I i l I I I H l 1 1 1 1 smpf 1 CA -AAGCAGTGAGAGCTTTTATATTCTAAAAACTCA- -CAAATTCAGAAA1AAACAGT 2700 2710 2720 2730 2740 2750 2540 2550 2560 2570 2580 2590 dbpfl 1 ATTAGAGTATTTTCAAAAAGAGAATGATG 11 111 H lJil I Hill l Jill11 IiIii Hill1 Smpf 1 C ITGTGATTT -AGCTTTTA-GCTAC2 ATAATATTATGAAAAT..--TAATTAT 2760 2770 2780 2790 2800 2600 2610 2620 2630 2640 2650 dbpf 1. ATTGACAGGCGGAATTGCGATGGGATCAACGGTGGTTGATTGATCTGAGGGTr.I smpf 1 CCAAATTCTCCTTTGAGATAATAGCCTGCT 2810 2820 2830 2840 2850 2860 dbpf corresponding to nucleotides 1-2653 of SEQ ID smpfl: SEQ ID NO: 17 dbpfl seq /rev empro :hi32 812 ID H132812 standard; DNA; PRO; 10817 BP.
AC U32812; L42023;
NI g1222092
DT 09-AUG-1995 (Rel. 44, Created)
DE Haemophilus influenzae focA, pflA, pflB, rspB, yaaJ, yajF', yeiG
SCORES Init1: 652 Initn: 1077 Opt: 1299
55.4% identity in 1961 bp overlap
1979 1969 1959 1949 1939 1929 1920
dbpf 1. CATCTTCATTGAGGATACTCCTTATGTACTGGAGATTACCAGTGTTAGAGTAAG
l 11 11111 Jl l III III
hi 3281 GTCCGAATGGTGCACCAGCACACGACCATCGGGGTG'I
ACCCGTCTACCATA
2730 2740 2750 2760 2770 2780 1919 1909 1899 1889 1879 1869 1860 dbpf 1. CACTAAGATGCAATAACGACTACTTTTAGTG I111111 Hll11H1 1 Jil11 11111 Jill I lit h~t 11 III hi 3281 CTCTAAGATGTAAAATGGAGATCTGGTATTA 2790 2800 2810 2820 2830 2840 1859 1849 1839 1829 1819 1809 dbpf 1. GTGAAGCTAATTT -TTCATGGTACACATGACAAGTTTAGCATATCTCAGCACGA 11 111111 1 ill 11 I I lit Hl H IM1 Jill hi3 281 GTTTTTGAATTTTCTrCATAAAAC
-GTTCAACTAAGTCACAAGCGATGTCATACACGG
2850 2860 2870 2880 2890 2900 1799 1789 1779 1769 1759 dbpf 1. TCATCATCTTCACCATAACGAGGGAATCACCTTCTACTTCGTAATCG-----------
TA
111 JI 11111I 111 HhI IIll t Il I11I hi 3281 TTATCATTGTTACCATATTGTGGATATTCACCTTCGATAAGTCGATTGCTACGTTA 2910 2920 2930 2940 2950 2960 WO 98/07867 WO 9807867PCT/DK97A0b33ti 74 1750 1749 1739 1729 1719 dbpf 1. ATATAGCCA7=TCAT------------- CACGCAATGTTAACTTrAGCA 2970 2980 2990 3000 3010 3020 1709 1699 1689 1679 1669 1659 dbpf 1. TAT AATTGCTGAAAGTGAATCAACTGTATTTGCGAATCCACAGATACCATCCCATG hi3 281 TArGATTGCTGAAAGTGAGTCAGCCGCAACAGAAGACCTGCGATACAAGCCATA 3030 3040 3050 3060 3070 3080 1649 1639 1629 1619 1609 1599 dbpf 1. TTGAGA=GAGAGAGCTTGATCTAATAATAC II~1 Hii t IM i I H t i 1 lii 11 3090 3100 3110 3120 3130 3140 1589 1579 1569 1559 1549 1539 dbpf 1. GTCATGTAATGAATGATATTCATTGCATCCATAGTATAGTCACGTCGAGAGAT 111 illh III 1 1 ill li,1 1 1 HIMI! IttII hi3 281 TGAAATGTAGTAGCGCrAATT
GCACACAAA
3150 3160 3170 3180 3190 3200 1529 1519 1509 1499 1489 1479 dbpf 1. TTGTCAAAG TCCATAACTGTATCATAGTCAAGAT=CGTCAGCATTC
-GAT
hi 3281 CTTrAAGGCTATTTGATTAATCTATATGGAT 3210 3220 3230 3240 3250 3260 1469 1459 1449 1439 1429 1419 dbpf 1. GTGAA=TACTAGAGCTCTACCGTAACGCAA ill I I 11 11 1Hill IllHillI I lilt hi 3281 TTGACATGAACAAT
CTCAACCGTATCTTAA
3270 3280 3290 3300 3310 1409 1399 1389 1379 1369 1359 dbpf 1. TGTTAGCTTrAGGACAGTTGAGTTAGCTCTAr H11111 I till till Hill ill iii! I 1i 1 I tilt11 hi3 281 TGTT rCGCTAAGTTTrCACGTGCACCGAAGATTGCATTGI.AC CCACAATCAT 3320 3330 3340 3350 3360 3370 1349 1339 1329 1319 1309 1299 dbpf 1. TTTGTAGGTAAAACAAAAAGCTTGCTTCT HIM1111I Ihitt ll I Ill I Ilit
TGGTGATAACAACATGCGATTGCGTAGTCATCGTGGAGTCGG
3380 3390 3400 3410 3420 1289 1279 1269 1259 1249 1239 dbpf 1. AGCCATTG TCAACACCTTCATATTGAATAGAGA-ATGCTGGC- CATAGACATr hi 3281 ACCTAACT~~-LiGATAATAGGTTATGTC=
CC
3430 3440 3450 3460 3470 3480 1229 1219 1209 1199 1189 1179 dbpf 1. GATAG'rATATAGATTGACCAGATTAG
GTC
ii ilitiili I Ilt 1 II11 1 tittl Mih II Ill hi 3281 GA--ACTTAG
AGTATTCGCAAAAGTAGTGCC
3490 3500 3510 3520 3530 3540 WO 98/07867 WO 9807867PCT/DK97fln336 1169 1159 1149 1139 1129 1119 dbpf 1. GGAG CATTTCCGATrGTATCAAGTGTGTTCAAGAAACGGTAGTCCA TrAGTGACACGG, hi3 281 GGAGAAGTACCCATGTTGTAAAGGGTGTGTAAATACGGAATGTATTTrGG.JTACTAAT 3550 3560 3570 3580 3590 3600 1109 1099 1089 1079 1069 1059 dbpf 1. TGACGTCCGTCATTACCCATACCAGCCATAGATGGTGATGATGTGGGTCACAGAA hi 3281 GTACGACCATCTAAACCCATACCTGCGATGGCAGTGCCCAATGGTCACr-GAG 3610 3620 3630 3640 3650 3660 1049 1039 1029 1019 1009 999 dbpf 1. TAATc~-TACGTCCACATTATTCAGTCAAGA I1 1111111 111 11 111 11 1 IIII1 II 11111ll hi 3281 AAATGTGATAGGAGAGAAGACTCAGTCT~TA 3670 3680 3690 3700 3710 3720 989 979 969 959 949 939 dbpf 1. TCATCAACAATTCTTGAATTTCTGTTCAGTAATGTCCACGAGAGGTCACGTTCT hi 3281 TGGTCAACTAATTCTTGCGCTTCAGTTTCAGTAATTTCCTGCAATCACGTTCG 3730 3740 3750 3760 3770 3780 929 919 909 899 889 879 dbpf 1. GCAAAGATATCAAGAACGATTGGAACACGTCCAAGTGAAGTTGCAGCTCCA
-TTAATGAC
hi 3281 ATGTAACGTAATAAGGTTGCGGTACGACCGAATGACATTGCA
CCATTGTGAT
3790 3800 3810 3820 3830 3840 869 859 849 839 829 819 dbpf 1. ACGACATACTGCCATATAAGCGATGTTAACCCATTGGATTGCTTCTrTACGTTCATAGC I I11 IIIIHIM 11 HiIMII HiIIIIIII 1I I1II1 hi 3281 TTTA- TTGCAGCAAGATAAGCAAAGTACATCCATGATGGCTTCTTGAGAAGTTGC 3850 3860 3870 3880 3890 3900 809 799 789 779 769 759 dbpf 1. TGGACGTGAAACATCAAGACCATATAAGGCACCAAGTTTACAACTTTGAGCTTG III 111111 11111 HIM 1 I 1 I 1I 1ii 1 111 hi 3281 TGGGTTAGAAATATCATAACCATAGCTTGCTGCCATTGTTTTTGACCTATGCACG 3910 3920 3930 3940 3950 3960 749 739 729 719 709 699 dbpf 1. GTTGAATA CTTTAAGAGL11.CTGTATCGGTG II Ill iIIltII I HIM11 I!11 Il 1 1 1 hi 3281 GTG TGTCTGCGA=rCTTC-ACGTAAACGAAmGTGCTTCAAGATT------ACGCC 3970 3980 3990 4000 4010 689 679 669 659 649 639 dbpf 1. ATCCCATTCTTTTGCTTTTCCTTCATAAG
-GTAATCAGCACCGTAAAGGGCAAGACGTG
111 1111 HIMIII IllI I II I I I hi 3281 ATC TTCTAAATCrrTT GTAAAGAAGAGAATTrGTGCGTATTTATCTTTCATTAAGA 4020 4030 4040 4050 4060 4070 629 619 609 599 589 579 dbpf 1. CATAGACACCATGATACGTCCACGAGAGTAAGCATCTGGCAAJACCTGTTACAGTATGAG 1I HIiIII I 11Hi11 IllIi II II iiI l hi 3281 AATCTACACCATAAAGTGCTACACGACGGTAGTCACCGATGATACGACCACGACCATAAG 4080 4090 4100 4110 4120 4130 WO 98/07867 PCT/DK97/00336 76 569 559 549 539 529 520 dbpfl2. GACGTGCTTTACGAATTGCTGAAGTATAAGCACGAAAGAT
-TCCATCATTTACA
ill 11 1 11 i I1H11 IllIIt 111111 1 1 hi 3281 CACGAGCATATCCAATTCGACTAAACGCTTA 4140 4150 4160 4170 4180 4190 519 509 499 489 479 dbpf 1. -GAAGTCA- -TTGTTTGTG------- ACAAACATCATG GCCTGGGT CTGAG i l l 1 1 i l l 1 1 1 1 1t 1 i l l i l l i I i l l hi 3281 CATCGAATACACCTTGGTT-ATGTGTTTTACGGTA- TCAGTGAAGATTTTTTCAC~rrT 4200 4210 4220 4230 4240 4250 469 459 449 439 429 419 dbpf 1. AGACCGTGTT CTGTCAAAATCTTTTCAGCAAC-ACGAAGTCCACCTCTTGGCATGAAT hi3 281 GGATCAAGTTCACGACCATAAJACTTTACAAGAAC
CTTCCACCATTTTG-ATACCAC
4260 4270 4280 4290 4300 409 399 389 379 369 359 dbpf 1. TCAAGCGGAAAAGTTCGCTATT--TGCATCCCATAGATGAGrrcAAGTTC=rATC hi 3281 CGAATGGCATAATGGCACGTTAGGTTCATCAG=GA. -AGACCAACGATTrTTrC 4310 4320 4330 4340 4350 4360 349 339 329 319 309 dbpf 1. ATTAGCATCGATATATCCAGCAGGGArTATCIJTAGAGGT -TACGCGGTCAGT- -ATC I 1I1 11 11111l I I illil 1 1111 11 ill hi 3281 TAATCTTTGTTAATGTAACCAGGTGCGTGAGAGATATGGTAGATGGTGTATGTTCATC 4370 4380 4390 4400 4410 4420 299 289 279 269 259 249 dbpf 1. GAAGGGAAATCCTACTTr- -CTTC -GTAGTGATTTTTTGTATCTTCAAT -AATTTTCrr 11 il I I 1 11 I 1 tllill 1 i1i hi 3281 AAATCTAATGGCGCGTGAGTACGGTTTCATTTIATACCTTCCATCACAGATTCCC-A 4430 4440 4450 4460 4470 4480 239 229 219 209 199 189 dbpf 1. TACTTAAGTGTACGTTCTGTTGGCCCAGC
-AAGAAAGCTTTCATCACCATCATATGGTT
1 11 ill ill i llI 11 11lillill 1111IHI illii 1 hi 3281 AAGC'FGGTTGTTGCTTCGGTTGGACCTGCTAAGAAG -AGTa-.TCGCC*LuTTCATAAGGGG 4490 4500 4510 4520 4530 4540 179 169 159 149 139 129 dbpf 1. TGTAGI CTTGTACAAAGCGAGTACGCTTGCTTATCGCGCCAGTTGGCCTT hi 3281 TAATTTGAAATAGAATAATTTGCACCACGA 4550 4560 4570 4580 4590 4600 119 109 99 89 79 69 dbpf 1. AACTCAGTGTAAAATTCGACTGTTCTTGATT Hill 11I1I11 11 1111l11I1111 liI ii1 hi 3281 AACCAGCCCACGC--CA- -ATTTTTGCATTTCATTAAGTTCTGACATAGTCJT-rC 4610 4620 4630 4640 4650 59 49 39 29 19 9 dbpf 1. CTTTCTCG CATAACTATAACTCGTGGAACA 1111111 It hi 3281 CTTTATATATATTTAGTT
GTAAACTGATCC
4660 4670 4680 4690 4700 4710
dbpfl: complementary strand corresponding to nucleotides 1979-9 of SEQ ID NO:15; hi3281: SEQ ID NO:18
Table 3.4. Protein homology (FASTA, GCG Wisconsin Package Version 8, Genetics Computer Group) using the complete protein sequence derived from the L. lactis DB1341 pfl sequence shown in Table 3.2. Only alignment of the L. lactis Pfl protein (dbpfl.pep) with the best four scores is shown.
The Pfl protein of Streptococcus mutans was not recorded in the searched protein databases.
(Peptide) FASTA of: dbpfl.pep from: 1 to: 788 July 19, 1996 09:11
The best scores are: init1 initn opt
sw:pflb_ecoli P09373 escherichia coli. formate ac. 560 1498 1502
sw:pfl3_ecoli P42632 escherichia coli. probable f. 558 1358 1487
sw:pflb_haein P43753 haemophilus influenzae. form. 545 1228 1521
sw:pfl_chlre P37836 chlamydomonas reinhardtii. f. 163 259 306
sw:fasd_ecoli P46000 escherichia coli. Outer memb. 53 113
sw:gtf2_strdo P27470 streptococcus downei (strept. 46 110
sw:frap_rat P42346 rattus norvegicus (rat). fkb. 42 101 53
sw:frap_human P42345 homo sapiens (human). fkbp-r. 42 101 53
dbpfl.pep
sw:pflb_ecoli
ID PFLB_ECOLI STANDARD; PRT; 759 AA.
AC P09373;
DE FORMATE ACETYLTRANSFERASE 1 (EC 2.3.1.54) (PYRUVATE FORMATE-LYASE 1).
SCORES Init1: 560 Initn: 1498 Opt: 1502
42.2% identity in 732 aa overlap
20 30 40 50 59
dbpfl MKTEVTENIFEQAWDGFKGTNWRDKSVTRFVQENYKPYDGDESFLAGPTERTLKV
KKI
pflb-e SELNELATAWEGFTKGDWQNEVNVRDFIQKNYTPYEGDESFLAGATEATTTLWDKV 20 30 40 70 80 90 100 110 dbpfl IEDTK NHYEEVGFPFDTDRVTSIDKPAGYIDANDKELELIYGMQNSELFRLNFMPRGG 1::I 111 :1:111 1: II pflb e MEGVKLENRTHAPVDFDTAVASTITSHDAGYI
NKQLEKIVGLQTEAPLKRALIPFGG
70 80 90 100 110 120 130 140 150 160 170 dbpfl LRVAEKILTEHGLSVDPGLHTVSQTMTSVNDGI FAYTSAIPJJAJiAHTVTGLPDAYSR I ::II 1:1:1 :11::l ::11 11111: 1 pflb e IKMIEGSCKAYNRELDPMIKKIFTEYRKTHNQGVFDVYTPDILRCRKSGVTGLPDAYGR 120 130 140 150 160 170 180 190 200 210 220 230 WO 98/07867 WO 9807867PCT/DK97/00f336 78 dbpf 1. GRI IGVYARLALYGAflYLMEKAKEWDAI
-TEIN--EENIRLKEEINMQYQ.ALQEVV
pf lb-e GRIIGYRAYIY KAFSQDLNVLQILEIEHAGM 180 190 200 210 220 230 240 250 260 270 280 290 dbpf 1. NFGALYGLD VS RPAMKEAIQWVNTAYMAVCRVINGAATSLGRVPIVIJDI
PAERDAG
pf lb-e EMAAKYGYDISGPATNAQEAIQWTYFGYLAAVKSQNGAAMSFGRTSTFLDVYIERLKAG 240 250 260 2.70 280 290 300 310 320 330 340 350 dbpf 1. TFEEQFDFLLTKAAADEYGPFTSAMNGHVK~ pflb-e KITEQEAQEMVDHLVMKLRMVRFLRTPEYDELFSGDPIWATES IGGMGLDGRTLVTrg4SF 300 310 320 330 340 350 360 370 380 390 400 410 dbpf 1. RFLNTLDTIGNAPEPNLTLWDSKLPYSFKRYSMMSHKHJSS
IQYEGVETMAKDGYGEMS
pf lb-e RFLNTLYTMGPSPEPNMTILWSEKLPLNFKCFAAKVS
IDTSSLQYENDDLMPDFNNDDY
360 370 380 390 400 410 420 430 440 450 460 470 dbpf 1. CICVPDEEGHLYGRNLAMGNGDViDKFIPRE 1:11111: :1:1111:1: 1:11 ::111 1: 1 pf lb-e AIACCVSPMIVGKQ--MQFFGARNLATVYAINGGVDEKLMQVGPKSEPIKGDV 420 430 440 450 460 480 490 500 510 520 530 dbpf 1. LDDVEFKLWTTVANIYMDYYAQALTVAMFIG pf lb-e LNDVEMHMWAQIANIYMDYYALAHRVRMCIG 470 480 490 500 510 520 540 550 560 570 580 590 dbpf 1. ANVDSLSAIKYAKVKrLRfENGYIYDYEVEGDFPR GEDRDDDPADIA1KJYHE pf lb-e SVAAfSLSAIKYAKPIRDEDGLAIDFEIEGEYPQFGNNDPRVDDLAVDLVJERFMKKIQ 530 540 550 560 570 580 600 610 620 630 640 650 dbpf 1. SHKLYKNAEATVSLLTITSNVAYSKQTGNSPVHKGVFlNqEDGNSKLEFFSPGNPSN pf lb-e KIHTYRfAIPTQSVTITSNVVYGKKTGNTP---------DG
-RRAGAPFGPGANPMHG
590 600 610 620 630 660 670 680 690 700 710 dbpf 1. KAGWQLSALFDNGSTQVPAGTDQDLQLGFPA pflbe RDQKGAVASLTSV.AKLPFAYAKDGISYTFS IVPNALGKDEVpRKNLAGbGYFHHEAS~ 640 650 660 670 680 690 720 730 740 750 760 770 dbpf 1. INGTEFAGQVNLNDLKVYDKIRGEDVIVRISGYVTKLTPEQKQELTERVFHE pflb-e IEGGQHLNVNVMNREMLLDAMENPEKYPQLTIRVSGYAVRFNSLTKEQQQDVITRTFTQS 700 710 720 730 740 750 WO 98/07867 WO 9807867PCT/DK97/fi0336 79 dbpfl: corresponds to amino acid residues 1-772 of SEQ ID NO:16; pf lb-e: corresponds to amino acid residues of SEQ ID NO:14 dbpfl1.pep sw:pfl3_ecoli ID PFL3_-ECOLI STANDARD; PRT; 746 AA.
AC P42632;
DE PROBABLE FORMATE ACETYLTRANSFERASE 3 (EC 2.3.1.54) (PYRUVATE FORMATE-
SCORES Init1: 558 Initn: 1358 Opt: 1487
39.8% identity in 741 aa overlap
20 30 40
dbpf 1. MKTEVTENIFEQAWDGFKGTNWRDKASVTRFVQENYKPYDGDESFLAGPTERTLKV-K
pf 13_e MVDIDTSDKLYADAWLGFKGTDWKNEINVRDFIQHNYTPYEGDESFAATPATTELWE
10 20 30 40 50
70 80 90 100 110
dbpf 1. KIIEDTK-NHYEEVGFPFDTDRVSIDKIPAGYIDADELELIYGMQNSELFPLNFMR
1: I: :l 1 11 1 11 1 I
pf 13_e KVDEGIRIENATHAPVDFDTNIATTITAHDAGYI---
NQPLEKIVGLQTDAPLKRALHPF
70 80 90 100 110 120 130 140 150 160 170 dbpf 1. GGLRVAEKILTEHGLSVDPGLHDVLSQTM'rSVNDGI
FRAYTSAIRKARHAHTVTGLPDAY
pf 13_e GGINMIKSSFHAYGREDSEFEYLFTDLRTHNQGVFDVYSPDMLRCPJCSGVTGLPDGY 120 130 140 150 160 170 180 190 200 210 220 230 dbpf 1. SRGRI IGVYARILYGADYLMKEKACEWDAI-------
TEINEENIRLKEEIMQYQALQE
pf 13_e GRGRI IGDYRRVALYGI SYLVRERELQFADLQSRLEKGEDLEATIRLREELAEHP4JALJQ 180 190 200 210 220 230 240 250 260 270 280 290 dbpf 1. VVNFGALYGLDVSRPVKAIQWIAYMAVCRVINGTSLGRVPIIFARDL pfl3_e IQEMAAKYGFDISRPAQNAQEAVQWLYFAYLAVKSQNGGMSRTASFLDIYIERDFK 240 250 260 270 280 290 300 310 320 330 340 350 dbpf 1. RGTFTEQEIQEFVDDFKRTFAAYDELYSGDPTFISMAGMG
I-RTKM
300 310 320 330 340 350 360 370 380 390 400 410 dbpf 1. DYFNLTGAENTLDKPSFRSSSMSQEVTAGG pf 13_e SFYHLTGAENTLSEPAFKAQSVSLYNDMTFS 360 370 380- 390 400 410 420 430 440 450 460 470 dbpf 1. MSCISCCVSPrDPENEEGRH NLQYFGAVVLK TGLNGGYDHDYKVFDIEPVRD pf 13_e DYAIACCVSPMVIGKQ--MQFFGARANLAKTLLYAINGGVDEKLKIQVGPKTJPM 420 430 440 450 460 470 WO 98/07867 WO 9807867PCT/DK97/00336 480 490 500 510 520 530 dbpf 1. EIDDVEFKLWTTVANIHMDYYAQALTVAMFI pf 13_e DVLDYDKVM~DSLDHFDWLAVQYISALNI
IHYNHDKYSYEASLMALHDPDVYRTACGIA
480 490 500 510 520 530 540 550 560 570 580 590 dbpf 1. GFANTVDSLSAIKYA1CKTLRDENGYIYDYEVEGDFPRYGERDDAKLDIKYTEK pf 13_e GLSVATDSLSAIKYARVCPIRDEN~GLAVDFEIDGEYPQYGNNDERVDS
IACDLVERFMYK
540 550 560 570 580 590 600 610 620 630 640 650 dbpf 1. LASHKLYKNAEATVSLLTITSNVAYSKQTGNS
PVHKGVFLNEDGTNKSKLEFFSPGNP
pf 13_e IKALPTYRNAVPTQSILTITSNVVYGQKTGNTP
DG--RRAGTPFAPGANPM
600 610 620 630 640 660 670 680 690 700 710 dbpf 1. SNAGWQLSALFDNGSTTVPAGTDQDLQLGFP pf 13_e HGRDRKGAVASLTSVAKIJPFTYAKDGISYTFS
IVPAALGKEDPVRKTNLVGLLDGYFHHE
650 660 670 680 690 700 720 730 740 750 760 770 dbpf 1. ALIGTEFAGQHVNLNVMDIKDVYDKIMRGEDVIVRI
SGYCVNTKYLTPEQKQELTERVF
I 1 1 pf 13_e ADVEGGQHLN~VNVMNRE~LLDAIEHPEKYPNLTIRVSGYACASTH 710 720 730 740 dbplf: coreresponds to amino acid residues 1-770 of SEQ ID INO:16; pfll3_e: SEQ ID lNO:19 dbpfl1.pep sw:pf lb-haein ID PFLB_-HAEIN STANDARD; PRT; 769 AA.
AC P43753;
DE FORMATE ACETYLTRANSFERASE (EC 2.3.1.54) (PYRUVATE FORMATE-LYASE)
SCORES Init1: 545 Initn: 1228 Opt: 1521
42.1% identity in 781 aa overlap
20 30 40 50 59
dbpf 1. MKTEVTENIFEQAWDGFKGTNWRDKASVTRFVQENYKPYDGDESFLAGPTERTLIKV-KKI
pf lb-h SELNEMQKLAWAGFAGGDWQENNVRFIQKNYTPYEGDDSFLAGPTEATTLWSV
10 20 30 40
70 80 90 100 110
dbpf 1. IEDTK-NHYEEVGFPFDTDRVTSIDKIPAGYIDMKLELIYGMQNSELFRNFMPRCC
pf lb-h MEGIKIENRTHAPLDFDEHTPSTIISHAPGYI
NKDLEKIVGLQTDEPLKRAIMPFGG
60 70 80 90 100 110 120 130 140 150 160 170 dbpf 1. LRVAEKILTEHGLSVDPGLHDVJSQTTSVNDGIFAYTSAIRRHHTVTLPDAYSR pf lb-h I1KdVEGSCKVYGRELDPKVKKcIFTEYRKTHNQGVFDVYTPDILRCRKSGVLTGLPDAYGR s0 120 130 140 150 160 170 WO 98/07867 WO 9807867PCT/DK97/00336 81 180 190 200 210 220 230 dbpf 1. GRI IGVYARLAIJYGADYLCEKACEWDAI--TE IN -EENIRIKEEINMQYQLQEWV pf lb-h GRIIGYRAYVFMDYAFS
DEGNETILEIEHAGL
180 190 200 210 220 230 240 250 260 270 280 290 dbpf 1. NFAYLVRANKAQVIYAVRIGASGVILIAPLR :::1I11I:I:11 1::j1iI: :II:I: 111 111 pflb-h QMAASYGYDI SNPATAQAIQWMYFAYLAIKSQNGMFGRTATFIDYIEDLKG 240 250 260 270 280 290 300 310 320 330 340 350 dbpf 1. TFEEQFDFLLTKAAADEYGPFTSAMNGHVK~ pf lbh KITETEAQELVDHLVKLRRFLRTPEYDQLFSGDPWATETIAGMGLDGRTLTEF 300 310 320 330 340 350 360 370 380 390 400 410 dbpf 1. RFLNTLDTIGNAPEPNLTLWDSKPYSFYSSMSHKHSIQYEGVETOMGYGEMS pf lb-h RIHLNGSPPLIWELPNKF VIDTSSVQYENDDLMjRPDFNDDY 360 370 380 390 400 410 420 430 440 450 460 470 dbpf 1. CICVPDEEGHLYGRNLKMGNGDVKYFIPRE pf lb-h AIACCVSPMIVGKQ--MQFFGARANAAXTLLYAINGGIDEKLGMQVGPKTAPITDEV 420 430 440 450 460 480 490 500 510 520 530 dbpf 1. LDYDTVMENFDKSLDWLTDTYVDA~TI
IHYMTDKYNYEAVQMAFLPTKVRANMGFGICGF
pf lb_h LDDVTMSMWAQVANIYMDYYALAHRVRMCIG 470 480 490 500 510 520 540 550 560 570 580 dbpf 1. ANTVDSLSAIKYAKVKTLR
DENGYI-------YDYEVEGDFPRYGEDDDRADDI~A,
pf lb-h SVASSIYKKVGIDDNVANADREEPYNDRDIC 530 540 550 560 570 580 590 600 610 620 630 640 dbpf 1. VMtYELSKYNETSLISVYKTNPHGFNDTNSL pf lb-h LVRMKQLTRAPQSLISVYKTNP---------DGRAGAP- 590 600 610 620 630 650 660 670 680 690 700 dbpf 1. FFSPGANPSN -KAGW LSALEKADILTVPRLKREVN pf lb-h -FGGNMGDKAALTVKPAADIYFSVNLKAARNA 640 650 660 670 680 690 710 720 730 740 750 760 dbpf 1. ILGPPAIGEAQVLVMKV-KMGDIRSYVTYTE pf lb-h LMDGYFHHEATVEGGQHLNVNV-
LNREMIJLDAMNPDKYPQLTIRVSGYAVRFNSLTKEQ
700 710 720 730 740 750 WO 98/07867 PCTIDK97/00336 82 770 780 dbpf 1. KQELTERVFHEVLSNqDDEEVMHTSNIX (SEQ ID NO:16) :1 1~ I pflbh QQDVITRTFTESM (SEQ ID 760 dbpfl1.pep sw:pf 1_chire ID PFLCHLRE STANDARD; PRT; 195 AA.
AC P37836;
DE FORMATE ACETYLTRANSFERASE (EC 2.3.1.54) (PYRUVATE FORMATE-LYASE)
SCORES Init1: 163 Initn: 259 Opt: 306
38.0% identity in 213 aa overlap
540 550 560 570 580 590
dbpf 1. NTVDSLSAIKYAKVKTLRDENGYIYDYEVEGDFPRYGEDDDADDIKLVMJYHEQJAS
pflch GS FPKYGNDDDRVDE IAEWVVSTFSSKCIAK
20
600 610 620 630 640 650
dbpf 1. HKLYKNAEATVSLLTITSNVAYSKQTGNSPVHKGVFLNEDGTVNSKIJEFFSPGANP
-SN
pflch QHTYRNSVPTLSVLTITSNVVYGKKTGSTP---------- DG RKKGEPFAPGANPLHG 50 60 660 670 680 690 700 710 dbpf 1. KAKGGWLQNLRSLAKLEFKDANDGISLTTQVSPJALGK
-TRDEQVDNLVQILDGYFTPGA
pflch RDAHGALASLNSVAKLPYTMCLDGISNTFSLI PQVLGRc4GEHERATNLAS ILDGYFANGG 90 100 110 120 130 720 730 740 750 760 770 dbpf 1. LIGEAQVLVDKVDIREDIRSYVTYTEKETRF ::::I:IIII1: 11:11: I:::i:II pf 1_ch HHINVNhVLNRSMLI4ILAVEHPEKY-------
PNLTIRVSGYAVHFARLTREQQLEVIARTFH
780
dbpf 1. EVLSNDDEEVMHTSNIX (corresponding to amino acid residues 535-788 of SEQ ID NO:16)
pflch DTM (SEQ ID NO:21)
The highest homology value obtained when analysing the sequence from clone pfl1 corresponds to the S. mutans pfl gene (Table i.e. about 80% at the DNA level, in the region covered by the probe used for library screening and 68.5% for the 1.1 kb pfl fragment analyzed.
Sequence comparisons indicated that the fragment included in clone pfl1 encompasses 367 amino acids of the C-terminal region encoded by the L. lactis pfl gene. Therefore, about 1.3 kb of the 5' end of the pfl gene was lacking.
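As a rough consistency check on these figures, the size of the lacking region can be estimated from codon counts. The sketch below is illustrative only: it combines the 367 residues stated above with the 421 lacking codons reported in the inverse PCR section, and all variable names are hypothetical.

```python
# Illustrative arithmetic only; all names are hypothetical.
CLONED_RESIDUES = 367   # C-terminal residues present in the cloned fragment (from the text)
LACKING_CODONS = 421    # codons reported as lacking (from the inverse PCR section)
BP_PER_CODON = 3        # three nucleotides per codon

lacking_bp = LACKING_CODONS * BP_PER_CODON
total_positions = CLONED_RESIDUES + LACKING_CODONS

print(f"lacking coding region: {lacking_bp} bp (about {lacking_bp / 1000:.1f} kb)")
print(f"cloned plus lacking positions: {total_positions}")
```

The sum of cloned and lacking positions equals the length of the dbpfl.pep query in Table 3.4 (1 to 788), which appears to include the terminal stop position.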
A 0.6 kb PstI-EcoRI fragment of clone pfl1, extending from the PstI site in the polylinker and covering positions 1342-2003 of the sequence shown in Table 3.2, was randomly labelled and used for screening a λZAP genomic library of strain DB1341 (Sambrook et al., 1989) to obtain the upstream region of the pfl gene. High stringency hybridization (washing steps at 65°C, 2 x 30 min in 2 x SSC, then 1 x 30 min in 0.1 x SSC; 0.1% SDS) resulted in the isolation of twelve positive clones.
Sequence analysis of clones pfl9, pfl10, pfl19 and pfl20 showed that they included the same pfl fragment as did clone pfl1.
Restriction analysis of the above clones showed that they all contained a 460 bp Sau3AI fragment identical to that of pfl1 (positions 1342-1798 in Table Only clone pfl14 showed a different Sau3AI restriction pattern. This clone lacked the above Sau3AI fragment and had a 600 bp fragment that hybridized to the PstI-EcoRI pfl probe, suggesting that rearrangement of the insert had occurred during in vivo excision of the plasmid. Sequence analysis of pfl14 confirmed that it included a pfl fragment that lacked the Sau3AI site at position 1 in clone pfl1, but showed sequence identity from position 30 onwards in clone pfl1 (position 1372 in Table It is therefore likely that the presence of an intact L. lactis pfl gene is toxic in E. coli and leads to plasmid rearrangement.
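The restriction sites referred to above can be located by a simple motif search. The following is a minimal sketch, assuming the standard recognition sequences GAATTC (EcoRI) and GATC (Sau3AI); the helper name find_sites is hypothetical, and the test sequence is the row at positions 2001-2050 of Table 3.2.

```python
# Minimal sketch; find_sites is a hypothetical helper, not part of any library.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "Sau3AI": "GATC"}

def find_sites(sequence, first_position=1):
    """Return, per enzyme, the positions at which its recognition site starts."""
    hits = {}
    for enzyme, motif in RECOGNITION_SITES.items():
        positions = []
        index = sequence.find(motif)
        while index != -1:
            positions.append(first_position + index)
            index = sequence.find(motif, index + 1)
        hits[enzyme] = positions
    return hits

# Row 2001-2050 of the sequence shown in Table 3.2
row_2001 = "TTGAATTCTTCTCACCAGGTGCTAACCCATCTAATAAAGCTAAGGGTGGT"
sites = find_sites(row_2001, first_position=2001)
print(sites)
```

For this row the search reports an EcoRI site starting at position 2003, in agreement with the 3' boundary of the PstI-EcoRI probe fragment, and no Sau3AI site.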
4. Inverse PCR to obtain the complete pfl sequence of L. lactis DB1341
To facilitate the characterization of the 5' region of the L. lactis pfl gene from strain DB1341, inverse PCR was used. EcoRI-digested genomic DNA of strain DB1341 was religated at low concentration (Sambrook et al., 1989) and PCR was carried out using primers pfl1-250 and pfl1-390 (see Fig. A 1.6 kb fragment that contained the lacking 421 codons and the upstream region of the L. lactis pfl sequence (positions 1 to 1342 in Table 3.2) was amplified. This PCR fragment was re-amplified from EcoRI-digested and religated DB1341 DNA using modified primers pfl1-250 (including an XhoI site at the 5'-end) and pfl1-390 (including a BamHI site at the 5'-end). The amplified product was digested with XhoI and BamHI and ligated into vector pGEM digested with the same enzymes; transformation of E. coli DH5α resulted in strain pflup-1. The L. lactis DB1341 pfl gene encodes a 787 amino acid protein (Tables 3.2, 3.4 and 3.6) with a deduced molecular weight of 89.1 kDa.
A sample of E. coli DH5α strain pflup-1 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession No. DSM 11087.
Cloning of the pfl upstream sequence from L. lactis DB1341
Inverse PCR was carried out on HhaI-digested and religated chromosomal DNA of strain DB1341, using primers derived from the above sequence (Table The HhaI fragment spans about 1.7 kb, from position 1 to 1707 in the sequence below, which overlaps the sequence shown in Table 3.2 from position 1563 to 1750.
Table 3.5. pfl upstream sequence from L. lactis DB1341
HhaI
1 GCGCCTAGATAAGAAACAGCAACAGCTAAAAGATAGGTATCAAAAGCACT
51 TGATTTAAAATAATGACTTTATCCGATTTTTTGATTCCCAACTCAGATA 100
101 AGAGACTTGCCTTATCAACAATTGCTTGATGAGTCTTTGGTAAGTCGTT 150
151 TCAAGAGCTAGTTCGGGGAAAGCTCCAACAGCCTCATCAAAGATAATTGG 200
201 GCTATCAGGAAACTGTTCAGCTGATTTTAAAGTTTAGATACAAATTTA 250
251 GGGGTrCGTGTTGAATTTCAAAAAAATCTCCTCAAGTTAATAAG'I.I.A 300
301 rTATATCACAAAGTATTCTTTAGACCAATAGTTAATGTAATGTTTIw..CTr 350
351 AAGTCGTAGAGAATAAAATTCTCGGAAA AAGTCTAAAATCTGCTACAA 400
401 TTAAAGGGAC-ACTAAGAGGATTCCAATCCTC "TATCAGAAAAGAAGG 450
451 GATAGATAGGAAAATGATTAAAAATTATGA.ACTATC CAACGAAAAAJAT 500
orfA M I K N Y E L S N E K K L
501 TAATTTCAACCTCTGAATGAAGAATTTCACCTATGTTCT TCAAC 550
1 ST S E M K N F T Y V L N P T
551 CGGAAATGATTTTAAATTATCCTTAT 600
P. E E I G N I S E Y Y D F P F D Y
601 TTTATCAGGAATTTTGGATGACTATGAA TGCCCG TTTGAAACAGATG 650
L S G I LD D YE NA P. F E TD D
651 ATAGTAATTATTTACAACTCCCCATA 700
N D N N L I LL Q Y P P L S N Y
701 GGGATGGCTTrAATTTGTGATAATAT 750
G E VA T F P Y S L V W T K N E S
751 GGTTATTTTAGCACTTAATCATGAGATTGATAATGGCTTAATTTCGAGC 800
V I LA L N H E I DN G LI FE R
801 GTGAATATGATTATAAACGCTACAACATCAAGTTATmTCAAGTGATG 850
E Y D Y K R Y K H Q V I F Q V M
851 TATCAAATGACACACACTTTCCATGATTATGAGAGA CCCAG 900
Y Q M T H T F H D Y L R D F R T R
901 GCGTCGCAGACTTGAACAGGGAATCAA ATTCAA AGAACGACCAAA 950
R P.R L E Q G I1K N S T K N D Q I-
951 TTGTTGATTGATGCCATrAGCAGTT AAFGAAGATGCC 1000
V D L I A I Q AS L I Y F E D A
1001 TTGCACAATAZATATGCAAGTACTTCAGGATTTTATTGATITACTGAGAGA 1050
L H N N MQ V L Q D F I D Y L P.
E 1051 AGTAGAAGTTGTAAAATAGTTTTTGA 1100 D D E-D G F A E K I Y D I F V E T 1101 CAACACTTCGACAGATACCATATGAA 1150 D Q AY T E T K I Q L K L LE N 1151 CTCAATGTTAAATTCCATATGAATTA 1200 L R D L F S N N V S N N L N I V M 1201 GAA ATCATGACATCAGCTACTTTCGTTCTAGGGATTCCTGCAGTAATTG 1250 K I M T S AT F V L G I P A VI V 1251 TTGGTrTACGGAATGAATGTTCCAATTCCTGGTCAAJTTTTAATTGG 1300 G F Y G MN V PI P G Q N F N W 1301 ATGGTTTGGCTTATTTTAGTTCTAGGAATTATTATGTGTTTGGGTCAC 1350 M VW L I L V L G I LL C V WV T 1351 TTGGTGGTTACATAAAAAAGATATGTTATAATGGAGAAAATCTCCAT 1400 W W L H K K D M L Stop (SEQ ID 1401 TTTTTTGCTCTTTGTGAAA ATTAATTAGTGATTGCAGATTATGAAGTT 1450 WO 98/07867 PCT/DK97/00336 86 1451 AGCAATGTTTGTTAAAACTATTITGTGAATTATTTATGAAAACGTTTAA 1500 1501 AAAAGTATAACAGATATTAAAATAATTGGAACTGTATTAGTAAAGAATCT 1550 EcoRI 1551 GTAATTTCTCTTGAATTCTGTTTGCTATTCTCAAACTGTATGATATAATG 1600 1601 AAGTTGTAATTTGAAACAGAAAGAACAAAGGAGATTTCAAAATGAAAACC 1650 pfl M K T 1651 GAAGTTACGGAAAATATCTTTGAACAAGCTTGGGATGGTTTTAAAGGAAC 1700 E V T E N I F E Q A W D G F K G T HhaI 1701 CAACTGGCGCGATAAAGCAAGCGTTACTCGCTTTGTACAAGAAAACTACA 1750 N W R D K A S V T R F V Q E N Y K Nucleotides 1-1750: SEQ ID NO:34 The sequence included an open reading frame, designated orfA encoding a putative 37 kDa protein with no relevant homology to any sequence in available databases.
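The HhaI fragment boundaries quoted for Table 3.5 can be checked with a few lines of Python. The sketch below scans a sequence for the HhaI recognition site GCGC and reports 1-based positions. The helper name `find_sites` is hypothetical, and the two excerpts are the rows starting at positions 1 and 1701 of Table 3.5, retyped here, so they may carry transcription errors.

```python
def find_sites(seq, site, offset=1):
    """Return start positions of every (possibly overlapping) match of `site`.

    `offset` is the coordinate of the first base of `seq` within the full
    sequence, so positions are reported in the numbering of the table.
    """
    seq = seq.upper()
    site = site.upper()
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start + offset)
        start = seq.find(site, start + 1)
    return hits


HHAI = "GCGC"  # HhaI recognition sequence

# Excerpts retyped from Table 3.5 (assumed to be free of OCR damage).
row_1 = "GCGCCTAGATAAGAAACAGCAACAGCTAAAAGATAGGTATCAAAAGCACT"
row_1701 = "CAACTGGCGCGATAAAGCAAGCGTTACTCGCTTTGTACAAGAAAACTACA"

left = find_sites(row_1, HHAI, offset=1)
right = find_sites(row_1701, HHAI, offset=1701)
print(left, right)
```

With these two excerpts the sites start at positions 1 and 1707, which agrees with the fragment limits given in the text.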
EXAMPLE 4

Characterization of L. lactis orfA encoding a putative transporter protein

In Gram-negative bacteria, the pfl gene is located downstream of, and transcribed together with, focA, an open reading frame that codes for a putative membrane-bound formate transporter (Suppmann and Sawers 1994). This genetic organization is conserved in E. coli and H. influenzae but shows great variation in streptococci (Arnau et al. 1997). In L. lactis, the orfA gene is located immediately upstream of pfl. An open reading frame is also found upstream of the pfl gene in Streptococcus mutans, but it shows no homology to the L. lactis orfA.
In E. coli, growth under anaerobiosis results in the synthesis of large amounts of PFL protein, about 3% of the total protein content (Suppmann and Sawers 1994). Consequently, large amounts of formate are formed intracellularly. At physiological intracellular pH in E. coli, formate (low pKa, 3.75) is dissociated and therefore is not membrane-permeable. Thus, there is a requirement for a specific transporter to remove the excess formate from the cells.
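The role of the pKa in this argument can be made concrete with the Henderson-Hasselbalch relation. The Python sketch below computes the fraction of formic acid present as the charged formate anion. The pKa of 3.75 is taken from the text, whereas the cytosolic pH of 7.5 is an assumed illustrative value and the function name is hypothetical.

```python
def ionized_fraction(pka, ph):
    """Fraction of a monoprotic acid present as the anion (A-) at a given pH."""
    ratio = 10 ** (ph - pka)  # [A-]/[HA] from the Henderson-Hasselbalch relation
    return ratio / (1 + ratio)


FORMIC_ACID_PKA = 3.75      # value quoted in the text
ASSUMED_CYTOSOLIC_PH = 7.5  # assumption for illustration, not given in the text

fraction = ionized_fraction(FORMIC_ACID_PKA, ASSUMED_CYTOSOLIC_PH)
print(f"{fraction:.4%} of formic acid is present as the formate anion")
```

Under these assumptions well over 99.9% of the acid is in the charged form, which does not readily cross a lipid membrane by passive diffusion.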
In the following, the novel orfA gene of L. lactis and its gene product are characterized.
1. The orfA gene structure, protein homology and structure

Sequence analysis of orfA (see Table 3.5 above) showed a "weak" RBS (AGG) and a consensus -10 promoter region upstream of the ATG start codon. No -35 consensus region was identified, suggesting a low expression level for this gene. The deduced protein encoded by orfA, consisting of 306 amino acids with a size of 37 kDa, showed homology (38% identity at the C-terminus) to a 37 kDa putative lactococcal protein (Donkersloot and Thompson 1995) and, to a lesser extent, to numerous membrane-bound transporter proteins. A prediction of the structure of OrfA suggested the presence of a large intracellularly located N-terminal region followed by two transmembrane domains, Leu 242 to Phe 265 and Asn 276 to Val 294 (Fig. ). These features are consistent with a possible role of the protein in transport across the cell membrane, although neither sequence homology nor structural similarities with the E. coli FocA protein could be identified. A molecular prediction of the FocA protein showed the presence of six transmembrane domains, but among the related proteins a certain variation in the number of these domains is found. In fact, one of these proteins, the E. coli NirC, has four and not six of these domains in its primary sequence (Suppmann and Sawers 1994).
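The text does not state which method was used to predict the transmembrane domains. As a hedged illustration of how such predictions are commonly approached, the Python sketch below slides a 19-residue window along a protein and flags windows whose mean Kyte-Doolittle hydropathy exceeds 1.6. The window size, the threshold, the function names and the toy input are assumptions: the hydrophobic core is retyped from the C-terminal part of the OrfA translation in Table 3.5, and the charged flanks are invented.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average Kyte-Doolittle value over a peptide segment."""
    return sum(KD[aa] for aa in segment) / len(segment)


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return 0-based start indices of windows with mean hydropathy above threshold."""
    return [
        i
        for i in range(len(protein) - window + 1)
        if mean_hydropathy(protein[i:i + window]) > threshold
    ]


# Toy input: hydrophobic stretch retyped from Table 3.5 (may contain OCR errors),
# flanked by made-up charged residues.
core = "MVWLILVLGILLCVWVTWW"
toy_protein = "DEKRDEKRDE" + core + "KKDEKRDEKR"

hits = hydrophobic_windows(toy_protein)
print(hits)
```

Windows centred on the hydrophobic stretch are flagged, while windows dominated by the charged flanks are not. A real prediction would also consider topology, which this sketch ignores.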
2. Expression of orfA

RNA was isolated from aerobic and anaerobic cultures of L. lactis MG1363 grown in fermenters at 30 °C. Using an orfA-specific probe (Fig. 7A), Northern blot hybridization was carried out. As shown (Fig. 7B), a low level of expression was observed under the conditions used, which is in agreement with the sequence analysis (lack of -35 region, short RBS) of the upstream region of orfA and with the level of expression expected for a gene encoding a membrane-associated protein.

No anaerobic induction was observed in GM17 or GalM17 during exponential growth. Lower expression of orfA was detected in GM17 than in GalM17, and virtually no expression of the gene was observed during stationary phase.
3. Construction and analysis of orfA mutant strains in L. lactis MG1363

In order to determine whether orfA is the focA analogue in L. lactis, two mutant strains of MG1363 were constructed. A null mutation was made by gene disruption using an internal fragment of the orfA gene (including codons 30-168, Fig. 7A), cloned into the integrative vector pSMA500 and transformed into MG1363. One transformant (MG1363ΔorfA) that formed light blue colonies on X-gal was selected. An orfA multicopy strain was constructed by cloning the entire coding sequence and promoter region of this gene in pAK80 and transforming into MG1363. As above, a transformant giving blue colonies on X-gal was selected (MG1363 pAK80::orfA).

In E. coli, a focA null mutant strain was capable of growing at higher sodium hypophosphite concentrations than was the wild type strain. This compound is a toxic formate analogue. Thus, transport of hypophosphite into the cytosol via the FocA channel protein is deleterious for the cells (Suppmann and Sawers 1994). If the OrfA protein has a similar function in L. lactis as does FocA in E. coli, then a null mutant should show an increased resistance to hypophosphite and a strain containing multiple copies of the gene should be more sensitive to this compound than the wild type. As shown in Fig. 8, strain MG1363 showed reduced growth when the medium was supplemented with 500 mM of hypophosphite and it did not grow at 600 mM. MG1363ΔorfA grew at 600 mM and was unable to grow at higher concentrations. The orfA multicopy strain, MG1363 pAK80::orfA, was completely unable to grow at 500 mM hypophosphite. Thus, these results support the view that OrfA may represent a formate transporter protein in L. lactis.
The mutant strains constructed included a translational fusion of the orfA gene to the lacLM reporter gene (Madsen et al. 1996). The effect of the addition of formate to the medium on the expression of orfA was studied. To exclude a possible toxic effect of the addition of formate to the medium, a dose-response curve was established. Growth inhibition of the wild type strain was observed at formate concentrations exceeding 10 mM. Exponentially growing cultures (OD600 about 1) were used to measure β-galactosidase after the addition of 10 mM of formate to the growth medium. As shown in Table 4.1 below, similar levels of β-galactosidase were observed in MG1363ΔorfA independently of the addition of formate or the growth conditions.
Table 4.1. Analysis of orfA expression in mutant strains of L. lactis.

                       Aerobic (a)                 Anaerobic (a)
STRAIN                 +Formate     -Formate       +Formate     -Formate
MG1363ΔorfA            9.1 ± 0.3    8.2 ± 0.7      7.5 ± 0.7    6.2 ± 0.1
MG1363 pAK80::orfA     14.6 ± 0.2   16.7 ± 0.7     13.2 ± 0.1   13.2 ± 1.3

a) β-galactosidase activity in exponentially growing cultures. At OD600 about 1, formate was added (+Formate) and the cultures were incubated further for 15 min before cells were separated by centrifugation and frozen.
Higher levels were observed in all cases with the multicopy strain MG1363 pAK80::orfA. These levels, about 2-fold higher, did not correlate with the number of copies (5-10 per cell) expected in this strain. A degree of regulation of expression may exist for orfA in L. lactis to ensure an appropriate level of the OrfA protein.
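The "about 2-fold" figure can be reproduced directly from the mean values in Table 4.1. The Python sketch below divides the multicopy-strain activity by the null-mutant activity for each condition. The dictionary keys are hypothetical labels, and the standard deviations from the table are left out for simplicity.

```python
# Mean beta-galactosidase activities from Table 4.1 (standard deviations omitted).
activity = {
    "null": {
        "aerobic+": 9.1, "aerobic-": 8.2,
        "anaerobic+": 7.5, "anaerobic-": 6.2,
    },
    "multicopy": {
        "aerobic+": 14.6, "aerobic-": 16.7,
        "anaerobic+": 13.2, "anaerobic-": 13.2,
    },
}

# Ratio of multicopy to null-mutant activity for each growth condition.
fold = {
    cond: activity["multicopy"][cond] / activity["null"][cond]
    for cond in activity["null"]
}

for cond, ratio in fold.items():
    print(f"{cond}: {ratio:.2f}-fold")
```

All four ratios fall between about 1.6 and 2.1, well below the 5-10 copies per cell expected for the plasmid.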
EXAMPLE 5

Isolating and characterizing the pfl gene from L. lactis subspecies lactis MG1363

1. Cloning of a fragment of the pfl gene

A pfl fragment was amplified with the above modified primers and pfll-1066 from chromosomal DNA of strain MG1363 (see Fig. ). This fragment was digested and cloned into the vector pGEM digested with XhoI and BamHI, respectively, and transformed into E. coli strain DH5α (Stratagene), resulting in strain MGpfl-1. The fragment was sequenced using the relevant primers derived from the sequence of the DB1341 pfl fragment (see Fig. 4).
The sequence of the MG1363 pfl fragment showed 48 differences (42 base changes and a 6 bp deletion) in the 1 kb region characterized when compared to the corresponding sequence of the DB1341 pfl (below Table ). The deduced Pfl protein fragments encoded by the characterized pfl sequences of strains MG1363 and DB1341 showed high homology. Only four sequence differences are found in a 336 amino acid stretch (below Table ): two amino acid substitutions (Pro 447 to Thr 473 and Asn 486 to Asp 486 in Table 5.1) and two adjacent deletions (Asp 454-Asp 455, encoded by the DB1341 pfl gene). The latter two residues are also present in the protein encoded by the S. mutans pfl gene.
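Difference counts of this kind come from comparing two aligned sequences column by column. The Python sketch below shows one way to do that, counting substitutions and gap columns separately. The function name is hypothetical, and the two 30-nucleotide excerpts are retyped from the MG1363 and DB1341 rows of Table 5.1, so they may contain transcription errors.

```python
def count_differences(a, b, gap="-"):
    """Count substitutions and gap columns between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    substitutions = 0
    gaps = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            gaps += 1
        elif x != y:
            substitutions += 1
    return substitutions, gaps


# Short aligned excerpts retyped from Table 5.1 (illustrative only).
mg1363 = "GTATTCGATATTGAACCTGTTCGTGATGAA"
db1341 = "GTATTCGACATCGAACCTGTTCGTGACGAA"

subs, gaps = count_differences(mg1363, db1341)
identity = 100 * (len(mg1363) - subs - gaps) / len(mg1363)
print(subs, gaps, f"{identity:.1f}%")
```

Applied to the totals reported in the text, 48 differing positions in roughly 1 kb correspond to about 95% nucleotide identity.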
A sample of E. coli DH5α strain MGpfl-1 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession No. DSM 11088.
Table 5.1. Homology between the DNA sequences of a fragment of the pfl gene isolated from L. lactis strain DB1341 (db1341pfl) and a fragment of the pfl gene of MG1363 (mg1363pfl). The comparison starts at the position of the Sau3AI site in the L. lactis DB1341 pfl gene (position 1342 in Table 3.2).
mgl363pfl db1341pfl consensus mgl363pfl dbl34lpfl consensus mgl363pfl dbl34lpfl consensus mgl363pf1 dbl34lpfl consensus mgl363pf1 dbl341pfl consensus mgl363pfl dbl341pfl consensus 1 GATCCAGAAA ATGAAGAAGG ACGTCATAAC CTCCAATACT TTGGTGcGCG 51 100 TGTTACCTGG TTTGAACGGT GGTTAC....
TGTAAACGTC TTGAAAGCAA TGTTGACTGG TTGAACGGT GGTTATGATG .TGTT..CTGG TTTGAACGGT GGTTA.....
101 150 GTTCATAA AGATTATAAA GTATTCGATA TTGAACCTGT TCGTGATGAA ACGTTCATAA AGATTATAAA GTATTCGACA TCGAACCTGT TCGTGACGAA GTTCATAA AGATTATAAA GTATTCGA.A T.GAACCTGT TCGTGA.QAA 151 200 ATTCTTGACT ATGATACAGT TATGGAAAAC TTCGACAAAT CACTCAACTG ATTCTTGACT ATGATACAGT TATGGAAAAC TTGACAAAT CTCTCGAcTG ATTCTTGACT ATGATACAGT TATGGAAAAC TT.GACAAAT C.CTC.ACTG 201 250 GTTGACAGAT ACTTATGTrG ATGCAATGAA TATCATTCAC TACATGACTG GTTGACTGAT AcTTATGTTG ATGCAATGAA TATCATTCAT TACATGACTG GTTGAC.GAT ACTTATGTTG ATGCAATGAA TATCATTCA. TACATGACTG 251 300 ACAAATATAA CTATGAAGCA GTTCAAATGG CCTTCTTGCC TACTAAAGTT ATAAATATAA CTATGAAGCA GTTCAAATGG CCTTCTTGCC TACTAAAGTT A.AAATATAA CTATGAAGCA GTTCAAATGG CCTTCTTGCC TACTAAAGTT WO 98/07867 mgl3 63pf1 dbl 34 lpf 1 consensus ingi36 3pf 1 dbl341pfl consensus mg1363pf1 dbl34lpfl consensus mgl3 63pf 1 dbl34lpfl consensus zngl3 G3pf 1 dbl34lpfl consensus mgl363pfl dbl34lpfl consensus Ingl363pfl db134 lpf 1 consensus mgl363pfl dbl34lpfl Consensus zngl363pfl dbl34lpfl consensus mgl363pf1 dbl34lpfl consensus mgl3E3pfl dbl34lpfl Consensus mgl3 63pf 1 dbl34lpfl consensus mg1363pf1 dbl34lpfl consensus 301
CGTGCTAACA
CGTGCTAACA
CGTGCTAACA
351
ACTCAGCG
ACTTrCAGCA
ACTTTCAGC.
401
GCTACATCTA
GCTATATCTA
GCTA.ATCTA
451
GATGATGACC
GATGATGATC
GATGATGA. C 501
TGAAAAATTA
TGAA~AAATTA
TGAkPJLATTA 551
CACTTTTGAC
CACTTTrGAC
CACTTTTGAC
601
TCTCCAGTTC
TCTCCAGTAC
TCTCCAGT. C 651
ATCTAAACTT
ATCTAAAC'TT
ATCTAAACTT
701
AAGGTGGATG
AGGGTGGITG
A. GGTGG. TG 751
GATGCAAATG
GATGCAAATG
GATGCAAATG
801
TGGTAAAACT
TGGTAAAACT
TGGTAAAACT
851
GATACTTCAC
GATACTTCAC
GATACTTCAC
901
CACGTTAACT
CACGTTAACT
CACGTTAACT
TGGGATTTGG
TGGGATTrrGG
TGGGATTTGG
ATTAAATATG
ATTAAATATG
ATTAAATATG
CGATTATGAA
CGATTACGAA
CGATTA .GAA
GTGCTGATGA
GTGCTGATGA
GTGCTGATGA
GCTTCACACA
GCTTCACACA
GCTTCACACA
AATCACATCT
AATACATCT
AAT .ACATCT
ATAAAGGAGT
ATAAAGGAGT
ATAILAGGAGT
GAATTCTTCT
GAATTTCT
GAATTCTTCT
G'ITGCAAAAT
GTTGCAAAAC
GTTGCAAAA.
ACGGTATTC
ATGGTAT=C
A. GGTATTTC
CGTGATGAAC
CGTGATGAAC
CGTGATGAAC
ACCAGGAGCT
ACCAGGTGCT
ACCAGG.GCT
TGAACGTTAT
TGAACGTAAT
TGAACGT .AT
TATCTGTGGT
TATCTGTGGA
TATCTGTGG.
CTAAAGTTAA
CTAAAGTTAA
CTAAAGTTAA
GTAGAAGGTG
GTAGAAGGTG
GTAG.AAGGTG
TATCGCTAAA
TATTGCTAAA
TAT. GCTAAA
AACTTTACAA
AACTTTACAA
AACTTTACAA
AACGTTGCT
AACGTTGCTT
AACGTTGCTT
ATTCCTCAAT
AMTCCTCAPLT
ATTCCTCAAT
CACCAGGTGC
CACCAGGTGC
CACCAGG2TGC
CTTCGTTCAT
CTTCGCTCAT
CTTCG .TCAT
ATTAACTACT
ATTGACTACT
ATT .ACTACT
AAGTAGATAA
AAGTGGATAA
AAGT GATAA
TTGATTAATG
TTGATTAATG
TTGATTAIATG
GGACCTTAAA
GGACCTTAAA
GGACCTTAAA
TTCGCAAATA
TTCGCAAATA
TTCGCAkATA
AACTTGCGT
AACATTGCGT
AAC .TTGCGT
ACTTCCCACG
ATTTCCCTCG
A.TTCCC.CG
CTTGTCATGA
CTTGTCATGA
CTTGTCATGA
AAATGCTGAA
AAATGCTGAA
AAATGCTGAA
ACTCTAAACA
ACTCTAAACA
ACTCTACA
GAAGATGGTA
GAAGATGGTA
GAAGATGGTA
TAACCCATCT.
TAACCCATCT.
TAACCCATCT
TAGCTAAATT
TGGCTAAGTT
T.GCTAA.TT
CAAGTTTCTC
CAAGT=CAC
CAAGTTTC.C
CTTGcITTCAA
CTTGGTTCAA
CTTGGTTCAA
GTACTGAATT
GTACTGAATT
GTACTGAATT
GATGTTTACG2
GATGTTACG
GATGTTTACG
PCT/DK97/00336 350 CAGTTc4ATrC
CAGTTGATTC
CAGTTGATTC
400
GATGAAAATG
GATGAAAATG
GATGAA.AATG
450
ITATGGTGAA
'IWATGGTGAA
TTATGGTGAA
500
AAATGTACCA
AAATGTACCA
AAATGTACCA
550 GCTACTGTI-r
GCTACTGTTT
GCTACTGTTT
600
AACTGGTAAC
AACTGGTAAT
AACTGGTAA.
650
CAGTCAACAA
CAGTAAATAA
CAGT .AA .AA 700
A~ACAAAGCTA
AATAAAGCTA
AA AAGCTA 750
GGAATTCAAA
GGAATTCAAA
GGAATTCAAA
800
CTCGTGCACT
CTCGTGCACT
CTCGTGCACT
850 46TTCTTGATG ELT1TCTTGATG kTTCTTGATG 900
E'GCAGGTCAA
rGCAGGTCAA rGCAGGTCAA 950 kTAAAATCAT kTAAAATCAT kTAAAATCAT WO 98/07867 mgl363pfl db1341pfl consensus mgl363pf1 dbl34lpfl consensus mgl363pfl db1341pfl consensus dbl34lpfl consensus db134lpf1 consensus dbl34lpfl consensus dbl34lpfl consensus 951
GCGTGGTGAA
GCGTGGTGAA
GCGTGGTGAA
1001
AATACCTCAC
AATACcTCAC
AATACCTCAC
1051
GAAGTACTTT
GAAGTTCTTr GAAGT. CTTT 1101
ATTCTTAAAA
1151
TTTITTTACGA
1201
TAAGAAAGTT
1251 ATrGACAGGC GATGTTATCG TTCGTATCTC TGGATACTGT GATGTTATCG TTCGTATCTC TGGTTACTGT GATGTTATCG TTCGTATCTC TGG.TACTGT ACCTGAACAA AAACAAGAAT TGACTGAACG ACCAGAACAA AAACAAGAAT TAACTGAACG ACC.GAACAA AAACAAGAAT T.ACTGAACG CAAACGATGA TGAAGAAGTA AT (SEQ ID CAAACGATGA TGAAGAAGTA ATGCATACTT CAAACGATGA TGAAGAAGTA TTTAATGAAT ATTCGGTCTG TCAGTTTTAC AAAAATTAAT CATAATAGTT AAAAACTATT AAATTITATG CTAAAATAGA TGAATGAAAA GGAATTGCGA KTGGGAAATC AACGGTGGTT 1100 PCT1DK97/00336 1000
GTTAACACTA
GTCAATACTA
GT.AA.ACTA
1050
TGTCTTCAT
TGTCTTCCAT
TGTCTTCCAT
NO: 22)
CAAACATCTA
1150
TGACAGACTT
1200
GTTTTAGTT
1250
TGGTAATTGG
1300 GATTTrrGA

db1341pfl: corresponding to nucleotides 1342-2641 of SEQ ID NO:1S

Table 5.2. Multialignment of the putative Pfl protein from L. lactis strains MG1363 (partial sequence: 1) and DB1341 with the deduced amino acid sequences of known cloned bacterial pfl genes

The L. lactis Pfl proteins were aligned with the following known Pfl proteins: deduced proteins of S. mutans pfl E. coli pfl3 and pflB genes (Accession Nos. P42632 and P09373; 4 and H. influenzae Pfl C. pasteurianum Pfl Consensus (con) shows conserved positions (bold) among all of the protein sequences. The four amino acid differences between the MG1363 and DB1341 Pfl are shown underlined and in bold at the top (1)
2 3 4 5 6 7 con 2 3 4 6 7 con 2 3 4 6 7 con 2 3 4 6 7 con 2 3 4 6 7 con 2 3 4 5 6 7 con 2 3 4 6 7 con MKrEVTENI
MATVKTNTDV
MKVDIDTSDKL
SELNEK
SELNEM
FEQAWDGFKG
FEKAWEGFKG
YADAWLGFKG
LATAWEGFTK
QKLAWAGFAG
LFKQWEGFQD
W GF TNWRDKASVT RFVQENYKPY DGDESFLAGP TDWKDRASIS RFVQDNYTPY DGGESFLAGP TDWKNEINVR DFIQHNYTPY EGDESFLAEA GDWQNEVNVR DFIQKNYTPY EGDESFLAG3A GDWQENVNVR DFIQKNTYTPY EGDDSFLAGP GEWTNDVNVR DFIQKNYKEY TGDKSFLKGP W F QNY Y G SFL T
IEDTKNMHE
VEETKAHYEE
VMEGIRIEKA
VMEGVKLENqR
VMEGIKIENR
AVS-LILEEL
GLRVAEKILT
GIRMAETALK
GINMIKSSFH
GIKMIEGSCK
GIKMVEGSCK
GIRMAEQSLK
G
RGRI IGVYAR RGRI IGVYAR RGRI IGDYRR
RGRIIGDYRR
RGRI IGDYRR RGRI IGDYRR RGRIIG Y R VGFPFDTD- TRFPNDT-
THAPVDFDTN
THAPVDFDTA
THAPLDFDEH
KKGILDVDTE
D
EHGLSVDPGL
EHGYEPDPAV
AYGREMDSEF
AYNREIJDPMI
VYGRELDPKV
EYGFKISDEM
LALYGADYLM
LALYGADYLM
VAL.YGISYLV
VALYGIDYLM
VALYGVDFLM
VAIIYGIDFLI
ALYG L RVTS IDKIPA
RITSIADIPA
IATTITAHDA
VASTITSHDA
TPSTIISHAP
TISGINSFKP
I
HDVLSQTMTS
HEIFTKYATT
EYLFTDLRKT
KKIFTEYRKT
KKIFTEYRKT
HNIFTNYRKT
KEKAKEWDAI
QEKVNDWNSI
RERELQFADL
KDKLAQFTSL
KDKYAQFSSL
QEKKKDLSNL
GYIDAIDKEL
GYID- -KEN GYIN QFIJ GYIN- KQIJ GYIN- KDL GYLD- KDN
GY
VNDGIFRAYT
VNDGIFRAYT
HNQGVFDVYS
HINQGVFDVYT
HNQGVFDVYT
HNQGVFDAYS
G F Y
TEIN
AEID
QSRLEKGEDL
QADLENGVNIJ
QKDLEDGVNL
KGDML
ELIYGMQNSE
ELIFGIQNDE
EKIVGLQTDA
EKIVGLQTEA
EKIVGLQTDE
EVIVGFQTDA
ElI G Q SAl RKARHAH
SNIRRARHAH
PDMVIRCRKSG
PDILRCRKSG
PDILRCRKSG
EETRIARSAG
R
EENIRLKEEI
EESIRLREEI
EAT IRLREEL
EQTIRLREEI
EKATIRLREEI
DELIRLREEV
IRLREE
TERTLKVKKI
TERSLHIKKV
TPATTELWEK
TEATTTLWDK
TE-ATTKLWEs
TEKTKKVWDK
120
LFRLNFMPRG
LFKLNFMPKG
PLKRALHPFG
PLKRALIPFG
PLKRAIMPFG
PLKRITNPFG
P G 180
TVTGLPDAYS
TVTGLPDAYS
VLTGLPDGYG
VIJTGLPDAYG
VLTGLPDAYG
VLTGLPDAYG
TGLPD Y 240
INMQYQALQEV
NLQYQALGEV
AEHRHALLQI
AEQHRALGQM
.AEQHRAJJGQL
SEQIRAIDEI
AL
VNFGALYGLD
VRIAGDLYGIAD
QEMAAKYGFD
KEMAAKYGYD
KQMAASYGYD
KKMALSYGVD
YG D VSRPAMI4VKE
VRKPAMNVKE
ISRPAQNAQE
ISGPATNAQE
I SNPATKAQE
ISRPAVNAKE
PA N E AIQWVNIAYM AVCRZVINGAA AIQWINIAFM AVCRVINGAA AVQWLYFAYL AAVKSQNGGA AIQWTYFGYL AAVKSQNGAA AIQWMYFAYL AAIKSQNGAA AAQFLjYFGYL AGVKENNGAA AQ0 A XG A
TSLGRVPIVL
TSLGRVPIVL
MSLGRTASFL
MSFGRTSTFL
1MSFGRTATFI
MSLGRTSTFL
S GR 300
DIFAERDLAR
DIFAERDLAR
DIYIERDFKA~
DVYIERDLKA
DVYIERDLKA
DIYIERDLEQ
D ERD
GTFTEQEIQE
GTFTESEIQE
GVLNEQQAQE
GKITEQEAQE
GKITETEAQE
GLITEDEAQE
G QE
YRFLNTLDTI
YRFLNTLDNI
FRYLHTLHTM
FRFLNTLYTM
FRILHTLYNM
FRYLHTLINL
L TL
FVDDFVLKLR
FVDDFVMKLR
LIDHFIMKIR
MVDHLVMKLR
LVDHLVMICLR
VIDQFIIKLR
D K R
GNAPEPNLTV
GNAPEPNLTV
GPAPEPNLTI
GPSPEP'MTI
GTSPEPNLTI
GSAPEPNMTV
G PEPN T
TDMCFARAAAY
TVKFARTKAY
MVRFLRTPEF
MVRFLRTPEY
MVRFLRTPEY
LVRHLRTPEY
R
LWDSKLPYS F LWSSKLPYS F
LWSEELPIAF
LWSEKLPLNF
LWSEQLPENF
LWSENLPESF
LW LP F
DELYSGDPTF
DELYSGDPTF
DSLFSGDPIW
DELFSGDPIW
DQLFSGDPMW
NELFAGDPTW
L GDP
KRYSMSMSHK
RHYCMSMSHK
KKYAAQVS IV KKFAAKVS ID KRFCAKVS ID KKFCAEMS IL s
ITTSMAGMGN
ITI'SMAGMGA
ATE VIGGMGL ATES IGGMGL
ATETIAGMGL
VTESIAGVGI
T G G
HSSIQYEGVE
HSS IQYEGVr
TSSLQYENDD
TSSLQYENDD
TSSVQYENDD
TDSIQYENDD
S QYE 360 DGRHRVTK4D DGRHRVTKbMJ
DGRTLVTKNS
DGRTLVTKNS
DGRTLVTKNT
DGRSLVTKNS
DGR VTK 420
TMAKDGYGEM
TMAKEGYGEM
LMRTDFNSDD
LMRPDFNNDD
LMRPDFNNDD
IMRPI -YGDD m WO 98/07867 WO 9807867PCT/DK97/00336 1 2 3 4 6 7 con 2 3 4 5 6 7 con 1 2 3 4 6 7 con 1 2 3 4 6 7 con 1 2 3 4 6 7 con 1 2 3 4 6 7 con
SCISCCVSPL
SCISCCVSPL
YAIACCVSPM
YAIACCVSPM
YAIACCVSPM
YAIACCVSAM
I CCVS
ILDYDTVHEN
ILDYDTVMENq
VLDFETVICAN
VLDYDKV'DUS
VLNYDEVMER
VLDFDTVMTR
VLDYEKVKEN
L V
DPENEEGRHN
DPENEDRRHN
VIG--KQ
IVG--KQ
IVG--KQ
RVG--KD
FDKSLNWLTD
FDKSLDWJTD
FEKAILDWLTD
LDHFMDWLAV
MDHFDWLAC
MDSFIMDWLAC
YFKVLEYMAG
LQYFGARVISV
LQYFGARVNV
MQFFGARAN5L
MQFFGARANL
MQFFGARANL
MQFFGARCNL
Q PGAR N TYVDAMNI IH
TYVD
1 MNI IH TYVDAMNI IH QYISALNI IH
QYITALNIIH
QYVTALNVIH
LYVNTMNI IH Y N IH
LPGLNG
LKANLTGLNG
LKALLTGLNG
AKTLLYAING
AKTMrLYAING
AKTLLYAING
AKCLLLAING
K L NG
YMTDKYNYEA
YIMKYNYEA
YMTDKYNYEA
YIVIHKYSYE-A
YbMDKYSYEA GY- -VHKDYK
GYDDVHKDYI(
GYDDVHKDYK
GVDEKLKIQV
GVDEKLIO4QV
GIDEKLGMQV
GVDEKKGIKV
G
VQMAFLPTKV
VQMAFLPTKV
VQMAFLPTRV
SLMALHDRflV
SLMALHDRDV
480
VFDIEPVRDE
VFDIEPVRDE
VFDVEPIRDE
GPKTAPLMDD
GPKSEPIKGD
GPKTAPITDE
VPDIEPITDE
p 540
RANMGFGICG
R.ANMGFGICG
KANMGFGICG
YRTMACGIAG
IRTMACGIAG
YRTMACGIAG
GRLMAFGIAG
M GI G YMHGDKYSYEA ALMALHDMRDV FMHDKYAYEA SQMALHDTKV YM DKY YEA MA V
FANTVDSLSA
FANTVDSLSA
FSNTVDSLSA
LSVATDSLSA
LSVAADSLSA
LSVAADSLSA
FSVAADSLSA
DSLSA
LVDG~MYEKL
LVMKMYHEKJ
WLLEAFHT.L
DLVERFMKKI
DLVERFMKKI
DLV ERFMGCKI
EIVEKFSDEL
EFFSPGANPS
EFFSPGANPS
EFFSPGANPS
TPFAPGANPM
APFGPGANPM
APFGPGANPM
EPLAPGANPM
PGANP
QILDGYFTPG
QILDGYFTPG
TILDGYF
GLLDGYFHHE
GIMGYFHHE
GLMYDGYFHHE
SIMGGYF-
GYP
IKYAKVKTLR
IKYAKVKTLR
IKYATVKPIR
IKYA1RVKPIR
IKYAKVKPIR
IKYAKVKPVR
IRYAKVKPIR
IKYA VK R
ASHKLYKNAE
ASHKLYKNAE
AIRHKLYKDSE
KALIPTYRNAV
QKLHTYRflAI
QKLKTYRNAV
KKHPTYRNAK
Y
NKA- KGGWLQ NKA- KGGWLQ NKA- SGGWLQ
HGRDRKGAVA
HGRDQKGAVA
HGRDQKGAVA
HGRDMEGALA
G
ALINGTEFAG
AIINGTEPAG
EGGG
ADV- EGG ASI EGG ATV- EGG
GQGA
DEN DEN DEN GDIKDKDGN V -EN
ATVSLLTITS
ATVSLLTITS
ATVSLLTITS
PTQSILTITS
PTQSVLTITS
PTQSVLTITS
HTLSVLTITS
T S LTITS
NLRSLAKLEF
NLRSLAKLEF
NLNSLKKADF
SLTSVAKLPF
SLTSVAKLPF
SLTSVAKLPF
SLNSVAKVPY
L S K
QHVNLNVM~DL
QHVNLNVMDL
QHVNLNVIDL
QHLNVNVMNR
QHLNVNVNR
QHLNVZNVLNR
HHLNVNVLNR
H N N~V
GYIYDYE
-GYIYDYE
-GYIYDYE
-GLAVDFE
-GLAIDFE
VATNVAIDFE
GITVDFV
D
NVAYSKQTGN
NVAYSKQTGN
NVAYSKQTGN
NVVYGQKTGN
NVVYGKKTGN
NVVYGKKTGN
NVMYGKKTGT
KNVY TGN
KDANDGISLT
KDANDGISLT
AHANDGISLT
TYAKDGISYT
AYAKDGISYT
AYAKDGISYT
VCCEDGVSIqT DG S T
KDVYDKIMRG
KDVYDKIMRG
KDVYDKIMNG
EMLLDAIEHP
EfLJLDAMENP EMvLLDAMENP
ETLIDAMUNP
D
VEGDFPRYGE
VEGDFPRYGE
TVGNFPRYGE
IDGEYPQYGN
IEGEYPQFGN
IEGEYPQYGN
KEGDFPKYGN
G P G
SPVHKGVFIJN
SPVHKGVFLN
SPVHKGVYLN
TPD TPD TPD TPD p
TQVSPRALGK
TQVSPRALGK
TQVSPKALGK
FS IVPAALGK FS IVPNALGK
FSIVPNALGK
FSIVPDALGN
P ALG EDV- IVRI EDV- IVRI EDV- IVRI
EKYPNLTIRV
EKYPQLTIRV
DKYPQLTIRV
DKYPTLTiRv
R
600
DDDRADDIAK
DDDRADDIAK
DDDRVDS IAE NDERVDS IAC
NDPRVDDLAV
NDNRVDDIAC
DDDRVDSIAV
D RD A 660
EDGTVNKSKL
EDGTVNKSKL
EDGSVNLSKV
GRRAG
RRAG
GRRAG
GRKVG
720
TRDEQVDNLV
TRDEQVDNLV
TFDEQVANLV
EDPVRXTNLV
DDEVRJKTNLA
DAEAQRRNLA
DHDVRINNiLV
NL
780
SGYCVNTKYL
SGYCVNTKYL
SGYCVNTKYL
SGYACASTH
SGYAVRFNSL
SGYAVRFNSL
SGYAVNFNRL
SGY
1 TPEQKQELTE RVFHEVLSND DEEV (SEQ ID NO:23)
2 TPRQKQELTE RVFHEVLSND DEEVMHTSNI Z (SEQ ID NO:16)
3 TKEQKTELTQ RVFHEVLSD DAATDLVNNK Z (SEQ ID NO:24)
4 (SEQ ID NO:19)
5 TKEQQQDVIT RTFTQSM (SEQ ID NO:14)
6 TKEQQQDVIT RTFTESM (SEQ ID
7 SKDHQKEVIS RTFHEKL (SEQ ID
con

2. Cloning and sequencing of the entire pfl gene of L. lactis strain MG1363

The entire pfl gene sequence was obtained from L. lactis subsp. cremoris strain MG1363 using PCR. Like the pfl coding sequence of L. lactis strain DB1341, the coding sequence of MG1363 comprises 2363 bp and encodes a 787 amino acid PFL protein having a predicted molecular weight of 89.1 kDa.
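The relation between the protein length, the coding length and the molecular weight can be sanity-checked with simple arithmetic. The Python sketch below is only a rough check: the average residue mass of 110 Da is a common rule of thumb and not a value from the text, and the function names are hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not from the text


def coding_length_bp(n_residues, include_stop=True):
    """Number of base pairs needed to encode a protein, with or without a stop codon."""
    return 3 * (n_residues + (1 if include_stop else 0))


def approx_mass_kda(n_residues):
    """Rough molecular mass estimate from residue count alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


print(coding_length_bp(787), coding_length_bp(787, include_stop=False))
print(round(approx_mass_kda(787), 1))
```

For 787 residues this gives 2364 bp with the stop codon and 2361 bp without it, so the stated 2363 bp lies between the two, and the text does not say how the endpoints were counted. The rough mass estimate of about 86.6 kDa is close to the predicted 89.1 kDa. A composition-based calculation would be needed for a closer figure.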
Table 5.3. The cgmplete sequence of the Pfl locus of L. lactis gtrain MG1363 1 TTGGGCTATAAGGAAATTGTTCTGcTGATTrTT=AAAGTTTAGATATAGG 51 TTTAGGGGTrCATG=rGAATTTCAAAAAAAGTCTCCTCAAGTTAATAAG 100 101 =TATTATATCACAAGATATTMTAGACCAACTTCCTTCA JJACT 150o 151 TCGTTAAGGCTTG TA TAGGAAATGGAAAATCTGCT 200 201 ACAATTAGAAGGAGAAGAAGAGGATTAAATCCTTTT rATTAGGAAAAG 251 AAGGGATAGATAGGCTGATATGATAA AATTATGAACTATCCAATGA 300 orfA M I K N Y E L S N E K Sau3AI 301 AAAAATTGATCTCAACTTCTGAGATGAAGAATTCACTTATGTCCTAT 350 K L IS TS E M K N F T Y V L N 351 CCAACACGTGAAGAAATTGGGAATATCTCAGAACACTATGAT.I-TCCT 400 P T R E ElI G N I S E H Y D F P F 401 TGACTATCTATcTGG ATTTAGATGAcTATGAA TGCCCGTTTTGAAJA 450 D Y L S G I LD D YE N AR F E T 451 CAGATGATAATGACAATAATCTGATTCITTTGCAATATcCcGCCTTGTCC 500 D D N D N N L I L L Q Y P AL S 501 AACTATGGAGAAGTGGCCAcTiTT-,CCATATTCTTTGGTTTGGACTAAGA 550 N Y G E V AT F P Y S L V W T K N 551 TGAATcGGTTA ErTTGGCCCTTAACCATGAAATTGATAATGGTCTCATTT 600 E S V I L A L N H ElI D N G L I F WO 98/07867 PCT/DK97/oo336 97 601 TTACAATTATTACGTTACC ATATTCAA 650 E R E Y D Y K R Y K H Q L I F Q 651 GTGATGTACCAATGACTCATACTTTTCATGATTA GAGAGACT-T.AG 700 V M Y Q M T H T F H D Y L R D F R 701 AACAAGGCGCCGCCGGCTTGAAGTTGGTATCAJJAJA12TJ ACA ZATG 750 T R R R R L E V G I1K N S T K N D 751 ACCAAATTGTTGAO ATGCATAGGAGTGTTATTA 800 Q I V DL I A I Q AS L I Y F E 801 GAGGTCCAATTGAGTT GA- ATTGATTACTT 850 D A L H N N M Q VL Q N F I D Y L 851 ACGAGAAGATGATGAGATGGTTTTGCCGAAAAATCTATGATATTI=G 900 R ED D E D G F A E K I Y D I F V 901 TCGA ACAGACCAAGCTTATACAGAA CCAAGATTCAGCTCAIAG'TACTA 950 E T D Q A Y T E T K I Q L K L L 951 GAAAATCTCCGAGATTTGT-rCTCAACATTGTCTCTAATAATTGAATAT 1000 E N L R D L F S N I V S N N L N I 1001 CGTCATGAA ATATGACCTCAGCAACATTTGTTCTAGGTA'rTCCGGCGG 1050 V M K I M T S AT F V L G I P A V 1051 TTATrGTCGGCTT TATGGAATGAATGTTCCGATTCCTGGTCAAATTT 1100 I V G F Y G M NV P IP G Q N F 1101 AATTGGATGGTCTGGCTCATTTTGGTGTTTGGAA =ATTATGTG'I=G 1150 N W MV WL I L V F G IL L C V W 1151 
GGTTACTGGTGGCTACACAAAAAGATATGTTATGAATGGAGAAATIT 1200 V T W W L H K K D M L Stop (SEQ ID NO:37) 1201 CTCCGTTTT TATCTTTGTGAAAAAT TAGTGATAATAAATCATG 1250 1251 AAGTTAGCAATGTTGTCA AGCTATTTAGTGAATTAATATGAAAACGT 1300 1301 TTAAAGTAGTAAAAATATG CTGTATTAGTAAAG 1350 EcoRI 1351 AATCTGTAATTTCTCTTGAATTCTGTTTGCTATTATCA1CTGTATGATA 1400 1401 TAATGAAG~rGTAA TTGAA -GAAAGAAAGGAGATTCAAAATGA 1450 pf 1 M K 1451 AAACCGAAGTTACGGAAATATCTTTGACAGCTGGGATGGTTTAAA 1500 T E V T E N I F E Q A W D GF K 1501 GGAACTAACTGGCGCGATAA GCAAGCGTTACTCGCTTTGTACAGAAAA 1550 G T N W R D K A S V T R F V Q E N 1551 CTACAAACCATATGATGGTGATGAAAGCTTTCTTGCTGGGCCAACAGAAC 1600 Y K P Y D G D E S F L AG P T E R 1601 GTACACTTAA GTAAAGAAAATTATTGAAGATAC AATCACTACGAA 1650 T L KV K K I I E DT K N HY E 1651 GAGTAGGATTTCCCTTGATACTGACCGCGTACCTTATCGATAAT 1700 E V G F P F D T D R V T S I D K I 1701 TCCTGCTGGATATATTGATGCTAATGAT AGAACTTGAACTCATCTATG 1750 WO 98/07867 PCT/DK97/00336 98 P AG Y I DA N D K E L E LI Y G 1751 GGATGCAAATAGCGAACTTTCCGCTTACTT-CATGCCAAGAJGTGGT 1800 M Q N S EL F R L N F M P R G G 1801 CTTCGTGTTGCTGAAAAGATTTTGACAGAACACGGTCTTCGTTGACCC 1850 L R V A E K IL T E H G L S V D P 1851 AGGTTTGCATGATGT rGTCACAAACAATGACTTCTGTAAATGATGGAA 1900 G L H D V L S Q T MNT S V N D G I 1901 TCTCTCTTCTACATCTAGAGCCCCCC 1950 F RA Y T S AlI R K A R H A H T 1951 GTAACAGGITTGCCTGATGCATACTCTCGTGGACGTATCTCGGGGTATA 2000 V T G L P D A Y S R G R I I G V Y 2001 TGCACGTCTTGCTC=TATGGAGCTGACTACCTTATGAAGGAAAGCAA 2050 A R L AL Y G AD Y L M KE K A K 2051 AAGAATGGGATGCAATCACTGAAATTAATGATGATA CTTCGTCTTAAA 2100 E W D A I T E I N D D N~ I R L K 2101 GAAGAAATAACATGCAATACCAAGC rGCAAGAGrTGTAAACITTGG 2150 E EI N M Q Y Q A L Q E VV N F G 2151 TGCTTTGTATGGTCTTGACGTTTCTCGTCCAGCGATGAACGTAAAGAAG 2200 A L Y G L D V S R P AM NV K E A 2201 CAATCCAATGGGTTAATATGCATACATGGCAGGTCGTGTATCAAT 2250 I Q W V N I A Y M A V C R V I N 2251 GGGTCATCCTGCTTGCAC7CTAACTG 2300 G AA T S L G R V P I V L D I F A 2301 AGAACGTGACCTTGCTCGTGGAACATTACTGAGCAAGAAATCCAAGAAT 2350 E R D L A R G T F T E 
Q ElI Q E F 2351 TTGTTGATGATTTCATTTTAAAACTTCGTACAATGAAATTTGCTCGTGCT 2400 V D D F IL K L R T M K F A R A 2401 GCTGCTTATGATGAACTTATTCTGGTGACCCCACGTTCATCAACATC 2450 A AY D EL Y S G D P T F I TT S 2451 TATGGCTGGTATGGGTAATGACGGACGCCACCGTGTACTAATGGACT 2500 M AG M G N D G R H R V T K M D Y 2501 ATCGTTTCTTGAACACACTTGATACAATCGGA TGCTCCAGAACCAAAC 2550 R F L N T L D T I G N A PE P N 2551 TTGACAGTTC=TGGGACTCTAA CTCCCATATTCAT ACGTMTC 2600 L T VL W D S K L P Y S F K R Y S 2601 AATGTCTATGAGTCACAAACACTCATCTATCCTATGAGGTGTGA 2650 M S M S H K H S S I Q Y E G V E T 2651 CATGGCTAAGATGGATATGGCGAATGTCTGTATCTCTTGTTGTGTC 2700 M A KDG Y G E M S C I S C C-V 2701 TCCATGCCGAAGAAAGCTAATTCAAT 2750 S P L D PE N E E G R H N LQ Y F 2751 TGGTGCGCGTGTAAACGTCTTGAAAGCAATGTTGACTGGTTTGAACGGTG 2800 G A R V N V L K A M L T G L N G G WO 98/07867 PCT/DK97/00336 99 2801 GTT-ACGATGACGTTCATAAAGATTATAAAGTATTCGATATTGAACCTGTr 2850 Y D D V H K D Y K V F D IE P V 2851 CGTGATGAAATTCTTGACTATGATACAGTTATGGAAAACTTCGACAAATC 2900 R D ElI L D Y D TV M E N F D K S 2901 ACTCAACTGGTTGACAGATACTTATGTTGATGCAATGAATATCATTCACT 2950 L N W L TD T Y V D AM N I I H Y 2951 ACATGACTGACAAATATAACTATGAAGCAGTTCAZAATGGCCTTCTTGCCT 3000 M TD K Y N Y E A V Q M A F L P 3001 ACTAAAGTTCGTGCTAACATGGGATTGGTATCTGTGGTTCGCAAATAC 3050 T K VR A N M G F G I C G F A NT 3051 AGTTGATTCACTTTCAGCGATTAAATATGCTAA GTTAAMACTTTGCGTG 3100 V D S L S A I K' Y A K V K T L R D 3101 ATGAAAATGGCTACATCTACGATTATGAAGTAGAAGGTGACI.TCCCACGT 3150 E N G Y I Y D YE V E G D F P R 3151 TATGGTGAAGATGATGACCGTGCTGATGATATCGCTAACTTGTATGAA 3200 Y G E D D D R A D D I A K L VM K 3201 AATGTACCATGAAAAATTAGCTTCACAACTTACAAAAATGCTGAAJG 3250 M Y H E K L A SH K L Y K N A E A 3251 CTACTGTTTCACTTTTGACAATCACATCTAACGTGCTTACTCTAAAA 3300 T V S L L T I T S N VA Y S K Q 3301 ACTGGTAACTCTCCAGTTCATAAAGGAGTArCCTCAATGAAJGATGGTAC 3350 T G N S P V H K G V F L N E D G T EcoRI 3351 AGTCAACAAATCTAAACTTGAATTCTTCTCACCAGGTGCTAACCCATCTA 3400 V N K S K L E F F S P G AN P S N 3401 
ACAAAGCTAAAGGTGGATGGTTGCAAATCTTCG7TCATTAGCTAAATTG 3450 K AK G G W L Q N L R S L A K L EcoRI 3451 GAATTCAAAGATGCAAATGACGGTA ITTCATTAACTACTCAAGTTTCTCC 3500 E F K D A ND G I S L T T Q V S P 3501 TCGTGCACTTGGTAAAACTCGTGATGAACAAGTAGATAACTTGGTCAAA 3550 R A L G K T R D E Q V D N L VQ I 3551 TTTTGATGGATACTTCACACCAGGAGCrI'GATTAATGGTACTGAATTT 3600 L D G Y F TP G A LI NOGT E F 3601 GCAGGTCAACACGTTAACTTGAACGTTATGGACCTTAAAGATGTTTACGA 3650 A G Q H V N L N V M D L K D V Y D 3651 TAAAATc-ATGCGTGGTGAAGATGTTATCGTTCGTATCTCTGGATACTGTG 3700 K I M R G E D VI V R ISOG Y C V 3701 TTAACACTAAATACCTCACACCTGAACAAAACAAGAATTGACTGAACGT 3750 N T K Y L T P E Q K Q E L T E R 3751 GTCTTCCATGAAGTACTTCAAATGATGATGAAGAGTAATGACCTTC 3800 V F H E V L S N DD E E V M H T S WO 98/07867 PCT/DK97/00336 100 3801 AAATATCTAATTCTTAGTATTAAAAAATATAAGGTCTGTCAGTTCTACTG 3850 N I Stop (SEQ ID NO:39) 3851 ACAGACT I"r iCTATAAATTAATTATAATAGTTAAAAACTATTATTTT 3900 3901 TAGTTTAAGAAAAATAAAATTTGTGCTAAAATAGATAATGATAAAGGTA 3950 3951 ATTGGATTAACAGGCGGAATTGCGAGTGGGAAATCAACGGTGGTTGATTT 4000 4001 TTTGATTTCTGAAGGTTATCAAGTAATTGATGCTGACAAAGTTGTTCGTC 4050 4051 AGTTGCAAGAACCTGATGGGAAACTTTTrTAATGCAATAATGGAAACTTTC 4100 4101 GGTTCAGATTTTACTGACGAAAATGGGAAATTAAACCGATGCAAAATTGA 4150 4151 GTGCTTAAGTTTTGCTGACCCAAATCAACGTCAAAAATTAT 4191 Nucleotides 1-4191: SEQ ID NOS:36/38) Homology searches using the above deduced PFL protein revealed a 79% overall protein sequence identity with the S. mutans PFL and higher than 40% with the E. coli, C. pasteurianum and H.
influenzae PFL.
In the promoter region of the MG1363 pfl gene, a canonical lactococcal ribosome binding site (AAAGGAG, position +21 to ) and -35 and -10 promoter regions (TTGCTA and TATAAT, respectively) were found. A putative rho-dependent transcription terminator was located 24 bp downstream of the pfl stop codon (position 2432 to 2445). Additionally, two sequences (FNR-1 and FNR-2) with significant homology to E. coli FNR boxes, which have the consensus sequence TTGAT-N4-ATCAA (SEQ ID NO:40) and are involved in regulation of the expression of pfl in E. coli, were identified. The MG1363 FNR-1 (GGAGT-N4-ATCAA) (SEQ ID NO:41) was also present in strain DB1341. FNR-2 (TTTGC-N4-ATCAA) (SEQ ID NO:42; position -36 to -23) overlaps with the -35 hexamer of the promoter region of the pfl gene.
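How closely a candidate site matches the FNR-box consensus can be expressed as a mismatch count over the two 5-nt half-sites. The Python sketch below scores 14-nt candidates against TTGAT-N4-ATCAA and scans a longer sequence for FNR-like windows. The function names and the mismatch cut-off are assumptions, and the `NNNN` spacers stand in for spacer bases that are not written out in the text.

```python
FNR_LEFT = "TTGAT"
FNR_RIGHT = "ATCAA"
SPACER = 4
SITE_LENGTH = len(FNR_LEFT) + SPACER + len(FNR_RIGHT)  # 14 nt


def fnr_mismatches(site):
    """Count half-site mismatches against TTGAT-N4-ATCAA (spacer ignored)."""
    site = site.upper()
    if len(site) != SITE_LENGTH:
        raise ValueError("candidate must be 14 nt long")
    left = site[:len(FNR_LEFT)]
    right = site[-len(FNR_RIGHT):]
    left_mm = sum(a != b for a, b in zip(left, FNR_LEFT))
    right_mm = sum(a != b for a, b in zip(right, FNR_RIGHT))
    return left_mm + right_mm


def scan(seq, max_mismatches=4):
    """Yield (1-based position, 14-mer, mismatches) for FNR-like windows."""
    seq = seq.upper()
    for i in range(len(seq) - SITE_LENGTH + 1):
        window = seq[i:i + SITE_LENGTH]
        mismatches = fnr_mismatches(window)
        if mismatches <= max_mismatches:
            yield i + 1, window, mismatches


# Half-sites as given in the text; spacer bases are placeholders.
fnr1 = fnr_mismatches("GGAGTNNNNATCAA")
fnr2 = fnr_mismatches("TTTGCNNNNATCAA")
print(fnr1, fnr2)
```

Scored this way, both sites match the right half-site exactly and deviate only in the left half-site, at four positions for FNR-1 and three for FNR-2.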
The coding sequence of the MG1363 pfl gene showed 102 base pair changes when compared to the corresponding sequence of strain DB1341, but these changes resulted in only four amino acid changes in the PFL primary structure. The lactococcal PFL includes the conserved Gly residue at position 749, flanked by Ser and Tyr residues, which is involved in activation and deactivation of the enzyme in E. coli via free radical formation. This region is present in all PFL proteins characterized to date. The L. lactis sequence ISCCVSP is highly conserved and includes two adjacent Cys residues.
EXAMPLE 6

Construction of pfl mutant strains of L. lactis strains DB1341 and MG1363 by gene inactivation and physiological characterization of pfl- strains

A 460 bp Sau3AI internal fragment (positions 1343 to 1799 in Table 3.2) of the L. lactis DB1341 pfl gene was cloned into BamHI-digested pSMA500 (Madsen et al., 1996), resulting in plasmid pSMAKAS7, and transformed into E. coli MC1000 by electroporation (Sambrook et al., 1989). A transformant (SMAKAS7) containing the recombinant plasmid was isolated. The orientation of the pfl fragment in pSMAKAS7 was confirmed by sequencing. Homologous recombination of pSMAKAS7 into the L. lactis pfl gene allows translational fusion of the reporter lacLM gene (Madsen et al., 1996).
Plasmid pSMAKAS7 was used to transform L. lactis strains DB1341 and MG1363 by electroporation (Holo and Nes 1989). Two single transformants were isolated (DBKAS7 and MGKAS7, respectively). DBKAS7 became blue on X-gal plates, as expected if homologous integration at the chromosomal pfl locus had occurred, and was further characterized. Integration of pSMAKAS7 by homologous recombination into the DB1341 chromosome would result in a truncated pfl gene, where the N-terminal region of the protein (residues Met 1-Asp 574) would be separated from the C-terminal domain (residues Asp 422-Ile 778). PCR analysis was used to confirm that DBKAS7 carries a disrupted pfl gene. The activation site of the E. coli Pfl, a glycine residue at position Gly 734 flanked by serine and tyrosine, is conserved in all bacterial Pfl proteins characterized (Weidner and Sawers, 1996; Yamamoto et al., 1996), including the L. lactis Pfl (position 2321-2329 of the nucleotide sequence in Table 3.2; Table ). The truncated Pfl protein in strain DBKAS7 would lack an activation site.
Samples of Lactococcus lactis subspecies lactis biovar diacetylactis strain DBKAS7 and of Lactococcus lactis subspecies lactis strain MGKAS7 were deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 18 July 1996 under the accession Nos. DSM 11086 and DSM 11083, respectively.
A 495 bp PCR fragment was amplified from MG1363 using primers pfll-P1MG1363 (5'-GGCCGCTCGA GTTGTGTCTC ACCACTTGAC CC-3'(SEQ ID NO:43); XhoI site underlined) and pflP2MG1363 CCATCATCTT CACCATAACG TGG-3'(SEQ ID NO:44); BamHI site underlined) and cloned into XhoI+BamHI digested pSMA500 and transformed into strain MG1363, resulting in strain MGKAS13.
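The restriction sites said to be carried by the two primers can be checked with a simple string search. The following is a minimal sketch, assuming the standard recognition sequences CTCGAG (XhoI) and GGATCC (BamHI); the names SITES, PRIMERS, find_sites and site_report are illustrative only. The primer sequences are taken exactly as printed in the paragraph above, with spaces removed.

```python
# Recognition sequences of the two enzymes named in the text (standard values).
SITES = {"XhoI": "CTCGAG", "BamHI": "GGATCC"}

# Primer sequences exactly as printed above, spaces removed (5' to 3').
PRIMERS = {
    "SEQ ID NO:43": "GGCCGCTCGAGTTGTGTCTCACCACTTGACCC",
    "SEQ ID NO:44": "CCATCATCTTCACCATAACGTGG",
}


def find_sites(seq):
    """Return {enzyme: 0-based start index of its site, or -1 if absent}."""
    seq = seq.upper()
    return {enzyme: seq.find(site) for enzyme, site in SITES.items()}


# One report per primer, keyed by sequence identifier.
site_report = {name: find_sites(seq) for name, seq in PRIMERS.items()}

for name, hits in site_report.items():
    print(name, hits)
```

With the sequences as printed, the XhoI site is found in SEQ ID NO:43, whereas no GGATCC motif is found in SEQ ID NO:44. This may indicate that the 5' end of the second primer is not fully reproduced in the present text; the sketch does not attempt to restore it.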
MGKAS13 was deposited under the Budapest Treaty with the German Collection of Microorganisms and Cell Cultures, Mascheroder Weg 1b, D-38124 Braunschweig, Germany on 10 July 1997 under the accession No. DSM 11653.
DBKAS7 and MGKAS13 formed blue colonies on X-gal-containing plates. Plasmid integration through homologous recombination was confirmed via PCR in both strains.
Physiological analysis of the L. lactis pfl- strain
A colorimetric assay (Voges-Proskauer, VP; Westerfeld 1945) was used to study acetoin and diacetyl production in strain DBKAS7. The presence of acetoin and diacetyl in the samples results in the formation of a red colour, which is monitored by measuring OD 520. Overnight cultures of strain DBKAS7 (pfl-) and wild type strain DB1341, grown at 30°C without aeration in GM17, were used. The VP assay was performed by mixing 200 µl bacterial culture, 100 µl 0.3 creatine, 100 µl 5 M NaOH, and µl 5 α-naphthol (dissolved in 2.5 M NaOH immediately before use). The mixture was incubated for 10 min at room temperature, with constant stirring to provide aeration. The reaction was stopped by adding 1 ml 4 mM DTT. After centrifugation to remove cellular debris, OD 520 was measured. As shown in Table 6.1, DBKAS7 had approximately a 2-fold increase in the production of acetoin/diacetyl as compared to strain DB1341.
Table 6.1. Voges-Proskauer assay for aroma compounds produced by DB1341 and DBKAS7, respectively

Strain    OD 600    OD 520
DB1341    2.40      0.082
DBKAS7    2.22      0.155

Overnight cultures were grown at 30°C, without shaking, in GM17.
The OD 600 values represent a measure for growth. The OD 520 values are the results of the production of acetoin and diacetyl (Westerfeld 1945).
Thus, gene inactivation of the pfl gene in the L. lactis strain DB1341 results in an enhanced production of aroma compounds, without affecting the ability to grow.
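The approximately 2-fold increase can be reproduced from the values in Table 6.1. The following is a minimal sketch; dividing OD 520 by OD 600 to correct for growth is an assumption made here for illustration and is not stated to be part of the assay, and the names table_6_1, raw_fold and normalised_fold are hypothetical.

```python
# Values copied from Table 6.1.
table_6_1 = {
    "DB1341": {"od600": 2.40, "od520": 0.082},  # wild type
    "DBKAS7": {"od600": 2.22, "od520": 0.155},  # pfl- mutant
}

wild_type = table_6_1["DB1341"]
mutant = table_6_1["DBKAS7"]

# Ratio of the VP signals as measured.
raw_fold = mutant["od520"] / wild_type["od520"]

# Ratio of the VP signals after dividing each by its culture density
# (illustrative growth correction, an assumption of this sketch).
normalised_fold = (mutant["od520"] / mutant["od600"]) / (
    wild_type["od520"] / wild_type["od600"]
)

print(f"raw fold increase:        {raw_fold:.2f}")
print(f"normalised fold increase: {normalised_fold:.2f}")
```

Both ratios are close to 2, and the similar OD 600 values of the two strains are in line with the statement that growth is not affected.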
Similar levels of formate were obtained in strain DB1341 as in MG1363, and no formate was detected in DBKAS7 under anaerobic conditions, confirming the pfl mutant phenotype in this strain.
L. lactis biovar diacetylactis strains are used as starter cultures due to their ability to produce diacetyl during milk fermentation. A mutation in the pfl gene of DB1341 should result in increased pyruvate levels under anaerobic growth.
Thus, if excess pyruvate is directed towards the production of diacetyl and acetoin, a higher level of these metabolites would be expected in strain DBKAS7 grown under anaerobiosis. As shown in Table 6.2, a 7-fold increase in the production of aroma compounds was observed in strain DBKAS7 grown in GM17 and a more than 4-fold increase was detected in GalM17 as compared to the wild type strain, DB1341. This demonstrates the effect of a pfl mutation on the production of diacetyl and acetoin in a L. lactis biovar diacetylactis strain.
Table 6.2. Production of aroma compounds in the L. lactis biovar diacetylactis pfl- strain, DBKAS7, as compared to the wild type strain

          Voges-Proskauer assay (diacetyl+acetoin in mM)
Strain    Glucose    Galactose
DB1341    0.2        <0.05
DBKAS7    1.5        0.2

a Cell extracts from stationary culture (OD 600 about 3) were assayed according to Casabadan et al. 1980. Values shown are the mean of two independent experiments.
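The fold increases quoted for Table 6.2 include one entry reported only as an upper limit (<0.05 mM). A minimal sketch of how such an entry may be handled is given below: when the wild type value is an upper limit, the computed ratio is a lower bound on the fold increase. The function names parse_value and fold_increase are hypothetical.

```python
def parse_value(text):
    """Return (value, is_upper_limit) for entries such as '1.5' or '<0.05'."""
    text = text.strip()
    if text.startswith("<"):
        return float(text[1:]), True
    return float(text), False


def fold_increase(mutant_text, wild_type_text):
    """Return (ratio, is_lower_bound) for mutant relative to wild type."""
    mutant, mutant_is_limit = parse_value(mutant_text)
    wild_type, wild_type_is_limit = parse_value(wild_type_text)
    if mutant_is_limit:
        raise ValueError("mutant value below detection limit; ratio undefined")
    # An upper limit in the denominator makes the ratio a lower bound.
    return mutant / wild_type, wild_type_is_limit


# Values copied from Table 6.2 (diacetyl + acetoin in mM).
glucose_fold, glucose_is_bound = fold_increase("1.5", "0.2")
galactose_fold, galactose_is_bound = fold_increase("0.2", "<0.05")

print(f"glucose:   {glucose_fold:.1f}-fold")
print(f"galactose: {'more than ' if galactose_is_bound else ''}{galactose_fold:.1f}-fold")
```

The glucose ratio of 7.5 corresponds to the 7-fold increase stated in the text, and the galactose ratio is a lower bound of 4, corresponding to "more than 4-fold".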
Inactivation of the pfl gene leads to a transcriptional fusion of the lacLM reporter gene (Madsen et al. 1996). β-Galactosidase levels were measured in overnight cultures of strain MGKAS13 grown in M17 with either glucose (GM17) or galactose (GalM17) (Table 6.3). In GM17, an increase in β-galactosidase units was observed under anaerobic growth, which is consistent with the induction observed at RNA level. High levels of β-galactosidase were observed under anaerobic growth in the presence of galactose, and a 4-fold induction was observed under anaerobiosis in this medium, which is in agreement with the RNA studies.
Table 6.3. Characterization of the L. lactis MG1363 pfl- strain, MGKAS13

                             Aerobic                          Anaerobic
Strain     Growth medium     Formate (mM)    β-gal (units)    Formate (mM)    β-gal (units)
MG1363     GM17              0                                53
           GalM17            0                                42
MGKAS13    GM17              0               9.5              0               150.0
           GalM17            0               94.6             0               600.0
MGKAS13 should not produce formate under anaerobic conditions as a result of the inactivation of the pfl gene in this strain. In strain MG1363, no formate was detected under aerobic growth in GM17, as would be expected if the lactococcal PFL is inactivated in the presence of oxygen. Relatively low levels of formate were detected under anaerobic conditions. In GalM17, an 8-fold higher amount of formate was detected in anaerobiosis.
No formate was detected in strain MGKAS13 in either of the test media, confirming that this strain carries a pfl null mutation.
EXAMPLE 7
Identification of pfl and adhE homologues in non-Lactococcus lactic acid bacteria using Lactococcus lactis pfl and adhE gene fragments as probes
1. Southern hybridization of genomic DNA from non-Lactococcus lactic acid bacteria using a L. lactis pfl gene fragment as a probe
A PCR fragment including most of the L. lactis pfl coding sequence was obtained by amplification of MG1363 genomic DNA with primers pf189 and pf11066 (see Fig. ). A 2 kb DNA fragment (Fig. 9) was obtained and used as a probe in Southern hybridization experiments using EcoRI-digested total DNA from Streptococcus thermophilus ATCC 19258, Leuconostoc mesenteroides subsp. mesenteroides ATCC 10878 and Lactobacillus acidophilus ATCC 4796 (Fig. ). Hybridization was carried out overnight at 65°C. Filters were washed twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes.
As shown in Fig. 10C, the expected EcoRI genomic fragment deduced from the L. lactis pfl sequence was detected after overnight exposure. After short exposure of the filters (Fig. ), hybridization was detected only in S. thermophilus, and only weak signals were detected in L. mesenteroides and L. acidophilus after longer exposure (Fig. 10C), indicating lower pfl sequence homology in these bacteria, as would be expected from their taxonomic distance to L. lactis.
2. Southern hybridization of genomic DNA from non-Lactococcus lactic acid bacteria using a L. lactis adhE gene fragment as a probe
Two Sau3AI fragments including most of the L. lactis subsp. lactis biovar diacetylactis DB1341 adhE coding sequence (Fig. 11) were used as a probe in Southern hybridization experiments using EcoRI-digested total DNA from Streptococcus thermophilus ATCC 19258, Leuconostoc mesenteroides subsp. mesenteroides ATCC 10878 and Lactobacillus acidophilus ATCC 4796.
Hybridization was carried out overnight at 65°C. Filters were washed twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes.
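The composition of the two wash solutions can be derived from the usual definition of SSC. The following is a minimal sketch, assuming that 1 x SSC is 0.15 M NaCl and 0.015 M sodium citrate and that the washes are prepared by dilution of a 20 x stock; the stock strength, the 500 ml batch volume and the function names are assumptions made for illustration and are not taken from the description.

```python
NACL_1X_M = 0.15      # NaCl molarity in 1 x SSC (standard definition)
CITRATE_1X_M = 0.015  # sodium citrate molarity in 1 x SSC


def ssc_molarity(strength):
    """Return (NaCl M, sodium citrate M) for an SSC solution of given strength."""
    return strength * NACL_1X_M, strength * CITRATE_1X_M


def stock_volume_ml(target_strength, final_volume_ml, stock_strength=20.0):
    """Volume of stock needed to prepare final_volume_ml at target_strength."""
    return final_volume_ml * target_strength / stock_strength


for strength in (5, 3):  # the two wash stringencies used in the text
    nacl, citrate = ssc_molarity(strength)
    volume = stock_volume_ml(strength, 500)
    print(f"{strength} x SSC: {nacl:.2f} M NaCl, {citrate:.3f} M citrate, "
          f"{volume:.0f} ml of 20 x stock per 500 ml")
```

The second wash is carried out at a lower salt concentration and at 65°C, and is therefore the more stringent of the two washes.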
As shown in Fig. 12, the expected EcoRI genomic fragment (about 5 kb) deduced from the L. lactis MG1363 adhE sequence was detected. Strongly hybridizing bands were also detected in S. thermophilus (5 kb) and L. mesenteroides (5 and 0.4 kb). Weaker hybridizing bands were also detected in L. acidophilus (4.2 and 2 kb, and two minor bands, 2.3 and 5 kb).
3. Conclusions
Using the above L. lactis DNA probes, preliminary restriction maps of the pfl and adhE genes, respectively, in the three non-Lactococcus lactic acid bacterial species could be constructed using different restriction digests of the genomic DNA. Two strategies for the cloning of these non-Lactococcus genes can be followed: (i) cloning of DNA fragments isolated from agarose gels corresponding in size to the hybridizing bands detected in Southern analysis; (ii) PCR of conserved regions using primers derived from the corresponding L. lactis sequence.
REFERENCES
1. Arnau, F. Jorgensen, S. Madsen, A. Vrang and H. Israelsen. 1997. Cloning, expression and characterization of the Lactococcus lactis pfl gene, encoding pyruvate formate-lyase. Submitted for publication.
2. Chen, and C.C. Lin. 1991. Regulation of the adhE gene, which encodes ethanol dehydrogenase in Escherichia coli. J. Bacteriol. 173:8009-8013.
3. Chippaux, F. Casse and Pascal. 1972. Isolation and phenotypes of mutants from Salmonella typhimurium defective in formate hydrogenlyase activity. J. Bacteriol. 110:766-768.
4. Christiansen, L. and S. Pedersen. 1981. Cloning, restriction endonuclease mapping and post-transcriptional regulation of rspA, the structural gene for ribosomal protein S1. Mol. Gen. Genet. 181:548-551.
5. Crow, V.L. and G.G. Pritchard. 1977. Fructose 1,6-diphosphate-activated L-lactate dehydrogenase from Streptococcus lactis: kinetic properties and factors affecting activation. J. Bacteriol. 131:82-91.
6. Donkersloot, J.A. and J. Thompson. 1995. Cloning, expression, sequence analysis, and site-directed mutagenesis of the Tn5306-encoded N5-(carboxyethyl)ornithine synthase from Lactococcus lactis K1. J. Biol. Chem. 270:12226-12234.
7. Fleischmann, M.D. Adams, O. White, R.A. Clayton, E.F. Kirkness, A.R. Kerlavage, C.J. Bult, J.F. Tomb, B.A. Dougherty and J.M. Merrick et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512.
8. Frey, Rothe, Volker Wagner, A.F. and Knappe, J. 1994. Adenosylmethionine-dependent synthesis of the glycyl radical in pyruvate formate-lyase by abstraction of the glycine C-2 pro-S hydrogen atom. J. Biol. Chem. 269:12432-12437.
9. Goodlove, P.R. Cunningham, J. Parker, and D.P. Clark. 1989. Cloning and sequence analysis of the fermentative alcohol-dehydrogenase-encoding gene of Escherichia coli. Gene 85:209-214.
10. Holo, H. and I.F. Nes. 1989. High-frequency transformation by electroporation of Lactococcus lactis subsp. cremoris grown with glycine in osmotically stabilized media. Appl. Environ. Microbiol. 55:3119-3123.
11. Kessler, I. Leibrecht and J. Knappe. 1991. Pyruvate-formate-lyase-deactivase and acetyl CoA reductase activities of Escherichia coli reside on a polymeric protein particle encoded by adhE. FEBS Lett. 281:59-63.
12. Kessler, W. Herth and J. Knappe. 1992. Ultrastructure and pyruvate formate-lyase radical quenching property of the multienzymic AdhE protein of Escherichia coli. J. Biol. Chem. 267:18073-18079.
13. Madsen, B. Albrechtsen, E.B. Hansen and H. Israelsen. 1996. Cloning and transcriptional analysis of two threonine biosynthetic genes from Lactococcus lactis MG1614. J. Bacteriol. 178:3689-3694.
14. Mat-Jan, K.Y. Alam and D.P. Clark. 1989. Mutants of Escherichia coli deficient in the fermentative lactate dehydrogenase. J. Bacteriol. 171:342-348.
15. Nair, G.N. Bennett and E.T. Papoutsakis. 1994. Molecular characterization of an aldehyde/alcohol dehydrogenase from Clostridium acetobutylicum ATCC 824. J. Bacteriol. 176:871-881.
16. Pecher, H.P. Blaschkowski, K. Knappe and A. Böck. 1982. Expression of pyruvate formate-lyase of Escherichia coli from the cloned structural gene. Arch. Microbiol. 132:365-371.
17. Platteeuw, J. Hugenholtz, M. Starrenburg, I. van Alen-Boerrigter and W. M. de Vos. 1995. Metabolic engineering of Lactococcus lactis: influence of the overproduction of α-acetolactate synthase in strains deficient in lactate dehydrogenase as a function of culture conditions. Appl. Environ. Microbiol. 61:3967-3971.
18. Sambrook, E.F. Fritsch and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
19. Sauter, M. and Sawers, R.G. 1990. Transcriptional analysis of the gene encoding pyruvate formate-lyase-activating enzyme of Escherichia coli. Mol. Microbiol. 4:355-363.
20. Sawers, A.F. Wagner and A. Böck. 1989. Transcription initiation at multiple promoters of the pfl gene by Eσ70-dependent transcription in vitro and heterologous expression in Pseudomonas putida in vivo. J. Bacteriol. 171:4930-4937.
21. Sawers, G. and A. Böck. 1988. Anaerobic regulation of pyruvate formate-lyase from Escherichia coli K-12. J. Bacteriol. 170:5330-5336.
22. Sawers, G. and A. Böck. 1989. Novel transcriptional control of the pyruvate formate-lyase gene: upstream regulatory sequences and multiple promoters regulate anaerobic expression. J. Bacteriol. 171:2485-2498.
23. Snoep, M.J.T. de Mattos, M.J.C. Starrenburg and J. Hugenholtz. 1992. Isolation, characterization, and physiological role of the pyruvate dehydrogenase complex and α-acetolactate synthase of Lactococcus lactis subsp. lactis bv. diacetylactis. J. Bacteriol. 174:4838-4841.
24. Suppmann, B. and G. Sawers. 1994. Isolation and characterization of hypophosphite-resistant mutants of E. coli: identification of the FocA protein, encoded by the pfl operon, as a putative formate transporter. Mol. Microbiol. 11:965-982.
25. Takahashi, K. Abbe and T. Yamada. 1982. Purification of pyruvate formate-lyase from Streptococcus mutans and its regulatory properties. J. Bacteriol. 149:1034-1040.
26. Varenne, F. Casse, M. Chippaux and M.C. Pascal. 1975. A mutant of Escherichia coli deficient in pyruvate formate-lyase. Mol. Gen. Genet. 141:181-184.
27. de Vos, W.M. and G. Simons. 1994. Gene cloning and expression systems in lactococci. In: Gasson, Vos W. de (eds) Genetics and biotechnology of lactic acid bacteria, Chapman and Hall, London, pp. 52-105.
28. Weidner, G. and G. Sawers. 1996. Molecular characterization of the genes encoding formate-lyase and its activating enzyme of Clostridium pasteurianum. J. Bacteriol. 178:2440-2444.
29. Westerfeld, W.W. 1945. A colorimetric determination of blood acetoin. J. Biol. Chem. 161:495-502.
30. Wong, K.L. Suen and H.S. Kwan. 1989. Transcription of pfl is regulated by anaerobiosis, catabolite repression, pyruvate, and oxrA: pfl::Mu dA operon fusions of Salmonella typhimurium. J. Bacteriol. 171:4900-4905.
31. Yamamoto, Y. Sato, S. Takahashi-Abbe, K. Abbe, T. Yamada and H. Kizaki. 1996. Cloning and sequence analysis of the pfl gene encoding pyruvate formate-lyase from Streptococcus mutans. Infect. Immun. 64:385-391.
INDICATIONS RELATING TO A DEPOSITED MICROORGANISM (PCT Rule 13bis)
A. The indications made below relate to the microorganism referred to in the description on page 30, line 9l12
B. IDENTIFICATION OF DEPOSIT
Further deposits are identified on an additional sheet [ ]
Name of depositary institution: DSM-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH
Address of depositary institution (including postal code and country): Mascheroder Weg 1B, D-38124 Braunschweig, Germany
C. ADDITIONAL INDICATIONS (leave blank if not applicable)
This information is continued on an additional sheet
As regards the respective Patent Offices of the respective designated states, the applicants request that a sample of the deposited microorganisms only be made available to an expert nominated by the requester until the date on which the patent is granted or the date on which the application has been refused or withdrawn or is deemed to be withdrawn.
D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)
E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)
The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., 'Accession Number of Deposit')
For receiving Office use only: This sheet was received with the international application
For International Bureau use only: This sheet was received by the International Bureau on:
Authorized officer

INDICATIONS RELATING TO DEPOSITED MICROORGANISMS (PCT Rule 13bis)
Additional sheet
In addition to the microorganism indicated on page 113 of the description, the following microorganisms have been deposited with DSM-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH, Mascheroder Weg 1B, D-38124 Braunschweig, Germany on the dates and under the accession numbers as stated below:

Accession number    Date of deposit    Description Page No.    Description Line Nos.
DSM 11101           July 1996          36                      15-19
DSM 11102           July 1996          36                      15-19
DSM 11091           July 1996          45                      13-16
DSM 11089           July 1996          47                      21-25
DSM 11090           July 1996          47                      21-25
DSM 11081           July 1996          60                      4-11
DSM 11082           July 1996          60                      4-11
DSM 11084           July 1996          60                      4-11
DSM 11085           July 1996          60                      4-11
DSM 11654           July 1997          60                      27-30
DSM 11103           July 1996          63                      1-4
DSM 11087           July 1996          84                      15-18
DSM 11088           July 1996          91                      4-7
DSM 11086           July 1996          102                     4-10
DSM 11083           July 1996          102                     4-10
DSM 11653           July 1997          102                     17-20

For all of the above-identified deposited microorganisms, the following additional indications apply: As regards the respective Patent Offices of the respective designated states, the applicants request that a sample of the deposited microorganisms stated above only be made available to an expert nominated by the requester until the date on which the patent is granted or the date on which the application has been refused or withdrawn or is deemed to be withdrawn.
SEQUENCE LISTING
GENERAL INFORMATION
APPLICANT: Bioteknologisk Institut
(ii) TITLE OF THE INVENTION: Metabolically engineered lactic acid bacteria and means for providing same
(iii) NUMBER OF SEQUENCES: 44
COMPUTER READABLE FORM:
MEDIUM TYPE: Diskette
COMPUTER: IBM Compatible
OPERATING SYSTEM: DOS
SOFTWARE: FastSEQ for Windows Version
(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER:
FILING DATE:
CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
APPLICATION NUMBER: US 08/701,458
FILING DATE: 22-AUG-1996
(viii) ATTORNEY/AGENT INFORMATION:
NAME: PLOUGMANN, VINGTOFT PARTNERS A/S
REGISTRATION NUMBER:
REFERENCE/DOCKET NUMBER: 18383 PC 1
(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: +45 33 63 93 00
TELEFAX: +45 33 63 96 00
TELEX:
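The coding sequence locations given in the sequence listing can be checked for internal consistency against the stated protein lengths. The following is a minimal sketch, assuming 1-based inclusive coordinates; it uses the location 792...2069 stated for SEQ ID NO:1, the length of 426 amino acids stated for SEQ ID NO:2, and the location 145...2853 stated for SEQ ID NO:3. The function names and the deliberately partial codon table are hypothetical and cover only the first five codons of SEQ ID NO:1.

```python
def codon_count(start, end):
    """Number of codons in a coding sequence with 1-based inclusive coordinates."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError(f"{length_nt} nt is not a whole number of codons")
    return length_nt // 3


# Partial codon table: only the codons needed for the short example below.
PARTIAL_CODON_TABLE = {"ATG": "Met", "GTT": "Val", "CCT": "Pro", "AAA": "Lys"}


def translate(codons):
    """Translate a list of codons using the partial table above."""
    return [PARTIAL_CODON_TABLE[codon] for codon in codons]


seq1_codons = codon_count(792, 2069)  # coding sequence of SEQ ID NO:1
seq3_codons = codon_count(145, 2853)  # coding sequence of SEQ ID NO:3

# First five codons of the SEQ ID NO:1 coding sequence as listed.
first_residues = translate(["ATG", "GTT", "CCT", "AAA", "AAA"])

print(seq1_codons, seq3_codons, "-".join(first_residues))
```

The 426 codons obtained for SEQ ID NO:1 agree with the 426 amino acids stated for SEQ ID NO:2, and the translated codons agree with the first residues of that sequence (Met Val Pro Lys Lys).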
INFORMATION FOR SEQ ID NO:1:
SEQUENCE CHARACTERISTICS:
LENGTH: 2088 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: Coding Sequence
LOCATION: 792...2069
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
GATCTGTCCT
CCAGAGCACG
AAGCCACCTC
TGTCAATAAT
TGTGGGAATT
GTATCGTTCC
AAACACGTAA
CAGCAAAAAT
GGATTGAAGT
CAATCCTTGC
CACTCGGTGT
GTGCCGTrGA
AAAATTCAGC
TAGTACGAGA
GCTGGATAGC
AAGATGAGAT
TTGCAAAAAA
TAACAGAAAG
AACGACTAAT
TGCTA'rrGrr
TGITTACGAT
ACCAAGCCTT
AACTGGTGGC
TGGAGCTGGT
AGACCTTrTG
TGTTATTGAT
GGACCGGGAT
TATGTAGGGA
TACCCATTCG
TCTTC ICA
TGATCTGTTG
CCAACATCAA.
TTCGCTTTCC
GCTGCAATTG
GACATGACTA
CCAGGAATGG
AATGGTGCTG
CTTTCAAAAC
GCTTCAGTTT
GGACTTACCG
AGGGATAAGC
AGAATTAAGA
GCAAAACGGG
AAATCGCAA.G
CAGCAATCTr
ACCCTCAAGC
AAGCTGGTGC
CCGCCTTGAT
TAAACGCCGC
TTTATGTTGA
GTTTTGATAA
ATGATGAATT
CTGGTGTACC
GCTGAAAGCA
GCCCAGAGAG
ATTTGAGTT
CCCTC!TCGGT
TAAATCTTrA
TCAAAAATGT
ACCGGAAGAC
TCAAAACCGT
ACTCAAATCT
TGCAACTGCA
TGGGATGATT
TATTGCTAAA
AG ITGTTCCG
TCTAAGTGCO
ATGATCAAGA
TTGCTCGATT
GTACTTGCTG
TTGACTGCAA~
TCAAGCCATG
TTTATrCAAT GGAC'FrGCAA
GGTAACCCTT
AATATTGAAC
TGTGCCACTG
ATGC.AAGAAC
120 180 240 300 360 420 480 540 600 660 720 780 830 AAGGCOCTTA T ATG GTT CCT AAA AAA Met Val Pro Lys Lys GAC TAC AAA GCT ATT GAA AGT TTC Asp Tyr Lys Ala Ile Giu Ser Phe GTT TTT Val Phe OTT GAA COT OCT GOT GAA GOT TTT GGA GTA ACT GGT CCT GTT Val Glu Arg Ala Gly Giu Gly Phe Gly Val Thr Gly Pro Val
GCC
Ala GOT CGT TCT Gly Arg Ser GOT CAA Gly Gin 35 TGG ATT GCT GAA CAA GCT GGT GTC AAA Trp Ile Ala Glu Gin Ala Gly Val Lys GTr Val 926 974 CCT AAA OAT AAA OAT GTC CTT CTT TTT OAA CTT GAT AAG AAA AAT ATT Pro Lys Asp Lys Asp Val Leu Leu Phe Glu Leu Asp Lys Lys Asn Ile GGT GAA Oly Olu OCA CTT TCT Ala Leu Ser TOT GAA AAA Ser Oiu Lys
CTT
Leu 70 TCT CCT TTG Ser Pro Leu CTT TCA ATC TAO Leu Ser Ile Tyr COT AOC ITA OTT Arg Ser Leu Leu 1022 AAA OCT OAA ACA COT OAA OAA Lys Ala Oiu Thr Arg Glu Giu OCT TAT CAA GOT OCT OGA CAT Ala Tyr Gin Oly Ala Oly His 100
OGA
Gly 85 ATT GAO ATT OTA Ile Glu Ile Val 1070 1118 AAT OCT OCA Asn Ala Ala ATT CAA ATC GOT GCA ATO le Gin le Oly Ala Met 105
OAT
Asp 110 GAT CCA TTC OTT Asp Pro Phe Val
AAA
Lys 115 GAA TAT 000 OAA AAA& GTT GPAA OCT TOT COT Oiu Tyr Oly Oiu Lys Val Glu Ala Ser Arg 1166 ATC CTC Ile Leu ACT OAT Thr Asp OTT AAC CAA Val Asn Gin 130 OCA OAT TCT ATT Pro Asp Ser Ile
GOT
Oly 135 GG GTC GOA OAT ATC TAT Oly Val Gly Asp Ile Tyr 140 1214 00k ATO COT CCA TCA CTT Ala Met Arg Pro Ser Leu
ACA
Thr 150 OTT OGA ACT Leu Oly Thr GOT TCA TOG 000 Gly Ser Trp Gly 155 CTA TTO AAT OTT Leu Leu Asn Val 170 1262 1310 AAA AAT TCA CTT TCA CAC AAT TTG AGT ACA TAC OAT Lys Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp 160 165 WO 98/07867 PCT/DK97/00336 117 AAA ACA Lys Thr 175 GTG GCT AAA CGT Val Ala Lys Arg CGT AAT CGC CCA CAA TGG Arg Asn Arg Pro Gin Trp 180 185 GTT CGT TTG CCA Val Arg Leu Pro 1358
AAA
Lys 190 GAA ATT TAC TAC Glu Ile Tyr Tyr GAA AAA Glu Lys 195 AAT GCA ATT Asn Ala Ile TAC TTA CAA GAA Tyr Leu Gin Glu
TTG
Leu 205 1406 CCA CAC GTC CAC AAA GCT TTC ATC GTT Pro His Val His Lys Ala Phe Ile Val 210
GCT
Ala 215 GAC CCT GGT ATG Asp Pro Gly Met GTT AAA Val Lys 220 1454 TTT GGT TTC Phe Gly Phe CAA GTT GAA Gin Val Glu 240
GTT
Val 225 GAT AAA GTT TTG Asp Lys Val Leu
GAA
Glu 230 CAA CTT GCT ATC CGC CCA ACT Gin Leu Ala Ile Arg Pro Thr 235 1502 ACA AGC ATT TAT Thr Ser Ile Tyr
GGC
Gly 245 TCT GTT CAA CCT Ser Val Gin Pro
GAC
Asp 250 CCA ACT TTG Pro Thr Leu 1550 AGC GAA Ser Glu 255 GCA ATT GCA ATC Ala Ile Ala Ile ATC TGT CTr GGT Ile Cys Leu Gly 275
GCT
Ala 260 CGT CAA ATG AAA CAA TTT GAA CCT GAC Arg Gin Met Lys Gin Phe Giu Pro Asp 1598 1646
ACT
Thr 270
GTC
Val GGT GGT TCT GCT CTC GAT GCC GGT AAG Gly Gly Ser Ala Leu Asp Ala Gly Lys 280
ATT
Ile 285 GGT CGT TTG ATT Gly Arg Leu Ile
TAT
Tyr 290 GAA TAT GAT GCT Glu Tyr Asp Ala
CGT
Arg 295 GGT GAA GCT GAC Gly Glu Ala Asp CTT TCT Leu Ser 300 1694 GAT GAT GCA Asp Asp Ala GTC GAT ATT Val Asp Ile 320
AGT
Ser 305 TTG AAA GAA CTT Leu Lys Giu Leu
TTC
Phe 310 CAA GAA TTA GCT Gin Giu Leu Ala CAA AAA TTT Gin Lys Phe 315 CAT AAA GCA His Lys Ala 1742 1790 CGT AAA CGT ATT Arg Lys Arg Ile AAA TTC TAC CAT Lys Phe Tyr His
CCA
Pro 330 CAA ATG Gin Met 335 GTT GCA ATT CCT Val Ala Ile Pro
ACT
Thr 340 ACT TCT GGT ACT Thr Ser Gly Thr
GGT
Gly 345 TCT GAA GTG ACT Ser Glu Val Thr 1838 1886
CCA
Pro 350 TTT GCA GTT ATC Phe Ala Val Ile
ACT
Thr 355 GAT GAT GAA ACT Asp Asp Giu Thr GTT AAG TAC CCA Val Lys Tyr Pro
CTT
Leu 365 GCT GAC TAC CAA TTA ACA OCA CAA OTT Ala Asp Tyr Gin Leu Thr Pro Gin Val 370
GCC
Ala 375 ATT GTT GAC CCT Ile Val Asp Pro GAG TTT Glu Phe 380 1934 GTT ATG ACT Val Met Thr
GTA
Val 385 CCA AAA COT ACT Pro Lys Arg Thr
OTT
Val 390 TCT TGG TCT GGT Ser Trp Ser Gly ATT OAT GCG Ile Asp Ala 395 1982 WO 98/07867 PCT/DK97/00336 118 ATG TCA CAC GCG CTT GAA TCT TAC GTT TCT GTT ATG TCT TCT GAC TAT Met Ser His Ala Leu Giu Ser Tyr Val Ser Val Met Ser Ser Asp Tyr 400 405 410 ACA AAA CCA ATT TCA CTT CAA GCG ATC CCG GGT CTA GAT TAGGGTAACT TT Thr Lys Pro Ile Ser Leu Gin Ala Ile Pro Gly Leu Asp 415 420 425
GAAAGGA
2030 2081 2088 INFORMATION FOR SEQ ID NO:2: SEQUENCE CHARACTERISTICS: LENGTH: 426 amino acids TYPE: amino acid STR.ANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: Met Val Pro Lys Lys Asp Tyr Lys Ala Ile 1 Giu Arg Ser Gly Lys Asp s0 Leu Ser Thr Arg Gly Ala Phe Val Asn Gin 130 Met Arg 145 Leu Ser Ala Lys Tyr Tyr His Lys 210 Val Asp 225 Thr Ser Ala Gin Val Ser Giu Gly Lys 115 Pro Pro His Arg Glu 195 Ala Lys Ile Gly Trp Leu Giu Glu His 100 Giu Asp Ser Asn Arg 180 Lys Phe Val Tyr 5 Giu Ile Leu Lys Gly Asn Tyr Ser Leu Leu 165 Asn Asn Ile Leu Gly 245 Gly Ala Phe Leu 70 Ile Ala Gly Ile Thr 150 Ser Arg Ala Val Giu 230 Ser Phe Giu Giu 55 Ser Giu Ala Glu Gly 135 Leu Thr Pro Ile Ala 215 Gin Val Gly Gin 40 Leu Pro Ile Ile Lys 120 Gly Gly Tyr Gin Ser 200 Asp Leu Gin Val 25 Ala Asp Leu Val Gin 105 Val Val Thr Asp Trp 185 Tyr Pro Ala Pro 10 Thr Gly Lys Leu Arg 90 Ile Giu Gly Gly Leu 170 Val Leu Gly Ile Asp 250 Glu Gly Val Lys Ser 75 Ser Gly Ala Asp Ser 155 Leu Arg Gin Met Arg 235 Pro Ser Pro Lys .Asn le Leu Ala Ser le 140 Trp Aen Leu Giu Val 220 Pro Thr Phe Val Val le Tyr Leu Met Arg 125 Tyr Gly Val Pro Lou 205 Lys Thr Leu Val Ala Pro Gly Lys Ala Asp 110 Ile Thr Lys Lys Lys 190 Pro Phe Gin Ser Phe Gly Lye Glu Ala Tyr Asp Leu Asp Asn Thr 175 Giu His Gly Val Giu 255 Val Arg Asp Ala Glu Gin Pro Val Ala Ser 160 Val Ile Val Phe Giu 240 Ala Ile Ala Ile Ala Arg Gin Met Lye Gin Phe Glu Pro Asp Thr Val Ile 260 265 WO 98/07867 Cys Leu Gly 275 Ile Tvr Glu PCT/DK97/00336 119 Gly Gly Ser Ala Leu 280 Asp Ala Gly Lys Ile 285 Ser Gly Arg Leu Asp Asp Ala Tyr Asp Ala 290 Leu Arg 295 Gin Gly Glu Ala Asp Leu 300 Lys Ser 305 Arg Lys Glu Leu Phe Glu Leu Ala Lys Arg Ile Ile 325 Thr 310 Lys Gin 315 His Phe Val Asp Phe Tyr His Pro 330 Ser Lys Ala Gin Met Val 335 Ala Ile Pro Val Ile Thr 355 Gin Leu Thr Thr 340 Asp Ser Gly Thr Gly 345 Val Glu Val Thr Asp Glu Thr His Lys Tyr Pro Leu 365 Phe Pro Phe Ala 350 Ala Asp Tyr Val Met Thr 
Pro Gin Val 370 Val Pro Ala 375 Ser Val Asp Pro Glu 380 Asp Lys Arg Thr 385 Ala Val 390 Val Trp Ser Gly Ile 395 Ser Ala Met Ser His 400 Pro Leu Glu Ser Tyr 405 Ala Ser Val Met Ser 410 Asp Asp Tyr Thr Lys 415 Ile Ser Leu Gin 420 Ile Pro Gly Leu 425 INFORMATION FOR SEQ ID NO:3: SEQUENCE CHARACTERISTICS: LENGTH: 3185 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: Coding Sequence LOCATION: 145...2853 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: AAGCTTGTTA CAAAACCGTT TTCTAAACTT TTGATGAGTG TTTTTGTAAA ATATTGCTTG ACATCTATAA AAAACTTTGT TAAACTATTC ACGTAAAAGA AGTCACAAAG GAGAACCTAC AAAT ATG GCA ACT AAA AAA GCC GCT
AACTATCACA
AAGTGAATGA
CCA GCT 120 171 Met Ala Thr Lys Lys Ala Ala Pro Ala GCA AAG Ala Lys AAA GTT TTA AGC Lys Val Leu Ser 15 GCT GAA GAA Ala Glu Glu AAA GCC Lys Ala 20 GCA AAA TTC CAA GAA Ala Lys Phe Gin Glu GCT GTT GCT TAT Ala Val Ala Tyr ACT GAC Thr Asp AAA TTA GTC AAA AAA GCA CAA GCT GCT GTT Lys Leu Val Lys Lys Ala Gin Ala Ala Val CTT AAA TTT Leu Lys Phe GCA ATG GCT Ala Met Ala
GAA
Glu GGA TAT ACA CAA Gly Tyr Thr Gin
ACT
Thr 50 CAA GTC GAT Gin Val Asp ACT ATT GTC GCT Thr Ile Val Ala CTC GCT CAT GAA Leu Ala His Glu CTT GCA GCA AGC Leu Ala Ala Ser
AAA
Lys 65 CAT TCT CTA GAA His Ser Leu Glu WO 98/07867 PCT/DK97/00336 120 GCC GTT Ala Val AAC GAA ACT GGT CGT GGT GTT GTC GAA Asn Giu Thr Gly Arg Gly Val Val Glu AAA GAT ACC AAA Lys Asp Thr Lys 411 459
AAC
Asn CAC TTT GCT TOT GAA TCT GTT TAT AAC His Phe Ala Ser Giu Ser Val Tyr Asn 95
GCA
Ala 100 ATT AAA AAT GAC Ile Lys Asn Asp
AAA
Lys 105 ACT GTT GGT GTO Thr Val Gly Val
ATT
Ile 110 TCT GAA AAC AAG Ser Giu Asn Lys
GTT
Val 115 GCT GGA TCT GTT Ala Gly Ser Val GAA ATC Glu Ile 120 GCA AGC CCT CTC GGT GTA CTT GCT Ala Ser Pro Leu Gly Val Leu Ala
GGT
Gly 130 ATC GTT CCA ACG Ile Val Pro Thr ACT AAT COA Thr Asn Pro 135 ACA CGT AAT Thr Arg Asn ACA TOA ACA Thr Ser Thr 140 GCA ATO TTT AAA TCT TTA TTG ACT GCA Ala Ile Phe Lys Ser Leu Leu Thr Ala
AAA
Lys 150 GCT ATT Ala Ile 155 GTT TTC GOT TTC Val Phe Ala Phe
CAC
His 160 CCT CAA GOT CAA Pro Gin Ala Gin
AAA
Lys 165 TGT TCA AGO CAT Cys Ser Ser His
GCA
Ala 170 GCA AAA ATT GTT Ala Lys Ile Val
TAO
Tyr 175 GAT GOT GOA ATT Asp Ala Ala Ile
GAA
Glu 180 GCT GGT GOA CCG Ala Gly Ala Pro
GAA
Glu 185 GAO TT ATT CAA Asp Phe Ile Gin
TGG
Trp 190 ATT GAA GTA OCA Ile Giu Vai Pro
AGO
Ser 195 OTT GAO ATG ACT Leu Asp Met Thr ACC GCC Thr Aia 200 TTG ATT CAA AAC CGT GGA OTT GCA Leu Ile Gin Asn Arg Giy Leu Ala
ACA
Thr 210 ATC CTT GOA ACT Ile Leu Ala Thr GGT GGC OCA Gly Gly Pro 215 CTC GOT GTT Leu Gly Val GGA ATG GTA Gly Met Val 220 AAC GOC GCA CTC AAA TOT GGT AAC CCT Asn Ala Ala Leu Lys Ser Oly Asn Pro
TCA
Ser 230 GGA GOT Gly Ala 235 GGT AAT OGT GCT Gly Asn Gly Ala
GTT
Val 240 TAT GTT GAT GCA Tyr Val Asp Ala
ACT
Thr 245 GCA AAT ATT GAA Ala Asn Ile Glu
OGT
Arg 250 GCC GTT GAA GAC CTT TTG OTT TCA AAA Ala Val Giu Asp Leu Leu Leu Ser Lys 255 TTT GAT AAT GGG Phe Asp Asn Gly ATT TGT GOC ACT Ile Cys Ala Thr
GAA
Glu 270 AAT TCA GOT GTT Asn Ser Ala Val
ATT
Ile 275 GAT GOT TCA GTT Asp Ala Ser Val TAT GAT Tyr Asp 280 GAA TTT ATT GOT AAA ATG CAA GAA Glu Phe Ile Ala Lys Met Gin Glu 285
CAA
Gin 290 GGC GOT TAT ATG Gly Ala Tyr Met GTT COT AAA Val Pro Lys 295 1035 WO 98107867 AAA GAC TAC Lys Asp Tyr 300 PCT/DK97/00336 121 AAA GCT ATT GAA AGT TTC GTT TTT GTT Phe Val Phe Val Lys Ala Ile Glu Ser 305
GAA
Glu 310 CGT GCT GGT Arg Ala Gly 1083 GAA GGT Glu Gly 315 TTT GGA GTA ACT Phe Gly Val Thr
GGT
Gly 320 CCT GTT GCC GGT Pro Val Ala Gly
CGT
Arg 325 TCT GGT CAA TGG Ser Gly Gin Trp
ATT
Ile 330 GCT GAA CAA GCT Ala Giu Gin Ala
GGT
Gly 335 GTC AAA GTT CCT Val Lys Vai Pro GAT AAA GAT GTC Asp Lys Asp Val
CT
Leu 345 1131 1179 1227 CTT TTT GAA CTT Leu Phe Glu Leu
GAT
Asp 350 AAG AAA AAT ATT Lys Lys Asn Ile
GGT
Gly 355 GAA GCA CTT TCT Glu Ala Leu Ser TCT GAA Ser Glu 360 AAA CTT TCT Lys Leu Ser GGA ATT GAG Gly Ile Glu 380
CCT
Pro 365 TTG CTT TCA ATC Leu Leu Ser Ile
TAC
Tyr 370 AAA GCT GAA ACA Lys Aia Giu Thr CGT GAA GAA Arg Glu Glu 375 GCT GGA CAT Ala Gly His 1275 ATT GTA CGT AGC TTA CTT GCT TAT CAA Ile Val Arg Ser Leu Leu Ala Tyr Gin
GGT
Gly 390 AAT GCT Asn Ala 395 OCA ATT CAA ATC Ala Ile Gin Ile
GGT
Gly 400 GCA ATG GAT GAT Ala Met Asp Asp
CCA
Pro 405 TTC GTT AAA GAA Phe Val Lys Glu 1323 1371 1419
TAT
Tyr 410 GGC GAA AAA GTT Gly Giu Lys Val
GAA
Glu 415 GCT TCT CGT ATC Ala Ser Arg Ile
CTC
Leu 420 GTT AAC CAA CCA Vai Asn Gin Pro
GAT
Asp 425 TCT ATT GGT GGG Ser Ile Giy Gly
GTC
Val 430 GGA GAT ATC TAT ACT GAT GCA ATG CGT Gly Asp Ile Tyr Thr Asp Ala Met Arg 435 CCA TCA Pro Ser 440 1467 CTT ACA CTT Leu Thr Leu TTG AGT ACA Leu Ser Thr 460
GGA
Gly 445 ACT GGT TCA TGG Thr Gly Ser Trp
GGG
Gly 450 AAA AAT TCA CTT Lys Asn Ser Leu TCA CAC AAT Ser His Asn 455 AAA CGT CGT Lys Arg Arg 1515 1563 TAC GAT CTA TTG Tyr Asp Leu Leu
AAT
Asn 465 GTT AAA ACA GTG Val Lys Thr Vai
GCT
Ala 470 AAT CGC Asn Arg 475 CCA CAA TGG GTT Pro Gin Trp Val
CGT
Arg 480 TTG CCA AAA Leu Pro Lys GAA ATT Glu Ile 485 TAC TAC GAA AAA Tyr Tyr Giu Lys
AAT
Asn 490 GCA ATT TCT TAC Ala Ile Ser Tyr
TTAA
Leu 495 CAA GAA TTG CCA Gin Giu Leu Pro GTC CAC AAA GCT Val His Lys Ala
TTC
Phe 505 1611 1659 1707 ATC GTT GCT GAC Ile Val Ala Asp
CCT
Pro 510 GGT ATG GTT AAA Gly Met Val Lys
TTT
Phe 515 GGT TTC GTT OAT Oly Phe Val Asp AAA GTT Lys Val 520 WO 98/07867 TTG GAA CAA Leu Glu Gin GGC TCT GTT Gly Ser Val 540 PCT/DK97/00336 122
CTT
Leu 525 GCT ATC CGC Ala Ile Arg CCA ACT CAA GTT GAA ACA AGC ATT TAT Pro Thr Gin Val Glu Thr Ser Ile Tyr 530 535 1755 1803 CAA CCT GAC CCA Gin Pro Asp Pro
ACT
Thr 545 ITG AGC GAA GCA Leu Ser Giu Ala
ATT
Ile 550 GCA ATC GCT Ala Ile Ala CGT CAA Arg Gin 555 ATG AAA CAA TTT Met Lye Gin Phe
GAA
Glu 560 CCT GAG ACT GTC Pro Asp Thr Val
ATC
Ile 565 TGT CTT GOT GGT Cys Leu Gly Gly
GGT
Gly 570 TCT GCT CTC GAT Ser Ala Leu Asp
GCC
Ala 575 GGT AAG ATT GGT Gly-Lys Ile Gly
CGT
Arg 580 TTG ATT TAT GAA Leu Ile Tyr Glu
TAT
Tyr 585 1851 1899 1947 OAT OCT CGT GGT Asp Ala Arg Gly
GAA
Glu 590 GCT GAC CTT TCT Ala Asp Leu Ser
GAT
Asp 595 OAT GCA AGT TG Asp Ala Ser Leu AAA GAA Lye Glu 600 CTT TTC CAA Leu Phe Gin ATT AAA TTC Ile Lye Phe 620
GAA
Glu 605 TTA GCT CAA AAA Leu Ala Gin Lys
TTT
Phe 610 GTC GAT ATT CGT Val Asp Ile Arg AAA CGT ATT Lye Arg Ile 615 ATT CGT ACT Ile Pro Thr 1995 2043 TAC CAT CGA CAT Tyr His Pro His
AAA
Lys 625 OCA CAA ATG GTT Ala Gin Met Val
GCA
Ala 630 ACT TCT Thr Ser 635 GGT ACT GOT TCT Gly Thr Gly Ser
GAA
Glu 640 GTG ACT CCA TTT Vai Thr Pro Phe
GCA
Ala 645 GTT ATC ACT GAT Val Ile Thr Asp
GAT
Asp 650 GAA ACT CAT GTT Glu Thr His Val
AAG
Lys 655 TAC CCA CTT GCT Tyr Pro Leu Ala TAC CAA TTA ACA Tyr Gln Leu Thr
CCA
Pro 665 2091 2139 2187 CAA GTT GCC ATT Gln Val Ala Ile GAC CCT GAG TTT Asp Pro Glu Phe
GTT
Val 675 ATG ACT GTA CCA Met Thr Val Pro AAA CGT Lys Arg 680 ACT GTT TCT Thr Val Ser TAC GTT TCT Tyr Val Ser 700
TGG
Trp 685 TCT GGT ATT GAT Ser Gly Ile Asp
GCG
Ala 690 ATG TCA CAC GCG Met Ser His Ala CTT GAA TCT Leu Glu Ser 695 TCA CTT CAA Ser Leu Gln 2235 2283 GTT ATG TCT TCT Val Met Ser Ser TAT ACA AAA CCA Tyr Thr Lys Pro
ATT
Ile 710 GCG ATC Ala Ile 715 AAA CTT ATC TTT Lys Leu Ile Phe
GAA
Glu 720 AAC TTG ACT GAG Asn Leu Thr Glu
TCT
Ser 725 TAT CAT TAT GAC Tyr His Tyr Asp 2331 2379
CCA
Pro 730 GCG CAT CCA ACT Ala His Pro Thr
AAA
Lys 735 GAA GGA CAA Glu Gly Gln AAA GCC Lys Ala 740 CGC GAA AAC ATG Arg Glu Asn Met AAT GCT GCA ACA Asn Ala Ala Thr
CTC
Leu 750 GCT GGT ATG GCC Ala Gly Met Ala
TTC
Phe 755 GCT AAT GCT TTC Ala Asn Ala Phe CTT GGA Leu Gly 760 2427 ATT AAC CAC Ile Asn His CAT GGT CTT His Gly Leu 780
TCA
Ser 765 CTT GCT CAT AAA Leu Ala His Lys
ATT
Ile 770 GGT GGT GAA TTT Gly Gly Glu Phe GGA CTT CCT Gly Leu Pro 775 AAA TTT AAC Lys Phe Asn 2475 2523 GCC ATT GCC ATC Ala Ile Ala Ile
GCT
Ala 785 ATG CCA CAT GTC Met Pro His Val
ATT
Ile 790 GCT GTA Ala Val 795 ACA GGA AAC GTT Thr Gly Asn Val
AAA
Lys 800 CGT ACC CCT TAC Arg Thr Pro Tyr
CCA
Pro 805 CGT TAT GAA ACA Arg Tyr Glu Thr
TAT
Tyr 810 CGT GCT CAA GAG Arg Ala Gln Glu
GAC
Asp 815 TAC GCT GAA ATT Tyr Ala Glu Ile CGC TTC ATG GGA Arg Phe Met Gly
TTT
Phe 825 2571 2619 2667 GCT GGT AAA GAT Ala Gly Lys Asp
GAT
Asp 830 TCA GAT GAA AAA Ser Asp Glu Lys
GCT
Ala 835 GTG CAA GCT CTG Val Gln Ala Leu GTT GCT Val Ala 840 GAA CTT AAG Glu Leu Lys GGA AAT GGT Gly Asn Gly 860 CTG ACT GAT AGC Leu Thr Asp Ser
ATT
Ile 850 GAT ATT AAT ATC Asp Ile Asn Ile ACC CTT TCA Thr Leu Ser 855 GAT AAA TTG Asp Lys Leu 2715 2763 ATC GAT AAA GCT Ile Asp Lys Ala
CAC
His 865 CTT GAA CGT GAA Leu Glu Arg Glu
CTT
Leu 870 GCT GAC Ala Asp 875 CTT GTT TAT GAT Leu Val Tyr Asp
GAT
Asp 880 CAA TGT ACT CCT Gln Cys Thr Pro
GCT
Ala 885 AAT CCT CGT CAA Asn Pro Arg Gln 2811
CCA
Pro 890 AGA ATT GAT GAG Arg Ile Asp Glu
ATT
Ile 895 AAA CAG TTG TTG Lys Gln Leu Leu
TTA
Leu 900 GAT CAA TAC Asp Gln Tyr TAATAATCT 2862
GTTGATAAAA
TCAAAAGGTA
TATAAAAATC
TGTTTTAAAT
CATGAAAATT
AAACTTCGGA
TTATTAAAAC
TAAATCAATT
AATAATTAAT
TCTGGTTTTT
CCTCCTTATT
TATCTTATCA
GCTCTGATCA GAGCATTTTT TCGATATAGG CTCTTTTCAC TAGCGATAGA AGTCGAGTTC CTTTATGTTC TTTGCGAACA ATGGTACTAT TTTGAGCCCA
AAG
TATTATAGCT
TCCATTGATT
ATGCATGCTA
TCTTTCACAG
AATAGTTATA
TATACAACTA
TATGCATTTC
ATAATGAAAT
TTTCTTTGTT
TAAGAATCCT
2922 2982 3042 3102 3162 3185 INFORMATION FOR SEQ ID NO:4: SEQUENCE CHARACTERISTICS: LENGTH: 903 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein FRAGMENT TYPE: internal WO 98/07867 124 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: PCT/DK97/00336 Met Ala Thr Lys Lys Ala Ala Pro Ala Ala 1 Glu Glu Leu Val Gin Thr so Lys His Gly Val Val Tyr Asn Lys Ala Gly 130 Ser Leu 145 Pro Gin Ala Ala Val Pro Ala Thr 210 Lys Ser 225 Tyr Val Leu Ser Ala Val Giu Gin 290 Ser Phe 305 Pro Val Lys Val Asn Ile Ile Tyr 370 Leu Leu 385 Ala Met Ser Arg Lys Lys Gin Ser Val Asn Val 115 Ile Leu Ala Ile Ser 195 Ile Gly Asp Lys Ile 275 Gly Val Ala Pro Gly 355 Lys Ala Asp Ile Ala Lys Val Leu Giu Ala 100 Ala Val Thr Gin Giu 180 Leu Leu Asn Ala Arg 260 Asp Ala Phe Gly Lys 340 Giu Ala Tyr Asp Leu 420 5 Ala Lys Ala Gin Asp Thr Giu Leu 70 Asp Lys Ile Lys Gly Ser Pro Thr Ala Lys 150 Lys Cys 165 Ala Gly Asp Met Ala Thr Pro Ser 230 Thr Ala 245 Phe Asp Ala Ser Tyr Met Val Giu 310 Arg Ser 325 Asp Lys Ala Leu Giu Thr Gin Gly 390 Pro Phe 405 Val Asn Phe Ala Ile 55 Ala Asp Asn Val Thr 135 Thr Ser Ala Thr Gly 215 Leu Asn Asn Val Val 295 Arg Gly Asp Ser Arg 375 Ala Val Gin Gin Glu Ala 25 Ala Val Leu 40 Val Ala Ala His Giu Ala Thr Lys Asn Asp Lys Thr 105 Giu Ile Ala 120 Asn Pro Thr Arg Asn Ala Ser His Ala 170 Pro Giu Asp 185 Thr Ala Leu 200 Gly Pro Gly Gly Val Gly Ile Glu Arg 250 Gly Met Ile 265 Tyr Asp Giu 280 Pro Lys. Lys Ala Gly Giu Gin Trp Ile 330 Val Leu Leu 345 Ser Giu Lys 360 Glu Giu Gly Gly His Asn Lys Giu Tyr 410 Pro Asp Ser 425 Lys Val Lys Met Val 75 His Vai Ser Ser Ile 155 Ala Phe Ile Met Ala 235 Ala Cys Phe Asp Gly 315 Ala Phe Leu Ile Ala 395 Gly Ile Lys Ala Phe Ala Asn Phe Gly Pro Thr 140 Val Lys le Gin Val 220 Gly Val Ala Ile Tyr 300 Phe Giu Giu Ser Glu 380 Al a Glu Gly Val Tyr Giu Leu Glu Ala Val Leu 125 Ala Phe Ile Gin Asn 205 Asn Asn Giu Thr Ala 285 Lys Gly Gin Leu Pro 365 Ile Ile Lys Gly Leu Thr Gly Ala Thr Ser Ile 110 Gly Ile.
Ala Val Trp 190 Arg Ala Gly Asp Glu 270 Lys Ala Val Ala Asp 350 Leu Val Gin Val Val 430 Ser Asp Tyr Ala Gly Giu Ser Val Phe Phe Tyr 175 Ile Gly Ala Ala Leu 255 Asn Met Ile Thr Gly 335 Lys Leu Arg Ile Giu.
415 Gly
Ala Lys Thr Ser Arg Ser Giu Leu Lys His 160 Asp Giu Leu Leu Val 240 Leu Ser Gin Giu Gly 320 Val Lys Ser Ser Gly 400 Ala Asp Ile Tyr Thr 435 Asp Ala Met Arg Pro 440 Ser Leu Thr Leu Gly 445 Thr Gly Ser WO 98/07867 PCT/DK97/00336 125 Trp Gly Lys Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu 450 455 460 Asn Val Lys Thr Val Ala Lys Arg Arg Asn Arg Pro Gin Trp Val Arg 465 470 475 480 Leu Pro Lys Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gln 485 490 495 Glu Leu Pro His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met 500 505 510 Val Lys Phe Gly Phe Val Asp Lys Val Leu Glu Gin Leu Ala Ile Arg 515 520 525 Pro Thr Gin Val Glu Thr Ser Ile Tyr Gly Ser Val Gin Pro Asp Pro 530 535 540 Thr Leu Ser Glu Ala Ile Ala Ile Ala Arg Gin Met Lys Gin Phe Glu 545 550 555 560 Pro Asp Thr Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly 565 570 575 Lys Ile Gly Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp 580 585 590 Leu Ser Asp Asp Ala Ser Leu Lys Glu Leu Phe Gin Glu Leu Ala Gin 595 600 605 Lys Phe Val Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His 610 615 620 Lys Ala Gin Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu 625 630 635 640 Val Thr Pro Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr 645 650 655 Pro Leu Ala Asp Tyr Gin Leu Thr Pro Gin Val Ala Ile Val Asp Pro 660 665 670 Glu Phe Val Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile 675 680 685 Asp Ala Met Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser 690 695 700 Asp Tyr Thr Lys Pro Ile Ser Leu Gin Ala Ile Lys Leu Ile Phe Glu 705 710 715 720 Asn Leu Thr Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu 725 730 735 Gly Gin Lys Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly 740 745 750 Met Ala Phe Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His 755 760 765 Lys Ile Gly Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile 770 775 780 Ala Met Pro His Val-Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys 785 790 795 800 Arg Thr Pro Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gin Glu Asp Tyr 805 810 815 
Ala Glu Ile Ser Arg Phe Met Gly Phe Ala Gly Lys Asp Asp Ser Asp 820 825 830 Glu Lys Ala Val Gin Ala Leu Val Ala Glu Leu Lys Lys Leu Thr Asp 835 840 845 Ser Ile Asp Ile Asn Ile Thr Leu Ser Gly Asn Gly Ile Asp Lys Ala 850 855 860 His Leu Glu Arg Glu Leu Asp Lys Leu Ala Asp Leu Val Tyr Asp Asp 865 870 875 880 Gin Cys Thr Pro Ala Asn Pro Arg Gin Pro Arg Ile Asp Glu Ile Lys 885 890 895 Gin Leu Leu Leu Asp Gin Tyr 900 WO 98/07867 PCT/DK97/00336 126 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 835 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Met Ala Thr Lys Lys Ala Ala Pro Ala Ala Lys Lys Val Leu Ser Ala 1 5 10 Glu Glu Lys Ala Ala Lys Phe Gin Glu Ala Val Ala Tyr Thr Asp Lys 25 Leu Val Lys Lys Ala Gin Ala Ala Val Leu Lys Phe Glu Gly Tyr Thr 40 Gin Thr Gin Val Asp Thr Ile Val Ala Ala Met Ala Leu Ala Ala Ser 55 Lys His Ser Leu Glu Leu Ala His Glu Ala Val Asn Glu Thr Gly Arg 70 75 Gly Val Val Glu Asp Lys Asp Thr Lys Asn His Phe Ala Ser Glu Ser 90 Val Tyr Asn Ala Ile Lys Asn Asp Lys Thr Val Gly Val Ile Ser Glu 100 105 110 Asn Lys Val Ala Gly Ser Val Glu Ile Ala Ser Pro Leu Gly Val Leu 115 120 125 Ala Gly Ile Val Pro Thr Thr Asn Pro Thr Ser Thr Ala Ile Phe Lys 130 135 140 Ser Leu Leu Thr Ala Lys Thr Arg Asn Ala Ile Val Phe Ala Phe His 145 150 155 160 Pro Gin Ala Gin Lys Cys Ser Ser His Ala Ala Lys Ile Val Tyr Asp 165 170 175 Ala Ala Ile Glu Ala Gly Ala Pro Glu Asp Phe Ile Gin Trp Ile Glu 180 185 190 Val Pro Ser Leu Asp Met Thr Thr Ala Leu Ile Gin Asn Arg Gly Leu 195 200 205 Ala Thr Ile Leu Ala Thr Gly Gly Pro Gly Met Val Asn Ala Ala Leu 210 215 220 Lys Ser Gly Asn Pro Ser Leu Gly Val Gly Ala Gly Asn Gly Ala Val 225 230 235 240 Tyr Val Asp Ala Thr Ala Asn Ile Glu Arg Ala Val Glu Asp Leu Leu 245 250 255 Leu Ser Lys Arg Phe Asp Asn Gly Met Ile Cys Ala Thr Glu Asn Ser 260 265 270 Ala Val Ile Asp Ala Ser Val Tyr Asp Glu Phe Ile Ala Lys Met Gin 275 280 285 Glu Gin Gly Ala Tyr Met Val Pro 
Lys Lys Asp Tyr Lys Ala Ile Glu 290 295 300 Ser Phe Val Phe Val Glu Arg Ala Gly Glu Gly Phe Gly Val Thr Gly 305 310 315 320 Pro Val Ala Gly Arg Ser Gly Gin Trp Ile Ala Glu Gin Ala Gly Val 325 330 335 Lys Val Pro Lys Asp Lys Asp Val Leu Leu Phe Glu Leu Asp Lys Lys 340 345 350 Asn Ile Gly Glu Ala Leu Ser Ser Glu Lys Leu Ser Pro Leu Leu Ser 355 360 365 WO 98/07867 PCT/K97/00336 127 Ile Tyr Lye Ala Glu Thr Arg Glu Glu Gly Ile Glu Ile Val Arg Ser 370 375 380 Leu Leu Ala Tyr Gin Gly Ala Gly His Asn Ala Ala Ile Gin Ile Gly 385 390 395 400 Ala Met Asp Asp Pro Phe Val Lys Glu Tyr Gly Glu Lys Val Glu Ala 405 410 415 Ser Arg Ile Leu Val Asn Gin Pro Asp Ser Ile Gly Gly Val Gly Asp 420 425 430 Ile Tyr Thr Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser 435 440 445 Trp Gly Lys Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu 450 455 460 Asn Val Lys Thr Val Ala Lys Arg Arg Asn Arg Pro Gin Trp Val Arg 465 470 475 480 Leu Pro Lys Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gin 485 490 495 Glu Leu Pro His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met 500 505 510 Val Lys Phe Gly Phe Val Asp Lys Val Leu Glu Gin Leu Ala Ile Arg 515 520 525 Pro Thr Gin Val Glu Thr Ser Ile Tyr Gly Ser Val Gin Pro Asp Pro 530 535 540 Thr Leu Ser Glu Ala Ile Ala Ile Ala Arg Gin Met Lys Gin Phe Glu 545 550 555 560 Pro Asp Thr Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly 565 570 575 Lys Ile Gly Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp 580 585 590 Leu Ser Asp Asp Ala Ser Leu Lys Glu Leu Phe Gin Glu Leu Ala Gin 595 600 605 Lys Phe Val Asp Ile Arg Lys Arg Ile lie Lys Phe Tyr His Pro His 610 615 620 Lys Ala Gin Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu 625 630 635 640 Val Thr Pro Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr 645 650 655 Pro Leu Ala Asp Tyr Gin Leu Thr Pro Gin Val Ala Ile Val Asp Pro 660 665 670 Glu Phe Val Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile 675 680 685 Asp Ala Met Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser 690 695 700 Asp 
Tyr Thr Lys Pro Ile Ser Leu Gin Ala Ile Lys Leu Ile Phe Glu 705 710 715 720 Asn Leu Thr Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu 725 730 735 Gly Gin Lys Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly 740 745 750 Met Ala Phe Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His 755 760 765 Lys Ile Gly Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile 770 775 780 Ala Met Pro His Val Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys 785 790 795 800 Arg Thr Pro Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gin Glu Asp Tyr 805 810 815 Ala Glu Ile Ser Arg Phe Met Gly Phe Ala Gly Lys Asp Asp Ser Asp 820 825 830 WO 98/07867 PCT/DK97/00336 128 Glu Lys Ala 835 INFORMATION FOR SEQ ID NO:6: SEQUENCE CHARACTERISTICS: LENGTH: 797 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: Ala Val Thr Asn Val Ala Glu Leu Asn Ala Leu Val Glu Arg Val Lys 1 5 10 Lys Ala Gin Arg Glu Tyr Ala Ser Phe Thr Gin Glu Gin Val Asp Lys 25 Ile Phe Arg Ala Ala Ala Leu Ala Ala Ala Asp Ala Arg Ile Pro Leu 40 Ala Lys Met Ala Val Ala Glu Ser Gly Met Gly Ile Val Glu Asp Lys 55 Val Ile Lys Asn His Phe Ala Ser Glu Tyr Ile Tyr Asn Ala Tyr Lys 70 75 Asp Glu Lys Thr Cys Gly Val Leu Ser Glu Asp Asp Thr Phe Gly Thr 90 Ile Thr Ile Ala Glu Pro Ile Gly Ile Ile Cys Gly Ile Val Pro Thr 100 105 110 Thr Asn Pro Thr Ser Thr Ala Ile Phe Lys Ser Leu Ile Ser Leu Lys 115 120 125 Thr Arg Asn Ala Ile Ile Phe Ser Pro His Pro Arg Ala Lys Asp Ala 130 135 140 Thr Asn Lys Ala Ala Asp Ile Val Leu Gin Ala Ala Ile Ala Ala Gly 145 150 155 160 Ala Pro Lys Asp Leu Ile Gly Trp Ile Asp Gin Pro Ser Val Glu Leu 165 170 175 Ser Asn Ala Leu Met His His Pro Asp Ile Asn Leu Ile Leu Ala Thr 180 185 190 Gly Gly Pro Gly Met Val Lys Ala Ala Tyr Ser Ser Gly Lys Pro Ala 195 200 205 Ile Gly Val Gly Ala Gly Asn Thr Pro Val Val Ile Asp Glu Thr Ala 210 215 220 Asp Ile Lys Arg Ala Val Ala Ser Val Leu Met Ser Lys Thr Phe Asp 225 230 235 240 Asn Gly Val Ile Cys Ala Ser Glu Gin 
Ser Val Val Val Val Asp Ser 245 250 255 Val Tyr Asp Ala Val Arg Glu Arg Phe Ala Thr His Gly Gly Tyr Leu 260 265 270 Leu Gin Gly Lys Glu Leu Lys Ala Val Gin Asp Val Ile Leu Lys Asn 275 280 285 Gly Ala Leu Asn Ala Ala Ile Val Gly Gin Pro Ala Tyr Lys Ile Ala 290 295 300 Glu Leu Ala Gly Phe Ser Val Pro Glu Asn Thr Lys Ile.Leu Ile Gly 305 310 315 320 Glu Val Thr Val Val Asp Glu Ser Glu Pro Phe Ala His Glu Lys Leu 325 330 335 WO 98/07867 PCT/DK97100336 129 Ser Pro Thr Leu Ala Met Tyr Arg Ala Lys Asp Phe Giu Asp Ala Val 340 345 350 Giu Lys Ala Giu Lys Leu Val Ala Met Gly Gly Ile Gly His Thr Ser 355 360 365 Cys Leu Tyr Thr Asp Gin Asp Asn Gin Pro Ala Arg Val Ser Tyr Phe 370 375 380 Gly Gin Lys Met Lys Thr Ala Arg Ile Leu Ile Asn Thr Pro Ala Ser 385 390 395 400 Gin Gly Gly Ile Gly Asp Leu Tyr Asn Phe Lys Leu Ala Pro Ser Leu 405 410 415 Thr Leu Gly Cys Gly Ser Trp Gly Gly Asn Ser Ile Ser Glu Asn Vai 420 425 430 Gly Pro Lys His Leu Ile Asn Lys Lys Thr Val Ala Lys Arg Ala Giu 435 -40445 Asn Met Leu Trp His Lys Leu Pro Lys Ser Ile Tyr Phe Arg Arg Gly 450 455 460 Ser Leu Pro Ile Aia Leu Asp Giu Val Ile Thr Asp Giy His Lys Arg 465 470 475 480 Ala Leu Ile Val Thr Asp Arg Phe Leu Phe Asn Asn Giy Tyr Ala Asp 485 490 495 Gin Ile Thr Ser Val Leu Lys Aia Ala Gly Val Giu Thr Glu Vai Phe 500 505 510 Phe Glu Vai Glu Ala Asp Pro Thr Leu Ser Ile Val Arg Lys Gly Ala 515 520 525 Giu Leu Ala Asn Ser Phe Lys Pro Asp Val Ile Ile Aia Leu Gly Gly 530 535 540 Giy Ser Pro Met Asp Ala Ala Lys Ile Met Trp Val Met Tyr Giu His 545 550 555 560 Pro Glu Thr His Phe Glu Giu Leu Ala Leu Arg Phe Met Asp Ile Arg 565 570 575 Lys Arg Ile Tyr Lys Phe Pro Lys Met Gly Val Lys Ala Lys Met Ile 580 585 590 Aia Val Thr Thr Thr Ser Giy Thr Gly Ser Glu Val Thr Pro Phe Ala 595 600 605 Val Val Thr Asp Asp Aia Thr Giy Gin Lys Tyr Pro Leu Ala Asp Tyr 610 615 620 Ala Leu Thr Pro Asp Met Ala Ile Val Asp Ala Asn Leu Val Met Asp 625 630 635 640 Met Pro Lys Ser Leu Cys Ala Phe Gly Gly Leu Asp Ala Val Thr His 645 650 655 Ala Met 
Giu Ala Tyr Val Ser Val Leu Ala Ser Giu Phe Ser Asp Gly 660 665 670 Gin Ala Leu Gin Ala Leu Lys Leu Leu Lys Giu Tyr Leu Pro Aia Ser 675 680 685 Tyr His Giu Gly Ser Lys Asn Pro Val Ala Arg Glu Arg Val His Ser 690 695 700 Ala Ala Thr Ile Ala Gly Ile Ala Phe Ala Asn Ala Phe Leu Gly Val 705 710 715 720 Cys His Ser Met Ala His Lys Leu Gly Ser Gin Phe His Ile Pro His 725 730 735 Gly Leu Ala Asn Ala Leu Leu Ile Cys Asn Vai Ile Arg Tyr Asn Ala 740 745 750 Asn Asp Asn Pro Thr Lys Gin Thr Ala Phe Ser Gin Tyr Asp Arg Pro 755 760 765 Gin Ala Arg Arg Arg Tyr Aia Giu Ile Ala Asp His Leu Gly Leu Ser 770 775 780 Ala Pro Gly Asp Arg T 'hr Ala Ala Lys Ile Giu Lys Leu 785 790 795 WO 98/07867 PCT/DK97/00336 130 INFORMATION FOR SEQ ID NO:7: SEQUENCE CHARACTERISTICS: LENGTH: 19 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: CTTCTTTGGT TGGATGAGC 19 INFORMATION FOR SEQ ID NO:8: SEQUENCE CHARACTERISTICS: LENGTH: 490 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: Tyr Gin Gly Ala Gly His Asn Ala Ala Ile Gin Ile Gly Ala Met Asp 1 5 10 Asp Pro Phe Val Lys Glu Tyr Gly Ile Lys Val Glu Ala Ser Arg Ile 25 Leu Val Asn Gin Pro Asp Ser Ile Gly Gly Val Gly Asp Ile Tyr Thr 40 Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser Trp Gly Lys 55 Asn Ser Leu Ser His Asn Leu Ser Thr Tyr Asp Leu Leu Asn Val Lys 70 75 Thr Val Ala Lys Arg Arg Asn Arg Pro Gin Trp Val Arg Leu Pro Lys 90 Glu Ile Tyr Tyr Glu Lys Asn Ala Ile Ser Tyr Leu Gin Glu Leu Pro 100 105 110 His Val His Lys Ala Phe Ile Val Ala Asp Pro Gly Met Val Lys Phe 115 120 125 Gly Phe Val Asp Lys Val Leu Glu Gin Leu Ala Ile Arg Pro Thr Gin 130 135 140 Val Glu Thr Ser Ile Tyr Gly Ser Val Gin Pro-Asp Pro Thr Leu Ser 145 150 155 160 Glu Ala Ile Ala Ile Ala Arg Gin Met Asn His Phe Glu Pro Asp Thr 165 170 175 Val Ile Cys Leu Gly Gly Gly Ser Ala Leu Asp Ala Gly Lys Ile Gly 180 185 190 Arg 
Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp Leu Ser Asp 195 200 205 Asp Ala Ser Leu Lys Glu Ile Phe Gin Glu Leu Ala Gin Lys Phe Val 210 215 220 Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His Lys Ala Gin 225 230 235 240 Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu Val Thr Pro 245 250 255 WO 98/07867 PCT/DK97/00336 131 Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr Pro Leu Ala 260 265 270 Asp Tyr Gin Leu Thr Pro Gin Val Ala Ile Val Asp Pro Glu Phe Val 275 280 285 Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile Asp Ala Met 290 295 300 Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser Asp Tyr Thr 305 310 315 320 Lys Pro Ile Ser Leu Gin Ala Ile Lys Leu Ile Phe Glu Asn Leu Thr 325 330 335 Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu Gly Gin Lys 340 345 350 Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly Met Ala Phe 355 360 365 Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His Lys Ile Ala 370 375 380 Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile Ala Met Pro 385 390 395 400 His Val Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys Phe Thr Pro 405 410 415 Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gin Glu Asp Tyr Ala Glu Ile 420 425 430 Ser Arg Phe Met Gly Phe Ala Gly Lys Glu Asp Ser Asp Glu Lys Ala 435 440 445 Val Lys Ala Phe Val Ala Glu Leu Lys Lys Leu Thr Asp Ser Ile Asp 450 455 460 Ile Asn Ile Thr Leu Ser Gly Asn Gly Val Asp Lys Ala His Leu Glu 465 470 475 480 Arg Glu Leu Asp Lys Leu Ala Asp Leu Val 485 490 INFORMATION FOR SEQ ID NO:9: SEQUENCE CHARACTERISTICS: LENGTH: 903 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: Met Ala Thr Lys Lys Ala Ala Pro Ala Ala Lys Lys Val Leu Ser Ala 1 5 10 Glu Glu Lys Ala Ala Lys Phe Gin Glu Ala Val Ala Tyr Thr Asp Lys 25 Leu Val Lys Lys Ala Gin Ala Ala Val Leu Lys Phe Glu Gly Tyr Thr 40 Gin Thr Gin Val Asp Thr Ile Val Ala Ala Met Ala Leu Ala Ala Ser 55 Lys His Ser Leu Glu Leu Ala His Glu Ala Val Asn Glu Thr Gly Arg 70 
75 Gly Val Val Glu Asp Lys Asp Thr Lys Asn His Phe Ala Ser Glu Ser 90 Val Tyr Asn Ala Ile Lys Asn Asp Lys Thr Val Gly Val Ile Ser Glu 100 105 110 WO 98/07867 WO 9807867PCT/DK97/00336 132 Asn Ala Ser 145 Pro Al a Val Ala Lys 225 Tyr Leu Ala Giu Ser 305 Pro Lys Asn Ile Leu 385 Al a Ser Ile Trp Asn 465 Leu Giu Val Pro Thr 545 Pro Lys Oly 130 Leu Gin Ala Pro Thr 210 Ser Val Ser Val Gin 290 Phe Val Val Ile Tyr 370 Leu Met Arg Tyr Gly 450 Val Pro Leu L~ys Thr 530 Leu ksp Val 115 Ile Leu Ala Ile Ser 195 Ile Gly Asp Lys Ile 275 Gly Val Ala Pro cay 355 Lys Ala Asp Ile Thr 435 Lys_ Lys Lys Pro Phe 515 Gin Ser Thr Ala Gly Ser Val Val Pro Thr Ala Gin Lys 165 Glu Ala 180 Leu Asp Leu Ala Asn Pro Ala Thr 245 Arg Phe 260 Asp Ala Ala Tyr Phe Val Gly Arg 325 Lys Asp 340 Giu Ala Ala Giu Tyr Gin Asp Pro 405 Leu Val 420 Asp Ala Asn Ser Thr Val Glu Ile 485 His Val 500 Gly Phe Vai Giu Giu Ala Val Ile 565 Thr Lys 150 Cys Gly Met Thr Ser 230 Al a Asp Ser Met Glu 310 Ser Lys Leu Thr Gly 390 Phe Asn Met Leu Ala 470 Tyr His Val Thr Ile 550 Cys Thr 135 Thr Ser Ala Thr Gly 215 Leu Asn Asn Val Val 295 Arg Gly Asp Ser Arg 375 Ala Val Gin Arg Ser 455 Lys Tyr Lys Asp Ser 535 Ala Leu Giu Ile Ala 120 Asn Pro Thr Arg Asn Ala Ser His Ala 170 Pro Giu Asp 185 Thr Ala Leu 200 Gly Pro Gly Gly Val Gly Ile Giu Arg 250 Gly Met Ile 265 Tyr Asp Giu 280 Pro Lys Lys Ala Gly Glu Gin Trp Ile 330 Val Leu Leu 345 Ser Giu Lys 360 Giu Giu cay Gly His Asn Lys Giu Tyr 410 Pro Asp Ser 425 Pro Ser Leu 440 His Asn Leu Arg Arg Aen Giu Lys Asn 490 Ala Phe Ile 505 Lys Val Leu 520 Ile Tyr Gly Ile Ala Arg Gly Gly Gly 570 Ser Ser Ile 155 Ala Phe Ile Met Ala 235 Ala Cys Phe Asp Gly 315 Ala Phe Leu Ile Ala 395 Gly le Thr Ser Arg 475 Ala Val Glu Ser Gin 555 Ser Pro Thr 140 Val Lys Ile Gin Val 220 Gly Val Ala Ile Tyr 300 Phe Glu Glu Ser Giu 380 Ala Giu Gly Leu Thr 460 Pro Ile Ala Gin Val 540 Met Ala Leu 125 Ala Phe Ile Gin Asn 205 Asn Asn Giu Thr Ala 285 Lys Gly Gin Leu Pro 365 Ile Ile Lys Giy Gly 445 Tyr Gin Ser Asp Leu 525 Gin Lys 
Leu Gly Ile Al a Val Trp 190 Arg Ala Gly Asp Glu 270 Lys Ala Val Ala Asp 350 Leu Val Gin Val Val 430 Thr Asp Tip Tyr Pro 510 Ala Pro Gin Asp Val Phe Phe Tyr 175 Ile Gly Ala Ala Leu 255 Asn Met Ile Thr Gly 335 Lys Leu Arg Ile Giu 415 Gly Gly Leu Val.
Leu 495 Gly Ile Asp Phe Ala 575 Leu Lys His 160 Asp Giu Leu Leu Val 240 Leu Ser Gin Giu Gly 320 Val Lys Ser Ser Gly 400 Ala Asp Ser Leu Arg 480 Gin Met Arg Pro Giu 560 Gly WO 98/07867 PCT/DK97/00336 133 Lys Ile Gly Arg Leu Ile Tyr Glu Tyr Asp Ala Arg Gly Glu Ala Asp 580 585 590 Leu Ser Asp Asp Ala Ser Leu Lys Glu Leu Phe Gin Glu Leu Ala Gin 595 600 605 Lys Phe Val Asp Ile Arg Lys Arg Ile Ile Lys Phe Tyr His Pro His 610 615 620 Lys Ala Gin Met Val Ala Ile Pro Thr Thr Ser Gly Thr Gly Ser Glu 625 630 635 640 Val Thr Pro Phe Ala Val Ile Thr Asp Asp Glu Thr His Val Lys Tyr 645 650 655 Pro Leu Ala Asp Tyr Gin Leu Thr Pro Gin Val Ala Ile Val Asp Pro 660 665 670 Glu Phe Val Met Thr Val Pro Lys Arg Thr Val Ser Trp Ser Gly Ile 675 680 685 Asp Ala Met Ser His Ala Leu Glu Ser Tyr Val Ser Val Met Ser Ser 690 695 700 Asp Tyr Thr Lys Pro Ile Ser Leu Gin Ala Ile Lys Leu Ile Phe Glu 705 710 715 720 Asn Leu Thr Glu Ser Tyr His Tyr Asp Pro Ala His Pro Thr Lys Glu 725 730 735 Gly Gin Lys Ala Arg Glu Asn Met His Asn Ala Ala Thr Leu Ala Gly 740 745 750 Met Ala Phe Ala Asn Ala Phe Leu Gly Ile Asn His Ser Leu Ala His 755 760 765 Lys Ile Gly Gly Glu Phe Gly Leu Pro His Gly Leu Ala Ile Ala Ile 770 775 780 Ala Met Pro His Val Ile Lys Phe Asn Ala Val Thr Gly Asn Val Lys 785 790 795 800 Arg Thr Pro Tyr Pro Arg Tyr Glu Thr Tyr Arg Ala Gin Glu Asp Tyr 805 810 815 Ala Glu Ile Ser Arg Phe Met Gly Phe Ala Gly Lys Asp Asp Ser Asp 820 825 830 Glu Lys Ala Val Gin Ala Leu Val Ala Glu Leu Lys Lys Leu Thr Asp 835 840 845 Ser Ile Asp Ile Asn Ile Thr Leu Ser Gly Asn Gly Ile Asp Lys Ala 850 855 860 His Leu Glu Arg Glu Leu Asp Lys Leu Ala Asp Leu Val Tyr Asp Asp 865 870 875 880 Gin Cys Thr Pro Ala Asn Pro Arg Gin Pro Arg Ile Asp Glu Ile Lys 885 890 895 Gin Leu Leu Leu Asp Gin Tyr 900 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 891 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Met Ala Val Thr Asn Val Ala Glu Leu 
Asn Ala Leu Val Glu Arg Val 1 5 10 WO 98/07867 WO 9807867PCT/DK97/00336 134 Lys Lys Ala Gin Arg Glu Tyr Ala Ser 25 Lys Ile Leu Ala Lys Val Lys Asp Thr Ile Thr Thr Lys Thr 130 Ala Thr 145 Gly Ala Leu Ser Thr Gly Ala Ile 210 Ala Asp 225 Asp Asn Ser Val Leu Leu Asn Gly 290 Ala Giu 305 Gly Giu Leu Ser Val Giu Ser Cys 370 Phe Gly 385 Ser Gin Leu Thr Val Gly Giu Asn 450 Gly Ser 465 Phe Lys Ile Glu Thr Asn 115 Arg Asn Pro Asn Gly 195 Gly Ile Gly Tyr Gin 275 Ala Leu Val Pro Lys 355 Leu Gin Gly Leu Pro 435 Met Leu Arg Met Lys Lys Ile 100 Pro Asn Lys Lys Ala 180 Pro Val Lys Val Asp 260 Gly Leu Ala Thr Thr 340 Ala Tyr Lys Gly Gly 420 Lys Leu Pro Ala Ala Asn Thr Ala Thr Al a Ala Asp 165 Leu Gly Gly Arg Ile 245 Ala Lys Asn Gly Val 325 Leu Giu Thr Met Ile 405 Cys His Trp Ile *Ala Val His 70 Cys Giu Ser Ile Ala 150 Leu Met Met Ala Ala 230 Cys Val Giu Ala Ph-e 310 Val Ala Lye Asp Lys 390 Gly Gly Leu His Al a 470 Al a Ala 55 Phe Gly Pro Thr Ile Asp Ile His Val Gly 215 Val Ala Arg Leu Ala 295 Ser Asp Met Leu Gin 375 Thr Asp Ser Ile Lys 455 Leu Leu 40 Glu Ala Val Ile Ala 120 Phe Ile Gly His Lys 200 Aen Ala Ser Giu Lye 280 Ile Val Giu Tyr Val 360 Asp Ala Leu Trp Aen 440 Leu Asp Ala Ser Ser Leu Gly 105 Ile Ser Val Trp Pro 185 Ala Thr Ser Giu Arg 265 Ala Val Pro Ser Arg 345 Ala Asn Arg Tyr Gly 425 Lys Pro Glu Phe Ala Gly Giu Ser 90 Ile Phe Pro Leu Ile 170 Asp Ala Pro Val Gin 250 Phe Val Gly Giu Giu 330 Ala Met Gin Ile Asn 410 Gly Lys Lys Val Thr Gin Giu Ala Asp Ala Met Gly Ile Tyr Ile Tyr 75 Glu Asp Asp Ile Cys Gly Lye Ser Leu 125 His Pro Arg 140 Gin Ala Ala 155 Asp Gin Pro Ile Asn Leu Tyr Ser Ser 205 Val Val Ile 220 Leu Met Ser 235 Ser Val Val Ala Thr His Gln Asp Val 285 Gin Pro Ala 300 Asn Thr Lye 315 Pro Phe Ala Lye Asp Phe Gly GJly Ile 365 Pro Ala Arg 380 Leu Ile Asn 395 Phe Lye Leu Asn Ser Ile Thr Val Ala 445 Ser Ile Tyr 460 Ile Thr Asp 475- Gln Arg Val Asn Thr Ile 110 IlJe Ala Ile Ser Ile 190 Gly Asp Lye Val Gly 270 le Tyr Ile His Giu 350 Gly Val Thr Ala Ser 430 Lye Phe Gly Val 
Asp Ile Pro Giu Asp Ala Tyr Phe Gly Val Pro Ser Leu Lys Asp Ala Ala 160 Val Giu 175 Leu Ala Lye Pro Giu Thr Thr Phe 240 Val Asp.
255 Gly Tyr Leu Lye Lye Ile Leu Ile 320 Glu Lye 335 Asp Ala His Thr Ser Tyr Pro Ala 400 Pro Ser 415 Giu Asn Arg Ala Arg Arg His Lys 480 WO 98/07867 PCT/DK97/00336 135 Arg Ala Leu Ile Val Thr Asp Arg Phe Leu Phe Asn Asn Gly Tyr Ala 485 490 495 Asp Gin Ile Thr Ser Val Leu Lys Ala Ala Gly Val Glu Thr Glu Val 500 505 510 Phe Phe Glu Val Glu Ala Asp Pro Thr Leu Ser Ile Val Arg Lys Gly 515 520 525 Ala Glu Leu Ala Asn Ser Phe Lys Pro Asp Val Ile Ile Ala Leu Gly 530 535 540 Gly Gly Ser Pro Met Asp Ala Ala Lys Ile Met Trp Val Met Tyr Glu 545 550 555 560 His Pro Glu Thr His Phe Glu Glu Leu Ala Leu Arg Phe Met Asp Ile 565 570 575 Arg Lys Arg Ile Tyr Lys Phe Pro Lys Met Gly Val Lys Ala Lys Met 580 585 590 Ile Ala Val Thr Thr Thr Ser Gly Thr Gly Ser Glu Val Thr Pro Phe 595 600 605 Ala Val Val Thr Asp Asp Ala Thr Gly Gin Lys Tyr Pro Leu Ala Asp 610 615 620 Tyr Ala Leu Thr Pro Asp Met Ala Ile Val Asp Ala Asn Leu Val Met 625 630 635 640 Asp Met Pro Lys Ser Leu Cys Ala Phe Gly Gly Leu Asp Ala Val Thr 645 650 655 His Ala Met Glu Ala Tyr Val Ser Val Leu Ala Ser Glu Phe Ser Asp 660 665 670 Gly Gin Ala Leu Gin Ala Leu Lys Leu Leu Lys Glu Tyr Leu Pro Ala 675 680 685 Ser Tyr His Glu Gly Ser Lys Asn Pro Val Ala Arg Glu Arg Val His 690 695 700 Ser Ala Ala Thr Ile Ala Gly Ile Ala Phe Ala Asn Ala Phe Leu Gly 705 710 715 720 Val Cys His Ser Met Ala His Lys Leu Gly Ser Gin Phe His Ile Pro 725 730 735 His Gly Leu Ala Asn Ala Leu Leu Ile Cys Asn Val Ile Arg Tyr Asn 740 745 750 Ala Asn Asp Asn Pro Thr Lys Gin Thr Ala Phe Ser Gin Tyr Asp Arg 755 760 765 Pro Gin Ala Arg Arg Arg Tyr Ala Glu Ile Ala Asp His Leu Gly Leu 770 775 780 Ser Ala Pro Gly Asp Arg Thr Ala Ala Lys Ile Glu Lys Leu Leu Ala 785 790 795 800 Trp Leu Glu Thr Leu Lys Ala Glu Leu Gly Ile Pro Lys Ser Ile Arg 805 810 815 Glu Ala Gly Val Gin Glu Ala Asp Phe Leu Ala Asn Val Asp Lys Leu 820 825 830 Ser Glu Asp Ala Phe Asp Asp Gin Cys Thr Gly Ala Asn Pro Arg Tyr 835 840 845 Pro Leu Ile Ser Glu Leu Lys Gin Ile Leu Leu Asp Thr Tyr Tyr Gly 850 
855 860 Arg Asp Tyr Val Glu Gly Glu Thr Ala Ala Lys Lys Glu Ala Ala Pro 865 870 875 880 Ala Lys Ala Glu Lys Lys Ala Lys Lys Ser Ala 885 890 WO 98/07867 136 INFORMATION FOR SEQ ID NO:1l: Wi SEQUENCE CHARACTERISTICS: LENGTH: 862 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: None (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: PCT/DK97/00336 Met Lye Val Thr Thr Val Lye Glu Leu Asp 1 Lys Glu Leu Lys Lys Ile Val Lys Ser 145 Gly Leu Gly Gly Ile 225 Gly Tyr Lys Ser Met 305 Val Pro Lys.
Glu Ile Ala 50 Val Asp Thr Thr Thr 130 Thr Ala Thr Pro Val 210 Val Asn Lys Val 290 Ala Thr Val Ala Ala Phe Lys Ile Glu Lys Asn 115 Arg Ile Pro Gln Ser 195 Gly Met Ile Lys Asn 275 Asn Gly Ser Leu Val
5
Gin Lys Arg Asn Ala Ala Lye Asn Lys Thr Ile Ala 100 Pro Thr Aen Gly Leu Ala Glu Asn 165 Tyr Leu 180 Leu Val Pro Gly Ala Val Cys Ala 245 Val Lye 260 Glu Leu Pro Lye Ile Lye Leu Gly 325 Ala Met 340 Lye Phe Ala Ala Val Leu 55 His Phe 70 Cys Gly Giu Pro Ser Thr Ile Phe 135 Ala Lye 150 Ile Ile Met Gin Lye Ser Asn Thr 215 Ser Ser 230 Ser Glu Asp Giu Asp Lys Ile Val 295 Val Pro 310 Giu Giu Tyr Glu Ser met 40 Glu Ala Ile Ile Thr 120 Phe Thr Gly Lye Ala 200 Pro Ile Gin Phe Val 280 Gly Lys Giu Ala Cys Tyr 25 Ala Ala Thr Gly Gly Glu Ile Glu 90 Gly Val 105 Ile Phe Ser Pro Ile Leu Trp Ile 170 Ala Asp 185 Tyr Ser Vai Ile Ile Leu Ser Val 250 Gin Glu 265 Arg Glu Gln Ser Thr Thr Pro Phe 330 Asp Aen 345 Glu Ser le Met Tyr 75 Arg Val Lys His Asp 155 Asp Ile Ser le Ser 235 le Arg Val Ala Arg 315 Ala Phe Lys Gin Asp Gly Ile Aen Ala Ser Pro 140 Aia Giu Thr Gly Asp 220 Lye Val Gly Ile Tyr 300 Ile His Asp Leu Glu Ala Leu Tyr Giu Ala Leu 125 Arg Ala Pro Leu Lye 205 Glu Thr Leu Ala Phe 285 Thr Leu Giu Asp His 365 Lys Met Arg Val Asn Pro Ile 110 Ile Ala Val Ser Ala 190 Pro Ser Tyr Lys Tyr 270 Lys Ile Ile Lys Ala 350 Val Ile Val Asp Ile Giu GiU Asp Lye Tyr Tyr Gly Ile Pro Ser Leu Lye Lye Lye Ser 160 Ile Glu 175 Thr Giy Ala Ile Ala His Asp Asn 240 Ser Ile 255 Ile Ile Asp Gly Ala Ala Gly Glu 320 Leu Ser 335 Leu Lys Thr Leu Ile Asn Leu Gly Gly Leu Gly 360 Thr Ser Gly WO 98/07867 WO 9807867PCT/DK97/00336 137 Ile Tyr Ala Asp Glu Ile Lys Ala Arg Asp Lys Ile Asp Arg Phe Ser Ser 385 Gly Leu Pro Met Leu 465 Phe Ile Lys Giu Pro 545 Glu Arg Ile Val Met 625 Pro Ile Ala Lys Ser 705 His Ile Asp.
Thr Asn 785 Leu Ala Ala Gly Lys Leu 450 Gin Ile Ile Val Met 530 Glu Val Ile Thr Thr 610 Thr Lys Giu Leu Asn 690 Thr Ser Ala Asn Ile 770 Thr Lys met Ser Cys His 435 Trp Phe Val Lys Gly 515 Ser met Lys Tyr Thr 595 Asp Pro Gly Ala Giu 675 Gly Met Met Asn Pro 755 Phe Asp Lys.
Lys Gly Gly 420 Leu Phe Al a Thr Ile 500 Arg Ser Ser Phe Thr 580 Ser Asn Asn Leu Tyr 660 Ala Arg Ala Al a Ala 740 Val Arg Glu Ala Thr Asp 405 Phe Leu Arg Leu Asp 485 Leu Glu Phe Ser Glu 565 Phe Ala Asn Met Thr 645 Thr Ile Thr.
Gly Ile 725 Leu Lys Tyr Giu Leu 805 Val 390 Leu Trp Asn Val Lys 470 Ser Glu Ala met Al a 550 Asp Pro Gly Thr Al a 630 Ala Ser Arg Asn Met 710 Lys Leu Gin Ala Lys 790 Asn Arg Tyr Gly Ile Pro 455 Asp Asp His Asp Pro 535 Lys Leu Lys Ser Gly 615 Ile Tyr Val Leu Glu 695 Ala Leu Ile Ala Arg 775 Val Ile Thr Asn Gly Lys 440 His Leu Pro Leu Leu 520 Asp Leu Ala Leu Gly 600 Asn Val Ser Tyr Ile 680 Lys Ser Ser Giu Pro 760 Ile Asp Pro Phe Phe Asn 425 Thr Lys Lys Tyr Asp 505 Lys Thr Met Ile Gly 585 Ser Lys Asp Gly Ala 665 Phe Ala Ala Ser Glu 745 Cys Ala Leu Thr Val Arg 410 Ser Val Val Asp Asn 490 Ile Thr Ile Trp Lys 570 Lys Glu Tyr Ala Ile 650 Ser Lys Arg Asn Glu 730 Val Pro Asp Leu Ser 810 Asn 395 Ile Val Ala Tyr Leu 475 Leu Asp Ile Ile Val 555 Phe Lys Val Met Giu 635 Asp Glu Glu Ala '715 His le Gin Tyr le 795 le 380 Ile Pro Ser Glu Phe 460 Lys Asn Phe Lys Ala 540 Leu Met Ala Thr Leu 620 Leu Ala Tyr Leu Lys 700 Phe Asn Lys Tyr Ile 780 An Lys Pro Pro Giu Arg 445 Lys Lys Tyr Lys Lys 525 Leu Tyr Asp Met Pro 605 Ala Met Leu Thr Pro 685 met Leu Ile Phe Lys 765 Lys Ly*1s Asp Thr Ser Asn 430 Arg Phe Lys Val Val 510 Ala Gly Glu Ile Leu 590 Phe Asp Met Val Asn 670 Glu Ala Gly Pro Asn.
750 Tyr Leu Ile Ala Ser Phe 415 Val Giu Giy Arg Asp 495 Phe Thr Gly His Arg 575 Val Ala Tyr Lys Asn 655 Gly Ala His Leu Ser 735 Ala Pro Gly H-is Gin 400 Thr Gly Asn Cys Ala 480 Ser Asn Glu Thr Pro 560 Lys Ala Leu Glu Met 640 Ser Leu Tyr Ala Cys 720 Gly Val Asn Gly Glu 800 Val 815 Leu Glu Glu Asn Phe Tyr Ser Ser Leu Asp Arg Ile Ser Glu Leu Ala 820 825 WO 98/07867 138 Leu Asp Asp Gin Cys Thr Gly Ala Asn Pro Arg Phe 835 840 Glu Ile Lys Glu Met Tyr Ile Asn Cys Phe Lys Lys 850 855 860 INFORMATION FOR SEQ ID NO:12: SEQUENCE CHARACTERISTICS: LENGTH: 1470 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID N0:12: PCTIDJ 97/00336 Pro Leu Thr Ser 845 Gln Pro
TACCAAGGAG
AAAGAATACG
GGTGGGGTCG
TCATGGGGGA
ACAGTGGCTA
GAAAAAAATG
GCCGACCCTG
CGCCCAACTC
GAAGCAATcj GGTGGTGG'rr
CGTGGTGAGG
CAAAAATITG
ATGGTTGCTA
ACTGATGATG
GCCATTGTrG
ATTGATGCTA
AAACCAATT
TATGACCCAG
GCAACACTCG
CATAAAATTG
CATGTCATTA
GAAACTTATC
AAAGAAGATT
GATAGTATTG
CGTGAGCTTG
CTGGTCACAA
GAATrAAAGT GAGATATrTA
AAAAITCACT
AACGTCGTAA
CAATTrCTrA
GTATGGTTAA
AAG'ITGAAAC
CAATCGCTCG
CTGCTCTCGA
CTGACCTTrC
TTGATATTCG
TCCCTACTAC
AANZTCACGT
ACCCTGAGTT
TGTCACACGC
CAC'FrCAAGC
CTCATCCAAC
CTGGTATGGC
CTGGTGAATr AAmTAACGC
GTGCGCAAGA
CAGATGAAAA
ATATrAATAT
ATAAATTGGC
CGCTGCAATT
CGAAGCTrCT
TACTGATGCA
TI'CACACAAT
TCGCCCTCAA
CTTACAAGAA
ATTCGG'FrrC AAGCATTrAT
TCAAATGAAC
TGCTGGTAAG
CGATGACGCA
TAAACGTATT
TrCTGGTACT
TAAATATCCA
TGTTATGACT
GC'rrGAATCT
CATCAAACTC
CAAAGAAGGT
CTTCGCCAAT
TGGGC'ITCCT
TGTAACAGGA
AGACTACGCT
AGCGGTCAAA
CACCCTTTCA
TGACCTTGTT
CAAATCGGTG
CGTATCCTCG
ATGCGTCCAT
TTGAGTACAT
TGGGTTCGTr
TTGCCACACG
GTTGATAAAG
GGCTCAGTCC
CATTTTGAAC
ATTGGTCGTT
AGTTrGAAAG ATCAAATrCT
GGTTCTGAAG
CTTGCTGACT
GTACCAAAAC
TATGTTTCTG
ATCTTTGAAA
CAAAAAGCTC
GCTTTCCTTG
CATGGTCTTG
AACGTTAAAT
GAAATTTCAC
GCTTITGTTG
GGAAATGGTG
CAATGGACGA
TTAACCAACC
CATTGACGCT
ACGATCTATr
TGCCAAAAGA
TCCACAAAGC
TTTTGGAACA
AACCTGACCC
CTGACACTGT
TGATTTATGA
AGATCTTCCA
ACCACOCACA
TGACTCCATr
ATCAATTGAC
GTACTGTTrC TCATGTCTrC
ACTTGACTGA
GCGAAAACAT
GAATTAACCA
CCATTGCTAT
TTACCCCTTA
GC'ITCATGGG
CTGAACTTAA
TAGATAAAGC
CCCATTTGTC
TGACTCTATC
CGGAACTGGT
GAATGTTAAA
AATTTACTAC
T'rrCATrGTr
ACTTGCTATC
AACTTrGAGT C.ATCTGTCTr
ATATGATGCT
AGAGTTAGCT
CAAAGCACAA
TGCGGTTATC
ACCTCAAGTr
TTGGTCTGGG
TGACTATACA
GTCTTATCAT
GCACAATGCT
CTCAC'FrGCT
CGCTATGCCA
CCCACGTTAT
A'TTrGCTGGC
AAAATTGACT
TCACCTTGAA
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1470
INFORMATION FOR SEQ ID NO:13:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 3193 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
AAGCTTGTTA CAAAACCGTT TTCTAAACTT TTGATGAGTG TTTTTGTAAA AACTATCACA
ATATTGCTTG ACATCTATAA AAAACTTTGT TAAACTATTC ACGTAAAAGA AAGTGAATGA
AGTCACAAAG GAGAACCTAC AAATATGGCA ACTAAAAAAG CCGCTCCAGC TGCAAGA
GTTTTAAGCG
TTAGTCAAAA
GATACTATI'G
GAAGCCGTITA
GCTTCTGAAT
AACAAGGTTG
CCAACGACTA
AATGCTATTG
ATTGTTTACG
GTACCAAGCC
GCAACTGGTG
GTTGGAGCTG
GAAGACCTTT
GCTGTTATTG
TATATGGTTC
GGTGAAGGTT
CAAGCTGGTG
AATATTGGTG
GAAACACGTG
CATAATGCTG
AAAGTTGAAG
ATCTATACTG
TCACTTTCAC
CGTAATCGCC
TCTTACTTAC
GTTAAATTrG
GAAACAAGCA
GCTCGTCAAA
CTCGATGCCG
CTTTCTGATG
ATTCGTAAAC
ACTACTTCTG
CATG ITAAGT
GAGTTTGTTA
CACGCGCTTG
CAAGCGATCA
CCAACTAAAG
ATGGCCTTCG
GAATTTGGAC
AACGCTGTAA
CAAGAGGACT
GAAAAAGCTG
AATATCACCC
TTGGCTGACC
GATGAGATTA
ACGCTCTGAT
TAAATCAATT
AATAATTAAT
TCTGGTrTT
CCTCCTTATT
TATCTTATCA
CTGAAGAAAA AGCCGCAAAA
AAGCACAAGC
TCGCTGCAAT
ACGAAACTGG
CTGTTTATAA
CTGGATCTGT
ATCCAACATC
TTTTCGCTT
ATGCTGCAAT
TTGACATGAC
GCCCAGGAAT
GTAATGGTGC
TG CTiI CAAA
ATGCTTCAGT
CTAAAAAAGA
TTGGAGTAAC
TCAAAGTTCC
AAGCACTTTC
AAGAAGGAAT
CAATTCAAAT
CTTCTCGTAT
ATGCAATGCG
ACAATTTGAG
CACAATGGGT
AAGAATTGCC
GTTTCGTTGA
TTTATGGCTC
TGAAACAATT
GTAAGATTGG
ATGCAAGTTT
GTATTATTAA
GTACTGGTTC
ACCCACTrGC
TGACTGTACC
AATCTTACGT
AACTTATCTT
AAGGACAAAA
CTAATGC'rrTI
TTCCTCATGG
CAGGAAACGT
ACGCTGAAAT
TGCAAGCTCT
TTCAGGAAA
TTGTTATGA
AACAGTTGTT
GAATTCGTCA
TCGATATAGG
TAGCGATAGA
CTTTATG'rrC
ATGGTACTAT
AAG
TGCTGTI'CTT
GGCTCTTGCA
TCGTGGTGTT
CGCAATTAAA
TGAAATCGCA
AACAGCAATC
CCACCCTCAA
TGAAGCTGGT
TACCGCCTTG
GGTAAACGCC
TGITTATGTT
ACGTTTTGAT
TTATGATGAA
CTACAAAGCT
TGGTCCTGTT
TAAAGATAAA
TTCTGAAAAA
TGAGATTGTA
CGGTGCAATG
CCTCGTTAAC
TCCATCACTT
TACATACGAT
TCG=TGCCA
ACACGTCCAC
TAAAGTTTTG
TGTTCAACCT
TGAACCTGAC
TCGTTTGATT
GAAAGAACTT
ATTCTACCAT
TGAAGTGACT
TGACTACCAA
AAAACGTACT
'rrCTGTrATG
TGAAAACTTG
AGCCCGCGAA
CCTTGGAATT
TCTTGCCATT
TAAACGTACC
'rrCACGCTTC
GGTTGCTGAA
TGGTATCGAT
TGATCAATGT
GTTAGATCAA
GAGCATrTT CTC'rrTCAC AGTCGAG ITC,
TTTGCGAACA
TTTGAGCCCA
TrCCAAGAAG
AAATTTGAAG
GCAAGCAAAC
GTCGA1AGACA
AATGACAAAA
AGCCCTCTCG
TAAATCTT
GCTCAAAAAT
GCACCGGAAG
ATTCAAAACC
GCACTCAAAT
GATGCAACTG
AATGGGATGA
TTTATTGCTA
ATTGAAAGTT
GCCGGTCGTT
GATGTCCTTC
CTTTCTCCTT
CGTAGCTTAC
GATGATCCAT
CAACCAGArr
ACACTTGGAA
CTATTGAATG
AAAGAAATT
AAAGCT'FrCA
GAACAACTTG
GACCCAACTT
ACTGTCATCT
TATGAATATG
ITCCAAGAAT
CCACATAAAG
CCATTTGCAG
TTAACALCCAC
GTTTCTTGGT
TCTTCTGACT
ACTGAGTCTT
AACATGCACA
AACCACTCAC
GCCATCGCTA
CCTTACCCAC
ATGGGATTTG
CTTAAGAAAC
AAAGCTCACC
ACTCCTGCTA
TACTAATAAT
TATTATAGCT
TCCATTGATT
ATGCATGCTA
TCTTTCACAG
AATAG'rrATA CTGTTGCTTA TACTGACAAA GATATACACA AACTCAAGTC
ATCTCTAGA
AAGATACCAA
CTGTTGGTGT
GTGTACTTGC
TATTGACTGC
GTTCAAGCCA
ACTTTATTCA
GTGGACTT4C
CTGGTAACCC
CAAATATTGA
TTTGTGCCAC
AAATGCAAGA
TCGTTrTTGT
CTGGTCAATG
FTTGAACT
TGCTTTCAAT
ACTCGCTCAT
AAACCACmT
CATTTCTGAA
TGGTATCGTT
AAAAACACGT
TGCAGCAAAA
ATGGATTGAA
AACAATCCTr
TTCACTCGGT
ACGTGCCGTrr
TGAAAATTCA
ACAAGGCGCT
TGAACGTGCT
GATTGCTGAA
TGATAAGAAA
CTACAAAGCT
TTGCTTATCA AGGTGCTGGA TCGTTAAAGA ATATGGCGAA
CTATTGGTGG
CTGGTrCATG
TTAAAACAGT
ACTACGAAAA
TCGTTGCTGA
CTATCCGCCC
TGAGCGAAGC
GTCTTGGTGG
ATGCTCGTGG
TAGCTCAAAA
CACAAATGGT
TTATCACTGA
AAGTTrGCCAT
CTGGTATTGA
ATACAAAACC
ATCATTATGA
ATGCTGCAAC
TTGCTCATAA
TGCCACATGT
GTTATGAAAC
CTGGTAAAGA
TGACTGATAG
TTrGAACGTGA
ATCCTCGTCA
CTGTrGATAA
TATACAACTA
TATGCATTTC
ATAATGAAAT
TTTTGTT
TAAGAATCCT
GGTCGGAGAT
GGGGAAAAAT
GGCTAAACGT
AAATGCAATr
CCCTGGTATG
AACTCALAGTT
AATTGCAATC
TGGTTCTGCT
TGAAGCTGAC
ATrGTCGAT
TGCAATTCCT
TGATGAAACT
TG'rrGACCCT
TGCGATGTCA
AATrTCACTT
CCCAGCGCAT
ACTCGCTGGT
AATTGGTGGT
CATTAAATT
ATATCGTGCT
TGATrCAGAT
CATTGATATT
ACTTGATAAA
ACCAAGAATrr
AATTATTAAA
TCAAAAGGTA
TATAAAAATC
TGTTTTAAAT
CATGAAAATT
AAACTTCGGA
240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 2760 2820 2880 2940 3000 3060 3120 3180 3193
INFORMATION FOR SEQ ID NO:14:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 758 amino acids
TYPE: amino acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
Ser Glu Leu Asn Glu 1 Gly Tyr Ala Asn Ile Val Gly Leu Asn 145 Lys Ile Met Asn Glu 225 Gly Trp Ala Arg Val 305 Pro Ser Asp Thr Thr Arg Thr Gly Gly Asp 130 Gln Ser Ile Lys Gly 210 Gln Tyr Thr Met Asp 290 Asp Glu Ile Trp Gln Pro Tyr Thr Thr Thr His Ser His Leu Gln 100 Ile Lys 115 Pro Met Gly Val Gly Val Gly Asp 180 Asp Lys 195 Val Asn His Arg Asp Ile Tyr Phe 260 Ser Phe 275 Leu Lys His Leu Tyr Asp Gly Gly 340 5 Asn Glu Leu Ala Asp Thr Met Ile Phe Leu 165 Tyr Leu Leu Ala Ser 245 Gly Gly
Ala Val Giu 325 Met Lys Giu Gly Trp Pro 70 Ala Giu Ile Lys Asp 150 Thr Arg Ala Glu Leu 230 Gly Tyr Arg Gly Met 310 Leu Gly Leu Val Asp 'Asp 55 Val Gly Ala Glu Lys 135 Val Gly Arg Gin Gin 215 Gly Pro Leu Thr Lys 295 Lys Phe Leu Aia Asn Giu 40 Lys Asp Tyr Pro Giy 120 Ile Tyr Leu Val Phe 200 Thr Gln Ala Ala Ser 280 Ile Leu Ser Asp Leu 360 Thr Val 25 Ser Val Phe Ile Leu 105 Ser Phe Thr Pro Ala 185 Thr Ile Met Thr Ala 265 Thr Thr Arg Gly Gly 345 Ala 10 Arg Phe Met Asp Asn 90 Lys Cys Thr Pro Asp 170 Leu Ser Arg Lys Asn 250 Val Phe Glu Met Asp 330 Arg Trp Giu Gly Phe Thr Asp Leu Glu Thr 75 Lys Arg Lys Glu Asp 155 Ala Tyr Leu Leu Giu 235 Ala Lys Leu Gin Val 315 Pro rhr Phe Ala Gly Ala Gin Ala Ala Tyr 140 Ile Tyr Gly Gin Arg 220 Met Gin Ser Asp Glu 300 Arg Ile Leu Ile Gin Gly Ala Val Lys Val Aia Leu Glu Leu Ile 110 Tyr Asn 125 Arg Lys Leu Arg Gly Arg Ile Asp 190 Ala Asp 205 Giu Giu Ala Ala Giu Ala Gin Asn 270 Val Tyr 285 Ala Gln Phe Leu Trp Ala Val Thr 350 Lys Thr Leu Ser Lys Pro Arg Thr Cys Gly 175 Tyr Leu Ile Lys Ile 255 Giy Ile Glu Arg Thr 335 Lys Lys Asn Glu Glu Thr s0 Ile Phe Giu His Arg 160 Arg Leu Giu Ala Tyr 240 Gin Ala Giu Met Thr 320 Giu Asn Ser Phe Arg 355 Phe Leu Asn Thr Tyr Thr Met Gly Pro 365 Ser Pro Glu WO 98/07867 PCT/DK9/00336 141 Pro Asn Met Thr Ile Leu Trp Ser Glu Lys Leu Pro Leu Asn Phe Lys 370 375 380 Lys Phe Ala Ala Lys Val Ser Ile Asp Thr Ser Ser Leu Gin Tyr Glu 385 390 395 400 Asn Asp Asp Leu Met Arg Pro Asp Phe Asn Asn Asp Asp Tyr Ala Ile 405 410 415 Ala Cys Cys Val Ser Pro Met Ile Val Gly Lys Gin Met Gin Phe Phe 420 425 430 Gly Ala Arg Ala Asn Leu Ala Lys Thr Met Leu Tyr Ala Ile Asn Gly 435 440 445 Gly Val Asp Glu Lys Leu Lys Met Gin Val Gly Pro Lys Ser Glu Pro 450 455 460 Ile Lys Gly Asp Val Leu Asn Tyr Asp Glu Val Met Glu Arg Met Asp 465 470 475 480 His Phe Met Asp Trp Leu Ala Lys Gin Tyr Ile Thr Ala Leu Asn Ile 485 490 495 Ile His Tyr Met His Asp Lys Tyr Ser Tyr Glu Ala Ser Leu Met Ala 500 505 510 Leu His Asp Arg Asp Val Ile Arg Thr Met Ala Cys 
Gly Ile Ala Gly 515 520 525 Leu Ser Val Ala Ala Asp Ser Leu Ser Ala Ile Lys Tyr Ala Lys Val 530 535 540 Lys Pro Ile Arg Asp Glu Asp Gly Leu Ala Ile Asp Phe Glu Ile Glu 545 550 555 560 Gly Glu Tyr Pro Gin Phe Gly Asn Asn Asp Pro Arg Val Asp Asp Leu 565 570 575 Ala Val Asp Leu Val Glu Arg Phe Met Lys Lys Ile Gin Lys Leu His 580 585 590 Thr Tyr Arg Asp Ala Ile Pro Thr Gin Ser Val Leu Thr Ile Thr Ser 595 600 605 Asn Val Val Tyr Gly Lys Lys Thr Gly Asn Thr Pro Asp Gly Arg Arg 610 615 620 Ala Gly Ala Pro Phe Gly Pro Gly Ala Asn Pro Met His Gly Arg Asp 625 630 635 640 Gin Lys Gly Ala Val Ala Ser Leu Thr Ser Val Ala Lys Leu Pro Phe 645 650 655 Ala Tyr Ala Lys Asp Gly Ile Ser Tyr Thr Phe Ser Ile Val Pro Asn 660 665 670 Ala Leu Gly Lys Asp Asp Glu Val Arg Lys Thr Asn Leu Ala Gly Leu 675 680 685 Met Asp Gly Tyr Phe His His Glu Ala Ser Ile Glu Gly Gly Gin His 690 695 700 Leu Asn Val Asn Val Met Asn Arg Glu Met Leu Leu Asp Ala Met Glu 705 710 715 720 Asn Pro Glu Lys Tyr Pro Gin Leu Thr Ile Arg Val Ser Gly Tyr Ala 725 730 735 Val Arg Phe Asn Ser Leu Thr Lys Glu Gin Gin Gin Asp Val Ile Thr 740 745 750 Arg Thr Phe Thr Gin Ser 755 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 3412 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear WO 98/07867 PCT/DK97/00336 142 (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: Coding Sequence LOCATION: 80. 2440 (xi) SEQUENCE DESCRIPTION: SEQ ID GAATTCTGTT TGCTATrCTC AAACTGTATG ATATAATGAA GTTGTAATI'T GAAACAGAAA GAACAAAGGA GATT~CAAA ATG AAA ACC GAA GTT ACG GAA AAT ATC TTT GAA Met Lys Thr Olu Val Thr Glu Asn Ile Phe Glu 112 CAA OCT TG Gin Ala Trp GTT ACT CGC Val. Thr Arg
GAT
Asp GOT TTT AAA GGA ACC AAC TGG Trp Oly Phe Lys Gly Thr Asn 20 CGC GAT AAA GCA AGC Arg Asp Lys Ala Ser TTT GTA CAA GAA Phe Val Gin Glu
AAC
Asn 35 TAC AAA CCA TAT Tyr Lys Pro Tyr
GAT
Asp GOT GAT GAA Oly Asp Glu AGC ITr Ser Phe CTT GCT GGG CCA Leu Ala Gly Pro
ACA
Thr so GAA CGT ACA CTT Glu Arg Thr Leu
AAA
Lys GTA AAG AAA ATT Val Lys Lys Ile 256 304
ATT
Ile GAA GAT ACA AAA Glu Asp Thr Lys CAC TAC GAA GAA His Tyr Glu Glu GGA TI'T CCC TTC Gly Phe Pro Phe
GAT
Asp ACT GAC CGC GTA ACC TCT AT' GAT AAA ATC CCT GCT GGA TAT Thr Asp Arg Val Thr Ser Ile Asp Lye Ile Pro Ala Gly Tyr ATC GAT Ile Asp GCT AAT GAT Ala Asn Asp CTT TTC CGC Leu Phe Arg 110
AAA
Lys GAA CTT GAA CTC Glu Leu Glu Leu
ATC
Ile 100 TAT GGG ATG CAA Tyr Gly Met Gin AAT AGC GAA Asn Ser Giu 105 OTT GCT GAA Val Ala Glu 400 448 TTG AAT TTC ATG Leu Asn Phe Met
CCA
Pro 115 AGA GOT GGA CTr Arg Gly Gly Leu
CGT
Arg 120 AAG ATT Lye Ile 125 TTG ACA GAA CAC Leu Thr Glu His
GGT
Gly 130 CTC TCA GTT GAC Leu Ser Val Asp
CCA
Pro 135 GGC TTG CAT GAT Gly Leu His Asp 496
GTT
Val 140 ETG TCA CAA ACA ATG ACT TCT GTA AAT GAT GGA ATC TTT CGT Leu Ser Gin Thr Met Thr Ser Val Asn Asp Gly Ile Phe Arg
OCT
Ala 155 TAT ACT TCA OCA ATT CGT AAA OCA Tyr Thr Ser Ala Ile Arg Lys Ala 160 CGT CAT Arg His 165 CGT ATC Arg Ile 180 OCT CAT ACT Ala His Thr ATT GGT GTC Ile Gly Val OTA-ACA GGT Val Thr Oly 170 TAT OCA COT Tyr Ala Arg 185 TTG CCA OAT Leu Pro Asp
GCT
Ala 175 TAC TCT COT OGA Tyr Ser Arg Gly WO 98/07867 PCT/DK97/00336 143 CTT GCC CIT TAC Leu Ala Leu Tyr 190 GGT GCT GAT Gly Ala Asp
TAC
Tyr 195 CIT ATG AAG GAA Leu Met Lys Glu
AAA
Lys 200 GCA AAA GAA Ala Lys Glu TGG GAT Trp Asp 205 GCA ATC ACT GAA Ala Ile Thr Glu
ATT
Ile 210 AAC GAA GAA AAC Asn Glu Glu Asn CGT CTT AAA GAA Arg Leu Lys Giu
GAA
Glu 220 ATT AAT ATG CAA Ile Asn Met Gin CAA GCT TTG CAA Gin Ala Leu Gin
GAA
Glu 230 GTT GTA AAC TT Val Val Asn Phe
GGT
Gly 235 GCC TTA TAT GGT Ala Leu Tyr Gly
CTT
Leu 240 GAT GTT TCA CGT Asp Val Ser Arg
CCA
Pro 245 GCT ATG AAC GTA Ala Met Asn Val AAA GAA Lys Glu 250 GCA ATC CAA TGG GTT AAC ATC GCT Ala Ile Gin Trp Val Asn Ile Ala 255
TAT
Tyr 260 ATG GCA GTA TGT Met Ala Val Cys CGT GTC AT Arg Val Ile 265 CTT GAT ATC Leu Asp Ile AAT GGA OCT Asn Gly Ala 270 TTT GCA GAA Phe Ala Glu 285 GCA ACT TCA CTT Ala Thr Ser Leu
GGA
Gly 275 CGT GTT CCA ATC Arg Val Pro Ile
GTT
Val 280 928 976 CGT GAC CTT Arg Asp Leu
GCT
Ala 290 CGT GGA ACA TTT Arg Gly Thr Phe
ACT
Thr 295 GAA CAA GAA ATT Glu Gin Glu Ile
CAA
Gin 300 OAA TTT GTT OAT Glu Phe Val Asp
OAT
Asp 305 TTC OTT TTG AAG Phe Val Leu Lys
CTT
Leu 310 CGT ACA ATG AAA Arg Thr Met Lys
TTT
Phe 315 1024 1072 GCT CGT GCA OCT Ala Arg Ala Ala
OCT
Ala 320 TAT OAT OAA CTT Tyr Asp GLu Leu
TAT
Tyr 325 TCT GOT GAC CCA Ser Oly Asp Pro ACA TTC Thr Phe 330 ATC ACA ACA Ile Thr Thr ACT AAA ATO Thr Lys Met 350 ATG GCT GGT ATG Met Ala Gly Met
GGT
Gly 340 AAT GAC GGA CGT Asn Asp Gly Arg CAC CGT GTC His Arg Val 345 ATC OGA AAT Ile Gly Asn 1120 1168 GAC TAC COT TTC Asp Tyr Arg Phe
TTG
Leu 355 AAC ACA CTT GAT Asn Thr Leu Asp
ACA
Thr 360 OCT CCA Ala Pro 365 GAA CCA AAC TTG ACA GTC CTT TOG OAT Glu Pro Asn Leu Thr Val Leu Trp Asp 370 AAA CTT CCT TAC Lys Leu Pro Tyr 1216
TCA
Ser 380 TTC AAA CGT TAT Phe Lys Arg Tyr
TCA
Ser 385 ATO TCT ATG AGC Met Ser Met Ser
CAC
His 390 AAG CAT TCT TCT Lys His Ser Ser
ATT
Ile 395 1264 CAA TAT GAA GGT Gin Tyr Glu Gly
GTT
Val 400 GAA ACA ATG GCT AAA GAT GGA TAT GGC Olu Thr Met Ala Lys Asp Gly Tyr Oly 405 OAA ATG Glu Met 410 1312 WO 98/07867 PCT/DK97/00336 144 TCA TGT ATC TCT Ser Cys Ile Ser 415 GGA CGT CAT AAC Gly Arg His Asn 430 TGT TGT GTC TCA Cys Cys Val Ser
CCA
Pro 420 CTT GAT CCA GAA Leu Asp Pro Glu AAT GAA GAA Asn Glu Glu 425 GTC TTG AAA Val Leu Lys 1360 1408 CTC CAA TAC Leu Gin Tyr
TTT
Phe 435 GGT GCG CGT GTA Gly Ala Arg Val
AAC
Asn 440 GCA ATG Ala Met 445 TTG ACT GGT TTG Leu Thr Gly Leu
AAC
Asn 450 GGT GGT TAT GAT Gly Gly Tyr Asp
GAC
Asp 455 GTT CAT AAA GAT Val His Lys Asp 1456 1504
TAT
Tyr 460 AAA GTA TTC GAC Lys Val Phe Asp
ATC
lie 465 GAA CCT GTT CGT Glu Pro Val Arg GAA ATT CTT GAC Glu Ile Leu Asp
TAT
Tyr 475 GAT ACA GTT ATG GAA AAC TTT GAC AAA Asp Thr Val Met Glu Asn Phe Asp Lys 480
TCT
Ser 485 CTC GAC TGG TTG Leu Asp Trp Leu ACT GAT Thr Asp 490 1552 ACT TAT GTT Thr Tyr Val AAC TAT GAA Asn Tyr Glu 510
GAT
Asp 495 GCA ATG AAT ATC Ala Met Asn Ile
ATT
Ile 500 CAT TAC ATG ACT His Tyr Met Thr GAT AAA TAT Asp Lys Tyr 505 GTT CGT GCT Val Arg Ala 1600 1648 GCA GTT CAA ATG Ala Val Gin Met
GCC
Ala 515 TTC TTG CCT ACT Phe Leu Pro Thr
AAA
Lys 520 AAC ATG Asn Met 525 GGA TTT GGT ATC Gly Phe Gly Ile
TGT
Cys 530 GGA TTC GCA AAT Gly Phe Ala Asn
ACA
Thr 535 GTT GAT TCA CTT Val Asp Ser Leu
TCA
Ser 540 GCA ATT AAA TAT Ala Ile Lys Tyr
GCT
Ala 545 AAA GTT AAA ACA Lys Val Lys Thr CGT GAT GAA AAT Arg Asp Glu Asn
GGC
Gly 555 1696 1744 1792 TAT ATC TAC GAT Tyr Ile Tyr Asp
TAC
Tyr 560 GAA GTA GAA Glu Val Glu GGT GAT Gly Asp 565 TTC CCT CGT TAT Phe Pro Arg Tyr GGT GAA Gly Glu 570 GAT GAT GAT Asp Asp Asp CAT GAA AAA His Glu Lys 590
CGT
Arg 575 GCT GAT GAT ATT Ala Asp Asp Ile
GCT
Ala 580 AAA CTT GTC ATG Lys Leu Val Met AAA ATG TAC Lys Met Tyr 585 GAA GCT ACT Glu Ala Thr 1840 1888 TTA GCT TCA CAC Leu Ala Ser His
AAA
Lys 595 CTT TAC AAA AAT Leu Tyr Lys Asn
GCT
Ala 600 GTT TCA Val Ser 605 CTT TTG ACA ATT Leu Leu Thr Ile
ACA
Thr 610 TCT AAC GTT GCT Ser Asn Val Ala
TAC
Tyr 615 TCT AAA CAA ACT Ser Lys Gin Thr 1936
GGT
Gly 620 AAT TCT CCA GTA CAT AAA GGA GTA TTC Asn Ser Pro Val His Lys Gly Val Phe 625 AAT GAA GAT GGT Asn Glu Asp Gly 1984 WO 98/07867 GTA AAT AAA Val Asn Lys AAT AAA GCT Asn Lys Ala 'rrG GAA [TC Leu Giu Phe 670 PCT/DK97/00336 145 TCT AAA CTT Ser Lys Leu 640 GAA TTC TTC Glu Phe Phe TCA CCA GGT GCT AAC CCA 2032 Ser Pro Gly Ala Asn Pro 650
AAG
Lys 655 GGT GGT TGG TTG Gly Gly Trp Leu
CAA
Gin 660 AAC CTT CGC TCA Asn Leu Arg Ser TTG GCT AAG Leu Ala Lys 665 ACT CAA GTT Thr Gin Val 2080 2128 AAA GAT GCA AAT GAT GGT ATT TCA TTG ACT Lys Asp Ala Asn Asp Gly Ile Ser Leu Thr 675 680 TCA CCT Ser Pro 685 CGT GCA CTT GGT Arg Ala Leu Gly
AAA
Lys 690 ACT CGT GAT GAA Thr Arg Asp Giu
CAA
Gin 695 GTG GAT AAC TTG Val Asp Asn Leu 2176
GTT
Val 700 CAA ATT CTT GAT Gin Ile Leu Asp
GGA
Gly 705 TAC TTC ACA CCA GGT GCT 'ITG ATT AAT Tyr Phe Thr Pro Gly Ala Leu Ile Asn 710
GGT
Gly 715 2224 ACT GAA TTT GCA Thr Giu Phe Ala
GGT
Gly 720 CAA CAC GTT AAC Gin His Val Asn
TTG
Leu 725 AAC GTA ATG GAC Asn Val Met Asp CTT AAA Leu Lys 730 2272 GAT GTT TAC Asp Val Tyr TCT GGT TAC Ser Gly Tyr 750 GAA ITA ACT Giu Leu Thr 765
GAT
Asp 735 AAA ATC ATG Lys Ile met CGT GGT Arg Giy 740 GAA GAT GTr ATC Giu Asp Val le GTT CGT ATC Val Arg Ile 745 CAA AAA CAA Gin Lys Gin TGT GTC AAT ACT Cys Val Asn Thr GAA CGT GTC TTC Giu Arg Val Phe 770
AAA
Lys 755 TAC CTC ACA CCA Tyr Leu Thr Pro
GAA
Giu 760 2320 2368 2416 CAT GAA GTT CTT TCA His Glu Val Leu Ser 775 AAC GAT GAT GAA Asn Asp Asp Glu
GAA
Glu 780 GTA ATG CAT ACT TCA AAC ATC Val Met His Thr Ser Asn Ile TAATTCTTAA AATTTAATGA ATATTCGGTC 2470
TGTCAGTACT
TTTTTAGTTT
TI'GACAGGCG
TCAAGTGATr
ACAAGGCAAT
CGTGAAAAAT
TI'ACAAGATG
ATGACTGACA
GTAAATAAGC
GATGAAATAT
AATAAGTTTA
AAACAAAAAG
CAAATCCAGC
CAGTTAATTG
ATTTTCACTC
ATTTGAACTT
GACAGACTTT
AAGAAAGTrA
GAATTGCGAT
GATGCTGACA
ATTAGAAACT
TAGGAGCTTT
AAATTATrCG
AGTCTGTCAG
CAATATTTAT
GGTTGGTCAG
CGGAAGAAGA
TCGCTGATGT
GAGAACTAGC
GAAAGGAATT
GTCATGCCTT
TACTCAGGGT
TTTITACGAA
AATr'ITATGC
GGGAAATCAA
AAGTTGTCCG
TACGGTITAG
AGTTTTTCT
TACAGAATTA
TAAAAA~TT
GGATATTCCG
AAAATTAATC
TAAAATAGAT
CGGTdIGTTGA
TCAATTTACA
ATTTTATTGC
GATTCAAAAG
TATGATAGAC
GATTCAAAGA
TTATTAATTG
ATAA.TAGTrA
GAATGAAAAT
TTTTTGA'rr
AGAACCTGGC
TGACAATTGG
AGCGCGAGAA
GTGATGACTT
GTCAAGGAAA
AATACAATTA
TAGAAAGACT
CACAAATGCC
AGATTGAAGC
AGAGGTGAAT
TGCTTTTTTG
AAAACTATTG
GGTAATrGGA
CTGAGGGTTA
GGAAAACTT
ACAGTTAAAT
ATrATCAAAC
ATTAAAAAAA
AAATCTGTCA
TACCGGATTr
GATGGCAAGA
ATTGTCAGAA
ACTAAAAAAA
CGCACGAAAA
TAGGTTCATC
2530 2590 2650 2710 2770 2830 2890 2950 3010 3070 31i30 3190 3250 3310 3370 3412 CTTACCTGAA AAAATACAAT AGCTAAAAAA CGAATTTCTT CATTCTGGAT AATTCTGGAA TAGGATAGAA GAACAGAAAT TATTTATAAC ATGGATTGGC TCTCCCCTTG TATATTCAAG TACTTTTTCT TTGCCAGCCT GACTGGGTGA AGCGGTGGGA
TA
INFORMATION FOR SEQ ID NO:16:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 787 amino acids
TYPE: amino acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
FRAGMENT TYPE: internal
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:
Met Lys Thr Glu Val Thr Glu Asn Ile Phe 1 Phe Lys Gln Glu Pro Thr Asn His Ser Ile Leu Glu Phe Met His Gly 130 Met Thr 145 Arg Lys Ser Arg Ala Asp Glu Ile 210 Tyr Gln 225 Asp Val Asn Ile Ser Leu Leu Ala 290 Asp Phe 305 Tyr Asp Gly Asn Glu Tyr Asp Leu Pro 115 Leu Ser Ala Gly Tyr 195 Asn Ala Ser Ala Gly 275 Arg Val Glu Thr Tyr Arg Glu Lys Ile 100 Arg Ser Val Arg Arg 180 Leu Glu Leu Arg Tyr 260 Arg Gly Leu Leu 5 Asn Lys Thr Glu Ile Tyr Gly Val Asn His 165 Ile Met Glu Gln Pro 245 Met Val Thr Lys Tyr 325 Trp Arg Pro Tyr Leu Lys 55 Val Gly 70 Pro Ala Gly Met Gly Leu Asp Pro 135 Asp Gly 150 Ala His Ile Gly Lys Glu Asn Ile 215 Glu Val 230 Ala Met Ala Val Pro Ile Phe Thr 295 Leu Arg 310 Ser Gly Asp Asp 40 Val Phe Gly Gln Arg 120 Gly Ile Thr Val Lys 200 Arg Val Asn Cys Val 280 Glu Thr Asp Lys 25 Gly Lys Pro Tyr Asn 105 Val Leu Phe Val Tyr 185 Ala Leu Asn Val Arg 265 Leu Gln Met Pro 10 Ala Asp Lys Phe Ile 90 Ser Ala His Arg Thr 170 Ala Lys Lys Phe Lys 250 Val Asp Glu Lys Thr 330 Glu Gln Ser Val Glu Ser Ile Ile Asp Thr 75 Asp Ala Glu Leu Glu Lys Asp Val 140 Ala Tyr 155 Gly Leu Arg Leu Glu Trp Glu Glu 220 Gly Ala 235 Glu Ala Ile Asn Ile Phe Ile Gln 300 Phe Ala 315 Phe Ile Ala Thr Phe Glu Asp Asn Phe Ile 125 Leu Thr Pro Ala Asp 205 Ile Leu Ile Gly Ala 285 Glu Arg Thr Trp Asp Arg Phe Leu Ala Asp Thr Arg Val Asp Lys Arg Leu 110 Leu Thr Ser Gln Ser Ala Asp Ala 175 Leu Tyr 190 Ala Ile Asn Met Tyr Gly Gln Trp 255 Ala Ala 270 Glu Arg Phe Val Ala Ala Thr Ser 335 Gly Val Gly Lys Thr Glu Asn Glu Thr Ile 160 Tyr Gly Thr Gln Leu 240 Val Thr Asp Asp Ala 320 Met
Ala Gly Met Gly Asn Asp Gly Arg His Arg Val Thr Lys Met Asp Tyr 345 350
Arg Phe Leu Asn Thr Leu Asp Thr Ile Gly Asn Ala Pro Glu Pro Asn 355 360
365
Leu Thr Val Leu Trp Asp Ser Lys Leu Pro Tyr Ser Phe Lys Arg Tyr 370 375 380
Ser Met Ser Met Ser His Lys His Ser Ser Ile Gln Tyr Glu Gly Val 385 390 395 400
Glu Thr Met Ala Lys Asp Gly Tyr Gly Glu Met Ser Cys Ile Ser Cys 405 410 415
Cys Val Ser Pro Leu Asp Pro Glu Asn Glu Glu Gly Arg His Asn Leu 420 425 430
Gln Tyr Phe Gly Ala Arg Val Asn Val Leu Lys Ala Met Leu Thr Gly 435 440 445
Leu Asn Gly Gly Tyr Asp Asp Val His Lys Asp Tyr Lys Val Phe Asp 450 455 460
Ile Glu Pro Val Arg Asp Glu Ile Leu Asp Tyr Asp Thr Val Met Glu 465 470 475 480
Asn Phe Asp Lys Ser Leu Asp Trp Leu Thr Asp Thr Tyr Val Asp Ala 485 490 495
Met Asn Ile Ile His Tyr Met Thr Asp Lys Tyr Asn Tyr Glu Ala Val 500 505 510
Gln Met Ala Phe Leu Pro Thr Lys Val Arg Ala Asn Met Gly Phe Gly 515 520 525
Ile Cys Gly Phe Ala Asn Thr Val Asp Ser Leu Ser Ala Ile Lys Tyr 530 535 540
Ala Lys Val Lys Thr Leu Arg Asp Glu Asn Gly Tyr Ile Tyr Asp Tyr 545 550 555 560
Glu Val Glu Gly Asp Phe Pro Arg Tyr Gly Glu Asp Asp Asp Arg Ala 565 570 575
Asp Asp Ile Ala Lys Leu Val Met Lys Met Tyr His Glu Lys Leu Ala 580 585 590
Ser His Lys Leu Tyr Lys Asn Ala Glu Ala Thr Val Ser Leu Leu Thr 595 600 605
Ile Thr Ser Asn Val Ala Tyr Ser Lys Gln Thr Gly Asn Ser Pro Val 610 615 620
His Lys Gly Val Phe Leu Asn Glu Asp Gly Thr Val Asn Lys Ser Lys 625 630 635 640
Leu Glu Phe Phe Ser Pro Gly Ala Asn Pro Ser Asn Lys Ala Lys Gly 645 650 655
Gly Trp Leu Gln Asn Leu Arg Ser Leu Ala Lys Leu Glu Phe Lys Asp 660 665 670
Ala Asn Asp Gly Ile Ser Leu Thr Thr Gln Val Ser Pro Arg Ala Leu 675 680 685
Gly Lys Thr Arg Asp Glu Gln Val Asp Asn Leu Val Gln Ile Leu Asp 690 695 700
Gly Tyr Phe Thr Pro Gly Ala Leu Ile Asn Gly Thr Glu Phe Ala Gly 705 710 715 720
Gln His Val Asn Leu Asn Val Met Asp Leu Lys Asp Val Tyr Asp Lys 725 730 735
Ile Met Arg Gly Glu Asp Val Ile Val Arg Ile Ser Gly Tyr Cys Val 740 745 750
Asn Thr Lys Tyr Leu Thr Pro Glu Gln Lys Gln Glu Leu Thr Glu Arg 755 760 765
Val Phe His Glu Val Leu Ser Asn Asp Asp Glu Glu Val Met His Thr 770 775
780 Ser Asn Ile 785
INFORMATION FOR SEQ ID NO:17:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 2665 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:
AAGCAAGTTC
AACAGATTAA
ACTGACGTTT
AGCATTTCTC
GCCGGCCCTA
TACGAAGAAA
GGTTATATrG CTGAACTrCA
TATGAACCAG
GGTATCTTTC
GGTCTCCCAG
TATGGTGCTG
GATGAAGAAT
GTAGTGCGGT
GAAGCTATCC
GCAACTrrCTC
CGTGGCACTT
CGTACGGTTA
T'TTATTACGA
GACTACCGTT
GTTCTTTGGT
AAGCATTCTT
ATGTCATGTA
AATCTACAAT
GGCGGTTACG
GAAGTCCTTG
GATACTTACG
GCCGTCA.~A
GGATTCTCTA
CGTGATGAAG
GAAGATGATG
CTTGCACGTC
TCTAATGTTG
AATGAAGATG
TCAAATAAAG
GCTCACGCAA
ACATTCGATG
GGTCAACACG
GGTGAAGATG
GAACAAAAGA
GCTACAGACT
ATAAAAGTGA
TTCTAAAAAC
CAATAATATT
AGTAAAAATT
TTTCGCTTGT
CTGTrACTAG
TTGAAAAAGC
GCTTTGTrCA
CTGAACGITC
CACGTITIlTCC
ACAAGGAAAA
TGCCAAAAGG
ACCCTGCCGT
GTGCITACAC
ATGCATACTC
ACTACTrGAT
CAATTCGTCT
TGGGTGATCT
AATGGATTAA
TTGGACGTGT
TCACTGAATC
AATTI'GCACG
CTTCTATGGC
TCTTAAATAC
CAAGTAAATT
CAATTCAATA
TCTCATGCTG
ACTI'TGGTGC
ACGATGTTCA
ATTTTGAAAC
TGGACGCAAT
TGGCCTTCTT
GTAACCGGTT ACTGTATGAT
AATAGAGGGG
CTGGGAAGGC
AGACAACTAC
ACTTCACATC
AATGGATACA
TGAATTGA'TT
CGGTATTCGC
TCATGAAATC
'rrCAAACAT
TCGCGGACGT
GCAAGAAAAA
TCGTGAAGAA
GTATGGTCTr
TATCGCCTT
CCCAATCGTr
AGAAATCCAA
TACTAAGGCT
TGGTATGGGA
GC'ITGATAAT
GCCTTACTCT
TGAAGGTGTC
TGTATCTCCG
TCGTGTTAAC
CAAAGACTAC
GG ITAAAGCT
GAATATCATT
ACCAACACGT
AACTCAAITA
TTrAAAGGAA
ACTCCATATG
AAAAAAGTCG
CGTATTACAT
TTGGTATCC
ATGGCTGAAA
T TTACCAAAT
CGCCGTGCAC
ATTAITGGAG
GTGAACGACT
ATCAATCTTC
GATGTTCGCA
ATGGCTGTCT
CTrGATATCT
GAATTCGTTG
TATGACGAAC
GCTGATGGAC
ATTGGCAATG
TTCCGTCATT
ACAACTATGG
CTTGATCCTG
GTTCTTAAAG
AAAGTATTTG
AATTTGAAA
CACTATATGA
GTTAAAGCCA
GCTA LTAAAT
GAAACTGTTG
GAATGGTTGC
GAAGCTACTG
AATTCTCCAG
GTAGAATTCT
AACTTGAACT
ACTCAAGTTT
ACAATTCTTG
C LTAAAGATG
TACTGTGTTA
TTCCATGAAG
TTAAACAGIT
GTACGATCAA
AGCTCTTGTG
CGACACACTG
AGAATATAAT
TGGCAACTGT
CTGACTGGAA
ACGGAGGCGA
TAGAAGAAAC
CTATTGCTGA
AAAACGATGA
CAGCTrGAA
ATGCAACAAC
GTCATGCCCA
TITATGCCCG
GGAACTCAAT
AATATCAGGC
AACCTGCTAT
GCCGCGTTAT
TTGCAGAACG
ATGACTrCGT
ITTATTCAGG
GTCACCGTGT
CTCCAGAACC
ATTGTATGTC
CTAAAGAAGG
AAAACGAAGA
CACTTCTTAC
ATGTCGAACC
AAGCACT'rGA
CTGATAAATA
ATATGGGATT
ATGCTACTGT
GTAAC'ITCCC
TI'GAAGCTTT
TATCATTGCT
TTCACAAGGG
TCTCACCAGG
CATTGAAGAA
CACCAAAAGC
ATGGTTACTT
TTTATGACAA
ACACTAAATA
TTCTCTCAAT
TAGTTTAAAA
AGCAGTGAGA
ATTTGAAAAG
TCATCCACCT
CGTAAATTGT
CAAAACTAAC
AGACAGAGCA
A.AGTTTrCrr
TAAAGCGCAT
TATCCCAGCA
ACTTTTTAAG
AGAACATGGT
CGTTAATGAT
CACTGTAACT
TCTTGCTCTC
TGCTGAA.ATT
ACTTGGCGAA
GAATGTTAAG
CAATGGTGCT
TGACCTTGCT
TATGAAACTr
TGACCCAACA
TACTAAGATG
TACTAACC
TATGAGCCAC
TTATGGTGAA
TCGTCGCCAC
AGGTCTTAAT
TATCCGTGAT
TTGGTrGACT
TAACTATGAA
TGGTAT'ITGC
AAAACCTAT'
TCGTTACGGA
CCATACTCGT
TACAATCACT
TGTTTACCTC
TGCTAACCCA
ACTTGACTTr
TCTTGGTA.AG
TGAAGGCGGC
GATCATGAAT
CCTTACTAAA
GGATGATGCA
GACCTCACTC
GCTTTTTATA
CTTTTAGCTA
ATC'ITGATGC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2665
ATACAGTTGA TTCATTATCA ATGGTTACAT TTACGACTAT ACCGTGTAGA CTCAATCGCT ATAAACTGTA CAAAGATTCC CTTATTCTAA ACAAACTGGT GTTCTGTGAA CTTGTCTAAA CTTCCGGCGG CTGGTTGCAA ATGATGGTAT CTCATTGACA AACAAGTTGC TAACTTAGTA TTAACTTGAA CGTTATGGAT 'ITATCGTTCG TATCTCAGGT CTGAATTGAC ACAACGTGTT TGG'rrAACAA CAAGTAAGAG GGTCTTTACT TTGCTTTCGG TCACAAATTC AGAAAAAAAC ATGAAAATTA A ITATACTCG AGACACC'FrG TCTTC
INFORMATION FOR SEQ ID NO:18:
SEQUENCE CHARACTERISTICS:
LENGTH: 1993 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:
GTCCGAATGG
CTACGTrAGA
GITTTGAAT
TATCATTGTT
TTGCAACAAC
ATTTGA'ITGC
TACGGTATAC
GCATATAGTG
TATCCATACG
TCGGACCTAC
TTTTCGCTAA
ATACACAACA
TTCGTATTG
CAGGTAATrG
AAAGGGTGTG
TACCTGCGAT
GTGTACGTAA
CTTCAG'ITTC
TTGCGGTACG
CAAAGTACAT
CATAGCTrGC
CACGTAAACG
AGAATTGTGC
AGTCACCGAT
TACGGCAACG
GGTATTCAGT
AACCTTCCAC
TTTGAAGACC
TGGTAGATGG
TACCTTCCAT
AGTCATCGCC
CTTGCCAATC
ACATAGTCAT
GTTGGAATAC
TGCACCAGCA
AGTAATGGTT
TTTCTTCATA
ACCATATTGT
ATTGCCATCT
TGAAAGTGAG
ATCACGATCA
GATTACGTT
AGTCATTACT
TTGCATACCT
G'FITGCACGT
TGCGATTGCG
AACTGATGAG
TI'CAGACCAA
TAAAATACGG
GGTTTCAGTT
GAAACGAACC
AGTAAT'rrrT
ACCGAATGAC
CCATTGAATG
TGCCATTTGT
AATTGTTGCT
GTATTTATCT
GATACGACCA
TAAAATATCT
GAAGATTTTT
CATTGATA
AACGATTTTT
TGTATG'FrCA
CACAGATTCC
TTICATAAGGG
CGACGACCAT CAGGGGTGTT
AATACAGATT
AAACGTTCAA
GGATATTCAC
TTATCTTTGA
TCAGCCGCAA
TGTAATGCCA
AAGGCAGTCA
GTATCGAAAT
AATITTTC.AT
GCACCGAAGA
TAGTCATCGT
GTATCAATCG
AGAATGGTTA
AATGTAT'TrT
GCCCACATTG
ATACGAAG'Fr
CCTGCTTTTA
ATTGCAGCAC
GCTTCTTGAG
ITTAATTGAC
TCAAGATTTA
TTCATTAAGA
CGACCATAAG
GGCGTGTAAA
TTCACTTTTG
CCACCGAATG
TCTAAATCTT
TCAAAATCTA
CAAAGCTTGG
GTATAG'TT
GTGTAGGCAC
CTAAGTCACA
CTTCGATTTC
TGTCGCCACG
CAGAAAGACC
TTAATGCGGC
CATATTGTT
CTAATACTTC
CGATACCGCC
ATTGCATTTG
TGTTGAAGTC
ATACTTTTGC
AGT'ITGGCTC
TGGTTACTAA
GGTCACCAGA
TCATAACTAA
AATCACGTTC
CATITGTGA
CATTAGTTGC
CTAATGCACG
CGCCATC'rrC
AATCTACACC
CATCTGGAAG
CATCGAATAC
GATCAAGTTC
GCATAATGGC
TGTTAATGTA
ATGGCGCGTG
TTGTTGCI-rC
TTTGGATAAA
ACGCCAArrT
TAAATCTTTA
ACCCG'rrTTTC TTACCATAAA TGCATTGCGG TAAGTTrrrAA AGCGATGTCA TCAACACGGT AAAGTCGATT GCTACGTTAG AACTGGTTTA AC TTrCGCAT TGCGATACCA CAAGCCATAG TTCGTATGAA TA TATCGT TGCCAACCAA TCCATAAAGC ATCAGTAATT GGTGCAGTT GTTGATTGCG TATAACAATG =rTACCCACA ATCATTGGTG TGGACGCATr AAATCATCGT ACAGAAACGT TrGAAGTrTT TGGAGAAGTA CCCATGTTGT TGTACGACCA TCTAAACCCA GAATAATTGA TCGTATTCAG GTGGTCAACT AATTCTTGCG GATGTACACG TCAATAAAGG ITITATrGCA GCAAGATAAG TGGGTTAGAA ATATCATAAC GTGTTGTTCT GCGA TCTr TAAATCT'TTT TGTAAAGAAG ATAAAGTGCT ACACGACGGT ACCAGTTAAT ACCCCAGATT ACCTTGGTTA TGTGTT'rrAC ACGACCATAA ACTTTACAAG 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1740 1800 1860 1920 1980 1993
ACGTTTTAAA
ACCAGGTGCG
AGTACGGTTT
GGTTGGACCT
GTCACGTACA
TTGCATTTCA
ATGTGTTTTG
GGTTCATCAG
TGAGAGATAA
TCAATTTrAA
GCTAAGAAAG
TTGACATr'rr
TTAAGTTCTG
GTTAAATAAC
GCCACCAGCA AAACCAGCCC TTCCT'rrGTT AATTAATAAA
ACC
INFORMATION FOR SEQ ID NO:19: SEQUENCE CHARACTERISTICS: LENGTH: 746 amino acids TYPE: amino acid STR.ANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein WO 98/07867 150 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: Met Lys Val Asp Ile Asp Thr Ser Asp Lys Leu Tyr PCT/DJ 97/00336 1 Leu Phe Ala Gly Asn Pro Ala Ala Leu 145 met Tyr Gly Gin Arg 225 Met Gin Ser Asp Gin 305 Arg Ile Leu Gly Pro 385 Ser Asp Gin Gly Ile Giu Ile Ile Leu Leu Tyr 130 Arg Leu Gly Ile Ser 210 Giu Ala Giu Gin Ile 290 Ala Phe Trp Val Pro 370 Ile Leu Asp Met Phe Gin Ala Arg Ala Giu His 115 Gly Lys Arg Arg Ser 195 Arg Giu Al a Ala Asn 275 Tyr Gin Leu Ala Thr 355 Ala Ala G1n Tyr Gln Lys His Thr Ile Thr Lys 100 Pro Arg Thr Cys Gly 180 Tyr Leu Leu Lys Val 260 Gly Ile Giu Arg Thr 340 Lys Pro Phe Tyr Ala 420 Phe 5 Gly Asn Pro Giu Thr Ile Phe Giu His Arg 165 Arg Leu Giu Ala Tyr 245 Gin Gly Glu Leu Thr 325 Giu Asn Glu Lys Glu 405 Ile Phe Thr Tyr Ala Asn 70 Ile Val Gly Met Asn 150 Lys Ile Val Lys Glu 230 Gly Trp Ala Arg Ile 310 Pro Vai Ser Pro Lys 390 Asn Ala Gly Asp Thr Thr 55 Ala Thr Gly Gly Asp 135 Gin Ser Ile Arg Gly 215 His Phe Leu Met Asp 295 Asp Giu Ile Phe Asn 375 Tyr Asp Cys Al a Trp Pro 40 Thr Thr Ala Leu Ile 120 Ser Gly Gly Gly Giu 200 Giu Arg Asp Tyr Ser 280 Phe His Phe Gly Arg 360 Leu Ala Asp Cys Arg Lys 25 Tyr Glu His His Gin 105 Asn Glu Val Val Asp 185 Arg Asp His Ile Phe 265 Leu Lys Phe Asp Gly 345 Tyr Thr Ala Leu Val 425 10 Asn Glu Giu Gly Leu Trp Ala Pro 75 Asp Ala 90 Thr Asp Met Ile Phe Glu Phe Asp 155 Leu Thr 170 Tyr Arg Giu Leu Leu Giu Ala Leu 235 Ser Arg 250 Ala Tyr Gly Arg Ala Gly Ile Met 315 Ser Leu 330 Met Gly Leu His Ile Leu Gin Val 395 Met Arg 410 Ser Pro Ile Asp Giu Val Gly Ala Lys Tyr 140 Val Gly Arg Gin Ala 220 Leu Pro Leu Thr Val 300 Lys Phe Leu Thr Trp 380 Ser Thr Met Ala *Asn Giu Lys Asp *Tyr Pro Ser 125 Leu Tyr Leu Val Phe 205 Thr Gin Ala Ala Ala 285 Leu Ile Ser Asp Leu 365 Ser Ile Asp Val Lys 445 Asp Val Ser Val Phe Ile Leu 110 Ser Phe Ser Pro Ala 190 Aia Ile Ile Gin 
Ala 270 Ser Asn Arg Gly Gly 350 His Giu Val Phe Ile 430 Ala Arg Phe Met Asp Asn Lys Phe Thr Pro Asp 175 Leu Asp Arg Gin Asn 255 Val Phe Giu Met Asp 335 Arg Thr Giu Thr Asn 415 Gly Tip Asp Leu Giu Thr Gin Arg His Asp Asp 160 Gly Tyr Leu Leu Giu 240 Ala Lys Leu Gin Val 320 Pro Thr Met Leu Ser 400 Ser Lys Ala Asn Leu Ala Thr Leu Leu 440 WO 98/07867 WO 9807867PCT/D1C97/00336 151 Tyr Ala Ile Asn Gly Gly Val Asp Glu Lys Leu Pro 465 Met Ser Ala Cys Lys 545 Asp Arg Ile Leu Pro 625 Met Ala Ser Asn Glu 705 Leu Val 450 Lys Asp Ala Ser Gly 530 Tyr Phe Val Lys Thr 610 Asp His Lys Ile Leu 690 Gly Asp Ser Thr Ser Leu Leu 515 Ile Ala Glu Asp Ala 595 Ile Gly Gly Leu Val 675 Val Gly Ala Gly Ala Leu Asn 500 Met Ala Arg Ile Ser 580 Leu Thr Arg Arg Pro 660 Pro Giy Gin Ile Tyr 740 Pro Asp 485 Ile Ala Gly Val Asp 565 Ile Pro Ser Arg Asp 645 Phe Ala Leu His Glu 725 Ala Leu 470 His Ile Leu Leu Lys 550 Gly Ala Thr Asn Ala 630 Arg Thr Ala Leu Leu 710 His Cys 455 Met Phe His His Ser 535 Pro Giu Cys Tyr Val 615 Gly Lys Tyr Leu Asp 695 Asn Pro Ala Asp Met Tyr Asp 520 Val Ile Tyr Asp Arg 600 Val Thr Gly Ala Gly 680 Gly Val Glu Ser Val Trp 490 His Asp Thr Asp Gin 570 Val Ala Gly Phe Val 650 Asp Glu Phe Val Tyr 730 His Leu 475 Leu Asp Val Asp Giu 555 Tyr Giu Val Gin Ala 635 Al a Gly Asp His Met 715 Pro Lys 460 Asp Ala Lys Tyr Ser 540 Asn Gly Arg Pro Lys 620 Pro Ser Ile Pro His 700 Asn Asn Ile Tyr Val Tyr Arg 525 Leu Gly Asn Phe Thr 605 Thr Gly Leu Ser Val 685 Giu Arg Leu Gin Asp Gin Ser 510 Thr Ser Leu Asn Met 590 Gin Gly Ala Thr Tyr 670 Arg Ala Giu Thr Val Lys Tyr 495 Tyr Met Al a Ala Asp 575 Lys Ser Asn Asn Ser 655 Thr Lys Asp Met Ile 735 Gly Val 480 Ile Glu Ala Ile Val 560 Giu Lys Ile Thr Pro 640 Val Phe Thr Val Leu 720 Arg 745 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 769 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID N0:20: Ser Glu Leu Asn Glu Met Gin Lye Leu Ala Trp Ala Gly Phe Ala Gly 1 5 10 Gly 
Asp Trp Gin Giu Asn Val Asn Val Arg Asp Phe Ile Gin Lys Asn 25 Tyr Thr Pro Tyr Giu Gly Asp Asp Ser Phe Leu Ala Gly Pro Thr Giu 40 WO 98/07867 PCT/DK97/00336 152 Ala Thr Thr Lys Leu Trp Giu Ser Val Met Giu Gly Ile Lys Ile Giu 55 Asn Arg Thr His Ala Pro Leu Asp Phe Asp Giu His Thr Pro Ser Thr 70 75 Ile Ile Ser His Ala Pro Gly Tyr Ile Asn Lys Asp Leu Giu Lys Ile 90 Val Gly Leu Gin Thr Asp Giu Pro Leu Lys Arg Ala Ile Met Pro Phe 100 105 110 Gly Gly Ile Lys Met Val Giu Gly Ser Cys Lys Val Tyr Gly Arg Giu 115 120 125 Leu Asp Pro Lys Vai Lys Lys Ile Phe Thr Giu Tyr Arg Lys Thr His 130 135 140 Asn Gin Giy Val Phe Asp Vai Tyr Thr Pro Asp Ile Leu Arg Cys Arg 145 150 155 160 Lys Ser Gly Vai Leu Thr Gly Leu Pro Asp Ala Tyr Gly Arg Gly Arg 165 170 175 Ile Ile Gly Asp Tyr Arg Arg Vai Aia Leu Tyr Giy Vai Asp Phe Leu 185 190 Met Lys Asp Lys Tyr Ala Gin Phe Ser Ser Leu Gin Lys Asp Leu Giu 195 200 205 Asp Gly Val Asn Leu Giu Ala Thr Ile Arg Leu Arg Giu Giu Ile Ala 210 215 220 Giu Gin His Arg Ala Leu Gly Gin Leu Lys Gin Met Ala Ala Ser Tyr 225 230 235 240 Giy Tyr Asp Ile Ser Asn Pro Ala Thr Asn Ala Gin Giu Ala Ile Gin 245 250 255 Trp Met Tyr Phe Ala Tyr Leu Ala Aia Ile Lys Ser Gin Asn Giy Ala 260 265 270 Ala Met Ser Phe Gly Arg Thr Aia Thr Phe Ile Asp Val Tyr Ile Glu 275 280 285 A-rg Asp Leu Lys Ala Gly Lys Ile Thr Glu Thr Giu Ala Gin Giu Leu 290 295 300 Val Asp His Leu Val Met Lye Leu Arg Met Val Arg Phe Leu Arg Thr 305 310 315 320 Pro Glu Tyr Asp Gin Leu Phe Ser Gly Asp Pro Met Trp Ala Thr Giu 325 330 335 Thr Ile Ala Gly Met Giy Leu Asp Gly Arg Thr Leu Vai Thr Lys Asn 340 345 350 Thr Phe Arg Ile Leu His Thr Leu Tyr Asn Met Gly Thr Ser Pro Giu 355 360 365 Pro Asn Leu Thr Ile Leu Trp Ser Giu Gin Leu Pro Glu Asn Phe Lys 370 375 380 Arg Phe Cys Ala Lys Val Ser Ile Asp Thr Ser Ser Val Gin Tyr Giu 385 390 395 400 Asn Asp Asp Leu Met Az-g Pro Asp Phe Asn Aen Asp Asp Tyr Ala Ile 405 410 415 Ala Cys Cys Val Ser Pro-Met Ile Val Giy Lys Gin Met Gin Phe Phe 420 425 430 Gly Ala Arg Ala Asn Leu 
Ala Lys Thr Leu Leu Tyr Ala Ile Asn Gly 435 440 445 Gly Ile Asp Giu Lys Leu Gly Met Gin Val Giy Pro Lye Thr Ala Pro 450 455 460 Ile Thr Asp Glu Val Leu Asp Phe Asp Thr Val Met Thr Arg Met Asp 465 470 475 480 Ser Phe Met Asp Trp Leu Ala Lys Gin Tyr Vai Thr Ala Leu Asn Vai 485 490 495 Ile His Tyr Met His Asp Lye Tyr Ser Tyr Glu Ala Aia Leu Met Ala 500 505 510 WO 98/07867 WO 9807867PCT/DK97/00336 Leu Leu Lys 545 Thr cay Arg Pro Lys 625 Pro Ser Ile Giu His 705 Asn Gin Thr Met His Asp 515 Ser Val 530 Pro Vai Asn Vai Asn Asn Phe Met 595 Thr Gin 610 Thr Giy Gly Ala Leu Thr Ser Tyr 675 Aia Gin 690 Giu Ala Arg Giu Leu Thr Lys Giu 755 Arg Asp Vai Tyr Arg Thr Met Ala Cys Gly Ile Ala Gly Ala Aia Arg Giy Ala Ile 565 Asp Asn 580 Lye Lye Ser Val Aen Thr Asn Pro 645 Ser Vai 660 Thr Phe Arg Arg Thr Val Met Leu 725 Ile Arg 740 Gin Gin Asp Asp 550 Asp Arg Ile Leu Pro 630 Met Ala Ser Asn Glu 710 Leu Vai Gin Ser 535 Ile Phe Val Gin Thr 615 Asp His Lys Ile Leu 695 Giy Asp Ser Asp 520 Leu Lys Giu Asp Lys 600 Ile Giy Gly Leu Val 680 Ala Giy Ala Gly Val 760 Ser Aia Asp Lye Ile Giu 570 Asp Ile 585 Leu Lye Thr Ser Arg Arg Arg Asp 650 Pro Phe 665 Pro Asn Giy Leu Gin His Met Giu 730 Tyr Aia 745 Ile Thr Ile Asp 555 Gly Ala Thr Asn Ala 635 Gin Aia Ala Met Leu 715 Aen Vai Arg Lys 540 Gly Giu Cys -Tyr Val 620 Gly Lye Tyr Leu Asp 700 Asn Pro Arg Thr Tyr Asn Tyr Asp Arg 605 Val Ala Giy Al a Giy 685 Giy Val Asp Phe Phe 765 Val Aia Thr Asp His Lys Al a Val Pro Leu 590 Asn Tyr Pro Aia Lye 670 Lye Tyr Aen Lye Asn 750 Thr Asp Lys Ile Giy Giy Leu *Lye Vai Gin 575 Vai Ala Gly Phe Val 655 Asp Asp Phe Val Tyr 735 Ser Giu Giu Gin Thr Arg Arg Pro Vai Ala 560 Tyr Giu Vai Lys Gly 640 Aia Gly Ala His Leu 720 Pro Leu Ser Ile His Ser Lye Asp Tyr INFORMATION FOR SEQ ID NO:21: Wi SEQUENCE CHARACTERISTICS: LENGTH: 195 amino acids TYPE: amino acid STRAN~DEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID N0:21: Ser Phe Pro Lye Tyr Giy Aen Asp Asp Asp Arg 10 Giu Trp Vai Val Ser 
Thr Phe Ser Ser Lye Leu 25 Tyr Arg Asn Ser Val Pro Thr Leu Ser Val Leu 40 Val Val Tyr Gly Lye Lye Thr Gly Ser Thr Pro 55 Gly Giu Pro Phe Ala Pro Giy Aia Aen Pro Leu 70 75 His Gly Ala Leu Ala Ser Leu Asn Ser Val Aia 90 Gly Ala Thr Asn Lys Ala WO 98/07867 PCT/DK97/00336 154 Thr Met Cys Leu Asp Gly Ile Ser Asn Thr Phe Ser Leu Ile Pro Gin 100 Arg Val Leu Gly Gly Giy Giu Ile Leu 1:30 Vai Leu 115 Asp Asn His 120 Giu Arg Ala Giy Tyr Phe Ala Asn Gly 13 5 Arg Ser Met Leu Met Asp 145 Tyr Pro Asn Leu Thr 165 Giu 150 Ile Arg Val Ser Gin Gin Leu Giu Giy His Ala Val 155 Gly Tyr 170 Vai Ile Thr Asn 125 His Ile 140 Giu His Ala Val 110 Leu Ala Ser Asn Val Asn Pro Giu Lys 160 His Phe Ala 175 Phe His Arg Leu Thr Asp Thr Met 195 Arg 180 Ala Arg Thr 190 INFORMATION FOR SEQ ID NO:22: Wi SEQUENCE CHARACTERISTICS: LENGTH: 1006 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
TGTTACCTGG
CTGTTCGTGA
ACTGGTrGAC
ATAACTATGA
TTGGTATCTG
TTAAAACTTT
CACGTTATGG
ACCATGAAAA
TGACAATCAC
GAGTATTCCT
GTGCTAACCC
AATTGGAATT
CACTTGGTA4
TCACACCAGG
TTATGGACCT
TCTCTGGATA
AACGTGTCTT
TTrGAACGGT GGTTACGTT-C TGAAAMTCTT GACTATGATA AGATAC ETAT GTTGATGCAA AGCAGTTCAA ATGGCCTTCT TGGTTTCGCA AATACAGTTG GCGTGATGAA AATGGCTACA TGAAGATGAT GACCGTGCTG ATTAGCTI'CA CACAAACTT ATCTAACGTT GCITACTCTA CAATGAAGAT GGTACAGTCA ATCTAACAAA GCTAAAGGTG CAAAGATGCA AATGACGGTA AACTCGTGAT GAACAAGTAG AGCTTTGATT AATGGTACTG TAAAGATGTT TACGATAAAA CTGTGTrrAAC ACTAAATACC CCATGAAGTA CTTTCAAACG
ATAAAGATTA
CAGTTATGGA
TGAATATCAT
TGCCTACTAA
ATTCACT'TTC
TCTACGATTA
ATGATATCGC
ACAAAAATGC
AAC.AAACTGG
ACAAATCTAA
GATGGTTGCA
TTTCATTrAAC
ATAACTTGGT
AATTTGCAGG
TCATGCGTGG
TCACACCTGA
ATGATGAAGA
TAAAGTATTC
AAACTTCGAC
TCACTACATG
AGTTCGTGCT
AGCGATTAAA
TGAAGTAGAA
TAAACTTGTC
TGAAGCTACT
TAACTCTCCA
AC!TTGAATTC
AAATCTTCGT
TACTC.AAGTT
TCAALATTCTT
TCAAC.ACGTT
TGAAGATGTT
ACAAAAACAA
AGTAAT
GATATTGAAC
AAATCACTCA
.ACTGACAAAT
AACATGGGAT
TATGCTAAAG
GGTGACTTCC
ATGAAAATGT
GTrTCACTrT
GTTCATAAAG
TTCTCACCAG
TCATTAGCTA
TCTCCTCGTG
GATGGATACT
AACTTGAACG
ATCGTTCGTA
GAA'ITGACTG
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1006 INFORMATION FOR SEQ ID NO:23: Wi SEQUENCE CHARACTERISTICS: LENGTH: 334 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein WO 98/07867 155 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: PCT/DK97/00336 Leu Pro Gly Leu Asn Gly Gly Tyr Val His 1 Asp Giu Ala Val Gly Tyr Tyr Ala Ala 145 Thr Val Lys Gly Asp 225 Leu Asp Gly Lys Val 305 Arg Ile Asn Met Gin Ile Ala Giu Asp 130 Ser Ile His Leu Gly 210 Ala Gly Gly Gin Ile 290 Asn Val Glu Phe Asn Met Cys Lys Val 115 Asp His Thr Lys Glu 195 Trp Asn Lys Tyr His 275 met Thr Phe Pro Asp Ile Ala Gly Val 100 Giu Ile Lys Ser Gly 180 Phe Leu Asp Thr Phe 260 Val Arg Lys His Arg Ser His Leu 70 Ala Thr Asp Lys 150 Val Phe Ser Asn Ile 230 Asp Pro Leu Glu Leu 310 Val Asp Leu Tyr 55 Pro Asn Leu Phe Leu 135 Lys Ala Leu Pro Leu 215 Ser Giu Gly Asn Asp 295 Thr Leu Glu Asn 40 Met Thr Thr Arg Pro 120 Val Asn Tyr Asn Gly 200 Arg Leu Gin Ala Val 280 Val Pro Ser Ile 25 Trp Thr Lys Val Asp 105 Arg Met Ala Ser Glu 185 Ala Ser Thr Val Leu 265 Met Ile Giu; Asn 10 Leu Leu Asp Val Asp Glu Tyr Lys Giu Lys 170 Asp Asn Leu Thr Asp 250 Ile Asp Val Gin Asp 330 Lys Asp Thr Lys Arg 75 Ser Asrn Gly Met Al a 155 Gin Gly Pro Ala Gin 235 Asn.
Asn Leu Arg Lys 315 Asp Asp Tyr Asp Tyr Ala Leu Gly Glu Tyr 140 Thr Thr Thr Ser Lys 220 Val Leu Gly Lys Ile 300 Gin Glu Tyr Asp Thr Asn Asn Ser Tyr Asp 125 His Val Gly Val Asn 205 Leu Ser Val Thr Asp 285 Ser Glu Giu Lys Thr Tyr Tyr Met Ala Ile 110 Asp Glu Ser Asn Asn 190 Lys Glu Pro Gin Glu 270 Val Giy Leu Val Val Val Val Giu Gly Ile Tyr Asp Lys Leu Ser 175 Lys Ala Phe Arg Ile 255 Phe Tyr Tyr Thr Phe Met Asp Ala Phe Lys Asp Arg Leu Leu 160 Pro Ser Lys Lys Ala 240 Leu Ala Asp Cys Glu 320 INFORMATION FOR SEQ ID NO:24: SEQUENCE CHARACTERISTICS: LENGTH: 776 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein WO 98/07867 PCT/DK97/00336 156 (xi) SEQUENCE DESCRIPTION: SEQ ID N0:24: Met Ala Thr Val Lys Thr Asn Thr Asp Val Phe Glu Lys Ala Trp Glu 1 5 10 Gly Phe Lys Gly Thr Asp Trp Lys Asp Arg Ala Ser Ile Ser Arg Phe 25 Val Gin Asp Asn Tyr Thr Pro Tyr Asp Gly Gly Glu Ser Phe Leu Ala 40 Gly Pro Thr Glu Arg Ser Leu His Ile Lys Lys Val Val Giu Glu Thr s0 55 Lys Ala His Tyr Giu Giu Thr Arg Phe Pro Met Asp Thr Arg Ile Thr 70 75 Ser Ile Ala Asp Ile Pro Ala Gly Tyr Ile Asp Lys Giu Asn Glu Leu 90 Ile Phe Gly Ile Gin Asn Asp Glu Leu Phe Lys Leu Asn Phe Met Pro 100 105 110 Lys Giy Gly Ile Arg Met Ala Giu Thr Ala Leu Lys Giu His Gly Tyr 115 120 125 Giu Pro Asp Pro Ala Val His Giu le Phe Thr Lys Tyr Ala Thr Thr 130 135 140 Val Asn Asp Gly Ile Phe Arg Ala Tyr Thr Ser Asi Ile Arg Arg Ala 145 150 155 160 Arg His Ala His Thr Val Thr Gly Leu Pro Asp Ala Tyr Ser Arg Gly 165 170 175 Arg Ile Ile Gly Val Tyr Ala Arg Leu Ala Leu Tyr Gly Ala Asp Tyr 180 185 190 Leu Met Gin Glu Lys Val Asn Asp Trp Asn Ser Ile Ala Giu Ile Asp 195 200 205 Glu Glu Ser Ile Arg Leu Arg Giu Glu Ile Asn Leu Gin Tyr Gin Ala 210 215 220 Leu Gly Glu Val Val Arg Leu Gly Asp Leu Tyr Gly Leu Asp Val Arg 225 230 235 240 Lys Pro Ala Met Asn Val Lys Glu Ala Ile Gin Ti-p Ile Aen Ile Ala 245 250 255 Phe Met Ala Val Cys Arg Val Ile Asn Gly Ala Ala Thr Ser Leu Gly 260 265 270 Arg Val Pro Ile 
Val Leu Asp Ile Phe Ala Glu Arg Asp Leu Ala Arg 275 280 285 Gly Thr Phe Thr Glu Ser Glu Ile Gin Giu Phe Val Asp Asp Phe Val 290 295 300 Met Lys Leu Arg Thr Val Lys Phe Ala Arg Thr Lys Ala Tyr Asp Giu 305 310 315 320 Leu Tyr Ser Gly Asp Pro Thr Phe Ile Thr Thr Set Met Ala Gly Met 32S 330 335 Gly Ala Asp Gly Arg His Arg Val Thr Lys Met Asp Tyr Arg Phe Leu 340 345 350 Asn Thr Leu Asp Asn le Gly Asn Ala Pro Glu Pro Asn Leu Thr Val 355 360 365 Leu Tip Ser Ser Lys Leu Pro Tyr Ser Phe Arg His Tyr Cys Met Ser 370 375 380 Met Ser His Lys His Ser Se Ile Gin Tyr Glu Gly Val Thr Thr Met 385 390 395 400 Ala Lys Giu Gly Tyr Gly Giu Met Ser Cys Ile Ser Cys Cys Val Ser 405 410 415 Pro Leu Asp Pro Giu Asn Giu Asp Arg Arg His Asn Leu Gin Tyr Phe 420 425 430 Gly Ala Arg Val Asn Val Leu Lys Ala Leu Leu Thr Gly Leu Asn Gly 435 440 445 WO 98/07867 PCT/DK97/00336 157 Gly Tyr Asp Asp Val His Lys Asp Tyr Lys Val Phe Asp Val Giu Pro 450 455 460 Ile Arg Asp Glu Val Leu Asp Phe Giu Thr Val Lys Ala Asn Phe Giu 465 470 475 480 Lys Ala Leu Asp Trp Leu Thr Asp Thr Tyr Val Asp Ala Met Asn Ile 485 490 495 Ile His Tyr Met Thr Asp Lys Tyr Asn Tyr Giu Ala Vai Gin Met Ala 500 505 510 Phe Leu Pro Thr Arg Val Lys Ala Asn Met Gly Phe Gly Ile Cys Gly 515 520 525 Phe Ser Asn Thr Val Asp Ser Leu Ser Ala Ile Lys Tyr Ala Thr Val 530 535 540 Lys Pro Ile Arg Asp Giu Asp Gly Tyr Ile Tyr Asp Tyr Giu Thr Val 545 550 555 560 Gly Asn Phe Pro Arg Tyr Gly Giu Asp Asp Asp Arg Val Asp Ser Ile 565 570 575 Ala Giu Trp Leu Leu Giu Ala Phe His Thr Arg Leu Ala Arg His Lys 580 585 590 Leu Tyr Lys Asp Ser Giu Ala Thr Val Ser Leu Leu Thr Ile Thr Ser 595 600 605 Asn Val Ala Tyr Ser Lys Gin Thr Gly Asn Ser Pro Val His Lys Gly 610 615 620 Val Tyr Leu Asn Giu Asp Gly Ser Val Asn Leu Ser Lys Val Glu Phe 625 630 635 640 Phe Ser Pro Gly Ala Asn Pro Ser Asn Lys Ala Ser Gly Gly Trp Leu 645 650 655 Gin Asn Leu Asn Ser Leu Lys Lys Leu Asp Phe Ala His Ala Asn Asp 660 665 670 Gly Ile Ser Leu Thr Thr Gin Vai Ser Pro Lys Ala Leu Gly Lys Thr 
675 680 685 Phe Asp Giu Gin Val Ala Asn Leu Val Thr Ile Leu Asp Gly Tyr Phe 690 695 700 Giu Gly Gly Gly Gin His Val Asn Leu Asn Val Met Asp Leu Lys Asp 705 710 715 720 Val Tyr Asp Lys Ile Met Asn Giy Giu Asp Val Ile Val Arg Ile Ser 725 730 735 Gly Tyr Cys Val Asn Thr Lys Tyr Leu Thr Lys Giu Gin Lys Thr Giu 740 745 750 Leu Thr Gin Arg Val Phe His Giu Val Leu Ser Met Asp Asp Ala Ala 755 760 765 Thr Asp Leu Val Asn Asn Lys Gix 770 775 INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 740 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein WO 98/07867 PCT/DK97/00336 158 (xi) SEQUENCE DESCRIPTION: SEQ ID Leu Phe Lys Gin Trp Giu Gly Phe Gin Asp Gly Giu Trp Thr Asn Asp 1 5 10 Vai Asn Vai Arg Asp Phe Ile Gin Lys Asn Tyr Lys Giu Tyr Thr Gly 25 Asp Lys Ser Phe Leu Lys Giy Pro Thr Giu Lys Thr Lys Lys Val Trp 40 Asp Lys Ala Val Ser Leu Ile Leu Giu Giu Leu Lys Lys Gly Ile Leu 55 Asp Val Asp Thr Giu Thr Ile Ser Gly Ile Asn Ser Phe Lys Pro Gly 70 75 Tyr Leu Asp Lys Asp Asn Giu Vai Ile Val Gly Phe Gin Thr Asp Ala 90 Pro Leu Lys Arg Ile Thr Asn Pro Phe Gly Gly Ile Arg Met Ala Giu 100 105 110 Gin Ser Leu Lys Giu Tyr Giy Phe Lys Ile Ser Asp Giu Met His Asn 115 120 125 Ile Phe Thr Asn Tyr Arg Lys Thr His Asn Gin Gly Vai Phe Asp Ala 130 135 140 Tyr Ser Giu Giu Thr Arg Ile Ala Arg Ser Aia Gly Val Leu Thr Giy 145 i50 155 160 Leu Pro Asp Ala Tyr Gly Arg Gly Arg Ile Ile Gly Asp Tyr Arg Arg 165 170 175 Val Ala Leu Tyr Gly Ile Asp Phe Leu Ile Gin Giu Lys Lys Lys Asp 180 185 190 Leu Ser Asn Leu Lys Gly Asp Met Leu Asp Giu Leu Ile Arg Leu Arg 195 200 205 Giu Giu Val Ser Giu Gin Ile Arg Ala Leu Asp Giu Ile Lys Lys Met 210 215 220 Ala Leu Ser Tyr Gly Vai Asp Ile Ser Arg Pro Ala Val Asn Ala Lys 225 230 235 240 Giu Ala Ala Gin Phe Leu Tyr Phe Giy Tyr Leu Ala Gly Val Lys Giu 245 250 255 Asn Asn Giy Ala Ala Met Ser Leu Gly Arg Thr Ser Thr Phe Leu Asp 260 265 270 Ile Tyr Ile Glu Arg Asp Leu Giu Gin Gly Leu Ile Thr Giu Asp Giu 275 280 285 Aia Gin Giu 
Vai Ile Asp Gin Phe le Ile Lys Leu Arg Leu Vai Arg 290 295 300 His Leu Arg Thr Pro Giu Tyr Asn Giu Leu Phe Aia Giy Asp Pro Thr 305 310 315 320 Trp Vai Thr Giu Ser Ile Mla Giy Vai Gly Ile Asp Gly Arg Ser Leu 325 330 335 Val Thr Lys Asn Ser Phe .Arg Tyr Leu His Thr Leu Ile Asn Leu Giy 340 345 350 Ser Ala Pro Glu Pro Asn Met Thr Vai Leu Trp, Ser Giu Asn Leu Pro 355 360 365 Giu Ser Phe Lys Lys Phe Cys Ala Giu Met Ser Ile Leu Thr Asp Ser 370 375 380 Ile Gin Tyr Glu Asn Asp Asp Ile Met Arg Pro Ile Tyr Gly Asp Asp 385 390 395 400 Tyr Ala Ile Ala Cys Cys Val Ser Ala Met Arg Val Gly Lys Asp Met 405 410 415 Gin Phe Phe Gly Aia Arg Cys Asn Leu Ala Lys Cys Leu Leu Leu Ala 420 425 430 Ile Asn Gly Gly Val Asp Giu Lys Lys Gly Ilie Lys Val Val Pro Asp 435 440 445 WO 98/07867 WO 9807867PCTIDK97/00336 159 Ile Asn 465 Met Gin Ile Ala Lys 545 Ser His Thr Arg Arg 625 Pro Pro Ser Asn Lys 705 Asn His Glu 450 Tyr Asn Met Ala Lys 530 Glu Ile Pro Ser Lys 610 Asp Tyr Asp Ile Val 690 Tyr Arg Glu Pro Phe Ile Ala Gly 515 Val Gly Ala Thr Asn 595 Vai Met Val Ala Met 675 Leu Pro Leu Lys Ile Lys Ile Leu 500 Phe Lys Asp Val Tyr 580 Val Gly Giu Cy s Leu 660 Gly Asn Thr Ser Leu Thr Val His 485 His Ser Pro Phe Glu 565 Arg Met Glu Gly Cys 645 Gly Gly Arg Leu Lys 725 Asp Leu 470 Phe Asp Val Ile Pro 550 Ile Asn Tyr Pro Ala 630 Giu Asn Tyr Glu Thr 710 Asp Glu 455 Glu Met Thr Ala Arg 535 Lys Val Ala Gly Leu 615 Leu Asp Asp Phe Thr 695 Ile His Val Tyr His Lys Ala 520 Giu Tyr Giu Lys Lys 600 Ala Al a Gly His Gly 680 Leu Arg Gin Leu Met Asp Val 505 Asp Asn Gly Lys His 585 Lys Pro Ser Val Asp 665 Gin Ile Val Lys Asp Ala Lys 490 Gly Ser Gly Asn Phe 570 Thr Thr Gly Leu Ser 650 Val Gly Asp Ser Giu 730 Tyr Gly 475 Tyr Arg Leu Ile Asp 555 Ser Leu Gly Ala Asn 635 Asn Arg Ala Ala Gly 715 Val Giu 460 Leu Ala Leu Ser Thr 540 Asp Asp Ser Thr Asn 620 Ser Thr le His met 700 Tyr Ile Lys Tyr Tyr Met Ala 525 Val Asp Giu Val Thr 605 Pro Val Phe Asn His 685 Asn Ala Ser Val Val Giu Ala 510 Ile Asp Arg Leu Leu 590 Pro Met Al a 
Ser Asn 670 Leu Asn Val Arg Lys Asn Ala 495 Phe Arg Phe Val Lys 575 Thr Asp His Lys Ile 655 Leu Asn Pro Asn Thr 735 Glu Thr 480 Ser Gly Tyr Val Asp 560 Lys Ile Gly Gly Val 640 Val Val Val Asp Phe 720 Phe 740 INFORMATION FOR SEQ ID NO:26: Wi SEQUENCE CHARACTERISTICS: LENGTH: 1848 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ix) FEATURE: NAME/KEY: Coding Sequence LOCATION: 591... .1613 (xi)
CTGCAGCTTG
TTACATAGTT
TACCATTACG
GACCGAAGTG
SEQUENCE DESCRIPTION: SEQ ID NO:26: TTTTTTAGTA CCAACAAAAA GGACTACTGC ACCTTCTTGT GAAGCGTTTT GTAAGCATCG TCAACAAGTT TTACAGTTTT TTGAAGGTCG ATAACGTGGA TTCTGTGAAG ATGTATGGTT TCATTTTTGG GTTCCAACGA CGAGTTTGGT AACACCAGCT TCAAGAAGTT GTTTCATTGA AATAACTGAC ATGTTAATGT 120 180 240 CTCCTrAA
ACTACGACTT
TGTTTGACCT
CTAACCACTC
GAGTAAGTTT
TTTGTGATAA
AATAG'TTT
TGTCGAGACG
ATTTTTGCAA
CTATTATCTG
TTAAATTTT
AATAATTATA
CCTO'TTCA
AAATGCGAGA
GTATCTATTC
ACAAATTTA
TTCTAAGAAA
GTAAATAAAT
CTGTCATCCG
TGGTTGCATA
ATGCTTCTAT
TTGTTAATTT
AAAATTAATA
TAGTTTGTGA
CAGCCGCAAT
GCAACTCTAT
TGTTCAGTAA
AGGCTCTATA
TrrGCTGA
GGAGAGAAAT
ACTTGCGTAC
CATTATACAT
ATCTATTITTT
ATCACTAAAA
AACCGCTTTT
ATG AAA Met Lys 1 GAA AAA ATC CTT TTA GGC GGC TAT ACA AAA CGT GTA Glu Lys Ile Leu Leu Gly Gly Tyr Thr Lys Arg Val TCT AAA GGC GTA Ser Lys Gly Val TCA TCA TTA AAT Ser Ser Leu Asn TAT AGT Tyr Ser GTT CTT TTG GAC ACT AAA GCT GCT GAA Val Leu Leu Asp Thr Lys Ala Ala Giu Leu 644 692 740 788
GAA
Glu GTC GCT GCG GTT CAA AAC CCT ACT TAT Val Ala Ala Val Gin Asn Pro Thr Tyr
ATC
Ile 45 ACT CTC GAT GAA Thr Leu Asp Giu
AAG
Lys GGA CAC CTC TAT ACT TGT GCA GCA GAT Gly His Leu Tyr Thr Cys Ala Ala Asp
AGT
Ser 60 AAT GGT GGA GGA Asn Gly Gly Gly ATC GCC Ile Ala GCC Tr GAT Ala Phe Asp ACC ACG GGA Thr Thr Gly
TTT
Phe GAT GGC GAA ACT Asp Gly Glu Thr
GCT
Al a 75 ACT CAT CTC GGA Thr His Leu Gly AAT GTC ACA Asn Val Thr s0 GCG CGA CAA Ala Arg Gin 836 884 GCT CCA CTC TGC Ala Pro Leu Cys
TAT
Tyr 90 GTT GCC GTG GAC Val Ala Val Asp
GAA
Giu TTA q~TT TAC GGA GCG AAC TAT Leu Val Tyr Gly Ala Asn Tyr 100 .105 CAT CTT GGA GAA His Leu Gly Glu CGT GTT TAT AAG Arg Val Tyr Lys
ATT
Ile 115 CAA GCT AAT GGC Gin Ala Asn Gly
TCA
Ser 120 CTC CGA 'ITA ACG Leu Arg Leu Thr
GAT
Asp 125 ACA GTA AAA CAT Thr Val Lys His
ACC
Thr 130 932 980 1028 GGT TCT GGA CCA Gly Ser Gly Pro CCT GAA CAA GCT Pro Giu Gin Ala
AGC
Ser 140 TCA CAC GTT CAT Ser His Vai His TAT TCT Tyr Ser 145 GAT TTG ACT Asp Leu Thr GAA GTC ACT Glu Val Thr 165
CCT
Pro 150 GAC GGA CGA CTT Asp Gly Arg Leu
GTC
Val 155 ACC TGT GAT TTG Thr Cys Asp Leu GGA ACA GAT Gly Thr Asp 160 AAT ATT GCT Asn Ile Ala 1076 1124 GTT TAT GAT GTC Val Tyr Asp Val
ATT
Ile 170 GGT GAA GGT AAA Gly Glu Gly Lys
CTC
Leu 175 ACA ATT Thr Ile 180 TAT CGG GC-A GAA Tyr Arg Ala Giu
AAA
Lys 185 GGA ATG GGT GCT Gly Met Gly Ala CAT ATT ACT ITC His Ile Thr Phe 1172 CAT CCA AAT GGT AAA ATC GCT TAT TTG GTT GGA GAG TTA AAT TCA ACA 12 1220 WO 98/07867 WO 9807867PCT/DK97/00336 161 His 195 Pro Asn Gly Lys Ile 200 Ala Tyr Leu Val Gly Glu Leu Asn Ser 205 Thr 210 ATT GAA OTT TTA Ile Glu Val Leu
AGT
Ser 215 TAC AAT GAA GAA Tyr Asn Glu Glu
AAA
Lys 220 GGA CGC TTT GCT Gly Arg Phe Ala COT CTT Arg Leu 225 1268 CAA ACA ATT AGC ACC CTA CCT GAA Gin Thr Ile Ser Thr Leu Pro Giu 230 TAT CAT GGA GCA Tyr His Gly Ala AAT GGT GTT Aen Gly Val 240 ACT TCT AAT Thr Ser Asn i3 16 GCT GCC ATC Ala Ala Ile 245 CGT ATT TCA TCT GAC GGT AAA TTC CTC Arg Ile Ser Ser Asp Gly Lys Phe Leu
TAT
Tyr 255 1364 1412 CGT GGA Arg Oly 260 CAT GAT TCT TTG His Asp Ser Leu
ACA
Thr 265 ACT TAC AAA GTA Thr Tyr Lys Val CCT CTT GGT ACA Pro Leu Gly Thr
AAA
Lys 275 CTT GAA ACT ATT GGC TOG ACA AAT ACT Leu Oiu Thr Ile Gly Tx-p Thr Asn Thr 280
GAA
Giu 285 GGT CAT ATC CCT Gly His Ile Pro
CGC
Arg 290 1460 OAT TTT AAT TTC Asp Phe Asn Phe
AAC
Asn 295 AAA ACT OAA OAT Lys Thr Giu Asp
TAT
Tyr 300 ATC ATT GTC GCT Ile Ile Val Ala CAT CAA His Gin 305 1508 GAA TCT OAT AAT TTA TCT CTT TTC Glu Ser Asp Aen Leu Ser Leu Phe 310 CGA OAT AAA AAA Arg Asp Lye Lye ACC GGT ACT Thr Gly Thr 320 ACT TOT GTT Thr Cys Vai 1556 TTA ACT TTG Leu Thr Leu 325 TTA CCA CTA Leu Pro Leu 340 GAA CAA AAA GAT Glu Gin Lye Asp
TTT
Phe 330 TAC GCT CCT GAA Tyr Ala Pro Glu
ATC
Ile 335 1604 TAAAAA TTTA T TTTrTTCACA AAG rTTGACT GATAAACTAA AAAAGATTG 1662 CTAATrTCTC TCAAAGAATT AGCAATCITT TTTTCTTCAG rT TTCTAAAC TF'ITGATGAG TGTITTGTA AAAACTATCA AAAAAACTrT GTTAAACTAT TCACGTAAAA GAAAGTGAAT
ACAAAT
INFORMATION FOR SEQ ID NO:27: SEQUENCE CHARACTERISTICS: LENGTH: 341 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein FRAGMENT TYPE: internal
TAAAGCTT
CAATATTGCT
GAAGTCACAA
TACAAAACCG
TGACATCTAT
AGGAGAACCT
1722 1782 1842 1848 WO 98/07867 162 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: PCTIDK97/00336 Met Lys Glu Lys Ile Leu Leu Gly Gly Tyr 1 Gly Leu Giu Ile Val Arg Tyr His Tyr 145 Thr Ile Thr Ser Arg 225 Gly Ser Gly Pro His 305 Gly Cys Val Asn Lys Al a Thr Gin Lys Thr 130 Ser Asp Al a Phe Thr 210 Leu Val Asn Thr Arg 290 Gin Thr Val Tyr Glu Gly Ala Thr Leu Ile Gly Asp Glu Thr His 195 Ile Gin Ala Arg Lys 275 Asp Glu Leu Leu Ser Val His Phe Thr Val 100 Gin Ser Leu Val Ile 180 Pro Giu Thr Ala Gly 260 Leu Phe Ser Thr Pro 340 5 Val Ala Leu Asp Gly Tyr Ala Gly Thr Thr 165 Tyr Asn Val Ile Ile 245 His Glu Asn Asp Leu 325 Leu Leu Ala Tyr Phe 70 Ala Gly Asn Pro Pro 150 Val Arg Gly Leu Ser 230 Arg Asp Thr Phe Aen 310 Glu Leu Val Thr 55 Asp Pro Ala Gly Arg 135 Asp Tyr Al a Lys Ser 215 Thr Ile Ser Ile Asn 295 Leu Gin Asp Gin 40 Cys Gly Leu Asn Ser 120 Pro Gly Asp Glu Ile 200 Tyr Leu Ser Leu Gly 280 Lys Ser Lys Thr 25 Asn Ala Glu Cys Tyr 105 Leu Glu Arg Val Lys 185 Ala Asn Pro Ser Thr 265 Trp Thr Leu Asp 10 Lye Pro Ala Thr Tyr 90 His Arg Gin Leu Ile 170 Gly Tyr Glu Glu Asp 250 Thr Thr Glu Phe Phe 330 Thr Aia Thr Asp Ala 75 Val Leu Leu Ala Val 155 Gly Met Leu Glu Asp 235 Gly Tyr Asn Asp Leu 315 Tyr Lys Ala Tyr Ser Thr Ala Gly Thr Ser 140 Thr Giu Gly Val Lys 220 Tyr Lys Lys Thr Tyr 300 Arg Ala Arg Glu Ile Asn His Val Glu Asp 125 Ser Cys Gly Ala Gly 205 Gly His Phe Val Glu 285 Ile Asp Pro Val Leu Thr Gly Leu Asp Val 110 Thr His Asp Lys Arg 190 Glu Arg Gly Leu Ser 270 Gly Ile Lys Glu Ser Ser Leu Gly Gly Glu Arg Val Val Leu Leu 175 His Leu Phe.
Ala Tyr 255 Pro His Val Lys Ile 335 Lys Ser Asp Gly Asn Ala Val Lys His Gly 160 Asn Ile Asn Ala Asn 240 Thr Leu Ile Ala Thr 320 Thr
INFORMATION FOR SEQ ID NO:28:
SEQUENCE CHARACTERISTICS: LENGTH: 4741 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: Coding Sequence LOCATION: 453...1475
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:
TTTGGTGACC
TAATGTCTCC
GCGTACACTA
ATACATTGTT
TAAATGAAAG
CTGACAAATT
TTTTCTAAGA
TAGTAAATAA
GAAGTGAACA
TTTTAAAATA
CGACTTTGTC
TAAGCTACTT
CTATCCTACC
TAACGAATAT
ATAAAATTAA
AATAGTTTGT
CCAGCTTCAA
GTTTTTCCTC
GAGACGAAAT
TTGCAAGCAT
CCCCTTTCTT
TTTTGCCTAT
TATTTTTGCT
GAGGAGAGAA
GAAGTTGTTT
TTTCATCTGT
GCGAGATGGT
CTATTCATTT
TTTATTCTGT
ATAATCCCCA
GAAAACGCTT
AT ATG AAA Met Lys 1
CATTGAAATA
CATCCGCAGC
TGCATAGCAA
ATTTCTTTTA
TTTTTATATC
TAAGGGAGAT
TTTTTGTGAT
ACTGACATGT
CGCAATACTT
CTCTCTCATT
TCAATATGAG
TCAATGTTGT
TTTTACATTT
AAAATAATTA
120 180 240 300 360 420 473 GAA AAA ATC CTT TTA Giu. Lys Ile Leu Leu GGC GOT TAT Gly Gly Tyr ACT AAA CGT GTA Thr Lys Arg Val
TOT
Ser 15 AAA GGC GTT TAC Lys Gly Val Tyr AGT GTT CTA TTA Ser Val Leu Leu GTT GCA GCG GTT Val Ala Ala Val GAT AGC Asp Ser AAG AAA GCT GAA TTG TCG OCT TTA ACT Lys Lys Ala Giu Leu Ser Ala Leu Thr
GAA
Giu.
CAA
Gin AAT CC.A ACT TAT ATC ACT C'IT GAT CAA Asn Pro Thr Tyr Ile Thr Leu Asp Gin
AAA
Lys 50 GGG CAC CTC TAC Gly His Leu Tyr
ACT
Thr TOT GCT GCT OAT Cys Ala Aia Asp AAT GO-T GOT GGA ATT GCT GCC TTT GAT Asn Gly Gly Gly Ile Ala Ala Phe Asp TTC OAT Phe 'Asp GGT CAA AAT Gly Gin Asn TTG TGT TAT Leu Cys Tyr
ACA
Thr ACT CAC CTA G Thr His Leu. Oly
AAT
Asn 80 GTA ACG AGT ACT Val Thr Ser Thr GGA GCC CCT Gly Ala Pro TAT GGT GCC Tyr Gly Ala GTG GCT GTT GAT Val Ala Val Asp
GAA
Glu GCA CGT CAA CTC Ala Arg Gin Leu.
AAC TAT Asn Tyr 105 CAC TTG GGT GAA GTT COT GTG TAC AAA His Leu Oly Glu Val Arg Val Tyr Lys 110
ATT
Ile 115 CAA GCT GAT GGT Gin Ala Asp Gly
TCC
Ser 120 CTT AGA TTA ACC OAT ACA OTT AAA CAT Leu Arg Leu Thr Asp Thr Val Lys His
AAT
Asn.
130 GGT TCT GOC CCT Oly Ser Gly Pro 809 857 905 CCT GAO CAA OC.A AOT TCT CAT OTC CAT Pro Giu Gin Ala Ser Ser His Val His 140
TAC
Tyr 145 TCT OAT TTA ACT Ser Asp Leu Thr CCA OAT Pro Asp 150 GOT COT CTT Oly Arg Leu
OTT
Val 155 ACT TOT OAT TTA GOT ACA OAT OAA OTO Thr Cys Asp Leu Oly Thr Asp Giu Val ACT OTT TAC Thr Vai Tyr 165 WO 98/07867 GAT GTT ATT Asp Val Ile 170 164 PCT/DK97/00336 GGT GAA GGT AAA Gly Giu Gly Lys
CTC
Leu 175 AAT ATC GTT ACG Asn Ile Val Thr
ATT
Ile 180 TAT CGT CO Tyr Arg Ala 1001 GAA AAA Glu Lys 185 GGA ATG GGA GCT Gly Met Gly Ala
CGT
Arg 190 CAC ATC AGC TTC His Ile Ser Phe CCT AAT GGA AAA Pro Asn Gly Lys
ATT
Ile 200 GCT TAT CTC GTC Ala Tyr Leu Val
GGA
Gly 205 GAA TTA AAT TCA Giu Leu Asn Ser
ACT
Thr 210 ATT GAA GTT CTA Ile Glu Val Leu
AGC
Ser 215 1049 1097 1145 TAT AAT GAA GAA Tyr Asn Glu Giu
AAA
Lys 220 GGA CGA TTC GCT Gly Arg Phe Ala
CGT
Arg 225 CTT CAA ACA ATO Leu Gin Thr Ile AGT ACT Ser Thr 230 TTA CCT GAA Leu Pro Glu TOT TCT GAT Ser Ser Asp 250
GAC
Asp 235 TAT CAC GGA GC Tyr His Gly Ala
AAT
Asn 240 GGA GTA GOT GOT Gly Val Ala Ala ATT OGA ATT Ile Arg Ile 245 CAC GAO TOT His Asp Ser 1193 1241 GGT AAG TTO OTO Gly Lys Phe Leu
TAT
Tyr 255 GOT TOT AAT OGT Ala Ser Asn Arg
GGG
Gly 260 TTA GOA Leu Ala 265 ATT TAO AAG GTA Ile Tyr Lys Val
AGT
Ser 270 OCT OTO GGA ACA Pro Leu Gly Thr
AAA
Lys 275 TTA GAA TOT ATT Leu Giu Ser Ile
GOT
Gly 280 TGG ACA AAG ACT Trp Thr Lys Thr
GAA
Glu 285 TAT OAT ATT OCA Tyr His Ile Pro
CC
Arg 290 CAT TTT AAT TTT Asp Phe Asn Phe
AAT
Asn 295 1289 1337 i385 AAA ACC GAA GAT Lys Thr Giu Asp
TAT
Tyr 300 ATO ATT GTC GOT Ile Ile Val Ala
OAT
His 305 CAA GAA TOT CAT Gin Glu Ser Asp AAT TTA Asn Leu 310 ACT OTT TTO Thr Leu Phe AAA GAO TTT Lys Asp Phe 330
TTG
Leu 315 AGA GAT AAA AAT Arg Asp Lys Asn
ACA
Thr 320 GGG TOA TTA ACO Gly Ser Leu Thr TTA GAA CAA Leu Giu Gin 325 1433 1484 TAO GOT OCT GAA- Tyr Ala Pro Clu
ATT
Ile 335 ACT TGT OTT TTA Thr Cys Val Leu COT TTG Pro Leu 340
TAAAAACTA
AACTTTAGTA
ATTGCCAAAT
CGATTTCTAA
TTAAAAAAOT
GTAOAOATAT
AAAAAGCCGC
AAGCTGOAGT
CAATGGCTCT
CTGGOOGTGG
ATAATGCAAT
CTGTTGAAAT
CATCAAOAGC
CCTTTOACCC
CGATTGAAGC
AATCTTGCTT
CTTTAAAAGG
AAGTTTGATG
TTGTTAAACT
GGCAACTAALA
AAAATTCCAA
TCTTAAATTT
TGCAGCAAGC
AGTTGTTGAG
CAAAAATGAT
CGCAAGCCCC
CATCTTTAAA
ACAAGCACAA
TGGTGCACCT
TTGTTTTTTO
ATTGGCAATA
AGTTTTTTTG
ATTOACGTAA
AAAGCCGCTC
GGAAGTGTOG
GAAGGATACA
AAAOATTCTC
GACAAAGATA
AAAACAGTTG
CTTGGAGTAC
TCATTATTAA
AAATGCTCAA
GA1AGACTTTA
ACAXAGTTTT
TTTTTTTGTC
TAAATTTCAT
AAGAAAGTGA
CAGOTGCAAA.
OTTATACTGA
CAOAAACTCA
TGGAACTCGC
CAAAAAACCA
GOGTTATCGC
TTGCTGGTAT
CTGCAAAGAC
GCOATGCGGC
TTCAATGGAT
ACTAAATCAG
TGAAAOCCTT
CACAATATCG
ATGGAATCAC
GAAAGTTTTA
TCAATTAGTC
AGTTGATAOT
TCACGAAGCC
TTTTGOTTOT
TGAAAACAAA
TOTCCOAACA
ACGTAATGCT
AAAAATTGTT
TGAAGTACCC
ACAAAAAAAT
GCTTATAAAG
OTTGACTTOT
AAAGGAGAAC
AGCGOTGAAG
AAAAAAGCTC
ATTGTTGCTG
GTTAATGAA.A
GAATCTGTTT
GTTGOTGGTT
ACTAATCCAA
ATTGTOTTTG
TATGATGCTG
AGTCTTGATA
1544 1604 1664 1724 1784 1844 1904 1964 2024 2084 2144 2204 2264 2324
TGACGACTGC
GTATGGTCAA
GTGCAG'rrrA
CAAAACGTTT
CAATCTATG1A
AAGATTACAA
TAACTGGTCC
TCCCTAAAGA
TTTCTrCTGA
GAATTGAAAT
AAATCGGTGC
GTATCCTCGT
TGCGTCCATC
TGAGTACATA
GGGTTCGTTT
TGCC.ACACGT
TTGATAAAGT
GCTCAGTCCA
ATTTTGAACC
TTGGTCGTTT
GTTTGAAAGA
TCAAATTCTA
GTTCTGAAGT
TI'GCTGACTA
TACCAAAACG
ATGTTTCTGT
TCTTI'GAAAA
AAAAAGCTCG
CTTTCCTrGG
.ATGGTCTTGC
ACGTTAAATT
AAATTTCACG
CTrTGGTrGC
GAAATGGTGT
ACGATGAC.
TCTTGTrAGA
TCGCTAACCT
AACAACTCT
TGAATTrCGTC
TTCGATATAG
AGCGATAGAA
TTTGATTCAA
TGCCGCGCTT
TGTTGATGCA
TGATAACGGA
TGAATTTGTC
GGCAATTGAA
TGTTGCTGGT
TAAAGATGTT
AAAACTITCT
TGTACGTAGC
AATGcIACGAC
TAACCAACCT
ATTGACGCTC
CGATCTAETG
GCCAAAAGAA
CCACAAAGCT
TTTGGAACAA
ACCTGACCCA
TGACACTGTC
GATITATGAA
GATCTTCCAA
CCACCCACAC
GACTCCATTT
TCAATTGACA
TACTGTrTCT
C.ATGTCITCT
CTTGACTGAG
CGAAAACATG
AATrAACCAC CA'rrGCTATC
TACCCCTTAC
CTTCATGGGA
TGAACTrAAA
AGATAAAGCT
ATGTACACCT
CCAATATTAA
GCTAAAGCAG
GTTAGATCAA
.AGAGCGTrT
GCTCTTTTCA
GTGATCC
AATAGAGGAA TTGCTACAAT AAGTCTGGTA ATCCTrCACT ACTGCAAATA TCGATCGTGC
ATGATTTGTG
GCTAAAATGC
AGTTITGTTT
CGTTCTGGTC
CTTCTrrrrG
CCTTTGCTT
TTACTTGCTT
CCATTTGTCA
GACTCTATCG
GGAACTGGTT
AATGTTAAAA
A1TTTACTACG
TTCATTGTTG
CTTGCTATCC
ACTTTGAGTG
.ATCTGTCTTG
TATGATGCTC
GAGTTAGCTC
AAAGCACAAA
GCGGITATCA
CCTCAAGTrG
TGGTCTGGGA
GACTATACAA
TCTTATCATT
CACAATGCTG
TCACTTGCTC
GCTATGCCAC
CCACGTTATG
TTTGCTGGCA
AAATTGACTG
CATCTI'GAAC
GCTAATCCAC
TATATTAATT
GAAGTCGCAA
TACTAATAAT
TTATTATAGC
CTCCATTGAT
CGACTGAAAA
CAACGCAAGG
TCGTTGAACG
AATGGATTGC
AACTTGATAA
CAATCTACAA
ACCAAGGAGC
AAGAATACGG
GTGGGGTCGG
TCTrGCAACT
TGGTGTAGGT
TGTTGAAGAT
CTCTGCAGTr
CGCTTATATG
TGCTGGTGAA
TGAACAAGCT
GAAAAATATT
ATCAGAAACA
TGGTCACAAC
AATTAAAGTC
AGATATTTAT
CATGGGGGAA AAA ETCACTT C.AGTGGCTA.A ACGTCGTAAT
AAAAAAATGC
CCGACCCTGG
GCCCAACTCA.
AAGCAATGC
GTGGTGGTTC
GTGGTGAGGC
AAAAATTTGT
TGGTTGCTAT
CTGATGATGA
CCATTGTTGA
TTGATGCTAT
AACCAATTTC
ATc4ACCCAGC
CAACACTCGC
ATAAAATTGC
ATGTCATTAA
AAACTTATCG
AAGAAGA'ITC
ATAGTATTGA
GTGAGCTTGA
GTCAACCAAG
ATAGTA'FrrG
TGGTACGTCA
CTGTTGATAA
TTATACAACT
TTATATTTAT
AATTTCTrAC TATGGTrAAA
AGTTGAAACA
AATCGCTCGT
TGCTCTCGAT
TGACCTTrCC TGATATrCGT
CCCTACTACT
AACTCACGTT
CCCTGAGTTT
GTC.ACACGCG
ACTTCAAGCC
TCATCCAACC
TGGTATGGCC
TGGTGAATT
ATTTAACGCT
TGCGCAAGAA
AGATGAAAAA
TATTAATATC
TAAA LrGGCT
AATTGATGAG
GAACCGAACG
ACCAAGAATT
AAATAATTAA.
ATCAAAAGGT.
ATAAAAATCA
GGTGGTCCAG
GCTGGTAATG
CTTTTGC'rIT
ATTGATGCAT
GTTCCTAAAA
GGTTTTIGGTG
GGTGITAACG
GGGGAAGCTC
CGTGAAGAAG
GCTGCCATTC
GAAGCTTCTC
ACTGATGCAA
TCACACAATT
CGCCCTCAAT
TTACAAGAAT
TTCGGTTTCG
AGCATTTATG
CAAATGAACC
GCTGGTAAGA
GATGACGCAA
AAACGTATrA
TCTGGTACTG
AAATATCCAC
GTTATGACTG
CTTGAATCTr
ATCAAACTCA
AAAGAAGGTC
TrCGCCAATG
GGGCTTCCTC
GTAACAGGAA
GACTACGCTG
GCGGTCAAAG
ACCCTTTCAG
GACCTTGTr
ATTAAACAAC
ATATCCATc4C GATGAGA LTA
AACGCTCTGA
A.TAAATCAAT
ATAATTAAT
2384 2444 2504 2564 2624 2684 2744 2804 2864 2924 2984 3044 3104 3164 3224 3284 3344 3404 3464 3524 3584 3644 3704 3764 3824 3884 3944 4004 4064 4124 4184 4244 4304 4364 4424 4484 4544 4604 4664 4724 4741 INFORMATION FOR SEQ ID NO:29: SEQUENCE CHARACTERISTICS: LENGTH: 341 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein FRAGMENT TYPE: internal WO 98/07867 166 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: PCTI.DK97/00336 Met Lys Giu Lys Ile Leu Leu Gly Gly Tyr 1 Gly Leu Gin Ile Val Arg Tyr His Tyr 145 Thr Ile Ser Ser Arg 225 Gly Ser Gly Pro Hi s 305 Gly Cys Val Thr Lys Ala Thr Gin Lys Asn 130 Ser Asp Val Phe Thr 210 Leu Val Asn Thr Arg 290 Gin Ser Val Tyr Giu Gly Ala Ser Leu Ile Giy Asp Giu Thr His 195 Ile Gin Al a Arg Lys 275 Asp Giu Leu Leu Ser Val Hisr Phe Thr Val 100 Gin Ser Leu Val Ile 180 Pro Giu Thr Aia Gly 260 Leu Phe Ser Thr Pro 340 5 Val Ala Leu Asp Gly Tyr Ala Gly Thr Thr 165 Tyr Asn Vai Ile Ile 245 His Giu Asn Asp Leu 325 Leu Leu Ala Tyr Phe 70 Ala Giy Asp Pro Pro 150 Val Arg Gly Leu Ser 230 Arg Asp Ser Phe Asn 310 Glu Leu Val Thr 55 Asp Pro Ala Gly Arg 135 Asp Tyr Ala Lys Ser 215 Thr Ile Ser Ile Asn 295 Leu Gin Asp Gin 40 Cys Gly Leu Asn Ser 120 Pro Gly Asp Giu Ile 200 Tyr Leu Ser Leu Gly 280 Lys Thr Lys Ser 25 Asn Ala Gin Cys Tyr 105 Leu Giu Arg Val Lys 185 Al.a Asn Pro Ser Ala 265 Trp Thr Leu Asp 10 Lys Pro Ala Asn Tyr 90 His Arg Gin Leu Ile 170 Gly Tyr Giu Giu Asp 250 Ile Thr Glu Phe Phe 330 Thr Lys Thr Asp Thr 75 Val Leu Leu Al a Val 155 Gly met Leu Glu Asp 235 Gly Tyr Lys Asp Leu 315 Tyr Lys Al a Tyr Gly Thr Al a Gly Thr Ser 140 Thr Giu Gly Val Lys 220 Tyr Lys Lys Thr Tyr 300 Arg Al a Arg Glu Ile Asn His Val Giu Asp 125 Ser Cys Gly Ala Gly 205 Gly His Phe Val Glu 285 Ile Asp Pro Val Ser Leu Ser Thr -Leu Gly Giy Leu Gly Asp Glu Val Arg 110 Thr Val His Val Asp Leu Lys Leu 175 Arg His 190 Giu Leu Arg Phe Giy Ala Leu Tyr 255 Ser Pro 270 Tyr His Ile Val Lys Asn Giu Ile 335 Lys Ala Asp Gly Asn Aia Val Lys His Gly 
160 Asn Ile Asn Ala Asn 240 Ala Leu Ile Ala Thr 320 Thr
INFORMATION FOR SEQ ID
(i) SEQUENCE CHARACTERISTICS: LENGTH: 4741 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE: NAME/KEY: Coding Sequence LOCATION: 1733...4441
(xi) SEQUENCE DESCRIPTION: SEQ ID
TTTGGTGACC
TAATGTCTCC
GCGTACACTA
ATACATTGTT
TAAATGAAAG
CTGACAAATT
TTTTCTAAGA
TAGTAAATAA
ATACTAAACG
TGTCGGCTTT
AAGGGCACCT
TCGATGGTCA
ATGTGGCTGT
TTCGTGTGTA
ATGGTTCTGG
CAGATGGTCG
TTGGTGAAGG
GTCACATCAG
CTATTGAAGT
GTACTTTACC
ATGGTAAGTT
GTCCTCTCGG
GCGATTTTAA
ATTTAACTCT
TTTACGCTCC
GCTTTG'rrr
AAGGATTGGC
GATGAGTTTT
AACTATTCAC
GAAGTGAACA
TTTTAAAATA
CGACTTTGTC
TAAGCTACTT
CTATCCTACC
TAACGAATAT
ATAAAATTAA
AATAGTTTGT
TGTATCTAAA
AACTGAAGTT
CTACACTTGT
AAATACAACT
TGATGAAGCA
CAAAATTCAA
CCCTCGACCT
TCTTGTTACT
TAAACTCAAT
CTTCCATCCT
TCTAAGCTAT
TGAAGACTAT
CCTCTATGCT
AACAAAMTrA
TTITTAATAAA
TTTCTTGAGA.
TGAAATTACT
TTTCACAAAG
AATATrTTT
T'ITGTAAAT
GTAAAAGAAA
CCAGCTTCAA
GTrIT CCTC
GAGACGAAAT
TTGCAAGCAT
CCCCTrrCTr TITrGCCTAT TATI1'TrGCT
GAGGAGAGAA.
GGCGTTACA
GCAGCGGTTC
GCTGCTGATG
CACCTAGGGA
CGTCAACTCG
GCTGATGGTT
GAGCAAGCAA
TGTGATTTAG
ATCGTTACGA
AATGGAAAAA
AATGAAGAAA
CACGGAGCCA
TCTAATCGTG
GAATCTAITG
ACCGAAGATT
GATAAAAATA
TGTGTrTAC TITrACTAAA
TGTCTGAAAC!
TCATCACAAT
GTGAATGGAA
GAAGTTGTTT
TTTC.ATCTGT
GCGAGATGGT
CTATTCATTT
TTTAITrCTGT
ATAATCCCCA
GAAAACGCT
ATATGAAAGA
GTGTTCTATT
AAAATCCAAC
GAAATGGTGG
ATGTAACGAG
TTTATGGTGC!
CCCITAGATT
GTTCTCATGT
GTACAGATGA
TTTATCGTGC
TTGCTTATCT
.AAGGACGATT
ATGGAGTAGC
GGCACGACTC
GTTGGACAAA
ATATCATrGT
CAGGGTCAT
CTTTGTAAAA
TCAGACAAAA
CCTTGCTTAT
ATCGCTTGAC
TCACAAAGGA
CATTGAAATA
CATCCGCAGC
TGCATAGCAA
ATI-rCTTTrrA
TTTTTATATC
TAALGGGAGAT
TTTTTGTGAT
AAAAATCCTT
AGATAGCAAG
TATATCACT
TGGAATTGCT
TACTGGAGCC
CAACTATCAC
AACCGATACA
CCATTACTCT
AGTGACTGTT
CGAAAAAGGA
CGTCGGAGAA
CGC!TCGTCTT
TGCTATTCGA
TTAGCAA'rr
GACTGAATAT
CGCTCATCAA
AACGTTAGAA
ACTAAACTr
AAATATTGCC
AAAGCGATTT
TrCTTAAAA
GAACGTACAC
ACTGACATGT
CGCAATAC~r
CTCTCTCATT
TCAATATGAG
TCAATGTTGT
TITTACATTr
AAAATAATTA
TI'AGGCGGTT
AAAGCTGAAT
CTTGATCAAA
GCCTTTGATT
CCTTTGTGTr
TTGGGTGAAG
GTTAAACATA
GATTTAACTC
TACGATGTrA.
ATGGGAGCTC
TTAAATTCAA
CAAACAATCA
AT'FrCTTCTG
TACAAGGTAA
CATA'FrCCAC
GAATCTGATA
CAAAAAGACT
AGTAAATCTT
AAATCTTITAA
CTAAAAGTT
AACTTTGTTA.
AT ATG GCA.
Met Ala 120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 1738 ACT AAA AAA GCC GCT CCA GCT GCA AAG AAA GTT TTA AGC GCT GAA GAA Thr Lys Lys Ala Ala Pro Ala Ala Lys Lys Val Leu Ser Ala Glu Glu 1786 AAA GCC Lys Ala AAA AAA Lys Lys GCA AAA TTC CAA GGA AGT GTC GCT TAT ACT GAT CAA TTA GTC Ala Lys Phe Gin Gly Ser Val Ala Tyr Thr Asp Gin Leu Val 1834 GCT CAA GCT GCA GTT Ala Gin Ala Ala Val 40 CTT AAA TTT Leu Lys Phe GAA GGA TAC ACA CAA Glu Gly Tyr Thr Gin 45
ACT
Thr 1882 CAA GTT GAT ACT ATT GTT GCT GCA ATG GCT CTT GCA GCA AGC Gln Val Asp Thr Ile Val Ala Ala Met Ala Leu Ala Ala Ser AAA CAT Lys His 1930 TCT CTG GAA CTC Ser Leu Glu Leu GCT CAC GAA GCC GTT AAT GAA ACT GGC CGT GGA GTT Ala His Glu Ala Val Asn Glu Thr Gly Arg Gly Val 1978 GTT GAG GAC Val Glu Asp AAA GAT ACA AAA AAC CAT Lys Asp Thr Lys Asn His 90 TTT GCT TCT Phe Ala Ser
GAA
Glu TCT GTT TAT Ser Val Tyr 2026 AAT GCA Asn Ala 100 ATC AAA AAT GAT Ile Lys Asn Asp AAA ACA Lys Thr 105 GTT GGC GTT ATC GCT GAA AAC AAA Val Gly Val Ile Ala Glu Asn Lys 110 2074 GTT GCT GGT TCT Val Ala Gly Ser 115 ATT GTC CCA ACA Ile Val Pro Thr GTT GAA Val Glu 120 ATC GCA AGC CCC Ile Ala Ser Pro
CTT
Leu 125 GGA GTA CTT GCT Gly Val Leu Ala
GGT
Gly 130 2122 2170
ACT
Thr 135 AAT CCA ACA TCA Asn Pro Thr Ser
ACA
Thr 140 GCC ATC TTT AAA Ala Ile Phe Lys TCA TTA Ser Leu 145 TTA ACT GCA Leu Thr Ala GCA CAA AAA Ala Gin Lys 165
AAG
Lys 150 ACA CGT AAT GCT Thr Arg Asn Ala
ATT
Ile 155 GTC TTT GCC TTT Val Phe Ala Phe CAC CCA CAA His Pro Gln 160 GAT GCT GCG Asp Ala Ala 2218 2266 TGC TCA AGC CAT Cys Ser Ser His
GCG
Ala 170 GCA AAA ATT GTT Ala Lys Ile Val ATT GAA Ile Glu 180 GCT GGT GCA CCT Ala Gly Ala Pro
GAA
Glu 185 GAC T ATT Asp Phe Ile CAA TGG Gin Trp 190 ATT GA GTA CCC Ile Giu Val Pro
AGT
Ser 195 CTT GAT ATG ACG Leu Asp Met Thr
ACT
Thr 200 GCT TrG ATT CAP Ala Leu Ile Gin
AAT
Asn 205 AGA GGA ATT GCT Arg Gly Ile Ala
ACA
Thr 210 2314 2362 2410 ATI CTT GCA ACT Ile Leu Ala Thr
GGT
Gly 215 GGT CCA GGT ATG Gly Pro Gly Met
GTC
Val 220 AAT GCC GCG CTT Asn Ala Ala Leu AAG TCT Lys Ser 225 GGT AAT CCT Gly Asn Pro GAT GCA ACT Asp Ala Thr 245
TCA
Ser 230 CTT GGT GTA GGT Leu Gly Vai Gly
GCT
Ala 235 GGT AAT GGT GCA Gly Asn Gly Ala GTT TAT GTT Val Tyr Val -240 TTG CTT TCA Leu Leu Ser 2458 2506 GCA AAT ATC GAT Ala Asn Ile Asp
CGT
Arg 250 GCT GTT GAA GAT Ala Val Glu Asp AAA CGT Lys Arg 260 TTT GAT AAC Phe Asp Asn GGA ATG Gly Met 265 ATT TGT GCG ACT Ile Cys Ala Thr
GAA
Glu 270 AAC TCT GCA GTT Asn Ser Ala Val
ATT
Ile 275 GAT GCA TCA ATC Asp Ala Ser Ile
TAT
Tyr 280 GAT GAA TTT GTC Asp Glu Phe Val
GCT
Ala 285 AAA ATG CCA ACG Lys Met Pro Thr
CAA
Gin 290 2554 2602 2650 GGC GCT TAT ATG GTT Gly Ala Tyr Met Val 295 CCT AAA AAA GAT Pro Lye Lys Asp AAG GCA ATT GA Lys Ala Ile Glu AGT TTT Ser Phe 305 WO 98/07867 OTT rrc GTT Val Phe Val GCT GGT CGT Ala Gly Arg 325 PCTIDK97/00336 169
GAA
Glu 310 CGT GCT GGT GAA GGT TTT GGT GTA ACT Arg Ala Gly Glu Gly Phe Gly Val Thr 315 GGT CCT GTT Gly Pro Val 320 GTT AAC GTC Val Asn Val 2698 TCT GGT CAA TGG Ser Gly Gln Trp
ATT
Ile 330 GCT GAA CAA GCT Ala Glu Gln Ala
GGT
Gly 335 2746 CCT AAA Pro Lys 340 GAT AAA GAT GTT Asp Lys Asp Val
CTT
Leu 345 CTT TTT GAA CTT Leu Phe Glu Leu GAT AAG AAA AAT Asp Lys Lys Asn 350 TTG CTT TCA ATC Leu Leu Ser Ile
ATT
Ile
TAC
Tyr 370
GGG
Gly 355 GAA GCT CTT TCT Glu Ala Leu Ser GAA AAA CTT TCT Glu Lys Leu Ser
CCT
Pro 365 2794 2842 2890 AAA TCA GAA ACA Lys Ser Glu Thr
CGT
Arg 375 GAA GAA GGA ATT Glu Glu Gly Ile
GAA
Glu 380 ATT GTA CGT AGC Ile Val Arg Ser TTA CTT Leu Leu 385 GCT TAC CAA Ala Tyr Gln GAC GAC CCA Asp Asp Pro 405
GGA
Gly 390 GCT GGT CAC AAC Ala Gly His Asn
GCT
Ala 395 GCC ATT CAA ATC Ala Ile Gln Ile GGT GCA ATG Gly Ala Met 400 2938 TTT GTC AAA GAA Phe Val Lys Glu
TAC
Tyr 410 GGA ATT AAA GTC GAA GCT TCT CGT Gly Ile Lys Val Glu Ala Ser Arg 415 2986 ATC CTC Ile Leu 420 GTT AAC CAA CCT Val Asn Gln Pro
GAC
Asp 425 TCT ATC GGT GGG Ser Ile Gly Gly
GTC
Val 430 GGA GAT ATT TAT Gly Asp Ile Tyr
ACT
Thr 435 GAT GCA ATG CGT Asp Ala Met Arg
CCA
Pro 440 TCA TTG ACO CTC Ser Leu Thr Leu GGA ACT Gly Thr 445 GGT TCA TGG Gly Ser Trp
GGG
Gly 450 3034 3082 3130 AAA AAT TCA CTT Lys Asn Ser Leu
TCA
Ser 455 CAC AAT TTG AGT His Asn Leu Ser TAC GAT CTA TTG Tyr Asp Leu Leu AAT GTT Asn Val 465 AAA ACA GTG Lys Thr Val AAA GAA ATT Lys Glu Ile 485
GCT
Ala 470 AAA CGT CGT AAT Lys Arg Arg Asn
CGC
Arg 475 CCT CAA TGG Pro Gln Trp TAC TAC GAA AAA Tyr Tyr Glu Lys
AAT
Asn 490 GCA ATT TCT TAC Ala Ile Ser Tyr GTT CGT TTG CCA Val Arg Leu Pro 480 TTA CAA GAA TTG Leu Gln Glu Leu 495 GGT ATG GTT AAA Gly Met Val Lys 3178 3226 CCA CAC Pro His 500 GTC CAC AAA GCT TTC ATT GTT GCC GAC Val His Lys Ala Phe Ile Val Ala Asp 505
CCT
Pro 510 3274
TTC
Phe 515 GGT TTC GTT GAT Gly Phe Val Asp
AAA
Lys 520 GTT TTG GAA CAA Val Leu Glu Gln CTT GCT Leu Ala 525 ATC CGC CCA Ile Arg Pro 3322 CAA GTT GAA ACA Gln Val Glu Thr
AGC
Ser 535 ATT TAT GGC TCA Ile Tyr Gly Ser
GTC
Val 540 CAA CCT GAC CCA Gin Pro Asp Pro ACT TTG Thr Leu 545 3370 AGT GAA GCA Ser Glu Ala ACT GTC ATC Thr Val Ile 565
ATT
Ile 550 GCA ATC GCT CGT Ala Ile Ala Arg
CAA
Gin 555 ATG AAC CAT TTT Met Asn His Phe GAA CCT GAC Glu Pro Asp 560 GGT AAG ATT Gly Lys Ile 3418 3466 TGT CTT GGT GGT Cys Leu Gly Gly
GGT
Gly 570 TCT GCT CTC GAT Ser Ala Leu Asp
GCT
Ala 575 GGT CGT Gly Arg 580 TTG ATT TAT GAA Leu Ile Tyr Glu
TAT
Tyr 585 GAT GCT CGT GGT Asp Ala Arg Gly GAG GCT GAC CTT Glu Ala Asp Leu 590 TTA GCT CAA AAA Leu Ala Gin Lys
TCC
Ser
TTT
Phe 610
GAT
Asp 595 GAC GCA AGT TTG Asp Ala Ser Leu
AAA
Lys 600 GAG ATC TTC CAA Glu Ile Phe Gin
GAG
Glu 605 3514 3562 3610 GTT GAT ATT CGT Val Asp Ile Arg
AAA
Lys 615 CGT ATT ATC AAA Arg Ile Ile Lys TAC CAC CCA CAC Tyr His Pro His AAA GCA Lys Ala 625 CAA ATG GTT Gin Met Val CCA TTT GCG Pro Phe Ala 645
GCT
Ala 630 ATC CCT ACT ACT Ile Pro Thr Thr
TCT
Ser 635 GGT ACT GGT TCT Gly Thr Gly Ser GAA GTG ACT Glu Val Thr 640 TAT CCA CTT Tyr Pro Leu 3658 3706 GTT ATC ACT GAT Val Ile Thr Asp GAA ACT CAC GTT Glu Thr His Val
AAA
Lys 655 GCT GAC Ala Asp 660 TAT CAA TTG ACA Tyr Gin Leu Thr
CCT
Pro 665 CAA GTT GCC ATT Gin Val Ala Ile
GTT
Val 670 GAC CCT GAG TTT Asp Pro Glu Phe
GTT
Val 675
ATG
Met ATG ACT GTA CCA Met Thr Val Pro TCA CAC GCG CTT Ser His Ala Leu 695
AAA
Lys 680 CGT ACT GTT TCT Arg Thr Val Ser
TGG
Trp 685 TCT GGG ATT GAT Ser Gly Ile Asp
GCT
Ala 690 3754 3802 3850 GAA TCT TAT GTT Glu Ser Tyr Val GTC ATG TCT TCT Val Met Ser Ser GAC TAT Asp Tyr 705 ACA AAA CCA Thr Lys Pro ACT GAG TCT Thr Glu Ser 725
ATT
Ile 710 TCA CTT CAA GCC Ser Leu Gin Ala
ATC
Ile 715 AAA CTC ATC TTT Lys Leu Ile Phe GAA AAC TTG Glu Asn Leu 720 GAA GGT CAA Glu Gly Gin 3898 3946 TAT CAT TAT GAC Tyr His Tyr Asp
CCA
Pro 730 GCT CAT CCA ACC Ala His Pro Thr
AAA
Lys 735 AAA GCT Lys Ala 740 CGC GAA AAC ATG CAC AAT GCT GCA ACA Arg Glu Asn Met His Asn Ala Ala Thr
CTC
Leu 750 GCT GGT ATG GCC Ala Gly Met Ala 3994 WO 98/07867 PCT/DK97/00336 171
TTC
Phe 755 GCC AAT GCT TTC Ala Asn Ala Phe err Leu 760 GGA ATT AAC CAC Gly Ile Asn His
TCA
Ser 765 CTT GCT CAT AAA Leu Ala His Lys 4042 4090 GCT GGT GAA TTT Ala Gly Glu Phe
GGG
Gly 775 CTT CCT CAT GGT Leu Pro His Gly GCC ATT GCT ATC Ala Ile Ala Ile GCT ATG Ala Met 785 CCA CAT GTC Pro His Val CCT TAC CCA Pro Tyr Pro 805
ATT
Ile 790 AAA TTT AAC GCT Lys Phe Asn Ala
GTA
Val 795 ACA GGA AAC GTT Thr Gly Asn Val AAA TTT ACC Lys Phe Thr 800 4138 CGT TAT GAA ACT Arg Tyr Giu Thr
TAT
Tyr 810 CGT GCG CAA GAA GAC TAC GCT GAA Arg Ala Gin Glu Asp Tyr Ala Glu 815 4186 ATT TCA Ile Ser 820 CGC TTC ATG GGA Arg Phe Met Gly TTr Phe 825 GCT GGC AAA Ala Gly Lys GAA GAT Glu Asp 830 TCA GAT GAA AAA Ser Asp Giu Lys
GCG
Ala 835 GTC AAA GCT TTG Val Lys Ala Leu
GTT
Val 840 GCT GAA CTT AAA Ala Glu Leu Lys
AAA
Lys 845 TTG ACT GAT AGT Leu Thr Asp Ser
ATT
Ile 850 4234 4282 4330 GAT ATT AAT ATC Asp Ile Asn Ile CTT TCA GGA AAT Leu Ser Gly Asn GTA GAT AAA GCT Val Asp Lys Ala CAT CTT His Leu 865 GAA CGT GAG Glu Arg Glu ACA CCT GCT Thr Pro Ala 885
CTT
Leu 870 GAT AAA TTG GCT Asp Lys Leu Ala
GAC
Asp 875 CTT GTT TAC GAT Leu Val Tyr Asp GAC CAA TGT Asp Gin Cys 880 AAA CAA CTC Lys Gin Leu 4378 4426 AAT CCA CGT CAA Asn Pro Arg Gin
CCA
Pro 890 AGA ATT GAT GAG Arg.Ile Asp Glu
ATT
Ile 895 TTG TTA Leu Leu 900 GAC CAA TAT Asp Gin Tyr TAATATATTA ATTATAGTAT TTGGAACCGA ACGATATCCA T 4482
GCTCGCTAAC
TAAACAACTC
GATGAATTCG
ATTTCGATAT
TTAGCGATAG
CTGCTAAAGC AGGAAGTCGC AATGGTACGT CAACCAAGAA TTGTTAGATC AATACTAATA ATCTGTTGAT AAAAATAATT TCAGAGCGTT TTTTATTATA GCTTATACAA CTATCAAAAG AGGCTCTTTT CACTCCATTG ATTTATATTT ATATAAAAAT
AAGTGATCC
TTGATGAGAT
AAAACGCTCT
GTATAAATCA
CAATAATTAA
4542 4602 4662 4722 4741 INFORMATION FOR SEQ ID NO:31: SEQUENCE CHARACTERISTICS: LENGTH: 903 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein FRAGMENT TYPE: internal WO 98/07867 WO 9807867PCT/DK97/00336 172 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: Met Ala Thr Lys Lys Ala Ala Pro Ala Ala 1 Giu Glu Leu Val Gln Thr so Lys His Gly Val Val Tyr Asn Lys Ala Gly 130 Ser Leu 145 Pro Gin Ala Ala Val Pro Ala Thr 210 Lys Ser 225 Tyr Val Leu Ser Ala Val Thr Gin 290 Ser Phe 305 Pro Val Asn Val Asn Ile Ile Tyr 370 Leu Leu 385 Ala Met Ser Arg Lys Lys Gin Ser Val Asn Vai 115 Ile Leu Ala Ile Ser 195 Ile Gly Asp Lys Ile 275 Gly Val Ala Pro Gly 355 Lys Al a Asp Ile Ala Lys Val Leu Giu Ala 100 Ala Val Thr Gin Giu 180 Leu Leu Asn Ala Arg 260 Asp Ala Phe Gly Lys 340 Giu Ser Tyr Asp Leu Al Ala Asp Giu Asp Ile Gly Pro Ala Lys 165 Ala Asp Ala Pro Thr 245 Phe Ala Tyr Val Arg 325 Asp Ala Giu Gin Pro 405 Val Lys Gin Thr Leu 70 Lys Lys Ser Thr Lys 150 Cys Gly met Thr Ser 230 Ala Asp Ser Met Giu 310 Ser Lys Leu Thr Gly 390 Phe Asn Phe Al a Ile 55 Ala Asp Asn Val Thr 135 Thr Ser Ala Thr Gly 215 Leu Asn Asn Ile Val 295 Arg Gly Asp Ser Arg 375 Ala Val Gin Gin Ala 40 Val His Thr Asp Glu 120 Asn Arg Ser Pro Thr 200 Gly Gly Ile Gly Tyr 280 Pro Ala Gin Val Ser 360 Giu Gly Lys Pro Gly 25 Vai Al a Giu Lys Lys 105 Ile Pro Asn His Glu 185 Ala Pro Val Asp Met 265 Asp Lys Gly Trp Leu 345 Glu Glu His Giu Asp 425 Ser Leu Ala Ala Asn 90 Thr Ala Thr Ala Ala 1.70 Asp Leu Gly Gly Arg 250 le Glu Lys Glu Ile 330 Leu Lys Gly Asn Tyr 410 Ser Lys Val Lys Met Val 75 His Val Ser Ser Ile 155 Ala Phe Ile Met Ala 235 Ala Cys Phe Asp Gly 315 Ala Phe Leu Ile Ala 395 Gly Ile Lys Val Ala Tyr Phe Glu Ala Leu Asn Giu Phe Ala Gly Val Pro Leu 125 Thr Ala 140 Val Phe Lys Ile Ile Gin Gin Asn 205 Val Asn 220 Gly Asn Val Glu Ala Thr Val Ala 285 Tyr Lys 300 Phe Gly Giu Gin Giu Leu Ser Pro 365 Glu Ile 380 Ala Ile Ile Lys Giy Gly Leu Thr Gly Ala Thr Ser Ile 110 Gly Ile Ala Val Trp 190 Arg Ala 
Gly Asp Giu 270 Lys Ala Val Ala Asp 350 Leu Val Gin Val Val 430 Ser Asp Tyr Ala Gly Glu Ala Val Phe Phe Tyr 175 Ile Gly Ala Ala Leu 255 Asn Met Ile Thr Gly 335 Lys Leu Arg Ile Glu 415 Gly Ala Gin Thr Ser Arg Ser Glu Leu Lys His 160 Asp Giu Ile Leu Val 240 Leu Ser Pro Glu Gly 320 Val Lys Ser Ser Gly 400 Ala Asp 420 Ile Tyr Thr Asp Ala Met Arg Pro Ser Leu Thr Leu Gly Thr Gly Ser 435 440 445 WO 98/07867 WO 9807867PCT/DK97/00336 173 Trp Asn 465 Leu Glu Val Pro Thr 545 Pro Lys Leu Lys Lys 625 Val Pro Giu Asp Asp 705 Asn Gly Met Lys Ala 785 Phe Ala Giu Ser His 865 Gin Gin Gly 450 Val Pro Leu Lys Thr 530 Leu Asp Ile Ser Phe 610 Ala Thr Leu Phe Al a 690 Tyr Leu Gin Ala Ile 770 Met Thr Giu Lys Ile 850 Leu Cys Leu Lys Asn Lys Thr Lys Giu Pro His 500 Phe Giy 515 Gin Vai Ser Giu Thr Val Gly Arg 580 Asp Asp 595 Val Asp Gin Met Pro Phe Ala Asp 660 Val Met 675 Met Ser Thr Lys Thr Giu Lys Ala 740 Phe Aia 755 Ala Giy Pro His Pro Tyr Ile Ser 820 Aia Vai 835 Asp Ile Giu Arg Thr Pro Leu Leu 900 Ser Val Ile 485 Val Phe Giu Aia Ile 565 Leu Aia Ile Val Ala 645 Tyr Thr His Pro Ser 725 Arg Asn Giu Val Pro 805 Arg Lys Asn Giu Ala 885 Asp Leu Aia 470 Tyr His Vai Thr Ile 550 Cys Ile Ser Arg Ala 630 Val Gin Val Ala Ile 710 Tyr Giu Ala Phe Ile 790 Arg Phe Aia Ile Leu 870 Asn Gin Ser 455 Lys Tyr Lys Asp Ser 535 Aia Leu Tyr Leu Lys 615 Ile Ile Leu Pro Leu 695 Ser His Asn Phe Gly 775 Lys Tyr Met Leu Thr 855 Asp Pro Tyr His Arg Giu Ala Lys 520 Ile Ile Gly Giu Lys 600 Arg Pro Thr Thr Lys 680 Giu Leu Tyr Met Leu 760 Leu Phe Giu Giy Vai 840 Leu Lys Arg Asn Arg Lys Phe 505 Vai Tyr Aia Gly Tyr 585 Giu Ile Thr Asp Pro 665 Arg Ser Gin Asp His 745 Gly Pro Asn Thr Phe 825 Ala Ser Leu Gin Leu Asn Asn 490 Ile Leu Gly Arg Gly 570 Asp Ile Ile Thr Asp 650 Gin Thr Tyr Ala Pro 730 Asn Ile His Ala Tyr 810 Ala Glu Gly Ala Pro 890 Ser Arg 475 Ala Val Glu Ser Gin 555 Ser Ala Phe Lys Ser 635 Giu Val Val Val Ile 715 Ala Ala Asn Giy Val 795 Axg Gly Leu Asn Asp 875 Axg Thr 460 Pro Ile Ala Gin Val 540 met Ala A-rg Gin Phe 
620 Giy Thr Ala Ser Ser 700 Lys His Ala His Leu 780 Thr Al a Lys Lys Gly 860 Leu Ile Tyr Gin Ser Asp Leu 525 Gin Asn Leu Gly Giu 605 Tyr Thr His Ile Tip 685 Val Leu Pro Thr Ser 765 Ala Gly Gin Giu Lys 845 Vai Val Asp Asp Tip Tyr Pro 510 Ala Pro His Asp Giu 590 Leu His Gly Val Vai 670 Ser Met Ile Thr Leu 750 Leu Ile Asn Giu Asp 830 Leu Asp Tyr Giu Leu Leu Vai Arg 480 Leu Gin 495 Gly Met Ile Arg Asp Pro Phe Giu 560 Ala Gly 575 Ala Asp Ala Gin Pro His Ser Giu 640 Lys Tyr 655 Asp Pro Gly Ile Ser Ser Phe Glu 720 Lys Giu 735 Ala Gly Ala His Ala Ile Val Lys 800 Asp Tyr 815 Ser Asp Thr Asp Lys Ala Asp Asp 880 Ile Lys 895 WO 98/07867 PCT/DK9700336 174 INFORMATION FOR SEQ ID NO:32: SEQUENCE CHARACTERISTICS: LENGTH: 31 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: Other LOCATION: 1...31 OTHER INFORMATION: Primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: GGCCGCTCGA GGTTGAACGT GCTGGTGAAG G 31 INFORMATION FOR SEQ ID NO:33: SEQUENCE CHARACTERISTICS: LENGTH: 31 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: Other LOCATION: 1...31 OTHER INFORMATION: Primer (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: TAGTAGGATC CGGGTCAGGT TGGACTGAGC C 31 INFORMATION FOR SEQ ID NO:34: SEQUENCE CHARACTERISTICS: LENGTH: 1750 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: Coding Sequence LOCATION: 464...1378 WO 98/07867 175 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: PCT/DK97/00336
GCGCCTAGAT
ATAATGACTT
ATTGCTTGAT
GCCTCATCAA
TACAAATTTA
TTATATCACA
GAATAAAAT
TTCCAATCCT
AAGAAA.AGC
TATCCGATTT
GAGTO ITTTG
AGATAATTGG
GGGGTTOGTG
AAGTATTC TT
CTCGGAAAAA
CTTTTATCAG
AACAGCTAAA
ITrGATTCCC
GTAAGTCGTT
GCTATCAGGA
TTTGAATTTC
TAGACCAATA
AAGTCTAAAA
GAAAAGAAGG
AGATAGGTAT CAAAAGCACT AACTCAGATA AGAGACTTGC TCAAGAGCTA GTTCGGGGAA AACTGTTCAG CTGATTTTTT AAAAAAAATC TC!CTC.AAGTT GTT.AATGTAA ATGTFITCIT TCTGCTACAA TTAAAGGGAC
TGATTTAAAA
CTTATCAACA
AGCTCCAACA
AAAGTITAGA
AATAAGTTTA
AAGTCGTAGA
ACTAAGAGGA
120 180 240 300 360 420 475 GATAGATAGG AAA ATG ATT AAA AAT Met Ile Lys Asn TAT GAA CTA TCC AAC Tyr Giu Leu Ser Asn AAA AAA TTA ATT LyB Lys Leu Ile TOA ACC TOT GAA Set Thr Ser Giu 15 GAA GAA ATT GG Giu Glu Ile Giy
ATG
Met
AAG
Lye AAT TTO ACC TAT Asn Phe Thr Tyr
GTT
Val CTO AAT CO-A ACA Leu Asn Pro Thr AAT ATT Asn Ile TCT GAA TAO Ser Glu Tyr GAO TAT GAA Asp Tyr Giu
TAT
Tyr GAO TTO COT TT Asp Phe Pro Phe
GAO
Asp 45 TAT TTA TCA Tyr Leu Ser GGA ATT TTO OAT Gly Ile Leu Asp s0 AAT GOC OGT TT Asn Ala Arg Phe
GAA
Giu ACA GAT GAT AAT Thr Asp Asp Asn AAT AAT OTG Asn Asn Leu ATT OTO Ile Leu TI'A CAA TAT COT OCA OTO TOT AAT Leu Gin Tyr Pro Pro Leu Ser Asn TAT GGA Tyr Gly GAA TOG Giu Ser 95 GAA GTG 000 ACT Giu Val Ala Thr
TTT
Phe OCA TAT TOT TTG GTT TOG ACT AAA AAT Pro Tyr Ser Leu Val Trp Thr Lys Asn OTT ATT TTA Val Ile Leu
GOA
Ala 100 715 763 811 OTT AAT OAT GAG ATT GAT AAT 000 TTA ATT TTO GAG OGT GAA Leu Asn His Giu Ile ASP Asn Gly Leu Ile Phe Glu Arg Giu TAT OAT Tyr Asp 115 TAT AAA OGO Tyr Lys Arg ACA CAC ACT Thr His Thr 135
TAO
Tyr 120 AAA CAT CAA Lye His Gin OTT ATT Val Ile 125 TTG AGA Leu Arg 140 TTT CAA GTG ATO TAT CAA ATG Phe Gin Vai Met Tyr Gin Met 130 GAT TTO OGA ACA AGG COT 000 Asp Phe Arg Thr Arg Arg Arg 145 859 907 TTO OAT OAT TAT Phe His Asp Tyr AGA OTT Arg Leu GAA CAG GGA ATO Giu Gin Gly Ile
AAA
Lys 155 .AAT TOA ACA Asn Set Thr AAG AAC Lys Asn 160 TAT TTT Tyr Phe 175 GAO CAA ATT OTT Asp Gin Ile Val 955 1003
OAT
Asp 165 ~TTG ATT GOC ATT Leu Ile Ala Ile
CAA
Gin 170 OCA AGT TTA ATT Ala Ser Leu Ile GAA GAT 000 Giu ASP Ala
TTG
Leu 180 WO 98/07867 176 CAC AAT AAT ATG CAA GTA CTr CAG GAT TTT ATT GAT TAO His Asn Asn Met Gin Val Leu Gin Asp Phe Ile Asp Tyr PCT/DK97/00336 GAT GAT GAA Asp Asp Giu
GAC
Asp 200 GGT TTT GCT Gly Phe Ala TAT ACA GAA Tyr Thr Glu GAA AAG Giu Lys 205 ATT TAT GAT ATT le Tyr Asp Ile TTG AGA GAA Leu Arg Glu 195 TTT GTC GAA Phe Vai Giu 210 TTA CTA GAA Leu Leu Giu TTG AAC A'rr Leu Asn Ile 1051 1099 11i47 ACA GAC CAA GCT Thr Asp Gin Ala 215
ACC
Thr 220 AAG ATT CAG CTC Lys Ile Gin Leu
AAG
Lys 225 AAT CTC Asn Leu 230 CGA GAT TTG TTC TCA AAC AAT GTC TCT Arg Asp Leu Phe Ser Asn Asn Val Ser 235 AAT AAC Asn Asn 240 1195
GTC
Val 245 ATG AAA ATC ATG Met Lys Ile Met
ACA
Thr 250 TCA GCT ACT TTC Ser Ala Thr Phe
GTT
Val 255 CTA GGG ATT COT Leu Gly Ile Pro
GCA
Ala 260 1243 GTA ATT GTT GGT Val Ile Val Gly
T
Phe 265 TAO GGA ATG AAT GTT CCA ATT OCT GGT Tyr Gly Met Asn Val Pro Ile Pro Gly 270 CAA AAT Gin Asn 275 1291 TTT AAT TGG Phe Asn Trp GTT TGG GTC Val Trp Val 295
ATG
Met 280 G'IT TGG CTr ATT [TA G LT CTA GGA ATT Val Trp Leu Ile Leu Val Leu Gly Ile 285 TTA TrA TGT Leu Leu Cys 290 TAAAATGGAG AA 1339 ACT TGG TGG TTA Thr Trp Trp Leu
CAT
His 300 AAA AAA GAT ATG TTA Lys Lys Asp Met Leu 305 1390
AAATCTCCAT
AGCAATGTTT
CAGATATTAA
TTTGCTATTC
GAGATTTCAA
TTAAAGGAAC
TITTTTrGCTC
GTTAAAACTA
AATAATTGGA
TCAAACTGTA
AATGAAAAOC
CAACTGGCGC
TTTGTGAAAA
T'IrGTGAAT
ACTGTATTAG
TGATATAATG
GAAG'ITACGG
GATAAAGCAA
AAITAATTAG TGATTGCAGA TAFTTATGAA AACGTTI'TAA TAAAGAATCT GTAATTTCTC AAGTTGTAAT TTGAAACAGA AAAATATCTT TGAACAAGCT GCGTTACTCG CTTTGTACAA
TTATGAAGTT
AAAAGTATAA
TTGAATTCTG
AAGAACAAAG
TGGGATGGTT
GAAAACTACA.
1450 1510 1570 1630 1690 1750 1750
INFORMATION FOR SEQ ID NO:35: SEQUENCE CHARACTERISTICS: LENGTH: 305 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:
Met Ile Lys Asn Tyr Glu Leu Ser Asn Glu Lys Lys Leu Ile Ser Thr
1               5                   10                  15
Ser Glu Met Lys Asn Phe Thr Tyr Val Leu Asn Pro Thr Arg Glu Glu
            20                  25                  30
Ile Gly Asn Ile Ser Glu Tyr Tyr Asp Phe Pro Phe Asp Tyr Leu Ser
        35                  40                  45
Gly Ile Leu Asp Asp Tyr Glu Asn Ala Arg Phe Glu Thr Asp Asp Asn
    50                  55                  60
Asp Asn Asn Leu Ile Leu Leu Gln Tyr Pro Pro Leu Ser Asn Tyr Gly
65                  70                  75                  80
Glu Val Ala Thr Phe Pro Tyr Ser Leu Val Trp Thr Lys Asn Glu Ser
                85                  90                  95
Val Ile Leu Ala Leu Asn His Glu Ile Asp Asn Gly Leu Ile Phe Glu
            100                 105                 110
Arg Glu Tyr Asp Tyr Lys Arg Tyr Lys His Gln Val Ile Phe Gln Val
        115                 120                 125
Met Tyr Gln Met Thr His Thr Phe His Asp Tyr Leu Arg Asp Phe Arg
    130                 135                 140
Thr Arg Arg Arg Arg Leu Glu Gln Gly Ile Lys Asn Ser Thr Lys Asn
145                 150                 155                 160
Asp Gln Ile Val Asp Leu Ile Ala Ile Gln Ala Ser Leu Ile Tyr Phe
                165                 170                 175
Glu Asp Ala Leu His Asn Asn Met Gln Val Leu Gln Asp Phe Ile Asp
            180                 185                 190
Tyr Leu Arg Glu Asp Asp Glu Asp Gly Phe Ala Glu Lys Ile Tyr Asp
        195                 200                 205
Ile Phe Val Glu Thr Asp Gln Ala Tyr Thr Glu Thr Lys Ile Gln Leu
    210                 215                 220
Lys Leu Leu Glu Asn Leu Arg Asp Leu Phe Ser Asn Asn Val Ser Asn
225                 230                 235                 240
Asn Leu Asn Ile Val Met Lys Ile Met Thr Ser Ala Thr Phe Val Leu
                245                 250                 255
Gly Ile Pro Ala Val Ile Val Gly Phe Tyr Gly Met Asn Val Pro Ile
            260                 265                 270
Pro Gly Gln Asn Phe Asn Trp Met Val Trp Leu Ile Leu Val Leu Gly
        275                 280                 285
Ile Leu Leu Cys Val Trp Val Thr Trp Trp Leu His Lys Lys Asp Met
    290                 295                 300
Leu
305
INFORMATION FOR SEQ ID NO:36: SEQUENCE CHARACTERISTICS: LENGTH: 4191 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE: NAME/KEY: Coding Sequence LOCATION: 270.
1184 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: TTGGGCTATA AGGAAATTGT TCTGCTGATT T7=AAAGTT TAGATATAGG TTTAGGGGTT CATGTTTGAA TTTCAAAAAA AGTCTCCTCA AGTTAATAAG TTTATTATAT CACAAAGTAT 120 TATTTAGACC AACTTCCTTC AAAAAACTTT TCGTTAAGGC TrTGAAATAA AATAATGAGA 180 AAAAAATAGG AAAATCTGCT ACAATTAGAA GGAGAAGAAG AGGATTTAAA TCCTTTTrrA 240 WO 98/07867 PCT/DK97/00336 178 TTAGGAAAAG AAGGGATAGA TAGGCTGAT ATG ATA AAA AAT TAT GAA CTA TCC Met Ile Lys Asn Tyr Giu Leu Ser AAT GAA Asn Glu AAA AAA TTG ATC TCA ACT TCT GAG ATG Lys Lys Leu Ile Ser Thr Ser Glu Met 15
AAG
Lys AAT TC ACT TAT Asn Phe Thr Tyr
GTC
Val CTC AAT CCA Leu Asn Pro ACA CGT Thr Arg 30 GAA GAA ATT GGG Glu Giu Ile Gly ATC TCA GAA Ile Ser Glu CAC TAT His Tyr GAA AAT Glu Asn 389 437 GAT TTT CCT TTT Asp Phe Pro Phe
GAC
Asp TAT CTA TCT GGA Tyr Leu Ser Gly
ATT
Ile 50 TTA GAT GAC TAT Leu Asp Asp Tyr GCC CGT TTT Ala Arg Phe TAT CCC GCC Tyr Pro Ala
GAA
Glu ACA GAT GAT AAT Thr Asp Asp Asn
GAC
Asp 65 AAT AAT CTG ATT Asn Asn Leu Ile CTT TTG CAA Leu Leu Gin CCA TAT TCT Pro Tyr Ser TTG TCC AAC TAT Leu Ser Asn Tyr GAA GTG GCC ACT Glu Val Ala Thr
TTT
Phe TTG GTT Leu Val TGG ACT AAG AAT Trp Thr Lys Asn
GAA
Glu 95 TCG OTT ATT TTG Ser Val Ile Leu
GCC
Ala 100 CTT AAC CAT GAA Leu Asn His Glu
ATT
Ile 105 GAT AAT GOT CTC ATT TTT GAA CGA GAA Asp Asn Gly Leu Ile Phe Giu Arg Glu 110
TAT
Tyr 115 GAT TAT AAA CGC Asp Tyr Lys Arg
TAT
Tyr 120 AAA CAC CAA TTG Lys His Gin Leu
ATT
Ile 125 TT CAA GTG ATO Phe Gin Val Met
TAC
Tyr 230 CAA ATG ACT CAT Gin Met Thr His ACT TTT Thr Phe 135 CAT OAT TAT His Asp Tyr GGT ATC AAA Gly Ile Lys 155
TTG
Leu 140 AGA GAC ITT AGA Arg Asp Phe Arg
ACA
Thr 145 AGO CGC CGC CGG Arg Arg Arg Arg CTT GAA GTT Leu Glu Vai 150 TTA ATT GCC Leu Ile Ala AAT TCA ACA AAA AAT GAC CAA ATT GTT Asn Ser Thr Lys Asn Asp Gin Ile Vai
GAC
Asp 165 ATT CAA Ile Gin 270 GCG AGT TTG ATT Ala Ser Leu Ile TTT GAA GAT GCG Phe Giu Asp Ala
CTG
Leu 180 CAC AAT AAT ATG His Asn Asn Met
CAA
Gin 185 GTT CTC CAG AAT Val Leu Gin Asn
TTT
Phe 190 ATT GAT TAC TTA Ile Asp Tyr Leu
CGA
Arg 195 GAA GAT OAT Glu Asp Asp GAA GAT Glu Asp 200 CAA GCT Gin Ala 215 GGT ITT GCC GAA Gly Phe Ala Glu AAA ATC Lys Ile 205 TAT OAT ATT Tyr Asp Ile
TTT
Phe 210 GTC GAA ACA GAC Val Glu Thr Asp WO 98/07867 TAT ACA GAA Tyr Thr Giu TTG TTC TCA Leu Phe Ser 235 PCT/DK97/00336 179 AAG ATT CAG CTC Lys Ile Gin Leu AAG 'PTA CTA GAA AAT CTC CGA GAT Lys Leu Leu Giu Asn Leu Arg Asp 225 230 AAC ATT GTC Asn Ile Val TCT AAT AAT TTG AAT ATC GTC ATG AAA ATT Ser Asn Asn Leu Asn Ile Vai Met Lys Ile 240 245 GTT CTA GGT A'PT CCG GCG GTT ATT GTC GGC Vai Leu Giy Ile Pro Ala Val Ile Vai Gly 255 260 1013 ATG ACC Met Thr 250 TCA GCA ACA TT Ser Aia Thr Phe 1061 TTT TAT GGA Phe Tyr Giy 265 ATG AAT GTT CCG ATT CCT GGT Met Asn Vai Pro Ile Pro Gly 270
CAA
Gin 275 AAT TTT AAT TGG Asn Phe Asn Trp 1109 GTC TGG CTC ATT TTG GTG I-I- GGA ATT Val Trp Leu Ile Leu Val. Phe Giy Ile 285
TTA
Leu 290 TTA TOT GTT TGG Leu Cys Val Trp OTT ACT Vai Thr 295 1157 TGG TGG CTA Trp Tip Leu CAC AAA AAA His Lys Lys 300 GAT ATG TTA TGAATGGAGA AAATTTCTCC GTTTPT Asp Met Leu 305 1211i
TATCTTTGTG
GCTATTTAGT
TGAAACTGTA
TGTATGATAT
AACCGAAGTT
GCGCGATAAA
TGAAAGCTT
TACAAAAAAT
CGATAAAATT
GATGCAAAAT
TGAAAAGATT
ACAAACAATG
AGCACGTCAC
CGGGGTATAT
AGAATGGGAT
CATGCAATAC
'ITCTCGTCCA
AG~TTTGTCGT
CATC'PrTGCA
TGTTGATGAT
TGAACTTrrAT
CGGACGCCAC
AAATGCTCCA
ACGTTATTCA
AATGGCTAAA
CCCAOAAAAT
GAAAGCAATG
ATTCGATATT
CGACAAATCA
CATGACTGAC
TGCTAACATG
TAAATATGCT
AGAAGGTGAC
TGTCATGAAA
AAAAAATTAA
GAATTAATTA
TTAGTAAAGA
AATGAAGTrG
ACGGAAAATA
GCAAGCGTTA
CTTGCTGGGC
CACTACGAAG
CCTGCTGGAT
AGCGAACTTT
TTGACAGAAC
ACTTCTGTAA
GCTCACACTG
GCACGTCTTG
GCAATCACTG
CAAGCTTI'GC
GCGATGAACG
GTTATCAATG
GAACGTGACC
TTCATTTTAA
TCTGGTGACC
CGTGTCACTA
GAACCAAACT
ATGTCTATGA
GATGGATATG
OAAGAAGGAC
TTGACTGGTT
GAACCTGTTC
CTCAACTGGT
AAATATAACT
GGATTTGGTA
AAAGTTAAAA
TTCCCACGTT
ATGTACCATG
TI'AGTGATAA
TGAAAACGTT
ATCTGTAATT
TAATTTGAAA
TCTTTGAACA
CTCGCITGT
CAACAGAACG
AAGTAGGATT
ATATTGATGC
TCCGCTTAAA
ACGGTCTTTC
ATGATGGAAT
TAACAGGT TT
CTCTTTATGG
AAATTAATGA
AAGAAGTTGT
TAAAAGAAGC
GTGCTGCAAC
TTGCTCGTGG
AACTTCGTAC
CCACGTTCAT
AAATGGACTA
TGACAGTTCT
GTCACAAACA
GCGAAATGTC
GTCATAATCT
TGAACGGTG
GTGATGAAAT
TGACAGATAC
ATGAAGCAGT
TCTGTGGTTT
CTTTGCGTGA
ATGGTGAAGA
AAAAATTAGC
TAAATCATGA AGTTAGCAAT TrAAAAAAGT ATAACAGATA TCTCTTGAAT TCTG'TrGCT CAGAAAGAAC AAAGGAGATT AGCTTGGGAT GGTTTTAAAG ACAAGAAAAC TACAAACCAT TACACTTAAA GTAAAGAAAA TCCCTTTGAT ACTGACCOCG TAATGATAAA GAACTTGAAC CTTCATGCCA AGAGGTGGTC
AGTTGACCCA
CTTCCGTGCT
GCCTGATGCA
AGCTGACTAC
TGATAACATT
AAACTTTGGT
AATCCAATGG
TTCAC'ITGGA
AACATTTACT
AATGAAATTT
CACAACATCT
TCGTTTCTTG
TTGGGACTCT
CTCATCTATC
ATGTATCTCT
CCAATACTTT
TTACGATGAC
TCTTGACTAT
TTATGTTGAT
TCAAATGGCC
CGCAAATACA
TGAAAATGGC
TGATGACCGT
TTCACACAAA
GGTTTGCATG
TATACTTCAG
TACTCTCGTG
CTTATGAAGG
CGTCTTAAAG
GCTTTGTATG
G ITAATATTG
CGTGTGCCAA
GAGCAAGAAA
GCTCGTGCTG
ATGGCTGGTA
AACACACTTG
AAACTCCCAT
CAATATGAAG
TGTTGTGTCT
GTGCOCTG
GTTCATAAAG
GATACAGTTA
GCAATGAATA
TTCTTGCCTA
GTTGATTCAC
TACATCTACG
GCTGATGATA
CTTTACAAAA
GTTTGTCAAA
TTAAAATAAT
ATTATCAAAC
TCAAAATGAA
GAACTAACTG
ATGATGGTGA
TTATTGAAGA
TAACCTCTAT
TCATCTATGG
TTCGTGTTGC
ATGTTTTGTC
CAATTCGTAA
GACGTATCAT
AAAAAGCAAA
AAGAAATTAA
GTCTTG-ACGT
CATACATGGC
TCGTTCTTGA
TCCAAGAATT
CTGCTTATGA
TGGGTAATGA
ATACAATCGG
ATTCATTCAA
GTGTTGAAAC
CACCACTrGA
TAAACGTCTT
ATTATAAAGT
TGGAAAACTT
TCATTCACTA
CTAAAGTTCG
TTTCAGCGAT
ATTATGAAGT
TCGCTAAACT
ATGCTGAAGC
i2 71 1331 1391i 1451 is11 is57i i1631i 1691i 1751 i811 1871 1931 1991i 2051 2111i 2171 2231 2291 2351 2411 24 7i 2531 2591 2 2711 2771 2831 2891 2951 3011 3071 3131 3191 3251 WO 98/07867 WO 9807867PCT/DK97/00336 180 TACTGTTTCA CTr'FrGACAA TCACATCTAA
TCCAGTTCAT
ATTCTTCTCA
TCG'TCAITA
.AGT'rrCTCCT
TCTTGATGGA
CGTTAACTTG
TG'ITATCGTT
ACAAGAATTG
GCACACTTCA
CAGACTTTT
AAATAAAATI'
GCGAGTGGGA
GCTGACAAAG
GAAACTTTCG
TGCTTAAGTT
AAAGGAGTAT
CCAGGTGCTA
GCTAAATTGG
CGTGCACTrG
TACTTCACAC
AACGTTATGG
CGTATCTCTG
ACTGAACGTG
AATATCTAAT
TTCTATAAAT
TGTGCTAAAA
AATCAACGGT
TTGTI'CGTCA
GTTC.AGATTr
TTGCTGACCC
TCCTCAATGA
ACCCATCTAA
AATTCAAAGA
GTAAAACTCG
CAGGAGCTT
ACCITAAAGA
GATACTGTGT
TCTrCC.ATGA
TCITAGTAT'
TAAITATAAT
TAGATGAATG
GGTTGATrr
GTTGCAAGAA
TACTGACGAA
AAATCAACGT
CGTTGCTTAC
AGATGGTACA
CAAAGCTAAA
TGCAAATGAC
TGATGAACAA
GATrAATGGT
TGTTTACGAT
TAACACTAAA
AGTACTTTCA
AAAAAATATA
AGTTAAAAAC
ATAAAGGTAA
TTGATTTCTG
CCTGATGGGA
AATGGGAAAT
CAAAAATTAT
TCTAAACAAA
GTCAACAAAT
GGTGGATGGT
GGTATTTCAT
GTAGATAACT
ACTGAATTTG
AAAATCATGC
TACCTCACAC
AATGATGATG
AGGTCTGTCA
TATTA'TTT
TTGGATrAAC
AAGG'ITATCA
AACTTrTTAA
TAAACCGATG
CTGGTAACTC
CTAAACTI'GA
TGCAAAATCT
TAACTACTCA
TGGTTCAAATr
CAGGTCAACA
GTGGTGAAGA
CTGAACAAAA
AAGAAGTAAT
GTTCTACTGA
AGTAAGAA.
AGGCGGATT
AGTAATrGAT
TGCAATAATG
CAAAATTGAG
3311 3371 3431 3491 3551 3611 3671 3731 3791 3851 3911 3971 4031 4091 4151 4191 INFORMATION FOR SEQ ID NO:37:- SEQUENCE CHARACTERISTICS: LENGTH: 305 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: Met 1 Ser Ile Ile Lys Asn Tyr Glu Leu Ser Asn Glu Lys Lys Leu Ile Ser Thr 10 Leu Glu Met Lys Gly Asn Ile Asn Phe Thr Tyr Val Asn Pro Thr 25 Asp Ser Glu His Tyr Phe Pro Phe Arg Giu Glu Tyr Leu Ser Asp Asp Asn Gly Ile Leu Asp Asp Tyr Glu 55 Leu Asn Ala Arg Phe Giu Leu Asp Glu Asn Asn Leu Ile Leu Gin Tyr Pro Ala Ser-Asn Tyr Gly Val Ala Thr Pro Tyr Ser Leu Asn His Giu Ile Trp Thr Lys Asn Gly Leu Val Ile Leu Ala 100 Asp 105 Lys Asn Giu Ser le Phe Glu 110 Phe Gin Val Asp Phe Arg Arg Glu Tyr 115 Met Tyr Gin Tyr Lys Arg 120 Phe His Gln Leu Met Thr His Thr His Asp Tyr 130 Thr Arg 135 Glu Leu 140 Asn Arg Arg Arg Leu Val Gly Ile Lys Ser Thr Lys 145 Asp 155 Ala Gin Ile Val Asp 165 His Leu Ile Ala Ile Ser Leu Ile Giu Asp Ala Leu Asn Asn Met Gin 185 Gly Leu Gin Asn Tyr Phe 175 Ile Asp Tyr Asp Tyr Leu Arg 195 Glu Asp Asp Glu Asp 200 Phe Ala Glu Thr Glu Thr 220 Ile Phe 210 Val Giu Thr Asp Gin 215 Ala Tyr Lys Ile Gin Leu WO 98/07867 WO 9807867PCT/DK97/00336 181 Lys Leu Leu Glu Asn Leu Arg Asp Leu 225 230 Asn Leu Asn Ile Val Met Lys Ile Met 245 Gly Ile Pro Ala Val Ile Val Gly Phe 260 265 Pro Gly Gin Asn Phe Asn Trp Met Val 275 280 Ile Leu Leu Cys Val Trp Val Thr Trp 290 295 Leu 305 Phe Ser 235 Thr Ser 250 Tyr Gly Trp Leu Asn Ile Val Ser Asn 240 Ala Thr Phe Val Leu 255 Met Asn Val Pro Ile 270 Ile Leu Val Phe Gly 285 Lys Trp Leu His 300 Lys Asp Met INFORMATION FOR SEQ ID NO:38: SEQUENCE CHARACTERISTICS: LENGTH: 4191 base pairs TYPE: nucleic acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: cDNA (ix) FEATURE.
NAME/KEY: Coding Sequence LOCATION: 1447...3807 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:
TTGGGCTATA
CATGTTTGAA.
TATTTAGACC
AAAAAATAGG
TTAGGAAAAG
AAAAATI'GAT
AAGAAATTGG
TAGATGACTA
TGCAATATCC
GGACTAAGAA
TTGAACGAGA
AAATGACTCA
AAGTTGGTAT
CGAGTTTGAT
TrGATTACTT
TCGAAACAGA
GAGATTTGTr
CAGCAACATT
CGATTCCTGG
TATGTGTITG
CTCCGTTTTT
TGTTTGTCAA
ATTAAAATAA
TATTATCAAA
AGGAAATrGT TCTGCTGATT TTrCAAAAAA AGTCTCCTCA AACTrCCTTC AAAAAACTTT AAAATCTGCT ACAA.TTAGAA AAGGGATAGA TAGGCTGATA
CTCAACTI'CT.GAGATGAAGA.
GAATATCTCA GAACACTATG TGAAAATGCC CGTTTTGAAA CGCCTTGTCC AACTATGGAG TGAATCGGTT AT TTT GGCCC ATATGATTAT AAACGCTATA TACTTrCAT GArrA TrTGA CAAAAATTCA ACAAAAAATG TrATITrTGAA GATGCGCTGC ACGAGAAGAT GATGAAGATG CCAAGCTTAT ACAGAAACCA CTCAAACA IT GTCTCTAATA TGTTCTAGGT ATTCCGGCGG TCAAAATrFT AATTGGATGG GGTTACTTGG TGGCTACACA TTATCT'ITGT GAAAAAATTA AGCTATTTAG TGAA [TAATT TTGAAACTGT ATTAGTAAAG CTGTATGATA TAATGAAGTT
TTAAAGTT
AGTTAATAAG
TCGTTAAGGC
GGAGAAGAAG
TGATAAAAAA
ATTTCACTTA
ATTTTCCTT
CAGATGATAA
AAGTGGCCAC
TrAACCATGA AACACCAA'rr
GAGACTTTAG
ACCAAATrGT
ACAATAATAT
GTTTTGCCGA
AGATTCAGCT
ATTTGAATAT
TTATTGTCGG
TCTGGCTCAT
AAAAAGATAT
ATTAGTGATA
ATGAAAACGT
AATCTGTAAT
GTAATTTGAA
TAGATATAGG
TTTATTATAT
TTTGAAATAA
AGGATTTAAA
TTATGAACTA
TGTCCTCAAT
TGACTATCTA
TGACAATAAT
UTTT CCATAT
AATTGATAAT
GATTTITCAA
AACAAGGCGC
TGACTTAATT
GCAAGTrCTC
AAAAATCTAT
CAAG ITACTA TTrAGGGGTT
CACAAAGTAT
AATAATGAGA
TCCTrI TTA
TCCAATGAAA
CCAACACGTG
TCTGGAATTr CTGArCTTr
TCTTTGGTTT
GGTCTCATT
GTGATGTACC
CGCCGGCTrG GCCAITrCAAG CAGAA TITTA
GATATTG
GAAAATCTCC
120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1488 CGTCATGAAA ATTATGACCT CTTTTATGGA ATGAATGTTC
TTTGGTGTT
G'ITATGAATG
ATAAATCATG
TTTAAAAAAG
TTCTCTTGAA
ACAGAAAGAA
GGAATTTTAT
GAGAAAA7-rT
AAGTTAGCAA
TATAACAGAT
TTCTGTTTGC
CAAAGGAGAT
TTCAAA
ATG AAA ACC GAA GTT ACG GAA AAT ATC TTr GAA CAA GCT TGG Met Lys Thr Glu Val Thr Glu Asn Ile Phe Glu Gin Ala Trp 1 5 WO 98/07867 PCT/DK97/00336 182 GAT GGT Asp Gly TTr AAA GGA Phe Lys Gly ACT AAC Thr Asn 20 TGG CGC GAT AAA GCA AGC GTT Trp Arg Asp Lys Ala Ser Val ACT CGC Thr Arg 1536 1584 TTT GTA CAA GAA Phe Val Gin Glu
AAC
Asn TAC AAA CCA TAT GAT GGT GAT GAA AGC TTT CT Tyr Lys Pro Tyr Asp Gly Asp Giu Ser Phe Leu GCT GGG CCA Ala Gly Pro ACA AAA AAT Thr Lys Asn
ACA
Thr GAA CGT ACA CTT Glu Arg Thr Leu
AAA
Lys 55 GTA AAG AAA ATT Val Lys Lys Ile ATT GAA GAT Ile Giu Asp ACT GAC CGC Thr Asp Arg 1632 1680 CAC TAC GAA GAA His Tyr Glu Glu
GTA
Val 70 GGA TTT CCC TTT Gly Phe Pro Phe
GAT
Asp GTA ACC Vai Thr TCT ATC GAT AAA Ser Ile Asp Lys
ATT
Ile 85 CCT GCT GGA TAT Pro Ala Giy Tyr
ATT
Ile GAT GCT AAT GAT Asp Ala Asn Asp
AAA
Lys GAA CTT GAA CTC Glu Leu Giu Leu
ATC
Ile 100 TAT GGG ATG CAA Tyr Gly Met Gin
AAT
Asn 105 AGC GAA CTT TTC Ser Giu Leu Phe
CGC
Arg 110 1728 1776 1824 TTA AAC TTC ATO Leu Asn Phe Met
CCA
Pro 115 AGA GGT GGT CTT Arg Gly Gly Leu
CGT
Arg 120 GTT GCT GAA AAG Val Ala Giu Lys ATT TTG Ile Leu 125 ACA GAA CAC Thr Glu His CAA ACA ATG Gin Thr Met 145
GGT
Gly 130 CTT TCA GTT GAC Leu Ser Val Asp GGT TTG CAT GAT Gly Leu His Asp GTT TTG TCA Val Leu Ser 140 1872 ACT TCT GTA AAT GAT GGA ATC TTC CGT Thr Ser Val Asn Asp Gly Ile Phe Arg GCT TAT ACT TCA Ala Tyr Thr Ser 155 GGT TTG CCT GAT Oly Leu Pro Asp GCA ATT Ala Ile 160 CGT AAA GCA CGT Arg Lys Ala Arg
CAC
His 165 GCT CAC ACT GTA Ala His Thr Val
ACA
Thr 170 1920 1968 2016 2064
GCA
Ala 175 TAC TCT CGT GGA Tyr Ser Arg Gly
CGT
Arg 180 ATC ATC GGG GTA Ile Ile Gly Val
TAT
Tyr 185 GCA COT CTT GCT Ala Arg Leu Ala CTr Leu 190 TAT GGA GCT GAC Tyr Gly Ala Asp
TAC
Tyr 195 CTT ATG AAG GAA Leu Met Lys Glu
AAA
Lys 200 GCA AAA GAA TGG Ala Lys Giu Trp GAT GCA Asp Ala 205 ATC ACT GAA Ile Thr Glu ATG CAA TAC Met Gin Tyr 225
ATT
Ile 210 AAT OAT OAT AAC Asn Asp Asp Asn
ATT
Ile 215 CGT CTT AAA GAA Arg Leu Lye Glu GAA ATT AAC Glu Ile Asn 220 OCT TTG TAT Ala Leu Tyr 2112 2160 CAA GCT TTG CAA Gin Ala Leu Gin
GAA
Glu 230 OTT GTA AAC TTT Val Val Asn Phe
GGT
Gly 235 WO 98/07867 PCT/DK97/00336 183 GGT OTT Gly Leu 240 GAC GTT TCT CGT Asp Val Ser Arg
CCA
Pro 245 GCG ATG AAC GTA Ala Met Asn Val
AAA
Lys 250 GAA GCA ATC CAA Glu Ala Ile Gin
TGG
Trp 255 GTT AAT ATT Val Asn Ile GCA TAC Ala Tyr 260 ATG GCA GTr TGT Met Ala Vai Cys
CGT
Arg 265 GTT ATC AAT GGT Val Ile Asn Gly
GCT
Ala 270 2208 2256 2304 GCA ACT TCA CTT Ala Thr Ser Leu
GGA
Gly 275 CGT GTG COA ATC Arg Val Pro Ile
GTT
Val 280 CTT GAC ATC TTT Leu Asp Ile Phe GCA GAA Ala Glu 285 CGT GAC CTT Arg Asp Leu GTT GAT GAT Val Asp Asp 305
GCT
Ala 290 CGT GGA ACA TTT Arg Gly Thr Phe
ACT
Thr 295 GAG CAA GAA ATC Glu Gin Glu Ile CA GA TrT Gin Giu Phe 300 GCT CGT GCT Ala Arg Ala 2352 2400 TTC ATT TTA AAA Phe Ile Leu Lys
CTT
Leu 310 CGT ACA ATG AAA Arg Thr Met Lys
TTT
Phe 315 GCT GCT Ala Ala 320 TAT GAT GAA CTT Tyr Asp Glu Leu
TAT
Tyr 325 TOT GGT GAC CCC Ser Gly Asp Pro
ACG
Thr 330 TTC ATC ACA ACA Phe Ile Thr Thr
TCT
Ser 335 ATG GOT GGT ATG Met Ala Gly Met
GGT
Gly 340 AAT GAO GGA CGC Asn Asp Gly Arg CGT GTC ACT AAA Arg Val Thr Lys 2448 2496 2544 GAO TAT OGT TTC Asp Tyr Arg Phe
TTG
Leu 355 AAO ACA OTT GAT Asn Thr Leu Asp
ACA
Thr 360 ATO GGA AT GCT Ile Gly Asn Ala CCA GAP Pro Glu 365 CCA AAO TTG Pro Asn Leu CGT TAT TCA Arg Tyr Ser 385
ACA
Thr 370 GTT OTT TGG GAO Val Leu Trp Asp
TCT
Ser 375 AAA CTC CCA TAT Lys Leu Pro Tyr TCA TTC AAA Ser Phe Lys 380 CA TAT GA Gin Tyr.Glu 2592 2640 ATG TOT ATG AGT Met Ser Met Ser AAA CAC TCA TCT Lys His Ser Ser GGT GTT Gly Vai 400 GAP ACA ATG GCT Glu Thr Met Ala
AAA
Lys 405 GAT GGA TAT GGC Asp Gly Tyr Gly
GAP
Glu 410 ATG TCA TGT ATO Met Ser Cys Ile
TCT
Ser 415 TGT TOT GTO TCA Cys Cys Val Ser
OCA
Pro 420 OTT GAO CCA GA Leu Asp Pro Glu GAP. GAA GGA CGT Glu Giu Oly Arg
CAT
His 430 2688 2736 2784 2832 AAT CTC CA TAO Asn Leu Gin Tyr
TTT
Phe 435 GGT GCG CGT Oly Ala Arg GTA AAO Val Asn 440 GAC GTT Asp Val 455 GTC TTG AAA GCA Val Leu Lys Ala ATG TTG Met Leu 445 ACT GOT TTG Thr Gly Leu
AAC
Asn 450 GGT GOT TAO GAT Gly Giy Tyr Asp OAT AAA GAT TAT AAA GTA His Lys Asp Tyr Lys Val 460 WO 98/07867 TTC GAT ATT Phe Asp Ile 465 ATG GAA AAC Met Glu Asn 480 PCT/DK97/00336 184 GAA CCT GTT CGT Glu Pro Val Arg
GAT
Asp 470 GAA ATT CTT GAC Glu Ile Leu Asp GAT ACA GT Asp Thr Val 2880 2928 TTC GAC AAA Phe Asp Lys
TCA
Ser 485 CTC AAC TGG TTG Leu Asn Trp Leu
ACA
Thr 490 GAT ACT TAT OTT Asp Thr Tyr Val
GAT
Asp 495 GCA ATG AAT ATC Ala Met Asn Ile
ATT
Ile 500 CAC TAC ATG ACT His Tyr Met Thr AAA TAT AAC TAT Lys Tyr Asn Tyr
GAA
Glu 510 2976 GCA GTT CAA ATG 0CC TTC TTG CCT ACT Ala Val Gin Met Ala Phe Leu Pro Thr 515
AAA
Lys 520 OTT COT GCT AAC Val Arg Ala Asn ATG GGA Met Gly 525 3024 TTT GGT ATC Phe Gly Ile AAA TAT OCT Lys Tyr Aia 545
TGT
Cys 530 GGT TTC GCA AAT Oly Phe Ala Asn
ACA
Thr 535 OTT OAT TCA CTT Val Asp Ser Leu TCA GCG ATT Ser Ala Ile 540 TAC ATC TAC Tyr Ile Tyr 3072 3120 AAA OTT AAA ACT Lys Val Lys Thr
TTO
Leu 550 CGT OAT GAA AAT Arg Asp Glu Asn OAT TAT Asp Tyr 560 OAA OTA OAA GGT Olu Val Glu Oly
GAC
Asp 565 TTC CCA COT TAT Phe Pro Arg Tyr
GGT
Oly 570 OAA OAT OAT GAC Glu Asp Asp Asp
CGT
Arg 575
TTA
Leu OCT OAT OAT ATC Ala Asp Asp Ile OCT TCA CAC AAA Ala Ser His Lys 595
OCT
Ala 580 AAA CTT GTC ATG Lys Leu Val Met ATO TAC CAT OAA Met Tyr His Olu
AAA
Lys 590 3168 3216 3264 CTT TAC AAA AAT Leu Tyr Lys Asn
OCT
Ala 600 GAA OCT ACT OTT Glu Ala Thr Val TCA OTT Ser Leu 605 TTO ACA ATO Leu Thr Ile OCA GTT CAT Pro Val His 625
ACA
Thr 610 TCT AAC OTT GCT Ser Asn Val Ala
TAC
Tyr 615 TCT AAA CAA ACT Ser Lys Gln Thr GGT AAC TCT Oly Asn Ser 620 OTC AAC AAA Val Asn Lys 3312 3360 AAA OGA OTA TTO Lys Gly Val Phe
CTC
Leu 630 AAT GAA OAT GGT Asn Olu Asp Oly
ACA
Thr 635 TCT AAA Ser Lys 640 OTT GAA TTC TTC Leu Glu Phe Phe
TCA
Ser 645 OCA GGT OCT AAC Pro Oly Ala Asn
OCA
Pro 650 TCT AAO AAA OCT Ser Asn Lys Ala 3408 3456
AAA
Lys 655 GGT OGA TGG TTO Oly Gly Trp Leu
CAA
Gin 660 AAT CTT COT TCA Asn Leu Arg Ser OCT AAA TTG OAA Ala Lys Leu Olu
TTC
Phe 670 AAA OAT OCA AAT Lys Asp Ala Asn GGT ATT TCA TTA ACT ACT CAA OTT TCT Gly Ile Ser Leu Thr Thr Gin Val Ser 680 CCT CGT Pro Arg 685 3504 WO 98/07867 GCA CTT GGT Ala Leu Gly CTT GAT GGA Leu Asp Gly 705 PCT/DK97/00336 185
AAA
Lys 690 ACT CGT GAT GAA Thr Arg Asp Glu
CAA
Gin 695 GTA GAT AAC TTG Val Asp Asn Leu GTT CAA ATT Val Gin Ile 700 ACT GAA TTT Thr Glu Phe 3552 3600 TAC TTC ACA CCA Tyr Phe Thr Pro
GGA
Gly 710 GCT TTG ATT AAT Ala Leu Ile Asn GCA GGT Ala Gly 720 CAA CAC GTT AAC Gin His Val Asn
TTG
Leu 725 AAC GTT ATG GAC Asn Val Met Asp
CTI
Leu 730 AAA GAT GTT TAC Lys Asp Val Tyr 3648 3696
GAT
Asp 735 AAA ATC ATG CGT Lys Ile Met Arg GAA GAT GTT ATC Glu Asp Val Ile
GTT
Val 745 CGT ATC TCT GGA Arg Ile Ser Gly TGT GTT AAC ACT Cys Val Asn Thr
AAA
Lys 755 TAC CTC ACA CCT GAA CAA AAA CAA GAA Tyr Leu Thr Pro Giu Gin Lys Gin Glu 760 TTG ACT Leu Thr 765 3744 GAA CGT GTC Glu Arg Val CAT GAA GTA CTT His Glu Val Leu
TCA
Ser 775 AAT GAT GAT GAA Asn Asp Asp Glu GAA GTA ATG Glu Val Met 780 3792 CAC ACT TCA AAT ATC His Thr Ser Asn Ile 785 TAATTCTTAG TATrAAAAAA TATAAGGTCT GTCAGTTCTA C 3848
TGACAGACTT
GAAAAATAAA
ATTGCGAGTG
GATGCTGACA
ATGGAAACTT
GAGTGCTTAA
TTTTCTATA
ATTTGTGCTA
GGAAATCAAC
AAGTTGTTCG
TCGGTrCAGA
GTTTGCTGA
AATTAAITAT
AAATAGATGA
GGTGGTTGAT
TCAGTTGCAA
TTTTACTGAC
CCCAAATCAA
AATAGTTAAA
ATGATAAAGG
TTTTTGATT
GAACCTGATG
GAAAATGGGA
CGTCAAAAAT
AACTATTATT
TAATTGGATT
CTGAAGGTTA
GGAAACTTTT
AATTAAACCG
TAT
TTTAGTTAA
AACAGGCGGA
TCAAGTAATI
TAATGCAATA
ATGCAAAATT
3908 3968 4028 4088 4148 4191 INFORMATION FOR SEQ ID NO:39: SEQUENCE CHARACTERISTICS: LENGTH: 787 amino acids TYPE: amino acid STRANDEDNESS: single TOPOLOGY: linear (ii) MOLECULE TYPE: protein FRAGMENT TYPE: internal (xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: Met Lys Thr Glu Val Thr Giu Asn Ile Phe Glu Gin Ala Trp 1 5 10 Phe Lys Gly Thr Asn Trp Arg Asp Lys Ala Ser Vai Thr Arg 25 Gln Glu Asn Tyr Lys Pro Tyr Asp Gly Asp Giu Ser Phe Leu 40 Pro Thr Giu Arg Thr Leu Lys Val Lys Lys Ile Ile Glu Asp 55 Asn His Tyr Giu Glu Val Gly Phe Pro Phe Asp Thr Asp Arg 70 Asp Gly Phe Val Ala Gly Thr Lys Val WO 98/07867 WO 9807867PCT/DK97/00336 186 Ser Ile Leu Glu Phe Met His Gly 130 Met Thr 145 Arg Lys Ser Arg Ala Asp Giu Ile 210 Tyr Gin 225 Asp Vai Asn Ile Ser Leu Leu Aia 290 Asp Phe 305 Tyr Asp Ala Giy Arg Phe Leu Thr 370 Ser Met 385 Glu Thr Cys. Vai Gin Tyr Leu Asn 450 Ile Glu 465 Asn Phe Met Asn Gin Met Ile Cys 530 Asp Leu Pro 11s Leu Ser Ala Gly Tyr 195 Asn Al a Ser Ala Gly 275 Arg Ile Glu Met Leu 355 Val Ser Met Ser Phe 435 Gly Pro Asp Ile Al a 515 Gly Lys Ile 100 Arg Ser Val Arg Arg 180 Leu Asp Leu Arg Tyr 260 Arg Gly Leu Leu Gly 340 Asn LeU Met Ala Pro 420 Giy Gly Val Lys Ile 500 Phe Phe Ile Tyr Gly Val Asn His 165 Ile Met Asp Gin Pro 245 Met Val Thr Lys Tyr 325 Asn Thr Tip Ser Lys 405 Leu Ala Arg Ser 485 His Leu Al a Pro Gly Gly Asp Asp 150 Ala Ile Lys Asn Giu 230 Ala Ala Pro Phe Leu 310 Ser Asp Lou Asp His 390 Asp Asp Arg Asp Asp 470 Lou Tyr Pro Asn Ala met Leu Pro 135 Gly His Gly Glu Ile 215 Val Met Val Ile Thr 295 Arg Gly Gly Asp Ser 375 Lys Gly Pro Val Asp 455 Glu Asn Met Thr Thr 535 Gly Gin Arg 120 Giy Ile Thr Val Lys 200 Arg Val Asn Cys Val 280 Giu Thr Asp Arg Thr 360 Lys His Tyr Giu Asn 440 Val Ile Trp Thr Lys 520 Val Tyr Asn 105 Val Lou Phe Val Tyr 185 Al a Leu Asn Val Arg 265 Lou Gin Met Pro His 345 Ile Ser Gly Asn 425 Val His Lou -Lou Asp 505 Val Asp Ile 90 Ser Ala His Arg Thr 170 Ala Lys Lys Phe Lys 250 Val Asp Glu Lys Thr 330 Arg Gly Pro Ser Glu 410 Glu Lou Lys Asp 
Thr 490 Lys.
Arg Ser Asp Glu Glu Asp Ala 155 Gly Arg Glu Giu Gly 235 Glu Ile Ile Ile Phe 315 Phe Val Asn Tyr Ile 395 Met Giu Lys Asp Tyr 475 Asp Tyr Ala Leu Ala Lou Lys Val 140 Tyr Lou Lou Tip Giu 220 Ala Ala Asn Phe Gin 300 Ala Ile Thr Ala Ser 380 Gin Ser Giy Ala Tyr 460 Asp Thr Asn Asn Ser 540 Asn Phe Ile 125 Lou Thr Pro Ala Asp 205 Ile Lou Ile Gly Ala 285 Giu Arg Thr Lys Pro 365 Pho Tyr Cys Arg Met 445 Lys Thr Tyr Tyr met 525 Ala Asp Lys Arg Lou 110 Lou Thr Ser Gin Sor Ala Asp Ala 175 Lou Tyr 190 Ala Ile Asn Met Tyr Gly Gin Tip 255 Ala Ala 270 Glu Arg Phe Val Ala Ala Thr Ser 335 Met Asp 350 Giu Pro Lys Arg Giu Gly Ile Ser 415 His Asn 430 Lou Thr Val Phe Val Met Val Asp 495 Glu Ala 510 Gly Phe Ile Lys Giu Asn Giu Thr Ile 160 Tyr Gly Thr Gin Leu 240 Val Thr Asp Asp Ala 320 Met Tyr Asn Tyr Val 400 Cys Leu Gly Asp Glu 480 Ala Val Gly Tyr WO 98/07867 PCT/DK97/00336 187 Ala Lys Val Lys Thr Leu Arg Asp Glu Asn Gly Tyr Ile Tyr Asp Tyr 545 550 555 560 Glu Val Glu Gly Asp Phe Pro Arg Tyr Gly Glu Asp Asp Asp Arg Ala 565 570 575 Asp Asp Ile Ala Lys Leu Val Met Lys Met Tyr His Glu Lys Leu Ala 580 585 590 Ser His Lys Leu Tyr Lys Asn Ala Glu Ala Thr Val Ser Leu Leu Thr 595 600 605 Ile Thr Ser Asn Val Ala Tyr Ser Lys Gin Thr Gly Asn Ser Pro Val 610 615 620 His Lys Gly Val Phe Leu Asn Glu Asp Gly Thr Val Asn Lys Ser Lys 625 630 635 640 Leu Glu Phe Phe Ser Pro Gly Ala Asn Pro Ser Asn Lys Ala Lys Gly 645 650 655 Gly Trp Leu Gin Asn Leu Arg Ser Leu Ala Lys Leu Glu Phe Lys Asp 660 665 670 Ala Asn Asp Gly Ile Ser Leu Thr Thr Gin Val Ser Pro Arg Ala Leu 675 680 685 Gly Lys Thr Arg Asp Glu Gin Val Asp Asn Leu Val Gin Ile Leu Asp 690 695 700 Gly Tyr Phe Thr Pro Gly Ala Leu Ile Asn Gly Thr Glu Phe Ala Gly 705 710 715 720 Gin His Val Asn Leu Asn Val Met Asp Leu Lys Asp Val Tyr Asp Lys 725 730 735 Ile Met Arg Gly Glu Asp Val Ile Val Arg Ile Ser Gly Tyr Cys Val 740 745 750 Asn Thr Lys Tyr Leu Thr Pro Glu Gin Lys Gin Glu Leu Thr Glu Arg 755 760 765 Val Phe His Glu Val Leu Ser Asn Asp Asp Glu Glu Val Met 
His Thr 770 775 780 Ser Asn Ile 785

INFORMATION FOR SEQ ID NO:40:
SEQUENCE CHARACTERISTICS:
LENGTH: 14 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: Other
LOCATION: 6...9
OTHER INFORMATION: Unknown
NAME/KEY: Other
LOCATION: 1...14
OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:
TTGATNNNNA TCAA

INFORMATION FOR SEQ ID NO:41:
SEQUENCE CHARACTERISTICS:
LENGTH: 14 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: Other
LOCATION: 6...9
OTHER INFORMATION: Unknown
NAME/KEY: Other
LOCATION: 1...14
OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:
GGAGTNNNNA TCAA 14

INFORMATION FOR SEQ ID NO:42:
SEQUENCE CHARACTERISTICS:
LENGTH: 14 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: Other
LOCATION: 6...9
OTHER INFORMATION: Unknown
NAME/KEY: Other
LOCATION: 1...14
OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:
TTTGCNNNNA TCAA 14

INFORMATION FOR SEQ ID NO:43:
SEQUENCE CHARACTERISTICS:
LENGTH: 32 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: Other
LOCATION: 1...32
OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:
GGCCGCTCGA GTTGTGTCTC ACCACTTGAC CC 32

INFORMATION FOR SEQ ID NO:44:
SEQUENCE CHARACTERISTICS:
LENGTH: 33 base pairs
TYPE: nucleic acid
STRANDEDNESS: single
TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA
(ix) FEATURE:
NAME/KEY: Other
LOCATION: 1...33
OTHER INFORMATION: Primer
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:
TAGTAGGATC CCATCATCTT CACCATAACG TGG
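The 14-base primers of SEQ ID NOs 40 to 42 each carry four positions (6 to 9) marked N and annotated as "Unknown". As a rough illustration of how such a partly degenerate primer could be compared against a template, the sketch below turns each primer into a regular expression, assuming that N stands for any of the four bases. The function names and the template string are hypothetical and are not taken from the patent; only the three primer sequences come from the listing. The search covers the given strand only and does not model hybridization.

```python
import re

# IUPAC nucleotide codes mapped to regex fragments; only N is degenerate here.
# Assumption: N means "any of A, C, G or T".
IUPAC_TO_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# Primer sequences as printed in the listing, with the spacing removed.
PRIMERS = {
    "SEQ ID NO:40": "TTGATNNNNATCAA",
    "SEQ ID NO:41": "GGAGTNNNNATCAA",
    "SEQ ID NO:42": "TTTGCNNNNATCAA",
}


def primer_to_regex(primer: str) -> str:
    """Translate a primer containing N positions into a regex string."""
    return "".join(IUPAC_TO_REGEX[base] for base in primer.upper())


def find_sites(primer: str, template: str) -> list[int]:
    """Return 0-based start positions where the primer pattern occurs."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?=({primer_to_regex(primer)}))")
    return [match.start() for match in pattern.finditer(template.upper())]


if __name__ == "__main__":
    # Hypothetical template, invented for illustration only.
    template = "CCTTGATACGTATCAAGG"
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer, template))
```

Exact string matching of this kind is only a first filter; whether a primer actually anneals depends on conditions that this sketch ignores.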

Claims (29)

1. An isolated DNA sequence comprising a coding sequence derived from a lactic acid bacterium, said coding sequence codes for a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity, the coding sequence is selected from the group consisting of (a) the coding sequence of SEQ ID NO:3, (b) the coding sequence of SEQ ID NO:30, (c) a mutant or variant of (a) or (b) and (d) a sequence that hybridizes to the adhE coding sequence from L. lactis strain DB1341 (SEQ ID NO:3) under the following conditions: hybridization overnight at 65°C followed by washing the filters twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes.
2. A DNA sequence according to claim 1 further comprising sequences regulating the expression of the coding sequence and/or the activity of its gene product.
3. A DNA sequence according to claim 1 which is derived from a lactic acid bacterium selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
4. A DNA sequence according to claim 1 which is derived from Lactococcus lactis.
AMENDED SHEET CLAIM/18833PC1/ATG/HS/SA/16-09-98
5. A DNA sequence according to claim 1 wherein the coding sequence codes for a polypeptide that is at least 50% identical with the gene product of SEQ ID NO:3.
6. A DNA sequence according to claim 5 wherein the polypeptide encoded by the coding sequence is at least 70% identical with the gene product of SEQ ID NO:3.
7. A recombinant replicon comprising a DNA sequence comprising a coding sequence derived from a lactic acid bacterium, the coding sequence codes for a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity, where the replicon is selected from a plasmid capable of replicating in a lactic acid bacterium and a lactic acid bacterial chromosome.
8. A recombinant replicon according to claim 7, comprising the DNA sequence of claim 1.
9. A recombinant lactic acid bacterial cell comprising the replicon of claim 7.
10. A lactic acid bacterial cell according to claim 9 which is selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
11. A lactic acid bacterial cell according to claim 10 which is in the form of a starter culture composition for the production of a food product or an animal feed, or in the form of a culture for the production of an aroma or antimicrobially active compound.
12. A lactic acid bacterial cell according to claim 9 wherein the DNA sequence comprising the sequence coding for the multifunctional polypeptide is modified so as to inactivate or reduce the production of or the activity of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
13. A lactic acid bacterial cell according to claim 12 wherein said modification of the DNA sequence results in the cell producing increased amounts of a metabolite selected from the group consisting of acetaldehyde, acetate and ethanol.
14. A lactic acid bacterial cell according to claim 9 wherein the DNA sequence comprising the sequence coding for the multifunctional polypeptide is modified so as to enhance the production of or the activity of at least one of the enzymatic activities selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity.
15. A lactic acid bacterial cell according to claim 14 wherein said modification of the DNA sequence results in the cell producing an increased amount of a metabolite selected from the group consisting of acetaldehyde, ethanol, formate, acetate, α-acetolactate, acetoin, diacetyl and 2,3 butylene glycol.
16. An isolated DNA sequence comprising a coding sequence which is derived from a lactic acid bacterium, said coding sequence codes for a polypeptide having pyruvate formate-lyase activity, the coding sequence is selected from the group consisting of (a) the coding sequence of SEQ ID NO:15, (b) the coding sequence of SEQ ID NO:30, (c) a mutant or variant of (a) or (b) and (d) a sequence that hybridizes to the pfl encoding sequence isolated from L. lactis strain MG1363, under the following conditions: hybridization overnight at 65°C followed by washing the filter twice in 5 x SSC at room temperature for 30 minutes and subsequently once in 3 x SSC; 0.1% SDS at 65°C for 30 minutes, subject to the limitation that the coding sequence is not the pfl gene of Streptococcus mutans disclosed in FASTA, GCG Wisconsin, Accession No. D50491.
17. A DNA sequence according to claim 16 comprising at least one regulatory sequence regulating the expression of the pyruvate formate-lyase polypeptide or coding for a gene product regulating the pyruvate formate-lyase activity of the polypeptide.
18. A DNA sequence according to claim 17 wherein the regulating gene product is selected from a pyruvate formate-lyase activase and a pyruvate formate-lyase deactivase.
19. A DNA sequence according to claim 18 wherein the gene product is a polypeptide having at least one enzymatic activity selected from the group consisting of (i) acetaldehyde dehydrogenase (ACDH) activity whereby acetyl CoA is converted into acetaldehyde, (ii) alcohol dehydrogenase (ADH) activity whereby acetaldehyde is converted into ethanol, (iii) capability of converting acetyl CoA into ethanol and (iv) pyruvate formate-lyase deactivase activity as defined in claim 1.
20. A DNA sequence according to claim 16 which is derived from a lactic acid bacterium selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
21. A DNA sequence according to claim 16 which is derived from Lactococcus lactis.
22. A recombinant replicon comprising a DNA sequence comprising a coding sequence derived from a lactic acid bacterium, said coding sequence codes for a polypeptide having pyruvate formate-lyase activity, said replicon is selected from a plasmid capable of replicating in a lactic acid bacterium and a lactic acid bacterial chromosome, subject to the limitation that the coding sequence is not the pfl gene of Streptococcus mutans disclosed in FASTA, GCG Wisconsin, Accession No. D50491.
23. A recombinant replicon according to claim 22 wherein the DNA sequence is the DNA sequence of claim 16.
24. A recombinant lactic acid bacterial cell comprising the replicon of claim 22.
25. A lactic acid bacterial cell according to claim 24 which is selected from the group consisting of a Lactococcus species, a Lactobacillus species, a Streptococcus species, a Pediococcus species, a Bifidobacterium species and a Leuconostoc species.
26. A lactic acid bacterial cell according to claim 24 which is in the form of a starter culture composition for the production of a food product or an animal feed.
27. A lactic acid bacterial cell according to claim 24 wherein the DNA sequence is modified whereby its production of pyruvate formate-lyase is reduced or inhibited or whereby the enzyme is produced in a modified form having a reduced pyruvate formate-lyase activity.
28. A lactic acid bacterial cell according to claim 27 wherein said modification of the DNA sequence results in that the cell produces increased amounts of a metabolite selected from the group consisting of α-acetolactate, acetoin, diacetyl and 2,3 butylene glycol.
29. A lactic acid bacterial cell according to claim 24 wherein the DNA sequence is modified whereby its production of pyruvate formate-lyase is enhanced or whereby the enzyme is produced in a modified form having an increased pyruvate formate-lyase activity.
30. A lactic acid bacterial cell according to claim 29 wherein said modification of the DNA sequence results in the cell producing increased amounts of formate.
31. A recombinant lactic acid bacterial cell comprising the DNA sequence of claim 1 and the DNA sequence of claim 16.
32. A recombinant lactic acid bacterial cell according to claim 31 wherein at least one of said DNA sequences is modified so as to modify the expression of pyruvate formate-lyase or the activity hereof.
33. A method of producing a lactic acid bacterial metabolite, the method comprising cultivating a lactic acid bacterium according to any of claims 12, 15, 29 or 31 under conditions where the metabolite is produced and isolating the metabolite from the culture.
34. A method of producing an animal feed, the method comprising the step of admixing to the feed starting materials a starter culture of a lactic acid bacterium according to claim 9 or 24 and keeping the mixture under conditions allowing the starter culture to be metabolically active.
35. An isolated DNA sequence which is the open reading frame orfA isolated from Lactococcus lactis strain DB1341 where it is located upstream of the pfl gene (SEQ ID NO:34), said sequence coding for a product having a formate transporter activity.
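The claims above recite polypeptides that are "at least 50%" and "at least 70%" identical with the gene product of SEQ ID NO:3. This part of the document does not say how identity is to be calculated, so the following is only a minimal sketch under stated assumptions: the two sequences are already aligned to equal length with "-" as the gap character, and identity is the share of identical residues among all columns in which at least one sequence has a residue. The function name and the second example sequence are hypothetical; the first example sequence is the first ten residues of SEQ ID NO:39 in one-letter code.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: '-' marks a gap; columns where both sequences have a gap
    are ignored, and a residue opposite a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" and res_b == "-":
            continue  # gap in both sequences: not an informative column
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns in the alignment")
    return 100.0 * matches / compared


if __name__ == "__main__":
    reference = "MKTEVTENIF"  # first ten residues of SEQ ID NO:39
    candidate = "MKTEVSENLF"  # hypothetical variant, for illustration only
    identity = percent_identity(reference, candidate)
    print(f"{identity:.1f}% identical; meets 70% threshold: {identity >= 70.0}")
```

Other conventions (for example dividing by the length of the shorter sequence, or excluding all gapped columns) give different percentages, so the choice of denominator would need to be fixed before comparing a result with the 50% or 70% thresholds.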
AU37659/97A 1996-08-22 1997-08-20 Metabolically engineered lactic acid bacteria and means for providing same Ceased AU721803B2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US70145896A 1996-08-22 1996-08-22
US08/701458 1996-08-22
PCT/DK1997/000336 WO1998007867A2 (en) 1996-08-22 1997-08-20 Metabolically engineered lactic acid bacteria and means for providing same

Publications (2)

Publication Number Publication Date
AU3765997A AU3765997A (en) 1998-03-06
AU721803B2 true AU721803B2 (en) 2000-07-13

Family

ID=24817464

Family Applications (1)

Application Number Title Priority Date Filing Date
AU37659/97A Ceased AU721803B2 (en) 1996-08-22 1997-08-20 Metabolically engineered lactic acid bacteria and means for providing same

Country Status (5)

Country Link
EP (1) EP0938566A2 (en)
AU (1) AU721803B2 (en)
CA (1) CA2262418A1 (en)
NZ (1) NZ334294A (en)
WO (1) WO1998007867A2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB9724627D0 (en) 1997-11-20 1998-01-21 Genencor Int Bv Gram positive microorganism formate pathway
CA2911947A1 (en) * 2013-05-10 2014-11-13 BiOWiSH Technologies, Inc. Compositions comprising a mixture of bacteria comprising pedoiococcus and lactobacillus and methods for decreasing the effects of alcohols
CN110088279A (en) * 2016-12-15 2019-08-02 株式会社钟化 Novel host cell and used its target protein manufacturing method
JPWO2020032261A1 (en) * 2018-08-10 2021-08-10 協和発酵バイオ株式会社 Microorganisms that produce eicosapentaenoic acid and methods for producing eicosapentaenoic acid
CN111471660B (en) * 2020-03-12 2023-11-24 广州辉园苑医药科技有限公司 Acetaldehyde dehydrogenase recombinant gene, lactic acid bacteria carrier and application thereof

Also Published As

Publication number Publication date
EP0938566A2 (en) 1999-09-01
WO1998007867A2 (en) 1998-02-26
CA2262418A1 (en) 1998-02-26
NZ334294A (en) 2000-02-28
WO1998007867A3 (en) 1998-05-22
AU3765997A (en) 1998-03-06

Similar Documents

Publication Publication Date Title
Söhling et al. Molecular analysis of the anaerobic succinate degradation pathway in Clostridium kluyveri
Goodlove et al. Cloning and sequence analysis of the fermentative alcohol-dehydrogenase-encoding gene of Escherichia coli
KR102493197B1 (en) Recombinant microorganisms exhibiting increased flux through a fermentation pathway
US7659105B2 (en) Methods and compositions for butanol production
CA2144053C (en) Improved enzymes for the production of 2-keto-l-gulonic acid
ITMI972080A1 (en) YEAST STRAWS FOR THE REPRODUCTION OF LACTIC ACID
CN112725210B (en) Recombinant acid-resistant yeast that inhibits lactic acid metabolism and ethanol production and methods for producing lactic acid using the same
KR102303832B1 (en) Yeast cell having acid tolerant property, method for preparing the yeast cell and use thereof
AU754472B2 (en) Novel genetically modified lactic acid bacteria having modified diacetyl reductase activities
JPH10229885A (en) Novel alcohol aldehyde dehydrogenase
Arnau et al. Cloning, expression, and characterization of the Lactococcus lactis pfl gene, encoding pyruvate formate-lyase
Jobin et al. Expression of the Oenococcus oeni trxA gene is induced by hydrogen peroxide and heat shock
EP0910642A1 (en) A method for increasing hemoprotein production in filamentous fungi
CN113528362A (en) Recombinant acid-tolerant yeast with inhibited glycerol production and method for producing lactic acid using the same
Denayrolles et al. Cloning and sequence analysis of the gene encoding Lactococcus lactis malolactic enzyme: relationships with malic enzymes
CN112789353A (en) Recombinant acid-resistant yeast inhibiting ethanol production and method for preparing lactic acid using same
US7160708B2 (en) Fatty alcohol oxidase genes and proteins from Candida tropicalis and methods relating thereto
Garmyn et al. Cloning, nucleotide sequence, and transcriptional analysis of the Pediococcus acidilactici L-(+)-lactate dehydrogenase gene
AU721803B2 (en) Metabolically engineered lactic acid bacteria and means for providing same
TW201211253A (en) An enzyme for producing methylmalonate semialdehyde
US20030199035A1 (en) Metabolically engineered lactic acid bacteria and means for providing same
Kirimura et al. Cloning and expression of Aspergillus niger icdA gene encoding mitochondrial NADP+-specific isocitrate dehydrogenase
JP4162383B2 (en) Genes involved in the production of homoglutamic acid and use thereof
EP0385451B1 (en) Cytochrome C gene derived from hydrogen bacterium
Yoneta et al. Characterization of chimeric isocitrate dehydrogenases of a mesophilic nitrogen-fixing bacterium, Azotobacter vinelandii, and a psychrophilic bacterium, Colwellia maris

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)